File Holes
Offsets
The Linux kernel tracks the offset of every open file. If you open a file, you can read() some data from it, then run another operation. When you come back later and read() again, you will start reading from the offset where you left off.
The kernel provides a syscall named lseek that allows you to seek to an arbitrary offset in the file. Let’s take a look at how this works.
fd = open(nstr, O_RDONLY);  // the mode argument only matters with O_CREAT
// offset is 0
numread = read(fd, buf, 256);
// offset is numread
lseek(fd, 0, SEEK_SET);
// offset is 0 again
In the above example, after opening the file, our offset is set to 0 (the beginning of the file). After we read some data from the file, our offset is increased by the number of bytes read. Finally, using the lseek() function, we can reset the offset back to the beginning of the file.
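To watch the offset change rather than take the comments on faith, here is a minimal sketch. It relies on the fact that lseek(fd, 0, SEEK_CUR) moves nothing and returns the current offset. The function name trace_offsets and the three-element offsets array are made up for this illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: opens path read-only, reads up to 256 bytes, then
 * rewinds. Records the file offset after each step:
 *   offsets[0] after open(), offsets[1] after read(), offsets[2] after lseek().
 * Returns the number of bytes read, or -1 on error. */
ssize_t trace_offsets(const char *path, off_t offsets[3])
{
    char buf[256];

    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        perror("open");
        return -1;
    }

    /* Seeking 0 bytes from the current position just reports the offset. */
    offsets[0] = lseek(fd, 0, SEEK_CUR);

    ssize_t numread = read(fd, buf, sizeof buf);
    if (numread == -1) {
        perror("read");
        close(fd);
        return -1;
    }
    offsets[1] = lseek(fd, 0, SEEK_CUR);

    /* Rewind to the beginning of the file. */
    offsets[2] = lseek(fd, 0, SEEK_SET);

    close(fd);
    return numread;
}
```

Called on any readable file, offsets[0] and offsets[2] should come back as 0, and offsets[1] should equal the return value.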
File Holes
Here’s where things get interesting. What happens if we seek past the end of a file?
Let’s say we have a file that is 8K in size, and we seek to an offset of 400K, well past the end of the file, and write. We can do this on the CLI with the dd [0] command.
Our first dd command will read in random data, and write out 2 blocks of size 4096B, for a total of 8192B.
% dd if=/dev/urandom bs=4096 count=2 of=file_with_holes
% stat file_with_holes
File: 'file_with_holes'
Size: 8192 Blocks: 16 IO Block: 4096 regular file
If we look at the output of stat we see that the size of the file is predictably reported as 8192B and that it is taking up 16 “Blocks”.
I put the quotes around “Blocks” because these are 512-byte units and have no relation to the IO block size used by the filesystem. Just remember that when using the stat command, the blocks are sized at 512B and are not machine dependent.
So, the file takes up 8192B of space, and 16 * 512B blocks. Exactly what we would expect.
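The same two numbers are available from C through the stat() syscall. The sketch below assumes the Linux convention that st_blocks counts 512-byte units; the helper name file_usage is invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: reports the logical size of path (st_size) and the
 * space allocated for it on disk, both in bytes.
 * Returns 0 on success, -1 on error. */
int file_usage(const char *path, off_t *size, off_t *allocated)
{
    struct stat st;

    assert(path != NULL);

    if (stat(path, &st) == -1) {
        perror("stat");
        return -1;
    }

    *size = st.st_size;
    /* On Linux, st_blocks is counted in 512-byte units regardless of the
     * filesystem's IO block size (st_blksize). */
    *allocated = (off_t)st.st_blocks * 512;
    return 0;
}
```

For the 8192B file above, this should report a size of 8192 and, on a typical filesystem with 4096B blocks, 8192 bytes allocated (16 * 512B).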
Now let’s see what stat reports after seeking past the end of the file and writing:
% dd if=/dev/urandom bs=4096 seek=100 count=1 of=file_with_holes
% stat file_with_holes
File: 'file_with_holes'
Size: 413696 Blocks: 24 IO Block: 4096 regular file
In this second dd operation, we seek forward 100x4096 bytes in an 8192 byte file (that is, 98x4096 bytes past the end of the file), and write a single block of 4096 bytes.
The size of the file is now reported as 413696 bytes (404K), as we would expect, but the file is only taking up 12K (24 blocks * 512B) of disk space!
How can a file with a size of 404K be taking up only 12K on disk?
A clue can be found in the man page for lseek:
Although lseek() may position the file offset beyond the end of the file, this function does not itself extend the size of the file.
lseek() cannot extend the size of a file, but write() can. When we write at an offset beyond the end of the file, a file hole is created between the end of our file and the offset we wrote to.
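Here is a rough C equivalent of the two dd commands, meant to show that the seek alone leaves the size untouched and only the write() extends it. The function name make_file_with_hole and the BLOCK constant are assumptions made for this sketch, and it writes filler bytes rather than random data.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK 4096

/* Hypothetical helper: writes 2 blocks, seeks to block 100, writes 1 block.
 * Stores st_size as seen right after the seek and right after the final
 * write. Returns 0 on success, -1 on error. */
int make_file_with_hole(const char *path, off_t *size_after_seek,
                        off_t *size_after_write)
{
    char block[BLOCK];
    struct stat st;
    int rc = -1;

    assert(path != NULL);
    memset(block, 'x', sizeof block);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd == -1) {
        perror("open");
        return -1;
    }

    /* Two blocks of data: the file is now 8192 bytes. */
    for (int i = 0; i < 2; i++) {
        if (write(fd, block, sizeof block) != (ssize_t)sizeof block)
            goto out;
    }

    /* Seek to byte 409600. This moves the offset but not the file size. */
    if (lseek(fd, (off_t)100 * BLOCK, SEEK_SET) == (off_t)-1)
        goto out;
    if (fstat(fd, &st) == -1)
        goto out;
    *size_after_seek = st.st_size;

    /* Writing here extends the file and leaves a hole behind the offset. */
    if (write(fd, block, sizeof block) != (ssize_t)sizeof block)
        goto out;
    if (fstat(fd, &st) == -1)
        goto out;
    *size_after_write = st.st_size;
    rc = 0;

out:
    if (rc == -1)
        perror("make_file_with_hole");
    close(fd);
    return rc;
}
```

The size after the seek should still be 8192, and the size after the write should be 413696, matching the stat output above. How many blocks end up allocated depends on the filesystem.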
Most Linux filesystems will not allocate physical disk space for file holes. Instead, when we try to read from them, the filesystem will “create a page filled with zeroes”[1] and pass it to userspace.
If we try to read() from a file hole, the read function loads zeroes into our buffer. If we write() to a file hole, the hole is filled in and data is written to disk.
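One way to check the read side of this is to count the non-zero bytes in a region of the file. The helper below is a sketch with an invented name, count_nonzero; it uses only open(), lseek(), and read().

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads up to len bytes starting at offset in path and
 * returns how many of them are non-zero. Stops early at end of file.
 * Returns -1 on error. */
long count_nonzero(const char *path, off_t offset, size_t len)
{
    unsigned char buf[4096];
    long nonzero = 0;

    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        perror("open");
        return -1;
    }
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1) {
        perror("lseek");
        close(fd);
        return -1;
    }

    while (len > 0) {
        size_t want = len < sizeof buf ? len : sizeof buf;
        ssize_t got = read(fd, buf, want);
        if (got == -1) {
            perror("read");
            close(fd);
            return -1;
        }
        if (got == 0)
            break; /* end of file */
        for (ssize_t i = 0; i < got; i++) {
            if (buf[i] != 0)
                nonzero++;
        }
        len -= (size_t)got;
    }

    close(fd);
    return nonzero;
}
```

For file_with_holes, a call such as count_nonzero("file_with_holes", 8192, 98 * 4096) covers exactly the hole and should return 0, even though no disk blocks back that range.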
[0] - How to create a file with file holes
[1] - File holes, races, and mmap()