Offsets

The Linux kernel tracks a current offset for every open file. If you open a file and read() some data from it, you can go off and run another operation. When you come back later and read() again, you will start reading from the offset where you left off.

The kernel provides a syscall named lseek that allows you to seek to an arbitrary offset in the file. Let’s take a look at how this works.

fd = open(nstr, O_RDONLY);  // no mode argument needed without O_CREAT
// offset is 0
numread = read(fd, buf, 256);
// offset is numread (assuming read() succeeded)
lseek(fd, 0, SEEK_SET);
// offset is 0 again

In the above example, after opening the file, our offset is set to 0 (the beginning of the file). After we read some data from the file, our offset is increased by the number of bytes read. Finally, using the lseek() function, we can reset the offset back to the beginning of the file.
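To watch the offset change rather than take the comments on faith, we can ask the kernel for it: lseek(fd, 0, SEEK_CUR) moves nothing and returns the current offset. Here is a minimal sketch of that idea. The helper name trace_offsets and its offsets[3] output array are made up for this example; they are not part of any API.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: records the file offset after open(), after one
 * read() of up to 256 bytes, and after rewinding with lseek().
 * Returns 0 on success, -1 on error.
 */
int trace_offsets(const char *path, off_t offsets[3])
{
    char buf[256];
    ssize_t numread;
    int fd;

    fd = open(path, O_RDONLY);
    if (fd == -1) {
        perror("open");
        return -1;
    }

    /* A zero-byte seek relative to SEEK_CUR just reports the offset. */
    offsets[0] = lseek(fd, 0, SEEK_CUR);

    numread = read(fd, buf, sizeof buf);
    if (numread == -1) {
        perror("read");
        close(fd);
        return -1;
    }

    /* The offset has advanced by the number of bytes read. */
    offsets[1] = lseek(fd, 0, SEEK_CUR);

    /* SEEK_SET with 0 rewinds; lseek() returns the new offset. */
    offsets[2] = lseek(fd, 0, SEEK_SET);

    close(fd);
    return 0;
}
```

Called on a file containing the 5 bytes "hello", this should fill offsets with 0, 5 and 0.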

File Holes

Here’s where things get interesting. What happens if we seek past the end of a file?

Let’s say we have a file that is 8K in size, and we seek to an offset of 400K, well past the end of the file, and write. We can do this on the CLI with the dd[0] command.

Our first dd command will read in random data, and write out 2 blocks of size 4096B, for a total of 8192B.

% dd if=/dev/urandom bs=4096 count=2 of=file_with_holes
% stat file_with_holes
  File: 'file_with_holes'
  Size: 8192            Blocks: 16         IO Block: 4096   regular file

If we look at the output of stat we see that the size of the file is predictably reported as 8192B and that it is taking up 16 “Blocks”.

I put the quotes around “Blocks” because these are 512-byte units and have no relation to the IO blocks used by the filesystem (4096B here). Just remember that when using the stat command on Linux, the blocks are sized at 512B and are not machine dependent.

So the file is 8192B in size and occupies 16 * 512B = 8192B on disk. Exactly what we would expect.
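The same numbers are available from C through stat(2), where the struct stat fields st_size, st_blocks and st_blksize correspond to Size, Blocks and IO Block in the output above. The sketch below assumes Linux, where st_blocks is counted in 512-byte units. The helper name allocated_bytes is made up for this example.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: prints the stat fields discussed above and returns
 * the number of bytes allocated for path, or -1 on error.
 * Assumes Linux, where st_blocks is in 512-byte units.
 */
long long allocated_bytes(const char *path)
{
    struct stat st;

    if (stat(path, &st) == -1) {
        perror("stat");
        return -1;
    }

    printf("Size: %lld  Blocks: %lld  IO Block: %ld\n",
           (long long)st.st_size,
           (long long)st.st_blocks,
           (long)st.st_blksize);

    /* Allocated space is a block count times 512, independent of st_blksize. */
    return (long long)st.st_blocks * 512;
}
```

For the 8192-byte file created above, where stat reports 16 blocks, this works out to 16 * 512 = 8192 bytes.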

Now let’s see what stat reports after seeking past the end of the file and writing:

% dd if=/dev/urandom bs=4096 seek=100 count=1 of=file_with_holes
% stat file_with_holes
  File: 'file_with_holes'
  Size: 413696          Blocks: 24         IO Block: 4096   regular file

In this second dd operation, we moved the offset forward to 100x4096 bytes in an 8192-byte file (98x4096 bytes past the end of the file), and wrote a single block of 4096 bytes.

The size of the file is now reported as 413696 bytes (about 404K), as we would expect, but the file is only taking up 12K (24 blocks * 512B) of disk space!

How can a file with a size of about 404K be taking up only 12K on disk?

A clue can be found in the man page for lseek:

Although lseek() may position the file offset beyond the end of the file, this function does not itself extend the size of the file.

lseek() cannot extend the size of a file, but write() can. When we write at an offset beyond the end of the file, a file hole is created between the end of our file and the offset we wrote to.
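Here is a rough C equivalent of the two dd commands, to show that it really is the write() after the lseek() that grows the file. This is a sketch under a few assumptions: the helper name make_file_with_hole is made up, and a block of filler bytes stands in for the random data from /dev/urandom.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper mirroring the two dd commands: write 2 blocks of
 * 4096 bytes, seek to offset 100 * 4096, then write 1 more block.
 * Returns the resulting file size, or -1 on error.
 */
long long make_file_with_hole(const char *path)
{
    char block[4096];
    struct stat st;
    long long size = -1;
    int fd;

    memset(block, 'x', sizeof block);   /* filler instead of random data */

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) {
        perror("open");
        return -1;
    }

    /* Two blocks of real data: offsets 0 through 8191. */
    if (write(fd, block, sizeof block) != (ssize_t)sizeof block ||
        write(fd, block, sizeof block) != (ssize_t)sizeof block)
        goto out;

    /* Move the offset past the end. The file size is still 8192 here. */
    if (lseek(fd, (off_t)100 * 4096, SEEK_SET) == (off_t)-1)
        goto out;

    /* This write() extends the file and leaves the hole behind it. */
    if (write(fd, block, sizeof block) != (ssize_t)sizeof block)
        goto out;

    if (fstat(fd, &st) == 0) {
        printf("Size: %lld  Blocks: %lld\n",
               (long long)st.st_size, (long long)st.st_blocks);
        size = (long long)st.st_size;
    }

out:
    close(fd);
    return size;
}
```

The size should come out as 413696 bytes, matching the dd example. The block count depends on the filesystem, so it may not be 24 everywhere.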

Most Linux filesystems will not allocate physical disk space for file holes. Instead, when we try to read from them, the filesystem will “create a page filled with zeroes”[1] and pass it to userspace.

If we try to read() from a file hole, the read function loads zeroes into our buffer. If we write() to a file hole, the hole is filled in and data is written to disk.
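We can check the read() half of that claim directly. The sketch below creates a small file whose first 4096 bytes are never written, then reads them back and verifies that every byte is zero. The helper name hole_reads_as_zeroes is made up for this example. It should also pass on a filesystem that does not support holes, since the unwritten range reads as zeroes either way.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: creates path with an unwritten range covering
 * offsets 0 through 4095, then reads that range back.
 * Returns 1 if every byte read is zero, 0 if not, -1 on error.
 */
int hole_reads_as_zeroes(const char *path)
{
    char buf[4096];
    ssize_t n;
    size_t i;
    int result = -1;
    int fd;

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) {
        perror("open");
        return -1;
    }

    /* Write a single byte at offset 4096; nothing before it is written. */
    if (lseek(fd, 4096, SEEK_SET) == (off_t)-1 || write(fd, "x", 1) != 1)
        goto out;

    /* Rewind and read the unwritten range back. */
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
        goto out;
    n = read(fd, buf, sizeof buf);
    if (n != (ssize_t)sizeof buf)
        goto out;

    /* Every byte of the hole should come back as zero. */
    result = 1;
    for (i = 0; i < sizeof buf; i++) {
        if (buf[i] != 0) {
            result = 0;
            break;
        }
    }

out:
    close(fd);
    return result;
}
```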

[0] - How to create a file with file holes
[1] - File holes, races, and mmap()