Sorry for the resent mail, but it turned out that I accidentally typed "r" instead of "R", and I believe this may be of interest to more than just you, Paul.
Paul Eggert <egg...@cs.ucla.edu> wrote:

> On 01/22/2018 09:47 AM, Joerg Schilling wrote:
> > we are talking about files that do not change while something like TAR
> > is reading them.
>
> It's reasonable for a file system to reorganize itself while 'tar' is
> reading files. Even if a file's contents do not change, its space
> utilization might change. When I use "du" or "df" to find out about
> space utilization I want the numbers now, not what they were last week,
> and this is true regardless of whether I have modified the files since
> last week.

First: "df" output is not related to the stat() data; it is based only on the statvfs() data.

Then, "du" output on Linux seems to have a tradition of being incorrect. I remember that reiserfs returned st_blocks based on a strange "fragment handling" that ignored the fact that st_blocks counts in multiples of DEV_BSIZE (512 bytes) rather than in multiples of the logical block size of reiserfs.

In general, there has been a major change in filesystems since WOFS introduced copy-on-write (COW) 30 years ago: before (when data blocks were always overwritten in place), no basic element of a filesystem was allowed to be larger than the sector size, because otherwise a system or power crash could leave the filesystem in an unrepairable state. With COW, some of the structures are now allowed to be larger than the sector size, and since this includes the "inode" equivalent (called "gnode" on WOFS or "dnode" on ZFS), this structure may be larger than the disk's sector size, as it may be written to the background medium before the switch to the next stable filesystem state happens - given that the related filesystem is organized in a way that this switch is not done by just writing the "inode" equivalent. On WOFS, with its inverted structure, a file goes to its next state by just writing the gnode to the next free gnode location.
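The stat()/statvfs() distinction above can be sketched in a few lines. This is my own minimal illustration, not code from any of the tools discussed; the function names are mine, and the 512-byte st_blocks unit is the traditional DEV_BSIZE on most systems:

```python
import os
import tempfile

DEV_BSIZE = 512  # st_blocks traditionally counts in this unit, not the fs block size

def allocated_bytes(path):
    """Space actually allocated to a file, derived from stat()."""
    st = os.stat(path)
    return st.st_blocks * DEV_BSIZE

def fs_free_bytes(path):
    """df-style free space; this comes from statvfs(), not from stat()."""
    vfs = os.statvfs(path)
    return vfs.f_bavail * vfs.f_frsize

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        f.write(os.urandom(100_000))   # incompressible payload
        os.fsync(f.fileno())           # force block allocation
        print("size:", os.stat(f.name).st_size)
        print("allocated:", allocated_bytes(f.name))
        print("fs free:", fs_free_bytes(f.name))
```

A filesystem that reported st_blocks in its own logical block size (the reiserfs mistake above) would make allocated_bytes() come out wrong by a constant factor.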
So WOFS does not allow the gnode to be larger than the sector size, unless there is an extension that allows detecting partially written gnodes as invalid. On ZFS, with a "classical" filesystem structure, the file's next state is reached by writing the dnode, the directory it is in, ... up to the uberblock. So care only needs to be taken with the way the next uberblock location is validated. On ZFS, a dnode definitely could be larger than the sector size, and in theory larger parts of the file's data could be held in the metadata.

If btrfs does not do it this way, returning st_blocks == 0 for a file with DEV_BSIZE or more of data would be wrong. Your claim that reorganizing the filesystem could result in different stat() data being returned applies only in case the file content is moved from logically being file content to logically being file metadata. So in theory, a stat() call could first return st_blocks == 1 and later (when the filesystem knows that the new/whole data of the file fits into the metadata) return st_blocks == 0. It seems, however, that btrfs behaves just the other way round.

BTW: you mentioned that POSIX does not grant many things that people might believe to be required... This in particular is the case for directories. POSIX does not:

- require the directory link count to be its hard link count plus the number of sub-directories. This was an artefact of a design mistake in the 1970s.
- require a directory to be readable, since there is readdir()
- require a directory to return a stat.st_size that depends on its "content".
- require a directory to return "." or ".." with readdir().

WOFS follows only the minimal POSIX requirements for directories. A directory is a special file with size 0 and a link count of 1, unless there is an inode-related link (the equivalent of a hard link) to another directory. The entries "." and ".." are understood by the filesystem's path handling routines, but readdir() never returns these entries.
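The st_blocks == 0 case above is observable from user space. Here is a small heuristic of my own (not from any tool in this thread) for spotting files whose data may live entirely in the filesystem metadata:

```python
import os
import tempfile

def data_possibly_in_metadata(path):
    """Heuristic: a non-empty file reporting zero allocated blocks.

    On a filesystem that stores small files inside the metadata (the
    btrfs/ZFS case discussed above), such a file can show st_size > 0
    together with st_blocks == 0.  A fully sparse file looks the same,
    so this only says "possibly inlined", not "definitely inlined".
    """
    st = os.stat(path)
    return st.st_size > 0 and st.st_blocks == 0

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        f.write(os.urandom(8192))  # well above typical inline-data limits
        os.fsync(f.fileno())
        print(data_possibly_in_metadata(f.name))
```

An archiver that trusts st_blocks == 0 to mean "no data" would silently skip such inlined files, which is exactly the hazard for tar-like programs.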
ZFS emulates the historical directory link count from the 1970s but returns stat.st_size as the number of entries readable by readdir(). This usually lets the historic BSD implementation of scandir() fail, as that implementation allocates memory based on the assumption that the minimal stat.st_size of a directory is "number of entries" * "minimal struct dirent size", as on UFS.

Does gtar deal correctly with these constraints?

Jörg

--
EMail:jo...@schily.net (home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/
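The scandir() failure mode above can be illustrated by comparing the historic size-based estimate with the real entry count. This is a sketch in the spirit of the old BSD code, not the actual C implementation; the 24-byte divisor is an assumption about the historic minimal UFS record size:

```python
import os
import tempfile

# Roughly the historic divisor the old BSD scandir() used; treat the
# exact constant as an assumption here.
HISTORIC_MIN_DIRENT = 24

def estimated_entries(dirpath):
    """Array-size guess in the spirit of the historic scandir()."""
    return os.stat(dirpath).st_size // HISTORIC_MIN_DIRENT

def actual_entries(dirpath):
    return len(os.listdir(dirpath))

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for i in range(50):
            open(os.path.join(d, "f%02d" % i), "w").close()
        print("estimate:", estimated_entries(d), "actual:", actual_entries(d))
```

On a filesystem where st_size is the entry count itself (as described for ZFS above), the estimate collapses to nearly zero while the directory may hold thousands of entries, so a size-based allocation undershoots badly.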