Mark H Weaver <m...@netris.org> wrote:

> I just got bitten by the same problem reported back in July 2016:
>
>   https://lists.gnu.org/archive/html/bug-tar/2016-07/msg00000.html
>
> At the time, Joerg Schilling unilaterally refused to fix the bug,
> claiming that Btrfs was broken and violated POSIX, although when asked
> for a reference to back that up he never provided one.  Everyone else in
> the thread disagreed with him, but the bug never got fixed.

Of course I provided that reference by pointing to the POSIX standard.

In order to make sure that every constraint is correct, I may enhance my
statement:

In theory, a filesystem could put the data for a tiny file into some kind
of "free space" in the meta-data storage (sometimes called the "inode")
and thus legally report st_blocks == 0. But this would not be allowed to
change as the result of a mere sync() operation.

But note that a file needs to have a minimal size of DEV_BSIZE in order to
be "sparse", while known implementations do not store more than 64 bytes
in that location.

> Paul Eggert argued that there's no guarantee that st_blocks must be zero
> for a file with nonzero data.  As an example, he pointed out that if all
> of the file's data fits within the inode, it would be reasonable to
> report st_blocks == 0 for a file with nonzero data.

See above....

BTW: There is a related comment in star/hole.c that explains that NetApp
puts file data of up to 64 bytes completely into the meta-data storage.
The method used by star avoids calling a file with st_blocks == 0 sparse
as long as it follows the POSIX semantics.

> Others pointed out that in Linux's /proc filesystem, all files have
> st_blocks == 0.  That is also the case on my system running
> linux-libre-4.14.12.  Joerg claimed that his /proc filesystem reported
> nonzero st_blocks, but he was the only one in the thread who did so.

This is incorrect: I pointed out that the *original* /proc filesystem
implementation always returns st_blocks != 0 if st_size != 0. If you
encounter a /proc filesystem that reports st_blocks == 0, this must be a
buggy unofficial clone implementation.

> It was also pointed out that with the advent of SEEK_HOLE and SEEK_DATA,
> the st_blocks hack is no longer needed for efficiency on modern systems.
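
For illustration only, here is a minimal sketch of how the data regions of
a file can be found with SEEK_DATA and SEEK_HOLE instead of being inferred
from st_blocks. It is written in Python and is not taken from star or GNU
tar; the function name data_extents is hypothetical, and os.SEEK_DATA and
os.SEEK_HOLE are assumed to be available, which is the case only on
platforms whose lseek() supports them.

```python
import errno
import os


def data_extents(fd):
    """Return a list of (start, end) byte ranges of fd that hold data.

    Sketch only: relies on lseek() with SEEK_DATA/SEEK_HOLE.
    Note that the file offset of fd is moved by the lseek() calls.
    """
    size = os.fstat(fd).st_size
    extents = []
    offset = 0
    while offset < size:
        try:
            # First byte at or after 'offset' that is not inside a hole.
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:
                break  # nothing but a hole remains up to end of file
            raise
        # Start of the next hole; the end of file counts as a hole.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        extents.append((start, end))
        offset = end
    return extents
```

With such a list, an archiver could treat a file as sparse only when the
listed ranges do not cover the whole of st_size, regardless of what
st_blocks reports.
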
>
> I see from the GNU maintainers file that Paul Eggert is a maintainer for
> GNU tar, and Joerg Schilling is not, so I don't see why we should let
> Joerg continue to prevent us from fixing this bug.

Given that http://austingroupbugs.net/view.php?id=415#c862 already defines
SEEK_HOLE and SEEK_DATA, and given that most operating systems already
implement them, the best way would be to just follow the accepted
standard.

BTW: I am in the group of core POSIX maintainers.

> I propose that we revisit this bug and fix it.  We clearly cannot assume
> that st_blocks == 0 implies that the file contains only zeroes.  This
> bug is fairly serious for anyone using btrfs and possibly other
> filesystems, as it has the potential to lose user data.

I cannot speak for GNU tar, but star does not call a file "sparse" as long
as this file follows POSIX semantics. This is implemented by requiring the
size of the file (st_size) to be at least DEV_BSIZE larger than the size
computed from st_blocks before the file is treated as "sparse".

Conclusion: If btrfs returns st_blocks == 0 for larger (non-sparse) files,
this is a POSIX non-compliance that needs to be fixed.

Jörg

-- 
 EMail:jo...@schily.net                    (home) Jörg Schilling D-13353 Berlin
       joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/