Hi Andreas,

Andreas Dilger <adil...@dilger.ca> writes:

> On Jan 10, 2018, at 4:50 AM, Pavel Raiskup <prais...@redhat.com> wrote:
>>
>> On Wednesday, January 10, 2018 3:42:52 AM CET Mark H Weaver wrote:
>>> From da922703282b0d3b8837a99a9c7fdd32f1d20d49 Mon Sep 17 00:00:00 2001
>>> From: Mark H Weaver <m...@netris.org>
>>> Date: Tue, 9 Jan 2018 20:16:14 -0500
>>> Subject: [PATCH] Remove nonportable check for files containing only zeroes.
>>>
>>> This check benefitted only one unlikely case (large files containing
>>> only zeroes, on systems that do not support SEEK_HOLE)
>>
>> It drops the optimization even for situations when SEEK_HOLE is not
>> available, which is not 100% necessary. I'm not proposing doing otherwise
>> (I actually proposed this in [1]), but I'm rather CCing Andreas once more,
>> as that's the original requester, the use-cases with lustre were
>> definitely not unlikely and the question is whether SEEK_HOLE covers them
>> nowadays.
>>
>> [1] https://lists.gnu.org/archive/html/bug-tar/2016-07/msg00017.html
>
> Sorry for the late reply on this thread.
>
> It should be noted that the real problem was not related to backing
> up files at the Lustre level, but rather backing up files directly from
> the backing ext4 filesystem of the metadata target, for example if migrating
> to new hardware, or for backup/restore of only that target. The MDT stored
> the actual file size in the metadata inode (could be GB or TB in size), but
> the file data was stored on data servers on other nodes.
>
> This meant that using the old tar versions to do the MDT backup might take
> days at 100% CPU just to write sparse files to the tarball. If tar is now
> using SEEK_HOLE to determine sparseness, then the ext4-level backups with
> newer systems should work OK (SEEK_HOLE was added to ext4 in the 3.0 kernel,
> and was improved in 3.7, though a data consistency bug with unwritten data
> was just fixed in 4.12).
>
> That means SEEK_HOLE is NOT available in RHEL 6.x kernels, which are still
> in fairly widespread (though declining) use.

Ah, I see, thank you for this explanation. So this corner case is not
so unlikely as I supposed.

> I'd prefer that the heuristic for sparse files without SEEK_HOLE not
> be removed completely, but I do think that it needs to be fixed for
> the small inline file and file in cache cases.

I appreciate your concerns. Unfortunately, as I explain below, this
heuristic is based on a false assumption, even for very large files, and
leads to data loss on Btrfs.

If we are to continue using this heuristic at all, we will need a way to
decide when it can be used safely. I don't know of a good way to do
this, but I'm open to suggestions.

It makes me wonder how many RHEL 6.x users will use GNU tar 1.31 on
their otherwise very old enterprise systems to back up their disks. I
would expect those users to use their enterprise-grade old 'tar'.

>>> and was based on an assumption about file system behavior that is not
>>> mandated by POSIX and no longer holds in practice, namely that for
>>> sufficiently large files, (st_blocks == 0) implies that the file
>>> contains only zeroes. Examples of file systems that violate this
>>> assumption include Linux's /proc file system and Btrfs.
>
> Is that comment correct, namely that btrfs has "large" files that report
> st_blocks == 0 even though they contain data?

Yes, on Btrfs I reliably see (st_blocks == 0) on a recently written,
mostly sparse file with size > 8G, using linux-libre-4.14.14. More
specifically, the "storing sparse files > 8G" test in tar's test suite
reliably fails on my system:

  140: storing sparse files > 8G                       FAILED (sparse03.at:29)

The test creates a sparse file consisting of 8 gibibytes of hole
followed by 512 bytes of 'A's at the end.

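For anyone who wants to poke at this outside the test suite, here is a
minimal Python sketch of the same file layout and of the two ways of
asking "does this file contain data?". It is not tar's code: the
function names are hypothetical, the st_blocks check is a simplified
form of the assumption (it omits the "sufficiently large" size
threshold), and the demo uses an 8 MiB hole instead of 8 GiB. Whether
st_blocks actually reads 0 right after the write depends on the file
system and on timing, so the sketch prints the values rather than
assuming them.

```python
import errno
import os

CHUNK = 1 << 16  # read size for the fallback scan


def make_sparse_file(path, hole_size, tail=b"A" * 512):
    """Create a file made of `hole_size` bytes of hole followed by `tail`."""
    with open(path, "wb") as f:
        f.seek(hole_size)  # skipping ahead leaves a hole where supported
        f.write(tail)


def heuristic_says_all_zero(path):
    """Simplified form of the assumption under discussion (not tar's code):
    a non-empty file with st_blocks == 0 contains only zeroes."""
    st = os.stat(path)
    return st.st_size > 0 and st.st_blocks == 0


def scan_for_data(path):
    """Portable fallback: read the file and look for any non-zero byte."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                return False
            if chunk.count(0) != len(chunk):
                return True


def has_data(path):
    """Ask the kernel for a data region via lseek(SEEK_DATA).

    True means a data region was reported (it may still be zero-filled).
    Falls back to scanning when SEEK_DATA is missing or rejected.
    """
    seek_data = getattr(os, "SEEK_DATA", None)
    if seek_data is None:
        return scan_for_data(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        os.lseek(fd, 0, seek_data)
        return True
    except OSError as e:
        if e.errno == errno.ENXIO:
            return False  # no data at or after offset 0
        return scan_for_data(path)  # e.g. EINVAL: not supported here
    finally:
        os.close(fd)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "sparsefile")
        make_sparse_file(p, 8 << 20)
        st = os.stat(p)
        print("size:", st.st_size, "st_blocks:", st.st_blocks)
        print("heuristic says all zero:", heuristic_says_all_zero(p))
        print("has data:", has_data(p))
```

If the heuristic line prints True while "has data" also prints True, the
file system in question violates the assumption in the same way as
described here for Btrfs.
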
When tar creates an archive from this file shortly after its creation,
it sees (st_blocks == 0) together with the large size and erroneously
concludes that the file must be "completely sparse", i.e. containing
only zeroes, leading to data loss.

Here's the hexdump of the large sparse file, before and after going
through tar:

--8<---------------cut here---------------start------------->8---
mhw@jojen .../tar-1.29/tests/testsuite.dir/140/posix$ ls -l sparsefile directory/sparsefile
-rw-r--r-- 1 mhw mhw 8589935104 Jan 20 17:31 directory/sparsefile
-rw-r--r-- 1 mhw mhw 8589935104 Jan 20 17:31 sparsefile
mhw@jojen .../tar-1.29/tests/testsuite.dir/140/posix$ hexdump -C sparsefile
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
200000000  41 41 41 41 41 41 41 41  41 41 41 41 41 41 41 41  |AAAAAAAAAAAAAAAA|
*
200000200
mhw@jojen .../tar-1.29/tests/testsuite.dir/140/posix$ hexdump -C directory/sparsefile
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
200000200
--8<---------------cut here---------------end--------------->8---

What do you think?

    Regards,
      Mark