On Thu, 5 Sep 2019 14:42:07 -0400 Nathan Stratton Treadway <natha...@ontko.com> wrote:
> On Wed, Sep 04, 2019 at 13:48:33 -0300, Chris Mitchell wrote:
>
> I don't have any specific answer to your question, but here are some
> general comments on the topic:

As it turns out, I think the specific answer to my question is "user
error". I spent a lot of this morning trying to recreate the behaviour
(on the same system!), but with every variation, tar stubbornly kept
preserving and restoring the sparse file exactly the way it's supposed
to.

> * The tar info page (e.g.
>   https://www.gnu.org/software/tar/manual/html_chapter/tar_8.html#SEC137
>   section "8.1.2 Archiving Sparse Files") explains that the "-S"
>   option is only meaningful on archive creation (or update), not on
>   extraction. <...>

That certainly makes life easier!

> * In general, tar creates the archive "file" first, and then pipes the
>   contents of the archive to the compression program -- so that
>   sparse-file-detection step should happen before the compression
>   program is involved in any way.

Yes, that certainly sounds like what one would expect to happen, and
apparently it is indeed happening the way it should.

> It might be helpful to post the exact commands you used to test these
> various scenarios, etc.

Just for the sake of thoroughness, I'll post the results I got.

First, in order to closely reproduce what I was working with before,
here's a freshly created .qcow2 volume with a 20 GiB quota and a minimal
Debian install in it:

$ ls -ls --block-size=1
1912025088 -rw------- 1 chris chris 21478375424 Sep 6 10:05 testvol.qcow2

Then I make some compressed tar archives:

$ tar --use-compress-prog=pbzip2 -cSf pbz2archive.tar.bz2 testvol.qcow2
$ tar -cS testvol.qcow2 | pbzip2 -c > pipearchive.tar.bz2
$ tar -cSjf bz2archive.tar.bz2 testvol.qcow2
$ ls -ls --block-size=1 *.tar.*
514617344 -rw-r--r-- 1 chris chris 514615245 Sep 6 14:18 bz2archive.tar.bz2
518164480 -rw-r--r-- 1 chris chris 518163660 Sep 6 14:07 pbz2archive.tar.bz2
518164480 -rw-r--r-- 1 chris chris 518163660 Sep 6 14:12 pipearchive.tar.bz2

The bzip2 archive being a little smaller than the two pbzip2 archives is
unsurprising. The pbzip2 manpage explains that it stores the data in
chunks in the file, which bzip2 doesn't, so there's a little overhead.
The fact that the two pbzip2 variants are *exactly* the same size was my
first hint that I was wrong. Looking closer:

$ if cmp pbz2archive.tar.bz2 pipearchive.tar.bz2; then echo "Yup, they're identical!"; fi
Yup, they're identical!

That looks pretty conclusive to me. But, just for completeness' sake:

$ pbzip2 -dk pbz2archive.tar.bz2
$ pbzip2 -dk pipearchive.tar.bz2
$ bzip2 -dk bz2archive.tar.bz2
$ ls -ls --block-size=1 *.tar
1912057856 -rw-r--r-- 1 chris chris 1912053760 Sep 6 14:18 bz2archive.tar
1912057856 -rw-r--r-- 1 chris chris 1912053760 Sep 6 14:07 pbz2archive.tar
1912057856 -rw-r--r-- 1 chris chris 1912053760 Sep 6 14:12 pipearchive.tar
$ if cmp bz2archive.tar pbz2archive.tar; then echo "Yup, they're identical!"; fi
Yup, they're identical!
$ if cmp bz2archive.tar pipearchive.tar; then echo "Yup, they're identical!"; fi
Yup, they're identical!
$ if cmp pbz2archive.tar pipearchive.tar; then echo "Yup, they're identical!"; fi
Yup, they're identical!
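
As an aside for anyone repeating these checks: the comparison that the
first two columns of "ls -ls --block-size=1" show (allocated bytes versus
apparent size) can also be scripted. The following is only a minimal
sketch, not part of the test run itself. It assumes GNU stat, and the
function name is_sparse is made up for illustration. A filesystem with
transparent compression can also report fewer allocated bytes than the
apparent size, so a positive result is a hint rather than proof.

```shell
# Hypothetical helper (name invented for illustration; assumes GNU stat).
# Returns 0 if FILE occupies fewer bytes on disk than its apparent size,
# 1 if it does not, and 2 if stat fails.
is_sparse() {
    # %s = apparent size in bytes
    # %b = number of allocated blocks
    # %B = size in bytes of each block counted by %b
    out=$(stat -c '%s %b %B' -- "$1") || return 2
    # Split "size blocks blocksize" into $1 $2 $3 (function-local).
    set -- $out
    [ $(( $2 * $3 )) -lt "$1" ]
}
```

For example, `is_sparse testvol.qcow2 && echo sparse` should print
"sparse" for the volume listed earlier, since its allocated size
(1912025088) is far below its apparent size (21478375424).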

$ tar -xf bz2archive.tar -C bzip2/
$ tar -xf pbz2archive.tar -C pbzip2/
$ tar -xf pipearchive.tar -C piped/
$ ls -ls --block-size=1 bzip2/* pbzip2/* piped/*
1912025088 -rw------- 1 chris chris 21478375424 Sep 6 10:05 bzip2/testvol.qcow2
1912025088 -rw------- 1 chris chris 21478375424 Sep 6 10:05 pbzip2/testvol.qcow2
1912025088 -rw------- 1 chris chris 21478375424 Sep 6 10:05 piped/testvol.qcow2
$ if cmp bzip2/testvol.qcow2 pbzip2/testvol.qcow2; then echo "Yup, they're identical!"; fi
Yup, they're identical!
$ if cmp bzip2/testvol.qcow2 piped/testvol.qcow2; then echo "Yup, they're identical!"; fi
Yup, they're identical!

And, if I do the decompress and untar in one step:

$ rm bzip2/* pbzip2/* piped/*
$ tar --use-compress-program=pbzip2 -xf pbz2archive.tar.bz2 -C pbzip2/
$ pbzip2 -dc pipearchive.tar.bz2 | tar x -C piped/
$ tar -xjf bz2archive.tar.bz2 -C bzip2/
$ ls -ls --block-size=1 bzip2/* pbzip2/* piped/*
1912025088 -rw------- 1 chris chris 21478375424 Sep 6 10:05 bzip2/testvol.qcow2
1912025088 -rw------- 1 chris chris 21478375424 Sep 6 10:05 pbzip2/testvol.qcow2
1912025088 -rw------- 1 chris chris 21478375424 Sep 6 10:05 piped/testvol.qcow2
$ if cmp bzip2/testvol.qcow2 pbzip2/testvol.qcow2; then echo "Yup, they're identical!"; fi
Yup, they're identical!
$ if cmp bzip2/testvol.qcow2 piped/testvol.qcow2; then echo "Yup, they're identical!"; fi
Yup, they're identical!
$ if cmp pbzip2/testvol.qcow2 piped/testvol.qcow2; then echo "Yup, they're identical!"; fi
Yup, they're identical!

...so, at this point I'd say I have quite definitively disproved my own
thesis, and the only mystery is whether I grabbed a fully-allocated
.qcow2 volume thinking I had grabbed a sparse one, or missed a `-S`
somewhere the last time around.

Consider my bug report withdrawn, and sorry to waste everyone's time.

> p.s. I have found that "ls -sl --block-size=1" is a handy way to see
> in one command whether a file is sparse or not:
>   $ ls -sl --block-size=1 temp.sparse
>   4096 -rw-r--r-- 1 root root 16896 Mar 30 2014 temp.sparse

Thanks for the tip!

Cheers!
-- 
Chris Mitchell [they/them/their]
Say hi on Matrix chat! My handle is @radine:matrix.org and you can get
the app at https://riot.im