Due to a dodgy SQL query, I discovered a curious feature of GNU tar:
$ touch foo
$ tar cf foo.tar foo foo
$ tar tvf foo.tar
-rw-r--r-- ch12/team166 0 2020-12-02 17:34 foo
hrw-r--r-- ch12/team166 0 2020-12-02 17:34 foo link to foo
That is, if you name the same file more than once, tar stores the later
occurrences as hard links to the first (the leading "h" in the listing).
I don't really know what the point of this is, but I digress.
The failure mode I encountered combines this feature with
--remove-files, where the input files are piped in via xargs:
some-process | xargs tar cf foo.tar --remove-files
If the duplicated file paths are separated by a sufficiently large number
of other paths, they may end up on opposite sides of an input buffer
boundary. (At least this is my working theory.) Something like this:
file1
lots
of
other
-- BUFFER PARTITION --
stuff
file1
Here, "file1" will be deleted at the partition point -- presumably when
the buffer is flushed -- so when it's reached again, in a subsequent
partition, it can't be added as a link to the tar file because it's
already been deleted. At this point, tar complains with a file not found
error:
tar: /path/to/really/long/filename: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors
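One way a partition like the one pictured above could come about is xargs
splitting a single input list across several invocations. The following is
only a sketch of that mechanism, not a confirmed diagnosis: the "-n 2" limit
is artificially small, "show_batches" is a hypothetical helper name, and
echo stands in for actually running tar.

```shell
# Hypothetical helper: show the command lines xargs would build when it may
# pass at most two paths per invocation. echo prints each command line
# instead of running tar, so no files are touched.
show_batches() {
  printf '%s\n' "$@" | xargs -n 2 echo tar cf foo.tar --remove-files
}

show_batches file1 lots of other stuff file1
```

With these inputs, "file1" lands in both the first and the last batch, which
is the situation described in the working theory.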
Steps to replicate:
declare FILE1="$(pwd)/$(tr -cd 'a-zA-Z0-9' < /dev/urandom | head -c100)"
declare FILE2="$(pwd)/$(tr -cd 'a-zA-Z0-9' < /dev/urandom | head -c100)"
touch "$FILE1" "$FILE2"
printf "%s\n" "$FILE1" "$FILE2" "$FILE2" "$FILE1" "$FILE2" "$FILE1" \
| xargs tar cPzf test.tar.gz --remove-files
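A possible workaround, sketched here under assumptions rather than verified
against the failure above, is to drop duplicate paths before tar sees them
and to let tar read the list itself instead of going through xargs. The
helper names "dedupe_paths" and "archive_unique" are hypothetical. The sketch
assumes GNU tar, whose -T - (--files-from=-) reads member names from
standard input, and paths with no embedded newlines.

```shell
# Drop repeated lines while keeping the first-seen order.
dedupe_paths() {
  awk '!seen[$0]++'
}

# Hypothetical wrapper: archive the unique paths read from stdin into "$1".
# -T - makes GNU tar read the member names from stdin, one per line.
archive_unique() {
  dedupe_paths | tar cf "$1" --remove-files -T -
}

# Show only the de-duplication step on a small sample list.
printf '%s\n' a b b a c | dedupe_paths
```

In the original pipeline this would read something like
"some-process | archive_unique foo.tar", so each path is archived and
removed at most once.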
--
Thanks;
Chris