* tar incorrectly converts (almost) all soft links to point to a single file in the archive
   o I have seen all soft links saved as pointing to the target of the first soft link encountered
   o In this reproducer, they are all converted to hard links to the first target.
   o I haven't discovered what triggers the selection; both behaviors are bad. The first corrupts full system backups...
* Some soft links are incorrectly converted to hard links in the archive

These are serious problems - I've just spent 4 days reconstructing disks from faulty backups exhibiting these issues. I was lucky that it was ONLY 4 days; there were >30,000 files affected. I currently have NO reliable backups.

I have come up with a fairly small reproducer. Key observations:

* The "all links go to one place" seems to have to do with passing files to save as a wildcard.
   o It occurs in 1.34, but not in 1.15.1
* The soft->hard conversion happens as far back as version 1.15.1

Here is the subject data; /bin on a fairly old system. I'm showing just the links to keep this small. Note: NO hard links.

    # ls -l /bin | grep '^[lh]'
    lrwxrwxrwx 1 root root 4 Nov 28 08:45 awk -> gawk
    lrwxrwxrwx 1 root root 21 Nov 28 08:45 bash -> ../usr/local/bin/bash
    lrwxrwxrwx 1 root root 4 Nov 28 08:45 csh -> tcsh
    lrwxrwxrwx 1 root root 8 Nov 28 08:45 dnsdomainname -> hostname
    lrwxrwxrwx 1 root root 8 Nov 28 08:45 domainname -> hostname
    lrwxrwxrwx 1 root root 4 Nov 28 08:45 egrep -> grep
    lrwxrwxrwx 1 root root 2 Nov 28 08:45 ex -> vi
    lrwxrwxrwx 1 root root 4 Nov 28 08:45 fgrep -> grep
    lrwxrwxrwx 1 root root 3 Nov 28 08:45 gtar -> tar
    lrwxrwxrwx 1 root root 4 Nov 28 08:45 mailx -> mail
    lrwxrwxrwx 1 root root 8 Nov 28 08:45 nisdomainname -> hostname
    lrwxrwxrwx 1 root root 13 Nov 28 08:45 perl -> /usr/bin/perl
    lrwxrwxrwx 1 root root 2 Nov 28 08:45 rvi -> vi
    lrwxrwxrwx 1 root root 2 Nov 28 08:45 rview -> vi
    lrwxrwxrwx 1 root root 4 Nov 28 08:45 sh -> bash
    lrwxrwxrwx 1 root root 10 Nov 28 08:45 traceroute6 -> traceroute
    lrwxrwxrwx 1 root root 10 Nov 28 08:45 tracert -> traceroute
    lrwxrwxrwx 1 root root 2 Nov 28 08:45 view -> vi
    lrwxrwxrwx 1 root root 8 Nov 28 08:45 ypdomainname -> hostname

It shouldn't matter, but FWIW the filesystem is ext3.

Here's what happens with tar 1.34, which is the current release on ftp.gnu.org. I create an archive (explicit xz is to isolate & test the same way with older version)
Note that 'bin/*' is the key to the global merging; ('cd / && tar -cf - bin') will not fail in the same way, as shown later.
Note that in the following example, all the links are converted to hard links in addition to being misdirected. It's actually more common for most of the soft links to remain soft links, but all pointing to the first soft link target encountered. (Think libc.so => vi...) I don't have a small reproducer for the latter.
Also, note that the output order differs. My guess is that tar is processing soft links as if they were hard links, and caching in order to merge names linked to a common inode. The directory is not changing; /bin is stable.
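
The guess above can be checked from the filesystem side, since GNU tar generally recognizes hard links by device and inode number on entries whose link count is above 1. What follows is only a sketch, not part of the original reproducer: it assumes GNU coreutils stat (the -c option), and the helper names list_link_identity and shared_inodes are hypothetical. It prints the device:inode pair, link count, and type for every entry in a directory, so names that share an inode, or soft links that report a link count above 1, stand out. It also covers something the 'ls -l | grep' listing cannot: ls -l never prints an 'h' type character, so existing hard links appear there as ordinary entries with a link count above 1.

```shell
#!/bin/sh
# Hypothetical helper: print "device:inode links type name" for each entry in a directory.
# Assumes GNU coreutils stat (-c with %d, %i, %h, %F, %n).
list_link_identity() {
    dir=$1
    for f in "$dir"/*; do
        # Skip the unexpanded pattern if the directory is empty; keep dangling soft links.
        [ -e "$f" ] || [ -L "$f" ] || continue
        # Without -L, stat reports on the link itself, not on its target.
        stat -c '%d:%i %h %F %n' "$f"
    done | sort
}

# Print only the entries whose device:inode pair occurs more than once,
# i.e. names that are hard links to each other.
shared_inodes() {
    list_link_identity "$1" | awk '
        { count[$1]++; lines[$1] = lines[$1] "\n" $0 }
        END { for (k in count) if (count[k] > 1) print substr(lines[k], 2) }'
}
```

Running 'list_link_identity /bin' shows every entry; 'shared_inodes /bin' prints nothing when no two names share an inode.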

    # /usr/local/bin/tar --version | head -n1
    tar (GNU tar) 1.34
    # ( cd / && /usr/local/bin/tar -cf - bin/* | xz --stdout >/root/test.1.34.tar.xz )
    # tar -tvf /root/test.1.34.tar.xz | grep -- ' -> \| link to '
    lrwxrwxrwx root/root 0 2022-11-29 14:37 bin/awk -> gawk
    hrwxrwxrwx root/root 0 2022-11-29 14:37 bin/bash link to bin/awk
    hrwxrwxrwx root/root 0 2022-11-29 14:37 bin/csh link to bin/awk
    hrwxrwxrwx root/root 0 2022-11-29 14:37 bin/dnsdomainname link to bin/awk
    hrwxrwxrwx root/root 0 2022-11-29 14:37 bin/domainname link to bin/awk
    hrwxrwxrwx root/root 0 2022-11-29 14:37 bin/egrep link to bin/awk
    hrwxrwxrwx root/root 0 2022-11-29 14:37 bin/ex link to bin/awk
    hrwxrwxrwx root/root 0 2022-11-29 14:37 bin/fgrep link to bin/awk
    hrwxrwxrwx root/root 0 2022-11-29 14:37 bin/gtar link to bin/awk
    hrwxr-xr-x root/root 0 2006-10-01 16:22 bin/gzip link to bin/gunzip
    hrwxrwxrwx root/root 0 2022-11-29 14:37 bin/mailx link to bin/awk
    hrwxrwxrwx root/root 0 2022-11-29 14:37 bin/nisdomainname link to bin/awk
    hrwxrwxrwx root/root 0 2022-11-29 14:37 bin/perl link to bin/awk
    hrwxr-xr-x root/root 0 2007-01-18 06:59 bin/red link to bin/ed
    hrwxrwxrwx root/root 0 2022-11-29 14:37 bin/rvi link to bin/awk
    hrwxrwxrwx root/root 0 2022-11-29 14:37 bin/rview link to bin/awk
    hrwxrwxrwx root/root 0 2022-11-29 14:37 bin/sh link to bin/awk
    hrwxrwxrwx root/root 0 2022-11-29 14:37 bin/traceroute6 link to bin/awk
    hrwxrwxrwx root/root 0 2022-11-29 14:37 bin/tracert link to bin/awk
    hrwxrwxrwx root/root 0 2022-11-29 14:37 bin/view link to bin/awk
    hrwxrwxrwx root/root 0 2022-11-29 14:37 bin/ypdomainname link to bin/awk
    hrwxr-xr-x root/root 0 2006-10-01 16:22 bin/zcat link to bin/gunzip

Here is a run without the wildcard. It does not show the merging, but it does convert some soft links to hard links, which is not semantically equivalent. (e.g. consider b soft linked to a. Update a; b gets the new version. Convert to a hard link & update a. Now a is the new version, and b is the old.) I have flagged the hard links with !! hard so they stand out from the clutter.
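
The a/b example can be tried in a scratch directory. This is a small sketch under one stated assumption: "update a" means replacing the file by writing a new copy and renaming it over the old name, which is how installers commonly update files. The helper name link_update_demo is made up for the illustration.

```shell
#!/bin/sh
# Hypothetical demo: a hard link is not a substitute for a soft link.
# Assumption: "update" means replace-by-rename, not an in-place overwrite.
link_update_demo() {
    work=$(mktemp -d) || return 1
    echo old > "$work/a"
    ln -s a "$work/b_soft"        # soft link: looks up the name "a" on every access
    ln "$work/a" "$work/b_hard"   # hard link: a second name for a's current inode

    # Update a by writing a new file and renaming it over the old name.
    echo new > "$work/a.tmp"
    mv "$work/a.tmp" "$work/a"

    echo "b_soft: $(cat "$work/b_soft")"
    echo "b_hard: $(cat "$work/b_hard")"
    rm -rf "$work"
}

link_update_demo
# prints:
#   b_soft: new
#   b_hard: old
```

An in-place overwrite (echo new > a) would change both names, because it writes to the same inode; the divergence appears only when the update creates a new inode, as the rename does here.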

    # ( cd / && /usr/local/bin/tar -cf - bin | xz --stdout >/root/test.1.34.tar.xz )
    # tar -tvf /root/test.1.34.tar.xz | grep -- ' -> \| link to '
    lrwxrwxrwx root/root 0 2022-11-29 14:37 bin/nisdomainname -> hostname
    lrwxrwxrwx root/root 0 2022-11-29 14:37 bin/traceroute6 -> traceroute
    lrwxrwxrwx root/root 0 2022-11-29 14:37 bin/dnsdomainname -> hostname
    lrwxrwxrwx root/root 0 2022-11-29 14:37 bin/bash -> ../usr/local/bin/bash
    lrwxrwxrwx root/root 0 2022-11-29 14:37 bin/gtar -> tar
    lrwxrwxrwx root/root 0 2022-11-29 14:37 bin/awk -> gawk
    lrwxrwxrwx root/root 0 2022-11-29 14:37 bin/sh -> bash
    lrwxrwxrwx root/root 0 2022-11-29 14:37 bin/ex -> vi
    lrwxrwxrwx root/root 0 2022-11-29 14:37 bin/fgrep -> grep
    lrwxrwxrwx root/root 0 2022-11-29 14:37 bin/csh -> tcsh
    lrwxrwxrwx root/root 0 2022-11-29 14:37 bin/rview -> vi
    lrwxrwxrwx root/root 0 2022-11-29 14:37 bin/egrep -> grep
    lrwxrwxrwx root/root 0 2022-11-29 14:37 bin/mailx -> mail
    lrwxrwxrwx root/root 0 2022-11-29 14:37 bin/view -> vi
    lrwxrwxrwx root/root 0 2022-11-29 14:37 bin/tracert -> traceroute
    hrwxr-xr-x root/root 0 2006-10-01 16:22 bin/gzip link to bin/zcat !! hard
    hrwxr-xr-x root/root 0 2007-01-18 06:59 bin/ed link to bin/red !! hard
    lrwxrwxrwx root/root 0 2022-11-29 14:37 bin/rvi -> vi
    lrwxrwxrwx root/root 0 2022-11-29 14:37 bin/ypdomainname -> hostname
    lrwxrwxrwx root/root 0 2022-11-29 14:37 bin/domainname -> hostname
    hrwxr-xr-x root/root 0 2006-10-01 16:22 bin/gunzip link to bin/zcat !! hard
    lrwxrwxrwx root/root 0 2022-11-29 14:37 bin/perl -> /usr/bin/perl

tar 1.15.1 exhibits the link conversion, but not the merging. Here is a sample:

    # /bin/tar --version
    tar (GNU tar) 1.15.1
    # ( cd /bin && /bin/tar -cf - * | xz --stdout >/root/test.1.15.1.tar.xz )
    [root@overkill:~]# tar -tvf /root/test.1.15.1.tar.xz | grep -- ' -> \| link to '
    lrwxrwxrwx root/root 0 2022-11-28 08:45 awk -> gawk
    lrwxrwxrwx root/root 0 2022-11-28 08:45 bash -> ../usr/local/bin/bash
    lrwxrwxrwx root/root 0 2022-11-28 08:45 csh -> tcsh
    lrwxrwxrwx root/root 0 2022-11-28 08:45 dnsdomainname -> hostname
    lrwxrwxrwx root/root 0 2022-11-28 08:45 domainname -> hostname
    lrwxrwxrwx root/root 0 2022-11-28 08:45 egrep -> grep
    lrwxrwxrwx root/root 0 2022-11-28 08:45 ex -> vi
    lrwxrwxrwx root/root 0 2022-11-28 08:45 fgrep -> grep
    lrwxrwxrwx root/root 0 2022-11-28 08:45 gtar -> tar
    hrwxr-xr-x root/root 0 2006-10-01 16:22 gzip link to gunzip !! hard
    lrwxrwxrwx root/root 0 2022-11-28 08:45 mailx -> mail
    lrwxrwxrwx root/root 0 2022-11-28 08:45 nisdomainname -> hostname
    lrwxrwxrwx root/root 0 2022-11-28 08:45 perl -> /usr/bin/perl
    hrwxr-xr-x root/root 0 2007-01-18 06:59 red link to ed !! hard
    lrwxrwxrwx root/root 0 2022-11-28 08:45 rvi -> vi
    lrwxrwxrwx root/root 0 2022-11-28 08:45 rview -> vi
    lrwxrwxrwx root/root 0 2022-11-28 08:45 sh -> bash
    lrwxrwxrwx root/root 0 2022-11-28 08:45 traceroute6 -> traceroute
    lrwxrwxrwx root/root 0 2022-11-28 08:45 tracert -> traceroute
    lrwxrwxrwx root/root 0 2022-11-28 08:45 view -> vi
    lrwxrwxrwx root/root 0 2022-11-28 08:45 ypdomainname -> hostname
    hrwxr-xr-x root/root 0 2006-10-01 16:22 zcat link to gunzip !! hard

Without the wildcard, links remain distinct, but different files are selected for the bogus conversion to hard links.

    # ( cd / && /bin/tar -cf - bin | xz --stdout >/root/test.1.15.1.tar.xz )
    [root@overkill:~]# tar -tvf /root/test.1.15.1.tar.xz | grep -- ' -> \| link to '
    lrwxrwxrwx root/root 0 2022-11-28 08:45 bin/nisdomainname -> hostname
    lrwxrwxrwx root/root 0 2022-11-28 08:45 bin/traceroute6 -> traceroute
    lrwxrwxrwx root/root 0 2022-11-28 08:45 bin/dnsdomainname -> hostname
    lrwxrwxrwx root/root 0 2022-11-28 08:45 bin/bash -> ../usr/local/bin/bash
    lrwxrwxrwx root/root 0 2022-11-28 08:45 bin/gtar -> tar
    lrwxrwxrwx root/root 0 2022-11-28 08:45 bin/awk -> gawk
    lrwxrwxrwx root/root 0 2022-11-28 08:45 bin/sh -> bash
    lrwxrwxrwx root/root 0 2022-11-28 08:45 bin/ex -> vi
    lrwxrwxrwx root/root 0 2022-11-28 08:45 bin/fgrep -> grep
    lrwxrwxrwx root/root 0 2022-11-28 08:45 bin/csh -> tcsh
    lrwxrwxrwx root/root 0 2022-11-28 08:45 bin/rview -> vi
    lrwxrwxrwx root/root 0 2022-11-28 08:45 bin/egrep -> grep
    lrwxrwxrwx root/root 0 2022-11-28 08:45 bin/mailx -> mail
    lrwxrwxrwx root/root 0 2022-11-28 08:45 bin/view -> vi
    lrwxrwxrwx root/root 0 2022-11-28 08:45 bin/tracert -> traceroute
    hrwxr-xr-x root/root 0 2006-10-01 16:22 bin/gzip link to bin/zcat !! hard
    hrwxr-xr-x root/root 0 2007-01-18 06:59 bin/ed link to bin/red !! hard
    lrwxrwxrwx root/root 0 2022-11-28 08:45 bin/rvi -> vi
    lrwxrwxrwx root/root 0 2022-11-28 08:45 bin/ypdomainname -> hostname
    lrwxrwxrwx root/root 0 2022-11-28 08:45 bin/domainname -> hostname
    hrwxr-xr-x root/root 0 2006-10-01 16:22 bin/gunzip link to bin/zcat !! hard
    lrwxrwxrwx root/root 0 2022-11-28 08:45 bin/perl -> /usr/bin/perl !! hard

If you are wondering why many links have recent dates - that's an artifact of recovering the correct links after restoring from a corrupt archive.
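
Until the cause is understood, a backup can at least be checked right after it is written. The following is a rough sketch rather than a vetted tool: check_symlinks is a hypothetical helper, it assumes the 'tar -tv' listing format shown in the transcripts above ("... name -> target"), and it assumes member names and link targets contain no spaces. It compares every soft link on disk with the soft link entries in the archive, so both failure modes (wrong target, or conversion to a hard link) show up as a mismatch.

```shell
#!/bin/sh
# Hypothetical helper: verify that every soft link under PARENT/NAME appears in
# ARCHIVE as a soft link with the same target.
# Assumptions: "tar -tv" prints "... name -> target" for soft links, and
# names and targets contain no spaces.
check_symlinks() {
    parent=$1
    name=$2
    archive=$3
    # "name -> target" for each soft link on disk, relative to PARENT.
    disk=$(cd "$parent" && find "$name" -type l | while read -r p; do
               printf '%s -> %s\n' "$p" "$(readlink "$p")"
           done | sort)
    # The same form, taken from the archive's soft link entries (type 'l').
    arch=$(tar -tvf "$archive" | grep '^l' |
               sed 's/^.* \([^ ]* -> .*\)$/\1/' | sort)
    if [ "$disk" = "$arch" ]; then
        echo "OK: soft links match"
    else
        echo "MISMATCH: soft links differ between disk and archive"
        return 1
    fi
}
```

For the archives above, the call would be along the lines of 'check_symlinks / bin /root/test.1.34.tar.xz', with the archive created from / using member names that start with bin/.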

Timothe Litt
ACM Distinguished Engineer
--------------------------
This communication may not represent the ACM or my employer's views,
if any, on the matters discussed.