I've seen this on Fedora Core 4; the report is on FC 6. (Yes, they're old. But tar is new, built from source downloaded from ftp.gnu.org.)
The disk volume is a newly created (VirtualBox) vdi; 2 partitions, ext3, with the root mounted on hda2. (boot is on hda1).
The file structure was initialized on a newer Linux machine and the archive extracted. It's been a long few days; I don't remember whether it was FC34 or Debian. Both were involved in putting things back together.
The original reproducer is cut down from about 130G (a 34G compressed archive). There are "only" 107 files in /bin.
Here is the information from your suggestions.

The hard link problem reproduces with this (note the two soft links turning into a soft and a hard(!), according to tar):

# ( cd / && ls -li bin/awk bin/bash && tar -cf - bin/awk bin/bash | tar -tvf - )
22683669 lrwxrwxrwx 1 root root 4 Nov 28 08:45 bin/awk -> gawk
22683657 lrwxrwxrwx 1 root root 21 Nov 28 08:45 bin/bash -> ../usr/local/bin/bash
lrwxrwxrwx root/root 0 2022-11-29 14:37 bin/awk -> gawk
hrwxrwxrwx root/root 0 2022-11-29 14:37 bin/bash link to bin/awk

Clearly, bin/bash (a) is not a hard link on disk, and (b) does not link to bin/awk.
The attached "hardlink_strace.txt" comes from a simplified command to reduce volume, but it should show the same syscalls:
( cd / && strace 2>hardlink_strace.txt tar -cf - bin/awk bin/bash >/dev/null )
A full ls -li is in full_ls.txt.

In extract_from_tar_archive_showing_extent.txt are the first ~1900 lines of tar -tvf output from an archive that merged all the soft links to "vi" when extracted to disk. Note that the listing (a) shows the links as hard links (they were all soft on the original disk), and (b) shows the links as pointing to "bin/ex", when in fact they were extracted as "vi".
To me, this all points to soft links being processed as if they were hard - mostly.
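For anyone wanting to cross-check what is on disk independently of tar, here is a minimal sketch. It assumes GNU coreutils stat (which does not follow symlinks unless given -L); the function names link_identity and same_file are made up for illustration. Two names are hard links to each other only if they share the same device and inode, so two distinct symlinks should print different device:inode pairs and a link count of 1.

```shell
#!/bin/sh
# link_identity PATH...
# Print "device:inode link-count file-type" for each path, as lstat sees it
# (GNU stat does not dereference symlinks by default).
link_identity() {
    for p in "$@"; do
        stat -c '%d:%i %h %F' -- "$p"
    done
}

# same_file A B
# Succeed only if A and B share device and inode, i.e. are hard links
# to the same object (hypothetical helper name).
same_file() {
    [ "$(stat -c '%d:%i' -- "$1")" = "$(stat -c '%d:%i' -- "$2")" ]
}

# Example use (not run here):
#   cd / && link_identity bin/awk bin/bash
#   same_file bin/awk bin/bash && echo "same inode" || echo "different inodes"
```

With the ls -li output above (inodes 22683669 and 22683657, link count 1 each), same_file should fail for bin/awk and bin/bash.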
Going further with the toy example, we see that while tar reports the links as hard, they are extracted as soft, but with the wrong target for the second link.
foo]# ( cd / && tar -cf - bin/awk bin/bash | tar -C /root/foo -xvf - )
bin/awk
bin/bash

!! This is bin extracted from the archive
foo]# ls -li bin
total 0
17418579 lrwxrwxrwx 2 root root 4 Dec 1 15:23 awk -> gawk
17418579 lrwxrwxrwx 2 root root 4 Dec 1 15:23 bash -> gawk

!! This is the bin that was archived
foo]# ls -li /bin/awk /bin/bash
22683669 lrwxrwxrwx 1 root root 4 Nov 28 08:45 /bin/awk -> gawk
22683657 lrwxrwxrwx 1 root root 21 Nov 28 08:45 /bin/bash -> ../usr/local/bin/bash

To close the shell wildcard lead: if we now use (shell) wildcards, which pick up a couple of extra files, note that the bash link (to ../usr/local...) is still extracted as a soft link to gawk.

Here's the modified test case:

foo]# ( cd / && tar -cf - bin/aw* bin/bas* | tar -C /root/foo -xvf - )
bin/awk
bin/basename
bin/bash
bin/bash.old
foo]# ls -li bin
total 732
17418579 lrwxrwxrwx 2 root root 4 Dec 1 15:32 awk -> gawk
17418580 -rwxr-xr-x 1 root root 18484 Oct 31 2007 basename
17418579 lrwxrwxrwx 2 root root 4 Dec 1 15:32 bash -> gawk
17418581 -rwxr-xr-x 1 root root 722684 Jul 12 2006 bash.old

An strace of the above, in strace_wild.txt, was obtained as shown below (the inode #s are different):

foo]# ( cd / && ls -li bin/aw* bin/bas* && strace 2>/root/strace_wild.txt tar -cf - bin/aw* bin/bas* >/dev/null )
22683669 lrwxrwxrwx 1 root root 4 Nov 28 08:45 bin/awk -> gawk
22683748 -rwxr-xr-x 1 root root 18484 Oct 31 2007 bin/basename
22683657 lrwxrwxrwx 1 root root 21 Nov 28 08:45 bin/bash -> ../usr/local/bin/bash
22683691 -rwxr-xr-x 1 root root 722684 Jul 12 2006 bin/bash.old
foo]# ls -li bin/
total 732
17418579 lrwxrwxrwx 2 root root 4 Dec 1 15:32 awk -> gawk
17418580 -rwxr-xr-x 1 root root 18484 Oct 31 2007 basename
17418579 lrwxrwxrwx 2 root root 4 Dec 1 15:32 bash -> gawk
17418581 -rwxr-xr-x 1 root root 722684 Jul 12 2006 bash.old

Also, while I didn't keep the build directory for tar, I did keep the configure cache file, which may be helpful.
Not sure if I can recover what's left of the original disk; will try if necessary. But I think this work has cut the problem down.
a) tar is confused about soft links.
b) It is reporting soft links as hard in -t output, but extracting them as soft.
c) The extract uses the wrong target in the soft link: the target of the first soft link that it sees.
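
Point (c) can be checked mechanically. The following is only a sketch, assuming mktemp -d and readlink are available; the function name roundtrip_symlinks is hypothetical. It archives the named entries with whatever tar is first in PATH, extracts them into a scratch directory, and compares each symlink's target before and after the round trip.

```shell
#!/bin/sh
# roundtrip_symlinks SRCDIR NAME...
# Archive NAME... relative to SRCDIR, extract into a scratch directory, and
# compare every symlink's target on disk with its target after extraction.
# Prints one line per mismatch; returns 1 on mismatch, 2 if tar itself failed.
roundtrip_symlinks() {
    src=$1; shift
    scratch=$(mktemp -d) || return 2
    ( cd "$src" && tar -cf - "$@" ) | tar -C "$scratch" -xf - ||
        { rm -rf "$scratch"; return 2; }
    rc=0
    for name in "$@"; do
        [ -L "$src/$name" ] || continue     # only symlinks are compared
        want=$(readlink "$src/$name")
        got=$(readlink "$scratch/$name")
        if [ "$want" != "$got" ]; then
            echo "MISMATCH $name: on disk -> $want, extracted -> $got"
            rc=1
        fi
    done
    rm -rf "$scratch"
    return $rc
}

# Example use (not run here):
#   roundtrip_symlinks / bin/awk bin/bash
```

Given the extraction shown earlier, where bash came back pointing at gawk instead of ../usr/local/bin/bash, this check would be expected to flag bin/bash on the affected system and print nothing on a system where tar behaves.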

# uname -a
Linux 2.6.22.14-100 #1 SMP Wed Apr 8 18:07:54 EDT 2015 i686 i686 i386 GNU/Linux

Finally, an unrelated issue (except that it hit this incident and prevented an easy restore): tar skips some large files with

tar: root/sd/sd.tar.gz: Cannot stat: Value too large for defined data type
-rw-r--r-- 1 root root 32251081571 May 6 2007 /root/sd/sd.tar.gz

Let me know if I can provide further information. I appreciate the attention.

Thanks!

Timothe Litt
ACM Distinguished Engineer
--------------------------
This communication may not represent the ACM or my employer's views,
if any, on the matters discussed.

On 01-Dec-22 14:24, Paul Eggert wrote:
Thanks for reporting the problem. I'm not seeing the problem with GNU tar 1.34 as shipped with Ubuntu 22.10 x86-64. On this platform, the command:

cd /
tar -cf - bin/* | tar -tvf - >/tmp/tar.txt

outputs the attached file tar.txt, which looks OK, as it seems to match the output of the command 'cd /; ls -li bin/* >/tmp/ls-i.txt' which is attached. This is on an ext4 file system. (All the attachments are compressed with gzip.)

What would help to debug here is a smaller reproducer. Can you reproduce it with a smaller command like this?

tar -cf - bin/awk bin/bash

In other words, make it as small as you can.

Also, even if you can't make it small, it'd be helpful to see the strace output so that we can see the information that tar is basing its decisions on. For example, I ran this command:

strace -v --trace %%stat -o /tmp/tar-tr.txt tar -cf /dev/null bin/*

and got the attached file tar-tr.txt to see what the stat-like syscalls are yielding; can you do something similar?

Also, can you send the output of 'ls -il bin/*'? The inode numbers would be helpful for debugging, I expect.

link_info.tar.gz
Description: GNU Zip compressed data