On 6/5/21 4:24 PM, Paul Eggert wrote:
On 6/5/21 9:39 AM, Daniel Villeneuve wrote:
And when a problem like the above occurs, our process tries it again and it
usually succeeds (that is, I've never witnessed two successive failures).
When you say "it usually succeeds", do you mean that you extract from the exact
same tarball and it works sometimes, but not others? Or that you generate a new tarball
and the extraction fails from the new tarball?
The same tarball is extracted anew.
Can you supply an example of a tarball where extraction failed?
Unfortunately, these tarballs contain proprietary information.
I might try to build something similar and expect a failure on _this_ instance.
# suspicious SIGCHLD
wait4(47737, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 47737
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=47737,
si_uid=759245037, si_status=0, si_utime=674, si_stime=37} ---
That looks OK to me; tar is being signaled that gzip exited.
Indeed, I have 58 extractions (2 of them failures) that all show this wait4+SIGCHLD pair.
It happens that the 2 failures are the only ones in which the SIGCHLD arrives
so late (just before newfstatat).
As you say, maybe it's not relevant.
# some successful file time updates
Yes, these are from tar creating delayed symlinks (or hardlinks to symlinks)
when the symlink contents are dicey and could have caused problems if they had
been extracted earlier.
# first errors
newfstatat(AT_FDCWD, "XXXXXX_config/static/users", 0x7ffc9b136f50,
AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
This looks like the start of another attempt to create a delayed symlink; the
first step is to get the status of the placeholder file. Please go back to
earlier parts of the strace output, and look for references to this same
filename. What syscalls do you see?
This is what happens earlier in the traces, both the one which fails and the
one which succeeds:
openat(AT_FDCWD, "XXXXXX_config/static/unlockInLine.bm",
O_WRONLY|O_CREAT|O_EXCL, 000) = 4
fstat(4, {st_mode=S_IFREG|000, st_size=0, ...}) = 0
close(4) = 0
read(3, "XXXXXX_config/static/users\0\0\0\0\0\0"..., 10240) = 10240
openat(AT_FDCWD, "XXXXXX_config/static/users", O_WRONLY|O_CREAT|O_EXCL, 000) = 4
fstat(4, {st_mode=S_IFREG|000, st_size=0, ...}) = 0
close(4) = 0
where unlockInLine.bm is a symlink to an existing file, and users is a symlink
to a non-existent file.
At the end of the extract that fails:
# a few files with correct time settings
newfstatat(AT_FDCWD, "prebuilt/x86_64-pc-linux_el8-gnu/jdk1.8.0",
{st_mode=S_IFREG|000, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
unlinkat(AT_FDCWD, "prebuilt/x86_64-pc-linux_el8-gnu/jdk1.8.0", 0) = 0
linkat(AT_FDCWD, "prebuilt/x86_64-pc-linux_el6-gnu/jdk1.8.0", AT_FDCWD,
"prebuilt/x86_64-pc-linux_el8-gnu/jdk1.8.0", 0) = 0
newfstatat(AT_FDCWD, "prebuilt/x86_64-pc-linux_el7-gnu/jdk1.8.0",
{st_mode=S_IFREG|000, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
unlinkat(AT_FDCWD, "prebuilt/x86_64-pc-linux_el7-gnu/jdk1.8.0", 0) = 0
linkat(AT_FDCWD, "prebuilt/x86_64-pc-linux_el6-gnu/jdk1.8.0", AT_FDCWD,
"prebuilt/x86_64-pc-linux_el7-gnu/jdk1.8.0", 0) = 0
# first error (symlink to a non-existent file)
newfstatat(AT_FDCWD, "XXXXXX_config/static/users", 0x7ffc9b136f50,
AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
# next error (symlink to an existing file)
newfstatat(AT_FDCWD, "XXXXXX_config/static/unlockInLine.bm", 0x7ffc9b136f50,
AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
# all other attempts for newfstatat are ENOENT
At the end of the extract that succeeds:
# a few files with correct time settings
newfstatat(AT_FDCWD, "prebuilt/x86_64-pc-linux_el8-gnu/jdk1.8.0",
{st_mode=S_IFREG|000, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
unlinkat(AT_FDCWD, "prebuilt/x86_64-pc-linux_el8-gnu/jdk1.8.0", 0) = 0
linkat(AT_FDCWD, "prebuilt/x86_64-pc-linux_el6-gnu/jdk1.8.0", AT_FDCWD,
"prebuilt/x86_64-pc-linux_el8-gnu/jdk1.8.0", 0) = 0
newfstatat(AT_FDCWD, "prebuilt/x86_64-pc-linux_el7-gnu/jdk1.8.0",
{st_mode=S_IFREG|000, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
unlinkat(AT_FDCWD, "prebuilt/x86_64-pc-linux_el7-gnu/jdk1.8.0", 0) = 0
linkat(AT_FDCWD, "prebuilt/x86_64-pc-linux_el6-gnu/jdk1.8.0", AT_FDCWD,
"prebuilt/x86_64-pc-linux_el7-gnu/jdk1.8.0", 0) = 0
# here, a success
newfstatat(AT_FDCWD, "XXXXXX_config/static/users", {st_mode=S_IFREG|000,
st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
unlinkat(AT_FDCWD, "XXXXXX_config/static/users", 0) = 0
symlinkat("../../gui/config/static/users", AT_FDCWD,
"XXXXXX_config/static/users") = 0
utimensat(AT_FDCWD, "XXXXXX_config/static/users", [UTIME_OMIT,
{tv_sec=978325200, tv_nsec=0} /* 2001-01-01T00:00:00-0500 */], AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "XXXXXX_config/static/users", {st_mode=S_IFLNK|0777,
st_size=29, ...}, AT_SYMLINK_NOFOLLOW) = 0
openat(AT_FDCWD, "XXXXXX_config/static/users",
O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 3
newfstatat(3, "", {st_mode=S_IFLNK|0777, st_size=29, ...}, AT_EMPTY_PATH) = 0
close(3) = 0
# followed by other successes
newfstatat(AT_FDCWD, "XXXXXX_config/static/unlockInLine.bm",
{st_mode=S_IFREG|000, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
unlinkat(AT_FDCWD, "XXXXXX_config/static/unlockInLine.bm", 0) = 0
symlinkat("../../gui/config/static/unlockInLine.bm", AT_FDCWD,
"XXXXXX_config/static/unlockInLine.bm") = 0
utimensat(AT_FDCWD, "XXXXXX_config/static/unlockInLine.bm", [UTIME_OMIT,
{tv_sec=978325200, tv_nsec=0} /* 2001-01-01T00:00:00-0500 */], AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "XXXXXX_config/static/unlockInLine.bm",
{st_mode=S_IFLNK|0777, st_size=39, ...}, AT_SYMLINK_NOFOLLOW) = 0
openat(AT_FDCWD, "XXXXXX_config/static/unlockInLine.bm",
O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 3
newfstatat(3, "", {st_mode=S_IFLNK|0777, st_size=39, ...}, AT_EMPTY_PATH) = 0
close(3) = 0
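For reference, the pattern visible in the traces above (an empty mode-000 placeholder created early, then later stat'ed, unlinked and replaced by the real symlink) can be mimicked by a small sketch like the following. This is only an illustration in Python of the syscall sequence seen in the strace output, not tar's actual code; the function names and the device/inode identity check are assumptions of mine.

```python
import os
import stat


def create_placeholder(path):
    """Create an empty, mode-000 regular file, like
    openat(..., O_WRONLY|O_CREAT|O_EXCL, 000) in the trace."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o000)
    try:
        st = os.fstat(fd)
    finally:
        os.close(fd)
    # Remember which file we created (assumption: identity = device + inode).
    return st.st_dev, st.st_ino


def replace_placeholder(path, target, ident, mtime_sec):
    """Replace the placeholder by a symlink to `target`.

    Raises FileNotFoundError (ENOENT) if the placeholder has vanished,
    which corresponds to the failing newfstatat calls in the trace.
    """
    st = os.lstat(path)  # like newfstatat(..., AT_SYMLINK_NOFOLLOW)
    if not stat.S_ISREG(st.st_mode) or (st.st_dev, st.st_ino) != ident:
        raise RuntimeError("placeholder was replaced by something else")
    os.unlink(path)            # unlinkat
    os.symlink(target, path)   # symlinkat
    # utimensat(..., AT_SYMLINK_NOFOLLOW). Python has no UTIME_OMIT, so the
    # symlink's current atime is passed back unchanged instead.
    if os.utime in os.supports_follow_symlinks:
        atime_ns = os.lstat(path).st_atime_ns
        os.utime(path, ns=(atime_ns, mtime_sec * 10**9),
                 follow_symlinks=False)
```

Calling create_placeholder("d/users") and later replace_placeholder("d/users", "../../gui/config/static/users", ident, 978325200) follows the successful trace. If something removes "d/users" in between, the second call fails with ENOENT, as in the failing one.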
In the original data, what is the file XXXXXX_config/static/users? I assume it's a symlink; what
does it point to? Also, what's "XXXXXX_config" and "XXXXXX_config/static"? Are
they symlinks to directories?
XXXXXX_config/static/users: symlink to a non-existent file
XXXXXX_config/static: real directory
XXXXXX_config: real directory
Was XXXXXX_config one of the files you explicitly mentioned when creating the
tarball? Did it follow other files?
Yes, it was one of the directory entries.
This is a plain tar of a directory, except that its first-level entries are
listed explicitly and reordered; the list contains only existing files and
directories.
No directory is "finalized" multiple times (which, as I understand it, would
require --delay-directory-restore).
Please send along any data that could help us reproduce the bug. Thanks.
I have not succeeded in reproducing the problem myself with any archive I
created.
I have only observed it from time to time across multiple runs.
I've diff'ed the strace outputs, and the only differences (apart from addresses
and sizes of reads) are the moment SIGCHLD is received and the tail part
starting at the first error on the users symlink.
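A comparison of this kind can be approximated with a short script such as the one below. It is a sketch, not the exact procedure I used: it assumes one syscall per line in the strace output, masks hexadecimal addresses, and collapses read() lines so that differing read sizes do not show up. The function names are hypothetical.

```python
import difflib
import re

ADDR = re.compile(r"0x[0-9a-f]+")      # e.g. 0x7ffc9b136f50
READ = re.compile(r"^(read\(\d+, ).*$")  # read(3, "..."..., 10240) = 10240


def normalize(line):
    """Mask the parts of a strace line that vary between runs."""
    line = ADDR.sub("0xADDR", line.rstrip("\n"))
    m = READ.match(line)
    if m:
        # Keep only the fd; buffer contents and sizes differ between runs.
        line = m.group(1) + "...)"
    return line


def diff_traces(failing_lines, succeeding_lines):
    """Return a unified diff of two normalized strace logs."""
    a = [normalize(l) for l in failing_lines]
    b = [normalize(l) for l in succeeding_lines]
    return list(difflib.unified_diff(a, b, "failing", "succeeding",
                                     lineterm=""))
```

With the two logs read via open(...).readlines(), whatever remains in the diff is what really differs, here the position of the SIGCHLD line and the tail starting at the first ENOENT.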
I'll try to find a way to share a failing tarball.
Regards,
--
Daniel Villeneuve