Christian Franke via Cygwin wrote:
Testcase:
$ uname -r
3.5.7-1.x86_64
$ cygcheck -f /bin/cp.exe
coreutils-9.6-1
$ for i in 1 2 3; do cat /bin/cygwin1.dll > file$i; done
$ compact /C file2 # NTFS compression
... (1.7 : 1) ...
$ compact /C /EXE:LZX file3 # Compact OS LZX compression
... (2.8 : 1) ...
$ stat -c '%b %s %n' file?
2928 2995253 file1
1720 2995253 file2
1044 2995253 file3
$ cp file1 copy1 # OK
$ cp file2 copy2 # Hangs
...[^C]
$ cp file3 copy3 # Hangs
...[^C]
$ md5sum file? copy?
2954646a9a0fe4579c3fc1f44dd4bb6a *file1
2954646a9a0fe4579c3fc1f44dd4bb6a *file2
2954646a9a0fe4579c3fc1f44dd4bb6a *file3
2954646a9a0fe4579c3fc1f44dd4bb6a *copy1
2954646a9a0fe4579c3fc1f44dd4bb6a *copy2
2954646a9a0fe4579c3fc1f44dd4bb6a *copy3
$ (sleep 2; pskill strace) & strace cp file3 copy3
...
47 2004141 [main] cp 5546 lseek: 2995253 = lseek(3, 2995253, 0) # SEEK_SET
46 2004187 [main] cp 5546 fhandler_base::lseek: setting file pointer to 2995253 # EOF
47 2004234 [main] cp 5546 lseek: 2995253 = lseek(3, 2995253, 3) # SEEK_DATA
46 2004280 [main] cp 5546 fhandler_base::lseek: setting file pointer to 2995253
47 2004327 [main] cp 5546 lseek: 2995253 = lseek(3, 2995253, 4) # SEEK_HOLE
46 2004373 [main] cp 5546 fhandler_base::lseek: setting file pointer to 2995253
46 2004419 [main] cp 5546 lseek: 2995253 = lseek(3, 2995253, 0)
51 2004470 [main] cp 5546 fhandler_base::lseek: setting file pointer to 2995253
47 2004517 [main] cp 5546 lseek: 2995253 = lseek(3, 2995253, 3)
47 2004564 [main] cp 5546 fhandler_base::lseek: setting file pointer to 2995253
47 2004611 [main] cp 5546 lseek: 2995253 = lseek(3, 2995253, 4)
46 2004657 [main] cp 5546 fhandler_base::lseek: setting file pointer to 2995253
Process strace killed.
file1/2 are detected as possible sparse files, but the optimized copy
algorithm does not properly handle the non-sparse case.
Should be "file2/3" of course.
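The block counts in the stat output above are consistent with a simple block-count heuristic for "possibly sparse". The sketch below assumes the check has the form blocks < size / block_size, and that st_blocks is counted in 1024-byte units here (an assumption, picked because 2928 * 1024 >= 2995253 holds for the uncompressed file1). The name probably_sparse is hypothetical and is not taken from coreutils.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: a file is "probably sparse" if it occupies
   fewer blocks than its size would need.  block_size is the unit
   of st_blocks; 1024 is an assumption for this testcase. */
static bool probably_sparse(int64_t st_blocks, int64_t st_size,
                            int64_t block_size)
{
    return st_blocks < st_size / block_size;
}
```

With the values from the testcase and a 1024-byte unit, file1 (2928 blocks) is classified as non-sparse, while file2 (1720 blocks) and file3 (1044 blocks) are classified as possibly sparse, which matches which copies hang.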
Upstream bug?
Possibly not. A closer look shows that the main loop in
copy.c:lseek_copy() expects that SEEK_DATA fails with ENXIO at EOF.
https://github.com/coreutils/coreutils/blob/v9.6/src/copy.c#L543
lseek_copy(..., off_t ext_start, ...)
{
  ...
  while (0 <= ext_start)
    {
      ...
      ext_start = lseek (src_fd, dest_pos, SEEK_DATA);
      /* At EOF the loop only ends if this lseek fails with ENXIO. */
      if (ext_start < 0 && errno != ENXIO)
        goto cannot_lseek;
    }
  ...
}
This works on Linux (checked on Debian 12), but Cygwin returns the offset
instead of failing if the offset is equal to the file size.
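To illustrate why this difference turns into a hang, here is a minimal model of the loop, not the coreutils code itself. The two mock SEEK_DATA functions, the iteration cap, and all names are assumptions made for the illustration. The file is modeled as fully allocated, so SEEK_HOLE always reports the implicit hole at EOF.

```c
#include <errno.h>
#include <stdbool.h>

/* Mock SEEK_DATA for a file of `size` bytes without holes. */
typedef long long (*seek_data_fn)(long long offset, long long size);

/* Variant 1: fails with ENXIO if offset >= size (seen on Linux). */
static long long seek_data_enxio_at_eof(long long offset, long long size)
{
    if (offset >= size) { errno = ENXIO; return -1; }
    return offset;
}

/* Variant 2: returns offset if offset == size (seen on Cygwin). */
static long long seek_data_offset_at_eof(long long offset, long long size)
{
    if (offset > size) { errno = ENXIO; return -1; }
    return offset;
}

/* Model of the main loop.  Returns true if the loop ends by itself,
   false if it is still running after max_iter iterations or if the
   seek fails with an unexpected error. */
static bool copy_loop_terminates(seek_data_fn seek_data, long long size,
                                 int max_iter)
{
    long long ext_start = 0;      /* first extent starts at offset 0 */
    long long dest_pos = 0;
    int iter = 0;

    while (0 <= ext_start) {
        if (iter++ >= max_iter)
            return false;         /* would spin forever */
        long long ext_end = size; /* SEEK_HOLE model: hole only at EOF */
        dest_pos = ext_end;       /* pretend the extent was copied */
        ext_start = seek_data(dest_pos, size);
        if (ext_start < 0 && errno != ENXIO)
            return false;         /* corresponds to "goto cannot_lseek" */
    }
    return true;
}
```

In this model the first variant leaves the loop after one pass, while the second keeps returning the file size and never makes ext_start negative, which resembles the repeating lseek calls in the strace output.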
Recent POSIX says:
"[ENXIO] The whence argument is SEEK_HOLE or SEEK_DATA, and offset is
greater than or equal to the file size"
https://pubs.opengroup.org/onlinepubs/9799919799/functions/lseek.html
But (at least older) Linux man pages suggest that Cygwin's behavior may
also be correct:
"In the simplest implementation, a filesystem can support the operations
by making ... SEEK_DATA always return offset."
"ENXIO - whence is SEEK_DATA or SEEK_HOLE, and offset is beyond the end
of the file"
https://man7.org/linux/man-pages/man2/lseek.2.html
Hmm... does "beyond" mean '>=' or '>' ?
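Whichever reading is right, the actual behavior of a given system is easy to probe. The following is a rough sketch, assuming a regular file on a filesystem that supports SEEK_DATA. The function name seek_data_at_eof and its return convention are made up for this example, and the fallback value 3 for SEEK_DATA is an assumption taken from the strace output above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for SEEK_DATA on glibc */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA 3             /* assumption, matches the strace above */
#endif

/* Calls lseek(fd, st_size, SEEK_DATA) on the given file.
   Returns 0 if it fails with ENXIO ('>=' reading),
           1 if it returns the offset ('>' reading),
          -1 on any other outcome. */
static int seek_data_at_eof(const char *path)
{
    struct stat st;
    int result = -1;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) == 0) {
        errno = 0;
        off_t pos = lseek(fd, st.st_size, SEEK_DATA);
        if (pos == (off_t) -1 && errno == ENXIO)
            result = 0;
        else if (pos == st.st_size)
            result = 1;
    }
    close(fd);
    return result;
}
```

Calling seek_data_at_eof("file3") from a small main() and printing the result should show which of the two readings a given system implements.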
--
Regards,
Christian