Christian Franke via Cygwin wrote:
Testcase:

$ uname -r
3.5.7-1.x86_64

$ cygcheck -f /bin/cp.exe
coreutils-9.6-1

$ for i in 1 2 3; do cat /bin/cygwin1.dll > file$i; done

$ compact /C file2 # NTFS compression
... (1.7 : 1) ...

$ compact /C /EXE:LZX file3 # Compact OS LZX compression
... (2.8 : 1) ...

$ stat -c '%b %s %n' file?
2928 2995253 file1
1720 2995253 file2
1044 2995253 file3

$ cp file1 copy1 # OK

$ cp file2 copy2 # Hangs
...[^C]

$ cp file3 copy3 # Hangs
...[^C]

$ md5sum file? copy?
2954646a9a0fe4579c3fc1f44dd4bb6a *file1
2954646a9a0fe4579c3fc1f44dd4bb6a *file2
2954646a9a0fe4579c3fc1f44dd4bb6a *file3
2954646a9a0fe4579c3fc1f44dd4bb6a *copy1
2954646a9a0fe4579c3fc1f44dd4bb6a *copy2
2954646a9a0fe4579c3fc1f44dd4bb6a *copy3

$ (sleep 2; pskill strace) & strace cp file3 copy3
...
   47 2004141 [main] cp 5546 lseek: 2995253 = lseek(3, 2995253, 0) # SEEK_SET
   46 2004187 [main] cp 5546 fhandler_base::lseek: setting file pointer to 2995253 # EOF
   47 2004234 [main] cp 5546 lseek: 2995253 = lseek(3, 2995253, 3) # SEEK_DATA
   46 2004280 [main] cp 5546 fhandler_base::lseek: setting file pointer to 2995253
   47 2004327 [main] cp 5546 lseek: 2995253 = lseek(3, 2995253, 4) # SEEK_HOLE
   46 2004373 [main] cp 5546 fhandler_base::lseek: setting file pointer to 2995253
   46 2004419 [main] cp 5546 lseek: 2995253 = lseek(3, 2995253, 0)
   51 2004470 [main] cp 5546 fhandler_base::lseek: setting file pointer to 2995253
   47 2004517 [main] cp 5546 lseek: 2995253 = lseek(3, 2995253, 3)
   47 2004564 [main] cp 5546 fhandler_base::lseek: setting file pointer to 2995253
   47 2004611 [main] cp 5546 lseek: 2995253 = lseek(3, 2995253, 4)
   46 2004657 [main] cp 5546 fhandler_base::lseek: setting file pointer to 2995253
Process strace killed.


file1/2 are detected as possible sparse files, but the optimized copy algorithm does not properly handle the non-sparse case.

Should be "file2/3" of course.

Upstream bug?


Possibly not. A closer look shows that the main loop in copy.c:lseek_copy() expects that SEEK_DATA fails with ENXIO at EOF.

https://github.com/coreutils/coreutils/blob/v9.6/src/copy.c#L543

 lseek_copy(..., off_t ext_start, ...)
 {
   ...
   while (0 <= ext_start)
     {
      ...
      ext_start = lseek (src_fd, dest_pos, SEEK_DATA);
      if (ext_start < 0 && errno != ENXIO)
        goto cannot_lseek;
     }
   ...
}
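
To illustrate why the exit condition matters, here is a minimal model of that loop. It is a sketch, not the coreutils code: fake_seek_data(), fake_seek_hole() and count_extent_loop_iterations() are hypothetical names, the file is assumed to consist of a single data extent, and the iteration cap exists only so the non-terminating case can be observed.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for lseek(fd, offset, SEEK_DATA) on a file of
   file_size bytes that is entirely data.  enxio_at_eof selects whether
   offset == file_size fails with ENXIO or simply returns the offset. */
static long fake_seek_data(long offset, long file_size, bool enxio_at_eof)
{
  if (offset > file_size || (offset == file_size && enxio_at_eof)) {
    errno = ENXIO;
    return -1;
  }
  return offset;  /* everything up to EOF is data */
}

/* Hypothetical stand-in for SEEK_HOLE: the only hole is the implicit
   one at EOF. */
static long fake_seek_hole(long offset, long file_size, bool enxio_at_eof)
{
  if (offset > file_size || (offset == file_size && enxio_at_eof)) {
    errno = ENXIO;
    return -1;
  }
  return file_size;
}

/* Simplified model of the extent loop.  Returns the number of iterations
   until SEEK_DATA fails, or -1 if max_iterations is reached first. */
int count_extent_loop_iterations(long file_size, bool enxio_at_eof,
                                 int max_iterations)
{
  long ext_start = fake_seek_data(0, file_size, enxio_at_eof);
  int iterations = 0;

  while (0 <= ext_start) {
    if (iterations == max_iterations)
      return -1;  /* the loop would keep spinning */
    iterations++;

    long ext_end = fake_seek_hole(ext_start, file_size, enxio_at_eof);
    long dest_pos = ext_end;  /* pretend [ext_start, ext_end) was copied */
    ext_start = fake_seek_data(dest_pos, file_size, enxio_at_eof);
  }
  return iterations;
}
```

In this model, count_extent_loop_iterations(2995253, true, 1000) finishes after a single pass, while passing false for enxio_at_eof runs into the cap, which matches the repeating SEEK_SET / SEEK_DATA / SEEK_HOLE pattern in the strace output.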

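This works on Linux (checked on Debian 12), but on Cygwin lseek(..., SEEK_DATA) returns the offset itself if it is equal to the file size, so ext_start never becomes negative and the loop does not terminate.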
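
A small probe along the following lines could be used to compare platforms. This is only a sketch: seek_data_at_eof_gives_enxio() is a hypothetical helper name, and the fallback value 3 for SEEK_DATA is an assumption taken from the whence values visible in the strace output above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* glibc exposes SEEK_DATA only with this defined */
#endif
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA 3  /* assumed fallback, matches the strace output */
#endif

/* Calls lseek(fd, <file size>, SEEK_DATA) and classifies the result:
     1  -> failed with ENXIO (what the coreutils loop expects)
     0  -> returned the offset itself (the behavior reported for Cygwin)
    -1  -> anything else (fstat error, other errno, unexpected offset) */
int seek_data_at_eof_gives_enxio(int fd)
{
  struct stat st;
  if (fstat(fd, &st) != 0)
    return -1;

  errno = 0;
  off_t pos = lseek(fd, st.st_size, SEEK_DATA);
  if (pos < 0)
    return errno == ENXIO ? 1 : -1;
  return pos == st.st_size ? 0 : -1;
}
```

Calling it on a descriptor opened read-only on file2 or file3 from the testcase should distinguish the two behaviors. Note that lseek() moves the file position when it succeeds, so the descriptor should not be reused for reading afterwards without another seek.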

Recent POSIX says:
"[ENXIO] The whence argument is SEEK_HOLE or SEEK_DATA, and offset is greater than or equal to the file size"
https://pubs.opengroup.org/onlinepubs/9799919799/functions/lseek.html

But (at least older) Linux man pages suggest that the Cygwin behavior may also be correct:
"In the simplest implementation, a filesystem can support the operations by making ... SEEK_DATA always return offset."
"ENXIO - whence is SEEK_DATA or SEEK_HOLE, and offset is beyond the end of the file"
https://man7.org/linux/man-pages/man2/lseek.2.html

Hmm... does "beyond" mean '>=' or '>' ?
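
Spelled out as code, the two readings differ only in the boundary case. The function names below are hypothetical and exist purely to make the comparison explicit:

```c
#include <stdbool.h>

/* Reading 1 (the POSIX wording quoted above): ENXIO when offset is
   greater than or equal to the file size. */
bool enxio_if_at_or_past_eof(long offset, long file_size)
{
  return offset >= file_size;
}

/* Reading 2 ("beyond" taken strictly): ENXIO only when offset is
   greater than the file size. */
bool enxio_if_strictly_past_eof(long offset, long file_size)
{
  return offset > file_size;
}
```

For offset == file size (2995253 in the testcase), the first reading yields ENXIO and the second does not; for every other offset the two agree.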

--
Regards,
Christian
