On Fri, 14 Feb 2025 at 12:25, Christian Franke via Cygwin <cygwin@cygwin.com> wrote:
>
> Christian Franke via Cygwin wrote:
> > Testcase:
> >
> > $ uname -r
> > 3.5.7-1.x86_64
> >
> > $ cygcheck -f /bin/cp.exe
> > coreutils-9.6-1
> >
> > $ for i in 1 2 3; do cat /bin/cygwin1.dll > file$i; done
> >
> > $ compact /C file2 # NTFS compression
> > ... (1.7 : 1) ...
> >
> > $ compact /C /EXE:LZX file3 # Compact OS LZX compression
> > ... (2.8 : 1) ...
> >
> > $ stat -c '%b %s %n' file?
> > 2928 2995253 file1
> > 1720 2995253 file2
> > 1044 2995253 file3
> >
> > $ cp file1 copy1 # OK
> >
> > $ cp file2 copy2 # Hangs
> > ...[^C]
> >
> > $ cp file3 copy3 # Hangs
> > ...[^C]
> >
> > $ md5sum file? copy?
> > 2954646a9a0fe4579c3fc1f44dd4bb6a *file1
> > 2954646a9a0fe4579c3fc1f44dd4bb6a *file2
> > 2954646a9a0fe4579c3fc1f44dd4bb6a *file3
> > 2954646a9a0fe4579c3fc1f44dd4bb6a *copy1
> > 2954646a9a0fe4579c3fc1f44dd4bb6a *copy2
> > 2954646a9a0fe4579c3fc1f44dd4bb6a *copy3
> >
> > $ (sleep 2; pskill strace) & strace cp file3 copy3
> > ...
> >    47 2004141 [main] cp 5546 lseek: 2995253 = lseek(3, 2995253, 0) # SEEK_SET
> >    46 2004187 [main] cp 5546 fhandler_base::lseek: setting file pointer to 2995253 # EOF
> >    47 2004234 [main] cp 5546 lseek: 2995253 = lseek(3, 2995253, 3) # SEEK_DATA
> >    46 2004280 [main] cp 5546 fhandler_base::lseek: setting file pointer to 2995253
> >    47 2004327 [main] cp 5546 lseek: 2995253 = lseek(3, 2995253, 4) # SEEK_HOLE
> >    46 2004373 [main] cp 5546 fhandler_base::lseek: setting file pointer to 2995253
> >    46 2004419 [main] cp 5546 lseek: 2995253 = lseek(3, 2995253, 0)
> >    51 2004470 [main] cp 5546 fhandler_base::lseek: setting file pointer to 2995253
> >    47 2004517 [main] cp 5546 lseek: 2995253 = lseek(3, 2995253, 3)
> >    47 2004564 [main] cp 5546 fhandler_base::lseek: setting file pointer to 2995253
> >    47 2004611 [main] cp 5546 lseek: 2995253 = lseek(3, 2995253, 4)
> >    46 2004657 [main] cp 5546 fhandler_base::lseek: setting file pointer to 2995253
> > Process strace killed.
> >
> >
> > file1/2 are detected as a possible sparse files but the optimized copy
> > algorithm does not properly handle the non-sparse case.
>
> Should be "file2/3" of course.
>
> > Upstream bug?
> >
>
> Possibly not. A closer look shows that the main loop in
> copy.c:lseek_copy() expects that SEEK_DATA fails with ENXIO at EOF.
>
> https://github.com/coreutils/coreutils/blob/v9.6/src/copy.c#L543
>
> lseek_copy(..., off_t ext_start, ...)
> {
>   ...
>   while (0 <= ext_start)
>     {
>       ...
>       ext_start = lseek (src_fd, dest_pos, SEEK_DATA);
>       if (ext_start < 0 && errno != ENXIO)
>         goto cannot_lseek;
>     }
>   ...
> }
>
> This works on Linux (checked on Debian 12) but Cygwin returns the offset
> if it is equal to the file size.
>
> Recent POSIX says:
> "[ENXIO] The whence argument is SEEK_HOLE or SEEK_DATA, and offset is
> greater than or equal to the file size"
> https://pubs.opengroup.org/onlinepubs/9799919799/functions/lseek.html
>
> But (at least older) Linux man pages suggest that Cygwin behavior may be
> correct also:
> "In the simplest implementation, a filesystem can support the operations
> by making ... SEEK_DATA always return offset."
> "ENXIO - whence is SEEK_DATA or SEEK_HOLE, and offset is beyond the end
> of the file"
> https://man7.org/linux/man-pages/man2/lseek.2.html
>
> Hmm... does "beyond" mean '>=' or '>' ?
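
To compare platforms on exactly this point, a small probe along the
following lines might be useful. It is only a sketch: the helper
probe_seek_at_eof() and the enum eof_seek_result are names made up here,
not part of coreutils, Cygwin, or any libc. It calls lseek() with offset
equal to the file size and reports whether the call fails with ENXIO
(what the POSIX wording quoted above and lseek_copy() expect) or returns
an offset (the behavior reported for Cygwin above).

```c
#define _GNU_SOURCE   /* glibc exposes SEEK_DATA/SEEK_HOLE under this macro */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not part of any real API. */
enum eof_seek_result {
    EOF_SEEK_ENXIO,       /* lseek failed with ENXIO */
    EOF_SEEK_OFFSET,      /* lseek succeeded and returned an offset */
    EOF_SEEK_OTHER_ERROR  /* open/fstat failed, or lseek failed with another errno */
};

/* Hypothetical helper: call lseek(fd, <file size>, whence) on `path`
   and classify the outcome.  `whence` should be SEEK_DATA or SEEK_HOLE. */
static enum eof_seek_result probe_seek_at_eof(const char *path, int whence)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return EOF_SEEK_OTHER_ERROR;

    struct stat st;
    enum eof_seek_result result = EOF_SEEK_OTHER_ERROR;

    if (fstat(fd, &st) == 0) {
        errno = 0;
        /* The interesting case: offset == file size, not offset > file size. */
        off_t pos = lseek(fd, st.st_size, whence);
        if (pos >= 0)
            result = EOF_SEEK_OFFSET;
        else if (errno == ENXIO)
            result = EOF_SEEK_ENXIO;
    }

    close(fd);
    return result;
}
```

Calling probe_seek_at_eof("file2", SEEK_DATA) from a trivial main() on
Linux, Cygwin, and illumos should show which of them treat "offset equal
to the file size" as ENXIO.
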

CC: illumos-dev@ list.

How do Solaris and illumos behave? Sun/Solaris invented
SEEK_DATA/SEEK_HOLE, so, apart from the Open Group/POSIX specification,
their implementation should serve as the reference.

Ced

-- 
Cedric Blancher <cedric.blanc...@gmail.com>
[https://plus.google.com/u/0/+CedricBlancher/]
Institute Pasteur