On Fri, 14 Feb 2025 at 12:25, Christian Franke via Cygwin
<cygwin@cygwin.com> wrote:
>
> Christian Franke via Cygwin wrote:
> > Testcase:
> >
> > $ uname -r
> > 3.5.7-1.x86_64
> >
> > $ cygcheck -f /bin/cp.exe
> > coreutils-9.6-1
> >
> > $ for i in 1 2 3; do cat /bin/cygwin1.dll > file$i; done
> >
> > $ compact /C file2 # NTFS compression
> > ... (1.7 : 1) ...
> >
> > $ compact /C /EXE:LZX file3 # Compact OS LZX compression
> > ... (2.8 : 1) ...
> >
> > $ stat -c '%b %s %n' file?
> > 2928 2995253 file1
> > 1720 2995253 file2
> > 1044 2995253 file3
> >
> > $ cp file1 copy1 # OK
> >
> > $ cp file2 copy2 # Hangs
> > ...[^C]
> >
> > $ cp file3 copy3 # Hangs
> > ...[^C]
> >
> > $ md5sum file? copy?
> > 2954646a9a0fe4579c3fc1f44dd4bb6a *file1
> > 2954646a9a0fe4579c3fc1f44dd4bb6a *file2
> > 2954646a9a0fe4579c3fc1f44dd4bb6a *file3
> > 2954646a9a0fe4579c3fc1f44dd4bb6a *copy1
> > 2954646a9a0fe4579c3fc1f44dd4bb6a *copy2
> > 2954646a9a0fe4579c3fc1f44dd4bb6a *copy3
> >
> > $ (sleep 2; pskill strace) & strace cp file3 copy3
> > ...
> >    47 2004141 [main] cp 5546 lseek: 2995253 = lseek(3, 2995253, 0) # SEEK_SET
> >    46 2004187 [main] cp 5546 fhandler_base::lseek: setting file pointer to 2995253 # EOF
> >    47 2004234 [main] cp 5546 lseek: 2995253 = lseek(3, 2995253, 3) # SEEK_DATA
> >    46 2004280 [main] cp 5546 fhandler_base::lseek: setting file pointer to 2995253
> >    47 2004327 [main] cp 5546 lseek: 2995253 = lseek(3, 2995253, 4) # SEEK_HOLE
> >    46 2004373 [main] cp 5546 fhandler_base::lseek: setting file pointer to 2995253
> >    46 2004419 [main] cp 5546 lseek: 2995253 = lseek(3, 2995253, 0)
> >    51 2004470 [main] cp 5546 fhandler_base::lseek: setting file pointer to 2995253
> >    47 2004517 [main] cp 5546 lseek: 2995253 = lseek(3, 2995253, 3)
> >    47 2004564 [main] cp 5546 fhandler_base::lseek: setting file pointer to 2995253
> >    47 2004611 [main] cp 5546 lseek: 2995253 = lseek(3, 2995253, 4)
> >    46 2004657 [main] cp 5546 fhandler_base::lseek: setting file pointer to 2995253
> > Process strace killed.
> >
> >
> > file1/2 are detected as possible sparse files, but the optimized copy
> > algorithm does not properly handle the non-sparse case.
>
> Should be "file2/3" of course.
>
> > Upstream bug?
> >
>
> Possibly not. A closer look shows that the main loop in
> copy.c:lseek_copy() expects that SEEK_DATA fails with ENXIO at EOF.
>
> https://github.com/coreutils/coreutils/blob/v9.6/src/copy.c#L543
>
>   lseek_copy(..., off_t ext_start, ...)
>   {
>     ...
>     while (0 <= ext_start) {
>       {
>        ...
>        ext_start = lseek (src_fd, dest_pos, SEEK_DATA);
>        if (ext_start < 0 && errno != ENXIO)
>          goto cannot_lseek;
>       }
>     ...
> }
>
> This works on Linux (checked on Debian 12), but Cygwin returns the offset
> itself when it is equal to the file size.
>
> Recent POSIX says:
> "[ENXIO] The whence argument is SEEK_HOLE or SEEK_DATA, and offset is
> greater than or equal to the file size"
> https://pubs.opengroup.org/onlinepubs/9799919799/functions/lseek.html
>
> But (at least older) Linux man pages suggest that the Cygwin behavior may
> also be correct:
> "In the simplest implementation, a filesystem can support the operations
> by making ... SEEK_DATA always return offset."
> "ENXIO - whence is SEEK_DATA or SEEK_HOLE, and offset is beyond the end
> of the file"
> https://man7.org/linux/man-pages/man2/lseek.2.html
>
> Hmm... does "beyond" mean '>=' or '>' ?
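
A minimal probe for which reading a given system follows, sketched in C under the assumption of a POSIX system that exposes SEEK_DATA when _GNU_SOURCE is defined (glibc, Cygwin). The function name seek_data_at_eof() and its 0/1/-1 return convention are hypothetical, not part of any existing API; it just asks lseek() for SEEK_DATA at exactly st_size and reports whether that fails with ENXIO or returns the offset:

```c
#define _GNU_SOURCE  /* exposes SEEK_DATA/SEEK_HOLE on glibc; must precede includes */
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical probe (not from coreutils or Cygwin): what does
   lseek(fd, st_size, SEEK_DATA) do exactly at EOF?
     0  -> fails with ENXIO (the POSIX wording quoted above)
     1  -> returns the offset unchanged (as in the Cygwin strace above)
    -1  -> anything else (open/fstat failure, EINVAL, unexpected value) */
static int seek_data_at_eof(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    int result = -1;
    if (fstat(fd, &st) == 0) {
        off_t r = lseek(fd, st.st_size, SEEK_DATA);
        if (r < 0)
            result = (errno == ENXIO) ? 0 : -1;  /* inspect errno before close() */
        else if (r == st.st_size)
            result = 1;
    }
    close(fd);
    return result;
}
```

Given the strace above, where lseek(3, 2995253, 3) returned 2995253 for a 2995253-byte file, seek_data_at_eof("file3") would be expected to return 1 on that Cygwin 3.5.7 setup, and 0 on Debian 12 if it behaves as reported.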

cc: illumos-dev@ list. How do Solaris and Illumos behave? Sun/Solaris
invented SEEK_DATA/SEEK_HOLE, so, aside from the Open Group/POSIX specs,
Solaris should be the reference implementation.
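
Whichever reading is right, a caller can avoid the hang by not relying on ENXIO at EOF. This is a hypothetical helper (walk_data_extents(), extent_fn and count_bytes() are made-up names), not the coreutils lseek_copy() code or an actual fix: it bounds the extent loop by the st_size taken from fstat() and treats a SEEK_DATA result at or past that size like ENXIO:

```c
#define _GNU_SOURCE  /* exposes SEEK_DATA/SEEK_HOLE on glibc; must precede includes */
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Callback invoked for each data extent [start, end). */
typedef int (*extent_fn)(off_t start, off_t end, void *ctx);

/* Hypothetical helper, NOT the coreutils lseek_copy() code: walk the data
   extents of `path` without relying on SEEK_DATA failing with ENXIO at EOF.
   Returns 0 on success, -1 on error, or the first non-zero value of fn. */
static int walk_data_extents(const char *path, extent_fn fn, void *ctx)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }

    int rc = 0;
    off_t pos = 0;
    while (rc == 0 && pos < st.st_size) {      /* never query at pos == size */
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data < 0) {
            rc = (errno == ENXIO) ? 0 : -1;    /* ENXIO: no more data */
            break;
        }
        if (data >= st.st_size)                /* "return offset" style answer */
            break;
        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole < 0) {
            rc = -1;
            break;
        }
        if (hole > st.st_size)                 /* clamp to the sampled size */
            hole = st.st_size;
        if (hole <= data) {                    /* no progress: fail, don't spin */
            rc = -1;
            break;
        }
        rc = fn(data, hole, ctx);
        pos = hole;                            /* strictly increases each pass */
    }
    close(fd);
    return rc;
}

/* Example callback: add up the number of data bytes. */
static int count_bytes(off_t start, off_t end, void *ctx)
{
    *(off_t *)ctx += end - start;
    return 0;
}
```

Bounding the loop by the sampled size means both behaviors terminate. A file that grows or shrinks during the walk is not handled by this sketch; real copy code would need to deal with that separately.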

Ced
-- 
Cedric Blancher <cedric.blanc...@gmail.com>
[https://plus.google.com/u/0/+CedricBlancher/]
Institute Pasteur

-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple