https://bugzilla.mindrot.org/show_bug.cgi?id=3798
--- Comment #1 from Lionel Cons <lionelcons1...@gmail.com> ---
From openssh dev mailing list:

I like the following idea to have a new sftp GET_PLUS request, which is
designed like NFSv4.2 READ_PLUS:

On Wed, Mar 5, 2025 at 11:14 AM Darren Tucker <dtuc...@dtucker.net> wrote:
> On Tue, Mar 04, 2025 at 02:43:10PM +0100, Lionel Cons wrote:
> [...]
> > Really: Built In sparse file support, which is on by default, makes
> > more sense, as we do not have to maintain&update&administer lots of
> > tools just to get the job done. It's also less error-prone.
> >
> > FYI Sparse files are nothing new or magic, they have been around since
> > the dawn of filesystems, and even WinXP&WinServer2000 have sparse file
> > support.
>
> I wasn't aware that the SEEK_HOLE and SEEK_DATA had even been
> standardised, although it looks like that was only some time last year.
> As others have noted it's still not universally available.
>
> Having looked at it:
[snip]
> - I don't see sufficient information available in the sftp protocol
>   from the server to the client to support it for client "get".
>   Certainly the secsh-filefxer-02[0] (ie v3) version that OpenSSH
>   implements doesn't, but even the most recent -13 drafts only seem
>   to support only a per-file boolean that indicates if it's on or off.
>   I don't see a way for a client to determine the location and/or size of
>   any holes in a remote file in order to replicate them on a downloaded
>   file. The only way I can see it could be supported is by adding a
>   vendor extension (which would need to be supported by both client
>   and server) that could supply the information about holes/extents,
>   which would be a larger undertaking.

FYI NFSv4.2 added a READ_PLUS operation to implement reading sparse
files efficiently:
Basically each time the NFSv4.2 client does a READ_PLUS (see
https://datatracker.ietf.org/doc/html/rfc7862#page-86) the server
returns an array of elements.
Each element can either be a "data" element with
{ data_offset, len, data } XOR a "hole" element with
{ hole_offset, hole_len }.

IMHO the sftp protocol could do the same.

There is even one optimisation: The standards people also debated
whether to support a 3rd return type, an "application data block" (see
https://datatracker.ietf.org/doc/html/rfc7862#section-15.12), which
consists of a { data_offset, data_total_length, data_pattern_bytes,
data_pattern_bytes_len, number_of_data_patterns }. The idea is that if
an application stores repeated patterns in a file the server can return
this as "ADB". Use case examples include filling files with 0xdeadbeef
to invalidate data, or filling everything with '\0'-bytes (which are
valid data, whereas holes in sparse files mean "no data here").
But that is just an optimization, and IMHO the current sftp/scp work on
sparse files should only handle holes+data - but maybe keep the
protocol flexible enough to (later!) add ADB support.

> +/*
> + * Check a potentially-sparse file for location of holes and data, starting
> + * from "offset". If the next hole points to EOF, there are no remaining holes.
> + */
> +static void
> +sftp_check_sparse_file(int fd, off_t offset, off_t *data_offset,
> +    off_t *hole_offset)
> +{
> +#if defined(SEEK_HOLE) && defined(SEEK_DATA)
> +	if ((*hole_offset = lseek(fd, offset, SEEK_HOLE)) == -1)
> +		fatal_f("lseek(SEEK_HOLE): %s", strerror(errno));
> +	if ((*data_offset = lseek(fd, offset, SEEK_DATA)) == -1)
> +		fatal_f("lseek(SEEK_DATA): %s", strerror(errno));
> +#else
> +	/* No sparse file support, assume data spans start to end. */
> +	*data_offset = offset;
> +	if ((*hole_offset = lseek(fd, offset, SEEK_END)) == -1)
> +		fatal_f("lseek(SEEK_SET): %s", strerror(errno));
> +#endif
> +	if (lseek(fd, offset, SEEK_SET) == -1) /* restore cursor */
> +		fatal_f("lseek(SEEK_SET): %s", strerror(errno));
> +	debug3_f("offset %llu data_offset %llu hole_offset %llu",
> +	    (unsigned long long)offset, (unsigned long long)*data_offset,
> +	    (unsigned long long)*hole_offset);
[snip]

Notes (I didn't test the code yet):
* Code for this must cover:
- Normal, non-sparse files
- Sparse files which consist only of a single hole, no data
- Sparse files which begin with a hole
- Sparse files which begin with data
- Sparse files which end with a hole
- Sparse files which end with data
- Sparse files with 60000 holes (no joke, at SUN we had customers who
  had many more holes in files)

----

Bye,
Roland
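
For illustration only, here is a minimal sketch of how a server-side helper might turn a file into the kind of data/hole element array that READ_PLUS returns. It is not taken from the quoted patch or from OpenSSH: `struct segment`, `list_segments()` and `print_segments()` are hypothetical names invented for this example. It assumes a platform where lseek() with SEEK_DATA fails with ENXIO when no data remains after the given offset (as documented for Linux), which is the case for files that end with a hole or consist of a single hole; without SEEK_HOLE/SEEK_DATA it falls back to reporting one data segment.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* glibc exposes SEEK_DATA/SEEK_HOLE under this */
#endif
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical types, loosely modelled on the READ_PLUS result array. */
enum seg_kind { SEG_DATA, SEG_HOLE };

struct segment {
	enum seg_kind kind;
	off_t offset;
	off_t len;
};

/* Store a segment if there is room in segs[]; always count it. */
static void
add_segment(struct segment *segs, size_t max, long *n, enum seg_kind kind,
    off_t offset, off_t len)
{
	if ((size_t)*n < max) {
		segs[*n].kind = kind;
		segs[*n].offset = offset;
		segs[*n].len = len;
	}
	(*n)++;
}

/*
 * Walk fd from offset 0 to EOF and record alternating hole/data segments.
 * Returns the number of segments found (only the first "max" are stored),
 * or -1 on error. The file offset of fd is left changed, so callers are
 * assumed to use pread() or to seek again afterwards.
 */
static long
list_segments(int fd, struct segment *segs, size_t max)
{
	off_t end, pos = 0, next;
	long n = 0;

	assert(segs != NULL || max == 0);
	if ((end = lseek(fd, 0, SEEK_END)) == -1)
		return -1;
	while (pos < end) {
#if defined(SEEK_HOLE) && defined(SEEK_DATA)
		next = lseek(fd, pos, SEEK_DATA);
		if (next == -1) {
			if (errno != ENXIO)
				return -1;
			next = end;	/* no more data: hole runs to EOF */
		}
		if (next > end)
			next = end;
		if (next > pos) {
			add_segment(segs, max, &n, SEG_HOLE, pos, next - pos);
			pos = next;
			continue;
		}
		if ((next = lseek(fd, pos, SEEK_HOLE)) == -1)
			return -1;
		if (next <= pos || next > end)
			next = end;	/* defensive: file changed under us */
#else
		next = end;		/* no sparse support: all data */
#endif
		add_segment(segs, max, &n, SEG_DATA, pos, next - pos);
		pos = next;
	}
	return n;
}

/* Print one line per stored segment, e.g. for debugging. */
static void
print_segments(const struct segment *segs, long n)
{
	for (long i = 0; i < n; i++)
		printf("%s offset=%lld len=%lld\n",
		    segs[i].kind == SEG_DATA ? "data" : "hole",
		    (long long)segs[i].offset, (long long)segs[i].len);
}
```

A sender built on something like this would transmit only the SEG_DATA ranges and let the receiver recreate the holes, for example by seeking past them and calling ftruncate() at the end. Because a file may contain tens of thousands of holes, a real implementation would probably stream segments in batches rather than collect them all in one fixed-size array.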