Paul, any chance of having this pulled in? To recap, this simply uses posix_fadvise to hint to the OS that we're going to perform a sequential read of the source files when creating an archive. In our testing on Linux 4.4, when creating a tar archive with source files on an ext4 filesystem (on a SAN volume), this patch doubles tar's throughput. This is because when Linux is given the POSIX_FADV_SEQUENTIAL hint, it doubles the readahead on the underlying block device.
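A minimal sketch of the idea described above, assuming a Linux/glibc system where posix_fadvise() is available: the file is hinted as sequential once, right after open(), and then read front to back. The function name count_bytes_sequential() and the 64 KiB buffer size are illustrative assumptions, not taken from the patch.

```c
#define _POSIX_C_SOURCE 200112L /* exposes posix_fadvise() in <fcntl.h> */

#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not from the patch): read a whole file front to
 * back and return the number of bytes read, or -1 on error. Before the
 * read loop it tells the kernel the access pattern will be sequential;
 * on Linux this doubles the readahead on the underlying device. */
long long count_bytes_sequential(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Advisory only: offset 0 with len 0 means "through end of file"
     * (kernels before 2.6.6 read len 0 literally as zero bytes).
     * A failure here is harmless, so the return value is ignored. */
    (void) posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    char buf[64 * 1024];
    long long total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n < 0) {          /* read error */
            close(fd);
            return -1;
        }
        if (n == 0)           /* end of file */
            break;
        total += n;
    }
    close(fd);
    return total;
}
```

The hint is issued once per file rather than per read() call, since it describes the access pattern for the whole range.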
Any feedback is welcome.

Carlo

> On Apr 13, 2017, at 1:32 PM, Carlo Alberto Ferraris <ca...@strayorange.com> wrote:
>
> Paul,
> friendly ping.
>
> Carlo
>
>> On Apr 5, 2017, at 10:16 AM, Carlo Alberto Ferraris <ca...@strayorange.com> wrote:
>>
>> Just as a comment about why in my patch I use len explicitly instead of 0:
>> it's to work around a bug in Linux versions < 2.6.6.
>>
>> > In kernels before 2.6.6, if len was specified as 0, then this was interpreted
>> > literally as "zero bytes", rather than as meaning "all bytes through to the
>> > end of the file".
>> > http://man7.org/linux/man-pages/man2/posix_fadvise.2.html#BUGS
>>
>> Since 0 is supposed to mean "until the end of file", explicitly passing
>> the length of the file (which we already have) should be semantically
>> the same.
>>
>> Carlo
>>
>>> On Apr 4, 2017, at 10:14 PM, Mark <ma...@clara.co.uk> wrote:
>>>
>>> On Mon, April 3, 2017 03:17, Paul Eggert wrote:
>>>> I've lost context. I prefer not having this depend on an environment
>>>> variable.
>>>>
>>>> Can't the filesystem in question be fixed to have decent performance in
>>>> the typical case where applications access files sequentially? It's not
>>>> like 'tar' is a special case. I'd hate to have to modify lots of
>>>> programs just to work around a lame filesystem.
>>>
>>> I think you're confusing two things:
>>> - Carlo's patch
>>> - The suggestion to allow the user to tell tar to use POSIX_FADV_NOREUSE
>>> and/or POSIX_FADV_DONTNEED. In certain scenarios one, both or neither of
>>> those could perform best. [On Linux POSIX_FADV_NOREUSE is currently a
>>> no-op.]
>>>
>>> Let's ignore the second point for now.
>>>
>>> Carlo's patch at
>>> https://github.com/CAFxX/tar/commit/8b3ccb099c6ddf9f03d12d1f7c433c7927b964d5
>>> uses
>>>     posix_fadvise(fd, offset, len, POSIX_FADV_SEQUENTIAL);
>>>     posix_fadvise(fd, offset, len, POSIX_FADV_WILLNEED);
>>> to give a hint to the OS/filesystem about how the file will be accessed.
>>> There shouldn't be any down-side to doing that. On Linux for example,
>>> POSIX_FADV_SEQUENTIAL causes the filesystem read-ahead amount to be
>>> doubled.
>>>
>>> I'm not qualified to say whether the patch should be committed as-is, but
>>> the principle is sound. [I might choose a different name for the
>>> prefetch() function though.]
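The two posix_fadvise() calls quoted above can be wrapped roughly as follows. This is a hedged sketch, not the code from the linked commit: the names advise_sequential() and advise_whole_file() are hypothetical, and it assumes a POSIX system (e.g. Linux/glibc) that provides posix_fadvise(). It passes the file length obtained from fstat() rather than 0, following the workaround for kernels before 2.6.6 described earlier in the thread.

```c
#define _POSIX_C_SOURCE 200112L /* exposes posix_fadvise() in <fcntl.h> */

#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: hint that [offset, offset + len) will be read
 * sequentially (SEQUENTIAL) and soon (WILLNEED).
 * posix_fadvise() reports failure through its return value (an error
 * number such as EBADF or ESPIPE), not through errno. */
int advise_sequential(int fd, off_t offset, off_t len)
{
    int rc = posix_fadvise(fd, offset, len, POSIX_FADV_SEQUENTIAL);
    if (rc != 0)
        return rc;
    return posix_fadvise(fd, offset, len, POSIX_FADV_WILLNEED);
}

/* Hypothetical helper: advise the whole file using its explicit length
 * instead of len == 0, which kernels before 2.6.6 took literally as
 * "zero bytes". Returns 0 on success or an error number. */
int advise_whole_file(int fd)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return errno;

    /* Pipes and other non-regular files would make posix_fadvise()
     * fail with ESPIPE; there is nothing useful to advise, so skip. */
    if (!S_ISREG(st.st_mode))
        return 0;

    /* An empty file gives len == 0, which current kernels read as
     * "to end of file"; harmless either way. */
    return advise_sequential(fd, 0, st.st_size);
}
```

Because the hints are purely advisory, a caller could reasonably ignore a non-zero return value instead of failing the archive operation.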