bsdtar has a similar optimization: it decouples reads from writes, so each side can use the I/O size that suits it best.
When it opens an archive for writing, it checks the target device type. If it’s a character device (such as a tape drive), it writes the requested blocks exactly. When the target device is a block device, however, it instead buffers and writes much larger blocks, padding the file at the end as necessary to ensure the final size is a multiple of the requested block size. This produces the exact same end result as if it had written blocks as requested, but much more efficiently.

Tim

> On Jul 18, 2018, at 9:58 AM, Andreas Dilger <adil...@dilger.ca> wrote:
>
> On Jul 18, 2018, at 9:03 AM, Ralph Corderoy <ra...@inputplus.co.uk> wrote:
>>
>> Hi Christian,
>>
>>> $ stat -c %o data/blob
>>> 2097152
>> ...
>>> tar does not explicitly use the block size of the file system
>>> where the files are located, but, for a reason I don't know (feel free to
>>> educate me), 10 KiB:
>>
>> Historic, that being 20 blocks where a block is 512 B.  See `Blocking
>> Factor'.  https://www.gnu.org/software/tar/manual/tar.html#SEC160
>>
>> It can be changed.
>>
>>     $ strace -e write -s 10 tar cbf 4096 foo.tar foo
>>     write(3, "foo\0\0\0\0\0\0\0"..., 2097152) = 2097152
>>     +++ exited with 0 +++
>>     $
>>
>>> I would like to propose to use the native file system block size in
>>> favor of the currently used 10 KiB.
>>
>> I can't see the default changing.  POSIX's pax(1) states for ustar
>> format that the default for character devices is 10 KiB, and allows for
>> multiples of 512 up to and including 32,256.  So you're suggesting the
>> default is to produce an incompatible tar file.
>
> The IO size from the storage does not need to match the recordsize
> of the tar file.  It may be that writing to an actual tape character
> device needs to use 10KB writes, but for a regular file on a block
> device (which is 99% of tar usage) it can still write 10KB records,
> but just write a few hundred of them at a time.
>
> What network filesystem are you using?  Typically, such small IOPS
> should be hidden from the filesystem with readahead and writeback
> cache, though of course there is still more overhead from having
> lots of system calls.
>
> Cheers, Andreas
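
For anyone who wants to see the idea in code: below is a minimal sketch of the device-type check and end-of-archive padding described at the top of this message. It is not bsdtar's or libarchive's actual code. The names choose_write_size, PaddedWriter and BULK_FACTOR are made up for illustration, and the factor of 64 is an arbitrary assumption.

```python
import stat

RECORD_SIZE = 20 * 512   # tar's traditional 10 KiB record (blocking factor 20)
BULK_FACTOR = 64         # hypothetical: records batched per write on non-tape targets


def choose_write_size(mode, record_size=RECORD_SIZE):
    """Return the size of each write for a target with the given st_mode."""
    if stat.S_ISCHR(mode):
        # Character device (e.g. a tape drive): write the requested records exactly.
        return record_size
    # Block device or regular file: batch many records into one larger write.
    return BULK_FACTOR * record_size


class PaddedWriter:
    """Buffer archive bytes and hand them to `raw` in writes of `write_size` bytes."""

    def __init__(self, raw, write_size, record_size=RECORD_SIZE):
        self.raw = raw
        self.write_size = write_size
        self.record_size = record_size
        self.buf = bytearray()

    def write(self, data):
        self.buf += data
        # Flush only full-sized chunks; keep the remainder buffered.
        while len(self.buf) >= self.write_size:
            self.raw.write(bytes(self.buf[:self.write_size]))
            del self.buf[:self.write_size]

    def close(self):
        # Zero-pad the tail so the total output is a multiple of record_size.
        remainder = len(self.buf) % self.record_size
        if remainder:
            self.buf += bytes(self.record_size - remainder)
        if self.buf:
            self.raw.write(bytes(self.buf))
            self.buf.clear()
```

In real use the mode would come from os.fstat(fd).st_mode on the opened archive. Either branch yields the same total number of bytes; only the number and size of the individual writes differ.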