The man page section on sparse files[1] suggests that the "old GNU format" is risky, as it violates POSIX. As such, I was expecting to use tar --sparse-version=1.0 to get better behaviour, or for that to be the default for --sparse.
However, for the default of --format=gnu, --sparse-version=1.0 is silently ignored, and the "old GNU format" is written, as you can see by the 'S' (GNUTYPE_SPARSE) at 0x9C in the header here:

  % truncate -s 100M foo
  % tar --create --format=gnu --sparse --sparse-version=1.0 foo > foo.tar
  % xxd foo.tar | sed -n 9,11p
  00000080: 3030 3130 3030 3000 3133 3136 3436 3430  0010000.13164640
  00000090: 3334 3600 3031 3534 3536 0020 5300 0000  346.015456. S...
  000000a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................

My expectation here is that --sparse-version=1.0 should be an error when combined with --format=gnu, and that you must specify --format=pax to use --sparse-version=1.0. Alternatively, if --format=gnu is not specified on the command line, should --sparse-version=1.0 imply --format=pax?

The tests for --sparse-version seem unaware of this restriction, although I note that they are skipped by default. I didn't look at this very hard.

Does this format cause problems in the real world? If not, maybe the manual should be changed to explain that the GNU format is widely used and doesn't have any issues. Tools seem to understand it fine: bsdtar from libarchive 3.2.2, Python's tarfile module, tar-rs for Rust, ...

I note that the pax format embeds the current PID[2] into[3] the tar file, apparently as a source of a "unique" random number, which seems rather bizarre, and breaks reproducibility. Perhaps a good motivation not to move things to --format=pax by default.

Chris.

1: https://www.gnu.org/software/tar/manual/html_section/tar_92.html
2: https://git.savannah.gnu.org/cgit/tar.git/tree/src/xheader.c?id=1e6d2c048819e57e74dfd6aa8bd4eaff0d86d24b#n296
3: https://git.savannah.gnu.org/cgit/tar.git/tree/src/xheader.c?id=1e6d2c048819e57e74dfd6aa8bd4eaff0d86d24b#n366
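
P.S. In case it helps anyone check an archive without reading a hex dump, here is a minimal Python sketch that reads the typeflag byte at offset 0x9C (156) of the first 512-byte header block. The function names (first_typeflag, describe) are my own and purely illustrative, and the script only looks at the first header, so it assumes the sparse file is the first member of the archive.

```python
import sys

BLOCK_SIZE = 512        # tar headers are 512-byte blocks
TYPEFLAG_OFFSET = 156   # 0x9C, the single typeflag byte in the header


def first_typeflag(path):
    """Return the typeflag byte of the first header block of a tar file."""
    with open(path, "rb") as f:
        block = f.read(BLOCK_SIZE)
    if len(block) < BLOCK_SIZE:
        raise ValueError("file is shorter than one tar header block")
    return block[TYPEFLAG_OFFSET:TYPEFLAG_OFFSET + 1]


def describe(flag):
    """Map a few typeflag values to a human-readable label."""
    labels = {
        b"S": "old GNU sparse (GNUTYPE_SPARSE)",
        b"x": "pax extended header",
        b"g": "pax global extended header",
        b"0": "regular file",
        b"\0": "regular file (old-style NUL typeflag)",
    }
    # Anything not listed above is reported as-is rather than guessed at.
    return labels.get(flag, "other typeflag %r" % flag)


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print("%s: %s" % (name, describe(first_typeflag(name))))
```

Saved under a name of your choosing and run against the foo.tar from above, I would expect it to report the old GNU sparse type, matching the 'S' in the xxd output.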