Dear tar Community,

We are using **tar** on the High-Performance Computing (HPC) cluster at our research institute, iDiv. The networked file system that serves (scientific) data on our cluster uses a block size of 2 MiB:
```
$ mkdir data
$ dd if=/dev/zero bs=2M count=42 of=data/blob status=none
$ stat -c %o data/blob
2097152
```

**tar** does not use the block size of the file system where the files are located. Instead, for a reason I don't know (feel free to educate me), it uses 10 KiB:

```
$ tar --version | head -1
tar (GNU tar) 1.30
$ strace -T -ttt -ff -o tar-1.30.strace tar cf data.tar data
$ strace-analyzer io tar-1.30.strace.59539 | grep data | column -t
read 84M in 444.041 ms (~ 189M / s) with 8602 ops (~ 10K / op, ~ 10K request size) data/blob
write 84M in 404.483 ms (~ 208M / s) with 8602 ops (~ 10K / op, ~ 10K request size) data.tar.gz
```

If you're interested, you can find strace-analyzer [here](https://github.com/wookietreiber/strace-analyzer). It is, more or less, just computing some statistics over the strace log.

Especially on a networked file system, the comparatively high number of I/O operations caused by that small request size hurts performance. Using the native file system block size would generally yield better performance.

I would like to propose using the native file system block size instead of the currently used 10 KiB. The block size can be queried with the `stat` syscall (the `st_blksize` field), which is the value the `stat` command prints above. If the syscall does not return a usable block size, e.g. because the file system does not support it, the current default of 10 KiB could still be applied as a fallback.

What do you think about an improvement like this? I can offer to try to implement it myself and provide a patch. I'm fairly new to GNU Savannah, so I'm still a bit fuzzy on the preferred way to submit patches to the project (I'm used to the fork plus pull request / merge request model found on GitHub/GitLab).
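To make the proposal a bit more concrete, here is a minimal Python sketch of the selection logic and of the difference in operation count. It is not tar code: the helper names `preferred_block_size` and `ops_needed` are hypothetical, and the only assumption about tar is the 10 KiB request size observed in the trace above.

```python
import os

# Request size tar currently uses, as observed in the strace log (10 KiB).
FALLBACK_BLOCK_SIZE = 10 * 1024


def preferred_block_size(path, fallback=FALLBACK_BLOCK_SIZE):
    """Return the file system's preferred I/O size for `path`.

    Falls back to `fallback` if stat fails or reports no usable value.
    """
    try:
        # st_blksize is missing on some platforms, hence getattr.
        size = getattr(os.stat(path), "st_blksize", 0)
    except OSError:
        return fallback
    return size if size and size > 0 else fallback


def ops_needed(total_bytes, request_size):
    """Number of read (or write) calls needed, rounding up."""
    return (total_bytes + request_size - 1) // request_size


if __name__ == "__main__":
    blob = 42 * 2 * 1024 * 1024  # the 84 MiB test file from above
    print(ops_needed(blob, FALLBACK_BLOCK_SIZE))  # 8602, matches the trace
    print(ops_needed(blob, 2 * 1024 * 1024))      # 42
```

With the 2 MiB block size reported by our file system, the same 84 MiB would be transferred in 42 requests instead of 8602.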
Best Regards

--
Christian Krause

Scientific Computing Administration and Support

-----------------------------------------------------------------------------
Email: christian.kra...@idiv.de
Office: BioCity Leipzig 5e, Room 3.201.3
Phone: +49 341 97 33144
-----------------------------------------------------------------------------
German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig
Deutscher Platz 5e
04103 Leipzig
Germany
-----------------------------------------------------------------------------
iDiv is a research centre of the DFG – Deutsche Forschungsgemeinschaft

iDiv is a central institution of Universität Leipzig within the meaning of § 92 para. 1 SächsHSFG and is operated jointly with Martin-Luther-Universität Halle-Wittenberg and Friedrich-Schiller-Universität Jena, in cooperation with the Helmholtz-Zentrum für Umweltforschung GmbH – UFZ. The participating cooperation partners are the following non-university research institutions: the Helmholtz-Zentrum für Umweltforschung GmbH - UFZ, the Max-Planck-Institut für Biogeochemie (MPI BGC), the Max-Planck-Institut für chemische Ökologie (MPI CE), the Max-Planck-Institut für evolutionäre Anthropologie (MPI EVA), the Leibniz-Institut Deutsche Sammlung von Mikroorganismen und Zellkulturen (DSMZ), the Leibniz-Institut für Pflanzenbiochemie (IPB), the Leibniz-Institut für Pflanzengenetik und Kulturpflanzenforschung (IPK), and the Leibniz-Institut Senckenberg Museum für Naturkunde Görlitz (SMNG).

VAT ID: DE 141510383