On Fri, 25 Oct 2024 at 07:51, Sergey Poznyakoff <g...@gnu.org.ua> wrote:
>
> Hi Matteo,
>
> > I'm developing a feature which aligns the file content to the
> > filesystem block boundary by adding a PAX comment header.
> > This, along with the --offset option, will allow extracting a Debian
> > package (which is an ar archive embedding a tar one)
> > by using the Linux FICLONERANGE ioctl(), and extracting the archive
> > without doing any IO.
>
> The idea is interesting. However, I believe that "without doing any IO"
> is an exaggeration. Indeed, FICLONERANGE allows you to make contents
> of one file descriptor appear under another file descriptor and that
> doesn't require additional I/O. But *reading* from that other descriptor
> would mean I/O activity, as usual. So, it seems that this will have the
> same effect as running
>
>   ar p DEBFILE | tar tfJ -
>
> (modulo compression option). Am I missing something?
>
Hi Sergey,

Since you're interested, here are more details.

During the download phase (the one performed by apt), I pipe the archives
into a tool named `debcow`. This tool adds a member before data.tar.xz to
align the starting offset of the data, decompresses it, and adds comment
records to the PAX headers to align the file data:

    $ ar -tO coreutils_9.1-1_amd64.deb
    debian-binary 0x44
    control.tar.xz 0x84
    data.tar.xz 0x1c3c
    $ debcow <coreutils_9.1-1_amd64.deb >coreutils_9.1-1_amd64_cow.deb
    $ ar -tO coreutils_9.1-1_amd64_cow.deb
    debian-binary 0x44
    control.tar.xz 0x84
    _data-pad 0x1c3c
    data.tar 0x2000

Once data.tar and the files inside it are aligned on block boundaries, I
extract the files with FICLONERANGE instead of the usual read()/write()
loop. The dd invocation only moves standard input forward by 8 KiB, i.e. to
offset 0x2000 where data.tar starts, so tar reads the tar stream directly
from the .deb:

    $ { dd status=none bs=8k skip=1 count=0 && strace -eopenat,ioctl tar -x -C root/ --reflink ; } <coreutils_9.1-1_amd64_cow.deb
    openat(3, "./bin/cat", O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_NONBLOCK|O_CLOEXEC, 0755) = 4
    ioctl(4, BTRFS_IOC_CLONE_RANGE or FICLONERANGE, {src_fd=0, src_offset=4096, src_length=45056, dest_offset=0}) = 0
    openat(3, "./bin/chgrp", O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_NONBLOCK|O_CLOEXEC, 0755) = 4
    ioctl(4, BTRFS_IOC_CLONE_RANGE or FICLONERANGE, {src_fd=0, src_offset=53248, src_length=69632, dest_offset=0}) = 0
    openat(3, "./bin/chmod", O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_NONBLOCK|O_CLOEXEC, 0755) = 4
    ioctl(4, BTRFS_IOC_CLONE_RANGE or FICLONERANGE, {src_fd=0, src_offset=126976, src_length=65536, dest_offset=0}) = 0
    openat(3, "./bin/chown", O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_NONBLOCK|O_CLOEXEC, 0755) = 4
    ioctl(4, BTRFS_IOC_CLONE_RANGE or FICLONERANGE, {src_fd=0, src_offset=196608, src_length=73728, dest_offset=0}) = 0

Yes, some I/O is still needed, but only to read the file entries (the
headers) from the tar archive; the file contents themselves are never read,
they are cloned. So when extracting large files, the improvement can be
substantial.

Regards,
-- 
Matteo Croce

perl -e 'for($t=0;;$t++){print chr($t*($t>>8|$t>>13)&255)}' |aplay
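For readers following along, here is a minimal C sketch of what each openat()/ioctl() pair in the trace above amounts to. It is not GNU tar code and not the `--reflink` patch: `extract_member()` and `copy_range()` are hypothetical helpers, and it assumes the member's data offset in the archive is already known and block-aligned (which is what the PAX padding provides), a 4096-byte filesystem block size, and Linux 4.5 or later headers for `FICLONERANGE`. Rounding the length up to whole blocks and then calling ftruncate() is an assumption about handling sizes that are not block multiples, since FICLONERANGE only accepts an unaligned length when the range ends at the source's EOF. If cloning fails, the sketch falls back to the usual read()/write() loop.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/fs.h>    /* FICLONERANGE, struct file_clone_range */
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fallback: the usual read()/write() loop, copying `len` bytes
 * starting at `off` in `src` to the beginning of `dst`. */
static int copy_range(int src, off_t off, off_t len, int dst)
{
    char buf[65536];
    off_t done = 0;
    while (done < len) {
        off_t left = len - done;
        size_t want = left < (off_t)sizeof buf ? (size_t)left : sizeof buf;
        ssize_t n = pread(src, buf, want, off + done);
        if (n <= 0)
            return -1;                      /* read error or premature EOF */
        for (ssize_t w = 0; w < n; ) {      /* handle short writes */
            ssize_t r = pwrite(dst, buf + w, (size_t)(n - w), done + w);
            if (r < 0)
                return -1;
            w += r;
        }
        done += n;
    }
    return 0;
}

/* Hypothetical helper: materialize one member whose data is `size` bytes at
 * offset `off` of the archive `src` into the empty, writable file `dst`.
 * Assumes `off` is a multiple of `blksz`. */
static int extract_member(int src, off_t off, off_t size, int dst, off_t blksz)
{
    assert(blksz > 0 && off % blksz == 0);
    if (size == 0)
        return 0;  /* nothing to share; src_length 0 would mean "up to EOF" */

    struct file_clone_range fcr = {
        .src_fd = src,
        .src_offset = (uint64_t)off,
        /* Assumption: clone whole blocks, because an unaligned length is
         * only accepted when the range ends at the source's EOF... */
        .src_length = (uint64_t)((size + blksz - 1) / blksz * blksz),
        .dest_offset = 0,
    };
    if (ioctl(dst, FICLONERANGE, &fcr) == 0)
        /* ...then trim the shared tail back to the member's real size. */
        return ftruncate(dst, size);

    /* No reflink support, different filesystems, misalignment, etc.:
     * copy the bytes the old-fashioned way. */
    return copy_range(src, off, size, dst);
}
```

On a filesystem with reflink support (Btrfs, XFS) only the tar headers are read; elsewhere the same call still produces a correct file, just through ordinary I/O.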