Hi!

On Mon, 2016-05-23 at 12:22:47 +0200, Guillem Jover wrote:
> Yesterday I had an inspiration for some crazy proposal related to the
> PAX stuff. :) We could switch from an ar container to an uncompressed
> PAX container, which has no limits. To preserve backward compatibility
> at least when it comes to detecting that this is a .deb format, we could
> use the first PAX header name field to store the ar magic and first ar
> header and contents. Because the PAX header's filename is supposed to
> be ignored anyway for archivers supporting the PAX format, but that might
> be used as the extracted filename for ones that do not.
>
> The nice thing is that the ustar header has a name field which is 100
> chars long, and the ar magic + entire header is 68 chars long, which
> both start at offset 0, and both are ASCII. Of course this might actually
> confuse file detectors and archiving tools quite a bit, but seems like an
> interesting hack. :)
>
> (This reminded me of the multiple executable formats in the same file
> hacks. :)

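As a quick sanity check of the sizes quoted above, here is a minimal
sketch that builds the ar magic plus the first member header with
printf(1) and counts the bytes. The field widths (16, 12, 6, 6, 8 and 10
columns, then the two-byte terminator) are the common ar(5) layout, the
timestamp is only an example value, and the variable names are made up
for illustration:

```shell
# Build the ar global magic ("!<arch>\n", 8 bytes) followed by one member
# header (60 bytes) for debian-binary, then count the resulting bytes.
# Field widths follow the common ar(5) layout; the values are examples.
ar_prefix_len=$(
    printf '!<arch>\n%-16s%-12s%-6s%-6s%-8s%-10s`\n' \
        debian-binary 1444848908 0 0 100644 4 | wc -c
)
# Arithmetic expansion drops any padding some wc(1) implementations add.
ar_prefix_len=$((ar_prefix_len))

ustar_name_len=100                       # ustar header name field size
with_contents=$((ar_prefix_len + 4))     # plus the "3.0\n" member data

echo "ar magic + header: $ar_prefix_len bytes"
echo "with debian-binary contents: $with_contents bytes"
echo "room left in the name field: $((ustar_name_len - with_contents)) bytes"
```

So the magic and header take 68 of the 100 bytes, and the 4-byte
debian-binary contents still fit with room to spare.
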
And as a PoC, here's some craziness:

,---
$ mkdir -p meta fsys/dir
$ touch meta/control
$ touch fsys/dir/test
$ tar --create --xz --file meta.tar.xz -C meta .
$ tar --create --xz --file fsys.tar.xz -C fsys .
$ magic="!<arch>
debian-binary   1444848908  0     0     100644  4         \`
3.0
"
$ tar --create --file test-v3.deb --blocking-factor=4 --format=posix \
      --pax-option globexthdr.name="$magic",dpkg.format=3.0 \
      meta.tar.xz fsys.tar.xz
$ file test-v3.deb
test-v3.deb: POSIX tar archive
$ file -e tar test-v3.deb
test-v3.deb: Debian binary package (format 3.0)
$ dpkg-deb -I test-v3.deb
dpkg-deb: error: archive is format version 3.0; get a newer dpkg-deb
`---

So, one problem is that by default file(1) does get confused. Another
problem is that PAX is still (AFAIK) not a very widespread format (?),
although it is pretty standard. :) Concocting the above should also be
possible on conformant systems with a pax(1) utility.

Another issue is that due to the additional padding and extra tar header
entries needed by the PAX format, a fairly empty package becomes way
fatter. For an archive with 2 members we need 1 global extended header,
2 file extended headers, 2 normal headers, plus the data blocks for each
extended header, and the data for each file member rounded up to a
multiple of 512 bytes, plus 2 extra zero-filled blocks at the end; that
makes a minimum of 12 blocks, or 6 KiB. So a minimal PAX archive would
weigh around 6 KiB, while a minimal format 2.0 one weighs less than
200 bytes.

This could be relevant for udebs, but we could make dpkg-deb only use
format 3.0 when needed, or udebs could be forced to use
--deb-format=2.0. In any case this seems insignificant for the overall
archive, let's say with 60k packages / arch * 12 arches * 6 KiB / package
=~ 4.1 GiB of total overhead.

I've updated the <https://wiki.debian.org/Teams/Dpkg/TimeTravelFixes>
page with the new proposals, and will add a link to this post.

Thanks,
Guillem
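
P.S. For anyone wanting to redo the size arithmetic, a minimal sketch in
shell, assuming exactly the block counts listed above and 512-byte tar
blocks. The variable names are only illustrative, and real archives will
differ with the blocking factor and the member sizes:

```shell
block=512                        # tar block size in bytes
members=2                        # meta.tar.xz and fsys.tar.xz

global_hdr=1                     # global extended header
ext_hdrs=$members                # one extended header per member
normal_hdrs=$members             # one ustar header per member
ext_data=$((global_hdr + ext_hdrs))  # >= 1 data block per extended header
member_data=$members             # >= 1 data block per member
eof_blocks=2                     # two zero-filled blocks at the end

blocks=$((global_hdr + ext_hdrs + normal_hdrs + ext_data + member_data + eof_blocks))
min_bytes=$((blocks * block))
echo "minimum PAX archive: $blocks blocks, $min_bytes bytes"

# Archive-wide estimate: 60k packages per arch, 12 arches, 6 KiB each.
total_kib=$((60000 * 12 * 6))
echo "total overhead: $total_kib KiB, about $((total_kib / 1024)) MiB"
```

The per-archive figure is a lower bound only; any member larger than one
block, or a larger blocking factor, pushes it up.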