Salut tout le monde, Some time ago (*cough* 2009), I had a play with working out how to apply pdiffs more efficiently than apt currently does, and implemented a proof of concept in python [0]. There weren't any replies (even a "ooo, cool") when I posted to the deity list, so I left it at that; though trolling google now, I see that it got mentioned on -devel [1] not all that long ago (*cough* 2012), so apparently it did get read after all. Not that it looks like it would've done much good, because it seems like the script I put on the web, and the one I was actually using are completely different to the extent that the one on the web even has syntax errors. WTF?
Anyhoo... Like many, I've been getting annoyed at how tedious pdiff application is over the past year, so over Christmas I thought I'd have a go at patching apt to do things "properly". I've got it to the point where it works now, and apt-get update reports things like: Fetched 199 MB in 50s (3919 kB/s) over a 50kB/s link when updating testing+unstable for i386+amd64 for a day's worth of changes, rather than reporting 7000 B/s over a megabit link. It should also be much more pleasant on SSDs, since it only attempts to write each diff twice (once compressed, once uncompressed), and each packages file once; rather than writing each diff once, and the Packages file once for each diff. My patched apt source is up on github at: https://github.com/ajtowns/apt/tree/better-pdiffs My current theory is that it's best to re-specify the .diff/Index format by adding an additional header Patch-Step-Size: 1 1 4 3 2 1 1 which would indicate that there are six patches available, and that if you start from the oldest known version, you should apply patch 1, then patch 2 (patch 1 + 1), then patch 3 (patch 2 + 1), then patch 7 (patch 3 + 4), and then you're done (patch 7 + 1 == patch 8, which doesn't exist). If you wanted to run an archive where you only had to download one patch and apply it no matter how old your Packages file was, the line would look like: Patch-Step-Size: 12 11 10 9 8 7 6 5 4 3 2 1 (I think a nice efficient compromise method would be Patch-Step-Size: 8 8 4 4 4 4 2 2 1, which would only update a lg(diffs) each time you updated the packages file, and only require users to download and apply a maximum of lg(diffs) to get up to date) If the line's not present, it should be treated as: Patch-Step-Size: 1 1 1 1 1 1 1 ... which would be compatible with the way Debian does things, but presumably not with some derivatives. Anyway, the code works adequately as a proof of concept and I don't think it should need a major re-write, but the things I think still need doing to it are: * make sure main sha256/sha512/etc hashes are being checked after patches are applied * verify that diff sha1 hashes are being checked on download * don't keep diffs around after they've been applied * add support for Patch-Step-Size as above * more robust parsing of diff files in rred method * review code paths in acquire-item to make sure they're sane * fix up debug message for rred method and diff acquire-item classes * consider just reading the gzipped diffs directly rather than uncompressing them * check that the http downloads are being done as efficiently as possible (?) * consider leaving the diffs in partial/ * allow resuming downloading diffs, rather than always redownloading them from scratch * add an option to only allow diffs up to a fixed total size for embedded systems (since the diffs are loaded into memory as a whole before being applied) * add/update test suite? * fix up any code style issues * fix up rred method so it can be used directly from the command line to merge and apply ed-diffs * work out where ed diffs get generated these days, and provide patch to generate Patch-Step-Size header, and optionally use rred method to update patches to make it a bit easier to download stuff If anyone wants to poke around, please do! Cheers, aj [0] https://lists.debian.org/deity/2009/08/msg00169.html [1] https://lists.debian.org/debian-devel/2012/02/msg00411.html -- Anthony Towns <a...@erisian.com.au> -- To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org Archive: http://lists.debian.org/cajs_lcvjtdczrc7nneappeffv0kjwls3iq9s+xe9x1__dgm...@mail.gmail.com