Hi,

by quite a coincidence, while you people were discussing this idea, I was already implementing it, in a package called 'debdelta': see
http://lists.debian.org/debian-devel/2006/05/msg03120.html

Moreover, by some telepathy :-), I had already included features you were proposing, and addressed problems you were discussing (and other problems you were not discussing, since you did not try implementing it :-) ). Here are the replies.

To Curt Manucredo: while the implementation is not exactly what you suggested in your original email, it still achieves all the desired goals; moreover, it is alive and kicking.

'debdelta' differs from your proposal in these respects:
 - it does not use dpkg-repack (for many good reasons, see below);
 - it recreates the new .deb, and guarantees that it is identical to the one in the archive, so archive signatures can be verified; currently it does not patch directly into the filesystem (although this would be an easy adaptation, if anybody wishes for it).

'debdelta' conforms to your requests, in that
 - it can recreate the new .deb using either the installed version of the old .deb, or the old .deb file itself.

On the bright side, everything is already working, there is already a repository of patches available, and a method of downloading them.

To Tyler MacDonald:
 - 'debdelta' uses 'bsdiff', or 'xdelta' as a fallback, see below;
 - regarding this:

> Some work will have to go into the math to determine when it's
> actually more efficient to download the latest archive, etc.... just a
> fleeting mental note, the threshold should not be 100% of the full archives
> size, it should be 90 or 80% due to the CPU/RAM overhead of patching and the
> bandwidth/latency overhead of requesting multiple patch files vs. one
> stream of data.

This math must go on the client side, and it is in my TODO list (see the end of the README); it is a nice exercise in dynamic programming. Anyway, currently the archive discards deltas that exceed ~50% of the size of the new .deb, just as a heuristic, and to keep disk usage low.

To Goswin von Brederlow:

>| bsdiff is quite memory-hungry. It requires max(17*n,9*n+m)+O(1)

Ah, so this is the correct formula! The man page just says '17*n'.
But in my stats that is not the case: they estimate the memory usage at '12*n', so that is what I use.

>| bytes of memory, where n is the size of the old file and m is the
>| size of the new file. bspatch requires n+m+O(1) bytes.

> That is quite unacceptable. We have debs in debian up to 160Mb

'debdelta' has an option '-M' to choose between 'xdelta' and 'bsdiff': by default, it uses 'xdelta' when the memory usage of 'bsdiff' would exceed 50MB; but on the server I set '-M 200', since I have 1GB of RAM there.

> Seems to be quite useless for patching full debs. One would have to
> limit it to a file-by-file approach.

This is in my TODO list. Actually, I have in mind a scheme to break TARs at suitable points; I have to check whether it is worthwhile. I can discuss the details.

To Tyler MacDonald again:

> True.. It'd probably only be efficient if the deltas were based on
> the contents of the .deb's before they're packed.

... and this is the reason why I do not use dpkg-repack: why repack the data, when I need it unpacked? :-)

Absolutely true. Look at this:

 $ ls -s tetex-doc_3.0-17_all.deb tetex-doc_3.0-18_all.deb
 42388 tetex-doc_3.0-18_all.deb
 42340 tetex-doc_3.0-17_all.deb
 $ bsdiff tetex-doc_3.0-17_all.deb tetex-doc_3.0-18_all.deb brutal.bsdiff
 $ ls -s brutal.bsdiff
 10092 brutal.bsdiff

Hat tip to 'bsdiff', but we can do better...

 $ ar p tetex-doc_3.0-17_all.deb data.tar.gz | zcat > /tmp/17.tar
 $ ar p tetex-doc_3.0-18_all.deb data.tar.gz | zcat > /tmp/18.tar
 $ ls -s /tmp/17.tar /tmp/18.tar
 53532 /tmp/17.tar
 53580 /tmp/18.tar
 $ time bsdiff /tmp/17.tar /tmp/18.tar /tmp/tar.bsdiff
times:
 real 2m4.994s
 user 2m3.947s
memory:
  PID USER   PR NI VIRT RES  SHR  S %CPU %MEM TIME+   COMMAND
 9784 debdev 25  0 471m 470m 1384 T  0.0 46.5 1:18.82 bsdiff
size:
 92 /tmp/tar.bsdiff

So, as you see, the reduction in size is impressive, but it uses too much memory and takes too much time.

 $ time xdelta delta -m 50M -9 /tmp/17.tar /tmp/18.tar /tmp/tar.xdelta
times:
 real 0m1.728s
 user 0m1.660s
memory...
 it is too fast to measure
size:
 236 /tmp/tar.xdelta

Still good enough for our goal.

----

Compare with the above:

 $ ls -s pool/main/t/tetex-base/tetex-doc_3.0-17_3.0-18_all.debdelta
 288 pool/main/t/tetex-base/tetex-doc_3.0-17_3.0-18_all.debdelta

(The extra 35kB are the script that 'debpatch' uses :-( Actually, I told 'debdelta' to use 'bzip' instead of gzip in these cases, but it did not... I just found another bug :-) )

To Marc 'HE' Brockschmidt:

> Now the interesting questions: How many diffs do you keep?

Very few, currently, due to space constraints. Moreover, suppose that you have a_1.deb installed, a_1_2.debdelta and a_2_3.debdelta are in the pool of deltas, and you want to upgrade to a_3.deb. This would work if done by hand, just doing

 $ debpatch a_1_2.debdelta / /tmp/a_2.deb
 $ debpatch a_2_3.debdelta /tmp/a_2.deb /tmp/a_3.deb

but 'debdelta-upgrade' is currently unable to exploit this situation; so I keep only one delta for each deb.

> How do you
> integrate this approach with the minimal security Release files give us
> today?

Recreated debs are identical to the originals in the archive. Currently the best way to use my package is:

 $ apt-get update
 $ su nobody -c debdelta-upgrade
 $ mv /tmp/archives/*deb /var/cache/apt/archives
 $ apt-get upgrade

(By default, debdelta-upgrade puts the resulting .debs in /tmp/archives; use --dir to suit your taste, though.) As you see, I propose to run debdelta-upgrade not as root, since it is still in development.

> What about the kind of signatures dpkg-sig provides?

Those are supported. 'debdelta' reproduces everything it sees in the .deb file, treating it as an 'ar' archive (although it is not exactly an 'ar' archive, since 'ar' adds a '/' in the header and 'dpkg' does not); it just treats control.tar.gz and data.tar.gz in a smarter way.

-----
Other FAQs I made up for you:

Q: What about .debs where the data part is compressed with bzip?
A: Currently it is unsupported (I never found one :-), but I did write some code to support it.
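
To make the 'ar' view of a .deb and the bzip question a bit more concrete, here is a minimal Python sketch. It is NOT code from 'debdelta'; the helper names (list_ar_members, data_compression, make_member) are hypothetical. It assumes only the common 'ar' layout: the 8-byte magic '!<arch>\n', then for each member a 60-byte header (16-byte space-padded name, 12-byte mtime, 6-byte uid, 6-byte gid, 8-byte octal mode, 10-byte decimal size, 2-byte '`\n' terminator), followed by the data padded to an even length. Since GNU 'ar' ends names with '/' and a dpkg-built .deb may not, both spellings are accepted. Compression of the data member is guessed from its magic bytes (gzip: 1f 8b, bzip2: 'BZh').

```python
import bz2
import gzip

AR_MAGIC = b"!<arch>\n"   # global header of every ar archive
HEADER_LEN = 60           # fixed size of each member header


def list_ar_members(blob):
    """Yield (name, data) for each member of an in-memory ar archive.

    Hypothetical helper for illustration; not taken from debdelta.
    """
    if not blob.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    pos = len(AR_MAGIC)
    while pos < len(blob):
        header = blob[pos:pos + HEADER_LEN]
        if len(header) < HEADER_LEN or header[58:60] != b"`\n":
            raise ValueError("malformed ar header at offset %d" % pos)
        # Name field: 16 bytes, space-padded. GNU ar ends it with '/',
        # dpkg may not, so accept both spellings.
        name = header[0:16].decode("ascii").rstrip(" ").rstrip("/")
        # Size field: 10 bytes of decimal ASCII at offset 48.
        size = int(header[48:58].decode("ascii").strip())
        start = pos + HEADER_LEN
        yield name, blob[start:start + size]
        # Member data is padded with one '\n' to an even offset.
        pos = start + size + (size % 2)


def data_compression(deb_bytes):
    """Guess how the data.tar.* member is compressed, from its magic bytes."""
    for name, data in list_ar_members(deb_bytes):
        if name.startswith("data.tar"):
            if data[:2] == b"\x1f\x8b":
                return "gzip"
            if data[:3] == b"BZh":
                return "bzip2"
            return "unknown"
    raise ValueError("no data.tar member found")


def make_member(name, data):
    """Build one ar member (header + data + padding); hypothetical test helper."""
    fields = (name.ljust(16) + "0".ljust(12) + "0".ljust(6) + "0".ljust(6)
              + "100644".ljust(8) + str(len(data)).ljust(10))
    pad = b"\n" if len(data) % 2 else b""
    return fields.encode("ascii") + b"`\n" + data + pad


# A toy "deb": one name in GNU style (trailing '/'), the others without.
deb = (AR_MAGIC
       + make_member("debian-binary", b"2.0\n")
       + make_member("control.tar.gz/", gzip.compress(b"control"))
       + make_member("data.tar.bz2", bz2.compress(b"data")))

print([name for name, _ in list_ar_members(deb)])
# -> ['debian-binary', 'control.tar.gz', 'data.tar.bz2']
print(data_compression(deb))
# -> bzip2
```

Working at the 'ar' member level like this leaves every byte outside the members of interest untouched, which is the property that lets a recreated .deb match the archive copy exactly.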

Q: Can 'debpatch' recreate the new .deb using the installed old .deb, even when
 - there are dpkg diversions?
 - conffiles were modified?
A: Yes, yes.

Q: Can 'debpatch' recreate the new .deb using the installed old .deb, when 'prelink' is used on the host?
A: Currently, no.

a.

--
Andrea Mennucc