On Mon, Jul 14, 2014 at 06:25:47PM +0200, Jakub Wilk wrote:
> Description-md5 794.3 KiB 11.9%

Needed to provide a mapping, as versions change a lot more often than descriptions do; also, historically, Translation-* files were outside of the control of ftpmasters (at least, that is what history digging told me). It is also relatively new in the Packages file, which leads me to:

With a slight change in semantics we could drop the field from the Packages file again anyhow: At the moment it is the MD5sum of the long description. If it isn't present, clients are expected to calculate it themselves (well, this was required to work with Translation-* before we moved long descriptions to Translation-en, so very new clients might not know about that). So if we change this to the MD5sum of whatever is in the Description field (short or long), we could drop it from the Packages file and clients would again calculate it themselves to look stuff up in the Translation files (where this field came from).

I haven't tested, but that should work without any change in apt (okay, apt-ftparchive needs to be patched), so the first stop for someone wanting to drive this is probably dak - takers? Other servers and maybe clients need to be adapted, but that could be done rather uncoordinated, as there is usually just one server creating both Packages and Translation-* files, so it will have the same semantic interpretation, and clients either take what they get or already implicitly have the "whatever is in the field" semantics.

(sidenote: see my other mail for the non-existent security implications of using md5 here if you care)

> Description 463.4 KiB 7.0%

ftpmasters actually wanted to drop that in their final implementation of the long description splitout. We got the short description back, as it wasn't part of the initial plan and clients didn't like that (= apt-cache search would segfault, for example); besides that, I prefer to have at least a short description around in any case.
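
To make the "whatever is in the field" idea a bit more concrete, here is a minimal Python sketch of a client computing the lookup key itself. The function name `description_md5`, the sample package text, and the choice to hash the field value plus a trailing newline are assumptions for illustration only; the exact bytes that apt and dak feed into the hash would need to be checked against their code.

```python
import hashlib


def description_md5(field_value: str) -> str:
    """Hash whatever the Description field contains (short or long).

    Assumption: the hash covers the raw field value followed by one
    newline. This is an illustrative choice, not a verified spec.
    """
    data = (field_value + "\n").encode("utf-8")
    return hashlib.md5(data).hexdigest()


# Hypothetical field values, as a client might read them from Packages.
short_only = "example tool for demonstration"
short_and_long = (
    "example tool for demonstration\n"
    " This paragraph stands in for the long description\n"
    " that nowadays lives in Translation-en."
)

# The same rule applies to both cases; only the input differs.
key_short = description_md5(short_only)
key_full = description_md5(short_and_long)
print(key_short)
print(key_full)
```

The point is only that server and client need to apply the same rule to the same field content; which content that is (short only, or short plus long) is exactly the semantic change discussed above.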
I think if we drop one of them, it should be the -md5 field, as it isn't as compressible as human-readable text… (not to mention quite useless for a human).

> SHA256 1463.8 KiB 22.0%
> SHA1 938.9 KiB 14.1%
> MD5sum 752.4 KiB 11.3%

I *guess* the most painless drop would be SHA1. Entirely dropping it from the archive means changing the pdiff infrastructure though. Someone ought to check that claim…

Dropping MD5 will break some scripts parsing apt output. I personally hate breaking users, so any takers to check/fix that at least Debian tools do not break? Entirely dropping it would be easy after this is done (modulo Description-md5 of course, but see there).

Adding/changing to SHA512 in the indexes is probably close to useless; in the Release file the benefit is probably not worthwhile, but it is there if the need arises. I have some hope that with apt/experimental we will be able to add new hashsums with less pain (aka: no ABI break), too, but that's just a sidenote.

> [other fields - present hopefully only for comparison purposes]

For the rest it is hopefully clear why we can't drop them, even though I kinda like the idea of dropping dependencies… would make installing stuff so much simpler… ;)

> Format changes ala base-whatever, \0, …

Changing the format is _*EXTREMELY*_ painful. It is also nice to have a text file you can work with easily… If you want to improve on this, the improvement should be factored into a compression algorithm so that not every parser in the universe needs to be rewritten… (one of apt's test cases uses 'rev' as a "compression" algorithm. You just need to set some options, advertise the availability in the Release file and you are good to go…)

Best regards

David Kalnischkies