(picked up from http://www.debianplanet.org/article.php?sid=633)
The poor scalability of the Packages file is a recognised problem that has been discussed many times on this list. I think the following idea could go a long way towards solving it.

The current method of checking for updates is to retrieve a new Packages.gz file and discard the old one. The problem with this method is that commonly less than 1% of the Packages.gz file has changed.

A number of solutions have been proposed to overcome this problem. These include:
- Compressing the Packages.gz in an rsync-friendly manner.
- Making diffs from older Packages files available.
- Splitting the Packages.gz into multiple files.
- Reducing the size of the metadata for each package.

Each of the above ideas has its own problems, which have been discussed on this list.

An idea that I haven't heard mentioned here is to create a client/server application specifically for handling our metadata. Clients query the server, and the server sends only the metadata they need.

Checking for updates could go something like this:
1) Query the server for all the package names and versions in woody.
2) Compare the results to your previous metadata to determine which packages have changed.
3) Query the server for the metadata of the changed packages.
4) Reconstruct the Packages file with the new metadata.

Advantages
- Compatible with existing packaging tools; it can complement rather than replace the existing method.
- Reduces bandwidth to a minimum.
- Flexible; different queries could be implemented to handle other unforeseen situations.
- other ?

Disadvantages
- Implementation may take a bit of work.
- Requires a new server to be run rather than using standard file transport tools.
- other ?

One idea is to do it with LDAP, but I don't know enough to comment.

Glenn
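
P.S. To make steps 1) to 4) above a bit more concrete, here is a minimal Python sketch of the client side. Nothing in it is an existing tool or protocol: the function names, the two server callbacks (list_versions and fetch_stanzas) and the dict layout are all assumptions made up for illustration, and a real client would also have to deal with multiple architectures, errors and so on.

```python
def plan_update(local, remote):
    """Compare name->version maps; return (names to fetch, names to drop)."""
    # A package needs fetching if it is new or its version string differs.
    to_fetch = sorted(n for n, v in remote.items() if local.get(n) != v)
    # A package is dropped if the server no longer lists it.
    to_drop = sorted(n for n in local if n not in remote)
    return to_fetch, to_drop


def render_packages(stanzas):
    """Rebuild Packages-style text: one stanza per package, blank-line separated."""
    return "\n".join(
        "".join(f"{field}: {value}\n" for field, value in stanzas[name].items())
        for name in sorted(stanzas)
    )


def check_for_updates(cached, list_versions, fetch_stanzas):
    """Run the four steps against a hypothetical metadata server.

    cached        -- dict: package name -> stanza dict (must contain 'Version')
    list_versions -- hypothetical query returning {name: version} for the release
    fetch_stanzas -- hypothetical query returning {name: stanza} for given names
    """
    remote = list_versions()                                   # step 1
    local = {name: s["Version"] for name, s in cached.items()}
    to_fetch, to_drop = plan_update(local, remote)             # step 2
    fresh = fetch_stanzas(to_fetch) if to_fetch else {}        # step 3
    merged = {n: s for n, s in cached.items() if n not in to_drop}
    merged.update(fresh)
    return merged, render_packages(merged)                     # step 4
```

The point of the sketch is that only the name/version list and the stanzas of changed packages cross the network; everything else is reused from the previous Packages file. Comparing version strings for inequality (rather than ordering them) is a deliberate simplification, since the client only needs to know that something changed.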