Hi!

Now the key, as I see it, is that unlike all the other use cases where rsync is used, large mirrors are likely to have their directories transferred directly from another mirror. So the client that pulled the tree update down could store a list of changed files, and the server could then use that list to determine which files need to be synced to the downstream mirror. (Sure, the origin site has to generate the list in the first place, but if the files are uploaded through a tool like PAUSE, that shouldn't be hard to do.)

Agreed, but I'm not sure we've gotten past the stat storm on the server,
though.

Ok, this might be a completely wacky idea, but couldn't we use some kind of version control system?

Before you kick my backside, hear me out: this is of course very theoretical at the moment, and there are probably quite a number of pitfalls and kinks to work out...

Currently, there's CPAN and Backpan, with Backpan playing the archive.

Suppose, just suppose, we see that as a kind of old-style, simplistic version control system: CPAN is a checkout of the latest version of all files, and Backpan holds the older versions.

Now, suppose we were to put all files into Mercurial, Git or the like, renaming the files so they don't have version numbers in their names, and storing them sequentially as commits so that new versions update old ones.

Now, a new mirror would (once) ask for the latest version without the history of all the files, meaning it has to make a complete "checkout" of the latest version. No way around that, really. Call that version FOO.
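Git can already do this initial step: a shallow clone transfers only the latest snapshot, not the history. A toy sketch with a local stand-in repository (all names and contents are made up):

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"

# Toy stand-in for the master archive: three sequential "releases"
# of one module, committed under a version-free filename.
git init -q archive && cd archive
git config user.email demo@example.org && git config user.name Demo
for v in 1 2 3; do
  echo "release $v" > Some-Module.tar.gz
  git add Some-Module.tar.gz && git commit -qm "release $v"
done
cd ..

# A new mirror "checks out" only the latest state (--depth 1): this is
# the one full copy it cannot avoid, but without the whole history.
git clone -q --depth 1 "file://$tmp/archive" mirror

git -C mirror rev-list --count HEAD   # only 1 commit was transferred, not 3
cat mirror/Some-Module.tar.gz         # yet the files are the latest release
```

The commit id that `HEAD` points at in the mirror is exactly the "version FOO" marker the scheme needs.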

But suppose 100 modules get updated on the main server, so the server stores 100 changesets, which in many version control systems are stored sequentially in a single file. Call that version BAR.

Now the mirror wants to update again, calls the server and says, "I have version FOO, give me all updates". So the server looks up version FOO in the file (via some shorter index list), opens the main file, seeks to the indicated position and basically dumps the rest of the file over the network to the mirror. The mirror then applies the changesets, taking each chunk as a patch and applying it to the corresponding file(s).
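This incremental step is essentially what a git fetch/pull does today: the mirror announces what it has, and only the missing changesets cross the wire and get applied to the working files. A self-contained sketch (two commits stand in for the 100 updated modules; names are illustrative):

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"

# Toy archive with one release; the mirror clones it, so it holds "FOO".
git init -q archive && cd archive
git config user.email demo@example.org && git config user.name Demo
echo "release 1" > Some-Module.tar.gz
git add . && git commit -qm "FOO: initial release"
cd ..
git clone -q "file://$tmp/archive" mirror

# Modules get updated upstream; two commits stand in for the 100.
cd archive
for v in 2 3; do
  echo "release $v" > Some-Module.tar.gz
  git add . && git commit -qm "release $v"
done
cd ..

# The mirror says "I have FOO, give me everything newer": git transfers
# only the missing changesets and fast-forwards the working files.
git -C mirror pull -q --ff-only

git -C mirror rev-list --count HEAD   # now 3 commits
cat mirror/Some-Module.tar.gz         # updated to the latest release
```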

For fast mirroring and legacy clients, the main server would still keep a full directory checkout, allowing the old-style sync. Compressed, slurpable tarballs could also be autogenerated, say once a month.

This could also solve some long-standing problems, like having modules available for legacy production environments. A user would still be able to check out a specific version of CPAN depending on his/her needs, like "give me CPAN as it was on 23rd December 2007".
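With the full history available, a date-based checkout is a two-liner in git: find the last commit before the date, then check it out. A sketch against a toy repository with back-dated commits (dates and filenames are fabricated for the demo):

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"

# Toy archive with back-dated commits, so we can ask for a past state.
git init -q archive && cd archive
git config user.email demo@example.org && git config user.name Demo
for d in 2007-01-01 2007-06-01 2008-01-01; do
  echo "state as of $d" > Some-Module.tar.gz
  git add . && GIT_AUTHOR_DATE="$d 12:00" GIT_COMMITTER_DATE="$d 12:00" \
    git commit -qm "release of $d"
done

# "Give me CPAN as it was on 23 December 2007": take the newest commit
# before that date and materialise its tree.
rev=$(git rev-list -n 1 --before="2007-12-23" HEAD)
git checkout -q "$rev"

cat Some-Module.tar.gz   # the mid-2007 state, not the 2008 one
```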


This could work like any modern, distributed version control system. That way, the user would also be able to apply local patches and/or decide which changesets to pull in from the main server. Or keep a complete local mirror alongside one for the production systems, into which he/she pulls changes only after they have been reviewed.
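One possible shape of that "local patches plus reviewed pulls" workflow, sketched with git (branch and file names are illustrative): the site keeps its patches on a branch, inspects what is incoming, and re-applies its patches on top.

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"

# Upstream archive with one release.
git init -q archive
git -C archive config user.email demo@example.org
git -C archive config user.name Demo
echo "release 1" > archive/Some-Module.tar.gz
git -C archive add . && git -C archive commit -qm "release 1"

# A site clones a mirror and keeps local patches on their own branch.
git clone -q "file://$tmp/archive" mirror
git -C mirror config user.email demo@example.org
git -C mirror config user.name Demo
git -C mirror checkout -q -b local-patches
echo "site-specific fix" > mirror/site.patch
git -C mirror add . && git -C mirror commit -qm "local patch"

# Upstream publishes release 2 in the meantime.
echo "release 2" > archive/Some-Module.tar.gz
git -C archive commit -qam "release 2"

# The site fetches, reviews the incoming changesets, and only then
# re-applies its local patch on top of the reviewed upstream state.
git -C mirror fetch -q origin
git -C mirror log --oneline HEAD..origin/HEAD   # the review step
git -C mirror rebase -q origin/HEAD
```

After the rebase the mirror carries the new upstream release plus the local patch, which is exactly the "pull in changes after they have been reviewed" setup.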


NOW it's time to kick my butt, if you want to.

Best regards,
Rene
