>>>>> "PB" == Petr Baudis <[EMAIL PROTECTED]> writes:

PB> It won't happen. Or rather, I hope the HTTP pulls become more efficient
PB> soon. Actually, perhaps Linus has something done already, my workstation
PB> is a bit derailed now so I couldn't pull from him in the last few days
PB> (hopefully will sort that out today).

PB> Hmm, yes, I guess Linus won't be touching the HTTP backend at all. ;-) I
PB> suggest you to check the last development in Linus' branch and sync with
PB> Daniel Barkalow, who promised improving the pull tools as well.

If this weekend is not too late, I have been brewing what is called
an "efficient pull from dumb servers" suite, which would hopefully
fill this gap.  I am still in the process of finishing the details,
but basically it already seems to work.

Linus, please drop the patch I sent you earlier (privately, by
mistake not CCing the list), which implemented only the server end.
I've already changed some file formats from that one.

The outline of how it works is like this.

 * I assume a dumb transport (read: static-files-only HTTP server)
   and no on-request server side processing.  All the smarts must
   go in the client.  The server side X.git is an ordinary GIT
   archive (no need for files in the work tree), plus:

   - X.git/objects/pack can have packed GIT archives.  I envision
     that this will be a series of 5 to 20 MB packs, occasionally
     adding a new incremental pack when X.git/objects/??/
     directories accumulate enough standalone SHA1 files.  It is
     not necessary to have X.git/objects/??/ files if an object is
     contained in one of the packs.

   - X.git/info/ has three extra files.

     - "inventory" lists all the branches stored in X.git/refs and
       looks like this (contents and path):

       ff83c8f3554ceb444b413beaeb49b4a781dae944 snap/0
       013e7c7ff498aae82d799f80da37fbd395545456 snap/10
       ff83c8f3554ceb444b413beaeb49b4a781dae944 heads/master
       dd7ba8b4949535c24e604a37709db0e3be9ccbbc heads/linus

       This is to facilitate discovery from a transport that is
       not so "ls" friendly, like HTTP.

     - "pack" lists available packs under X.git/objects/pack and
       looks like this (size and name):

       432495 pk-65fe69e9bc2e8a3e0881e008dde182522156ba7c.pack

       The file is there for discovery.  The size is used by the
       client to discover the optimum set of packs to slurp.

     - "rev-cache" is a binary file that describes commit ancestry
       information in a dense format.  It lists all commits
       available from this repository, along with the parents of
       each commit.  This file is produced append-only, so that
       the server side can use an rsync based mirroring scheme.

   A new command "git-update-dumb-server" is used to prepare these
   three files.  There may need to be a helper script that uses
   git-pack-objects and friends to prepare packs partitioned to
   allow pulling a popular branch efficiently.

 * The client side is called "git-dumb-pull-script".  This
   downloads the above three files, and the .idx files associated
   with the packs described in "pack".  With the information in
   "inventory" about the desired branch to pull from, along with
   the "rev-cache" ancestry information, it discovers the set of
   commits that is lacking from its local store.  By comparing
   that list with the downloaded .idx files, along with the size
   information for each pack, it comes up with a list of packs to
   download to cover the most commits that it wants to obtain,
   and downloads them, verifies them and stores them in its
   .git/objects/pack/ directory.

   The above process of downloading packs would typically not
   cover all the things lacking, because some new commits may not
   be in any of the packs.  After this point, the usual
   commit-walking git-http-pull can be used to fill the rest, and
   it does not have to pull that many objects.  Dan's http-pull
   parallelism improvement would be very useful independently
   here.
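
To make the client-side steps a bit more concrete, here is a rough Python sketch of the discovery logic. It is only an illustration, not the actual git-dumb-pull-script. It assumes "rev-cache" has already been decoded into a dict mapping each commit to its list of parents (the binary format is not described here), and that each .idx has been read into a set of object names. The function names and the greedy "most wanted commits per byte" rule are my own assumptions about what an "optimum set" could mean.

```python
def parse_inventory(text):
    """Map ref path -> SHA1 from 'inventory' lines of the form '<sha1> <path>'."""
    refs = {}
    for line in text.splitlines():
        if line.strip():
            sha, path = line.split(None, 1)
            refs[path.strip()] = sha
    return refs


def parse_pack_list(text):
    """Map pack name -> size in bytes from 'pack' lines of the form '<size> <name>'."""
    sizes = {}
    for line in text.splitlines():
        if line.strip():
            size, name = line.split(None, 1)
            sizes[name.strip()] = int(size)
    return sizes


def missing_commits(tip, parents, have):
    """Walk ancestry from tip, stopping at commits already in the local store.

    parents: hypothetical decoded form of rev-cache, {commit: [parent, ...]}.
    have: set of commits present locally.
    """
    missing, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit in have or commit in missing:
            continue
        missing.add(commit)
        stack.extend(parents.get(commit, ()))
    return missing


def choose_packs(wanted, pack_sizes, pack_objects):
    """Greedy heuristic (an assumption, not necessarily optimal): repeatedly
    take the pack that covers the most still-wanted commits per byte.

    Returns (chosen pack names, commits no pack covers). The leftover
    commits are the ones a commit-walking pull would fetch afterwards.
    """
    remaining = set(wanted)
    chosen = []
    while remaining:
        candidates = [n for n in pack_sizes if n not in chosen]
        if not candidates:
            break
        best = max(
            candidates,
            key=lambda n: len(remaining & pack_objects[n]) / max(pack_sizes[n], 1),
        )
        if not remaining & pack_objects[best]:
            break  # no remaining pack helps any more
        chosen.append(best)
        remaining -= pack_objects[best]
    return chosen, remaining
```

A caller would look up the desired branch in the parse_inventory() result, pass that tip to missing_commits(), and hand the resulting set to choose_packs(); whatever comes back as uncovered is left for git-http-pull.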