>>>>> "PB" == Petr Baudis <[EMAIL PROTECTED]> writes:

PB> It won't happen. Or rather, I hope the HTTP pulls become more efficient
PB> soon. Actually, perhaps Linus has something done already, my workstation
PB> is a bit derailed now so I couldn't pull from him in the last few days
PB> (hopefully will sort that out today).

PB> Hmm, yes, I guess Linus won't be touching the HTTP backend at all. ;-) I
PB> suggest you to check the last development in Linus' branch and sync with
PB> Daniel Barkalow, who promised improving the pull tools as well.

If this weekend is not too late, I have been brewing what is
called an "efficient pull from dumb servers" suite, which would
hopefully fill this gap.  I am still in the process of finishing
the details, but basically it already seems to work.

Linus, please drop the patch I sent you earlier, privately by
mistake not CCing the list, that implemented only the server
end.  I've changed some file formats already from that one.

The outline of how it works is like this.

 * I assume a dumb transport (read: static files only HTTP
   server) and no on-request server side processing.  All the
   smarts must go in the client.  The server side X.git being an
   ordinary GIT archive (no need for files in the work tree),
   plus:

   - X.git/objects/pack can have packed GIT archives.  I
     envision that this will be a series of 5 to 20 MB packs,
     occasionally adding a new incremental pack when
     X.git/objects/??/ directories accumulate enough standalone
     SHA1 files.  It is not necessary to have X.git/objects/??/
     files if an object is contained in one of the packs.

   - X.git/info/ has three extra files.

     - "inventory" lists all the branches stored in X.git/refs
       and looks like this (contents and path):

          ff83c8f3554ceb444b413beaeb49b4a781dae944 snap/0
          013e7c7ff498aae82d799f80da37fbd395545456 snap/10
          ff83c8f3554ceb444b413beaeb49b4a781dae944 heads/master
          dd7ba8b4949535c24e604a37709db0e3be9ccbbc heads/linus

       This is to facilitate discovery from a transport that is
       not so "ls" friendly, like HTTP.

     - "pack" lists available packs under X.git/objects/pack and
       looks like this (size and name):

          432495 pk-65fe69e9bc2e8a3e0881e008dde182522156ba7c.pack

       The file is there for discovery.  The size is used by the
       client to discover optimum set of packs to slurp.

     - "rev-cache" is a binary file that describes commit
       ancestry information in a dense format.  It lists all
       commits available from this repository along with who
       its parents are for each of the commit.  This file is
       produced append-only, so that the server side can use
       rsync based mirroring scheme.

   A new command "git-update-dumb-server" is used to prepare
   these three files.  There may need a helper script that uses
   git-pack-objects and friends to prepare packs partitioned to
   allow pulling a popular branch efficiently.

 * The client side is called "git-dumb-pull-script".  This
   downloads the above three files, and .idx files associated
   with packs described in "pack".  With the information in
   "inventory" about desired branch to pull from along with
   "rev-cache" ancestry information, it discovers the set of
   commits that is lacking from its local store.  By comparing
   that list with downloaded .idx files, along with size
   information for each pack, it comes up a list of packs to
   download to cover the most commits that it wants to obtain,
   and downloads them, verifies them and stores them in its
   .git/objects/pack/ directory.

   The above process of downloading packs would typically not
   cover all the things lacking, because some new commits may
   not be in any of the packs.  After this point, the usual
   commit-walking git-http-pull can be used to fill the rest,
   and it does not have to pull that many objects.  Dan's
   http-pull parallelism improvement would be very useful
   independently here.

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to