While things are quiet (I envy everybody having fun at OLS),
I've been cooking up something to help clients pull from dumb
servers.

I assume that:

 - The object database is packed, following the recommendations
   in the "Working with Others" section of the tutorial.

 - The repository owner _may_ further create throw-away
   incremental packs.  There can be the following in one object
   database:

     - one baseline pack.
     - permanent incremental packs #1 .. #N
     - one throw-away incremental pack.
     - unpacked files under objects/??/.

   Baseline and permanent incremental packs are built by "git
   repack", just like Linus recommended from the beginning.  The
   throw-away pack is rebuilt periodically (say every hour) to
   collect all objects that are in neither the baseline nor the
   permanent incrementals.  Building such a throw-away pack
   involves:

     - unpacking and removal of the current throw-away pack.
     - running "git repack".
     - running "git prune-packed".

 - The server can be truly dumb and may even refuse to serve
   directory indexes; parsing an autogenerated index.html is a
   pain anyway.

First, a somewhat related change: I wrote a script called
"git ls-remote".  It is used this way:

    $ git ls-remote origin
    17c0bd743c1c8113cd0ed72b7ca1776d13c27e01    HEAD
    17c0bd743c1c8113cd0ed72b7ca1776d13c27e01    refs/heads/master
    f0b32737ad5a35cc047db47353a75faccfe5939e    refs/heads/linus
    4d9ae497491fd838dafd7fcbd11c4aa678a726f1    refs/heads/pu
    d6602ec5194c87b0fc87103ca4d67251c76f233a    refs/tags/v0.99
    f25a265a342aed6041ab0cc484224d9ca54b6f41    refs/tags/v0.99.1

It slurps the set of refs from a remote repository (the same
short-hand we stole from Cogito using .git/branches/ can be used
here) and optionally it can be told to store tags under local
refs/.

The listing is produced by connecting directly to the git-daemon
running on the remote side and talking the upload-pack protocol
with it.  A new helper program "git-peek-remote" does this when
we use a git:// URL.  For an rsync URL, everything under the
remote refs/ is copied to a temporary directory to produce the
same information.

To support the same on a dumb transport, I gave the server side
a new command, "git update-server-info", which prepares this
information in "$repo/info/refs", so writing http support for
"git ls-remote" using curl is trivial.  I arranged things so
that update-server-info is run whenever you push into the
repository via "git push".  You can of course run it by hand
from the command line.
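
As a rough sketch of how a client might consume such a listing:
the Python below parses lines of the form "<object name> <ref
name>" into a dictionary.  It assumes that "$repo/info/refs"
carries the same two-column lines that "git ls-remote" prints
above; the exact layout of that file is not spelled out here, and
the function name parse_refs is made up for illustration.

```python
def parse_refs(text):
    """Map ref name -> object name from '<sha> <refname>' lines.

    Assumption: one ref per line, object name first, then
    whitespace, then the ref name (as in the ls-remote output).
    """
    refs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        sha, name = line.split(None, 1)
        refs[name] = sha
    return refs


# Sample input copied from the "git ls-remote origin" output above.
sample = """\
17c0bd743c1c8113cd0ed72b7ca1776d13c27e01    HEAD
17c0bd743c1c8113cd0ed72b7ca1776d13c27e01    refs/heads/master
f0b32737ad5a35cc047db47353a75faccfe5939e    refs/heads/linus
d6602ec5194c87b0fc87103ca4d67251c76f233a    refs/tags/v0.99
"""

refs = parse_refs(sample)
tags = {n: s for n, s in refs.items() if n.startswith("refs/tags/")}
```

Fetching the file itself (with curl or otherwise) is left out on
purpose; only the parsing is shown.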

The other file that update-server-info produces helps dumb
pullers.  It is stored in "$repo/objects/info/pack", and looks
like this:

    P pack-c60dc6f7486e34043bd6861d6b2c0d21756dde76.pack
    P pack-e3117bbaf6a59cb53c3f6f0d9b17b9433f0e4135.pack
    D 0 1
    D 1
    T 0 9fb1759a3102c26cd8f64254a7c3e532782c2bb8 commit
    T 0 a339981ec18d304f9efeb9ccf01b1f04302edf32 tag
    T 1 0397236d43e48e821cce5bbe6a80a1a56bb7cc3a tag
    T 1 043d051615aa5da09a7e44f1edbb69798458e067 commit
    T 1 06f6d9e2f140466eeb41e494e14167f90210f89d tag
    T 1 26791a8bcf0e6d33f43aef7682bdb555236d56de tag
    T 1 5dc01c595e6c6ec9ccda4f6f69c131c0dd945f8c tag
    T 1 701d7ecec3e0c6b4ab9bb824fd2b34be4da63b7e tag
    T 1 733ad933f62e82ebc92fed988c7f0795e64dea62 tag
    T 1 9e734775f7c22d2f89943ad6c745571f1930105f tag
    T 1 c521cb0f10ef2bf28a18e1cc8adf378ccbbe5a19 tag
    T 1 ebb5573ea8beaf000d4833735f3e53acb9af844c tag

The lines that start with a 'P' list all the packs available in
this object database (relative to $repo/objects/pack).  These
packs are implicitly numbered starting at 0 in the order they
appear in the file; in the above, the pack c60dc6... is pack #0
and e3117b... is pack #1.

The lines that start with a 'D' list the dependencies.  "D 0 1"
says, pack #0 is not complete and refers to objects found in
pack #1 (e.g. a commit object in pack #0 has a subtree that is
the same one found in pack #1 hence pack #0 does not contain
that tree).  "D 1" shows that the pack #1 is self sufficient and
does not depend on anything (it is the linux-2.6 baseline pack).
Of course, a pack could depend on more than one pack, in which
case you would see something like "D 4 1 2 3", meaning that
pack #4 depends on packs #1, #2 and #3.

If the repository follows the "baseline, permanent incrementals,
and one throw-away" scheme I outlined above, the baseline would
be self sufficient, most likely incremental #i would depend on
the baseline and all the incrementals #j (j < i), and the
throw-away would depend on everybody else.
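
Because dependencies chain, a client that wants one pack really
wants everything reachable through the D lines.  A minimal sketch
of that closure in Python follows; the function name
required_packs and the dictionary layout (pack number mapped to
the list of pack numbers it depends on) are assumptions made for
this example, not an existing interface.

```python
def required_packs(deps, wanted):
    """Return the set of pack numbers needed to complete `wanted`.

    deps: dict mapping a pack number to the pack numbers listed
          after it on its D line (hypothetical in-memory form).
    """
    needed = set()
    stack = [wanted]
    while stack:
        n = stack.pop()
        if n in needed:
            continue          # already visited
        needed.add(n)
        stack.extend(deps.get(n, []))
    return needed


# The two-pack example above: "D 0 1" and "D 1".
example = {0: [1], 1: []}

# Baseline (#0), two permanent incrementals (#1, #2) and a
# throw-away (#3), following the scheme outlined earlier.
scheme = {0: [], 1: [0], 2: [0, 1], 3: [0, 1, 2]}

for_pack_0 = required_packs(example, 0)
for_incremental_1 = required_packs(scheme, 1)
for_throw_away = required_packs(scheme, 3)
```

This ignores what the client already has; it only answers which
packs a given pack leans on.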

The lines that start with a 'T' list objects in a pack that are
not referenced by anything else in the same pack (they are
typically branch heads and tags).  We can see that pack #0 has
one head commit and a tag in the above example.
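
To make the three line types concrete, here is a small Python
sketch that reads the file into three structures.  It assumes
whitespace-separated fields exactly as in the example above; the
name parse_pack_info and the returned layout are invented for
illustration.

```python
def parse_pack_info(text):
    """Parse P, D and T lines of the pack info file.

    Returns (packs, deps, tops):
      packs: list of pack file names; list position = pack number
      deps:  pack number -> list of pack numbers it depends on
      tops:  pack number -> list of (object name, type) pairs
    """
    packs, deps, tops = [], {}, {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, rest = fields[0], fields[1:]
        if kind == "P":
            packs.append(rest[0])
        elif kind == "D":
            deps[int(rest[0])] = [int(n) for n in rest[1:]]
        elif kind == "T":
            tops.setdefault(int(rest[0]), []).append((rest[1], rest[2]))
    return packs, deps, tops


# A shortened copy of the example file shown above.
sample = """\
P pack-c60dc6f7486e34043bd6861d6b2c0d21756dde76.pack
P pack-e3117bbaf6a59cb53c3f6f0d9b17b9433f0e4135.pack
D 0 1
D 1
T 0 9fb1759a3102c26cd8f64254a7c3e532782c2bb8 commit
T 0 a339981ec18d304f9efeb9ccf01b1f04302edf32 tag
T 1 0397236d43e48e821cce5bbe6a80a1a56bb7cc3a tag
T 1 043d051615aa5da09a7e44f1edbb69798458e067 commit
"""

packs, deps, tops = parse_pack_info(sample)
```

Unknown line types are skipped silently here, which seems the
friendlier choice if the file grows new record types later.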

This file always resides at a known location.   A client can do
something like this to slurp from a dumb server:

 (1) Fetch $repo/objects/info/pack file for the above
     information.

 (2) Look at T lines.  If you have all the objects listed there
     for a pack, and if your repository is complete to begin
     with, you are not interested in that pack.  By definition,
     everything in that pack is reachable from one of the
     objects listed on the T lines, and you already have them.
     Otherwise, you _may_ be interested in that pack.

 (3) Download the corresponding .idx files for the packs you are
     interested in.  Run "git show-index" to see if the heads/tags
     you are interested in appear in one of them (you found out
     about the heads/tags using "git ls-remote" earlier).  If you
     find a pack that contains objects you are interested in, look
     at D lines to make sure you have all the head objects from
     the packs that this pack depends on; otherwise you need to
     slurp those depended-upon packs as well (needless to say,
     this is recursive).

 (4) Download the packs you decided to pick in the previous
     step.  It is up to you whether you unpack those packs, but
     if the upstream keeps them statically packed I would
     recommend against unpacking.  Next time around you can just
     look at the name of the pack and decide you already have it.

     On the other hand, keeping a throw-away packed may not make
     much sense.  You can unpack the throw-away and then run
     "git prune-packed" in your repository next time you get the
     pack info file from the repository, by noticing that the
     pack is gone from the remote repository already.

 (5) Fill the rest using the commit walker.

An initial client implementation that is _really_ dumb could
even skip steps (2) and (3), always download/sync all available
packs from the dumb server, and go directly to step (5) to fall
back on the commit walker.
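
For a client that does implement steps (2) and (3), the pack
selection might look roughly like the Python sketch below.
Everything in it is hypothetical: the function name choose_packs,
the short placeholder object names, and the "index" argument,
which stands in for the object names a client would obtain by
running "git show-index" on each downloaded .idx file.  It also
assumes every pack has at least one T line.

```python
def choose_packs(deps, tops, index, have, wanted):
    """Pick the pack numbers worth downloading.

    deps:   pack number -> pack numbers it depends on (D lines)
    tops:   pack number -> object names from its T lines
    index:  pack number -> set of object names in that pack
    have:   set of object names already in the local repository
    wanted: set of head/tag object names we are after
    """
    def have_all_tops(n):
        return all(obj in have for obj in tops.get(n, []))

    # Step (2): a pack whose T objects we all have is not interesting.
    candidates = [n for n in sorted(tops) if not have_all_tops(n)]

    # Step (3): keep candidates containing something we want, then
    # follow D lines for packs whose head objects we lack.
    picked = set()
    stack = [n for n in candidates if index.get(n, set()) & wanted]
    while stack:
        n = stack.pop()
        if n in picked:
            continue
        picked.add(n)
        stack.extend(d for d in deps.get(n, []) if not have_all_tops(d))
    return picked


# Placeholder data shaped like the two-pack example above.
deps = {0: [1], 1: []}
tops = {0: ["c0", "t0"], 1: ["c1"]}
index = {0: {"c0", "t0", "x"}, 1: {"c1", "y"}}

fresh_clone = choose_packs(deps, tops, index, set(), {"c0"})
has_baseline = choose_packs(deps, tops, index, {"c1", "y"}, {"c0"})
up_to_date = choose_packs(deps, tops, index, {"c0", "t0", "c1"}, {"c0"})
```

With nothing local, both packs are picked; with the baseline
already present, only pack #0 is; a client that has every T
object picks nothing and can go straight to the commit walker.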

I haven't written the client side yet, but everything else
needed to support the above will be sent to the list as
separate patches.
