Daniel Barkalow wrote:
On Sun, 10 Jul 2005, Dan Holmsand wrote:
Daniel Barkalow wrote:
If an individual file is not available, figure out what packs are
 available:

  Get the list of pack files the repository has
   (currently, I just use "e3117bbaf6a59cb53c3f6f0d9b17b9433f0e4135")
  For any packs we don't have, get the index files.

This part might be slightly expensive for large repositories. If one assumes that packs are named the way git-repack-script names them, however, one could cache the indexes we've already seen (again, see below). Or, if you go for the mandatory "pack-index-file", require that it has a reliable order, so that you can get the most recently added index first.


Nothing bad happens if you have index files for pack files you don't have,
as it turns out; the library ignores them. So we can keep the index files
around so we can quickly check if they have the objects we want. That way,
we don't have to worry about skipping something now (because it's not
needed) and then ignoring it when the branch gets merged in.

So what I actually do is make a list of the pack files that aren't already
downloaded that are available from the server, and download the index
files for any where the index file isn't downloaded, either.
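The selection step described above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual puller code; the function name and the "pack-..." base names are hypothetical, and the only assumption is that each pack has an index file with the same base name and an ".idx" suffix.

```python
def indexes_to_fetch(server_packs, local_packs, local_indexes):
    """Return the index files worth downloading.

    server_packs:  pack base names the server advertises
    local_packs:   base names whose pack file we already hold
    local_indexes: base names whose index file we already hold
    """
    have_packs = set(local_packs)
    have_indexes = set(local_indexes)
    return [
        name + ".idx"
        for name in server_packs
        # Skip packs we already downloaded, and indexes we already cached.
        if name not in have_packs and name not in have_indexes
    ]


if __name__ == "__main__":
    server = ["pack-aaa", "pack-bbb", "pack-ccc"]
    print(indexes_to_fetch(server, ["pack-aaa"], ["pack-aaa", "pack-bbb"]))
```

Only "pack-ccc" is selected here: "pack-aaa" is already present as a pack, and the index for "pack-bbb" is already cached.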

Aah. In other words, you do the caching thing as well. It seems a little ugly, though, to store the index-only index files with the rest of the packs. It might be preferable to introduce something like $GIT_DIR/index-cache, so that they can be easily cleaned out (and don't follow us around forever when cloning-by-hardlinking-the-entire-object-directory).

You might end up with quite a large number of index files after a while, though, if you pull from several repositories that are regularly repacked.

  Keep a list of the struct packed_gits for the packs the server has
   (these are not used as places to look for objects)

Each time we need an object, check the list for it. If it is in there,
 download the corresponding pack and report success.
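A rough sketch of that lookup, in Python rather than C for brevity. The dictionary stands in for the list of struct packed_git entries, and `download_pack` is a hypothetical callback. Dropping a pack from the candidate list once it has been fetched is an assumption, not something stated above.

```python
def find_pack_for(object_id, remote_indexes):
    """remote_indexes maps a pack name to the set of object ids in its index."""
    for pack_name, objects in remote_indexes.items():
        if object_id in objects:
            return pack_name
    return None


def fetch_object(object_id, remote_indexes, download_pack):
    """Download the pack that holds object_id; return True on success."""
    pack = find_pack_for(object_id, remote_indexes)
    if pack is None:
        return False
    download_pack(pack)
    # Assumption: once downloaded, the pack is an ordinary local pack,
    # so it no longer belongs in the list of packs still on the server only.
    del remote_indexes[pack]
    return True


if __name__ == "__main__":
    fetched = []
    candidates = {"pack-a": {"o1", "o2"}, "pack-b": {"o3"}}
    print(fetch_object("o3", candidates, fetched.append), fetched)
```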

Here you will need some strategy to deal with packs that overlap with what we've already got. Basically, small and overlapping packs should be unpacked, big and non-overlapping ones saved as is (since git-unpack-objects is painfully slow and memory-hungry...).
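One way such a strategy could look, as a hedged sketch: the thresholds, the function name and the handling of the mixed cases (small but non-overlapping, big but overlapping) are all assumptions made for illustration. Because unpacking is slow, this version only unpacks when a pack is both small and largely redundant.

```python
def choose_action(pack_objects, local_objects, pack_size,
                  small_limit=1 << 20, overlap_limit=0.5):
    """Return "unpack" or "keep" for a freshly downloaded pack.

    pack_objects:  set of object ids listed in the pack's index
    local_objects: set of object ids we already have
    pack_size:     size of the pack file in bytes
    small_limit and overlap_limit are hypothetical tuning knobs.
    """
    if not pack_objects:
        return "keep"
    overlap = len(pack_objects & local_objects) / len(pack_objects)
    if pack_size <= small_limit and overlap >= overlap_limit:
        return "unpack"
    # Big packs, and packs that mostly contain new objects, are saved as is.
    return "keep"


if __name__ == "__main__":
    local = {"a", "b", "c"}
    print(choose_action({"a", "b", "x"}, local, 10_000))
    print(choose_action({"x", "y"}, local, 50_000_000))
```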


I don't think there's an issue to having overlapping packs, either with
each other or with separate objects. If the user wants, stuff can be
repacked outside of the pull operation (note, though, that the index files
should be truncated rather than removed, so that the program doesn't fetch
them again next time some object can't be found easily).

Well, the only issue is obviously wasted space. If you fetch a lot of branches from independently packed repos, though, that waste might add up.

About truncating index files: this seems a bit ugly. You get a file that doesn't contain what it says it contains, which may cause trouble if, for example, the git prune thing is used.

You might be better off with a simple list of the index files whose objects we know we already have in full (and make sure that git-prune-script deletes this list, since pruning possibly breaks that contract).
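Such a list could be as simple as a text file with one pack name per line. The sketch below is only an illustration: the file location, the file format and all function names are hypothetical, and `invalidate` stands in for whatever a prune step would have to do.

```python
import os
import tempfile


def load_complete(path):
    """Read the set of pack names whose objects we already have in full."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}


def mark_complete(path, pack_name):
    """Record that every object in pack_name is available locally."""
    names = load_complete(path)
    names.add(pack_name)
    with open(path, "w") as f:
        f.write("\n".join(sorted(names)) + "\n")


def invalidate(path):
    """What a prune step would call: forget everything we recorded."""
    if os.path.exists(path):
        os.remove(path)


def indexes_to_consult(server_packs, path):
    """Server packs that may still hold objects we lack."""
    done = load_complete(path)
    return [p for p in server_packs if p not in done]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        listing = os.path.join(d, "complete-packs")
        mark_complete(listing, "pack-aaa")
        print(indexes_to_consult(["pack-aaa", "pack-bbb"], listing))
```

Compared with truncating the index files, the index files themselves stay intact, and removing one small file is enough to return to a safe state.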

One could also optimize the pack-download bit, by figuring out the last object in the pack that we need (easy enough to do from the index file), and just get the part of the pack file leading up to that object. That could be a huge win for independently packed repositories (I don't do that in my code below, though).


That's only possible if you can figure out what you want to have before
you get it. My code is walking the reachability graph on the client; it
can only figure out what other objects it needs after it's mapped the pack
file.

No, but we can find out which objects we *don't* want (i.e. the ones we already have). And that may be a lot, e.g. if a repository is fully repacked, or if we track branches on several similar but independently packed repositories. And as far as I understand git-pack-objects, it tries to put recent objects at the front.
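To make the idea concrete, here is a rough sketch of how the length of the needed prefix could be computed from an index. It is not taken from any existing code. It assumes the index gives a byte offset for each object, that an object ends where the next one begins, and that the pack ends with a 20-byte checksum trailer. It also ignores the possibility that a needed object is stored as a delta against an object outside the prefix.

```python
def bytes_needed(index, have, pack_size, trailer_len=20):
    """Return how many leading bytes of the pack cover every object we lack.

    index:     dict mapping object id -> byte offset inside the pack
    have:      set of object ids already available locally
    pack_size: total size of the pack file in bytes
    """
    ordered = sorted(index.items(), key=lambda item: item[1])
    end = 0
    for i, (obj, _offset) in enumerate(ordered):
        if obj in have:
            continue
        if i + 1 < len(ordered):
            # The object ends where the next one starts.
            end = ordered[i + 1][1]
        else:
            # The last object runs up to the checksum trailer (assumed).
            end = pack_size - trailer_len
    return end


if __name__ == "__main__":
    idx = {"a": 12, "b": 100, "c": 250, "d": 400}
    print(bytes_needed(idx, {"c", "d"}, 520))
```

In the example, only "a" and "b" are missing locally, so fetching stops at the offset where "c" begins instead of downloading the whole pack.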

I don't have any numbers to back this up with, though. Some testing may be needed, but since the population of packed public repositories is 1, this is tricky...

I might use that method for listing the available packs, although I'd sort
of like to encourage a clean solution first.

Encouraging cleanliness is obviously a good thing :-)

/dan