On Thu, 2015-02-19 at 23:57 -0700, Martin Fick wrote:
> On Feb 19, 2015 5:42 PM, David Turner <dtur...@twopensource.com> wrote:
> >
> > On Fri, 2015-02-20 at 06:38 +0700, Duy Nguyen wrote: 
> > > >    * 'git push'? 
> > > 
> > > This one is not affected by how deep your repo's history is, or how 
> > > wide your tree is, so should be quick. 
> > > 
> > > Ah the number of refs may affect both git-push and git-pull. I think 
> > > Stefan knows better than I in this area. 
> >
> > I can tell you that this is a bit of a problem for us at Twitter.  We 
> > have over 100k refs, which adds ~20MiB of downstream traffic to every 
> > push. 
> >
> > I added a hack to improve this locally inside Twitter: The client sends 
> > a bloom filter of shas that it believes that the server knows about; the 
> > server sends only the sha of master and any refs that are not in the 
> > bloom filter.  The client uses its local version of the server's refs 
> > as if they had just been sent.  This means that some packs will be 
> > suboptimal, due to false positives in the bloom filter leading some new 
> > refs to not be sent.  Also, if there were a repack between the pull and 
> > the push, some refs might have been deleted on the server; we repack 
> > rarely enough and pull frequently enough that this is hopefully not an 
> > issue. 
> >
> > We're still testing to see if this works.  But due to the number of 
> > assumptions it makes, it's probably not that great an idea for general 
> > use. 
> 
> Good to hear that others are starting to experiment with solutions to this 
> problem!  I hope to hear more updates on this.
> 
> I have a prototype of a simpler and, I believe, more robust solution, though 
> it is aimed at a smaller use case.  On connecting, the client sends a refspec 
> together with a sha computed over all of its refs/shas matching that refspec, 
> for which it believes the server might have the same values.  The server can 
> then calculate the 
> value of its refs/shas which meet the same refspec, and then omit sending 
> those refs if the "verification" sha matches, and instead send only a 
> confirmation that they matched (along with any refs outside of the refspec).  
> On a match, the client can inject the local values of the refs which met the 
> refspec and be guaranteed that they match the server's values.
> 
> This optimization is aimed at the worst case scenario (and is thus the 
> potentially best case "compression"), when the client and server match for 
> all refs (a refs/* refspec).  This is something that happens often on Gerrit 
> server startup, when it verifies that its mirrors are up-to-date.  One reason 
> I chose this as a starting optimization, is because I think it is one use 
> case which will actually not benefit from "fixing" the git protocol to only 
> send relevant refs since all the refs are in fact relevant here! So something 
> like this will likely be needed in any future git protocol in order for it to 
> be efficient for this use case.  And I believe this use case is likely to 
> stick around.
> 
> With a minor tweak, this optimization should work when replicating actual 
> expected updates also by excluding the expected updating refs from the 
> verification so that the server always sends their values since they will 
> likely not match and would wreck the optimization.  However, for this use 
> case it is not clear whether it is even worth caring about the 
> non-updating refs.  In theory the knowledge of the non-updating refs can 
> potentially reduce the amount of data transmitted, but I suspect that as the 
> ref count increases, this has diminishing returns and mostly ends up chewing 
> up CPU and memory in a vain attempt to reduce network traffic.
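
To check that I am reading the verification scheme correctly, here is a
rough sketch of how I understand it.  This is only an illustration, not the
prototype's code: the names refs_digest and advertise, the choice of SHA-1
over sorted "name sha" lines, and the use of fnmatch-style patterns as a
stand-in for a real refspec are all my own assumptions.

```python
import hashlib
from fnmatch import fnmatchcase


def refs_digest(refs, pattern):
    """Hash every (name, sha) pair whose name matches pattern.

    Names are processed in sorted order so that both sides produce the
    same digest for the same set of refs.  The pattern is a shell-style
    glob (an assumption standing in for a refspec).
    """
    h = hashlib.sha1()
    for name in sorted(refs):
        if fnmatchcase(name, pattern):
            h.update(f"{name} {refs[name]}\n".encode())
    return h.hexdigest()


def advertise(server_refs, pattern, client_digest):
    """Server side: return (matched, refs_to_send).

    On a match, only refs outside the pattern are sent and the client
    reuses its local values for the rest.  Otherwise everything is sent,
    as in the unmodified protocol.
    """
    if refs_digest(server_refs, pattern) == client_digest:
        rest = {
            name: sha
            for name, sha in server_refs.items()
            if not fnmatchcase(name, pattern)
        }
        return True, rest
    return False, dict(server_refs)
```

With a refs/* pattern and identical refs on both sides, the server would
send only a confirmation plus whatever falls outside refs/*; a single
differing ref falls back to the full advertisement.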

For a more general solution, perhaps a log of ref updates could be used.
Every time a ref is updated on the server, that ref would be written
into an append-only log.  Every time a client pulls, their pull data
includes an index into that log.  Then on push, the client could say, "I
have refs as-of $index", and the server could read the log (or do
something more optimized) and send only the refs updated since that index.
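
A minimal sketch of the log idea, just to make it concrete.  Everything
here is hypothetical: the RefLog class, its method names, keeping the log
in memory, and recording a deletion as a None sha are assumptions made for
illustration, not anything that exists in git.

```python
class RefLog:
    """Append-only log of ref updates kept by the server (hypothetical)."""

    def __init__(self):
        # Each entry is (ref_name, new_sha); new_sha is None for a deletion.
        self.entries = []

    def record(self, name, sha):
        """Append one ref update and return the new log index."""
        self.entries.append((name, sha))
        return len(self.entries)

    def index(self):
        """Index handed to the client along with its pull data."""
        return len(self.entries)

    def updates_since(self, index):
        """Return the latest value of every ref touched after index."""
        changed = {}
        for name, sha in self.entries[index:]:
            changed[name] = sha  # later entries override earlier ones
        return changed


# The client pulled when the log held two entries, then pushes later.
log = RefLog()
log.record("refs/heads/master", "a" * 40)
log.record("refs/heads/topic", "b" * 40)
client_index = log.index()
log.record("refs/heads/master", "c" * 40)
to_send = log.updates_since(client_index)  # only master is advertised
```

The server would still need to handle a client whose index predates any
truncation of the log, presumably by falling back to sending all refs.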

