On Thu, 2015-02-19 at 23:57 -0700, Martin Fick wrote:
> On Feb 19, 2015 5:42 PM, David Turner <dtur...@twopensource.com> wrote:
> >
> > On Fri, 2015-02-20 at 06:38 +0700, Duy Nguyen wrote:
> > > > * 'git push'?
> > >
> > > This one is not affected by how deep your repo's history is, or how
> > > wide your tree is, so should be quick..
> > >
> > > Ah the number of refs may affect both git-push and git-pull. I think
> > > Stefan knows better than I in this area.
> >
> > I can tell you that this is a bit of a problem for us at Twitter. We
> > have over 100k refs, which adds ~20MiB of downstream traffic to every
> > push.
> >
> > I added a hack to improve this locally inside Twitter: The client sends
> > a bloom filter of shas that it believes that the server knows about; the
> > server sends only the sha of master and any refs that are not in the
> > bloom filter. The client uses its local version of the servers' refs
> > as if they had just been sent. This means that some packs will be
> > suboptimal, due to false positives in the bloom filter leading some new
> > refs to not be sent. Also, if there were a repack between the pull and
> > the push, some refs might have been deleted on the server; we repack
> > rarely enough and pull frequently enough that this is hopefully not an
> > issue.
> >
> > We're still testing to see if this works. But due to the number of
> > assumptions it makes, it's probably not that great an idea for general
> > use.
>
> Good to hear that others are starting to experiment with solutions to this
> problem! I hope to hear more updates on this.
>
> I have a prototype of a simpler, and I believe more robust solution, but
> aimed at a smaller use case I think. On connecting, the client sends a sha
> of all its refs/shas as defined by a refspec, which it also sends to the
> server, which it believes the server might have the same refs/shas values
> for. The server can then calculate the value of its refs/shas which meet
> the same refspec, and then omit sending those refs if the "verification"
> sha matches, and instead send only a confirmation that they matched (along
> with any refs outside of the refspec). On a match, the client can inject
> the local values of the refs which met the refspec and be guaranteed that
> they match the server's values.
>
> This optimization is aimed at the worst case scenario (and is thus the
> potentially best case "compression"), when the client and server match for
> all refs (a refs/* refspec) This is something that happens often on Gerrit
> server startup, when it verifies that its mirrors are up-to-date. One reason
> I chose this as a starting optimization, is because I think it is one use
> case which will actually not benefit from "fixing" the git protocol to only
> send relevant refs since all the refs are in fact relevant here! So something
> like this will likely be needed in any future git protocol in order for it to
> be efficient for this use case. And I believe this use case is likely to
> stick around.
>
> With a minor tweak, this optimization should work when replicating actual
> expected updates also by excluding the expected updating refs from the
> verification so that the server always sends their values since they will
> likely not match and would wreck the optimization. However, for this use
> case it is not clear whether it is actually even worth caring about the non
> updating refs? In theory the knowledge of the non updating refs can
> potentially reduce the amount of data transmitted, but I suspect that as the
> ref count increases, this has diminishing returns and mostly ends up chewing
> up CPU and memory in a vain attempt to reduce network traffic.

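To make the quoted verification-sha proposal concrete, here is a rough Python sketch of how I read it. This is not the prototype's code: the function names, the use of glob matching in place of real refspec matching, and the "sha name" line format fed to the hash are all assumptions made for illustration.

```python
import hashlib
from fnmatch import fnmatchcase


def verification_digest(refs, pattern):
    """Hash every (name, sha) pair whose name matches the pattern.

    Pairs are sorted by name so both sides hash them in the same order.
    The "sha name\\n" layout is an assumption, not a git wire format.
    """
    h = hashlib.sha1()
    for name in sorted(refs):
        if fnmatchcase(name, pattern):
            h.update(f"{refs[name]} {name}\n".encode())
    return h.hexdigest()


def server_advertise(server_refs, pattern, client_digest):
    """Return (matched, refs_to_send) for a client's verification digest.

    On a match, only refs outside the pattern are sent; the client reuses
    its local values for the rest. On a mismatch, everything is sent.
    """
    if verification_digest(server_refs, pattern) == client_digest:
        outside = {n: s for n, s in server_refs.items()
                   if not fnmatchcase(n, pattern)}
        return True, outside
    return False, dict(server_refs)
```

In the all-refs-match case described above (a refs/* pattern), the server's reply shrinks to the confirmation plus whatever lies outside the pattern, regardless of how many refs exist.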
For a more general solution, perhaps a log of ref updates could be used. Every time a ref is updated on the server, that ref would be written into an append-only log. Every time a client pulls, their pull data includes an index into that log. Then on push, the client could say, "I have refs as-of $index", and the server could read the log (or do something more-optimized) and send only refs updated since that index.
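
A minimal sketch of the log idea in Python, just to pin down what "refs as-of $index" could mean. The class and method names are hypothetical, the log lives in memory rather than on disk, and recording a deletion as None is an assumption the description above does not cover.

```python
class RefUpdateLog:
    """Append-only log of ref updates kept by the server (in-memory sketch)."""

    def __init__(self):
        # Each entry is (ref name, new sha). A sha of None marks a
        # deletion; that convention is an assumption of this sketch.
        self.entries = []

    def record(self, name, sha):
        """Append one ref update and return the new log index."""
        self.entries.append((name, sha))
        return len(self.entries)

    def index(self):
        """Index handed to a client on pull: number of entries so far."""
        return len(self.entries)

    def changes_since(self, index):
        """Refs updated after `index`, keeping only the latest value of each."""
        changed = {}
        for name, sha in self.entries[index:]:
            changed[name] = sha
        return changed


def apply_changes(cached_refs, changes):
    """Client side: bring a cached view of the server's refs up to date."""
    refs = dict(cached_refs)
    for name, sha in changes.items():
        if sha is None:
            refs.pop(name, None)
        else:
            refs[name] = sha
    return refs
```

With this, the traffic on push scales with the number of refs updated since the client's last pull rather than with the total ref count.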