On Mon, Feb 16, 2015 at 07:31:33AM -0800, David Lang wrote:

> >Then the server streams the data to the client. It might do some light
> >work transforming the data as it comes off the disk, but most of it is
> >just blitted straight from disk, and the network is the bottleneck.
>
> Depending on how close to full the WAN link is, it may be possible to
> improve this with multiple connections (again, referencing bbcp), but
> there's also the question of if it's worth trying to use the entire WAN for
> a single user. The vast majority of the time the server is doing more than
> one thing and would rather let any individual user wait a bit and service
> the other users.

Yeah, I have seen clients that make multiple TCP connections, each
requesting a chunk of a file in parallel. The short answer is that this
is going to be very hard with git. Each clone generates the pack on the
fly based on what's on disk and streams it out. It should _usually_ be
the same, but there's nothing to guarantee byte-for-byte equality
between invocations. So you'd have to multiplex all of the connections
into the same server process. And even then it's hard; that process
knows it's going to send you the bytes for object X, but it doesn't know
at exactly which offset until it gets there, which makes sending things
out of order tricky. And the whole output is checksummed by a single
sha1 over the whole stream that comes at the end.

I think the most feasible thing would be to quickly spool it to a server
on the LAN, and then use an existing fetch-in-parallel tool to grab it
from there over the WAN.

-Peff