It seems that the document in notes/ did not make it clear what the actual problem is and how it applies to Subversion servers. Let me try to illustrate it.
Assume we want to reconstruct a single file from the repository as part of a single request. This is what we effectively do (omitting minor details):

  file_t repo_files[20]
  for i in 0..19 : repo_files[i].open("revs/$i")

  result = ""
  for i in 0..19 : result.combine(repo_files[i].read())

Now, if there were 50 requests for the *same* reconstructed data:

  file_t repo_files[50][20]
  for k in 0..49 parallel_do
    for i in 0..19 : repo_files[k][i].open("revs/$i")

    result[k] = ""
    for i in 0..19 : result[k].combine(repo_files[k][i].read())

Caches don't help if they don't contain the data:

  for k in 0..49 parallel_do
    result[k] = cache.lookup()    // fails for all at the same time
    if result[k].missing then
      // moved sub-loops into a sub-function for clarity
      result[k] = reconstruct(repo_files[k])
      cache.add(result[k])

There are two major problems with that:

(1) We process the same data 50 times where once would suffice.
    SVN-internal caches did not help; the OS, however, may have read
    the data from disk only once and fed us from its disk cache
    afterwards.

(2) We keep 1000 files open. On some systems, this may cause
    resource shortages.

How likely is the above scenario in SVN? An operation like checkout may take many minutes to complete. The first client to do the checkout will read data from disk and populate the server caches. Any other client coming in later will be much faster since it gets fed from cache. If new checkout requests keep coming in before the first one completes, those extra requests have a good chance of "catching up" with the first one. In cases like ra_svn, which has a fully deterministic reporting order, all requests have a chance to gang up into the "50 requests" scenario above. And they will do it over and over for many files to come.

With ra_serf, things are slightly more subtle if the clients randomize their requests (I am not sure they do). For them, it is metadata (revprop packs, indexes) and data layout (temporal locality being correlated with spatial locality) that will see the issue - albeit in a more distributed fashion (e.g. 10 locations with 5 readers each instead of 1 with 50).

The ideal solution / control flow would look like this:

  k = 0:
    result[k] = reconstruct(repo_files[k])
    cache.add(result[k])

  for k in 1..49 parallel_do
    result[k] = cache.lookup()

Since we don't (can't?) coordinate requests on a global level, this is what we do on the thunder branch:

  for k in 0..49 parallel_do
    result[k] = cache.lookup()
    if result[k].missing then
      token = thunder.coordinate(data_location)
      if token.another_got_to_read then   // all but the first
        result[k] = cache.lookup()
        if result[k].ok : jump done       // >90% hit rate

      result[k] = reconstruct(repo_files[k])
      cache.add(result[k])
      thunder.completed(token)
    done

So, there is no penalty on the hot path, i.e. when the data can be found in the respective cache. The coordinating instance is also conceptually simple (keep a list of all accesses in flight; see the sketch below) and the delay for the first thread is negligible. Concurrent threads reading the same location will be blocked until the initial thread has completed its access. That minimizes the code churn on the calling side. A timeout prevents rogue threads from blocking the whole system. Also, entries that timed out will be removed from the access list; a rogue thread would have to start another relevant data access (and be the first) to block other threads a second time.

My plan is to test the code on my server setup at home to get a more nuanced picture of what the performance and scalability impact is.
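To make the coordinating instance more concrete, here is a minimal, self-contained sketch in C using POSIX threads: a fixed-size list of accesses in flight, guarded by a mutex, plus a condition variable with a timeout so that a rogue first reader cannot block waiters forever. All names (thunder_coordinate, thunder_completed), sizes and the timeout value are made up for illustration - this is not the branch code, just the mechanism described above:

  #include <pthread.h>
  #include <string.h>
  #include <time.h>

  #define MAX_IN_FLIGHT 64
  #define ACCESS_TIMEOUT_SECS 10    /* guards against rogue threads */

  typedef struct access_t {
    char key[128];                  /* data location being read */
    int active;                     /* 1 while the first reader works */
  } access_t;

  static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
  static access_t in_flight[MAX_IN_FLIGHT];

  /* Return 1 if the caller is the first reader of KEY and shall do
     the actual work.  Return 0 once another thread has completed the
     same access (or the wait timed out); the caller should then
     re-check the cache. */
  int thunder_coordinate(const char *key)
  {
    int i, slot = -1;
    struct timespec deadline;

    pthread_mutex_lock(&mutex);
    for (i = 0; i < MAX_IN_FLIGHT; i++)
      {
        if (in_flight[i].active && strcmp(in_flight[i].key, key) == 0)
          {
            /* The same location is already being read: block until
               the first reader signals completion or we time out. */
            clock_gettime(CLOCK_REALTIME, &deadline);
            deadline.tv_sec += ACCESS_TIMEOUT_SECS;
            while (in_flight[i].active
                   && strcmp(in_flight[i].key, key) == 0)
              if (pthread_cond_timedwait(&cond, &mutex, &deadline) != 0)
                break;              /* timed out: give up waiting */
            pthread_mutex_unlock(&mutex);
            return 0;
          }
        if (!in_flight[i].active)
          slot = i;
      }

    /* We are first: register the access.  If the list is full,
       simply proceed without coordination. */
    if (slot >= 0)
      {
        strncpy(in_flight[slot].key, key,
                sizeof(in_flight[slot].key) - 1);
        in_flight[slot].key[sizeof(in_flight[slot].key) - 1] = '\0';
        in_flight[slot].active = 1;
      }
    pthread_mutex_unlock(&mutex);
    return 1;
  }

  /* Called by the first reader when done: remove the entry from the
     access list and wake all waiters so they re-check the cache. */
  void thunder_completed(const char *key)
  {
    int i;
    pthread_mutex_lock(&mutex);
    for (i = 0; i < MAX_IN_FLIGHT; i++)
      if (in_flight[i].active && strcmp(in_flight[i].key, key) == 0)
        in_flight[i].active = 0;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&mutex);
  }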
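On the calling side, the thunder pseudocode above would then map to something like this (again only a sketch; cache_lookup, cache_add and reconstruct are hypothetical stand-ins for the real cache and reconstruction code):

  /* Hypothetical helpers standing in for the real cache and
     reconstruction code: */
  extern void *cache_lookup(const char *location);
  extern void cache_add(const char *location, void *data);
  extern void *reconstruct(const char *location);

  void *get_reconstructed(const char *location)
  {
    void *result = cache_lookup(location);
    if (result)
      return result;                /* hot path: no penalty */

    if (!thunder_coordinate(location))
      {
        /* Another thread did the work; re-check the cache. */
        result = cache_lookup(location);
        if (result)
          return result;            /* expected >90% hit rate */
      }

    /* We are first (or the retry missed): do the work ourselves. */
    result = reconstruct(location);
    cache_add(location, result);
    thunder_completed(location);
    return result;
  }

Note that a waiter whose cache retry misses simply falls through and reconstructs the data itself, so correctness never depends on the coordination actually succeeding - it is purely an optimization.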
On my SSD MacBook, I get 70% less total CPU cost, 50% higher throughput, and smooth handling of 200 concurrent checkouts (vs. running out of file handles at 100 clients) over ra_svn. Should these trends be confirmed on "real" hardware, networks and ra_serf, I'd love to see this code in 1.9 - after due review and feedback.

-- Stefan^2.