On Wed, 2010-08-11 at 19:14 -0400, Johan Corveleyn wrote:
> I naively thought that the server, upon being called get_file_revs2,
> would just supply the deltas which it has already stored in the
> repository. I.e. that the deltas are just the native format in which
> the stuff is kept in the back-end FS, and the server wasn't doing much
> else but iterate through the relevant files, and extract the relevant
> bits.

The server doesn't have deltas between each revision and the next (or previous). Instead, it has "skip-deltas" which may point backward a large number of revisions. This allows any revision of a file to be reconstructed in O(log(n)) delta applications, where n is the number of file revisions, but it makes "what the server has lying around" even less useful for blame output.

It's probably best to think of the FS as a black box which can produce any version of a file in reasonable time. If you look at svn_repos_get_file_revs2 in libsvn_repos/rev_hunt.c, you'll see the code which produces deltas to send to the client, using svn_fs_get_file_delta_stream.

The required code changes for this kind of optimization would be fairly deep, I think. You'd have to invent a new type of "diffy" delta algorithm (either line-based or binary, but either way producing an edit stream rather than acting like a compression algorithm), and then parameterize a bunch of functions which produce deltas, and then have the server-side code produce diffy deltas, and then have the client code recognize when it's getting diffy deltas and behave more efficiently. If the new diffy-delta algorithm isn't format-compatible with the current encoding, you'd also need some protocol negotiation.
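
To make the skip-delta point above concrete, here is a rough sketch in Python. It assumes one particular base-selection rule (the base for the n-th change of a file is n with its lowest set bit cleared); treat that rule and the function names skip_delta_base and reconstruction_chain as illustrative assumptions, not as a description of what any given FS back-end actually does.

```python
def skip_delta_base(n):
    """Assumed rule: the delta for change n is stored against the change
    whose number is n with its lowest set bit cleared."""
    if n <= 0:
        raise ValueError("change 0 is stored as fulltext in this sketch")
    return n & (n - 1)


def reconstruction_chain(n):
    """Return the changes visited when rebuilding change n, ending at the
    fulltext (change 0). The number of delta applications is len(chain) - 1."""
    chain = [n]
    while n > 0:
        n = skip_delta_base(n)
        chain.append(n)
    return chain


if __name__ == "__main__":
    for change in (53, 54):
        chain = reconstruction_chain(change)
        print(change, "->", chain, "(%d delta applications)" % (len(chain) - 1))
```

Under this assumed rule, change 54 is stored as a delta against change 52, not against 53, so the stored delta says nothing directly about what changed between 53 and 54. The chain length is the number of set bits in n, which is where the O(log(n)) bound comes from.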