On Mon, Mar 22, 2010 at 11:03 PM, Ivan Zhakov <i...@visualsvn.com> wrote:
> On Tue, Mar 23, 2010 at 00:53, Mark Phippard <markp...@gmail.com> wrote:
>> On Mon, Mar 22, 2010 at 5:14 PM, Ivan Zhakov <i...@visualsvn.com> wrote:
>>> On Mon, Mar 22, 2010 at 20:37, <hwri...@apache.org> wrote:
>>>> +Some other random stuff Hyrum would like to talk about:
>>>> + * Why is merge slow (compared to $OTHER_SYSTEM)?
>>>> +   - Is it endemic to Subversion's architecture, or can it be fixed?
>>>
>>> My opinion that merge is slow because it's client driven. Client
>>> perform a lot of requests to decide what revisions and files to merge.
>>> Just an idea: move this logic to server side and use slightly extended
>>> reporter/editor to apply changes on client.
>>
>> Whether it is merge or blame or something else, the reason I have
>> heard given in the past is that SVN was designed this way for
>> scalability. The server was supposed to just serve up revisions and
>> leave the more expensive parts for the client. Given the amount of
>> RAM the client can spike to at times, I cannot see this ever scaling
>> if it were done on the server.
>>
> Scalability is a good reason to move operations to client and I
> understand how blame operation will impact server. But I don't see
> reasons why merge should take more resource than update/switch/diff
> operations. As I understand during merge we retrieve mergeinfo for
> from several locations then perform some set math on them and apply
> revisions to working tree.
I agree. I can certainly understand that general design principle, but I think the answer is, in general: it depends.

Obviously it pays off for the server to do _some_ work rather than shove everything off to the client (otherwise the server could just stream the entire repository to the client on every read operation, and let it sort out for itself which revisions, and which parts of them, it needs ;), then it would hardly use any RAM on the server). So I think that, for every use case, one needs to carefully balance the scalability of the server against the efficiency and performance of the operation as a whole. That balance mostly depends on the amount of memory and CPU power needed to do the work on the server, versus sending the data to the client and letting it sort things out (asking the server for additional data in the process).

It may actually even be the case that the "client does most of the work" approach is more costly for the server in the long run, because of all the extra round trips with the client when it needs more data: maybe not in terms of peak memory usage, but in terms of CPU, I/O against the repository back-end, and because the operation takes a long time, so some amount of memory stays tied up for a long time.

I'm no expert on mergeinfo, but I can imagine that some (parts) of the algorithms can be implemented quite scalably (is that a word?) on the server. Of course, I'm only guessing here.

As for blame: sure, the current algorithm is way too heavy to put on the server. But I'm not convinced it has to be that way. Maybe a faster, more efficient blame algorithm can change the equation. I don't have deep enough knowledge about it right now, so I really couldn't say, but I don't rule it out a priori. Anyway, we'll see if we get there (the faster algo, I mean).

Johan
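
P.S. Just to make the "set math" Ivan mentions a bit more concrete, here is a toy sketch (Python, with made-up revision data and simplified names; this is not Subversion's actual mergeinfo code, which works per-path and on compact revision ranges). The point is only that "what is eligible to merge" boils down to a set difference, which in itself doesn't look inherently expensive:

  # Toy illustration, NOT Subversion's implementation: treat mergeinfo
  # as plain sets of revision numbers and compute eligible revisions
  # as a set difference.

  def parse_ranges(spec):
      """Expand a range spec like '1-5,8,10-12' into a set of revisions."""
      revs = set()
      for part in spec.split(','):
          if '-' in part:
              lo, hi = part.split('-')
              revs.update(range(int(lo), int(hi) + 1))
          else:
              revs.add(int(part))
      return revs

  # Hypothetical data: revisions already merged on source and on target.
  source_merged = parse_ranges('1-100')
  target_merged = parse_ranges('1-60,75')

  # Eligible revisions = merged on source but not yet merged on target.
  eligible = sorted(source_merged - target_merged)
  print(eligible)   # 61..74 and 76..100

Of course the expensive part in practice may well be fetching the mergeinfo from several locations and then applying all those revisions to the working tree, not the set math itself, so take this only as an illustration of the step I meant above.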