On Wed, 2005-02-16 at 09:41 -0800, Per Bothner wrote:
> Daniel Berlin wrote:
> Huh?  I don't get this.  You sort filenames *in* the server *before*
> you generate diffs.  And you do the sorting within each directory;
> i.e. early before you do anything else.  What does the "streaming
> protocol" have to do with this?  Do generate the stream in order,
> that's all?
>
> How to do this depends on the internal data structures of the
> repository, and I realize that the possibility of renames does
> complicate those data structures.  But there has to be a data structure
> and api to navigate the file hierarchy - a "tree walker".  That
> tree walker should sort by filename before doing anything with
> the contents of a directory.

You assume the tree walker ever sees an entire directory at once, for
starters.  I'm not sure it does.  I may be wrong; I've not explored that
portion of the code in detail.

>
> Yes, the sorting does cost some time, but sorting a directory is
> pretty fast.  You can do much better than quicksort: Put each filename
> in one of 27 buckets, one for each of A/a to Z/z, and one for "other",
> and the
>

Patches welcome :)
I haven't looked in detail, but I don't believe it's nearly as simple as
you think, because the structure of how a diff happens isn't what you
think it is.
I believe someone is going to try to just store/walk the dirent hash in
sorted order for now when possible, so that everything above the fs
level just gets handed this stuff in sorted order.

> No problem.  The client's request can include the current LOCALE value.
> However, I'm not sure that's desirable.  Obviously the charset and
> language used for filenames cannot be client-dependent.  The sort
> order could be client-dependent, but since it might not match the
> server language, I don't think it makes sense.  If I'm a German
> speaker working with repository containing English filenames, the
> sort order should be English, regardless of my LOCALE.

Protocol changes are no simple matter.  We can't just arbitrarily break
the client/server protocol.  There needs to be a graceful fallback for
older clients against newer servers, and the reverse, at least until
2.0.

Anyway, I was just trying to throw you off track, because *I* have no
current plans to make diff output in sorted order :)

>
> > In other words, so far the cost of trying to do it has
> > outweighed the benefit of having diffs appear in some well-specified
> > order.
>
> Having output in a well-specified order is very important.  How else
> would I be able to compare two 'diff' runs otherwise?  How would I
> write a regression test for 'svn diff'?
> Of course I can postprocess
> the output, but it's much more convenient and efficient to just sort
> each directory before diffing each file.
>
> More generally, any listing that humans are expected to see should
> be sorted.  If you put it in random order people will wonder if there
> is a meaning to the order.  An 'ls' that doesn't by default sort the
> output is obviously Wrong.

It's not *random* order.  It's deterministic and stable.  It's just not
sorted alphabetically.
Thus, you can compare and regression test svn diff output (which we
do) :)

>
> Now I'm not suggesting this is a show-stopper issue, or that you should
> be responsible for fixing it.  But clearly, if svn output is not by
> default in a predictable output, that is most definitely a serious
> (but not critical) bug.  (It's ok to have a "don't sort" option to
> speed things up, but it shouldn't be the default.)

Again, patches are welcome :)
(I believe someone will be working on it to some degree soon.)
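
To illustrate the idea mentioned earlier of walking the dirent hash in
sorted order, so that everything above the fs level is handed entries
already sorted, here is a minimal Python sketch.  It is not Subversion
code: the nested dict standing in for the dirent hash, and the names
walk_sorted and repo, are assumptions made up for illustration.  It also
assumes the walker can see a whole directory at once, which may not hold
in the real code.

```python
def walk_sorted(tree, prefix=""):
    """Yield file paths from a nested dict, sorting each directory's entries.

    `tree` is a hypothetical stand-in for a dirent hash: keys are entry
    names, values are either a nested dict (a directory) or anything
    else (a file).
    """
    # sorted() on str keys compares by code point, so the order does not
    # depend on the client's (or the server's) LOCALE.
    for name in sorted(tree):
        child = tree[name]
        path = prefix + name
        if isinstance(child, dict):
            # Recurse into the subdirectory, again in sorted order.
            yield from walk_sorted(child, path + "/")
        else:
            yield path


# Hypothetical repository contents, deliberately inserted out of order.
repo = {"trunk": {"b.c": 1, "a.c": 2, "Makefile": 3}, "README": 4}
paths = list(walk_sorted(repo))
```

Sorting per directory keeps the cost at one small sort per directory
rather than one sort over the whole output.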