On Wed, 2005-02-16 at 09:41 -0800, Per Bothner wrote:
> Daniel Berlin wrote:
> Huh?  I don't get this.  You sort filenames *in* the server *before*
> you generate diffs.  And you do the sorting within each directory;
> i.e. early before you do anything else.  What does the "streaming
> protocol" have to do with this?  Do generate the stream in order,
> that's all?
> 
> How to do this depends on the internal data structures of the
> repository, and I realize that the possibility of renames does
> complicate those data structures.  But there has to be a data structure
> and api to navigate the file hierarchy - a "tree walker".  That
> tree walker should sort by filename before doing anything with
> the contents of a directory.
You assume the tree walker ever sees an entire directory at once, for
starters. I'm not sure it does. I may be wrong, i've not explored that
portion of the code in detail.
> 
> Yes, the sorting does cost some time, but sorting a directory is
> pretty fast.  You can do much better than quicksort: Put each filename
> in one of 27 buckets, one for each of A/a to Z/z, and one for "other",
> and the
> 

Patches welcome :)
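
For reference, the bucketing idea quoted above could look roughly like
the sketch below. The quoted description is cut off, so what happens
after the bucketing step is an assumption: here each bucket is simply
sorted with an ordinary comparison sort and the buckets are then
concatenated. The function name is hypothetical, and this is Python for
illustration only, not Subversion code.

```python
def bucket_sort_filenames(names):
    """Order filenames via 27 buckets keyed on the first character."""
    # Buckets 0..25 hold names starting with A/a .. Z/z; bucket 26 is "other".
    buckets = [[] for _ in range(27)]
    for name in names:
        first = name[:1].lower()
        if len(first) == 1 and "a" <= first <= "z":
            idx = ord(first) - ord("a")
        else:
            idx = 26  # digits, punctuation, non-ASCII, empty names
        buckets[idx].append(name)

    result = []
    for bucket in buckets:
        # Assumption: within a bucket, compare case-insensitively and
        # break ties on the raw string so the order is fully determined.
        bucket.sort(key=lambda n: (n.lower(), n))
        result.extend(bucket)
    return result


if __name__ == "__main__":
    print(bucket_sort_filenames(["zeta.c", "Makefile", "alpha.h", "_private"]))
```

Note that with this layout the "other" bucket sorts after Z/z, which is
a design choice of the sketch rather than something stated in the
thread.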

I haven't looked in detail, but i don't believe it's near as simple as
you think, because the structure of how a diff happens isn't what you
think it is.
I believe someone is going to try to just store/walk the dirent hash in
a sorted order for now when possible, so that everything above the fs
level just gets handed this stuff in sorted order.
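
As a rough illustration of "walk the dirent hash in sorted order", here
is a minimal Python model. It assumes a directory can be represented as
a nested dict mapping entry names to either a sub-dict (a directory) or
any other value (a file); the real fs layer uses its own hash
structures, so the names and representation here are hypothetical.

```python
def walk_sorted(entries, prefix=""):
    """Yield file paths from a nested dict of directory entries, sorted.

    `entries` maps a name to a dict (subdirectory) or anything else (file).
    """
    # Sorting the keys here means every caller above this layer sees
    # entries in a stable, sorted order without doing any work itself.
    for name in sorted(entries):
        node = entries[name]
        path = prefix + name
        if isinstance(node, dict):
            yield from walk_sorted(node, path + "/")
        else:
            yield path


if __name__ == "__main__":
    tree = {"src": {"main.c": 1, "a.c": 2}, "README": 3, "doc": {"x.txt": 4}}
    for p in walk_sorted(tree):
        print(p)
```

The sort used is Python's default code-point ordering, so uppercase
names come before lowercase ones; a locale-aware ordering would need a
different sort key.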


> No problem.  The client's request can include the current LOCALE value.
> However, I'm not sure that's desirable.  Obviously the charset and
> language used for filenames cannot be client-dependent.  The sort
> order could be client-dependent, but since it might not match the
> server language, I don't think it makes sense.  If I'm a German
> speaker working with a repository containing English filenames, the
> sort order should be English, regardless of my LOCALE.

Protocol changes are no simple matter. We can't just arbitrarily break
the client/server protocol.
There needs to be a graceful fallback for older clients against newer
servers, and the reverse, at least until 2.0.

Anyway, i was just trying to throw you off track, because *I* have no
current plans to make diff output in sorted order :)

> 
> > In other words, so far the cost of trying to do it has
> > outweighed the benefit of having diffs appear in some well-specified
> > order.
> 
> Having output in a well-specified order is very important.  How else
> would I be able to compare two 'diff' runs otherwise?  How would I
> write a regression test for 'svn diff'?  Of course I can postprocess
> the output, but it's much more convenient and efficient to just sort
> each directory before diffing each file.
> 
> More generally, any listing that humans are expected to see should
> be sorted.  If you put it in random order people will wonder if there
> is a meaning to the order.  An 'ls' that doesn't by default sort the
> output is obviously Wrong.

It's not *random* order.
It's deterministic, and stable. It's just not sorted alphabetically.
Thus, you can compare and regression test svn diff output (which we
do) :)

> 
> Now I'm not suggesting this is a show-stopper issue, or that you should
> be responsible for fixing it.  But clearly, if svn output is not by
> default in a predictable order, that is most definitely a serious
> (but not critical) bug.  (It's ok to have a "don't sort" option to
> speed things up, but it shouldn't be the default.)

Again, patches are welcome :)
(I believe someone is working on it to some degree, soon)

