On Fri, Apr 1, 2022 at 5:19 PM Stefan Sperling <s...@elego.de> wrote: > > On Fri, Apr 01, 2022 at 05:04:49PM +0200, Johan Corveleyn wrote: > > Yes, I suppose this is the case: Patience feeds different (smaller) > > things to LCS. Because, as far as I understand, Myers' way of > > calculating the LCS is fundamentally "somewhat" quadratic (according > > to the Myers paper from 1986 [1], titled "An O(ND) Difference > > Algorithm and Its Variations"). > > > > That is: O(ND) where N = the sum of the lengths of both sides of the > > diff and D = the size of the minimal diff. That's why eliminating the > > common prefix and suffix (added as an optimization years ago) is > > useful, it makes N much smaller. > > > > Now, in cases where both texts are huge (even after prefix-suffix > > stripping), but the minimal diff is small, the algorithm should in > > theory be fast (because D is small). Just not sure what is going wrong > > then in our variation of this algorithm. Or have we implemented > > another algorithm than the one described by Myers? > > SVN diff is allegedly "based on the approach described by ... Meyers (sic), > [...] but has been modified for better performance." > I guess the modifications refer to prefix/suffix scanning, > because this description dates from r1128862.
Hm, not the prefix/suffix scanning, as I implemented those (it was my first big patch / work in SVN), and that's outside of the LCS algorithm (it happens in a phase before assembling "tokens (=lines)" and before handing it off to the LCS algorithm). It was integrated into trunk in r1067800 (and some further fixes followed afterwards). Basically, that work is done in libsvn_diff/diff_file.c#find_identical_{prefix,suffix}. I suppose the "modified for better performance" refers to some optimisations done by Morten Kloster, who then later submitted the patch adding this comment in r1128862. His optimisations were more related to the LCS algorithm (further reducing the problem space, and providing early exits in certain cases or some such). The core LCS algorithm in libsvn_diff/lcs.c was from way before my time, and according to 'svn blame' was mainly written by Sander Striker. I don't understand it fully either :-), but I always assumed it was basically some clever implementation of Myers' algorithm. -- Johan