On Fri, Apr 01, 2022 at 05:04:49PM +0200, Johan Corveleyn wrote:
> Yes, I suppose this is the case: Patience feeds different (smaller)
> things to LCS. Because, as far as I understand, Myers' way of
> calculating the LCS is fundamentally "somewhat" quadratic (according
> to the Myers paper from 1986 [1], titled "An O(ND) Difference
> Algorithm and Its Variations").
> 
> That is: O(ND) where N = the sum of the lengths of both sides of the
> diff and D = the size of the minimal diff. That's why eliminating the
> common prefix and suffix (added as an optimization years ago) is
> useful, it makes N much smaller.
> 
> Now, in cases where both texts are huge (even after prefix-suffix
> stripping), but the minimal diff is small, the algorithm should in
> theory be fast (because D is small). Just not sure what is going wrong
> then in our variation of this algorithm. Or have we implemented
> another algorithm than the one described by Myers?

SVN diff is allegedly "based on the approach described by ... Meyers (sic),
[...]  but has been modified for better performance."
I guess the modifications refer to prefix/suffix scanning,
because this description dates from r1128862.

For gameoftrees Neels re-implemented Myers from scratch, using the original
paper and some articles found on the web as a reference. This code looks
very different to SVN diff. But those differences could be due to
implementation style. I cannot judge that. Neels code makes more sense to
me and is exhaustively commented. The SVN diff code is not nearly as
readable from my point of view.

Neels diff code was developed stand-alone in a separate Git repository,
with the intention that it could be re-used outside of gameoftrees:
https://git.gameoftrees.org/gitweb/?p=diff.git;a=summary
If someone eventually ends up looking at rewriting SVN diff's code from
scratch, then this code would be a good starting point (and the licence
is compatible). Neels is an inactive SVN committer, but can be reached
and might be able to provide assistance if required.

Reply via email to