On Tue, Jun 8, 2021 at 5:58 PM Johan Corveleyn <jcor...@gmail.com> wrote: > On Tue, Jun 8, 2021 at 3:24 PM Stefan Sperling <s...@stsp.name> wrote: > > On Tue, Jun 08, 2021 at 02:57:34PM +0200, Johan Corveleyn wrote: > > > Okay, I focused on the first revision causing the annotate to differ, > > > and with some debug logging: > > > - p went up to 139 > > > - length[0]=1942, length[1]=1817 > > > > > > Now, 1942 lines on the left and 1817 on the right doesn't seem all > > > that many (that's what remains after prefix-suffix stripping). I see > > > no reason 'svn diff' shouldn't be able to calculate a minimal diff for > > > those sizes in a reasonable amount of time. So if p can go up to such > > > a "cost" for such "reasonable" lenghts, I imagine we should put the > > > actual limit much higher. I suppose we can just as well set "min_cost" > > > to 1024 or even higher. 64 is way too low. > > > > Thanks! > > I have set the value back to at least 1024 with this new version of patch. > > Hmmm, apparently I'm still running into the limit. Even with 8192 I'm > hitting it at least once. The actual diff there is not really pretty, > but it's an actual manual change made by some developer years ago, and > the "minimal diff" is more or less readable / understandable. The log > message is something like "Group sections related to multimedia > module", and it shuffles around a lot of xml sections to group them > together. > > In this case, length[0]==length[1]==46780. Pretty big for the LCS, but > not humongous. The value of 'p' goes up to 8279. > > With the limit set to 16384, I'm not hitting it, and the annotate > comes out as with the original. > > Now, I'm a bit puzzled why limit==8192 already takes 18s in your > tests. In my situation, with the above diff, I get: > limit==8192: 3.3s > limit==16384: 3.5s > > (that's including downloading both versions from the server over https) > > So I'm wondering why it takes so long in your case. One thing to keep > in mind here is that, since this is cpu intensive, optimizations might > matter. I remember back in 2011 that I did most of my measurements > with a "Release" build for example. But the above tests I did were > with a debug build, so ... dunno.
[ snip part about another optimization by Morten Kloster in r1140247, which seemed not to work for Stefan. ] Coming back to this thread from last year, just wondering if you want to pick this up again, Stefan (or anyone else), since a 1.15 release might be happening in the near future. I think we already concluded that changing this behaviour in a patch release would be hard to justify. But for 1.15 it might be acceptable in some form. Just wanted to bring this back up, I have no vested interest here. I'm still skeptical about changing the output of diff & blame in cases where we hit "this takes too much work", but I won't stand in the way if that what others consense on :-). If we want to pick a limit of the maximum effort, then I'd suggest 16384 (or above), but that's a purely personal / local suggestion, because the use case I had above would then not be impacted. BTW: it's a pity that we can's say: "limit the diff effort in case of 'update' or 'merge', but not in case of 'diff' or 'blame'". Because in the latter case, the user might care more about the actual output, and in the former they might not (until they hit a text conflict of course, then they will care again ...) It might be interesting if someone could take a more "profiling" look at why Stefan's example takes much more time, even with limit==8192 (18 seconds), compared to my example (3.3 seconds). Is that purely because of the total size of the files / number of lines? I find that difference quite strange. There might be something that can be optimized without changing behaviour. But I'm afraid I don't have time to dig deeply into this. -- Johan