My understanding of the situation is as follows. I'm not adding much to what Philip and Stefan have already diagnosed, I just want to say how I see it as well, and allow others to confirm it or correct me if necessary.
First we need to be clear about 'change' and 'difference'. There has been some confusion in the past due to the fact that the FS API can record and report a 'change' even when there is no difference in content, conveying the information that the two path-revs of interest are represented by different node-revs. However, it is important to notice that this distinction is only meaningful when comparing two path-revs on the same line of history. That is the only case where they can possibly be the same node-rev. When comparing path-revs on different 'branches' (copies), which is the case in the 'blame' scenario of interest, then the node-revs are necessarily different and the only interesting question is whether the content differs. The client-side 'blame -g' code was written in such a way that it happened to rely on an undocumented detail of the report from svn_repos_get_file_revs2(). Specifically, it assumed that the report includes a text delta, even if an empty one, whenever it switches from reporting a merged version to reporting a version from the main line of history. If such a text delta were not present, the old blame code would not 'notice' that the report had switched from a merged to a main-line version, and would produce garbled output such as reporting a revision number against a line that did not exist in that revision. Note that there is no meaning attached to the presence of an empty text delta, as opposed to no text delta. Both cases mean there is no difference in the text, and therefore the blame should not report any text as changed in that new revision. The client-side blame code just happened to assume that the text delta syntactic element was always present in the merged-to-mainline transition, and relied on its presence to trigger some logic which could and should have been triggered another way. (Changed in r1570936 to no longer rely on it.) The report from svn_repos_get_file_revs2() happened to include such deltas, because of an undocumented detail of svn_fs_contents_changed() which happened to report 'changed' in such cases even when there was no difference. As Stefan says, the reason for the underlying change was determinism, not optimization. The general false-positives behaviour of svn_fs_contents_changed() was not useful to any sane caller. That is why all callers were changed to use the new 'strict' function svn_fs_contents_different() in r1572363 and r1573111. Now we're going to have a behaviour change (yes, a bug) for old clients accessing new servers if we don't do something to restore exact compatibility at the client-server protocol level. As Philip says: > However the implementation of root_vtable_t.contents_changed() has > changed in 1.9 and it no longer reports as many changes as it used to (Clarification: it no longer reports as many false positive results for non-changes as it used to.) So just switching the call in svn_repos_get_file_revs2() back to using the _changed() function doesn't solve the problem. Would it be feasible to restore exactly (or enough of) the previous behaviour of _changed()? I don't know. In theory, there may be other direct callers of _changed() that depend on its exact false-positives behaviour. However, we don't know of any besides get_file_revs. Stefan Fuhrmann wrote: > Branko Čibej wrote: >> Philip Martin wrote: >>> Philip Martin writes: >>>> However the implementation of root_vtable_t.contents_changed() has >>>> changed in 1.9 and it no longer reports as many changes as it used to >>>> report so even replacing the _different() call with a _changed() call >>>> does not solve the problem. The only way I can see to fix the problem >>>> is to simply ignore/remove the call altogether and to hard-code >>>> contents_changed to TRUE. In other words, make svn_repos_get_file_revs2() give a text delta in all cases. If that makes the old blame code work again as well as it did before (I don't say 'correctly'), that is probably a good solution, as it doesn't sound like it would add much overhead or cause any other known problems. Of course that is bodging the behaviour to suit one particular caller that we know about. Other callers may exist that depend on some different behavioural details, but it's better to fix the breakage we know about. - Julian