Branko Čibej wrote: > On 06.03.2015 12:13, Julian Foad wrote: [...] >> In order to build a sane merging system, we expect certain symmetries in >> the data model. [...] >> That generalizes to: >> >> * uniformity of the difference from branch1@r1 to branch2@r2 >> for any values of: branch1, r1, branch2, r2 >> where branch1 and branch2 are 'related' (formally: in the same branch >> family) >> >> * diffs obey some (more or less obvious) arithmetic rules such as: >> diff(A,B) (+) diff(B,C) == diff(A,C) >> diff(A,B) (+) diff(B,A) == nil >> and so on >> >> Not just roughly but precisely, testably so. > > http://darcs.net/Theory/GaneshPatchAlgebra > > You've heard of darcs, I take it. :)
Certainly! However, Darcs patch algebra is mostly concerned with how patches can be re-ordered (commuted, rebased) and additional semantic patch types (renaming all occurrences of an identifier) and how the system can take advantage of those properties. That's a step beyond what I'm talking about. What I am saying is so fundamental that Darcs and probably all DVCS's take it as an axiom. It is that we need to define a "versioned state" (of a branch), which is a "context" in Darcs terminology, and "difference" between versioned states, in a way that is independent of which branch(es) and revision(s) we're looking at. So I was sloppy in saying "some" arithmetic rules. Let me try being a bit more precise, but still not complete. (A, B, ...) are States; (d, e, ...) are Differences, and Differences are directed; (A:B) means the Difference from A to B; (A + d) means applying the Difference 'd' to state 'A'; (d & e) means sequential composition of Differences 'd' and then 'e'. * for any two States A and B the directed difference A:B exists * the 'nil' or 'identity' State is denoted '/': it consists of an empty directory with no properties; this directory acts as the root directory for the branch; a State never includes the name or path of this directory * any State 'A' can be represented by a Difference from nil-state to 'A': / + ((/):A) == A * thus, the set of Differences is a superset of the set of States * the 'nil' or 'identity' difference is denoted '(:)': A:A == (:) A + (:) == A * every difference is invertible: A:B is the inverse of B:A (:) is the inverse of (:) * the set of states is NOT closed under the operation of applying a difference: A + A:B == B, always; but... A + d => undefined in general * composition of differences is associative: ((A:B & B:C) & C:D) == (A:B & (B:C & C:D)) ((d & e) & f) == (d & (e & f)), iff (d & e) and (e & f) are both defined * composition of differences is NOT commutative, in general: (A:B & B:C) != (B:C & A:B) "Don't we already obey that?", you may ask. Here's a clue. There is no Subversion command to output a pure state of a branch, in its entirety and free of other information. ('svn export' + 'svn proplist -v' comes close, but not close enough as we'll see below.) Nor to output a pure difference. (Adding tree changes to 'svn diff' would be a start.) Nor to input a pure state or difference. Nor are there even APIs that exactly serve these purposes. To really see what's missing, we need to focus on the difficult parts which include: * mergeinfo * copies (what to do with 'copy-from' information) * move tracking An example. The move-tracking-2 branch was caught up to trunk@1665165 in r1665166. svn diff ^/subversion/trunk@1665165 ^/subversion/branches/move-tracking-2@1665166 [...] Property changes on: . ___________________________________________________________________ Modified: svn:mergeinfo ## -0,1 +0,1 ## Reverse-merged /subversion/branches/move-tracking-2:r1607334 Merged /subversion/trunk:r1606692-1663280,1663281-1665165* This appears to say, "In comparison to trunk, one move-tracking-2 branch commit was reverse-merged on move-tracking-2." That's wrong: rather, that revision was merged from the branch to trunk in the past and is currently common to both. And, "Lots of stuff from trunk was merged to the branch." That's wrong too. There *is* a difference in the mergeinfo properties that corresponds to what is printed here, but that difference in properties does not represent a difference in the versioned states of the branches. It is an implementation detail. We should have a diff that does not show mergeinfo implementation differences. Why? Not merely because seeing the mergeinfo is "a distraction" or "inconvenient", but because the "pure" difference between versioned states is a semantically useful concept in its own right. OK, we already knew about the leaky abstraction of mergeinfo. What else? Copy-from. This is a much deeper issue. Let's start by noting that our diff commands present different results depending on copy-from information, modified by flags such as '--show-copies-as-adds' and '--diff-copy-from'. There are many inconsistencies in these behaviours, some of which have been discussed and filed as bugs including: http://subversion.tigris.org/issues/show_bug.cgi?id=4455 http://subversion.tigris.org/issues/show_bug.cgi?id=4464 A very long email thread linked from #4455 discusses some of these inconsistencies, but fails to reach an understanding of the real problem and solution. Some of our differencing APIs don't present copy-from information on adds (svn_ra_do_diff3). Others can do (svn_ra_do_switch3), but what exactly does it mean? Both the left and right 'sides' of the diff may be trees that were copied from somewhere and modified with sub-tree copies. Does the copy-from of 'D/foo' on the right-hand side, say, mean the source of the most recent copy where this node was the copy-root; or the most recent copy that included this node even if the copy root was some parent directory; or only describe it as a copy if the copy target revision was revision DIFF_RIGHT itself; or what? So far we were probably thinking of a diff where the right-hand side is a newer revision of the same line of history as the left, but what if the right-hand side is in fact the older revision, do we apply different rules? And if both sides are at the same revision (say) but in different branches, then what? The underlying problem, I suggest, is that we haven't clearly defined how copying fits into the model of 'versioned state' and 'difference'. That's something I've been thinking on, and I have some ideas which I'll share. - Julian