On Tue, Nov 22, 2011 at 5:37 PM, Alex Besogonov <[email protected]> wrote:
> I'm trying to understand the conflict resolution protocol of CouchDB
> (the selection of the winning revision). So far I understand that
> CouchDB does essentially this:
>
> 1) Finds the revision with the highest number and if there are no
> other revisions with the same number then it is declared the winner.
> 2) If there are several revisions with the same revision number, then
> the one with the lowest revision ID is selected (Erlang's string
> comparison function is used to find the lowest string).
>
I'd avoid using the term "revision number" in this case because it suggests a serially incrementing value. I'd also avoid calling it "conflict resolution": it never attempts to resolve anything, it only identifies when a conflict exists. The basic algorithm can be described as:

"When multiple leaves in the revision tree exist in an undeleted state, there is a conflict. To choose which conflict 'wins' we first look for the revision with the greatest number of edits (i.e., the deepest path from the root). If multiple revisions have an equal depth we break the tie by arbitrary sorting criteria on the revision."

It's actually a fairly simple algorithm with a weird implementation and, as you have found out, little to no documentation outside a few snippets here and there.

> After the winner is found everything else is straightforward -
> revision trees are aligned, conflicting revisions are stored, extra
> revisions are stemmed, etc.
>

I remember thinking that before tearing my hair out over COUCHDB-1265. :D But yeah, once a general description of the algorithm exists it's not impossible to read through the implementation and finally see it snap into focus.

> I'm going to document all of my findings for the future developers who
> might be interested to use CouchDB with other systems.
>

That would be awesome. I've long been meaning to rewrite the replication algorithm as documented Python code so that it would be more approachable for non-Erlangers. At its core, it's a rather simple thing, but it requires that people learn an unfamiliar language to navigate some of the finer details.

Thanks for the effort.
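
For what it's worth, the winner selection described above can be sketched in a few lines of Python. This is only an illustration of the prose description, not a port of the Erlang code. The `Leaf` type, its field names, and both function names are made up for the example, and the tie-break direction (the highest hash string wins) is an assumption, since all that is claimed above is that the sort is arbitrary but consistent.

```python
from typing import List, NamedTuple


class Leaf(NamedTuple):
    """A leaf of a document's revision tree (hypothetical structure)."""
    depth: int             # number of edits on the path from the root
    rev_hash: str          # opaque revision identifier
    deleted: bool = False  # True if this leaf is a deletion


def find_conflicts(leaves: List[Leaf]) -> List[Leaf]:
    """Return the undeleted leaves if more than one exists, else []."""
    live = [leaf for leaf in leaves if not leaf.deleted]
    return live if len(live) > 1 else []


def pick_winner(leaves: List[Leaf]) -> Leaf:
    """Pick a winner: undeleted first, then deepest, then by hash.

    Assumption: ties are broken by taking the highest hash string.
    Any fixed ordering works, as long as every node applies the
    same one and therefore picks the same winner.
    """
    if not leaves:
        raise ValueError("a document needs at least one leaf")
    return max(leaves, key=lambda l: (not l.deleted, l.depth, l.rev_hash))


leaves = [
    Leaf(depth=2, rev_hash="abc"),
    Leaf(depth=3, rev_hash="aaa"),
    Leaf(depth=3, rev_hash="bbb"),
    Leaf(depth=5, rev_hash="zzz", deleted=True),
]
conflicts = find_conflicts(leaves)
winner = pick_winner(leaves)
```

Note that the deleted leaf loses even though it is the deepest, and that nothing is "resolved" here: the losing undeleted leaves remain in the tree as conflicts until someone deletes them.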
