Re: About Vector Clocks

Kresten Krab Thorup Mon, 24 Jan 2011 05:07:11 -0800

OK, found it, ... if you want to follow my train of though, feel free to read 
along ...

Turns out vector clocks can be merged, even if one is not derived from the 
other. A vector clock is a list of {agent-id, time} pairs, which has no 
duplicate agent-id's in the list.

Merging is done by

        - including all clock entries that are only in one of the clocks, and
        - including the "latest" (greater) of time/clock entries that are in 
both.

This operation is commutative and transitive, and does not introduce any new 
information so it can perfectly be done on two distinct nodes (which may 
realize the conflict independently) and then later re-reconciled.

Reconciling the values is a simple operation of computing the union of the 
values (i.e. remove duplicates in the combined set).  Notice that if two agents 
did the same update, then the reconciled value is just a single value.

so .. the reconciled object ( vclock x [value] ) is in conflict exactly iff 
there is more than one value.  

When a client sees a multi-valued object, all it knows is that it in conflict.  
It can't say which agents caused the conflict, or when in terms of vector-clock 
time.  Only that it happened "before" the new vector clock time.

And it's important to develop some patterns or strategies for how to resolve 
conflicts.  Because this is one of the issues that most developers think are 
scary when they hear of these concepts.

Perhaps what I'd like to know is information like ... 

        "Peter updated the object to A at 15:00, and Eve updated it to B at 
14:58"

Such information may be useful to a client in order to handle the conflict.  

If you need such information, you'll have to push it through X-Riak-Meta- 
headers; because the meta data is part of the value.

Hmm, ... cool stuff.  Love learning about this.

Kresten

On Jan 24, 2011, at 11:24 , Kresten Krab Thorup wrote:

> Still trying to grok vector clocks, here's a question.
> 
> When riak realizes that two peers did successfully update the same key, those 
> two riak_object's will have different (one not deriving from the other) 
> vector clocks.  This could happen when e.g. synchronizing using long-haul 
> replication.
> 
> When these are then consolidated into a single multi-valued object, how is 
> the vector clock of the resulting object computed?  
> 
> In other words, why does it keep the two vclocks, one per value?  
> 
> I would think that a riak_object should be structured thus:
> 
>       object ::= { bucket,  key,  [ vclock x value ] }
> 
> But, it is structured like this:
> 
>       object ::= { bucket,  key,  vclock,  [ value ] }
> 
> This structure implies that a new vclock is computed by an agent doing the 
> merge [involving a client id of that agent], but what happens if two agents 
> independently realizes the conflict and tries to do the merge?  Will those 
> two independent merges have conflicting vector clocks?
> 
> So why not keep both vector clocks around, so that independent agents can do 
> merges that will then be conciliable because no new information is created 
> when merging?
> 
> Kresten
> 
> 
> 
> 
> 
> 
> 
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: About Vector Clocks

Reply via email to