Sorry for being dismissive; I do understand what you're after. I'm just saying 
that if your application needs those semantics, build them in -- don't expect 
Riak's vector clocks to do the work for you. Keep a list of the most recent 
"change" events either in that object or alongside, or keep a copy of the 
last-seen version in your object -- whatever works to make those kinds of 
merges possible.
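
A minimal sketch of the "keep a copy of the last-seen version in your object" idea, in Python. It assumes each stored value is wrapped in an envelope with two fields, "current" and "previous", where "previous" is whatever the writer read before making its change. That envelope layout and the function names are illustrative assumptions, not anything Riak provides. When two siblings record the same "previous", it serves as the common ancestor for a field-by-field three-way merge.

```python
_MISSING = object()  # sentinel meaning "field absent from this version"


def three_way_merge(ancestor, left, right):
    """Field-by-field three-way merge of flat dicts.

    Returns (merged, conflicts); conflicts lists the fields that both
    sides changed in different ways.
    """
    merged, conflicts = {}, []
    for key in sorted(set(ancestor) | set(left) | set(right)):
        base = ancestor.get(key, _MISSING)
        l = left.get(key, _MISSING)
        r = right.get(key, _MISSING)
        if l == r:          # both sides agree (or both removed the field)
            value = l
        elif l == base:     # only the right side changed this field
            value = r
        elif r == base:     # only the left side changed this field
            value = l
        else:               # both changed it differently
            conflicts.append(key)
            continue
        if value is not _MISSING:
            merged[key] = value
    return merged, conflicts


def resolve_siblings(a, b):
    """Merge two sibling envelopes if they record the same ancestor.

    Returns None when the recorded ancestors differ, in which case the
    application has to fall back to some other policy.
    """
    if a["previous"] != b["previous"]:
        return None
    return three_way_merge(a["previous"], a["current"], b["current"])
```

With the two Jane records from earlier in the thread as siblings, an ancestor of Jane Doe the secretary resolves to the married record, and an ancestor of Jane Blow with occupation "n/a" resolves to the divorced one.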

Interestingly, multiple people have explored the SCM-on-top-of-Riak thing, so I 
know it's doable; the key difference there is that multiple, independently 
written objects are used to represent the history of a single conceptual 
"object". Once written, nothing is overwritten; only new objects are created.

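The write-once approach can be sketched roughly as follows, in Python. A plain dict stands in for a bucket, and the key scheme (a SHA-1 of the record), the "parents" field, and the function names are all assumptions made for illustration; no Riak client calls are shown. Each version is a separate object that points at the versions it was derived from, so a common ancestor can be found by walking those pointers.

```python
import hashlib
import json

store = {}  # stand-in for a bucket: key -> immutable version object


def put_version(value, parents=()):
    """Write a new version object; existing keys are never overwritten."""
    record = {"value": value, "parents": sorted(parents)}
    encoded = json.dumps(record, sort_keys=True).encode("utf-8")
    key = hashlib.sha1(encoded).hexdigest()
    store.setdefault(key, record)
    return key


def ancestors(key):
    """All version keys reachable from key, including key itself."""
    seen, stack = set(), [key]
    while stack:
        k = stack.pop()
        if k not in seen:
            seen.add(k)
            stack.extend(store[k]["parents"])
    return seen


def common_ancestor(a, b):
    """Nearest version (breadth-first from a) also reachable from b, or None."""
    reachable_from_b = ancestors(b)
    queue, seen = [a], set()
    while queue:
        k = queue.pop(0)
        if k in reachable_from_b:
            return k
        if k not in seen:
            seen.add(k)
            queue.extend(store[k]["parents"])
    return None
```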
Sean Cribbs <s...@basho.com>
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

On Apr 18, 2011, at 10:46 PM, Ben Tilly wrote:

> I'm not missing the point you think I am.  Riak already has the
> ability to store more than one value for a key/value pair.  I'd like
> an option, possibly under a new name, that used this to store a
> limited amount of history so that clients could be presented with a
> common ancestor when that was required.
> 
> In the case that I gave you, if the common ancestor is:
> 
>  {
>    "name": "Jane Doe",
>    "occupation": "secretary"
>  }
> 
> then a standard three-way merge would say that she got married and the
> correct result should be:
> 
>  {
>    "name": "Jane Blow",
>    "husband": "Joe Blow",
>    "occupation": "n/a"
>  }
> 
> while if the common ancestor is:
> 
>  {
>    "name": "Jane Blow",
>    "husband": "Joe Blow",
>    "occupation": "n/a"
>  }
> 
> then a standard 3-way merge would say that she dumped the jerk and got
> a job resulting in:
> 
>  {
>    "name": "Jane Doe",
>    "occupation": "secretary"
>  }
> 
> Without the common ancestor you know what changed, but not which
> direction the changes are going, and so have no sane way to resolve
> the conflict.
> 
> Given the non-atomic nature of reads and writes in Riak, it is likely
> that neither of the two clients that wrote that data was in any way
> aware of the existence of the other write.  This makes your suggestion
> of escalating to the user impossible.  And there is no particular
> reason to believe that the third user to come along will necessarily
> know anything either.
> 
> (Besides, I spent enough years maintaining batch systems to be wary of
> escalating to users at the drop of a hat.  The "user" may well be a
> complete moron on autopilot.)
> 
> On Mon, Apr 18, 2011 at 7:01 PM, Sean Cribbs <s...@basho.com> wrote:
>> I think you're missing a key point here, and that is that the vector clock 
>> doesn't store copies of the *values*, only the individual "touches" of 
>> identified clients. I'm not sure what computing the common ancestor is going 
>> to give you if you don't have the value.  Vector clocks are essentially 
>> opaque to clients.
>> 
>> That said, I think the use-case you gave is one that can clearly bubble up 
>> to the user, e.g. "Someone else changed this record while you were editing 
>> it. Can you resolve the differences?" (Give the other person's name perhaps, 
>> highlight the fields that are different.)
>> 
>> Sean Cribbs <s...@basho.com>
>> Developer Advocate
>> Basho Technologies, Inc.
>> http://basho.com/
>> 
>> On Apr 18, 2011, at 9:12 PM, Ben Tilly wrote:
>> 
>>> Riak's small_vclock, big_vclock, young_vclock, and old_vclock
>>> parameters already give control over pruning behavior.  If there isn't
>>> enough history to compute a common ancestor, then return nothing for
>>> the common ancestor.
>>> 
>>> The use case here really isn't an SCM.  The use case is when two
>>> clients get simultaneous (within, say, 50 ms) requests to write to the
>>> same object.  When a third one tries to read the data 5s later, it
>>> would be nice to have a way to figure out what to do.  For this use
>>> case you can limit the amount of history quite severely without loss.
>>> 
>>> Let's take a practical example of conflicting data structures:
>>> 
>>>  {
>>>    "name": "Jane Doe",
>>>    "occupation": "n/a"
>>>  },
>>>  {
>>>    "name": "Jane Blow",
>>>    "husband": "Joe Blow",
>>>    "occupation": "secretary"
>>>  }
>>> 
>>> What should it be resolved to?  Perhaps Jane just got divorced and
>>> went to work as a secretary.  Or she could have gotten married and
>>> left her job.  If you give me the common ancestor I can tell which
>>> scenario to believe.  Without it I can only guess badly.  I don't want
>>> to keep a history here.  I want to resolve the discrepancy the next
>>> time I see it (and log it somewhere important if I can't resolve it).
>>> 
>>> On Mon, Apr 18, 2011 at 5:38 PM, Sean Cribbs <s...@basho.com> wrote:
>>>> Yes, but vector clocks are for resolution of race-conditions and network 
>>>> partitions, not to provide an SCM history.  Imagine how much space would 
>>>> be consumed by the history long enough to disambiguate an object that has 
>>>> been updated normally 1000 times, followed by one bad client that decides 
>>>> to write to it without fetching the vector clock first.
>>>> 
>>>> Coda Hale put it well in his talk at the recent Riak Meetup: your data 
>>>> needs to be logically monotonic so that writes (and reads) can be retried 
>>>> until resolution is reached.
>>>> 
>>>> Also, we've found that assigning the client id to something that is 
>>>> relevant to your domain, e.g. real people, will help reduce surprises (and 
>>>> degenerate cases like sibling explosion) when it comes to vector-clock 
>>>> resolution.
>>>> 
>>>> Sean Cribbs <s...@basho.com>
>>>> Developer Advocate
>>>> Basho Technologies, Inc.
>>>> http://basho.com/
>>>> 
>>>> On Apr 18, 2011, at 8:15 PM, Aphyr wrote:
>>>> 
>>>>>> I actually had a question about that page.  Why is it that when there
>>>>>> is a conflict we can only get the conflicting versions of the data?
>>>>>> If I'm going to try to resolve the conflict intelligently, I really
>>>>>> want the common ancestor as well so that I can try to do a 3-way
>>>>>> merge.
>>>>> 
>>>>> Good call. If an ancestor were available it would make counting and 
>>>>> merging orthogonal changes *much* simpler.
>>>>> 
>>>>> _______________________________________________
>>>>> riak-users mailing list
>>>>> riak-users@lists.basho.com
>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>> 
>>>> 
>> 
>> 


