Ben,

There's a little demo app that was written by someone at Basho that
demonstrates a way to accomplish what you're talking about.

http://forms.basho.com/riak-in-action-wriaki-p/

Eric.

On Mon, Apr 18, 2011 at 11:05 PM, Sean Cribbs <[email protected]> wrote:
> Sorry for being dismissive, I do understand what you're after. I'm just 
> saying that if your application needs those semantics, build them in -- don't 
> expect Riak's vector clocks to do the work for you. Keep a list of the most 
> recent "change" events either in that object or alongside, or keep a copy of 
> the last-seen version in your object -- whatever works to make those kinds of 
> merges possible.
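One way to follow the "keep a copy of the last-seen version" suggestion is shown in the minimal Python sketch below. Each write stores the new value together with the version the writer read, under a hypothetical `base` key. If all siblings record the same base, that base can serve as the common ancestor. The envelope layout and the function names are assumptions for illustration, not a Riak feature or client API.

```python
import copy
from typing import Optional


def make_envelope(new_value: dict, last_seen: Optional[dict]) -> dict:
    """Wrap a value with a copy of the version it was derived from.

    The "base" key is a hypothetical application-level convention.
    last_seen is None when the writer saw no previous version.
    """
    return {"value": copy.deepcopy(new_value), "base": copy.deepcopy(last_seen)}


def common_base(siblings: list[dict]) -> Optional[dict]:
    """Return the base shared by every sibling, or None.

    None means the siblings disagree about their base, there are no
    siblings, or no writer saw a previous version.
    """
    if not siblings:
        return None
    first = siblings[0]["base"]
    if all(s["base"] == first for s in siblings[1:]):
        return first
    return None
```

If two clients both read `{"name": "Jane Doe", "occupation": "secretary"}` and then wrote concurrently, `common_base` returns that original record, which is the ancestor a three-way merge needs. Keeping only one prior version bounds the overhead, at the cost of getting None when writers started from different versions.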
>
> Interestingly, multiple people have explored the SCM-on-top-of-Riak thing, so 
> I know it's doable; the key difference there is that multiple, independently 
> written objects are used to represent the history of a single conceptual 
> "object". Once written, nothing is overwritten, only new objects are created.
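A minimal sketch of the append-only approach follows, assuming a plain Python dict in place of a Riak bucket. Every version is written once under a key derived from its content and lists its parent keys, so a nearest common ancestor of two versions can be found by walking parent links. The key scheme (SHA-1 of the JSON body) and all names are illustrative assumptions.

```python
import hashlib
import json
from typing import Optional

# A plain dict stands in for a bucket (assumption). Versions are never
# overwritten; new writes only ever add new keys.
store: dict[str, dict] = {}


def put_version(value: dict, parents: list[str]) -> str:
    """Write an immutable version object and return its key."""
    body = {"value": value, "parents": sorted(parents)}
    key = hashlib.sha1(json.dumps(body, sort_keys=True).encode()).hexdigest()
    store.setdefault(key, body)  # identical content maps to the same key
    return key


def ancestors(key: str) -> set[str]:
    """All keys reachable from key via parent links, including key itself."""
    seen: set[str] = set()
    stack = [key]
    while stack:
        k = stack.pop()
        if k not in seen:
            seen.add(k)
            stack.extend(store[k]["parents"])
    return seen


def common_ancestor(a: str, b: str) -> Optional[str]:
    """Return a nearest common ancestor of versions a and b, or None."""
    shared = ancestors(a) & ancestors(b)
    # A shared ancestor is nearest if it is not a proper ancestor of
    # another shared ancestor.
    for k in shared:
        if not any(k in ancestors(other) - {other} for other in shared):
            return k
    return None
```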
>
> Sean Cribbs <[email protected]>
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
>
> On Apr 18, 2011, at 10:46 PM, Ben Tilly wrote:
>
>> I'm not missing the point you think I am.  Riak already has the
>> ability to store more than one value for a key/value pair.  I'd like
>> an option (possibly named something new) that used this to store a
>> limited amount of history so that clients could be presented with a
>> common ancestor when that was required.
>>
>> In the case that I gave you, if the common ancestor is:
>>
>>  {
>>    "name": "Jane Doe",
>>    "occupation": "secretary"
>>  }
>>
>> then a standard three-way merge would say that she got married and the
>> correct result should be:
>>
>>  {
>>    "name": "Jane Blow",
>>    "husband": "Joe Blow",
>>    "occupation": "n/a"
>>  }
>>
>> while if the common ancestor is:
>>
>>  {
>>    "name": "Jane Blow",
>>    "husband": "Joe Blow",
>>    "occupation": "n/a"
>>  }
>>
>> then a standard 3-way merge would say that she dumped the jerk and got
>> a job resulting in:
>>
>>  {
>>    "name": "Jane Doe",
>>    "occupation": "secretary"
>>  }
>>
>> Without the common ancestor you know what changed, but not which
>> direction the changes are going, and so have no sane way to resolve
>> the conflict.
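The "standard three-way merge" described above can be sketched at field level in Python. This is one simple variant, assumed here rather than taken from the thread: a field changed on only one side takes that side's value, and a field changed differently on both sides is reported as a conflict. An absent field is treated as a distinct value, so additions and deletions merge like edits.

```python
_MISSING = object()  # sentinel meaning "field absent"


def three_way_merge(base: dict, left: dict, right: dict) -> tuple[dict, list[str]]:
    """Field-level three-way merge of two JSON-like objects.

    Returns the merged object and a sorted list of conflicting fields.
    On a conflict, left's value is kept (an arbitrary choice).
    """
    merged: dict = {}
    conflicts: list[str] = []
    for key in base.keys() | left.keys() | right.keys():
        bv = base.get(key, _MISSING)
        lv = left.get(key, _MISSING)
        rv = right.get(key, _MISSING)
        if lv == rv:
            result = lv      # both sides agree
        elif lv == bv:
            result = rv      # only right changed this field
        elif rv == bv:
            result = lv      # only left changed this field
        else:
            conflicts.append(key)
            result = lv
        if result is not _MISSING:
            merged[key] = result
    return merged, sorted(conflicts)
```

Given the two siblings from the earlier message, the first ancestor yields the "got married" record and the second yields the "got a job" record; with no base at all, the same two inputs are ambiguous.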
>>
>> Given the non-atomic nature of reads and writes in Riak, it is likely
>> that neither of the two clients that wrote that data was in any way
>> aware of the existence of the other write.  This makes your suggestion
>> of escalating to the user impossible.  And there is no particular
>> reason to believe that the third user to come along will necessarily
>> know anything either.
>>
>> (Besides, I spent enough years maintaining batch systems to be wary of
>> escalating to users at the drop of a hat.  The "user" may well be a
>> complete moron on autopilot.)
>>
>> On Mon, Apr 18, 2011 at 7:01 PM, Sean Cribbs <[email protected]> wrote:
>>> I think you're missing a key point here, and that is that the vector clock 
>>> doesn't store copies of the *values*, only the individual "touches" of 
>>> identified clients. I'm not sure what computing the common ancestor is 
>>> going to give you if you don't have the value.  Vector clocks are 
>>> essentially opaque to clients.
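To make this point concrete, here is a simplified vector-clock model in Python, mapping an actor id to a counter; Riak's own implementation differs in detail. Comparing clocks tells you whether two versions are concurrent, and the pointwise minimum gives the clock of their shared history, but nothing in the clocks lets you recover that ancestor's value.

```python
# Simplified model: actor id -> counter. It records who touched an object
# and how often; it never holds a copy of any value.
VClock = dict[str, int]


def increment(clock: VClock, actor: str) -> VClock:
    """Return a new clock with actor's counter bumped by one."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def descends(a: VClock, b: VClock) -> bool:
    """True if a has seen every event recorded in b."""
    return all(a.get(actor, 0) >= n for actor, n in b.items())


def concurrent(a: VClock, b: VClock) -> bool:
    """True if neither clock descends from the other, i.e. siblings."""
    return not descends(a, b) and not descends(b, a)


def shared_history(a: VClock, b: VClock) -> VClock:
    """Pointwise minimum: the clock of the history both versions share.

    This identifies the common ancestor's clock only; its value is not
    recoverable from the clocks.
    """
    common = {actor: min(a[actor], b[actor]) for actor in a.keys() & b.keys()}
    return {actor: n for actor, n in common.items() if n > 0}
```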
>>>
>>> That said, I think the use-case you gave is one that can clearly bubble up 
>>> to the user, e.g. "Someone else changed this record while you were editing 
>>> it. Can you resolve the differences?" (Give the other person's name 
>>> perhaps, highlight the fields that are different.)
>>>
>>> Sean Cribbs <[email protected]>
>>> Developer Advocate
>>> Basho Technologies, Inc.
>>> http://basho.com/
>>>
>>> On Apr 18, 2011, at 9:12 PM, Ben Tilly wrote:
>>>
>>>> Riak's small_vclock, big_vclock, young_vclock, and old_vclock
>>>> parameters already give control over pruning behavior.  If there isn't
>>>> enough history to compute a common ancestor, then return nothing for
>>>> the common ancestor.
>>>>
>>>> The use case here really isn't an SCM.  The use case is when two
>>>> clients get simultaneous (within, say, 50 ms) requests to write to the
>>>> same object.  When a third one tries to read the data 5s later, it
>>>> would be nice to have a way to figure out what to do.  For this use
>>>> case you can limit the amount of history quite severely without loss.
>>>>
>>>> Let's take a practical example of conflicting data structures:
>>>>
>>>>  {
>>>>    "name": "Jane Doe",
>>>>    "occupation": "n/a"
>>>>  },
>>>>  {
>>>>    "name": "Jane Blow",
>>>>    "husband": "Joe Blow",
>>>>    "occupation": "secretary"
>>>>  }
>>>>
>>>> What should it be resolved to?  Perhaps Jane just got divorced and
>>>> went to work as a secretary.  Or she could have gotten married and
>>>> left her job.  If you give me the common ancestor I can tell which
>>>> scenario to believe.  Without it I can only guess badly.  I don't want
>>>> to keep a history here.  I want to resolve the discrepancy the next
>>>> time I see it (and log it somewhere important if I can't resolve it).
>>>>
>>>> On Mon, Apr 18, 2011 at 5:38 PM, Sean Cribbs <[email protected]> wrote:
>>>>> Yes, but vector clocks are for resolution of race-conditions and network 
>>>>> partitions, not to provide an SCM history.  Imagine how much space would 
> >>>>> be consumed by a history long enough to disambiguate an object that has
> >>>>> been updated normally 1000 times, followed by one bad client that decides
> >>>>> to write to it without fetching the vector clock first.
>>>>>
>>>>> Coda Hale put it well in his talk at the recent Riak Meetup: your data 
>>>>> needs to be logically monotonic so that writes (and reads) can be retried 
>>>>> until resolution is reached.
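One common reading of "logically monotonic" data is a value whose updates only ever add information, so siblings merge by a commutative, idempotent operation and retries are harmless. As an assumed example, not from the talk, a hypothetical tags field kept as a grow-only set resolves siblings by union. Supporting removals would need extra machinery such as tombstones, which this sketch omits.

```python
from functools import reduce


def add(tags: frozenset[str], tag: str) -> frozenset[str]:
    """The only update allowed: adding an element never loses information."""
    return tags | {tag}


def resolve(siblings: list[frozenset[str]]) -> frozenset[str]:
    """Merge siblings by set union.

    Union is commutative, associative and idempotent, so the result does
    not depend on sibling order, and re-merging or retrying a write that
    was already applied changes nothing.
    """
    return reduce(frozenset.union, siblings, frozenset())
```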
>>>>>
>>>>> Also, we've found that assigning the client id to something that is 
>>>>> relevant to your domain, e.g. real people, will help reduce surprises 
>>>>> (and degenerate cases like sibling explosion) when it comes to 
>>>>> vector-clock resolution.
>>>>>
>>>>> Sean Cribbs <[email protected]>
>>>>> Developer Advocate
>>>>> Basho Technologies, Inc.
>>>>> http://basho.com/
>>>>>
>>>>> On Apr 18, 2011, at 8:15 PM, Aphyr wrote:
>>>>>
>>>>>>> I actually had a question about that page.  Why is it that when there
>>>>>>> is a conflict we can only get the conflicting versions of the data?
>>>>>>> If I'm going to try to resolve the conflict intelligently, I really
>>>>>>> want the common ancestor as well so that I can try to do a 3-way
>>>>>>> merge.
>>>>>>
>>>>>> Good call. If an ancestor were available it would make counting and 
>>>>>> merging orthogonal changes *much* simpler.
>>>>>>
>>>>>> _______________________________________________
>>>>>> riak-users mailing list
>>>>>> [email protected]
>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>
>>>>>
>>>
>>>
>
>
