On Apr 18, 2011, at 7:46 PM, Ben Tilly wrote:

>> Given the non-atomic nature of reads and writes in Riak, it is likely
>> that neither of the two clients that wrote that data was in any way
>> aware of the existence of the other write.  This makes your suggestion
>> of escalating to the user impossible.  And there is no particular
>> reason to believe that the third user to come along will necessarily
>> know anything either.

Thanks for asking this. I thought the same thing, and I've been meaning to ask
the list how people respond to this problem.

On Apr 18, 2011, at 8:05 PM, Sean Cribbs wrote:

> Sorry for being dismissive, I do understand what you're after. I'm just 
> saying that if your application needs those semantics, build them in -- don't 
> expect Riak's vector clocks to do the work for you.


It's not an unreasonable question given Riak's positioning (write-available 
key-value store with conflict *detection* and document support). A lot of 
people, myself included, are new to these types of datastores (vector clocks in 
particular) and it takes a while to understand what the different options 
provide. For example, Riak is listed right next to CouchDB in the Wikipedia
definition of multiversion concurrency control
(http://en.wikipedia.org/wiki/Multiversion_concurrency_control). Anyone coming
from an understanding of Couch, or looking to get document conflict
resolution via MVCC, might make incorrect assumptions about what Riak provides.
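
For anyone else new to vector clocks, here is a minimal sketch of what
"detection" means, as far as I understand it. This is a toy model, not Riak's
actual encoding or API: the dict-of-counters representation and the names
descends and concurrent are my own assumptions for illustration.

```python
# Illustrative only: a toy vector clock, not Riak's real format or client API.
# A clock maps a client id to the number of writes seen from that client.

def descends(a, b):
    """True if clock a has seen every write that clock b has seen."""
    return all(a.get(client, 0) >= count for client, count in b.items())

def concurrent(a, b):
    """True if neither clock descends from the other, i.e. a conflict."""
    return not descends(a, b) and not descends(b, a)

base = {"alice": 1}
left = {"alice": 1, "bob": 1}     # bob wrote on top of base
right = {"alice": 1, "carol": 1}  # carol also wrote on top of base

print(descends(left, base))     # True: left is a successor of base
print(concurrent(left, right))  # True: siblings are detected, not resolved
```

The clocks only record who touched the object. They can say that left and
right conflict, but they hold no values, so they cannot say how to merge them.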

>  Keep a list of the most recent "change" events either in that object or 
> alongside, or keep a copy of the last-seen version in your object -- whatever 
> works to make those kinds of merges possible.


Thanks for clarifying :) The docs advertise vector clocks as a way to detect
conflicts with the goal of passing them up to the user, but it took me a
while to work out what options I had for helping the user with the conflict
(standard approaches seem to take on a slightly new light when your document
is always writable :)

If anyone's interested in Couch's experience and/or wants to play with adding 
document conflict resolution to their Riak, 
https://issues.apache.org/jira/browse/COUCHDB-988 is a must-read.
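
To make the "keep a copy of the last-seen version" suggestion concrete, here
is a rough sketch of a field-by-field three-way merge over flat documents like
Ben's example. It assumes each writer stored the version it last read next to
its own value, so the reader has a base to merge against. The function name
three_way_merge and the conflict handling are my own choices, not anything
Riak provides.

```python
_MISSING = object()  # sentinel meaning "field not present in this version"

def three_way_merge(base, left, right):
    """Merge two flat dicts against their common ancestor.

    Returns (merged, conflicts). A field lands in conflicts, and is left
    out of merged, when both sides changed it to different values.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(left) | set(right)):
        b = base.get(key, _MISSING)
        l = left.get(key, _MISSING)
        r = right.get(key, _MISSING)
        if l == r:        # both sides agree (including "both deleted")
            value = l
        elif l == b:      # only the right side changed this field
            value = r
        elif r == b:      # only the left side changed this field
            value = l
        else:             # both changed it differently: cannot decide
            conflicts.append(key)
            continue
        if value is not _MISSING:
            merged[key] = value
    return merged, conflicts

# The two siblings from Ben's example.
left = {"name": "Jane Doe", "occupation": "n/a"}
right = {"name": "Jane Blow", "husband": "Joe Blow", "occupation": "secretary"}

# Same siblings, two possible ancestors, two different answers.
single = {"name": "Jane Doe", "occupation": "secretary"}
married = {"name": "Jane Blow", "husband": "Joe Blow", "occupation": "n/a"}

print(three_way_merge(single, left, right))
print(three_way_merge(married, left, right))
```

With the first ancestor the merge says she got married and left her job; with
the second it says she got divorced and went to work as a secretary. The
siblings alone cannot distinguish the two.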

On Apr 18, 2011, at 9:01 PM, Eric Moritz wrote:

> There's a little demo app that was written by someone at Basho that
> demonstrates a way to accomplish what you're talking about.
> 
> http://forms.basho.com/riak-in-action-wriaki-p/

Version control needs conflict resolution, but the Couch ticket I referenced,
for example, also covers pruning strategies and tradeoffs, which are a relevant
part of conflict resolution but not of version control. You might have meant it
as a good starting/reference point, but since his problem seems to have been
misunderstood in previous emails, I want to make sure to acknowledge that
he's talking about optimal strategies for *just* conflict resolution.
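
For contrast, here is a rough sketch of the write-once approach Sean describes
for the SCM-style case, where every version is its own object and nothing is
overwritten. A plain dict stands in for a bucket, and the names put_version,
ancestors and common_ancestor are hypothetical; I am not claiming this is how
the demo app or any Riak client does it.

```python
import hashlib
import json

store = {}  # stand-in for a bucket; existing keys are never overwritten

def put_version(value, parent=None):
    """Store an immutable version and return its content-derived key."""
    body = json.dumps({"value": value, "parent": parent}, sort_keys=True)
    key = hashlib.sha1(body.encode("utf-8")).hexdigest()
    store.setdefault(key, body)
    return key

def ancestors(key):
    """Yield key and its ancestors, newest first, stopping at pruned history."""
    while key is not None and key in store:
        yield key
        key = json.loads(store[key])["parent"]

def common_ancestor(a, b):
    """Return the nearest version reachable from both a and b, or None."""
    seen = set(ancestors(a))
    for key in ancestors(b):
        if key in seen:
            return key
    return None

root = put_version({"name": "Jane Doe", "occupation": "secretary"})
left = put_version({"name": "Jane Doe", "occupation": "n/a"}, parent=root)
right = put_version({"name": "Jane Blow", "husband": "Joe Blow",
                     "occupation": "secretary"}, parent=root)

print(common_ancestor(left, right) == root)  # True
```

If the root has been pruned, common_ancestor returns None, which is the point
where pruning strategy starts to matter for conflict resolution.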

Thanks all, happy to see this get discussed.

-Woody

(PS - sorry Sean for sending this to you twice, my reply-all habits depend 
entirely on whatever mailing list I'm frequenting. I do prefer non-mangling 
lists such as this :)

On Apr 18, 2011, at 8:05 PM, Sean Cribbs wrote:

> Sorry for being dismissive, I do understand what you're after. I'm just 
> saying that if your application needs those semantics, build them in -- don't 
> expect Riak's vector clocks to do the work for you. Keep a list of the most 
> recent "change" events either in that object or alongside, or keep a copy of 
> the last-seen version in your object -- whatever works to make those kinds of 
> merges possible.
> 
> Interestingly, multiple people have explored the SCM-on-top-of-Riak thing, so 
> I know it's doable; the key difference there is that multiple, independently 
> written objects are used to represent the history of a single conceptual 
> "object". Once written, nothing is overwritten, only new objects are created.
> 
> Sean Cribbs <s...@basho.com>
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
> 
> On Apr 18, 2011, at 10:46 PM, Ben Tilly wrote:
> 
>> I'm not missing the point you think I am.  Riak already has the
>> ability to store more than one value for a key/value pair.  I'd like
>> an option - possibly named something new, that used this to store a
>> limited amount of history so that clients could be presented with a
>> common ancestor when that was required.
>> 
>> In the case that I gave you, if the common ancestor is:
>> 
>> {
>>   "name": "Jane Doe",
>>   "occupation": "secretary"
>> }
>> 
>> then a standard three-way merge would say that she got married and the
>> correct result should be:
>> 
>> {
>>   "name": "Jane Blow",
>>   "husband": "Joe Blow",
>>   "occupation": "n/a"
>> }
>> 
>> while if the common ancestor is:
>> 
>> {
>>   "name": "Jane Blow",
>>   "husband": "Joe Blow",
>>   "occupation": "n/a"
>> }
>> 
>> then a standard 3-way merge would say that she dumped the jerk and got
>> a job resulting in:
>> 
>> {
>>   "name": "Jane Doe",
>>   "occupation": "secretary"
>> }
>> 
>> Without the common ancestor you know what changed, but not which
>> direction the changes are going, and so have no sane way to resolve
>> the conflict.
>> 
>> Given the non-atomic nature of reads and writes in Riak, it is likely
>> that neither of the two clients that wrote that data was in any way
>> aware of the existence of the other write.  This makes your suggestion
>> of escalating to the user impossible.  And there is no particular
>> reason to believe that the third user to come along will necessarily
>> know anything either.
>> 
>> (Besides, I spent enough years maintaining batch systems to be wary of
>> escalating to users at the drop of a hat.  The "user" may well be a
>> complete moron on autopilot.)
>> 
>> On Mon, Apr 18, 2011 at 7:01 PM, Sean Cribbs <s...@basho.com> wrote:
>>> I think you're missing a key point here, and that is that the vector clock 
>>> doesn't store copies of the *values*, only the individual "touches" of 
>>> identified clients. I'm not sure what computing the common ancestor is 
>>> going to give you if you don't have the value.  Vector clocks are 
>>> essentially opaque to clients.
>>> 
>>> That said, I think the use-case you gave is one that can clearly bubble up 
>>> to the user, e.g. "Someone else changed this record while you were editing 
>>> it. Can you resolve the differences?" (Give the other person's name 
>>> perhaps, highlight the fields that are different.)
>>> 
>>> Sean Cribbs <s...@basho.com>
>>> Developer Advocate
>>> Basho Technologies, Inc.
>>> http://basho.com/
>>> 
>>> On Apr 18, 2011, at 9:12 PM, Ben Tilly wrote:
>>> 
>>>> Riak's small_vclock, big_vclock, young_vclock, and old_vclock
>>>> parameters already give control over pruning behavior.  If there isn't
>>>> enough history to compute a common ancestor, then return nothing for
>>>> the common ancestor.
>>>> 
>>>> The use case here really isn't an SCM.  The use case is when two
>>>> clients get simultaneous (within, say, 50 ms) requests to write to the
>>>> same object.  When a third one tries to read the data 5s later, it
>>>> would be nice to have a way to figure out what to do.  For this use
>>>> case you can limit the amount of history quite severely without loss.
>>>> 
>>>> Let's take a practical example of conflicting data structures:
>>>> 
>>>> {
>>>>   "name": "Jane Doe",
>>>>   "occupation": "n/a"
>>>> },
>>>> {
>>>>   "name": "Jane Blow",
>>>>   "husband": "Joe Blow",
>>>>   "occupation": "secretary"
>>>> }
>>>> 
>>>> What should it be resolved to?  Perhaps Jane just got divorced and
>>>> went to work as a secretary.  Or she could have gotten married and
>>>> left her job.  If you give me the common ancestor I can tell which
>>>> scenario to believe.  Without it I can only guess badly.  I don't want
>>>> to keep a history here.  I want to resolve the discrepancy the next
>>>> time I see it (and log it somewhere important if I can't resolve it).
>>>> 
>>>> On Mon, Apr 18, 2011 at 5:38 PM, Sean Cribbs <s...@basho.com> wrote:
>>>>> Yes, but vector clocks are for resolution of race-conditions and network 
>>>>> partitions, not to provide an SCM history.  Imagine how much space would 
>>>>> be consumed by the history long enough to disambiguate an object that has 
>>>>> been updated normally 1000 times, followed by one bad client that decides 
>>>>> to write to it without fetching the vector clock first.
>>>>> 
>>>>> Coda Hale put it well in his talk at the recent Riak Meetup: your data 
>>>>> needs to be logically monotonic so that writes (and reads) can be retried 
>>>>> until resolution is reached.
>>>>> 
>>>>> Also, we've found that assigning the client id to something that is 
>>>>> relevant to your domain, e.g. real people, will help reduce surprises 
>>>>> (and degenerate cases like sibling explosion) when it comes to 
>>>>> vector-clock resolution.
>>>>> 
>>>>> Sean Cribbs <s...@basho.com>
>>>>> Developer Advocate
>>>>> Basho Technologies, Inc.
>>>>> http://basho.com/
>>>>> 
>>>>> On Apr 18, 2011, at 8:15 PM, Aphyr wrote:
>>>>> 
>>>>>>> I actually had a question about that page.  Why is it that when there
>>>>>>> is a conflict we can only get the conflicting versions of the data?
>>>>>>> If I'm going to try to resolve the conflict intelligently, I really
>>>>>>> want the common ancestor as well so that I can try to do a 3-way
>>>>>>> merge.
>>>>>> 
>>>>>> Good call. If an ancestor were available it would make counting and 
>>>>>> merging orthogonal changes *much* simpler.
>>>>>> 
>>>>>> _______________________________________________
>>>>>> riak-users mailing list
>>>>>> riak-users@lists.basho.com
>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>> 
>>>>> 
>>> 
>>> 
> 
> 

