Re: This sure looks like a bug...?

Ben Tilly Mon, 18 Apr 2011 19:46:42 -0700

I'm not missing the point you think I am.  Riak already has the
ability to store more than one value for a key/value pair.  I'd like
an option, possibly named something new, that used this to store a
limited amount of history so that clients could be presented with a
common ancestor when that was required.


In the case that I gave you, if the common ancestor is:

  {
    "name": "Jane Doe",
    "occupation": "secretary"
  }

then a standard three-way merge would say that she got married and the
correct result should be:

  {
    "name": "Jane Blow",
    "husband": "Joe Blow",
    "occupation": "n/a"
  }

while if the common ancestor is:

  {
    "name": "Jane Blow",
    "husband": "Joe Blow",
    "occupation": "n/a"
  }

then a standard three-way merge would say that she dumped the jerk and got
a job resulting in:

  {
    "name": "Jane Doe",
    "occupation": "secretary"
  }

Without the common ancestor you know what changed, but not which
direction the changes are going, and so have no sane way to resolve
the conflict.
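
To make that concrete, here is a rough sketch in Python of the kind of
field-by-field three-way merge I have in mind.  Nothing here is a Riak
API: the function name three_way_merge, the MISSING sentinel, and the
assumption that the values are flat JSON objects are all mine, purely
for illustration.

```python
# Hypothetical sketch, not a Riak API: merge two flat JSON objects
# (the siblings) relative to a common ancestor, one field at a time.

# Sentinel for "key is absent", so a deleted field is a detectable change.
MISSING = object()

def three_way_merge(ancestor, a, b):
    """Return (merged, conflicts) for flat dicts a and b.

    A field changed on only one side takes that side's value.  A field
    changed differently on both sides is reported in conflicts and left
    out of merged.
    """
    merged, conflicts = {}, []
    for key in sorted(set(ancestor) | set(a) | set(b)):
        base = ancestor.get(key, MISSING)
        va = a.get(key, MISSING)
        vb = b.get(key, MISSING)
        if va == vb:          # both sides agree (or neither touched it)
            value = va
        elif va == base:      # only b changed this field
            value = vb
        elif vb == base:      # only a changed this field
            value = va
        else:                 # both changed it, in different ways
            conflicts.append(key)
            continue
        if value is not MISSING:
            merged[key] = value
    return merged, conflicts

# The two conflicting siblings from my earlier message.
sibling_1 = {"name": "Jane Doe", "occupation": "n/a"}
sibling_2 = {"name": "Jane Blow", "husband": "Joe Blow",
             "occupation": "secretary"}

# The two candidate ancestors discussed above.
was_single = {"name": "Jane Doe", "occupation": "secretary"}
was_married = {"name": "Jane Blow", "husband": "Joe Blow",
               "occupation": "n/a"}

married_result, married_conflicts = three_way_merge(
    was_single, sibling_1, sibling_2)
divorced_result, divorced_conflicts = three_way_merge(
    was_married, sibling_1, sibling_2)
```

The same two siblings merge cleanly in both cases, but to opposite
answers depending on which ancestor is supplied, which is exactly why
the siblings alone are not enough.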

Given the non-atomic nature of reads and writes in Riak, it is likely
that neither of the two clients that wrote that data was in any way
aware of the existence of the other write.  This makes your suggestion
of escalating to the user impossible.  And there is no particular
reason to believe that the third user to come along will necessarily
know anything either.

(Besides, I spent enough years maintaining batch systems to be wary of
escalating to users at the drop of a hat.  The "user" may well be a
complete moron on autopilot.)

On Mon, Apr 18, 2011 at 7:01 PM, Sean Cribbs <s...@basho.com> wrote:
> I think you're missing a key point here, and that is that the vector clock 
> doesn't store copies of the *values*, only the individual "touches" of 
> identified clients. I'm not sure what computing the common ancestor is going 
> to give you if you don't have the value.  Vector clocks are essentially 
> opaque to clients.
>
> That said, I think the use-case you gave is one that can clearly bubble up to 
> the user, e.g. "Someone else changed this record while you were editing it. 
> Can you resolve the differences?" (Give the other person's name perhaps, 
> highlight the fields that are different.)
>
> Sean Cribbs <s...@basho.com>
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
>
> On Apr 18, 2011, at 9:12 PM, Ben Tilly wrote:
>
>> Riak's small_vclock, big_vclock, young_vclock, and old_vclock
>> parameters already give control over pruning behavior.  If there isn't
>> enough history to compute a common ancestor, then return nothing for
>> the common ancestor.
>>
>> The use case here really isn't an SCM.  The use case is when two
>> clients get simultaneous (within, say, 50 ms) requests to write to the
>> same object.  When a third one tries to read the data 5s later, it
>> would be nice to have a way to figure out what to do.  For this use
>> case you can limit the amount of history quite severely without loss.
>>
>> Let's take a practical example of conflicting data structures:
>>
>>  {
>>    "name": "Jane Doe",
>>    "occupation": "n/a"
>>  },
>>  {
>>    "name": "Jane Blow",
>>    "husband": "Joe Blow",
>>    "occupation": "secretary"
>>  }
>>
>> What should it be resolved to?  Perhaps Jane just got divorced and
>> went to work as a secretary.  Or she could have gotten married and
>> left her job.  If you give me the common ancestor I can tell which
>> scenario to believe.  Without it I can only guess badly.  I don't want
>> to keep a history here.  I want to resolve the discrepancy the next
>> time I see it (and log it somewhere important if I can't resolve it).
>>
>> On Mon, Apr 18, 2011 at 5:38 PM, Sean Cribbs <s...@basho.com> wrote:
>>> Yes, but vector clocks are for resolution of race-conditions and network 
>>> partitions, not to provide an SCM history.  Imagine how much space would be 
>>> consumed by the history long enough to disambiguate an object that has been 
>>> updated normally 1000 times, followed by one bad client that decides to write 
>>> to it without fetching the vector clock first.
>>>
>>> Coda Hale put it well in his talk at the recent Riak Meetup: your data 
>>> needs to be logically monotonic so that writes (and reads) can be retried 
>>> until resolution is reached.
>>>
>>> Also, we've found that assigning the client id to something that is 
>>> relevant to your domain, e.g. real people, will help reduce surprises (and 
>>> degenerate cases like sibling explosion) when it comes to vector-clock 
>>> resolution.
>>>
>>> Sean Cribbs <s...@basho.com>
>>> Developer Advocate
>>> Basho Technologies, Inc.
>>> http://basho.com/
>>>
>>> On Apr 18, 2011, at 8:15 PM, Aphyr wrote:
>>>
>>>>> I actually had a question about that page.  Why is it that when there
>>>>> is a conflict we can only get the conflicting versions of the data?
>>>>> If I'm going to try to resolve the conflict intelligently, I really
>>>>> want the common ancestor as well so that I can try to do a 3-way
>>>>> merge.
>>>>
>>>> Good call. If an ancestor were available it would make counting and 
>>>> merging orthogonal changes *much* simpler.
>>>>
>>>> _______________________________________________
>>>> riak-users mailing list
>>>> riak-users@lists.basho.com
>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>
>>>
>
>
