On Apr 18, 2011, at 7:46 PM, Ben Tilly wrote:
>> Given the non-atomic nature of reads and writes in Riak, it is likely
>> that neither of the two clients that wrote that data was in any way
>> aware of the existence of the other write. This makes your suggestion
>> of escalating to the user impossible. And there is no particular
>> reason to believe that the third user to come along will necessarily
>> know anything either.

Thanks for asking this. I thought the same thing, and I've been meaning to ask the list what the response to this problem is.

On Apr 18, 2011, at 8:05 PM, Sean Cribbs wrote:
> Sorry for being dismissive, I do understand what you're after. I'm just
> saying that if your application needs those semantics, build them in -- don't
> expect Riak's vector clocks to do the work for you.

It's not an unreasonable question given Riak's positioning (write-available key-value store with conflict *detection* and document support). A lot of people, myself included, are new to these types of datastores (vector clocks in particular), and it takes a while to understand what the different options provide. For example, Riak is listed right next to CouchDB in the Wikipedia definition of multiversion concurrency control (http://en.wikipedia.org/wiki/Multiversion_concurrency_control). Anyone coming from an understanding of Couch, and/or looking to get document conflict resolution via MVCC, might make incorrect assumptions about what Riak provides.

> Keep a list of the most recent "change" events either in that object or
> alongside, or keep a copy of the last-seen version in your object -- whatever
> works to make those kinds of merges possible.

Thanks for clarifying :) The docs advertise vector clocks as a solution to detect conflicts with the goal of passing them up to the user, but it took me a while to come up with the options I had for helping the user with the conflict (standard approaches seem to take on a slightly new light when your document is always writable :)

If anyone's interested in Couch's experience and/or wants to play with adding document conflict resolution to their Riak, https://issues.apache.org/jira/browse/COUCHDB-988 is a must-read.

On Apr 18, 2011, at 9:01 PM, Eric Moritz wrote:
> There's a little demo app that was written by someone at Basho that
> demonstrates a way to accomplish what you're talking about.
>
> http://forms.basho.com/riak-in-action-wriaki-p/

Version control needs conflict resolution, but the Couch ticket I referenced, for example, additionally mentions pruning strategies and tradeoffs; those are a relevant part of conflict resolution, but not of version control. You might have meant it as a good starting/reference point, but since in previous emails it seems as though his problem was misunderstood, I want to make sure to acknowledge that he's talking about optimal strategies for *just* conflict resolution.

Thanks all, happy to see this get discussed.

-Woody

(PS - sorry Sean for sending this to you twice, my reply-all habits depend entirely on whatever mailing list I'm frequenting. I do prefer non-mangling lists such as this :)

On Apr 18, 2011, at 8:05 PM, Sean Cribbs wrote:

> Sorry for being dismissive, I do understand what you're after. I'm just
> saying that if your application needs those semantics, build them in -- don't
> expect Riak's vector clocks to do the work for you. Keep a list of the most
> recent "change" events either in that object or alongside, or keep a copy of
> the last-seen version in your object -- whatever works to make those kinds of
> merges possible.
>
> Interestingly, multiple people have explored the SCM-on-top-of-Riak thing, so
> I know it's doable; the key difference there is that multiple, independently
> written objects are used to represent the history of a single conceptual
> "object". Once written, nothing is overwritten, only new objects are created.
>
> Sean Cribbs <s...@basho.com>
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
>
> On Apr 18, 2011, at 10:46 PM, Ben Tilly wrote:
>
>> I'm not missing the point you think I am. Riak already has the
>> ability to store more than one value for a key/value pair. I'd like
>> an option - possibly named something new, that used this to store a
>> limited amount of history so that clients could be presented with a
>> common ancestor when that was required.
>>
>> In the case that I gave you, if the common ancestor is:
>>
>> {
>>   "name": "Jane Doe",
>>   "occupation": "secretary"
>> }
>>
>> then a standard three-way merge would say that she got married and the
>> correct result should be:
>>
>> {
>>   "name": "Jane Blow",
>>   "husband": "Joe Blow",
>>   "occupation": "n/a"
>> }
>>
>> while if the common ancestor is:
>>
>> {
>>   "name": "Jane Blow",
>>   "husband": "Joe Blow",
>>   "occupation": "n/a"
>> }
>>
>> then a standard 3-way merge would say that she dumped the jerk and got
>> a job resulting in:
>>
>> {
>>   "name": "Jane Doe",
>>   "occupation": "secretary"
>> }
>>
>> Without the common ancestor you know what changed, but not which
>> direction the changes are going, and so have no sane way to resolve
>> the conflict.
>>
>> Given the non-atomic nature of reads and writes in Riak, it is likely
>> that neither of the two clients that wrote that data was in any way
>> aware of the existence of the other write. This makes your suggestion
>> of escalating to the user impossible. And there is no particular
>> reason to believe that the third user to come along will necessarily
>> know anything either.
>>
>> (Besides, I spent enough years maintaining batch systems to be wary of
>> escalating to users at the drop of a hat. The "user" may well be a
>> complete moron on autopilot.)
>>
>> On Mon, Apr 18, 2011 at 7:01 PM, Sean Cribbs <s...@basho.com> wrote:
>>> I think you're missing a key point here, and that is that the vector clock
>>> doesn't store copies of the *values*, only the individual "touches" of
>>> identified clients. I'm not sure what computing the common ancestor is
>>> going to give you if you don't have the value. Vector clocks are
>>> essentially opaque to clients.
>>>
>>> That said, I think the use-case you gave is one that can clearly bubble up
>>> to the user, e.g. "Someone else changed this record while you were editing
>>> it. Can you resolve the differences?" (Give the other person's name
>>> perhaps, highlight the fields that are different.)
>>>
>>> Sean Cribbs <s...@basho.com>
>>> Developer Advocate
>>> Basho Technologies, Inc.
>>> http://basho.com/
>>>
>>> On Apr 18, 2011, at 9:12 PM, Ben Tilly wrote:
>>>
>>>> Riak's small_vclock, big_vclock, young_vclock, and old_vclock
>>>> parameters already give control over pruning behavior. If there isn't
>>>> enough history to compute a common ancestor, then return nothing for
>>>> the common ancestor.
>>>>
>>>> The use case here really isn't an SCM. The use case is when two
>>>> clients get simultaneous (within, say, 50 ms) requests to write to the
>>>> same object. When a third one tries to read the data 5s later, it
>>>> would be nice to have a way to figure out what to do. For this use
>>>> case you can limit the amount of history quite severely without loss.
>>>>
>>>> Let's take a practical example of conflicting data structures:
>>>>
>>>> {
>>>>   "name": "Jane Doe",
>>>>   "occupation": "n/a"
>>>> },
>>>> {
>>>>   "name": "Jane Blow",
>>>>   "husband": "Joe Blow",
>>>>   "occupation": "secretary"
>>>> }
>>>>
>>>> What should it be resolved to? Perhaps Jane just got divorced and
>>>> went to work as a secretary. Or she could have gotten married and
>>>> left her job. If you give me the common ancestor I can tell which
>>>> scenario to believe. Without it I can only guess badly. I don't want
>>>> to keep a history here. I want to resolve the discrepancy the next
>>>> time I see it (and log it somewhere important if I can't resolve it).
>>>>
>>>> On Mon, Apr 18, 2011 at 5:38 PM, Sean Cribbs <s...@basho.com> wrote:
>>>>> Yes, but vector clocks are for resolution of race-conditions and network
>>>>> partitions, not to provide an SCM history. Imagine how much space would
>>>>> be consumed by the history long enough to disambiguate an object that has
>>>>> been updated normally 1000 times, followed by one bad client that decides
>>>>> to write to it without fetching the vector clock first.
>>>>>
>>>>> Coda Hale put it well in his talk at the recent Riak Meetup: your data
>>>>> needs to be logically monotonic so that writes (and reads) can be retried
>>>>> until resolution is reached.
>>>>>
>>>>> Also, we've found that assigning the client id to something that is
>>>>> relevant to your domain, e.g. real people, will help reduce surprises
>>>>> (and degenerate cases like sibling explosion) when it comes to
>>>>> vector-clock resolution.
>>>>>
>>>>> Sean Cribbs <s...@basho.com>
>>>>> Developer Advocate
>>>>> Basho Technologies, Inc.
>>>>> http://basho.com/
>>>>>
>>>>> On Apr 18, 2011, at 8:15 PM, Aphyr wrote:
>>>>>
>>>>>>> I actually had a question about that page. Why is it that when there
>>>>>>> is a conflict we can only get the conflicting versions of the data?
>>>>>>> If I'm going to try to resolve the conflict intelligently, I really
>>>>>>> want the common ancestor as well so that I can try to do a 3-way
>>>>>>> merge.
>>>>>>
>>>>>> Good call. If an ancestor were available it would make counting and
>>>>>> merging orthogonal changes *much* simpler.
>>>>>>
>>>>>> _______________________________________________
>>>>>> riak-users mailing list
>>>>>> riak-users@lists.basho.com
>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>
>>>>>
>>>
>>>
>
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
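
For anyone following along, here is a minimal sketch of the field-level three-way merge Ben describes in the quoted thread, run against his Jane Doe data. It is Python and not anything Riak or a Riak client provides: it assumes the application has stored the common ancestor itself (for example as the "last-seen version" Sean suggests), and the names three_way_merge and MISSING are made up for illustration. It only handles flat objects, and keeping the ancestor's value on a true conflict is just one possible policy.

```python
# Sketch only: field-level three-way merge of flat JSON-like dicts.
# three_way_merge and MISSING are illustrative names, not a Riak API.

MISSING = object()  # sentinel meaning "field is absent from this version"


def three_way_merge(ancestor, left, right):
    """Merge two siblings against their common ancestor.

    Returns (merged, conflicts). A field is listed in conflicts when both
    siblings changed it to different values; the ancestor's value is kept
    for such fields (an arbitrary choice made for this sketch).
    """
    merged, conflicts = {}, []
    for key in sorted(set(ancestor) | set(left) | set(right)):
        base = ancestor.get(key, MISSING)
        a = left.get(key, MISSING)
        b = right.get(key, MISSING)
        if a == b:        # both sides agree (including both removing it)
            value = a
        elif a == base:   # only the right sibling changed this field
            value = b
        elif b == base:   # only the left sibling changed this field
            value = a
        else:             # both changed it differently: cannot decide
            conflicts.append(key)
            value = base
        if value is not MISSING:
            merged[key] = value
    return merged, conflicts


# The two conflicting siblings from Ben's example.
sibling_a = {"name": "Jane Doe", "occupation": "n/a"}
sibling_b = {"name": "Jane Blow", "husband": "Joe Blow",
             "occupation": "secretary"}

# Ancestor 1: Jane Doe, secretary. The changes then read as "got married".
married, married_conflicts = three_way_merge(
    {"name": "Jane Doe", "occupation": "secretary"}, sibling_a, sibling_b)

# Ancestor 2: Jane Blow, not working. The changes then read as "divorced".
divorced, divorced_conflicts = three_way_merge(
    {"name": "Jane Blow", "husband": "Joe Blow", "occupation": "n/a"},
    sibling_a, sibling_b)
```

With the first ancestor the merge yields the "married" record, with the second the "divorced" one, and neither reports a conflict. The same two siblings give opposite results depending on the ancestor, which is the point of the example.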