This looks like the right way to do it. But remember, this still doesn't guarantee correctness if your clocks drift too much. It's a trade-off between having to manage one additional component and using something internal to C*. It would be good to see similar functionality implemented in C* itself so that clients don't have to deal with it explicitly.
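For concreteness, here's roughly what the tracker-CF approach from the thread below looks like with Jeremiah's "wait out the clock skew" step added. This is a sketch only, assuming pycassa, a tracker CF with a TimeUUIDType comparator, and a USER CF keyed by email address; the keyspace, CF names, and MAX_CLOCK_SKEW value are all illustrative:

    # Sketch only, not production code. Assumes pycassa, a 'tracker' CF
    # with a TimeUUIDType comparator, and a USER CF keyed by email.
    import time
    import uuid

    import pycassa
    from pycassa import ConsistencyLevel

    # Illustrative bound: must exceed worst-case clock drift across all
    # clients plus write latency, or the scheme below is not safe.
    MAX_CLOCK_SKEW = 0.5  # seconds

    pool = pycassa.ConnectionPool('Keyspace1', ['localhost:9160'])
    tracker = pycassa.ColumnFamily(
        pool, 'tracker',
        read_consistency_level=ConsistencyLevel.QUORUM,
        write_consistency_level=ConsistencyLevel.QUORUM)
    users = pycassa.ColumnFamily(
        pool, 'USER',
        read_consistency_level=ConsistencyLevel.QUORUM,
        write_consistency_level=ConsistencyLevel.QUORUM)

    def try_register(email, user_data):
        my_id = uuid.uuid1()                # TimeUUID from the local clock
        tracker.insert(email, {my_id: ''})  # announce intent (step 2 below)
        time.sleep(MAX_CLOCK_SKEW)          # wait out possible clock skew so
                                            # earlier TimeUUIDs from slower
                                            # clocks have time to land
        try:
            contenders = tracker.get(email)  # TimeUUID columns sort by time
            if next(iter(contenders)) != my_id:
                return False                 # someone ordered ahead; back off
            if users.get_count(email) > 0:   # step 5: does the user exist?
                return False
            users.insert(email, user_data)   # step 6: claim the email
            return True
        finally:
            tracker.remove(email, columns=[my_id])  # step 7: clean up

Note the sleep puts a floor of MAX_CLOCK_SKEW on every registration, contended or not, and the whole thing is only safe if that constant really bounds drift plus write latency across all clients, which is exactly the guarantee that's hard to get.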
On Fri, Jan 6, 2012 at 1:16 PM, Bryce Allen <bal...@ci.uchicago.edu> wrote:
> This looks like it:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Implementing-locks-using-cassandra-only-tp5527076p5527076.html
>
> There's also some interesting JIRA tickets related to locking/CAS:
> https://issues.apache.org/jira/browse/CASSANDRA-2686
> https://issues.apache.org/jira/browse/CASSANDRA-48
>
> -Bryce
>
> On Fri, 06 Jan 2012 14:53:21 -0600
> Jeremiah Jordan <jeremiah.jor...@morningstar.com> wrote:
>> Correct, any kind of locking in Cassandra requires clocks that are in
>> sync, and requires you to wait "possible clock out of sync time"
>> before reading to check if you got the lock, to prevent the issue you
>> describe below.
>>
>> There was a pretty detailed discussion of locking with only Cassandra
>> a month or so back on this list.
>>
>> -Jeremiah
>>
>> On 01/06/2012 02:42 PM, Bryce Allen wrote:
>> > On Fri, 6 Jan 2012 10:38:17 -0800
>> > Mohit Anchlia <mohitanch...@gmail.com> wrote:
>> >> It could be as simple as reading before writing to make sure that
>> >> email doesn't exist. But I think you are looking at how to handle 2
>> >> concurrent requests for the same email? The only way I can think of
>> >> is:
>> >>
>> >> 1) Create a new CF, say tracker
>> >> 2) write email and time uuid to CF tracker
>> >> 3) read from CF tracker
>> >> 4) if you find a row other than yours then wait and read again from
>> >> tracker after a few ms
>> >> 5) read from USER CF
>> >> 6) write if no rows in USER CF
>> >> 7) delete from tracker
>> >>
>> >> Please note you might have to modify this logic a little bit, but
>> >> this should give you some ideas of how to approach this problem
>> >> without locking.
>> > Distributed locking is pretty subtle; I haven't seen a correct
>> > solution that uses just Cassandra, even with QUORUM read/write. I
>> > suspect it's not possible.
>> >
>> > With the above proposal, in step 4 two processes could both have
>> > inserted an entry in the tracker before either gets a chance to
>> > check, so you need a way to order the requests. I don't think the
>> > timestamp works for ordering, because it's set by the client (even
>> > the internal timestamp is set by the client), and will likely be
>> > different from when the data is actually committed and available to
>> > read by other clients.
>> >
>> > For example:
>> >
>> > * At time 0ms, client 1 starts insert of u...@example.org
>> > * At time 1ms, client 2 also starts insert for u...@example.org
>> > * At time 2ms, client 2 data is committed
>> > * At time 3ms, client 2 reads tracker and sees that it's the only
>> >   one, so enters the critical section
>> > * At time 4ms, client 1 data is committed
>> > * At time 5ms, client 1 reads tracker, and sees that it is not the
>> >   only one, but since it has the lowest timestamp (0ms vs 1ms), it
>> >   enters the critical section too
>> >
>> > I don't think Cassandra counters work for ordering either.
>> >
>> > This approach is similar to the Zookeeper lock recipe:
>> > http://zookeeper.apache.org/doc/current/recipes.html#sc_recipes_Locks
>> > but zookeeper has sequence nodes, which provide a consistent way of
>> > ordering the requests. Zookeeper also avoids the busy waiting.
>> >
>> > I'd be happy to be proven wrong. But even if it is possible, if it
>> > involves a lot of complexity and busy waiting it's probably not
>> > worth it. There's a reason people are using Zookeeper with
>> > Cassandra.
>> >
>> > -Bryce
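For comparison, here's what the ZooKeeper alternative Bryce mentions looks like. A sketch assuming the kazoo Python client; the host, lock path, and identifier are illustrative. kazoo's Lock implements the recipe linked above with ephemeral sequence nodes, so request ordering doesn't depend on client clocks and there's no busy waiting:

    # Sketch of the same guard using ZooKeeper's lock recipe via the
    # kazoo client. Host, path, and identifier are illustrative.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts='127.0.0.1:2181')
    zk.start()

    def register(email, user_data):
        # One lock znode per email; ephemeral sequence nodes under it
        # give a consistent request ordering, no client clocks involved.
        lock = zk.Lock('/locks/user-email/' + email, 'client-1')
        with lock:  # blocks until our sequence node is the lowest
            # Safe here: read the USER CF and insert only if the email
            # is absent, then the lock is released on exit.
            pass

The trade-off, as noted at the top, is deploying and operating one more component alongside Cassandra.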