This looks like it:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Implementing-locks-using-cassandra-only-tp5527076p5527076.html

There's also some interesting JIRA tickets related to locking/CAS:
https://issues.apache.org/jira/browse/CASSANDRA-2686
https://issues.apache.org/jira/browse/CASSANDRA-48

-Bryce

On Fri, 06 Jan 2012 14:53:21 -0600
Jeremiah Jordan <jeremiah.jor...@morningstar.com> wrote:
> Correct, any kind of locking in Cassandra requires clocks that are in
> sync, and requires you to wait the maximum possible clock skew before
> reading back to check whether you got the lock, to prevent the issue
> you describe below.
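
Roughly, the claim-then-wait pattern described above might look like the
Python sketch below. It is only an illustration against a hypothetical
column-family client; the tracker object, its insert/get calls, and
MAX_CLOCK_SKEW_SEC are placeholders, not a real API.

    import time

    MAX_CLOCK_SKEW_SEC = 0.5  # assumption: upper bound on clock skew between clients

    def try_claim(tracker, lock_row, my_id, now_usec):
        # Write our claim, then wait out the worst-case clock skew before
        # reading back, so that a competing claim with an earlier timestamp
        # has had a chance to become visible.
        tracker.insert(lock_row, {my_id: now_usec})   # hypothetical client call
        time.sleep(MAX_CLOCK_SKEW_SEC)
        claims = tracker.get(lock_row)                # hypothetical client call
        return min(claims, key=claims.get) == my_id   # lowest timestamp wins
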
> 
> There was a pretty detailed discussion of locking with only Cassandra
> a month or so back on this list.
> 
> -Jeremiah
> 
> On 01/06/2012 02:42 PM, Bryce Allen wrote:
> > On Fri, 6 Jan 2012 10:38:17 -0800
> > Mohit Anchlia <mohitanch...@gmail.com> wrote:
> >> It could be as simple as reading before writing to make sure the
> >> email doesn't exist. But I think you are asking how to handle 2
> >> concurrent requests for the same email? The only way I can think
> >> of is:
> >>
> >> 1) Create a new CF, say tracker
> >> 2) Write the email and a time UUID to CF tracker
> >> 3) Read from CF tracker
> >> 4) If you find a row other than yours, then wait and read again
> >> from tracker after a few ms
> >> 5) Read from the USER CF
> >> 6) Write if there are no rows in the USER CF
> >> 7) Delete from tracker
> >>
> >> Please note you might have to modify this logic a little bit, but
> >> this should give you some idea of how to approach this problem
> >> without locking.
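
As a concrete sketch of those steps (Python-ish, against a hypothetical
client; the CF objects, their calls, and the retry/skew constants are all
placeholders, and step 4 is read here as "wait until ours is the oldest
claim", which is exactly the ordering questioned in the reply below):

    import time
    import uuid

    RETRY_SLEEP_SEC = 0.01        # the "few ms" from step 4 (placeholder)
    MAX_CLOCK_SKEW_SEC = 0.5      # see Jeremiah's caveat above (placeholder)

    def register_email(tracker_cf, user_cf, email, my_id):
        # Steps 1-2: claim the email by writing a time UUID under its row
        # in the tracker CF.
        claim = uuid.uuid1()
        tracker_cf.insert(email, {my_id: claim})          # hypothetical call
        time.sleep(MAX_CLOCK_SKEW_SEC)
        try:
            # Steps 3-4: poll the tracker until our claim is the oldest one.
            while True:
                claims = tracker_cf.get(email)            # hypothetical call
                oldest = min(claims, key=lambda c: claims[c].time)
                if oldest == my_id:
                    break
                time.sleep(RETRY_SLEEP_SEC)
            # Steps 5-6: write the user row only if nobody else already has.
            if not user_cf.get(email):                    # hypothetical: {} if absent
                user_cf.insert(email, {"registered_by": my_id})
                return True
            return False
        finally:
            # Step 7: always remove our tracker entry.
            tracker_cf.remove(email, columns=[my_id])     # hypothetical call
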
> > Distributed locking is pretty subtle; I haven't seen a correct
> > solution that uses just Cassandra, even with QUORUM read/write. I
> > suspect it's not possible.
> >
> > With the above proposal, in step 4 two processes could both have
> > inserted an entry in the tracker before either gets a chance to
> > check, so you need a way to order the requests. I don't think the
> > timestamp works for ordering, because it's set by the client (even
> > the internal timestamp is set by the client), and will likely be
> > different from when the data is actually committed and available to
> > read by other clients.
> >
> > For example:
> >
> > * At time 0ms, client 1 starts insert of u...@example.org
> > * At time 1ms, client 2 also starts insert for u...@example.org
> > * At time 2ms, client 2 data is committed
> > * At time 3ms, client 2 reads the tracker and sees that it's the
> > only one, so it enters the critical section
> > * At time 4ms, client 1 data is committed
> > * At time 5ms, client 1 reads the tracker and sees that it is not
> > the only one, but since it has the lowest timestamp (0ms vs 1ms),
> > it enters the critical section anyway, so both clients are now in
> > the critical section at once.
> >
> > I don't think Cassandra counters work for ordering either.
> >
> > This approach is similar to the ZooKeeper lock recipe:
> > http://zookeeper.apache.org/doc/current/recipes.html#sc_recipes_Locks
> > but ZooKeeper has sequence nodes, which provide a consistent way of
> > ordering the requests. ZooKeeper also avoids the busy waiting.
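
For comparison, the ZooKeeper version is short when done through a client
library. A sketch using the kazoo Python client (kazoo and the lock path
are assumptions here; its Lock recipe implements the sequence-node
algorithm linked above):

    from kazoo.client import KazooClient

    zk = KazooClient(hosts="127.0.0.1:2181")   # placeholder ensemble address
    zk.start()

    # The Lock recipe creates an ephemeral sequence node under the path and
    # blocks until our node has the lowest sequence number, so ordering is
    # decided by ZooKeeper rather than by client clocks.
    lock = zk.Lock("/locks/user-email", identifier="client-1")
    with lock:
        # critical section: check and create the user row here
        pass

    zk.stop()
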
> >
> > I'd be happy to be proven wrong. But even if it is possible, if it
> > involves a lot of complexity and busy waiting, it's probably not
> > worth it. There's a reason people are using ZooKeeper with
> > Cassandra.
> >
> > -Bryce
