Re: How to reliably achieve unique constraints with Cassandra?

Jeremiah Jordan Fri, 06 Jan 2012 12:54:11 -0800

Correct, any kind of locking in Cassandra requires clocks that are in sync, and requires you to wait "possible clock out of sync time" before reading to check if you got the lock, to prevent the issue you describe below.

There was a pretty detailed discussion of locking with only Cassandra a month or so back on this list.


-Jeremiah

On 01/06/2012 02:42 PM, Bryce Allen wrote:

On Fri, 6 Jan 2012 10:38:17 -0800
Mohit Anchlia <mohitanch...@gmail.com> wrote:

It could be as simple as reading before writing to make sure that
email doesn't exist. But I think you are looking at how to handle 2
concurrent requests for same email? Only way I can think of is:

1) Create a new CF, say tracker
2) write the email and a time UUID to CF tracker
3) read from CF tracker
4) if you find a row other than yours, then wait and read again from
tracker after a few ms
5) read from USER CF
6) write if there are no rows in USER CF
7) delete from tracker

Please note you might have to modify this logic a little bit, but this
should give you some ideas of how to approach this problem without
locking.
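
For reference, here is a rough single-process sketch of steps 1-7 in Python. Plain dicts stand in for the two column families, and the names try_register, tracker, users, retries and wait_s are made up for illustration; none of this is a Cassandra client API. Giving up after a few retries is also an assumption, since the steps don't say what to do if the other row never goes away.

```python
import time
import uuid

# In-memory stand-ins for the two column families (assumption: no real cluster).
tracker = {}  # email -> {claim_uuid: client}
users = {}    # email -> client


def try_register(email, client, retries=3, wait_s=0.005):
    """Follow steps 2-7 once; return True if this client created the user."""
    claim = uuid.uuid1()                           # time UUID for this attempt
    tracker.setdefault(email, {})[claim] = client  # step 2: write to tracker
    try:
        for _ in range(retries):
            # step 3: read tracker; step 4: wait if someone else is there
            if all(c == claim for c in tracker[email]):
                break
            time.sleep(wait_s)
        else:
            return False  # still contended after all retries (assumed policy)
        if email in users:      # step 5: read from USER CF
            return False
        users[email] = client   # step 6: write since no row exists
        return True
    finally:
        del tracker[email][claim]  # step 7: delete from tracker


first = try_register("alice@example.org", "client1")
second = try_register("alice@example.org", "client2")
print(first, second)
```

Run sequentially like this it behaves as expected. It says nothing about what happens when two clients interleave on a real cluster.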

Distributed locking is pretty subtle; I haven't seen a correct solution
that uses just Cassandra, even with QUORUM read/write. I suspect it's
not possible.

With the above proposal, in step 4 two processes could both have
inserted an entry in the tracker before either gets a chance to check,
so you need a way to order the requests. I don't think the timestamp
works for ordering, because it's set by the client (even the internal
timestamp is set by the client), and will likely be different from
when the data is actually committed and available to read by other
clients.

For example:

* At time 0ms, client 1 starts insert of u...@example.org
* At time 1ms, client 2 also starts insert for u...@example.org
* At time 2ms, client 2 data is committed
* At time 3ms, client 2 reads tracker and sees that it's the only one,
   so enters the critical section
* At time 4ms, client 1 data is committed
* At time 5ms, client 1 reads tracker, and sees that it is not the only
   one, but since it has the lowest timestamp (0ms vs 1ms), it enters
   the critical section.
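
The timeline above can be replayed as a small deterministic simulation. This is only a sketch: the entries table and the may_enter helper are hypothetical, and the rule "lowest client timestamp among the entries I can see wins" is my reading of the proposal, not something Cassandra provides.

```python
# Each tracker entry has a client-assigned timestamp (ts) and the time at
# which the write actually became readable by other clients (visible_at).
entries = {
    "client1": {"ts": 0, "visible_at": 4},
    "client2": {"ts": 1, "visible_at": 2},
}


def may_enter(client, now):
    """True if `client`, reading the tracker at `now`, thinks it has the lock."""
    visible = {c: e["ts"] for c, e in entries.items() if e["visible_at"] <= now}
    # The client with the lowest timestamp among visible entries wins.
    return min(visible, key=visible.get) == client


in_critical_section = []
if may_enter("client2", now=3):   # only client 2's entry is visible
    in_critical_section.append("client2")
if may_enter("client1", now=5):   # both visible, client 1 has the lower ts
    in_critical_section.append("client1")

print(in_critical_section)  # ['client2', 'client1']
```

Both clients end up in the critical section, because the order in which the writes became visible (client 2 first) differs from the order of the client-side timestamps (client 1 first).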

I don't think Cassandra counters work for ordering either.

This approach is similar to the Zookeeper lock recipe:
http://zookeeper.apache.org/doc/current/recipes.html#sc_recipes_Locks
but zookeeper has sequence nodes, which provide a consistent way of
ordering the requests. Zookeeper also avoids the busy waiting.

I'd be happy to be proven wrong. But even if it is possible, if it
involves a lot of complexity and busy waiting it's probably not worth
it. There's a reason people are using Zookeeper with Cassandra.

-Bryce
