> I'm in a pseudo-deadlock

BOOM BOOM! :)

> (N.B. The updates require a read of the current value before the update
> write. Otherwise a counter column can be used, but in my opinion the
> problem still remains.)

Writes in the Cassandra server do not require a read.
> My simple question is: what happens when two (or more) threads try to
> update (increment) the same integer column value of the same row in a
> column family?

Multiple values for the same column are deterministically resolved, so the
actual order of the interleaving on the server side does not matter. Either
thread in your example will compare the column A it's trying to write with
what is in the memtable. The columns are then resolved as follows (a rough
sketch in code follows at the end of this message):

* a delete with the higher timestamp wins
* next, the column instance with the highest timestamp wins
* finally, the column instance with the greater byte value wins

In 1.1 each thread then tries to put its shadow copy of the data that was in
the memtable back. If the memtable has changed in the meantime, the thread
gets the data again and retries the write. If two write threads start at the
same time and try to apply their changes to the memtable at (roughly) the
same time, one will win and the other will "redo" the write in memory (also
sketched below). The order this occurs in is irrelevant.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 16/06/2012, at 7:37 PM, Manuel Peli wrote:

> I'm in a pseudo-deadlock about Cassandra and atomicity/isolation/transaction
> arguments. My simple question is: what happens when two (or more) threads
> try to update (increment) the same integer column value of the same row in a
> column family? I've read something about row-level isolation, but I'm not
> sure it is managed properly. Any suggestions? (N.B. The updates require a
> read of the current value before the update write. Otherwise a counter
> column can be used, but in my opinion the problem still remains.)
>
> My personal idea is described next. Because it is a real-time analytics
> application, the counter updates concern only the current hour, while
> previous hours remain the same. So I think that one way to avoid the problem
> would be to use an RDBMS layer for the current updates (which supports ACID
> properties) and, when the hour expires, consolidate the data in Cassandra.
> Is that right?
>
> Also, with an RDBMS layer the transaction problem still remains: some
> updates on different column families are correlated, and if even one fails a
> rollback is needed. I know that Cassandra doesn't support transactions, but
> I think that, by playing with the replication factor and write/read
> consistency levels, the problem can be mitigated, possibly implementing an
> application-level commit/rollback. I read something about Zookeeper, but I
> guess that adds complexity and latency.
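A minimal Java sketch of the three resolution rules above, for illustration
only; the Column class, its fields, and reconcile are made-up names, not
Cassandra's actual implementation:

    // Illustrative stand-in for a column instance; not Cassandra's real class.
    final class Column {
        final byte[] value;
        final long timestamp;
        final boolean isDelete; // true for a tombstone

        Column(byte[] value, long timestamp, boolean isDelete) {
            this.value = value;
            this.timestamp = timestamp;
            this.isDelete = isDelete;
        }

        // Pick the winner when two versions of the same column meet.
        static Column reconcile(Column a, Column b) {
            // 1. A delete with the higher timestamp wins outright.
            if (a.isDelete != b.isDelete) {
                Column del = a.isDelete ? a : b;
                Column live = a.isDelete ? b : a;
                return del.timestamp >= live.timestamp ? del : live;
            }
            // 2. Otherwise the instance with the highest timestamp wins.
            if (a.timestamp != b.timestamp)
                return a.timestamp > b.timestamp ? a : b;
            // 3. On a timestamp tie, the greater byte value wins.
            return compareUnsigned(a.value, b.value) >= 0 ? a : b;
        }

        // Lexicographic comparison of unsigned bytes.
        private static int compareUnsigned(byte[] x, byte[] y) {
            for (int i = 0; i < Math.min(x.length, y.length); i++) {
                int cmp = (x[i] & 0xff) - (y[i] & 0xff);
                if (cmp != 0) return cmp;
            }
            return x.length - y.length;
        }
    }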
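And a toy model of the 1.1 put-back loop described above: each writer
resolves its column against a shadow copy of the cell, then swaps the result
in only if nothing changed in between, otherwise it re-reads and redoes the
write. The AtomicReference here merely stands in for the memtable; the real
code is different.

    import java.util.concurrent.atomic.AtomicReference;

    // Toy stand-in for one memtable cell, reusing Column from the sketch above.
    final class MemtableCell {
        private final AtomicReference<Column> current = new AtomicReference<>();

        void write(Column incoming) {
            while (true) {
                Column shadow = current.get(); // shadow copy of what is there now
                Column resolved = (shadow == null)
                        ? incoming
                        : Column.reconcile(shadow, incoming);
                // Put the resolved value back only if the cell is unchanged;
                // if another writer got in first, loop and redo the write.
                if (current.compareAndSet(shadow, resolved))
                    return;
            }
        }
    }

Because the reconcile step is deterministic, it does not matter which writer
wins the race: replaying the loser's write produces the same final value
either way.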