Strategies for avoiding corrupted duplicate data?

Max C Sat, 19 Mar 2016 00:42:47 -0700

Hello,

What are your best practices for avoiding collisions when updating duplicate 
(derived) data?  For example if I have tables like this:


users — (username, email, phone) — PK (username)
users_by_email — (username, email, phone) — PK (email)
users_by_phone — (username, email, phone) — PK ((phone), username)

… and I want to change the user’s email address ...

What is your strategy for ensuring that two writers don’t try to update the 
same records at the same time?

Some sort of lock?

1) Do you have a global “locks” table in Cassandra, and use a LWT to lock it?

insert into locks (what, locked_by) values ('users:maxc', ’server1_pid1234') if 
not exists;
# Verify insert succeeded
# Now that I have the lock, grab the latest 
select * from users where username=‘maxc’;   
begin batch
# … apply changes ...
apply batch;
# release lock
delete from locks where what=‘users:maxc’;  

2) Do you add a “locked_by” column to the master (“users”) table, and then use 
a LWT to lock it?

update users set locked_by=‘server1_pid1234’’ where username=“maxc” if 
locked_by = null;
# Verify update succeeded
# Now that I have the lock, grab the latest 
select * from users where username=‘maxc’;   
begin batch
# … apply changes ...
apply batch;
# release lock
update users set locked_by=null where username=‘maxc’; 

3) Do you use something outside of Cassandra to manage the locks?  Zoo keeper?  

## Acquire external lock ##
# Now that I have the lock, grab the latest 
select * from users where username=‘maxc’;   
begin batch
# … apply changes ...
apply batch;
## release external lock ##

Or is there some other way to do this that I’m totally missing??  Materialized 
views in 3.x, I suppose.  Other ideas?

Thanks!

- Max

Strategies for avoiding corrupted duplicate data?

Reply via email to