Hi,

 I am looking for an efficient way to migrate a portion of the data in one
Cassandra cluster to another, separate Cassandra cluster. What I need to
solve is the typical live-migration problem that shows up in any "DB
sharding" scheme, where you need to transfer "ownership" of certain rows
from DB1 to DB2, but in a way that clients see no (or almost no) disruption
when you actually cut over those writes to DB2.

I mean doing something typical like:

loop (until almost no rows have been modified):
    rows = SELECT * FROM T WHERE "criteria matches (e.g., shard_id = 1)"
           AND updated_at > last_time
    last_time = now
    insert(rows) elsewhere
end
...
"lock" modifications to the original DB
do one last SELECT to pick up the last few modified rows
cut over ownership (change and ensure the clients know that the new home
for that data is the other "DB")
unlock modifications
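
In code, the shape I have in mind is roughly the sketch below (Python; the
four callables and the 100-row "settled" threshold are just placeholders for
whatever the application actually does to read/write rows and to fence
writes; the only point is the catch-up, freeze, copy-the-last-delta,
cut-over structure):

import time

def migrate_shard(shard_id,
                  fetch_modified_since,  # (shard_id, since) -> rows changed after 'since'
                  copy_rows,             # (rows) -> write them to the new cluster
                  lock_writes,           # (shard_id) -> block writers on the old cluster
                  unlock_writes,         # (shard_id) -> let writers through again
                  switch_ownership,      # (shard_id) -> point clients at the new cluster
                  settle_threshold=100):
    last_time = 0
    # Phase 1: keep copying deltas until a pass moves only a handful of rows.
    while True:
        now = time.time()
        rows = fetch_modified_since(shard_id, last_time)
        copy_rows(rows)
        last_time = now
        if len(rows) < settle_threshold:
            break
    # Phase 2: brief write freeze, copy the final delta, cut over, unfreeze.
    lock_writes(shard_id)
    try:
        copy_rows(fetch_modified_since(shard_id, last_time))
        switch_ownership(shard_id)
    finally:
        unlock_writes(shard_id)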


 So, anyway, I thought I'd be able to apply the same principles by passing a
timestamp of sorts to the get_slice call, so I could restrict the result to
only the matching columns whose timestamps are newer than the one passed.
Looking at the Thrift interface, though, I see there is no timestamp
parameter at all... which makes me wonder how people are doing this, and
whether there are any well-known practices for it. Setting up a whole new
replicating DC within the same cluster doesn't work, as there are clear
cases where you want completely separate Cassandra rings.
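
The closest workaround I can see is filtering client-side on the per-column
write timestamps that a normal read already returns. A rough sketch of that
idea with pycassa (the keyspace, column family, host and key names below are
placeholders, and timestamps are the usual microseconds since the epoch), in
case that is indeed what people end up doing:

import time
import pycassa

pool = pycassa.ConnectionPool('MyKeyspace', ['old-cluster-host:9160'])  # placeholders
cf = pycassa.ColumnFamily(pool, 'T')

def columns_modified_since(key, last_time_usec):
    # include_timestamp=True turns each value into a (value, timestamp) tuple,
    # so the "newer than last_time" filter happens client-side. pycassa only
    # returns up to 100 columns per call by default, so a real version would
    # page with column_start.
    cols = cf.get(key, include_timestamp=True)
    return dict((name, value)
                for name, (value, ts) in cols.items()
                if ts > last_time_usec)

# e.g. everything written to this row in the last hour:
recent = columns_modified_since('shard1:some-row-key',
                                int((time.time() - 3600) * 1e6))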

Cheers,

 Josep M.
