On Tue, Oct 28, 2014 at 8:01 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> By the way I found this reference:
> http://grokbase.com/t/cassandra/user/13824z7ykm/best-way-to-split-cluster-online.
> Is this still the "easiest" solution ? Does this work for same table name
> but transferring data to a new keyspace ? Can we migrate sstables this way ?
>

The author of that solution is as wise as he is modest... that said,
there's a really easy way to "split" some column families out of a
Keyspace, even easier than the above.

https://issues.apache.org/jira/browse/CASSANDRA-1585?focusedCommentId=13488959&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13488959
"
To anyone who is wondering about the manual way to do this :

1) create schema for NEW_Keyspace
2) stop writes to OLD_Keyspace from app (reads can continue)
3) flush OLD_Keyspace on every node, via nodetool
4) hard link all sstables from OLD_Keyspace directory to NEW_Keyspace
directory
5) call nodetool -h localhost refresh NEW_Keyspace
6) enable reads/writes from/to NEW_Keyspace from app (disable reads on
OLD_Keyspace)
7) clean up OLD_Keyspace (drop schema, delete files, etc.)

Alternately, if you don't want to do 2/6 because you can't tolerate
OLD_Keyspace not being writable, you can enable writes to NEW_Keyspace,
flush OLD_Keyspace, hard link the just-flushed tables and then enable reads
from NEW_Keyspace. This resolves the delta with a shorter window where you
can't write.

The same technique could also be applied to renaming Columnfamilies,
although in the Columnfamily case the files also need to be renamed. In
Cassandra 1.1+, the files get renamed to include the Keyspace name, so that
would have to change as appropriate.

(Additional Notes :
In 1) You have to create the schema with all the column families and indexes.

In 4) remember that the sstable file names start with the name of the
keyspace. You have to rename the files so that nodetool refresh
recognizes them.)

In 5), there is a race whereby if you are writing to NEW_Keyspace, you have
a nonzero chance of clobbering files with newly flushed files [1]
"
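
For anyone who wants to script step 4 (plus the rename from the additional
notes), here is a minimal Python sketch. It is only an illustration, not
something from the ticket: the function name link_sstables and its arguments
are made up, and it assumes the sstable files sit directly in one directory
per Columnfamily and that their names begin with "<Keyspace>-". Check both
assumptions against your own data directory before using anything like this.
It refuses to overwrite an existing file, which is one cautious way to avoid
the clobbering mentioned for step 5, though it does not remove the race itself.

```python
import os
from pathlib import Path


def link_sstables(old_dir, new_dir, old_ks, new_ks):
    """Hard link each file in old_dir into new_dir, swapping the keyspace prefix.

    Hypothetical helper: assumes file names start with "<old_ks>-".
    Hard links only work when both directories are on the same filesystem.
    """
    old_dir, new_dir = Path(old_dir), Path(new_dir)
    new_dir.mkdir(parents=True, exist_ok=True)
    prefix = old_ks + "-"
    linked = []
    for src in sorted(old_dir.iterdir()):
        if not src.is_file():
            continue  # skip subdirectories such as snapshots
        name = src.name
        if name.startswith(prefix):
            # Rename so the file carries the new keyspace name.
            name = new_ks + "-" + name[len(prefix):]
        dst = new_dir / name
        if dst.exists():
            # Never overwrite a file that is already in the target directory.
            raise FileExistsError(f"refusing to overwrite {dst}")
        os.link(src, dst)
        linked.append(dst)
    return linked
```

You would run it once per Columnfamily directory on every node, after the
flush in step 3, and then call nodetool refresh as in step 5.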

I see you on that ticket (in July 2013!) but perhaps you do not remember
its relevance... :D

The primary reason to move Columnfamilies between Keyspaces is that
replication is configured at the Keyspace level. So you can, for example,
have some Columnfamilies available only in some DCs, or with a lower RF.

=Rob
http://twitter.com/rcolidba
[1] https://issues.apache.org/jira/browse/CASSANDRA-6245
