On Tue, Feb 23, 2016 at 6:44 AM, Jarod Guertin <jarod.guer...@sparkpost.com> wrote:
> Being fairly new to Cassandra, I'd like to run the following with the
> experts to make sure it's an ok thing to do.
>
> We have a particular case where we have multiple keyspaces with multiple
> tables each and we want to migrate to a new unique keyspace on the same
> cluster.
>
> The approach envisioned is:
> 1. take snapshots on all the nodes
> 2. create the new keyspace and all the tables with identical schema
> settings (just a different name and keyspace location)
> 3. one node at a time, stop cassandra, copy the db files from the old
> keyspace\table locations to the new keyspace\table locations and rename the
> db filename to use the new keyspace name; then restart cassandra
> 4. verify cassandra is running, then repeat step 3 for each other node
> 5. once all done switch our application calls to use the new keyspace \
> tables
> 6. run node repair on each node, one node at a time
>
> It is understood that between the snapshots (1) and using the new keyspace
> (5) that any changes would not be included in the migration, it would be
> done during a maintenance window when only read operations would be
> permitted. I should also mention that our number of cassandra nodes is
> greater than the replication factor (3).
>

This is essentially the same operation as renaming a columnfamily, which I described (and someone provided some useful details regarding) in this Jira :

https://issues.apache.org/jira/browse/CASSANDRA-1585?focusedCommentId=13488959&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13488959

It's similar to the "copy-the-sstables" method here as well :

https://www.pythian.com/blog/bulk-loading-options-for-cassandra/

Notes on your variant :

- 1) why snapshot? just for safety?
- 3) add nodetool drain before stopping
- 3) if you're "copying" you should strongly consider hard linking instead. that way you keep the (immutable) files in both places but only use the disk space once. [1]
- 6) is un-necessary if you've done things properly, which you could verify by having a representative known set of data that you read before and after
- Presumably there is a silent 7) where you drop the old keyspaces/CFs?

=Rob

[1] In some very new versions of Cassandra, this may not be safe to do with certain meta information files which are sadly no longer immutable.
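
For illustration only, here is a rough Python sketch of the hard-linking variant of step 3, for one table directory on one node. It assumes the node has already been drained and stopped, that both directories sit on the same filesystem (hard links cannot cross filesystems), and that the data files carry the keyspace name as a leading `<keyspace>-` prefix, as step 3 of the original post implies. The function name `link_sstables`, its parameters, and the example paths are all hypothetical; check the on-disk layout of your own version first. It links every regular file it finds, so the caveat in [1] about mutable meta information files is not handled.

```python
import os
from pathlib import Path


def link_sstables(old_dir, new_dir, old_ks, new_ks, dry_run=True):
    """Hard-link the files of one table directory into another.

    Hypothetical helper: assumes file names start with "<old_ks>-" and
    swaps that prefix for "<new_ks>-". Files without the prefix keep
    their names. Returns the list of (source, destination) pairs.
    """
    old_dir, new_dir = Path(old_dir), Path(new_dir)
    prefix = old_ks + "-"
    planned = []
    for src in sorted(old_dir.iterdir()):
        if not src.is_file():
            continue  # skip subdirectories such as snapshots/ or backups/
        name = src.name
        if name.startswith(prefix):
            name = new_ks + "-" + name[len(prefix):]
        dst = new_dir / name
        planned.append((src, dst))
        if not dry_run:
            # Raises FileExistsError if dst already exists, and OSError
            # if the two directories are on different filesystems.
            os.link(src, dst)
    return planned
```

A dry run first makes it easy to eyeball the planned renames before anything is linked, for example (paths are placeholders):

```python
for src, dst in link_sstables("/path/to/old_ks/users", "/path/to/new_ks/users",
                              "old_ks", "new_ks", dry_run=True):
    print(src, "->", dst)
```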