How much data is on the nodes in cluster 1, and how much disk space is there on cluster 2? Be aware that Cassandra 0.8 has an issue where repair can go crazy and use a lot of space.
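(For the disk space question, a minimal Python sketch of a headroom check you could run on a cluster 2 node before copying a snapshot onto it. The data directory path and the expected snapshot size are assumptions to adjust for your setup; it only reports free space, it does not touch Cassandra.)

    import os

    DATA_DIR = "/var/lib/cassandra/data"   # assumed data_file_directories location
    EXPECTED_SNAPSHOT_GB = 200.0           # assumed size of the cluster 1 snapshot being copied

    st = os.statvfs(DATA_DIR)
    free_gb = st.f_bavail * st.f_frsize / float(1024 ** 3)
    print("free space under %s: %.1f GB" % (DATA_DIR, free_gb))
    if free_gb < 2 * EXPECTED_SNAPSHOT_GB:
        # Leave headroom for the copied sstables plus repair/compaction overhead,
        # which on 0.8 can be substantial.
        print("warning: less than 2x the snapshot size free")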
If you are not regularly running repair I would also repair before the move. The repair after the copy is a good idea but should technically not be necessary. If you can practice the move, watch the repair to see if much is transferred (check the logs). There is always a small transfer, but if you see data being transferred for several minutes I would investigate.

When you start a repair it will repair with the other nodes it replicates data with, so you only need to run it on every RFth node. Start it on one node, watch the logs to see who it talks to, then start it on the first node it does not talk to, and so on.

Add a snapshot before the clean (repair will also snapshot before it runs). Scrub is not needed unless you are migrating or you have file errors.

If your cluster is online, consider running the clean on every RFth node rather than all at once (e.g. 1, 4, 7, 10 then 2, 5, 8, 11). It will have less impact on clients. (There is a small sketch of the token spacing and the every-RFth-node batching after the quoted message below.)

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22/09/2011, at 10:27 AM, Philippe wrote:

> Hello,
> We're currently running on a 3-node RF=3 cluster. Now that we have a better
> grip on things, we want to replace it with a 12-node RF=3 cluster of
> "smaller" servers. So I wonder what the best way to move the data to the new
> cluster would be. I can afford to stop writing to the current cluster for
> whatever time is necessary. Has anyone written up something on this subject?
>
> My plan is the following (nodes in cluster 1 are node1.1->1.3, nodes in
> cluster 2 are node2.1->2.12):
> - stop writing to the current cluster & drain it
> - get a snapshot on each node
> - since it's RF=3, each node should have all the data, so assuming I set the
>   tokens correctly I would move the snapshot from node1.1 to node2.1, 2.2, 2.3
>   and 2.4, then node1.2 -> node2.5, 2.6, 2.7, 2.8, etc. This is because the
>   range for node1.1 is now spread across 2.1->2.4
> - run repair & clean & scrub on each node (more or less in parallel)
>
> What do you think?
> Thanks
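For reference, a minimal Python sketch of the arithmetic discussed above: evenly spaced initial_token values for the 12-node ring (assuming RandomPartitioner, the 0.8 default, with a token space of 0..2**127) and the every-RFth-node batches for repair/cleanup. The node2.x host names are the hypothetical ones from the plan; the script only prints values, it does not talk to a cluster.

    RF = 3
    NEW_NODES = ["node2.%d" % i for i in range(1, 13)]  # node2.1 .. node2.12

    # Evenly spaced initial_token values for a 12-node RandomPartitioner ring.
    RING_SIZE = 2 ** 127
    tokens = [i * RING_SIZE // len(NEW_NODES) for i in range(len(NEW_NODES))]
    for host, token in zip(NEW_NODES, tokens):
        print("%s  initial_token: %d" % (host, token))

    # Every-RFth-node batches for repair/cleanup, i.e.
    # (2.1, 2.4, 2.7, 2.10), (2.2, 2.5, 2.8, 2.11), (2.3, 2.6, 2.9, 2.12).
    batches = [NEW_NODES[i::RF] for i in range(RF)]
    for n, batch in enumerate(batches, 1):
        print("batch %d: %s" % (n, ", ".join(batch)))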