To get it "correct", meaning consistent, it seems you will need to run a repair no matter what, since the source cluster is taking writes during this time and writing them to its commit log. So to avoid filename issues, just do the first copy and then repair. I am not sure whether the data files can have arbitrary names.
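If you do end up copying a second batch of sstables anyway, here is a minimal Python sketch of renumbering them so they do not clash with files already on the target. It assumes file names of the form <keyspace>-<columnfamily>-<version>-<generation>-<component> (for example ks-users-hf-12-Data.db), which is my assumption about the 1.1 layout, so check it against your own data directory first. The names plan_renames and max_generation are made up for this example. It keeps the naming pattern and changes only the generation number, rather than substituting something like XXX, since I would not count on arbitrary names being accepted.

```python
import re

# Assumed name layout: <keyspace>-<columnfamily>-<version>-<generation>-<component>
# e.g. "ks-users-hf-12-Data.db". This is an assumption; verify it on your nodes.
SSTABLE_RE = re.compile(r"^(?P<prefix>.+)-(?P<gen>\d+)-(?P<component>[^-]+)$")


def max_generation(names):
    """Return the highest generation number among the given file names (0 if none match)."""
    gens = [int(m.group("gen")) for m in map(SSTABLE_RE.match, names) if m]
    return max(gens, default=0)


def plan_renames(incoming, existing):
    """Map each incoming sstable file name to a name whose generation is unused in `existing`."""
    next_gen = max_generation(existing) + 1
    new_gen = {}  # (prefix, old generation) -> new generation, shared by all components
    plan = {}
    for name in sorted(incoming):
        m = SSTABLE_RE.match(name)
        if m is None:
            continue  # does not look like an sstable component; leave it out of the plan
        key = (m.group("prefix"), m.group("gen"))
        if key not in new_gen:
            new_gen[key] = next_gen
            next_gen += 1
        plan[name] = f"{m.group('prefix')}-{new_gen[key]}-{m.group('component')}"
    return plan
```

Calling plan_renames(os.listdir(incoming_dir), os.listdir(target_dir)) gives an old-name to new-name mapping, and all components of one sstable (Data, Index, Filter, ...) get the same new number. You would still want to run repair afterwards.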
To the question about whether the tokens must be the same: the answer is they can't be (http://www.datastax.com/docs/datastax_enterprise2.0/multi_dc_install). I believe that as long as your replication factor is > 1, running repair would fix almost any token assignment.

On Wed, Dec 19, 2012 at 4:27 AM, Vegard Berget <p...@fantasista.no> wrote:
> Hi,
>
> I know this has been a topic here before, but I need some input on how to
> move data from one datacenter to another (and google just gives me some old
> mails) - and at the same time moving "production" writing the same way.
> To add the target cluster into the source cluster and just replicate data
> before moving source nodes is not an option, but my plan is as follows:
> 1) Flush data on source cluster and move all data/-files to the destination
> cluster. While this is going on, we are still writing to the source
> cluster.
> 2) When data is copied, start cassandra on the new cluster - and then move
> writing/reading to the new cluster.
> 3) Now, do a new flush on the source cluster. As I understand, the sstable
> files are immutable, so the _newly added_ data/ files could be moved to the
> target cluster.
> 4) After new data is also copied into the target data/, do a nodetool
> -refresh to load the new sstables into the system (i know we need to take
> care of filenames).
>
> It's worth noting that none of the data is critical, but it would be nice to
> get it correct. I know that there will be a short period between 2 and 4
> that reads potentially could read old data (written while copying, reading
> after we have moved read/write). This is ok in this case. Our second
> alternative is to:
>
> 1) Drain old cluster
> 2) Copy to new cluster
> 3) Start new cluster
>
> This will cause the cluster to be unavailable for writes in the copy-period,
> and I wish to avoid that (even if that, too, is survivable).
>
> Both nodes are 1.1.6, but it might be that we upgrade the target to 1.1.7,
> as I can't see that this will cause any problems?
>
> Questions:
>
> 1) It's the same number of nodes on both clusters, but do the tokens need
> to be the same as well? (Wouldn't a repair correct that later?)
>
> 2) Could data files have any name? Could we, to avoid a filename crash,
> just substitute the numbers with for example XXX in the data-files?
>
> 3) Is this really a sane way to do things?
>
> Suggestions are most welcome!
>
> Regards
> Vegard Berget