We would first try migrating some small keyspaces (several gigabytes of data across a DC), but migration of several large keyspaces, with data sizes ranging from 100G to 5T and some tables holding >1T of data, would ultimately be scheduled as well.
As for StreamSets/Talend, personally I doubt they would be appropriate at our company, as manpower for this migration is pretty limited. Arbab's answer actually resolved my initial concern; I'm now trying to play with the spark-connector. Thanks for all your replies, much appreciated!

2018-05-16 5:35 GMT+08:00 Joseph Arriola <jcarrio...@gmail.com>:

> Hi Jing.
>
> How much data do you need to migrate, in volume and in number of tables?
>
> With Spark you could do the following:
>
> - Read the data and export it directly to MySQL.
> - Read the data, export it to CSV files, and then load those into MySQL.
>
> You could also use other paths such as:
>
> - StreamSets
> - Talend Open Studio
> - Kafka Streams
>
> 2018-05-15 4:59 GMT-06:00 Jing Meng <self.rel...@gmail.com>:
>
>> Hi guys, for some historical reasons our cassandra cluster is currently
>> overloaded, and operating it has become something of a nightmare. Anyway,
>> (sadly) we're planning to migrate the cassandra data back to mysql...
>>
>> We're not quite clear on how to migrate the historical data out of
>> cassandra.
>>
>> I know there is the COPY command, but I wonder whether it works in a
>> production environment where more than hundreds of gigabytes of data are
>> present. And if it does, would it impact server performance significantly?
>>
>> Apart from that, I know the spark-connector can be used to scan data from
>> a c* cluster, but I'm not that familiar with spark and still not sure
>> whether writing the data to a mysql database can be done naturally with
>> the spark-connector.
>>
>> Are there any suggestions/best practices/reading materials for doing this?
>>
>> Thanks!
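For anyone who finds this thread later, below is the kind of spark-connector job I'm experimenting with. It is only a minimal sketch, assuming the DataStax spark-cassandra-connector and a MySQL JDBC driver are on the classpath; the keyspace, table, host, and credential values are placeholders, not our real setup.

```scala
import org.apache.spark.sql.SparkSession

object CassandraToMysql {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cassandra-to-mysql")
      // Placeholder contact point for the source cluster.
      .config("spark.cassandra.connection.host", "10.0.0.1")
      .config("spark.cassandra.input.consistency.level", "LOCAL_ONE")
      .getOrCreate()

    // Read the Cassandra table as a DataFrame; the connector splits the scan
    // by token range so it is spread across executors.
    val df = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "my_keyspace", "table" -> "my_table"))
      .load()

    // Write straight to MySQL over JDBC; batchsize throttles how hard the
    // target database is hit on insert.
    df.write
      .format("jdbc")
      .option("url", "jdbc:mysql://mysql-host:3306/target_db")
      .option("dbtable", "my_table")
      .option("user", "migrator")
      .option("password", "***")
      .option("batchsize", "1000")
      .mode("append")
      .save()

    spark.stop()
  }
}
```

The appeal over a single cqlsh COPY on hundreds of gigabytes is that the read is distributed and can be throttled from the Spark side, and the same DataFrame can just as easily be written out as CSV first if a staged load into MySQL turns out to be safer.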