Have a look at using Spark on Cassandra. It's commonly used for data movement / data migration / reconciliation (on top of analytics). You will get much better performance.
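For illustration only, a minimal sketch of that idea with the Spark Cassandra connector's DataFrame API. The keyspace (my_keyspace), the table names (tablea/tableb), the contact point, and a single primary-key column "id" are all assumptions, not anything from the thread:

# Minimal PySpark sketch: read tableA, drop the keys that already exist in
# tableB, and append the remainder to tableB. Requires the
# spark-cassandra-connector package on the classpath.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("tablea-to-tableb")
    .config("spark.cassandra.connection.host", "127.0.0.1")  # adjust to your cluster
    .getOrCreate()
)

def read_table(keyspace, table):
    return (
        spark.read.format("org.apache.spark.sql.cassandra")
        .options(keyspace=keyspace, table=table)
        .load()
    )

table_a = read_table("my_keyspace", "tablea")
table_b = read_table("my_keyspace", "tableb")

# Anti-join on the primary key: keep only rows of tableA whose key is not
# already present in tableB, so nothing in tableB gets overwritten.
to_copy = table_a.join(table_b.select("id"), on="id", how="left_anti")

(
    to_copy.write.format("org.apache.spark.sql.cassandra")
    .options(keyspace="my_keyspace", table="tableb")
    .mode("append")
    .save()
)

Cassandra writes are upserts, so the anti-join is what protects the existing rows in tableB; rows written to tableB after the job reads its snapshot can still be overwritten, which is the same caveat raised below for the SSTable approach.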
Christophe Schmitz - Instaclustr <https://www.instaclustr.com/> - Cassandra | Kafka | Spark Consulting

On Tue, 2 Oct 2018 at 09:58 Richard Xin <richardxin...@yahoo.com.invalid> wrote:

> Christophe, thanks for your insights.
> Sorry, I forgot to mention that both tableA and tableB are currently being
> updated by the application (all newly inserted/updated records should be
> identical on A and B), so exporting from tableB and COPYing it back later
> would result in older data overwriting newly updated data.
>
> The only approach I can think of is to COPY tableA to a CSV and then
> iterate over the CSV line by line, inserting into tableB with the
> "IF NOT EXISTS" clause to avoid downtime, but that is error-prone and slow.
> Not sure whether there is a better way.
> Best,
> Richard
>
> On Monday, October 1, 2018, 4:34:38 PM PDT, Christophe Schmitz <
> christo...@instaclustr.com> wrote:
>
> Hi Richard,
>
> You could consider exporting your few thousand records of Table B to a
> file with *COPY TO*. Then *TRUNCATE* Table B, copy the SSTable files of
> Table A into the data directory of Table B (make sure you *flush* the
> memtables first), and run nodetool *refresh*. The final step is to load
> the few thousand records back into Table B with *COPY FROM*; this will
> overwrite the data you loaded from the SSTables of Table A.
> Overall there is no downtime on your cluster and no downtime on Table A,
> but you need to think about the consequences for Table B if your
> application writes to Table A or Table B during this process.
> Please test first :)
>
> Cheers,
> Christophe
>
> Christophe Schmitz - Instaclustr <https://www.instaclustr.com/> -
> Cassandra | Kafka | Spark Consulting
>
> On Tue, 2 Oct 2018 at 09:18 Richard Xin <richardxin...@yahoo.com.invalid>
> wrote:
>
> I have tableA with a few tens of millions of records, and tableB with a
> few thousand records.
> TableA and tableB have exactly the same schema (except that tableB doesn't
> have a TTL).
>
> I want to load all data from tableA into tableB EXCEPT for rows already in
> tableB (we don't want existing data in tableB to be overwritten).
>
> What's the best way to accomplish this?
>
> Thanks,
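For reference, a rough sketch of the CSV + IF NOT EXISTS approach Richard mentions above, using the DataStax Python driver. The keyspace, the three text columns (id, col1, col2), and the file name are assumptions; non-text columns would need type conversion after parsing the CSV:

# Rough sketch: insert rows exported with COPY tablea TO 'tablea.csv' into
# tableb, skipping keys that already exist there.
import csv
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])          # adjust contact points
session = cluster.connect("my_keyspace")  # keyspace name is an assumption

# IF NOT EXISTS makes the insert a lightweight transaction: it is applied
# only when the key is absent, so existing rows in tableb are never touched.
insert = session.prepare(
    "INSERT INTO tableb (id, col1, col2) VALUES (?, ?, ?) IF NOT EXISTS"
)

with open("tablea.csv", newline="") as f:
    for id_, col1, col2 in csv.reader(f):
        session.execute(insert, (id_, col1, col2))

cluster.shutdown()

Each IF NOT EXISTS insert costs a Paxos round per row, so over tens of millions of rows this will indeed be slow, which is why the Spark or SSTable-based routes scale better.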