Christophe, thanks for your insights,Sorry, I forgot to mention that 
currently both tableA and tableB are being updated by application (all newly 
inserted/updated records should be identical on A and B), exporting from tableB 
and COPY it back later on will result in older data overwrites newly updated 
data.
I can only thinking about using COPY tableA to a csv, and then iterate the csv 
line by line to insert to tableB using "if not exists" clause to avoid 
down-time , but it's error-prone and slow. Not sure whether there is a better 
way. Best,Richard
    On Monday, October 1, 2018, 4:34:38 PM PDT, Christophe Schmitz 
<christo...@instaclustr.com> wrote:  
 
 Hi Richard,
You could consider exporting your few thousands record of Table B in a file, 
with COPY TO. Then TRUNCATE Table B, copy the SSTable files of TableA to the 
data directory of Table A (make sure you flush the memtables first), then run 
nodetool refresh. Final step is to load the few thousands record on Table B 
with COPY FROM. This will overwrite the data you loaded from the SSTables of 
Table A.Overall, there is no downtime on your cluster, there is no downtime on 
Table A, yet you need to think about the consequences on Table B if your 
application is writing on Table A or Table B during this process.Please test 
first :)
Cheers,Christophe

Christophe Schmitz - Instaclustr - Cassandra | Kafka | Spark Consulting




On Tue, 2 Oct 2018 at 09:18 Richard Xin <richardxin...@yahoo.com.invalid> wrote:

I have a tableA with about a few ten millions record, and I have tableB with a 
few thousands record,TableA and TableB have exact same schema (except that 
tableB doesnt have TTL)
I want to load all data to tableB from tableA EXCEPT for those already on 
tableB (we don't want data on tableB to be overwritten)
What's the best to way accomplish this?  
Thanks,
  

Reply via email to