On the question of ‘best approach’: so far the comments have been about 
alternative tools.

Another axis you might want to consider is the data model.  Say, for example, 
you have 600M rows and you want to do a daily transfer of data for some 
reason.  The first question that comes to mind is: do you need all the data 
every day?  Usually that would only be the case if all of the data is at risk 
of changing.

Generally the way I’d cut down the pain on something like this is to figure out 
whether the data model currently does, or could be made to, confine mutations 
to a limited subset of the data.  Then maybe all you are transferring is the 
daily changes.  Systems based on catching up to daily changes will usually 
pull single-digit percentages of the total storage footprint.  That’s not only 
a lot less data to pull, it’s also a lot less impact on the ongoing operations 
of the cluster while you are pulling it.
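
As a rough sketch of what I mean, and not a recommendation for your actual 
schema: suppose writers also record each changed row in a table partitioned by 
the day of the change.  The table name, columns, keyspace, and bucket count 
below are all hypothetical; I'm only assuming the 600M row count from the 
original question.  The daily export then only has to read one day's 
partitions instead of scanning the whole table.

```python
from datetime import date

# Hypothetical schema (an assumption for illustration, not the poster's table):
#   CREATE TABLE ks.changes_by_day (
#       day date, bucket int, id uuid, payload text,
#       PRIMARY KEY ((day, bucket), id));
# Each changed row is written under the day it changed, spread over
# NUM_BUCKETS partitions so that no single partition grows too large.
NUM_BUCKETS = 16  # assumed value; tune to the real daily change volume


def daily_export_queries(day, num_buckets=NUM_BUCKETS):
    """Return one (cql, params) pair per partition holding `day`'s changes."""
    cql = ("SELECT id, payload FROM ks.changes_by_day "
           "WHERE day = %s AND bucket = %s")
    return [(cql, (day, bucket)) for bucket in range(num_buckets)]


def rows_to_pull(total_rows, change_percent):
    """Rows in a daily pull if `change_percent` percent of the table changed."""
    return total_rows * change_percent // 100


# Example: 600M rows with an assumed 5% daily change rate.
queries = daily_export_queries(date(2020, 2, 19))
daily_rows = rows_to_pull(600_000_000, 5)
```

Each pair could then be handed to something like session.execute(cql, params) 
in the DataStax Python driver, one partition at a time.  With the assumed 5% 
change rate, that is 30M rows a day rather than 600M.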

R

From: "JOHN, BIBIN" <bj9...@att.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, February 19, 2020 at 1:13 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Mechanism to Bulk Export from Cassandra on daily Basis

Team,
We have a requirement to bulk export data from Cassandra on a daily basis. The 
table contains close to 600M records and the cluster has 12 nodes. What is the 
best approach to do this?


Thanks
Bibin John
