Thank you for the suggestion. The export is currently designed as a full refresh because with a delta we cannot identify which rows were deleted, so the downstream systems prefer the full data every day.
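
To illustrate the deletion problem: a delta pulled with a last-modified filter only returns rows that still exist, so a deleted row simply never shows up. One way to recover deletions is to compare the primary keys of two consecutive full exports. The following is only a minimal sketch in Python, assuming each export fits in a dict keyed by primary key; the function name `diff_snapshots` and the sample rows are hypothetical and are not part of our actual pipeline.

```python
def diff_snapshots(previous, current):
    """Compare two full exports, each a dict of primary key -> row.

    Hypothetical helper for illustration only: at real volumes the key
    comparison would more likely be done as a join in the downstream store.
    """
    prev_keys = previous.keys()
    curr_keys = current.keys()
    return {
        # Keys present yesterday but missing today were deleted.
        "deleted": sorted(prev_keys - curr_keys),
        # Keys present today but not yesterday are new rows.
        "inserted": sorted(curr_keys - prev_keys),
        # Keys in both exports whose row content differs were updated.
        "updated": sorted(k for k in prev_keys & curr_keys
                          if previous[k] != current[k]),
    }


# Made-up sample data: row 3 is deleted, row 4 is new, row 2 changes.
yesterday = {1: ("alice", "NY"), 2: ("bob", "TX"), 3: ("carol", "CA")}
today = {1: ("alice", "NY"), 2: ("bob", "WA"), 4: ("dave", "FL")}

changes = diff_snapshots(yesterday, today)
print(changes)
```

Note that this still needs the full key set every day, which is why a pure delta is not enough for us at the moment.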

Thanks
Bibin John

From: Reid Pinchback <rpinchb...@tripadvisor.com>
Sent: Wednesday, February 19, 2020 3:14 PM
To: user@cassandra.apache.org
Subject: Re: Mechanism to Bulk Export from Cassandra on daily Basis

To the question of ‘best approach’, so far the comments have been about alternatives in tools. Another axis you might want to consider is from the data model viewpoint.

So, for example, let’s say you have 600M rows. You want to do a daily transfer of data for some reason. First question that comes to mind is, do you need all the data every day? Usually that would only be the case if all of the data is at risk of changing.

Generally the way I’d cut down the pain on something like this is to figure out if the data model currently does, or could be made to, only mutate in a limited subset. Then maybe all you are transferring are the daily changes. Systems based on catching up to daily changes will usually be pulling single-digit percentages of data volume compared to the entire storage footprint. That’s not only a lot less data to pull, it’s also a lot less impact on the ongoing operations of the cluster while you are pulling that data.

R

From: "JOHN, BIBIN" <bj9...@att.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, February 19, 2020 at 1:13 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Mechanism to Bulk Export from Cassandra on daily Basis

Team,

We have a requirement to bulk export data from Cassandra on a daily basis. The table contains close to 600M records and the cluster has 12 nodes. What is the best approach to do this?

Thanks
Bibin John