Yes, essentially it's the same thing, but from a code-complexity perspective the Spark version is more compact and execution is very fast. Spark uses the Cassandra connector, so the question was mostly about whether there is any issue with that, and also that with Spark we will be deleting on the analytical nodes, which would then be replicated over to the transactional nodes instead of the other way round. And yes, the tombstone problem is the same with either approach.
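[Editorial note: for readers unfamiliar with the connector-based approach discussed above, a minimal sketch follows. The keyspace, table, column names, and contact point are all hypothetical placeholders, and the code assumes the spark-cassandra-connector 2.x Scala API. It is not runnable without a live Cassandra cluster and the connector on the classpath; treat it as an illustration, not a definitive implementation.]

```scala
import com.datastax.spark.connector._
import org.apache.spark.sql.SparkSession

object PurgeFromAnalyticsDC {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("purge-job")
      // Point at the analytical DC; deletes issued here replicate to the
      // transactional DC through the keyspace's replication strategy.
      .config("spark.cassandra.connection.host", "analytics-node-1") // placeholder host
      .getOrCreate()
    val sc = spark.sparkContext

    // Read only the primary-key columns of the rows that match the purge
    // rule, then issue deletes for those keys through the connector.
    sc.cassandraTable("my_keyspace", "orders")   // hypothetical keyspace/table
      .select("order_id")                        // primary-key column(s) only
      .filter(row => row.getString("order_id").startsWith("closed-")) // stand-in purge rule
      .deleteFromCassandra("my_keyspace", "orders")

    spark.stop()
  }
}
```

Each deleted row still produces a tombstone on the transactional side, exactly as the thread notes; the connector changes where the delete is issued, not how Cassandra records it.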
I just want to know the pros and cons, if any.

Charu

From: Jonathan Haddad <j...@jonhaddad.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, March 23, 2018 at 12:10 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Using Spark to delete from Transactional Cluster

I'm confused as to what the difference between deleting with prepared statements and deleting through Spark is. To the best of my knowledge it's the same thing either way: a normal deletion, with tombstones replicated. Is it that you're doing deletes in the analytics DC instead of your real-time one?

On Fri, Mar 23, 2018 at 11:38 AM Charulata Sharma (charshar) <chars...@cisco.com> wrote:

Hi Rahul,

Thanks for your answer. Why do you say that deleting from Spark is not elegant? This is the exact feedback I want: basically, why is it not elegant? I can delete either with prepared delete statements or through Spark. The TTL approach doesn't work for us because, first of all, TTL applies at the column level, and there are business rules for the purge that make a TTL solution not very clean in our case.

Thanks,
Charu

From: Rahul Singh <rahul.xavier.si...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, March 22, 2018 at 5:08 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Using Spark to delete from Transactional Cluster

Short answer: it works. You can even run "delete" statements from within Spark once you know which keys to delete. Not elegant, but it works. It will create a bunch of tombstones, and you may need to spread your deletes over days.
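[Editorial note: the "series of delete prepared statements" alternative mentioned above, with the deletes throttled so tombstone creation is spread out as Rahul suggests, might look roughly like the sketch below. It uses the DataStax Java driver 3.x from Scala; the contact point, keyspace, table, and keys are hypothetical, and the sleep-based throttle is a deliberately crude stand-in for a real rate limiter. Not runnable without a live cluster.]

```scala
import com.datastax.driver.core.{Cluster, PreparedStatement}

object PreparedStatementDeletes {
  def main(args: Array[String]): Unit = {
    val cluster = Cluster.builder()
      .addContactPoint("txn-node-1")   // placeholder transactional-cluster host
      .build()
    val session = cluster.connect()

    // Prepare once, bind per key: the usual prepared-statement pattern.
    val deleteStmt: PreparedStatement =
      session.prepare("DELETE FROM my_keyspace.orders WHERE order_id = ?")

    // In practice these keys would come from whatever query applies the purge rules.
    val keysToPurge: Iterator[String] = Iterator("order-001", "order-002")

    for (key <- keysToPurge) {
      session.execute(deleteStmt.bind(key))
      Thread.sleep(10) // crude throttle: spread tombstone creation over time
    }

    cluster.close()
  }
}
```

Functionally this produces the same tombstones as the Spark route; the trade-off is operational (where the job runs, how easily it is throttled and parallelized), not in what Cassandra ultimately stores.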
Another thing to consider: instead of deleting, set a TTL, and the data will eventually get cleaned up.

--
Rahul Singh
rahul.si...@anant.us
Anant Corporation

On Mar 22, 2018, 2:19 PM -0500, Charulata Sharma (charshar) <chars...@cisco.com> wrote:

Hi,

I wanted to know the community's experiences and feedback on using Apache Spark to delete data from a C* transactional cluster. We have Spark installed in our analytical C* cluster, and so far we have been using Spark only for analytics. However, with the advanced features of Spark 2.0, I am now considering using the spark-cassandra connector for deletes instead of a series of delete prepared statements. So essentially the deletes will happen on the analytical cluster, and they will be replicated over to the transactional cluster by means of our keyspace replication strategy. Are there any risks involved in this?

Thanks,
Charu
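[Editorial note: the TTL alternative Rahul raises can also be applied at write time through the connector, so rows expire instead of ever needing an explicit delete. A hedged sketch, assuming the spark-cassandra-connector 2.x `WriteConf`/`TTLOption` API and placeholder names throughout; a per-table default (`ALTER TABLE ... WITH default_time_to_live = ...` in CQL) is the other common route. As the thread notes, this only fits when the retention period is known up front, which is exactly what Charu's business rules prevent.]

```scala
import com.datastax.spark.connector._
import com.datastax.spark.connector.writer.{TTLOption, WriteConf}
import org.apache.spark.sql.SparkSession

object WriteWithTtl {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("write-with-ttl")
      .config("spark.cassandra.connection.host", "analytics-node-1") // placeholder host
      .getOrCreate()
    val sc = spark.sparkContext

    val rows = sc.parallelize(Seq(("order-001", "OPEN"))) // stand-in data

    // Every column written here expires after 30 days; expired cells become
    // tombstones at compaction time rather than via explicit DELETEs.
    rows.saveToCassandra(
      "my_keyspace", "orders",
      SomeColumns("order_id", "status"),
      writeConf = WriteConf(ttl = TTLOption.constant(30 * 24 * 3600))
    )

    spark.stop()
  }
}
```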