I think there are better ways to leverage parallel processing than using it to delete data. As I said, it works for one of my projects for the exact same reason you stated: business rules.
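For reference, the kind of connector-based delete I mean looks roughly like this. It is a minimal sketch, assuming spark-cassandra-connector 2.0+; the keyspace ("my_ks"), table ("orders"), and column names are placeholders, not a real schema:

    import org.apache.spark.{SparkConf, SparkContext}
    import com.datastax.spark.connector._

    object PurgeSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("purge-sketch")
          .set("spark.cassandra.connection.host", "127.0.0.1") // placeholder host
        val sc = new SparkContext(conf)

        // Apply the purge business rule in Spark to find the keys to delete.
        // "my_ks", "orders", "order_id", "status" are all hypothetical names.
        val keysToDelete = sc.cassandraTable("my_ks", "orders")
          .select("order_id", "status")
          .filter(row => row.getString("status") == "CLOSED")
          .map(row => Tuple1(row.getLong("order_id")))

        // Each deleted partition becomes a tombstone, so big purges should be
        // throttled or spread out over time (assumes order_id is the partition key).
        keysToDelete.deleteFromCassandra("my_ks", "orders",
          keyColumns = SomeColumns("order_id"))
      }
    }

The parallelism only buys you the key selection; the deletes themselves are still just tombstone writes.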
Deleting data is an old way of thinking. Why not store the data, use only the relevant data, and let really old data expire? (Two quick sketches at the bottom of this message show the TTL route and, for comparison, the prepared-statement route.)

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Mar 23, 2018, 11:38 AM -0700, Charulata Sharma (charshar) <chars...@cisco.com>, wrote:
> Hi Rahul,
> Thanks for your answer. Why do you say that deleting from Spark is not
> elegant? This is the exact feedback I want: basically, why is it not elegant?
> I can either delete using delete prepared statements or through Spark. The TTL
> approach doesn't work for us because, first of all, TTL is set at the column
> level, and there are business rules for purge that make the TTL solution not
> very clean in our case.
>
> Thanks,
> Charu
>
> From: Rahul Singh <rahul.xavier.si...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Thursday, March 22, 2018 at 5:08 PM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>,
> "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: Using Spark to delete from Transactional Cluster
>
> Short answer: it works. You can even run "delete" statements from within
> Spark once you know which keys to delete. Not elegant, but it works.
>
> It will create a bunch of tombstones, and you may need to spread your deletes
> over days. Another thing to consider, instead of deleting, is setting a TTL,
> so the data eventually gets cleaned up on its own.
>
> --
> Rahul Singh
> rahul.si...@anant.us
>
> Anant Corporation
>
> On Mar 22, 2018, 2:19 PM -0500, Charulata Sharma (charshar)
> <chars...@cisco.com>, wrote:
> >
> > Hi,
> > Wanted to know the community's experiences and feedback on using Apache
> > Spark to delete data from a C* transactional cluster.
> > We have Spark installed in our analytical C* cluster, and so far we have
> > been using Spark only for analytics purposes.
> >
> > However, now with the advanced features of Spark 2.0, I am considering
> > using the spark-cassandra connector for deletes instead of a series of
> > delete prepared statements.
> > So essentially the deletes will happen on the analytical cluster, and they
> > will be replicated over to the transactional cluster by means of our
> > keyspace replication strategies.
> >
> > Are there any risks involved in this?
> >
> > Thanks,
> > Charu
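P.S. To make the "let it expire" idea concrete, here is a minimal sketch of writing with a TTL through the connector so rows age out on their own. Again spark-cassandra-connector 2.0+, and the names and the 90-day TTL are placeholder values:

    import org.apache.spark.{SparkConf, SparkContext}
    import com.datastax.spark.connector._
    import com.datastax.spark.connector.writer.{TTLOption, WriteConf}

    object TtlWriteSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("ttl-write-sketch")
          .set("spark.cassandra.connection.host", "127.0.0.1") // placeholder host
        val sc = new SparkContext(conf)

        // Hypothetical rows: (order_id, status).
        val rows = sc.parallelize(Seq((123L, "CLOSED"), (456L, "OPEN")))

        // Write every row with a 90-day TTL; Cassandra expires the data itself,
        // so no purge job and no explicit deletes are needed later.
        rows.saveToCassandra("my_ks", "orders",
          SomeColumns("order_id", "status"),
          writeConf = WriteConf(ttl = TTLOption.constant(90 * 24 * 3600)))
      }
    }

If different rows need different lifetimes, TTLOption.perRow lets each row carry its own TTL, which is one way around business rules that do not fit a single fixed expiry.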
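And since prepared statements came up: the same job can push deletes through the driver session that the connector manages, which is handy when the purge rule does not map cleanly onto key columns. Another minimal sketch with placeholder names:

    import org.apache.spark.{SparkConf, SparkContext}
    import com.datastax.spark.connector.cql.CassandraConnector

    object PreparedDeleteSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("prepared-delete-sketch")
          .set("spark.cassandra.connection.host", "127.0.0.1") // placeholder host
        val sc = new SparkContext(conf)
        val connector = CassandraConnector(conf)

        // Hypothetical keys, selected elsewhere by the purge business rules.
        val keysToDelete = sc.parallelize(Seq(123L, 456L))

        // One session and one prepared statement per partition, not per row.
        keysToDelete.foreachPartition { keys =>
          connector.withSessionDo { session =>
            val stmt = session.prepare("DELETE FROM my_ks.orders WHERE order_id = ?")
            keys.foreach(k => session.execute(stmt.bind(java.lang.Long.valueOf(k))))
          }
        }
      }
    }

Either way, the result on disk is the same: tombstones. The choice of approach changes who issues them, not what they cost.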