We use Spark for this because each partition contains data for a whole year, and we delete one day at a time. C* does not allow us to delete without specifying the partition key. I know it's the wrong data model, but we can't change it, for the obvious reason that it would mean redesigning the whole application.
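To illustrate why deleting one day from year-long partitions needs a full scan, here is a minimal sketch. It assumes a hypothetical table `ks.readings` (not from the thread) whose partition key is `(sensor_id, year)` and whose clustering column is `reading_time`, and it assumes Cassandra 3.0 or later, which accepts range deletions on clustering columns. Because every DELETE must name a complete partition key, removing one day means one range delete per partition. Presumably, enumerating those partition keys is the part Spark handles here.

```python
from datetime import date, datetime, time, timedelta
from typing import Iterable, List, Tuple

# Hypothetical schema, for illustration only (not from the thread):
#   CREATE TABLE ks.readings (
#       sensor_id text, year int, reading_time timestamp, value double,
#       PRIMARY KEY ((sensor_id, year), reading_time));
DELETE_DAY_CQL = (
    "DELETE FROM ks.readings "
    "WHERE sensor_id = ? AND year = ? "
    "AND reading_time >= ? AND reading_time < ?"
)


def day_bounds(day: date) -> Tuple[datetime, datetime]:
    """Return the half-open [start, end) range covering one day (naive, assumed UTC)."""
    start = datetime.combine(day, time.min)
    return start, start + timedelta(days=1)


def delete_day_statements(
    partition_keys: Iterable[Tuple[str, int]], day: date
) -> List[Tuple[str, tuple]]:
    """Build one (CQL, bind values) range delete per partition.

    A DELETE must name the full partition key, so removing one day of data
    takes a separate range delete for every (sensor_id, year) partition
    that can hold that day.
    """
    start, end = day_bounds(day)
    return [
        (DELETE_DAY_CQL, (sensor_id, year, start, end))
        for sensor_id, year in partition_keys
        if year == day.year  # only that year's partitions can hold the day
    ]


if __name__ == "__main__":
    # In practice these keys would come from a full-table scan (e.g. in Spark).
    keys = [("s1", 2018), ("s2", 2018), ("s3", 2017)]
    for cql, params in delete_day_statements(keys, date(2018, 3, 22)):
        print(params)
```

Whether these statements are executed through a driver's prepared statements or from a Spark job, each one writes a range tombstone in its partition.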

> On Mar 23, 2018, at 2:10 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:
>
> I'm confused about the difference between deleting with prepared
> statements and deleting through Spark. To the best of my knowledge, either
> way it's the same thing: normal deletion, with tombstones replicated. Is it
> that you're doing deletes in the analytics DC instead of your real-time one?
>
>> On Fri, Mar 23, 2018 at 11:38 AM Charulata Sharma (charshar)
>> <chars...@cisco.com> wrote:
>>
>> Hi Rahul,
>>
>> Thanks for your answer. Why do you say that deleting from Spark is
>> not elegant? This is exactly the feedback I want: basically, why is it
>> not elegant?
>>
>> I can delete either with prepared DELETE statements or through Spark. The
>> TTL approach doesn't work for us because, first of all, TTL is set at the
>> column level, and we have business rules for purging that make a TTL
>> solution not very clean in our case.
>>
>> Thanks,
>>
>> Charu
>>
>> From: Rahul Singh <rahul.xavier.si...@gmail.com>
>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Date: Thursday, March 22, 2018 at 5:08 PM
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>,
>> "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: Re: Using Spark to delete from Transactional Cluster
>>
>> Short answer: it works. You can even run "delete" statements from within
>> Spark once you know which keys to delete. Not elegant, but it works.
>>
>> It will create a lot of tombstones, and you may need to spread your
>> deletes over several days. Another thing to consider is setting a TTL
>> instead of deleting, so that the data expires and is eventually cleaned up.
>>
>> --
>> Rahul Singh
>> rahul.si...@anant.us
>>
>> Anant Corporation
>>
>> On Mar 22, 2018, 2:19 PM -0500, Charulata Sharma (charshar)
>> <chars...@cisco.com>, wrote:
>>
>> Hi,
>>
>> I wanted to know the community's experiences and feedback on using Apache
>> Spark to delete data from a C* transactional cluster.
>>
>> We have Spark installed in our analytical C* cluster, and so far we have
>> been using Spark only for analytics purposes.
>>
>> However, now with the advanced features of Spark 2.0, I am considering
>> using the Spark Cassandra Connector for deletes instead of a series of
>> prepared DELETE statements.
>>
>> So essentially the deletes will happen on the analytical cluster, and they
>> will be replicated to the transactional cluster by means of our keyspace
>> replication strategies.
>>
>> Are there any risks involved in this?
>>
>> Thanks,
>>
>> Charu
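Rahul's suggestion to spread the deletes over several days could be planned with a small scheduling step. This is a hypothetical sketch: `plan_delete_waves` and its `max_per_day` budget are illustrative names, not part of Spark or the connector. It splits the pending deletes into consecutive daily waves so that no single run writes a large burst of tombstones. The tombstones themselves are still only purged by compaction after the table's `gc_grace_seconds` has elapsed.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def plan_delete_waves(deletes: Sequence[T], max_per_day: int) -> List[List[T]]:
    """Split pending deletes into consecutive daily waves.

    Each wave holds at most ``max_per_day`` deletes (a hypothetical tuning
    knob), preserving the original order; the last wave may be smaller.
    """
    if max_per_day <= 0:
        raise ValueError("max_per_day must be positive")
    days = math.ceil(len(deletes) / max_per_day)
    return [
        list(deletes[i * max_per_day:(i + 1) * max_per_day])
        for i in range(days)
    ]
```

For example, 10 pending deletes with a budget of 4 per day give three waves of 4, 4 and 2 deletes.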