I think there are better ways to leverage parallel processing than using it to delete data. As I said, it works for one of my projects for the exact same reason you stated: business rules.
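For reference, the kind of connector-based delete I mean looks roughly like this. It is a minimal sketch, assuming spark-cassandra-connector 2.0+; the keyspace ("my_ks"), table ("orders"), and column names are placeholders, not a real schema:

    import org.apache.spark.{SparkConf, SparkContext}
    import com.datastax.spark.connector._

    object PurgeSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("purge-sketch")
          .set("spark.cassandra.connection.host", "127.0.0.1") // placeholder host
        val sc = new SparkContext(conf)

        // Apply the purge business rule in Spark to find the keys to delete.
        // "my_ks", "orders", "order_id", "status" are all hypothetical names.
        val keysToDelete = sc.cassandraTable("my_ks", "orders")
          .select("order_id", "status")
          .filter(row => row.getString("status") == "CLOSED")
          .map(row => Tuple1(row.getLong("order_id")))

        // Each deleted partition becomes a tombstone, so big purges should be
        // throttled or spread out over time (assumes order_id is the partition key).
        keysToDelete.deleteFromCassandra("my_ks", "orders",
          keyColumns = SomeColumns("order_id"))
      }
    }

The parallelism only buys you the key selection; the deletes themselves are still just tombstone writes.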
Deleting data is an old way of thinking. Why not store the data, use only the relevant data, and let really old data expire? (Two quick sketches at the bottom of this message show the TTL route and, for comparison, the prepared-statement route.)

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Mar 23, 2018, 11:38 AM -0700, Charulata Sharma (charshar) <chars...@cisco.com>, wrote:
> Hi Rahul,
> Thanks for your answer. Why do you say that deleting from Spark is not
> elegant? This is the exact feedback I want: basically, why is it not elegant?
> I can either delete using delete prepared statements or through Spark. The TTL
> approach doesn't work for us because, first of all, TTL is set at the column
> level, and there are business rules for purge that make the TTL solution not
> very clean in our case.
>
> Thanks,
> Charu
>
> From: Rahul Singh <rahul.xavier.si...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Thursday, March 22, 2018 at 5:08 PM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>,
> "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: Using Spark to delete from Transactional Cluster
>
> Short answer: it works. You can even run "delete" statements from within
> Spark once you know which keys to delete. Not elegant, but it works.
>
> It will create a bunch of tombstones, and you may need to spread your deletes
> over days. Another thing to consider, instead of deleting, is setting a TTL,
> so the data eventually gets cleaned up on its own.
>
> --
> Rahul Singh
> rahul.si...@anant.us
>
> Anant Corporation
>
> On Mar 22, 2018, 2:19 PM -0500, Charulata Sharma (charshar)
> <chars...@cisco.com>, wrote:
> >
> > Hi,
> > Wanted to know the community's experiences and feedback on using Apache
> > Spark to delete data from a C* transactional cluster.
> > We have Spark installed in our analytical C* cluster, and so far we have
> > been using Spark only for analytics purposes.
> >
> > However, now with the advanced features of Spark 2.0, I am considering
> > using the spark-cassandra connector for deletes instead of a series of
> > delete prepared statements.
> > So essentially the deletes will happen on the analytical cluster, and they
> > will be replicated over to the transactional cluster by means of our
> > keyspace replication strategies.
> >
> > Are there any risks involved in this?
> >
> > Thanks,
> > Charu
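P.S. To make the "let it expire" idea concrete, here is a minimal sketch of writing with a TTL through the connector so rows age out on their own. Again spark-cassandra-connector 2.0+, and the names and the 90-day TTL are placeholder values:

    import org.apache.spark.{SparkConf, SparkContext}
    import com.datastax.spark.connector._
    import com.datastax.spark.connector.writer.{TTLOption, WriteConf}

    object TtlWriteSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("ttl-write-sketch")
          .set("spark.cassandra.connection.host", "127.0.0.1") // placeholder host
        val sc = new SparkContext(conf)

        // Hypothetical rows: (order_id, status).
        val rows = sc.parallelize(Seq((123L, "CLOSED"), (456L, "OPEN")))

        // Write every row with a 90-day TTL; Cassandra expires the data itself,
        // so no purge job and no explicit deletes are needed later.
        rows.saveToCassandra("my_ks", "orders",
          SomeColumns("order_id", "status"),
          writeConf = WriteConf(ttl = TTLOption.constant(90 * 24 * 3600)))
      }
    }

If different rows need different lifetimes, TTLOption.perRow lets each row carry its own TTL, which is one way around business rules that do not fit a single fixed expiry.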
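And since prepared statements came up: the same job can push deletes through the driver session that the connector manages, which is handy when the purge rule does not map cleanly onto key columns. Another minimal sketch with placeholder names:

    import org.apache.spark.{SparkConf, SparkContext}
    import com.datastax.spark.connector.cql.CassandraConnector

    object PreparedDeleteSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("prepared-delete-sketch")
          .set("spark.cassandra.connection.host", "127.0.0.1") // placeholder host
        val sc = new SparkContext(conf)
        val connector = CassandraConnector(conf)

        // Hypothetical keys, selected elsewhere by the purge business rules.
        val keysToDelete = sc.parallelize(Seq(123L, 456L))

        // One session and one prepared statement per partition, not per row.
        keysToDelete.foreachPartition { keys =>
          connector.withSessionDo { session =>
            val stmt = session.prepare("DELETE FROM my_ks.orders WHERE order_id = ?")
            keys.foreach(k => session.execute(stmt.bind(java.lang.Long.valueOf(k))))
          }
        }
      }
    }

Either way, the result on disk is the same: tombstones. The choice of approach changes who issues them, not what they cost.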