Yes, essentially it’s the same, but from a code-complexity perspective the Spark 
version is more compact, and execution is much faster. Spark uses the Cassandra 
connector, so the question was mostly about whether there is any issue with 
that, and also that with Spark we will be deleting on the analytical nodes, 
with the deletes then replicating over to the transactional nodes instead of 
the other way round. And yes, the tombstone problem is the same with either 
approach.
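
For context, this is roughly what the Spark-side delete looks like for us. A 
minimal, untested sketch assuming the DataStax spark-cassandra-connector 2.0 
RDD API; the keyspace, table, column names, and purge rule are all placeholders:

    import com.datastax.spark.connector._
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("purge-job")
      // Contact point in the analytics DC, so the deletes originate there.
      .set("spark.cassandra.connection.host", "analytics-node-1")
    val sc = new SparkContext(conf)

    // Scan the table, apply the business purge rule, keep only the
    // primary key columns, and issue row-level deletes (tombstones).
    sc.cassandraTable("my_keyspace", "my_table")
      .filter(row => row.getString("status") == "CLOSED")
      .map(row => (row.getString("account_id"), row.getInt("bucket")))
      .deleteFromCassandra("my_keyspace", "my_table",
        keyColumns = SomeColumns("account_id", "bucket"))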

I just want to know the pros and cons, if any.

Charu


From: Jonathan Haddad <j...@jonhaddad.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, March 23, 2018 at 12:10 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Using Spark to delete from Transactional Cluster

I'm confused as to what the difference is between deleting with prepared 
statements and deleting through Spark. To the best of my knowledge it's the 
same thing either way - a normal deletion, with tombstones replicated. Is it 
that you're doing the deletes in the analytics DC instead of your real-time one?

On Fri, Mar 23, 2018 at 11:38 AM Charulata Sharma (charshar) 
<chars...@cisco.com> wrote:
Hi Rahul,
         Thanks for your answer. Why do you say that deleting from Spark is not 
elegant? This is exactly the feedback I want: basically, why is it not elegant?
I can delete either with delete prepared statements or through Spark. The TTL 
approach doesn’t work for us because, first of all, TTL is set at the column 
level, and there are business rules for the purge that make a TTL solution not 
very clean in our case.
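
For comparison, the prepared-statement path would look roughly like this. An 
untested sketch with the Java driver; the contact point, keyspace, table, and 
key columns are placeholders:

    import com.datastax.driver.core.Cluster

    val cluster = Cluster.builder().addContactPoint("txn-node-1").build()
    val session = cluster.connect("my_keyspace")

    // Prepare once, then bind and execute per key. In reality the key list
    // comes from the purge-rule query; it is hard-coded here for illustration.
    val keysToDelete = Seq(("acct-1", 1), ("acct-2", 2))
    val stmt = session.prepare(
      "DELETE FROM my_table WHERE account_id = ? AND bucket = ?")

    keysToDelete.foreach { case (accountId, bucket) =>
      session.execute(stmt.bind(accountId, Int.box(bucket)))
    }

    session.close()
    cluster.close()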

Thanks,
Charu

From: Rahul Singh 
<rahul.xavier.si...@gmail.com>
Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
<user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Date: Thursday, March 22, 2018 at 5:08 PM
To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
<user@cassandra.apache.org<mailto:user@cassandra.apache.org>>, 
"user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
<user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Subject: Re: Using Spark to delete from Transactional Cluster

Short answer: it works. You can even run “delete” statements from within Spark 
once you know which keys to delete. Not elegant, but it works.

It will create a bunch of tombstones, and you may need to spread your deletes 
out over days. Another thing to consider: instead of deleting, set a TTL, so 
the data eventually gets cleaned up on its own.
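
Something like this for the TTL variant (an untested sketch against the 
connector 2.0 API, meant for spark-shell where sc already exists; the names 
and the 30-day TTL are placeholders):

    import com.datastax.spark.connector._
    import com.datastax.spark.connector.writer.{TTLOption, WriteConf}

    // Re-write the rows that should expire with a constant TTL (30 days);
    // Cassandra then drops them during compaction once they expire.
    val rowsToExpire = sc.cassandraTable("my_keyspace", "my_table")
      .filter(row => row.getString("status") == "CLOSED")

    rowsToExpire.saveToCassandra(
      "my_keyspace", "my_table",
      writeConf = WriteConf(ttl = TTLOption.constant(30 * 24 * 60 * 60)))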

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Mar 22, 2018, 2:19 PM -0500, Charulata Sharma (charshar) 
<chars...@cisco.com>, wrote:
Hi,
   Wanted to know the community’s experiences and feedback on using Apache 
Spark to delete data from our C* transactional cluster.
We have Spark installed in our analytical C* cluster, and so far we have been 
using it only for analytics purposes.

However, now with the advanced features of Spark 2.0, I am considering using 
the spark-cassandra connector for deletes instead of a series of delete 
prepared statements.
So essentially the deletes will happen on the analytical cluster, and they will 
be replicated over to the transactional cluster by means of our keyspace 
replication strategies.
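
The replication piece is just the keyspace spanning both data centers. An 
illustrative, untested sketch with placeholder keyspace and DC names:

    import com.datastax.driver.core.Cluster

    // Deletes issued in the analytics DC replicate to the transactional DC
    // only because the keyspace is defined over both data centers.
    val cluster = Cluster.builder().addContactPoint("analytics-node-1").build()
    val session = cluster.connect()
    session.execute(
      "ALTER KEYSPACE my_keyspace WITH replication = " +
        "{'class': 'NetworkTopologyStrategy', 'transactional_dc': 3, 'analytics_dc': 2}")
    session.close()
    cluster.close()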

Are there any risks involved in this?

Thanks,
Charu
