In general, full table scans are a bad use case for Cassandra; another 
technology might be a better choice.


Sean Durity

From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Monday, October 03, 2016 4:38 PM
To: user@cassandra.apache.org
Subject: Re: An extremely fast cassandra table full scan utility

I undertook a similar effort a while ago.

https://issues.apache.org/jira/browse/CASSANDRA-7014

Other than the fact that it was closed with no comments, I can tell you that 
other efforts of mine to embed things in Cassandra did not go swimmingly. 
At the time, ideas like Groovy UDFs were also rejected.

On Mon, Oct 3, 2016 at 4:22 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
Hi Jonathan,

If a full scan is a regular requirement, then setting up a Spark cluster 
co-located with the Cassandra nodes makes perfect sense. But supposing it is a 
one-off requirement, say a weekly or fortnightly task, a Spark cluster could 
be an added overhead in terms of additional capacity and resource planning as 
far as operations / maintenance is concerned.

So this could be thought of as a simple substitute for a single-threaded scan, 
without the additional effort of setting up and maintaining another technology.

Regards,
Bhuvan

On Tue, Oct 4, 2016 at 1:37 AM, siddharth verma 
<sidd.verma29.l...@gmail.com> wrote:
Hi Jon,
It wasn't allowed.
Moreover, someone who isn't familiar with Spark, and might be new to 
map/filter/reduce operations, could also use this utility for some simple 
operations, assuming a sequential scan of the Cassandra table.

Regards
Siddharth Verma

On Tue, Oct 4, 2016 at 1:32 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:
Couldn't set it up because you couldn't get it working, or is it not allowed?

On Mon, Oct 3, 2016 at 3:23 PM Siddharth Verma 
<verma.siddha...@snapdeal.com> wrote:
Hi Jon,
We couldn't set up a Spark cluster.

For some use cases a Spark cluster was required, but for various reasons we 
couldn't create one. Hence, one may use this utility to iterate through the 
entire table at very high speed. We had to find a workaround that would be 
faster than paging through the result set.
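For context, the usual trick behind such utilities is to replace one big paged 
read with many independent token-range queries that can run in parallel. A 
minimal sketch of generating those per-range statements (the keyspace, table, 
and column names here are hypothetical, not from the utility being discussed):

```python
def range_scan_statements(keyspace, table, pk_col, ranges):
    """Yield one SELECT per half-open token sub-range (start, end].

    `ranges` is a list of (start, end) token boundary pairs; each
    resulting statement can be executed by a separate worker thread.
    """
    for start, end in ranges:
        yield (
            f"SELECT * FROM {keyspace}.{table} "
            f"WHERE token({pk_col}) > {start} AND token({pk_col}) <= {end}"
        )

# Example: two sub-ranges of a hypothetical 'events' table keyed by 'id'
for stmt in range_scan_statements("ks", "events", "id", [(0, 50), (50, 100)]):
    print(stmt)
```

Because each statement is restricted to a token range, the coordinator does 
not have to walk the whole ring for any single query, which is what makes the 
parallel scan faster than paging one result set sequentially.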
Regards




Siddharth Verma
Software Engineer I - CaMS



M: +91 9013689856, T: 011 22791596 EXT: 14697
CA2125, 2nd Floor, ASF Centre-A, Jwala Mill Road,
Udyog Vihar Phase - IV, Gurgaon-122016, INDIA



On Tue, Oct 4, 2016 at 12:41 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:
It almost sounds like you're duplicating all the work of both spark and the 
connector. May I ask why you decided to not use the existing tools?

On Mon, Oct 3, 2016 at 2:21 PM siddharth verma 
<sidd.verma29.l...@gmail.com> wrote:
Hi DuyHai,
Thanks for your reply.
A few more features are planned for the next one (if there is one), such as
a custom policy keeping in mind the replication of token ranges on specific nodes,
fine-graining the token ranges (for more speedup),
and a few more.

On fine-graining a token range: if one token range is split further into, say, 
2-3 parts divided among threads, this would exploit the possible parallelism 
on a large, scaled-out cluster.
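The fine-graining step described above is just interval arithmetic over the 
partitioner's token space. A sketch, assuming the Murmur3Partitioner (whose 
tokens span -2^63 to 2^63 - 1); the function name is illustrative:

```python
MIN_TOKEN = -2**63       # Murmur3Partitioner token space lower bound
MAX_TOKEN = 2**63 - 1    # ... and upper bound

def split_range(start, end, k):
    """Split the half-open token range (start, end] into k contiguous
    sub-ranges of roughly equal width, one per worker thread."""
    width = end - start
    bounds = [start + (width * i) // k for i in range(k + 1)]
    bounds[-1] = end  # guard against integer-division rounding at the top
    return list(zip(bounds[:-1], bounds[1:]))

# Example: carve one vnode's range into 3 thread-sized pieces
parts = split_range(0, 100, 3)
```

Each sub-range can then be scanned independently; since the pieces are 
contiguous and cover the original range exactly, no row is read twice or 
skipped.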

And, as you mentioned in the JIRA, streaming of requests would be of huge help 
with further splitting the range.

Thanks once again for your valuable comments. :-)

Regards,
Siddharth Verma





________________________________

The information in this Internet Email is confidential and may be legally 
privileged. It is intended solely for the addressee. Access to this Email by 
anyone else is unauthorized. If you are not the intended recipient, any 
disclosure, copying, distribution or any action taken or omitted to be taken in 
reliance on it, is prohibited and may be unlawful. When addressed to our 
clients any opinions or advice contained in this Email are subject to the terms 
and conditions expressed in any applicable governing The Home Depot terms of 
business or client engagement letter. The Home Depot disclaims all 
responsibility and liability for the accuracy and content of this attachment 
and for any damages or losses arising from any inaccuracies, errors, viruses, 
e.g., worms, trojan horses, etc., or other items of a destructive nature, which 
may be contained in this attachment and shall not be liable for direct, 
indirect, consequential or special damages in connection with this e-mail 
message or its attachment.

