Re: An extremely fast cassandra table full scan utility

Bhuvan Rawal Mon, 03 Oct 2016 13:23:01 -0700

Hi Jonathan,

If a full scan is a regular requirement, then setting up a Spark cluster
co-located with the Cassandra nodes makes perfect sense. But if it is only an
occasional requirement, say a weekly or fortnightly task, a Spark cluster
could be added overhead: extra capacity and resource planning, plus the
operations and maintenance burden.


So this could be thought of as a simple substitute for a single-threaded scan,
without the additional effort of setting up and maintaining another technology.

Regards,
Bhuvan

On Tue, Oct 4, 2016 at 1:37 AM, siddharth verma <sidd.verma29.l...@gmail.com
> wrote:

> Hi Jon,
> It wasn't allowed.
> Moreover, someone who isn't familiar with Spark, and may be new to
> map, filter, reduce, etc. operations, could also use the utility for some
> simple operations that assume a sequential scan of the Cassandra table.
>
> Regards
> Siddharth Verma
>
> On Tue, Oct 4, 2016 at 1:32 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:
>
>> Couldn't set it up because you couldn't get it working, or because it's not allowed?
>>
>> On Mon, Oct 3, 2016 at 3:23 PM Siddharth Verma <
>> verma.siddha...@snapdeal.com> wrote:
>>
>>> Hi Jon,
>>> We couldn't set up a Spark cluster.
>>>
>>> Some use cases required a Spark cluster, but for some reason we
>>> couldn't create one. Hence, one may use this utility to iterate
>>> through the entire table at very high speed.
>>>
>>> We had to find a workaround that would be faster than paging through a
>>> result set.
>>>
>>> Regards
>>>
>>> Siddharth Verma
>>>
>>> On Tue, Oct 4, 2016 at 12:41 AM, Jonathan Haddad <j...@jonhaddad.com>
>>> wrote:
>>>
>>> It almost sounds like you're duplicating all the work of both spark and
>>> the connector. May I ask why you decided to not use the existing tools?
>>>
>>> On Mon, Oct 3, 2016 at 2:21 PM siddharth verma <
>>> sidd.verma29.l...@gmail.com> wrote:
>>>
>>> Hi DuyHai,
>>> Thanks for your reply.
>>> A few more features are planned for the next version (if there is one), such as:
>>> a custom policy that takes into account which nodes replicate each
>>> token range,
>>> finer-grained token ranges (for more speedup),
>>> and a few more.
>>>
>>> Regarding finer-grained token ranges, I think that
>>> if one token range is split further into, say, 2-3 parts divided among
>>> threads, this would exploit the parallelism available on a large,
>>> scaled-out cluster.
>>>
>>> And the JIRA you mentioned, on streaming of requests, would be of huge
>>> help with further splitting the range.
>>>
>>> Thanks once again for your valuable comments. :-)
>>>
>>> Regards,
>>> Siddharth Verma
>>>
>>>
>>>
>
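
The token-range splitting discussed in the quoted message (one range cut into
2-3 parts and handed to separate threads) can be sketched roughly as follows.
This is only an illustration, not the utility's actual code. It assumes the
default Murmur3Partitioner token bounds, and the names split_range,
range_query, parallel_scan, and the scan_range callback are hypothetical. No
Cassandra driver is called here; scan_range stands in for whatever executes
the query.

```python
from concurrent.futures import ThreadPoolExecutor

# Token bounds, assuming Cassandra's default Murmur3Partitioner.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_range(start, end, parts):
    """Split the token range (start, end] into up to `parts` contiguous subranges."""
    width = end - start
    bounds = [start + (width * i) // parts for i in range(parts)] + [end]
    # Drop empty subranges, which appear when the range is narrower than `parts`.
    return [(lo, hi) for lo, hi in zip(bounds, bounds[1:]) if lo < hi]


def range_query(table, partition_key, start, end):
    """Build a CQL statement restricted to one token subrange (start exclusive)."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE token({partition_key}) > {start} "
        f"AND token({partition_key}) <= {end}"
    )


def parallel_scan(ranges, scan_range, workers=4):
    """Run the hypothetical scan_range(start, end) callback on each subrange in a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: scan_range(*r), ranges))


if __name__ == "__main__":
    # Split the whole ring into 4 parts; a real scan would split each
    # node-owned range instead and route it to a replica of that range.
    subranges = split_range(MIN_TOKEN, MAX_TOKEN, 4)
    # Hypothetical table and key names, used only to show the generated CQL.
    queries = parallel_scan(
        subranges, lambda lo, hi: range_query("ks.events", "id", lo, hi)
    )
    for q in queries:
        print(q)
```

Each subrange is independent, so the number of parts per range and the
thread-pool size can be tuned separately against the cluster size.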
