I undertook a similar effort a while ago. https://issues.apache.org/jira/browse/CASSANDRA-7014
Other than the fact that it was closed with no comments, I can tell you that
other efforts I made to embed things in Cassandra did not go swimmingly.
Ideas like Groovy UDFs were also rejected at the time.

On Mon, Oct 3, 2016 at 4:22 PM, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:

> Hi Jonathan,
>
> If a full scan is a regular requirement, then setting up a Spark cluster
> co-located with the Cassandra nodes makes perfect sense. But if it is a
> one-off requirement, say a weekly or fortnightly task, a Spark cluster
> could be an added overhead in terms of additional capacity and resource
> planning as far as operations and maintenance are concerned.
>
> So this could be thought of as a simple substitute for a single-threaded
> scan, without the additional effort of setting up and maintaining another
> technology.
>
> Regards,
> Bhuvan
>
> On Tue, Oct 4, 2016 at 1:37 AM, siddharth verma <
> sidd.verma29.l...@gmail.com> wrote:
>
>> Hi Jon,
>> It wasn't allowed.
>> Moreover, someone who isn't familiar with Spark, and might be new to
>> map/filter/reduce operations, could also use the utility for some
>> simple operations, assuming a sequential scan of the Cassandra table.
>>
>> Regards,
>> Siddharth Verma
>>
>> On Tue, Oct 4, 2016 at 1:32 AM, Jonathan Haddad <j...@jonhaddad.com>
>> wrote:
>>
>>> Couldn't set it up as in couldn't get it working, or it's not allowed?
>>>
>>> On Mon, Oct 3, 2016 at 3:23 PM Siddharth Verma <
>>> verma.siddha...@snapdeal.com> wrote:
>>>
>>>> Hi Jon,
>>>> We couldn't set up a Spark cluster.
>>>>
>>>> For some use cases a Spark cluster was required, but for some reason
>>>> we couldn't create one. Hence, one may use this utility to iterate
>>>> through the entire table at very high speed.
>>>>
>>>> We had to find a workaround that would be faster than paging on the
>>>> result set.
>>>>
>>>> Regards,
>>>> Siddharth Verma
>>>> *Software Engineer I - CaMS*
>>>>
>>>> On Tue, Oct 4, 2016 at 12:41 AM, Jonathan Haddad <j...@jonhaddad.com>
>>>> wrote:
>>>>
>>>> It almost sounds like you're duplicating all the work of both Spark
>>>> and the connector. May I ask why you decided not to use the existing
>>>> tools?
>>>>
>>>> On Mon, Oct 3, 2016 at 2:21 PM siddharth verma <
>>>> sidd.verma29.l...@gmail.com> wrote:
>>>>
>>>> Hi DuyHai,
>>>> Thanks for your reply.
>>>> A few more features are planned for the next version (if there is
>>>> one), such as:
>>>> a custom policy that accounts for the replication of token ranges on
>>>> specific nodes,
>>>> fine-graining the token ranges (for more speedup),
>>>> and a few more.
>>>>
>>>> Regarding fine-graining a token range: if one token range is split
>>>> further into, say, 2-3 parts and divided among threads, this would
>>>> exploit the possible parallelism on a large scaled-out cluster.
>>>>
>>>> And, as you mentioned in the JIRA, streaming of requests would be of
>>>> huge help with further splitting of the range.
>>>>
>>>> Thanks once again for your valuable comments. :-)
>>>>
>>>> Regards,
>>>> Siddharth Verma
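[Editor's note] The fine-graining idea discussed above, splitting a vnode's token range into smaller sub-ranges and scanning them from multiple threads, is essentially arithmetic over the partitioner's token ring. Below is a minimal, hypothetical sketch of that splitting step; the function names are illustrative, not taken from the utility in this thread, and the real scan would issue a `token(pk)`-bounded CQL query per sub-range instead of the placeholder shown:

```python
from concurrent.futures import ThreadPoolExecutor

# Token ring bounds for Cassandra's Murmur3Partitioner.
MURMUR3_MIN = -2**63
MURMUR3_MAX = 2**63 - 1


def split_range(start, end, parts):
    """Split the half-open token range (start, end] into `parts` sub-ranges.

    The last sub-range absorbs any remainder when the width does not
    divide evenly.
    """
    step = (end - start) // parts
    bounds = [start + i * step for i in range(parts)] + [end]
    return [(bounds[i], bounds[i + 1]) for i in range(parts)]


def scan_sub_range(sub):
    # In the real utility each sub-range would drive a query like:
    #   SELECT * FROM ks.tbl WHERE token(pk) > ? AND token(pk) <= ?
    # Here we just return the sub-range width as a stand-in result.
    start, end = sub
    return end - start


# Example: split one token range into 3 parts and "scan" them in parallel.
sub_ranges = split_range(0, 3_000_000, 3)
with ThreadPoolExecutor(max_workers=3) as pool:
    totals = list(pool.map(scan_sub_range, sub_ranges))
```

A finer split (more parts per range) lets more threads work concurrently, which is the parallelism gain described for large scaled-out clusters; the trade-off is more, smaller queries per node.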