Hi Jonathan,

If a full scan is a regular requirement, then setting up a Spark cluster co-located with the Cassandra nodes makes perfect sense. But if it is only an occasional requirement, say a weekly or fortnightly task, a Spark cluster could be an added overhead, with additional capacity and resource planning as far as operations and maintenance are concerned.
So this could be thought of as a simple substitute for a single-threaded scan, without the additional effort of setting up and maintaining another technology.

Regards,
Bhuvan

On Tue, Oct 4, 2016 at 1:37 AM, siddharth verma <sidd.verma29.l...@gmail.com> wrote:

> Hi Jon,
> It wasn't allowed.
> Moreover, someone who isn't familiar with Spark, and might be new to
> map, filter, reduce, etc. operations, could also use the utility for some
> simple operations, assuming a sequential scan of the Cassandra table.
>
> Regards
> Siddharth Verma
>
> On Tue, Oct 4, 2016 at 1:32 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:
>
>> Couldn't set it up as in couldn't get it working, or it's not allowed?
>>
>> On Mon, Oct 3, 2016 at 3:23 PM Siddharth Verma <
>> verma.siddha...@snapdeal.com> wrote:
>>
>>> Hi Jon,
>>> We couldn't set up a Spark cluster.
>>>
>>> For some use case, a Spark cluster was required, but for some reason we
>>> couldn't create one. Hence, one may use this utility to iterate
>>> through the entire table at very high speed.
>>>
>>> We had to find a workaround that would be faster than paging on the
>>> result set.
>>>
>>> Regards
>>>
>>> Siddharth Verma
>>> *Software Engineer I - CaMS*
>>>
>>> On Tue, Oct 4, 2016 at 12:41 AM, Jonathan Haddad <j...@jonhaddad.com>
>>> wrote:
>>>
>>> It almost sounds like you're duplicating all the work of both Spark and
>>> the connector. May I ask why you decided to not use the existing tools?
>>>
>>> On Mon, Oct 3, 2016 at 2:21 PM siddharth verma <
>>> sidd.verma29.l...@gmail.com> wrote:
>>>
>>> Hi DuyHai,
>>> Thanks for your reply.
>>> A few more features are planned for the next version (if there is one), such as
>>> a custom policy that takes into account the replication of token ranges on
>>> specific nodes,
>>> finer-grained token ranges (for more speedup),
>>> and a few more.
>>>
>>> As for fine-graining a token range:
>>> if one token range is split further into, say, 2-3 parts divided among
>>> threads, this would exploit the possible parallelism on a large scaled-out
>>> cluster.
>>>
>>> And, as you mentioned, the JIRA on streaming of requests would be of huge
>>> help with further splitting the range.
>>>
>>> Thanks once again for your valuable comments. :-)
>>>
>>> Regards,
>>> Siddharth Verma
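
To make the token-range splitting idea in the quoted message concrete, here is a minimal sketch in Python. It is not the utility's actual code. It assumes the cluster uses Murmur3Partitioner (token values from -2^63 to 2^63 - 1), and the names `split_range`, `scan_query`, `my_keyspace`, `my_table` and `id` are all hypothetical. It only computes sub-ranges and builds the CQL text; executing each query on its own thread with a driver is left out.

```python
# Sketch only: split a token range into contiguous sub-ranges so that each
# sub-range can be scanned by a separate worker thread.
# Assumption: Murmur3Partitioner token bounds.
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_range(start, end, parts):
    """Split the token range (start, end] into `parts` contiguous sub-ranges."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    if end <= start:
        raise ValueError("end must be greater than start")
    width = end - start
    # Integer arithmetic keeps the bounds exact even for 64-bit token values.
    bounds = [start + (width * i) // parts for i in range(parts)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))


def scan_query(keyspace, table, partition_key, lo, hi):
    """Build the CQL text for scanning one sub-range (hypothetical names)."""
    return (
        f"SELECT * FROM {keyspace}.{table} "
        f"WHERE token({partition_key}) > {lo} AND token({partition_key}) <= {hi}"
    )


if __name__ == "__main__":
    # Split the whole ring into 4 parts and print one query per part.
    for lo, hi in split_range(MIN_TOKEN, MAX_TOKEN, 4):
        print(scan_query("my_keyspace", "my_table", "id", lo, hi))
```

Splitting further (for example, each node's own range into 2-3 parts) is the same call with different `start`, `end` and `parts` values; how many parts actually help would depend on the cluster size and on how many threads the client can usefully run.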