Apologies, I meant C* version 2.0.16. The latest 2.1.7 source has a different WordCount example, and it does not use CqlPagingInputFormat. I am comparing the differences to understand why the change was made. If you can shed some light on the reasoning, it would be much appreciated (and would save me a few hours of digging through the code).

------------------------------------
Venky Kandaswamy
925-200-7124

On 6/29/15, 8:40 PM, "Venkatesh Kandaswamy" <ve...@walmartlabs.com> wrote:

>I was going through the WordCount example in the latest 2.1.7 Apache C*
>source and there is a reference to
>org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat, but it is not in
>the source tree or in the compiled binary. Looks like we really cannot use
>C* with Hadoop without a paging input format. Is there a reason why this
>was removed? But the example includes it. I am confused. Please shed some
>light if you know the answer.
>
>------------------------------------
>
>Venky Kandaswamy
>925-200-7124
>
>On 6/29/15, 1:15 PM, "Venkatesh Kandaswamy" <ve...@walmartlabs.com> wrote:
>
>>All,
>>  I converted one of my C* programs to Hadoop 2.x and C* datastax
>>drivers for 2.1.0. The original program (Hadoop 1.x) worked fine when we
>>specified InputCQLPageRowSize and InputSplitSize to reasonable values.
>>For example, if we had 60K rows, a row size of 100 and split size of
>>10000 will run 6 mappers and give us 60K rows. When we switched to 2.1.x
>>version of the datastax drivers, the same program now gives only 600
>>rows.
>>
>>  It looks like the paging logic has changed and the page size is only
>>getting the first 100 rows. How do we get all the rows?
>>
>>------------------------------------
>>Venky Kandaswamy
>>925-200-7124
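
For reference, the numbers in the original report are consistent with each of the 6 mappers returning only its first page of 100 rows (6 x 100 = 600). Below is a minimal sketch of that arithmetic, not of the actual Cassandra paging code. It assumes one mapper per split and evenly filled splits; the function names `mapper_count` and `rows_read` are hypothetical helpers made up for illustration and are not part of Cassandra or Hadoop.

```python
import math

TOTAL_ROWS = 60_000     # rows in the table, from the report
SPLIT_SIZE = 10_000     # value passed as InputSplitSize
PAGE_ROW_SIZE = 100     # value passed as InputCQLPageRowSize


def mapper_count(total_rows, split_size):
    """Number of mappers, assuming one mapper per split (an assumption)."""
    return math.ceil(total_rows / split_size)


def rows_read(total_rows, split_size, page_row_size, pages_per_mapper=None):
    """Total rows returned across all mappers.

    pages_per_mapper=None models a mapper that pages through its whole split.
    pages_per_mapper=k models a mapper that stops after k pages, which is
    the behaviour suspected in the report (k = 1).
    """
    total = 0
    remaining = total_rows
    for _ in range(mapper_count(total_rows, split_size)):
        rows_in_split = min(split_size, remaining)
        remaining -= rows_in_split
        if pages_per_mapper is None:
            total += rows_in_split
        else:
            total += min(rows_in_split, pages_per_mapper * page_row_size)
    return total


if __name__ == "__main__":
    print("mappers:", mapper_count(TOTAL_ROWS, SPLIT_SIZE))
    print("full paging:", rows_read(TOTAL_ROWS, SPLIT_SIZE, PAGE_ROW_SIZE))
    print("one page per mapper:",
          rows_read(TOTAL_ROWS, SPLIT_SIZE, PAGE_ROW_SIZE, pages_per_mapper=1))
```

If the one-page-per-mapper figure matches what the job returns, that points at the reader stopping after the first page rather than at the split calculation.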