I was going through the WordCount example in the latest Apache C* 2.1.7 source, and it references org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat, but that class is not in the source tree or in the compiled binaries. It looks like we cannot really use C* with Hadoop without a paging input format. Is there a reason it was removed while the example still refers to it? I am confused. Please shed some light if you know the answer.

Venky Kandaswamy
925-200-7124

On 6/29/15, 1:15 PM, "Venkatesh Kandaswamy" <ve...@walmartlabs.com> wrote:

>All,
>   I converted one of my C* programs to Hadoop 2.x and C* datastax
>drivers for 2.1.0. The original program (Hadoop 1.x) worked fine when we
>specified InputCQLPageRowSize and InputSplitSize to reasonable values.
>For example, if we had 60K rows, a row size of 100 and split size of
>10000 will run 6 mappers and give us 60K rows. When we switched to 2.1.x
>version of the datastax drivers, the same program now gives only 600 rows.
>
>   It looks like the paging logic has changed and the page size is only
>getting the first 100 rows. How do we get all the rows?
>
>Venky Kandaswamy
>925-200-7124
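
As a rough check of the numbers in the quoted message, the sketch below reproduces the arithmetic: 60K rows with a split size of 10000 gives 6 mappers, and 6 mappers that each stop after a single page of 100 rows give exactly 600 rows. The function names are hypothetical, and the model assumes splits divide the rows evenly, which real Cassandra split estimates only approximate.

```python
import math


def mapper_count(total_rows, split_size):
    """Number of input splits (one mapper each), assuming evenly sized splits."""
    return math.ceil(total_rows / split_size)


def rows_if_one_page_per_split(total_rows, split_size, page_size):
    """Rows returned if every mapper reads only the first page of its split."""
    rows_per_mapper = min(page_size, split_size)
    return mapper_count(total_rows, split_size) * rows_per_mapper


total_rows, split_size, page_size = 60_000, 10_000, 100

print(mapper_count(total_rows, split_size))
print(rows_if_one_page_per_split(total_rows, split_size, page_size))
```

With these values the two printed numbers are 6 and 600, which is consistent with each mapper returning only its first page rather than paging through its whole split.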