Apologies, I meant C* version 2.0.16.
The latest 2.1.7 source has a different WordCount example, one that does
not use CqlPagingInputFormat. I am comparing the two versions to
understand why the change was made, but if you can shed some light on the
reasoning, it would be much appreciated (and would save me a few hours of
digging through the code).
————————————————————————————————————

Venky Kandaswamy
925-200-7124





On 6/29/15, 8:40 PM, "Venkatesh Kandaswamy" <ve...@walmartlabs.com> wrote:

>I was going through the WordCount example in the latest 2.1.7 Apache C*
>source and there is a reference to
>org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat, but that class is
>not in the source tree or in the compiled binary. It looks like we cannot
>really use C* with Hadoop without a paging input format. Is there a reason
>why it was removed, given that the example still references it? I am
>confused. Please shed some light if you know the answer.
>
>------------------------------------
>
>Venky Kandaswamy
>925-200-7124
>
>
>
>
>
>On 6/29/15, 1:15 PM, "Venkatesh Kandaswamy" <ve...@walmartlabs.com> wrote:
>
>>All,
>>   I converted one of my C* programs to Hadoop 2.x and the C* DataStax
>>drivers for 2.1.0. The original program (Hadoop 1.x) worked fine when we
>>set InputCQLPageRowSize and InputSplitSize to reasonable values.
>>For example, with 60K rows, a page row size of 100 and a split size of
>>10000 would run 6 mappers and return all 60K rows. Since we switched to
>>the 2.1.x version of the DataStax drivers, the same program returns only
>>600 rows.
>>
>> It looks like the paging logic has changed and each mapper is only
>>getting its first page of 100 rows (6 mappers x 100 rows = 600). How do
>>we get all the rows?
>>
>>------------------------------------
>>Venky Kandaswamy
>>925-200-7124
>>
>
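
For anyone following the numbers in the original message, here is a minimal sketch of the arithmetic. It is a plain-Python simulation, not Cassandra or Hadoop code: the function names are hypothetical, and the "first page only" reader is just an assumption about what the 2.1.x behaviour looks like from the outside, based on the 600-row result reported above.

```python
# Simulation only: no Cassandra or Hadoop APIs are used here.
# All function names are hypothetical and exist purely for illustration.

def make_splits(total_rows, split_size):
    """Partition row indices [0, total_rows) into contiguous splits."""
    return [range(start, min(start + split_size, total_rows))
            for start in range(0, total_rows, split_size)]

def read_all_pages(split, page_size):
    """Count rows when the reader keeps fetching pages until the split is exhausted."""
    count = 0
    for page_start in range(0, len(split), page_size):
        page = split[page_start:page_start + page_size]
        count += len(page)
    return count

def read_first_page_only(split, page_size):
    """Count rows when the reader stops after the first page (assumed faulty behaviour)."""
    return len(split[:page_size])

TOTAL_ROWS = 60_000   # "60K rows" from the original message
SPLIT_SIZE = 10_000   # InputSplitSize
PAGE_SIZE = 100       # InputCQLPageRowSize

splits = make_splits(TOTAL_ROWS, SPLIT_SIZE)
mappers = len(splits)  # one mapper per split
full = sum(read_all_pages(s, PAGE_SIZE) for s in splits)
truncated = sum(read_first_page_only(s, PAGE_SIZE) for s in splits)

print(mappers, full, truncated)  # prints: 6 60000 600
```

The 600 figure falls out only if every one of the 6 mappers stops after a single page, which is consistent with the symptom described but does not by itself confirm the cause.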
