Looking at the debug log, I see:

[2015-06-29 23:38:11] [main] DEBUG CqlRecordReader - cqlQuery SELECT "wpid","value" FROM "qarth_catalog_dev"."product_v1" WHERE token("wpid")>? AND token("wpid")<=? LIMIT 10
[2015-06-29 23:38:11] [main] DEBUG CqlRecordReader - created org.apache.cassandra.hadoop.cql3.CqlRecordReader$RowIterator@11963225
[2015-06-29 23:38:11] [main] DEBUG CqlRecordReader - Finished scanning 6 rows (estimate was: 0)

I know the split has about 1000 rows, so why is the record reader not
paging through the whole split? I am probably missing something
fundamental, but I cannot figure it out from the manuals or from the
source code for CqlInputFormat and CqlRecordReader.
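
For reference, here is roughly how I am setting up the job now. This is a
sketch based on my reading of the 2.1 WordCount example: the contact point
is a placeholder, the keyspace/table/query come from my environment, and I
am assuming these are the right ConfigHelper/CqlConfigHelper setters.

import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlInputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ProductScanJob {               // hypothetical driver class
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "product-scan");
        job.setInputFormatClass(CqlInputFormat.class);
        job.setJarByClass(ProductScanJob.class);

        Configuration conf = job.getConfiguration();
        ConfigHelper.setInputInitialAddress(conf, "127.0.0.1");   // placeholder contact point
        ConfigHelper.setInputPartitioner(conf, "org.apache.cassandra.dht.Murmur3Partitioner");
        ConfigHelper.setInputColumnFamily(conf, "qarth_catalog_dev", "product_v1");

        // Custom input query; CqlRecordReader binds each split's token range to
        // the two '?'s. This is the query that shows up in the debug log above.
        CqlConfigHelper.setInputCql(conf,
            "SELECT \"wpid\",\"value\" FROM \"qarth_catalog_dev\".\"product_v1\" " +
            "WHERE token(\"wpid\") > ? AND token(\"wpid\") <= ? LIMIT 10");
        CqlConfigHelper.setInputCQLPageRowSize(conf, "100");      // driver fetch size per page

        // Mapper / output setup elided here for brevity.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}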


Does anyone have working sample code they can share?
————————————————————————————————————

Venky Kandaswamy
925-200-7124





On 6/29/15, 8:46 PM, "Venkatesh Kandaswamy" <ve...@walmartlabs.com> wrote:

>Apologies, I meant C* version 2.0.16.
>The latest 2.1.7 source has a different WordCount example, and it does not
>use CqlPagingInputFormat. I am comparing the two to understand why the
>change was made, but if you can shed some light on the reasoning, it would
>be much appreciated (and would save me a few hours of digging through the
>code).
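>
>For context on what I am comparing: the old CqlPagingInputFormat handed
>the mapper maps of ByteBuffers, while the 2.1 WordCount example's mapper
>receives rows from the DataStax java driver. A rough sketch of the
>2.1-style mapper as I read it (the class name and output types are mine,
>and I am assuming "wpid" is a text column in my table):
>
>import java.io.IOException;
>
>import com.datastax.driver.core.Row;
>import org.apache.hadoop.io.LongWritable;
>import org.apache.hadoop.io.Text;
>import org.apache.hadoop.mapreduce.Mapper;
>
>// CqlInputFormat/CqlRecordReader hand each mapper a Long key and a driver
>// Row as the value (instead of the old Map<String, ByteBuffer> pairs).
>public class ProductMapper extends Mapper<Long, Row, Text, LongWritable> {
>    @Override
>    protected void map(Long key, Row row, Context context)
>            throws IOException, InterruptedException {
>        context.write(new Text(row.getString("wpid")), new LongWritable(1L));
>    }
>}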
>————————————————————————————————————
>
>Venky Kandaswamy
>925-200-7124
>
>
>
>
>
>On 6/29/15, 8:40 PM, "Venkatesh Kandaswamy" <ve...@walmartlabs.com> wrote:
>
>>I was going through the WordCount example in the latest 2.1.7 Apache C*
>>source, and it references
>>org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat, but that class is
>>not in the source tree or in the compiled binary. It looks like we really
>>cannot use C* with Hadoop without a paging input format. Is there a
>>reason why it was removed, and why does the example still reference it?
>>Please shed some light if you know the answer.
>>
>>————————————————————————————————————
>>
>>Venky Kandaswamy
>>925-200-7124
>>
>>
>>
>>
>>
>>On 6/29/15, 1:15 PM, "Venkatesh Kandaswamy" <ve...@walmartlabs.com>
>>wrote:
>>
>>>All,
>>>   I converted one of my C* programs to Hadoop 2.x and C* datastax
>>>drivers for 2.1.0. The original program (Hadoop 1.x) worked fine when we
>>>specified InputCQLPageRowSize and InputSplitSize to reasonable values.
>>>For example, if we had 60K rows, a row size of 100 and split size of
>>>10000 will run 6 mappers and give us 60K rows. When we switched to 2.1.x
>>>version of the datastax drivers, the same program now gives only 600
>>>rows.
>>>
>>> It looks like the paging logic has changed and the page size is only
>>>getting the first 100 rows. How do we get all the rows?
>>>
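>>>For reference, the old (Hadoop 1.x / C* 2.0.x) driver configuration looked
>>>roughly like the sketch below; the contact point and keyspace/table names
>>>are placeholders, and I am quoting the ConfigHelper/CqlConfigHelper setter
>>>names from memory:
>>>
>>>import org.apache.cassandra.hadoop.ConfigHelper;
>>>import org.apache.cassandra.hadoop.cql3.CqlConfigHelper;
>>>import org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat;
>>>import org.apache.hadoop.conf.Configuration;
>>>import org.apache.hadoop.mapreduce.Job;
>>>
>>>public class RowCountJob {                // hypothetical driver class
>>>    public static void main(String[] args) throws Exception {
>>>        Job job = new Job(new Configuration(), "row-count");
>>>        job.setInputFormatClass(CqlPagingInputFormat.class);  // paging input format from 2.0.x
>>>        job.setJarByClass(RowCountJob.class);
>>>
>>>        Configuration conf = job.getConfiguration();
>>>        ConfigHelper.setInputInitialAddress(conf, "127.0.0.1");   // placeholder contact point
>>>        ConfigHelper.setInputPartitioner(conf, "org.apache.cassandra.dht.Murmur3Partitioner");
>>>        ConfigHelper.setInputColumnFamily(conf, "my_keyspace", "my_table");  // placeholders
>>>        ConfigHelper.setInputSplitSize(conf, 10000);              // ~10000 rows per mapper
>>>        CqlConfigHelper.setInputCQLPageRowSize(conf, "100");      // 100 rows per CQL page
>>>
>>>        // Mapper / output setup elided here for brevity.
>>>        System.exit(job.waitForCompletion(true) ? 0 : 1);
>>>    }
>>>}
>>>
>>>With those settings, 60K rows and a 10000-row split size gave us 6 mappers,
>>>and each mapper paged through its entire split.
>>>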
>>>————————————————————————————————————
>>>Venky Kandaswamy
>>>925-200-7124
>>>
>>
>
