Re[2]: Cassandra input paging for Hadoop

Renat Gilfanov Wed, 11 Sep 2013 22:59:26 -0700

 Hello,

So does that mean the job will process only the first
"cassandra.input.page.row.size" rows and ignore the rest? Or does
CqlPagingRecordReader support paging through the entire result set?



  Aaron Morton <aa...@thelastpickle.com>:
>>>
>>>I'm looking at the ConfigHelper.setRangeBatchSize() and
>>>CqlConfigHelper.setInputCQLPageRowSize() methods, but a bit confused if
>>>that's what I need and if yes, which one should I use for those purposes.
>
>If you are using CQL 3 via Hadoop CqlConfigHelper.setInputCQLPageRowSize is the
>one you want.
>
>It maps to the LIMIT clause of the SELECT statement the input reader will
>generate; the default is 1,000.
>
>A
> 
>-----------------
>Aaron Morton
>New Zealand
>@aaronmorton
>
>Co-Founder & Principal Consultant
>Apache Cassandra Consulting
>http://www.thelastpickle.com
>
>On 12/09/2013, at 9:04 AM, Jiaan Zeng < l.alle...@gmail.com > wrote:
>>Speaking of the Thrift client, i.e. ColumnFamilyInputFormat: yes,
>>ConfigHelper.setRangeBatchSize() can reduce the number of rows requested
>>from Cassandra per call.
>>
>>Depending on how big your columns are, you may also want to increase the
>>Thrift message length through setThriftMaxMessageLengthInMb().
>>
>>Hope that helps.
>>
>>On Tue, Sep 10, 2013 at 8:18 PM, Renat Gilfanov < gren...@mail.ru > wrote:
>>>Hi,
>>>
>>>We have Hadoop jobs that read data from our Cassandra column families and
>>>write some data back to other column families.
>>>The input column families are pretty simple CQL3 tables without wide rows.
>>>In the Hadoop jobs we set up the corresponding WHERE clause via
>>>CqlConfigHelper.setInputWhereClauses(...), so we don't process the whole
>>>table at once.
>>>Nevertheless, sometimes the amount of data returned by the input query is
>>>big enough to cause TimedOutExceptions.
>>>
>>>To mitigate this, I'd like to configure the Hadoop job in such a way that it
>>>sequentially fetches input rows in smaller portions.
>>>
>>>I'm looking at the ConfigHelper.setRangeBatchSize() and
>>>CqlConfigHelper.setInputCQLPageRowSize() methods, but a bit confused if
>>>that's what I need and if yes, which one should I use for those purposes.
>>>
>>>Any help is appreciated.
>>>
>>>Hadoop version is 1.1.2, Cassandra version is 1.2.8.
>>
>>
>>
>>-- 
>>Regards,
>>Jiaan
>
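
On the question of whether a per-query LIMIT caps the total number of rows read: a reader that pages with LIMIT normally keeps issuing queries, each one resuming after the last row it saw, until a page comes back short. Below is a minimal plain-Java sketch of that loop, assuming CqlPagingRecordReader works along these lines (which is worth confirming against its source). It uses no Cassandra or Hadoop classes; PagedReadSketch, fetchPage and readAll are hypothetical names, and the sorted list of integers stands in for one input split ordered by token.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of LIMIT-based paging (hypothetical names, not the Cassandra API).
class PagedReadSketch {

    // Number of simulated queries issued by the most recent readAll() call.
    static int queriesIssued = 0;

    // Stand-in for: SELECT ... WHERE token(key) > afterToken LIMIT limit
    // A null afterToken means "start from the beginning of the split".
    static List<Integer> fetchPage(List<Integer> sortedTokens, Integer afterToken, int limit) {
        List<Integer> page = new ArrayList<>();
        for (int token : sortedTokens) {
            boolean pastCursor = (afterToken == null) || (token > afterToken);
            if (pastCursor && page.size() < limit) {
                page.add(token);
            }
        }
        return page;
    }

    // Reads the whole split, one page of at most pageRowSize rows per query.
    static List<Integer> readAll(List<Integer> sortedTokens, int pageRowSize) {
        queriesIssued = 0;
        List<Integer> result = new ArrayList<>();
        Integer lastToken = null;
        while (true) {
            List<Integer> page = fetchPage(sortedTokens, lastToken, pageRowSize);
            queriesIssued++;
            result.addAll(page);
            // A short (or empty) page means there is nothing left to read.
            if (page.size() < pageRowSize) {
                break;
            }
            // Resume after the last row of this page on the next query.
            lastToken = page.get(page.size() - 1);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> split = new ArrayList<>();
        for (int i = 1; i <= 2500; i++) {
            split.add(i);
        }
        List<Integer> rows = readAll(split, 1000);
        System.out.println(rows.size() + " rows in " + queriesIssued + " queries");
    }
}
```

In this model the page size only bounds how many rows each individual query returns, so a smaller value trades more round trips for smaller responses, which is the knob that matters for the timeouts described in the original question.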
