Speaking of the thrift client (i.e. ColumnFamilyInputFormat): yes,
ConfigHelper.setRangeBatchSize() can reduce the number of rows fetched from
Cassandra in each request.

Depending on how big your columns are, you may also want to increase the
thrift message length via ConfigHelper.setThriftMaxMessageLengthInMb().
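For example, something like this (rough sketch only; it assumes you are
already setting up a normal Hadoop Job with ColumnFamilyInputFormat, and the
512 / 64 values are just placeholders to tune for your data):

  import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
  import org.apache.cassandra.hadoop.ConfigHelper;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;

  Job job = new Job(new Configuration(), "cassandra-input-job");
  job.setInputFormatClass(ColumnFamilyInputFormat.class);
  Configuration conf = job.getConfiguration();

  // Fewer rows per thrift range request -> smaller responses and less
  // chance of hitting rpc_timeout on the Cassandra side.
  ConfigHelper.setRangeBatchSize(conf, 512);

  // Raise the thrift message limit (in MB) if individual rows/columns are
  // large enough to overflow the default frame size.
  ConfigHelper.setThriftMaxMessageLengthInMb(conf, 64);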

Hope that helps.

On Tue, Sep 10, 2013 at 8:18 PM, Renat Gilfanov <gren...@mail.ru> wrote:
> Hi,
>
> We have Hadoop jobs that read data from our Cassandra column families and
> write some data back to other column families.
> The input column families are pretty simple CQL3 tables without wide rows.
> In the Hadoop jobs we set the corresponding WHERE clause via
> ConfigHelper.setInputWhereClauses(...), so we don't process the whole table
> at once.
> Nevertheless, sometimes the amount of data returned by the input query is
> big enough to cause TimedOutExceptions.
>
> To mitigate this, I'd like to configure the Hadoop job in such a way that it
> sequentially fetches the input rows in smaller portions.
>
> I'm looking at the ConfigHelper.setRangeBatchSize() and
> CqlConfigHelper.setInputCQLPageRowSize() methods, but I'm a bit confused
> whether that's what I need and, if so, which one I should use for this purpose.
>
> Any help is appreciated.
>
> Hadoop version is 1.1.2, Cassandra version is 1.2.8.



-- 
Regards,
Jiaan
