Hi,

We have Hadoop jobs that read data from our Cassandra column families and write some data back to other column families. The input column families are pretty simple CQL3 tables without wide rows. In the Hadoop jobs we set a corresponding WHERE clause via ConfigHelper.setInputWhereClauses(...), so we don't process the whole table at once. Nevertheless, sometimes the amount of data returned by the input query is big enough to cause TimedOutExceptions.
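Roughly, the input side of such a job is set up like the sketch below. The keyspace, table, address and the WHERE clause are placeholders and I'm writing this from memory, so the exact helper calls may be slightly off; in particular I think the WHERE clause setter actually lives in CqlConfigHelper:

    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.cassandra.hadoop.cql3.CqlConfigHelper;
    import org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class InputSetupSketch {
        // Sketch of the input side of one of our jobs; keyspace, table,
        // address and the WHERE clause are placeholders.
        static Job configureInput(Configuration conf) throws Exception {
            Job job = new Job(conf, "example-input-job");
            job.setInputFormatClass(CqlPagingInputFormat.class);

            ConfigHelper.setInputInitialAddress(job.getConfiguration(), "10.0.0.1");
            ConfigHelper.setInputRpcPort(job.getConfiguration(), "9160");
            ConfigHelper.setInputPartitioner(job.getConfiguration(),
                    "org.apache.cassandra.dht.Murmur3Partitioner");
            ConfigHelper.setInputColumnFamily(job.getConfiguration(),
                    "my_keyspace", "my_table");

            // Per-job filter so we don't process the whole table at once;
            // the clause itself is just an example.
            CqlConfigHelper.setInputWhereClauses(job.getConfiguration(), "day = 20130901");

            return job;
        }
    }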
To mitigate this, I'd like to configure the Hadoop job in such a way that it fetches the input rows sequentially in smaller portions. I'm looking at the ConfigHelper.setRangeBatchSize() and CqlConfigHelper.setInputCQLPageRowSize() methods, but I'm a bit confused as to whether that's what I need and, if so, which of the two I should use (the snippet below shows both candidates as I understand them). Hadoop version is 1.1.2, Cassandra version is 1.2.8. Any help is appreciated.
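For concreteness, this is what I'm considering adding to the job setup sketched above; the sizes are just guesses, and I'm not sure which of these (if either) is honoured by the CQL3 input path:

    // Candidate 1: ConfigHelper.setRangeBatchSize -- as far as I can tell this
    // caps how many rows a single get_range_slices request pulls in the
    // thrift-based ColumnFamilyInputFormat; not sure it affects the CQL3 reader.
    ConfigHelper.setRangeBatchSize(job.getConfiguration(), 1000);

    // Candidate 2: CqlConfigHelper.setInputCQLPageRowSize -- page size (in CQL3
    // rows) used by CqlPagingInputFormat / CqlPagingRecordReader; note that in
    // 1.2.x the value is passed as a String.
    CqlConfigHelper.setInputCQLPageRowSize(job.getConfiguration(), "1000");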