>> I'm looking at the ConfigHelper.setRangeBatchSize() and
>> CqlConfigHelper.setInputCQLPageRowSize() methods, but a bit confused if
>> that's what I need and if yes, which one should I use for those purposes.

If you are using CQL 3 via Hadoop, CqlConfigHelper.setInputCQLPageRowSize is the one you want.
It maps to the LIMIT clause of the SELECT statement the input reader will generate; the default is 1,000.
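
Roughly, the driver setup would look something like the sketch below (untested; the contact point, partitioner, keyspace, table and page size are placeholders to adjust for your cluster):

import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public final class CqlPageSizeExample {

    // Configures (but does not submit) a job that reads a CQL3 table
    // through CqlPagingInputFormat with a smaller page size.
    public static Job buildJob() throws Exception {
        Job job = new Job(new Configuration(), "cql-page-size-example"); // job name is a placeholder
        job.setInputFormatClass(CqlPagingInputFormat.class);

        Configuration conf = job.getConfiguration();
        ConfigHelper.setInputInitialAddress(conf, "127.0.0.1");             // placeholder contact point
        ConfigHelper.setInputRpcPort(conf, "9160");
        ConfigHelper.setInputPartitioner(conf, "Murmur3Partitioner");
        ConfigHelper.setInputColumnFamily(conf, "my_keyspace", "my_table"); // placeholder names

        // Page through the table 500 CQL rows at a time; this value becomes
        // the LIMIT on the generated SELECT (default 1,000).
        CqlConfigHelper.setInputCQLPageRowSize(conf, "500");

        return job;
    }
}

If you are on the thrift based ColumnFamilyInputFormat instead, Jiaan's suggestions below are the knobs to use; there is a similar sketch at the very end of this message.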
A

-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 12/09/2013, at 9:04 AM, Jiaan Zeng <l.alle...@gmail.com> wrote:

> Speaking of the thrift client, i.e. ColumnFamilyInputFormat, yes,
> ConfigHelper.setRangeBatchSize() can reduce the number of rows sent to
> Cassandra.
>
> Depending on how big your columns are, you may also want to increase the
> thrift message length through setThriftMaxMessageLengthInMb().
>
> Hope that helps.
>
> On Tue, Sep 10, 2013 at 8:18 PM, Renat Gilfanov <gren...@mail.ru> wrote:
>> Hi,
>>
>> We have Hadoop jobs that read data from our Cassandra column families and
>> write some data back to other column families.
>> The input column families are pretty simple CQL3 tables without wide rows.
>> In the Hadoop jobs we set a corresponding WHERE clause via
>> ConfigHelper.setInputWhereClauses(...), so we don't process the whole
>> table at once.
>> Nevertheless, sometimes the amount of data returned by the input query is
>> big enough to cause TimedOutExceptions.
>>
>> To mitigate this, I'd like to configure the Hadoop job in such a way that
>> it fetches the input rows sequentially in smaller portions.
>>
>> I'm looking at the ConfigHelper.setRangeBatchSize() and
>> CqlConfigHelper.setInputCQLPageRowSize() methods, but a bit confused
>> about whether that's what I need and, if yes, which one I should use for
>> this purpose.
>>
>> Any help is appreciated.
>>
>> Hadoop version is 1.1.2, Cassandra version is 1.2.8.
>
>
> --
> Regards,
> Jiaan
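
For completeness, here is a rough, untested sketch of the thrift-side setup Jiaan describes above (keyspace, column family, batch size and message length are made-up placeholders, and it assumes your ConfigHelper still exposes setThriftMaxMessageLengthInMb):

import java.nio.ByteBuffer;

import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public final class ThriftBatchSizeExample {

    // Configures (but does not submit) a job that reads a column family
    // through the thrift-based ColumnFamilyInputFormat.
    public static Job buildJob() throws Exception {
        Job job = new Job(new Configuration(), "thrift-batch-size-example"); // job name is a placeholder
        job.setInputFormatClass(ColumnFamilyInputFormat.class);

        Configuration conf = job.getConfiguration();
        ConfigHelper.setInputInitialAddress(conf, "127.0.0.1");          // placeholder contact point
        ConfigHelper.setInputRpcPort(conf, "9160");
        ConfigHelper.setInputPartitioner(conf, "Murmur3Partitioner");
        ConfigHelper.setInputColumnFamily(conf, "my_keyspace", "my_cf"); // placeholder names

        // Ask for every column of each row via an open-ended slice.
        SliceRange allColumns = new SliceRange(
                ByteBuffer.wrap(new byte[0]), ByteBuffer.wrap(new byte[0]), false, Integer.MAX_VALUE);
        ConfigHelper.setInputSlicePredicate(conf, new SlicePredicate().setSlice_range(allColumns));

        // Fetch fewer rows per get_range_slices call to avoid timeouts.
        ConfigHelper.setRangeBatchSize(conf, 256);

        // Raise the thrift frame size (in MB) if individual columns are large.
        ConfigHelper.setThriftMaxMessageLengthInMb(conf, 64);

        return job;
    }
}

Smaller values for setRangeBatchSize mean more round trips to the cluster, but each individual range request returns less data and is less likely to time out.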