> Or does CqlPagingRecordReader support paging through the entire result set?

It supports paging through the entire result set.
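For reference, a minimal sketch of the CQL3 input setup being discussed, using the Cassandra 1.2 Hadoop classes; the address, keyspace, table and page size below are illustrative, not taken from this thread:

import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CqlPagingJobSetup {
    public static Job configure(Configuration conf) throws Exception {
        Job job = new Job(conf, "cql3-paging-example");

        // Read through the CQL3 paging reader (CqlPagingRecordReader under the hood).
        job.setInputFormatClass(CqlPagingInputFormat.class);

        // Cluster and input table (illustrative names).
        ConfigHelper.setInputInitialAddress(job.getConfiguration(), "127.0.0.1");
        ConfigHelper.setInputRpcPort(job.getConfiguration(), "9160");
        ConfigHelper.setInputColumnFamily(job.getConfiguration(), "my_keyspace", "my_table");
        ConfigHelper.setInputPartitioner(job.getConfiguration(),
                "org.apache.cassandra.dht.Murmur3Partitioner");

        // Maps to the LIMIT of each SELECT the reader issues; the reader keeps
        // paging until the whole split is consumed. The default is 1000.
        CqlConfigHelper.setInputCQLPageRowSize(job.getConfiguration(), "500");

        return job;
    }
}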
Cheers

-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 12/09/2013, at 5:58 PM, Renat Gilfanov <gren...@mail.ru> wrote:

> Hello,
>
> So does that mean the job will process only the first "cassandra.input.page.row.size"
> rows and ignore the rest? Or does CqlPagingRecordReader support paging through
> the entire result set?
>
>
> Aaron Morton <aa...@thelastpickle.com>:
>>>
>>> I'm looking at the ConfigHelper.setRangeBatchSize() and
>>> CqlConfigHelper.setInputCQLPageRowSize() methods, but I'm a bit confused about whether
>>> that's what I need and, if so, which one I should use for this purpose.
> If you are using CQL 3 via Hadoop, CqlConfigHelper.setInputCQLPageRowSize is
> the one you want.
>
> It maps to the LIMIT clause of the select statement the input reader will
> generate; the default is 1,000.
>
> A
>
> -----------------
> Aaron Morton
> New Zealand
> @aaronmorton
>
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On 12/09/2013, at 9:04 AM, Jiaan Zeng <l.alle...@gmail.com> wrote:
>
>> Speaking of the thrift client, i.e. ColumnFamilyInputFormat, yes,
>> ConfigHelper.setRangeBatchSize() can reduce the number of rows sent to
>> Cassandra.
>>
>> Depending on how big your columns are, you may also want to increase the thrift
>> message length through setThriftMaxMessageLengthInMb().
>>
>> Hope that helps.
>>
>> On Tue, Sep 10, 2013 at 8:18 PM, Renat Gilfanov <gren...@mail.ru> wrote:
>>> Hi,
>>>
>>> We have Hadoop jobs that read data from our Cassandra column families and
>>> write some data back to other column families.
>>> The input column families are pretty simple CQL3 tables without wide rows.
>>> In the Hadoop jobs we set a corresponding WHERE clause via
>>> ConfigHelper.setInputWhereClauses(...), so we don't process the whole table
>>> at once.
>>> Nevertheless, sometimes the amount of data returned by the input query is big
>>> enough to cause TimedOutExceptions.
>>>
>>> To mitigate this, I'd like to configure the Hadoop job in such a way that it
>>> sequentially fetches the input rows in smaller portions.
>>>
>>> I'm looking at the ConfigHelper.setRangeBatchSize() and
>>> CqlConfigHelper.setInputCQLPageRowSize() methods, but I'm a bit confused about whether
>>> that's what I need and, if so, which one I should use for this purpose.
>>>
>>> Any help is appreciated.
>>>
>>> Hadoop version is 1.1.2, Cassandra version is 1.2.8.
>>
>>
>>
>> --
>> Regards,
>> Jiaan
>
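For the thrift-based ColumnFamilyInputFormat path Jiaan describes above, a minimal sketch of the two knobs he mentions, against the Cassandra 1.2 / Hadoop 1.1 APIs; the batch size and message length values are illustrative:

import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ThriftInputTuning {
    public static void tune(Job job) {
        Configuration conf = job.getConfiguration();

        // Thrift-based reader (the pre-CQL3 path).
        job.setInputFormatClass(ColumnFamilyInputFormat.class);

        // Fewer rows per get_range_slices call means smaller responses and
        // less chance of a TimedOutException on heavy rows.
        ConfigHelper.setRangeBatchSize(conf, 1024);

        // Raise the thrift frame limit if individual rows are large enough
        // to exceed the default message size.
        ConfigHelper.setThriftMaxMessageLengthInMb(conf, 32);
    }
}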