Hello,

So does this mean that the job will process only the first "cassandra.input.page.row.size" rows and ignore the rest? Or does CqlPagingRecordReader support paging through the entire result set?
Aaron Morton <aa...@thelastpickle.com>:
>>>I'm looking at the ConfigHelper.setRangeBatchSize() and
>>>CqlConfigHelper.setInputCQLPageRowSize() methods, but a bit confused if
>>>that's what I need and if yes, which one should I use for those purposes.
>
>If you are using CQL 3 via Hadoop, CqlConfigHelper.setInputCQLPageRowSize is
>the one you want.
>
>It maps to the LIMIT clause of the select statement the input reader will
>generate; the default is 1,000.
>
>A
>
>-----------------
>Aaron Morton
>New Zealand
>@aaronmorton
>
>Co-Founder & Principal Consultant
>Apache Cassandra Consulting
>http://www.thelastpickle.com
>
>On 12/09/2013, at 9:04 AM, Jiaan Zeng <l.alle...@gmail.com> wrote:
>>Speaking of the thrift client, i.e. ColumnFamilyInputFormat, yes,
>>ConfigHelper.setRangeBatchSize() can reduce the number of rows sent to
>>Cassandra.
>>
>>Depending on how big your column is, you may also want to increase the
>>thrift message length through setThriftMaxMessageLengthInMb().
>>
>>Hope that helps.
>>
>>On Tue, Sep 10, 2013 at 8:18 PM, Renat Gilfanov <gren...@mail.ru> wrote:
>>>Hi,
>>>
>>>We have Hadoop jobs that read data from our Cassandra column families and
>>>write some data back to other column families.
>>>The input column families are pretty simple CQL3 tables without wide rows.
>>>In the Hadoop jobs we set up a corresponding WHERE clause in
>>>ConfigHelper.setInputWhereClauses(...), so we don't process the whole table
>>>at once.
>>>Nevertheless, sometimes the amount of data returned by the input query is
>>>big enough to cause TimedOutExceptions.
>>>
>>>To mitigate this, I'd like to configure the Hadoop job in such a way that it
>>>sequentially fetches input rows in smaller portions.
>>>
>>>I'm looking at the ConfigHelper.setRangeBatchSize() and
>>>CqlConfigHelper.setInputCQLPageRowSize() methods, but a bit confused if
>>>that's what I need and if yes, which one should I use for those purposes.
>>>
>>>Any help is appreciated.
>>>
>>>Hadoop version is 1.1.2, Cassandra version is 1.2.8.
>>
>>--
>>Regards,
>>Jiaan
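
For readers following the thread, the idea of fetching input rows sequentially in smaller portions can be illustrated with a small stand-alone sketch. It is not Cassandra or Hadoop code and does not show how CqlPagingRecordReader is implemented. The class PagingSketch and the methods fetchPage and fetchAll are hypothetical names, and the "table" is just an in-memory list. It only assumes that each request is bounded by a page size (as a LIMIT clause would bound it) and that requests are repeated until a short page comes back.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: an in-memory stand-in for a table, not Cassandra code.
class PagingSketch {

    // Hypothetical "query": returns at most pageSize rows starting at offset.
    static List<Integer> fetchPage(List<Integer> table, int offset, int pageSize) {
        int end = Math.min(offset + pageSize, table.size());
        return new ArrayList<>(table.subList(offset, end));
    }

    // Reads every row by issuing bounded requests until a short page arrives.
    static List<Integer> fetchAll(List<Integer> table, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<Integer> result = new ArrayList<>();
        int offset = 0;
        while (true) {
            List<Integer> page = fetchPage(table, offset, pageSize);
            result.addAll(page);
            offset += page.size();
            // A page smaller than pageSize means there is nothing left to read.
            if (page.size() < pageSize) {
                break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> table = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            table.add(i);
        }
        // Three requests (1000 + 1000 + 500 rows) cover the whole table.
        System.out.println(fetchAll(table, 1000).size()); // prints 2500
    }
}
```

In this pattern the page size caps the cost of each individual request, while the total number of rows read is unchanged. Whether the Hadoop input reader follows the same pattern is the question asked at the top of this message.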