There is some information on the wiki (http://wiki.apache.org/cassandra/HadoopSupport) about a resource leak in versions before 0.6.2 that can result in a TimeoutException, but you're on 0.6.5 so you should be OK.
I had a quick look at the Hadoop code and could not see where to change the timeout (that would be the obvious thing to try). If you look in ConfigHelper.java, though, it says:

    /**
     * The number of rows to request with each get range slices request.
     * Too big and you can get timeouts when it takes Cassandra too
     * long to fetch all the data. Too small and the performance
     * will be eaten up by the overhead of each request.
     *
     * @param conf Job configuration you are about to run
     * @param batchsize Number of rows to request each time
     */
    public static void setRangeBatchSize(Configuration conf, int batchsize)
    {
        conf.setInt(RANGE_BATCH_SIZE_CONFIG, batchsize);
    }

The config item name is "cassandra.range.batch.size". Try reducing the batch size first and see if the timeouts go away, though it does not sound like you have a lot of data.

A 0.7 beta2 may be out this week, but it's still beta.

Hope that helps.

Aaron

On 25 Sep 2010, at 07:17, Saket Joshi wrote:

> Hi Experts,
>
> I need help with an exception integrating Cassandra and Hadoop. I am getting the
> following exception when running a Hadoop MapReduce job:
> http://pastebin.com/RktaqDnj
> I am using Cassandra 0.6.5 on a 3-node cluster. I don't get any exception when
> the data I am processing is very small (< 5 rows and 100 columns), but I get
> the error with modest data (> 5 rows, 500 columns). I went through some of the
> forums where people have experienced the same issue:
> http://www.listware.net/201005/cassandra-user/21897-timeout-while-running-simple-hadoop-job.html
> Is this a bug in the Cassandra-Hadoop classes, and is it fixed in 0.7 for
> sure? How stable is 0.7 beta? In the system.log I see a lot of "index has
> reached its threshold; switching in a fresh Memtable" messages.
>
> Has anyone faced a similar issue and solved it? Is migrating to 0.7 the only
> solution?
>
> Thanks,
> Saket
>
> Stack trace of the exception:
>
> java.lang.RuntimeException: TimedOutException()
>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:186)
>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:236)
>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:104)
>     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135)
>     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130)
>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:98)
>     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
>     at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: TimedOutException()
>     at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:11094)
>     at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:628)
>     at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:602)
>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:164)
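P.S. In case it's useful, here's a minimal sketch of what setting that key amounts to. In a real job you would call ConfigHelper.setRangeBatchSize(conf, n) on your org.apache.hadoop.conf.Configuration before submitting; the snippet below uses java.util.Properties as a stand-in for Configuration so it compiles without the Hadoop jars, and the value 256 is an arbitrary starting point to tune from, not a recommendation.

```java
import java.util.Properties;

public class RangeBatchSizeSketch {
    // ConfigHelper.setRangeBatchSize(conf, n) ultimately just sets this
    // Hadoop configuration key:
    static final String RANGE_BATCH_SIZE_CONFIG = "cassandra.range.batch.size";

    // Stand-in for ConfigHelper.setRangeBatchSize, using Properties instead
    // of org.apache.hadoop.conf.Configuration so this runs without Hadoop.
    static void setRangeBatchSize(Properties conf, int batchSize) {
        conf.setProperty(RANGE_BATCH_SIZE_CONFIG, Integer.toString(batchSize));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Try a smaller batch than the default; tune up or down from here.
        setRangeBatchSize(conf, 256);
        System.out.println(conf.getProperty(RANGE_BATCH_SIZE_CONFIG)); // prints 256
    }
}
```

If the smaller batch makes the timeouts go away, you can raise it again gradually to win back throughput.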