Strangely, the Cassandra logs show no errors at the time of failure. Changing RpcTimeoutInMillis did seem to help: though it slowed the job down considerably, the job now appears to finish after raising the timeout to 1 minute. Unfortunately, I cannot be sure it will keep working if the data grows further. We will hopefully be upgrading to the recently released final version of 0.7.0 soon.
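For reference, the change amounts to raising this value in storage-conf.xml on each node (60000 ms = 1 minute, shown here as an example; the node must be restarted for it to take effect):

    <RpcTimeoutInMillis>60000</RpcTimeoutInMillis>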
Thanks for all the help and suggestions.

Warm regards,
Jairam Chandar

On 13/01/2011 14:47, "Jeremy Hanna" <jeremy.hanna1...@gmail.com> wrote:

> On Jan 12, 2011, at 12:40 PM, Jairam Chandar wrote:
>
>> Hi folks,
>>
>> We have a Cassandra 0.6.6 cluster running in production. We want to run
>> Hadoop (version 0.20.2) jobs over this cluster in order to generate
>> reports. I modified the word_count example in the contrib folder of the
>> Cassandra distribution. While the program runs fine with small datasets
>> (on the order of 100-200 MB) on a small cluster (2 machines), it starts
>> to give errors when run on a bigger cluster (5 machines) with a much
>> larger dataset (400 GB). Here is the error we get:
>>
>> java.lang.RuntimeException: TimedOutException()
>>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:186)
>>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:236)
>>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:104)
>>     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135)
>>     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130)
>>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:98)
>>     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
>>     at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>> Caused by: TimedOutException()
>>     at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:11094)
>>     at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:628)
>>     at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:602)
>>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:164)
>>     ... 11 more
>
> I wonder if messing with RpcTimeoutInMillis in storage-conf.xml would
> help.
>
>> I came across this page on the Cassandra wiki -
>> http://wiki.apache.org/cassandra/HadoopSupport - and tried modifying the
>> ulimit and changing batch sizes. These did not help: though the number
>> of successful map tasks increased, the job eventually fails, since the
>> total number of map tasks is huge.
>>
>> Any idea what could be causing this? The program we are running is a
>> very slight modification of the word_count example with respect to
>> reading from Cassandra, the only changes being the specific keyspace,
>> column family and columns. The rest of the reading code is the same as
>> the word_count example in the Cassandra 0.6.6 source.
>>
>> Thanks and regards,
>> Jairam Chandar
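For anyone who lands on this thread later: the batch-size change referenced above is made through ConfigHelper when setting up the job. Below is a minimal sketch against the 0.6-era Hadoop API, adapted from the contrib/word_count example; the class name, keyspace, column family, column name, and the batch size of 256 are placeholder values, not what was actually used here.

    import java.util.Arrays;

    import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.cassandra.thrift.SlicePredicate;
    import org.apache.hadoop.mapreduce.Job;

    public class ReportJobSetup
    {
        // Placeholders -- substitute your own schema names.
        private static final String KEYSPACE = "MyKeyspace";
        private static final String COLUMN_FAMILY = "MyColumnFamily";
        private static final String COLUMN_NAME = "text";

        public static void configure(Job job)
        {
            // Read input rows straight out of Cassandra, as word_count does.
            job.setInputFormatClass(ColumnFamilyInputFormat.class);
            ConfigHelper.setColumnFamily(job.getConfiguration(), KEYSPACE, COLUMN_FAMILY);

            // Fetch only the columns the report actually needs.
            SlicePredicate predicate = new SlicePredicate()
                    .setColumn_names(Arrays.asList(COLUMN_NAME.getBytes()));
            ConfigHelper.setSlicePredicate(job.getConfiguration(), predicate);

            // The batch-size knob from the HadoopSupport wiki page: fewer rows
            // per get_range_slices call means less work per request before the
            // server-side timeout (RpcTimeoutInMillis) can fire.
            ConfigHelper.setRangeBatchSize(job.getConfiguration(), 256);
        }
    }

If I remember the 0.6 setup correctly, the job also needs the cluster's storage-conf.xml on its classpath, as in the contrib example, so the input format can locate the nodes.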