Hello Jeff,

Thank you for your comments, but the problem is not about the RangeBatchSize.
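For reference, I take it you mean a call like this in the word_count-style driver (a sketch on my side: the class name and the value 256 are mine, and I'm assuming setRangeBatchSize takes the job's Configuration plus an int, per Joost's description below):

    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class BatchSizeSketch extends Configured implements Tool {
        public int run(String[] args) throws Exception {
            Job job = new Job(getConf(), "wordcount");
            // Per Joost: fetch 256 rows per get_range_slices call
            // instead of the default 4096.
            ConfigHelper.setRangeBatchSize(job.getConfiguration(), 256);
            // ... the rest of the word_count job setup is unchanged ...
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new BatchSizeSketch(), args));
        }
    }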
What actually makes the difference is mapred.tasktracker.map.tasks.maximum. When it is set to a value greater than 1, all the map tasks time out; they don't even run a single line of code in the Mapper.map() function. When it is set to 1, the map tasks run one at a time on the tasktracker and finish without any problem at all.

So I guess there's some kind of concurrency problem in integrating Cassandra with Hadoop. I'm using Cassandra 0.6.1 and Hadoop 0.20.2. The exact tasktracker setting I'm toggling is sketched below.
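Just so we're looking at the same knob, this is the property I'm changing in mapred-site.xml on the tasktracker node (a config sketch; 1 is the value at which the job completes for me):

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>1</value>
    </property>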
Best Regards,
Utku

On Thu, Apr 29, 2010 at 5:03 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
> The default batch size is 4096, which means that each call to
> get_range_slices retrieves 4,096 rows. I have found that this causes
> timeouts when Cassandra is under load. Try reducing the batch size
> with a call to ConfigHelper.setRangeBatchSize(). This has eliminated
> the TimedOutExceptions for us.
> joost.
>
> On Thu, Apr 29, 2010 at 10:25 AM, Utku Can Topçu <u...@topcu.gen.tr> wrote:
> > Hey All,
> >
> > I'm trying to run some tests on Cassandra and Hadoop integration. I'm
> > basically following the word count example at
> > https://svn.apache.org/repos/asf/cassandra/trunk/contrib/word_count/src/WordCount.java
> > using the ColumnFamilyInputFormat.
> >
> > Currently I have a one-node Cassandra and Hadoop setup on the same machine.
> >
> > I'm having problems if there is more than one map task running on the
> > same node; please find a copy of the error message below.
> >
> > If I limit the map tasks per tasktracker to 1, the MapReduce job works
> > fine without any problems at all.
> >
> > Do you think it's a known issue, or am I doing something wrong in my
> > implementation?
> >
> > ---------------error----------------
> > 10/04/29 13:47:37 INFO mapred.JobClient: Task Id :
> > attempt_201004291109_0024_m_000000_1, Status : FAILED
> > java.lang.RuntimeException: TimedOutException()
> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:165)
> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:215)
> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:97)
> >     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135)
> >     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130)
> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:91)
> >     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
> >     at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
> >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> >     at org.apache.hadoop.mapred.Child.main(Child.java:170)
> > Caused by: TimedOutException()
> >     at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:11015)
> >     at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:623)
> >     at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:597)
> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:142)
> >     ... 11 more
> > ---------------------------------------
> >
> > Best Regards,
> > Utku