Hello Jeff,

Thank you for your comments, but the problem is not the RangeBatchSize.
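
For context, the job setup follows the contrib word_count example; a minimal
sketch is below (keyspace/column family names are placeholders, the slice
predicate is elided, and the ConfigHelper method names are taken from that
example, so they may differ slightly in 0.6.1):

```java
import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobSetupSketch {
    public static Job configure() throws Exception {
        Job job = new Job(new Configuration(), "wordcount");
        // Read input rows from Cassandra instead of HDFS.
        job.setInputFormatClass(ColumnFamilyInputFormat.class);
        // Placeholder keyspace and column family names.
        ConfigHelper.setColumnFamily(job.getConfiguration(), "Keyspace1", "Standard1");
        // Your suggestion: shrink the get_range_slices batch (default 4096).
        ConfigHelper.setRangeBatchSize(job.getConfiguration(), 256);
        return job;
    }
}
```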

With the configuration parameter
mapred.tasktracker.map.tasks.maximum > 1
all the map tasks time out; they don't even run a single line of code in the
Mapper.map() function.

With the configuration parameter
mapred.tasktracker.map.tasks.maximum = 1
map tasks run one by one on the tasktracker, and they finish without
any problem at all.
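
(For anyone reproducing this, the only workaround I've found so far is to cap
concurrent map tasks in mapred-site.xml; a sketch for Hadoop 0.20.2:)

```xml
<!-- mapred-site.xml: limit each tasktracker to one concurrent map task -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>
```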

I guess there's some kind of concurrency problem in the Cassandra/Hadoop
integration.

I'm using Cassandra 0.6.1 and Hadoop 0.20.2.

Best Regards,
Utku


On Thu, Apr 29, 2010 at 5:03 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:

> The default batch size is 4096, which means that each call to
> get_range_slices retrieves 4,096 rows.  I have found that this causes
> timeouts when Cassandra is under load.  Try reducing the batch size
> with a call to ConfigHelper.setRangeBatchSize().  This has eliminated
> the TimedOutExceptions for us.
> joost.
>
> On Thu, Apr 29, 2010 at 10:25 AM, Utku Can Topçu <u...@topcu.gen.tr>
> wrote:
> > Hey All,
> >
> > I'm trying to run some tests on Cassandra and Hadoop integration. I'm
> > basically following the word count example at
> >
> https://svn.apache.org/repos/asf/cassandra/trunk/contrib/word_count/src/WordCount.java
> > using the ColumnFamilyInputFormat.
> >
> > Currently I have one-node cassandra and hadoop setup on the same machine.
> >
> > I'm having problems if there is more than one map task running on the
> > same node; please find a copy of the error message below.
> >
> > If I limit the map tasks per tasktracker to 1, the MapReduce job works
> > fine without any problems at all.
> >
> > Do you think it's a known issue, or am I doing something wrong in my
> > implementation?
> >
> > ---------------error----------------
> > 10/04/29 13:47:37 INFO mapred.JobClient: Task Id :
> > attempt_201004291109_0024_m_000000_1, Status : FAILED
> > java.lang.RuntimeException: TimedOutException()
> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:165)
> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:215)
> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:97)
> >     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135)
> >     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130)
> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:91)
> >     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
> >     at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
> >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> >     at org.apache.hadoop.mapred.Child.main(Child.java:170)
> > Caused by: TimedOutException()
> >     at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:11015)
> >     at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:623)
> >     at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:597)
> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:142)
> >     ... 11 more
> > ---------------------------------------
> >
> >
> > Best Regards,
> > Utku
> >
>