The whole point is to parallelize so that you use the available capacity
across multiple machines.  If you go past that point (fairly easy when you
have a single machine), you're just contending for resources, not
making things faster.
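A minimal sketch of the point above, in plain Java rather than Hadoop. A fixed-size thread pool caps how many "tasks" hit a shared resource at once, so the extra tasks wait in a queue instead of contending. CAPACITY, scanSlice and the 20 ms sleep are hypothetical stand-ins. The sketch does not use or model Hadoop's task scheduler or Cassandra's ColumnFamilyRecordReader; it only illustrates bounded parallelism.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedScan {
    // Hypothetical: how many tasks the single node can usefully serve at once.
    static final int CAPACITY = 2;

    // Tasks currently running, and the highest count observed so far.
    static final AtomicInteger inFlight = new AtomicInteger();
    static final AtomicInteger peak = new AtomicInteger();

    // Hypothetical stand-in for one map task reading its slice of rows.
    static int scanSlice(int sliceId) throws InterruptedException {
        int now = inFlight.incrementAndGet();
        peak.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(20); // simulated work against the shared node
            return sliceId;
        } finally {
            inFlight.decrementAndGet();
        }
    }

    // Submits every slice, but the fixed-size pool runs at most CAPACITY at a time;
    // the rest wait in the pool's queue instead of contending for the node.
    static int runAll(int slices) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(CAPACITY);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < slices; i++) {
                final int id = i;
                results.add(pool.submit(() -> scanSlice(id)));
            }
            int done = 0;
            for (Future<Integer> f : results) {
                f.get(); // rethrows (wrapped in ExecutionException) if a task failed
                done++;
            }
            return done;
        } finally {
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        int done = runAll(10);
        System.out.println("slices=" + done + ", peak concurrency=" + peak.get());
    }
}
```

Submitting more slices than CAPACITY does not make the run any faster here; the extra tasks simply queue until a thread frees up, which is the behaviour you want when the resource behind the tasks is a single node.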

On Fri, May 7, 2010 at 7:48 AM, Joost Ouwerkerk <> wrote:
> Huh? Isn't that the whole point of using Map/Reduce?
> On Fri, May 7, 2010 at 8:44 AM, Jonathan Ellis <> wrote:
>> Sounds like you need to configure Hadoop not to create a whole bunch
>> of Map tasks at once.
>> On Fri, May 7, 2010 at 3:47 AM, gabriele renzi <> wrote:
>>> Hi everyone,
>>> I am trying to develop a mapreduce job that does a simple
>>> selection+filter on the rows in our store.
>>> Of course it is mostly based on the WordCount example :)
>>> Sadly, while the app seems to run fine on a test keyspace with little
>>> data, when it runs on a larger test index (but still on a single node) I
>>> reliably see this error in the logs:
>>> 10/05/06 16:37:58 WARN mapred.LocalJobRunner: job_local_0001
>>> java.lang.RuntimeException: TimedOutException()
>>>        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(
>>>        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(
>>>        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(
>>>        at
>>>        at
>>>        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(
>>>        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(
>>>        at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(
>>>        at
>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(
>>>        at
>>>        at org.apache.hadoop.mapred.LocalJobRunner$
>>> Caused by: TimedOutException()
>>>        at org.apache.cassandra.thrift.Cassandra$
>>>        at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(
>>>        at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(
>>>        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(
>>>        ... 11 more
>>> After that, the job seems to finish "normally", but no results are
>>> produced.
>>> FWIW this is on 0.6.0 (we didn't move to 0.6.1 yet because, well, if
>>> it ain't broke don't fix it).
>>> The single node has a data directory of about 127GB in two column
>>> families, of which the one used in the mapred job is about 100GB.
>>> The cassandra server is run with 6GB of heap on a box with 8GB
>>> available and no swap enabled. Read/write latencies from cfstats are:
>>>        Read Latency: 0.8535837762577986 ms.
>>>        Write Latency: 0.028849603764075547 ms.
>>> The row cache is not enabled, and the key cache percentage is the default.
>>> Load on the machine is basically zero when the job is not running.
>>> As my code is 99% that from the wordcount contrib, I should note that
>>> in 0.6.1's contrib (and trunk) there is a RING_DELAY constant that we
>>> can supposedly change, but it apparently isn't used anywhere; and, as I
>>> said, running on a single node this should not be an issue anyway.
>>> Does anyone have suggestions, or has anyone seen this error before? On
>>> the other hand, have people run this kind of job in similar conditions
>>> flawlessly, so that I can consider it just my problem?
>>> Thanks in advance for any help.
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support

Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
