err, not count() in your case, but same symptom: cassandra can't return the answer to your query within the configured RPCTimeout
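
if you can't make each call cheaper, the blunt fix is to give the server more time. untested sketch, assuming a 0.6-style storage-conf.xml; 30000 is just an example value, and it needs to go on every node:

    <!-- how long the coordinator waits for replicas to answer a request
         before failing it with a TimedOutException -->
    <RpcTimeoutInMillis>30000</RpcTimeoutInMillis>

the other lever is making the record reader ask for fewer rows per get_range_slices call; depending on your 0.6.x build that batch size may be configurable (check ConfigHelper in the hadoop package), and smaller batches are far more likely to finish inside the timeout.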
cheers, jesse

--
jesse mcconnell
jesse.mcconn...@gmail.com

On Mon, Apr 19, 2010 at 19:40, Jesse McConnell <jesse.mcconn...@gmail.com> wrote:
> most likely means that the count() operation is taking too long for
> the configured RPCTimeout
>
> counts get unreliable after a certain number of columns under a key in
> my experience
>
> jesse
>
> --
> jesse mcconnell
> jesse.mcconn...@gmail.com
>
> On Mon, Apr 19, 2010 at 19:12, Joost Ouwerkerk <jo...@openplaces.org> wrote:
>> I'm slowly getting somewhere with Cassandra... I have successfully imported
>> 1.5 million rows using MapReduce. This took about 8 minutes on an 8-node
>> cluster, which is comparable to the time it takes with HBase.
>>
>> Now I'm having trouble scanning this data. I've created a simple MapReduce
>> job that counts rows in my ColumnFamily. The job fails with most tasks
>> throwing the following exception. Anyone have any ideas what's going wrong?
>>
>> java.lang.RuntimeException: TimedOutException()
>>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:165)
>>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:215)
>>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:97)
>>     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135)
>>     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130)
>>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:91)
>>     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
>>     at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>> Caused by: TimedOutException()
>>     at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:11015)
>>     at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:623)
>>     at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:597)
>>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:142)
>>     ... 11 more
>>
>> On Sun, Apr 18, 2010 at 6:01 PM, Stu Hood <stu.h...@rackspace.com> wrote:
>>>
>>> In 0.6.0 and trunk, it is located at
>>> src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java
>>>
>>> You might be using a pre-release version of 0.6 if you are seeing a fat
>>> client based InputFormat.
>>>
>>> -----Original Message-----
>>> From: "Joost Ouwerkerk" <jo...@openplaces.org>
>>> Sent: Sunday, April 18, 2010 4:53pm
>>> To: user@cassandra.apache.org
>>> Subject: Re: Help with MapReduce
>>>
>>> Where is the ColumnFamilyInputFormat that uses Thrift? I don't actually
>>> have a preference about client, I just want to be consistent with
>>> ColumnFamilyInputFormat.
>>>
>>> On Sun, Apr 18, 2010 at 5:37 PM, Stu Hood <stu.h...@rackspace.com> wrote:
>>>
>>> > ColumnFamilyInputFormat no longer uses the fat client API, and instead
>>> > uses Thrift. There are still some significant problems with the fat
>>> > client, so it shouldn't be used without a good understanding of those
>>> > problems.
>>> >
>>> > If you still want to use it, check out contrib/bmt_example, but I'd
>>> > recommend that you use Thrift for now.
>>> >
>>> > -----Original Message-----
>>> > From: "Joost Ouwerkerk" <jo...@openplaces.org>
>>> > Sent: Sunday, April 18, 2010 2:59pm
>>> > To: user@cassandra.apache.org
>>> > Subject: Help with MapReduce
>>> >
>>> > I'm a Cassandra noob trying to validate Cassandra as a viable
>>> > alternative to HBase (which we've been using for over a year) for our
>>> > application. So far, I've had no success getting Cassandra working
>>> > with MapReduce.
>>> >
>>> > My first step is inserting data into Cassandra. I've created a
>>> > MapReduce job based on the fat client API. I'm using the fat client
>>> > (StorageProxy) because that's what ColumnFamilyInputFormat uses and I
>>> > want to use the same API for both read and write jobs.
>>> >
>>> > When I call StorageProxy.mutate(), nothing happens. The job completes
>>> > as if it had done something, but in fact nothing has changed in the
>>> > cluster. When I call StorageProxy.mutateBlocking(), I get an
>>> > IOException complaining that there is no connection to the cluster.
>>> > I've concluded with the debugger that StorageService is not connecting
>>> > to the cluster, even though I've specified the correct seed and
>>> > ListenAddress (I'm using the exact same storage-conf.xml as the nodes
>>> > in the cluster).
>>> >
>>> > I'm sure I'm missing something obvious in the configuration or my
>>> > setup, but since I'm new to Cassandra, I can't see what it is.
>>> >
>>> > Any help appreciated,
>>> > Joost
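
ps - for the scan side of what Joost describes, the job setup looks roughly like the below in 0.6, modeled on the contrib/word_count example that ships with the source. untested sketch: the keyspace ("Keyspace1"), column family ("Standard1"), and column name ("name") are placeholders for your own schema, and iirc the job also needs to find storage-conf.xml on its classpath so the input format can locate the cluster and compute splits:

    import java.util.Arrays;
    import java.util.SortedMap;

    import org.apache.cassandra.db.IColumn;
    import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.cassandra.thrift.SlicePredicate;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class RowCount {
        // in 0.6 the record reader hands the mapper the row key as a String
        // and the requested columns as a SortedMap<byte[], IColumn>
        public static class RowCountMapper
                extends Mapper<String, SortedMap<byte[], IColumn>, Text, LongWritable> {
            @Override
            public void map(String key, SortedMap<byte[], IColumn> columns, Context context) {
                // a counter is enough for a row count; no reduce phase needed
                context.getCounter("rowcount", "rows").increment(1L);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "rowcount");
            job.setJarByClass(RowCount.class);
            job.setMapperClass(RowCountMapper.class);
            job.setNumReduceTasks(0);
            job.setOutputFormatClass(NullOutputFormat.class);

            job.setInputFormatClass(ColumnFamilyInputFormat.class);
            ConfigHelper.setColumnFamily(job.getConfiguration(), "Keyspace1", "Standard1");
            // ask for a single small column per row: the narrower the slice,
            // the cheaper each get_range_slices call, the fewer timeouts
            SlicePredicate predicate = new SlicePredicate()
                    .setColumn_names(Arrays.asList("name".getBytes()));
            ConfigHelper.setSlicePredicate(job.getConfiguration(), predicate);

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }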
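
pps - on the write side, per stu's advice: go through the plain Thrift client rather than StorageProxy. a minimal single-insert sketch against the 0.6 Thrift interface; host, keyspace, column family, key, and value are placeholders, and if your cluster has ThriftFramedTransport enabled you'd wrap the socket in a TFramedTransport first:

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnPath;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class ThriftInsert {
        public static void main(String[] args) throws Exception {
            // plain socket to one node's thrift port (9160 by default)
            TTransport transport = new TSocket("cassandra-host", 9160);
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
            transport.open();

            // the column family is required; the column name is set separately
            ColumnPath path = new ColumnPath("Standard1");
            path.setColumn("name".getBytes());

            // 0.6 signature: keyspace, row key, column path, value, timestamp, consistency
            // (timestamps are conventionally microseconds since the epoch)
            client.insert("Keyspace1", "row-key", path, "value".getBytes(),
                          System.currentTimeMillis() * 1000, ConsistencyLevel.QUORUM);

            transport.close();
        }
    }

in a real map task you'd open the connection once in setup() and reuse it across calls instead of reconnecting per row.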