hmm, might be too much data. In the case of a supercolumn, how do I specify which sub-columns to retrieve? Or can I only retrieve entire supercolumns?
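For concreteness, here is the shape of the read I have in mind, going straight at the Thrift API (0.6-style; the keyspace, column family, and column names are made up for illustration). If I'm reading the API right, pointing ColumnParent at the supercolumn and listing sub-column names in the SlicePredicate is the mechanism for direct reads; whether ColumnFamilyInputFormat exposes the same thing is exactly what I'm asking:

import java.util.Arrays;
import java.util.List;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;

public class SubColumnSlice
{
    public static void main(String[] args) throws Exception
    {
        TSocket socket = new TSocket("localhost", 9160);
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(socket));
        socket.open();

        // Scope the parent to one supercolumn; the predicate then names
        // the sub-columns to fetch, instead of pulling the whole thing.
        ColumnParent parent = new ColumnParent();
        parent.setColumn_family("MySuperCF");
        parent.setSuper_column("mySuperColumn".getBytes());

        SlicePredicate predicate = new SlicePredicate();
        predicate.setColumn_names(Arrays.asList("subcol1".getBytes(), "subcol2".getBytes()));

        List<ColumnOrSuperColumn> subColumns = client.get_slice(
            "MyKeyspace", "someRowKey", parent, predicate, ConsistencyLevel.ONE);

        socket.close();
    }
}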
On Mon, Apr 19, 2010 at 8:47 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> Possibly you are asking it to retrieve too many columns per row.
>
> Possibly there is something else causing poor performance, like swapping.
>
> On Mon, Apr 19, 2010 at 7:12 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
> > I'm slowly getting somewhere with Cassandra... I have successfully
> > imported 1.5 million rows using MapReduce. This took about 8 minutes on
> > an 8-node cluster, which is comparable to the time it takes with HBase.
> >
> > Now I'm having trouble scanning this data. I've created a simple
> > MapReduce job that counts rows in my ColumnFamily. The job fails with
> > most tasks throwing the following exception. Anyone have any ideas
> > what's going wrong?
> >
> > java.lang.RuntimeException: TimedOutException()
> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:165)
> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:215)
> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:97)
> >     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135)
> >     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130)
> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:91)
> >     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
> >     at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
> >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> >     at org.apache.hadoop.mapred.Child.main(Child.java:170)
> > Caused by: TimedOutException()
> >     at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:11015)
> >     at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:623)
> >     at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:597)
> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:142)
> >     ... 11 more
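If the per-row column count is the problem, I'll try capping the slice the record reader asks for. A sketch of what I mean, following the 0.6 word_count example's ConfigHelper usage (the keyspace and column family names are placeholders; for a pure row count the mapper needs no column data, so even a count of 1 should do):

import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class RowCountJobSetup
{
    // Mapper/reducer classes and output config elided; this is just the
    // input-side setup that bounds how much each get_range_slices carries.
    public static Job configure() throws Exception
    {
        Job job = new Job(new Configuration(), "rowcount");
        job.setInputFormatClass(ColumnFamilyInputFormat.class);
        ConfigHelper.setColumnFamily(job.getConfiguration(), "MyKeyspace", "MyColumnFamily");

        // Empty start/finish means "from the beginning"; count caps the
        // number of columns returned per row.
        SliceRange range = new SliceRange();
        range.setStart(new byte[0]);
        range.setFinish(new byte[0]);
        range.setReversed(false);
        range.setCount(1);

        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(range);
        ConfigHelper.setSlicePredicate(job.getConfiguration(), predicate);

        return job;
    }
}

Failing that, I suppose raising RpcTimeoutInMillis in storage-conf.xml on the nodes would at least tell me whether it's a tight timeout or something deeper, like the swapping Jonathan mentions.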
> > On Sun, Apr 18, 2010 at 6:01 PM, Stu Hood <stu.h...@rackspace.com> wrote:
> >> In 0.6.0 and trunk, it is located at
> >> src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java
> >>
> >> You might be using a pre-release version of 0.6 if you are seeing a
> >> fat-client-based InputFormat.
> >>
> >> -----Original Message-----
> >> From: "Joost Ouwerkerk" <jo...@openplaces.org>
> >> Sent: Sunday, April 18, 2010 4:53pm
> >> To: user@cassandra.apache.org
> >> Subject: Re: Help with MapReduce
> >>
> >> Where is the ColumnFamilyInputFormat that uses Thrift? I don't actually
> >> have a preference about the client; I just want to be consistent with
> >> ColumnFamilyInputFormat.
> >>
> >> On Sun, Apr 18, 2010 at 5:37 PM, Stu Hood <stu.h...@rackspace.com> wrote:
> >> > ColumnFamilyInputFormat no longer uses the fat client API, and
> >> > instead uses Thrift. There are still some significant problems with
> >> > the fat client, so it shouldn't be used without a good understanding
> >> > of those problems.
> >> >
> >> > If you still want to use it, check out contrib/bmt_example, but I'd
> >> > recommend that you use Thrift for now.
> >> >
> >> > -----Original Message-----
> >> > From: "Joost Ouwerkerk" <jo...@openplaces.org>
> >> > Sent: Sunday, April 18, 2010 2:59pm
> >> > To: user@cassandra.apache.org
> >> > Subject: Help with MapReduce
> >> >
> >> > I'm a Cassandra noob trying to validate Cassandra as a viable
> >> > alternative to HBase (which we've been using for over a year) for our
> >> > application. So far, I've had no success getting Cassandra working
> >> > with MapReduce.
> >> >
> >> > My first step is inserting data into Cassandra. I've created a
> >> > MapReduce job using the fat client API. I'm using the fat client
> >> > (StorageProxy) because that's what ColumnFamilyInputFormat uses and I
> >> > want to use the same API for both read and write jobs.
> >> >
> >> > When I call StorageProxy.mutate(), nothing happens. The job completes
> >> > as if it had done something, but in fact nothing has changed in the
> >> > cluster. When I call StorageProxy.mutateBlocking(), I get an
> >> > IOException complaining that there is no connection to the cluster.
> >> > I've concluded with the debugger that StorageService is not
> >> > connecting to the cluster, even though I've specified the correct
> >> > seed and ListenAddress (I'm using the exact same storage-conf.xml as
> >> > the nodes in the cluster).
> >> >
> >> > I'm sure I'm missing something obvious in the configuration or my
> >> > setup, but since I'm new to Cassandra, I can't see what it is.
> >> >
> >> > Any help appreciated,
> >> > Joost
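PS: taking Stu's advice and going with plain Thrift for the write path rather than the fat client. For the record, this is the minimal insert I'm planning against the 0.6 API as I understand it (host, keyspace, and names are placeholders):

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;

public class ThriftInsert
{
    public static void main(String[] args) throws Exception
    {
        // Unlike StorageProxy, this only needs a socket to one node's
        // Thrift port (9160 by default); no gossip membership required.
        TSocket socket = new TSocket("cassandra-node-1", 9160);
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(socket));
        socket.open();

        ColumnPath path = new ColumnPath();
        path.setColumn_family("MyColumnFamily");
        path.setColumn("myColumn".getBytes());

        // Timestamps just need to increase per column; milliseconds since
        // the epoch is one workable convention.
        client.insert("MyKeyspace", "someRowKey", path,
                      "someValue".getBytes(), System.currentTimeMillis(),
                      ConsistencyLevel.ONE);

        socket.close();
    }
}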