yes
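[Editor's note: the "yes" above confirms that when reading a single supercolumn you can ask for specific sub-columns. The selection semantics can be modeled with a plain sorted map — this is a conceptual sketch only, not the Cassandra API; the class and sub-column names are illustrative:]

```java
import java.util.*;

public class SliceSketch {
    // Select only the named sub-columns (conceptually, what a predicate
    // listing explicit column names does).
    static NavigableMap<String, String> byNames(NavigableMap<String, String> sc, List<String> names) {
        NavigableMap<String, String> out = new TreeMap<>();
        for (String n : names) {
            if (sc.containsKey(n)) out.put(n, sc.get(n));
        }
        return out;
    }

    // Select a contiguous run of sub-columns between start and finish,
    // inclusive (conceptually, what a start/finish slice range does).
    static NavigableMap<String, String> byRange(NavigableMap<String, String> sc, String start, String finish) {
        return new TreeMap<>(sc.subMap(start, true, finish, true));
    }

    public static void main(String[] args) {
        // A supercolumn modeled as sorted sub-column name -> value pairs.
        NavigableMap<String, String> sc = new TreeMap<>();
        sc.put("city", "Montreal");
        sc.put("name", "Joost");
        sc.put("zip", "H2X");

        System.out.println(byNames(sc, Arrays.asList("city", "zip")).keySet()); // [city, zip]
        System.out.println(byRange(sc, "city", "name").keySet());               // [city, name]
    }
}
```

Retrieving by explicit names skips everything not listed; a range slice pulls every sub-column between the bounds, which is why asking for a wide range on a fat supercolumn can time out.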
On 4/19/10, Joost Ouwerkerk &lt;jo...@openplaces.org&gt; wrote:
> And when retrieving only one supercolumn? Can I further specify which
> subcolumns to retrieve?

On Mon, Apr 19, 2010 at 9:29 PM, Jonathan Ellis &lt;jbel...@gmail.com&gt; wrote:
> the latter, if you are retrieving multiple supercolumns.

On Mon, Apr 19, 2010 at 8:10 PM, Joost Ouwerkerk &lt;jo...@openplaces.org&gt; wrote:
> hmm, might be too much data. In the case of a supercolumn, how do I
> specify which sub-columns to retrieve? Or can I only retrieve entire
> supercolumns?

On Mon, Apr 19, 2010 at 8:47 PM, Jonathan Ellis &lt;jbel...@gmail.com&gt; wrote:
> Possibly you are asking it to retrieve too many columns per row.
>
> Possibly there is something else causing poor performance, like
> swapping.

On Mon, Apr 19, 2010 at 7:12 PM, Joost Ouwerkerk &lt;jo...@openplaces.org&gt; wrote:
> I'm slowly getting somewhere with Cassandra... I have successfully
> imported 1.5 million rows using MapReduce. This took about 8 minutes
> on an 8-node cluster, which is comparable to the time it takes with
> HBase.
>
> Now I'm having trouble scanning this data. I've created a simple
> MapReduce job that counts rows in my ColumnFamily. The job fails with
> most tasks throwing the following exception. Anyone have any ideas
> what's going wrong?
> java.lang.RuntimeException: TimedOutException()
>         at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:165)
>         at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:215)
>         at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:97)
>         at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135)
>         at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130)
>         at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:91)
>         at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
>         at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: TimedOutException()
>         at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:11015)
>         at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:623)
>         at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:597)
>         at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:142)
>         ... 11 more

On Sun, Apr 18, 2010 at 6:01 PM, Stu Hood &lt;stu.h...@rackspace.com&gt; wrote:
> In 0.6.0 and trunk, it is located at
> src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java
>
> You might be using a pre-release version of 0.6 if you are seeing a
> fat client based InputFormat.

-----Original Message-----
From: "Joost Ouwerkerk" &lt;jo...@openplaces.org&gt;
Sent: Sunday, April 18, 2010 4:53pm
To: user@cassandra.apache.org
Subject: Re: Help with MapReduce

> Where is the ColumnFamilyInputFormat that uses Thrift? I don't
> actually have a preference about client, I just want to be consistent
> with ColumnInputFormat.

On Sun, Apr 18, 2010 at 5:37 PM, Stu Hood &lt;stu.h...@rackspace.com&gt; wrote:
> ColumnFamilyInputFormat no longer uses the fat client API, and instead
> uses Thrift. There are still some significant problems with the fat
> client, so it shouldn't be used without a good understanding of those
> problems.
>
> If you still want to use it, check out contrib/bmt_example, but I'd
> recommend that you use thrift for now.

-----Original Message-----
From: "Joost Ouwerkerk" &lt;jo...@openplaces.org&gt;
Sent: Sunday, April 18, 2010 2:59pm
To: user@cassandra.apache.org
Subject: Help with MapReduce

> I'm a Cassandra noob trying to validate Cassandra as a viable
> alternative to HBase (which we've been using for over a year) for our
> application. So far, I've had no success getting Cassandra working
> with MapReduce.
>
> My first step is inserting data into Cassandra.
> I've created a MapReduce job based on the fat client API. I'm using
> the fat client (StorageProxy) because that's what
> ColumnFamilyInputFormat uses and I want to use the same API for both
> read and write jobs.
>
> When I call StorageProxy.mutate(), nothing happens. The job completes
> as if it had done something, but in fact nothing has changed in the
> cluster. When I call StorageProxy.mutateBlocking(), I get an
> IOException complaining that there is no connection to the cluster.
> I've concluded with the debugger that StorageService is not connecting
> to the cluster, even though I've specified the correct seed and
> ListenAddress (I'm using the exact same storage-conf.xml as the nodes
> in the cluster).
>
> I'm sure I'm missing something obvious in the configuration or my
> setup, but since I'm new to Cassandra, I can't see what it is.
>
> Any help appreciated,
> Joost
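[Editor's note: the TimedOutException in the trace above is the server-side RPC timeout firing while get_range_slices gathers a large slice. Besides narrowing the slice predicate as suggested in the thread, a common workaround on 0.6 is to raise the timeout in storage-conf.xml on each node — a sketch, with an illustrative value:]

```xml
<!-- storage-conf.xml (Cassandra 0.6): how long a coordinator waits for
     replicas before throwing TimedOutException. 10000 ms is an
     illustrative value; tune to your row widths and hardware. -->
<RpcTimeoutInMillis>10000</RpcTimeoutInMillis>
```

Raising the timeout only buys headroom; if single rows carry very many (sub)columns, trimming the predicate is the more durable fix.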