hmm, might be too much data. In the case of a supercolumn, how do I specify which sub-columns to retrieve? Or can I only retrieve entire supercolumns?
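For concreteness, here is the shape of the read I have in mind, going straight at the Thrift API (0.6-style; the keyspace, column family, and column names are made up for illustration). If I'm reading the API right, pointing ColumnParent at the supercolumn and listing sub-column names in the SlicePredicate is the mechanism for direct reads; whether ColumnFamilyInputFormat exposes the same thing is exactly what I'm asking:

import java.util.Arrays;
import java.util.List;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;

public class SubColumnSlice
{
    public static void main(String[] args) throws Exception
    {
        TSocket socket = new TSocket("localhost", 9160);
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(socket));
        socket.open();

        // Scope the parent to one supercolumn; the predicate then names
        // the sub-columns to fetch, instead of pulling the whole thing.
        ColumnParent parent = new ColumnParent();
        parent.setColumn_family("MySuperCF");
        parent.setSuper_column("mySuperColumn".getBytes());

        SlicePredicate predicate = new SlicePredicate();
        predicate.setColumn_names(Arrays.asList("subcol1".getBytes(), "subcol2".getBytes()));

        List<ColumnOrSuperColumn> subColumns = client.get_slice(
            "MyKeyspace", "someRowKey", parent, predicate, ConsistencyLevel.ONE);

        socket.close();
    }
}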
On Mon, Apr 19, 2010 at 8:47 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> Possibly you are asking it to retrieve too many columns per row.
>
> Possibly there is something else causing poor performance, like swapping.
>
> On Mon, Apr 19, 2010 at 7:12 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
> > I'm slowly getting somewhere with Cassandra... I have successfully
> > imported 1.5 million rows using MapReduce. This took about 8 minutes on
> > an 8-node cluster, which is comparable to the time it takes with HBase.
> >
> > Now I'm having trouble scanning this data. I've created a simple
> > MapReduce job that counts rows in my ColumnFamily. The job fails with
> > most tasks throwing the following exception. Anyone have any ideas
> > what's going wrong?
> >
> > java.lang.RuntimeException: TimedOutException()
> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:165)
> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:215)
> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:97)
> >     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135)
> >     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130)
> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:91)
> >     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
> >     at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
> >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> >     at org.apache.hadoop.mapred.Child.main(Child.java:170)
> > Caused by: TimedOutException()
> >     at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:11015)
> >     at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:623)
> >     at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:597)
> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:142)
> >     ... 11 more
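If the per-row column count is the problem, I'll try capping the slice the record reader asks for. A sketch of what I mean, following the 0.6 word_count example's ConfigHelper usage (the keyspace and column family names are placeholders; for a pure row count the mapper needs no column data, so even a count of 1 should do):

import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class RowCountJobSetup
{
    // Mapper/reducer classes and output config elided; this is just the
    // input-side setup that bounds how much each get_range_slices carries.
    public static Job configure() throws Exception
    {
        Job job = new Job(new Configuration(), "rowcount");
        job.setInputFormatClass(ColumnFamilyInputFormat.class);
        ConfigHelper.setColumnFamily(job.getConfiguration(), "MyKeyspace", "MyColumnFamily");

        // Empty start/finish means "from the beginning"; count caps the
        // number of columns returned per row.
        SliceRange range = new SliceRange();
        range.setStart(new byte[0]);
        range.setFinish(new byte[0]);
        range.setReversed(false);
        range.setCount(1);

        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(range);
        ConfigHelper.setSlicePredicate(job.getConfiguration(), predicate);

        return job;
    }
}

Failing that, I suppose raising RpcTimeoutInMillis in storage-conf.xml on the nodes would at least tell me whether it's a tight timeout or something deeper, like the swapping Jonathan mentions.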
> > On Sun, Apr 18, 2010 at 6:01 PM, Stu Hood <stu.h...@rackspace.com> wrote:
> >> In 0.6.0 and trunk, it is located at
> >> src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java
> >>
> >> You might be using a pre-release version of 0.6 if you are seeing a
> >> fat-client-based InputFormat.
> >>
> >> -----Original Message-----
> >> From: "Joost Ouwerkerk" <jo...@openplaces.org>
> >> Sent: Sunday, April 18, 2010 4:53pm
> >> To: user@cassandra.apache.org
> >> Subject: Re: Help with MapReduce
> >>
> >> Where is the ColumnFamilyInputFormat that uses Thrift? I don't actually
> >> have a preference about the client; I just want to be consistent with
> >> ColumnFamilyInputFormat.
> >>
> >> On Sun, Apr 18, 2010 at 5:37 PM, Stu Hood <stu.h...@rackspace.com> wrote:
> >> > ColumnFamilyInputFormat no longer uses the fat client API, and
> >> > instead uses Thrift. There are still some significant problems with
> >> > the fat client, so it shouldn't be used without a good understanding
> >> > of those problems.
> >> >
> >> > If you still want to use it, check out contrib/bmt_example, but I'd
> >> > recommend that you use Thrift for now.
> >> >
> >> > -----Original Message-----
> >> > From: "Joost Ouwerkerk" <jo...@openplaces.org>
> >> > Sent: Sunday, April 18, 2010 2:59pm
> >> > To: user@cassandra.apache.org
> >> > Subject: Help with MapReduce
> >> >
> >> > I'm a Cassandra noob trying to validate Cassandra as a viable
> >> > alternative to HBase (which we've been using for over a year) for our
> >> > application. So far, I've had no success getting Cassandra working
> >> > with MapReduce.
> >> >
> >> > My first step is inserting data into Cassandra. I've created a
> >> > MapReduce job using the fat client API. I'm using the fat client
> >> > (StorageProxy) because that's what ColumnFamilyInputFormat uses and I
> >> > want to use the same API for both read and write jobs.
> >> >
> >> > When I call StorageProxy.mutate(), nothing happens. The job completes
> >> > as if it had done something, but in fact nothing has changed in the
> >> > cluster. When I call StorageProxy.mutateBlocking(), I get an
> >> > IOException complaining that there is no connection to the cluster.
> >> > I've concluded with the debugger that StorageService is not
> >> > connecting to the cluster, even though I've specified the correct
> >> > seed and ListenAddress (I'm using the exact same storage-conf.xml as
> >> > the nodes in the cluster).
> >> >
> >> > I'm sure I'm missing something obvious in the configuration or my
> >> > setup, but since I'm new to Cassandra, I can't see what it is.
> >> >
> >> > Any help appreciated,
> >> > Joost
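PS: taking Stu's advice and going with plain Thrift for the write path rather than the fat client. For the record, this is the minimal insert I'm planning against the 0.6 API as I understand it (host, keyspace, and names are placeholders):

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;

public class ThriftInsert
{
    public static void main(String[] args) throws Exception
    {
        // Unlike StorageProxy, this only needs a socket to one node's
        // Thrift port (9160 by default); no gossip membership required.
        TSocket socket = new TSocket("cassandra-node-1", 9160);
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(socket));
        socket.open();

        ColumnPath path = new ColumnPath();
        path.setColumn_family("MyColumnFamily");
        path.setColumn("myColumn".getBytes());

        // Timestamps just need to increase per column; milliseconds since
        // the epoch is one workable convention.
        client.insert("MyKeyspace", "someRowKey", path,
                      "someValue".getBytes(), System.currentTimeMillis(),
                      ConsistencyLevel.ONE);

        socket.close();
    }
}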