I'm slowly getting somewhere with Cassandra... I have successfully imported 1.5 million rows using MapReduce. This took about 8 minutes on an 8-node cluster, which is comparable to the time it takes with HBase.
Now I'm having trouble scanning this data. I've created a simple MapReduce job that counts rows in my ColumnFamily. The job fails with most tasks throwing the following exception. Anyone have any ideas what's going wrong?

java.lang.RuntimeException: TimedOutException()
        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:165)
        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:215)
        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:97)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130)
        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:91)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
        at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: TimedOutException()
        at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:11015)
        at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:623)
        at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:597)
        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:142)
        ... 11 more

On Sun, Apr 18, 2010 at 6:01 PM, Stu Hood <stu.h...@rackspace.com> wrote:
> In 0.6.0 and trunk, it is located at
> src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java
>
> You might be using a pre-release version of 0.6 if you are seeing a fat
> client based InputFormat.
> -----Original Message-----
> From: "Joost Ouwerkerk" <jo...@openplaces.org>
> Sent: Sunday, April 18, 2010 4:53pm
> To: user@cassandra.apache.org
> Subject: Re: Help with MapReduce
>
> Where is the ColumnFamilyInputFormat that uses Thrift? I don't actually
> have a preference about client; I just want to be consistent with
> ColumnFamilyInputFormat.
>
> On Sun, Apr 18, 2010 at 5:37 PM, Stu Hood <stu.h...@rackspace.com> wrote:
>
> > ColumnFamilyInputFormat no longer uses the fat client API, and instead
> > uses Thrift. There are still some significant problems with the fat
> > client, so it shouldn't be used without a good understanding of those
> > problems.
> >
> > If you still want to use it, check out contrib/bmt_example, but I'd
> > recommend that you use thrift for now.
> >
> > -----Original Message-----
> > From: "Joost Ouwerkerk" <jo...@openplaces.org>
> > Sent: Sunday, April 18, 2010 2:59pm
> > To: user@cassandra.apache.org
> > Subject: Help with MapReduce
> >
> > I'm a Cassandra noob trying to validate Cassandra as a viable
> > alternative to HBase (which we've been using for over a year) for our
> > application. So far, I've had no success getting Cassandra working
> > with MapReduce.
> >
> > My first step is inserting data into Cassandra. I've created a
> > MapReduce job using the fat client API. I'm using the fat client
> > (StorageProxy) because that's what ColumnFamilyInputFormat uses and I
> > want to use the same API for both read and write jobs.
> >
> > When I call StorageProxy.mutate(), nothing happens. The job completes
> > as if it had done something, but in fact nothing has changed in the
> > cluster. When I call StorageProxy.mutateBlocking(), I get an
> > IOException complaining that there is no connection to the cluster.
> > I've concluded with the debugger that StorageService is not
> > connecting to the cluster, even though I've specified the correct
> > seed and ListenAddress (I'm using the exact same storage-conf.xml as
> > the nodes in the cluster).
> >
> > I'm sure I'm missing something obvious in the configuration or my
> > setup, but since I'm new to Cassandra, I can't see what it is.
> >
> > Any help appreciated,
> > Joost
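A note for the archives on the TimedOutException above: it means the server could not answer a get_range_slices call within its RPC timeout, so the usual mitigations are to fetch fewer rows per call or to raise the timeout. A minimal sketch of the first option, assuming the cassandra.range.batch.size Hadoop property consulted by ColumnFamilyRecordReader in the 0.6 line (verify the property name and its default against your release):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical job setup: fetch fewer rows per Thrift get_range_slices
// call, so each request is more likely to finish inside the server's
// RpcTimeoutInMillis. Smaller batches mean more round-trips, fewer timeouts.
Configuration conf = new Configuration();
conf.setInt("cassandra.range.batch.size", 256); // assumed property name; default is much larger
Job job = new Job(conf, "cassandra-row-count");
```

The complementary server-side knob is RpcTimeoutInMillis in each node's storage-conf.xml; raising it buys slow range scans more headroom, at the cost of slower failure detection.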
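On the fat-client symptoms: a fat client never joins the ring on its own. Something has to call StorageService.initClient() and then wait for gossip to discover the other nodes, which is what contrib/bmt_example does; that would explain both mutate() silently dropping writes (no live endpoints yet) and mutateBlocking() throwing an IOException. A sketch along those lines, with method names as they appear in 0.6 (treat them as assumptions for other versions):

```java
import org.apache.cassandra.service.StorageService;

// Sketch: bring the fat client up before touching StorageProxy. Until
// gossip has populated the ring, writes have no live endpoints to go to.
StorageService.instance.initClient();
Thread.sleep(10000L); // crude: give gossip a few seconds to learn the ring
```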
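And since the recommendation in the thread is to write through Thrift rather than the fat client, a single insert in the 0.6-era API looks roughly like this. The host, keyspace, and column family names are placeholders, and the Thrift signatures changed between releases, so check the generated Cassandra.Client for your exact version:

```java
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;

// Sketch: one write over plain Thrift (0.6-era API with string row keys).
TSocket socket = new TSocket("node1.example.com", 9160); // placeholder host
socket.open();
Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(socket));

ColumnPath path = new ColumnPath();
path.setColumn_family("MyColumnFamily");          // placeholder column family
path.setColumn("name".getBytes("UTF-8"));

client.insert("MyKeyspace", "row1", path,         // placeholder keyspace/key
              "value".getBytes("UTF-8"),
              System.currentTimeMillis() * 1000,  // microsecond timestamps by convention
              ConsistencyLevel.QUORUM);
socket.close();
```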