This most likely means that the count() operation is taking too long for the configured RPC timeout (RpcTimeoutInMillis in storage-conf.xml). Raising that value, or fetching less data per get_range_slices call, should get the scan past the TimedOutExceptions.
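If raising the timeout is undesirable, the other lever is to request less data per Thrift call. A minimal sketch of the job setup, assuming the 0.6-era ConfigHelper API -- the keyspace, column family, and column names are placeholders, and the cassandra.range.batch.size property may not exist in every 0.6 build, so verify it against your version:

    import java.util.Arrays;

    import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.cassandra.thrift.SlicePredicate;
    import org.apache.hadoop.mapreduce.Job;

    public class RowCountSetup
    {
        public static void configure(Job job)
        {
            // Placeholder keyspace and column family names.
            ConfigHelper.setColumnFamily(job.getConfiguration(), "Keyspace1", "Standard1");

            // Ask for a single (placeholder) column per row rather than whole
            // rows, which keeps each get_range_slices response small.
            SlicePredicate predicate = new SlicePredicate()
                    .setColumn_names(Arrays.asList("name".getBytes()));
            ConfigHelper.setSlicePredicate(job.getConfiguration(), predicate);

            // Assumption: fewer rows per get_range_slices call; this property
            // appeared after 0.6.0, so check that your version honors it.
            job.getConfiguration().setInt("cassandra.range.batch.size", 256);

            job.setInputFormatClass(ColumnFamilyInputFormat.class);
        }
    }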
counts get unreliable after a certain number of columns under a key, in my
experience

jesse

--
jesse mcconnell
jesse.mcconn...@gmail.com

On Mon, Apr 19, 2010 at 19:12, Joost Ouwerkerk <jo...@openplaces.org> wrote:
> I'm slowly getting somewhere with Cassandra... I have successfully imported
> 1.5 million rows using MapReduce. This took about 8 minutes on an 8-node
> cluster, which is comparable to the time it takes with HBase.
>
> Now I'm having trouble scanning this data. I've created a simple MapReduce
> job that counts rows in my ColumnFamily. The job fails with most tasks
> throwing the following exception. Anyone have any ideas what's going wrong?
>
> java.lang.RuntimeException: TimedOutException()
>         at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:165)
>         at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:215)
>         at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:97)
>         at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135)
>         at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130)
>         at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:91)
>         at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
>         at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: TimedOutException()
>         at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:11015)
>         at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:623)
>         at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:597)
>         at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:142)
>         ... 11 more
>
> On Sun, Apr 18, 2010 at 6:01 PM, Stu Hood <stu.h...@rackspace.com> wrote:
>>
>> In 0.6.0 and trunk, it is located at
>> src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java
>>
>> You might be using a pre-release version of 0.6 if you are seeing a fat
>> client based InputFormat.
>>
>> -----Original Message-----
>> From: "Joost Ouwerkerk" <jo...@openplaces.org>
>> Sent: Sunday, April 18, 2010 4:53pm
>> To: user@cassandra.apache.org
>> Subject: Re: Help with MapReduce
>>
>> Where is the ColumnFamilyInputFormat that uses Thrift? I don't actually
>> have a preference about clients; I just want to be consistent with
>> ColumnFamilyInputFormat.
>>
>> On Sun, Apr 18, 2010 at 5:37 PM, Stu Hood <stu.h...@rackspace.com> wrote:
>>
>> > ColumnFamilyInputFormat no longer uses the fat client API, and instead
>> > uses Thrift. There are still some significant problems with the fat
>> > client, so it shouldn't be used without a good understanding of those
>> > problems.
>> >
>> > If you still want to use it, check out contrib/bmt_example, but I'd
>> > recommend that you use Thrift for now.
>> >
>> > -----Original Message-----
>> > From: "Joost Ouwerkerk" <jo...@openplaces.org>
>> > Sent: Sunday, April 18, 2010 2:59pm
>> > To: user@cassandra.apache.org
>> > Subject: Help with MapReduce
>> >
>> > I'm a Cassandra noob trying to validate Cassandra as a viable
>> > alternative to HBase (which we've been using for over a year) for our
>> > application. So far, I've had no success getting Cassandra working
>> > with MapReduce.
>> >
>> > My first step is inserting data into Cassandra. I've created a
>> > MapReduce job based on the fat client API. I'm using the fat client
>> > (StorageProxy) because that's what ColumnFamilyInputFormat uses, and I
>> > want to use the same API for both read and write jobs.
>> >
>> > When I call StorageProxy.mutate(), nothing happens. The job completes
>> > as if it had done something, but in fact nothing has changed in the
>> > cluster. When I call StorageProxy.mutateBlocking(), I get an
>> > IOException complaining that there is no connection to the cluster.
>> > I've concluded with the debugger that StorageService is not connecting
>> > to the cluster, even though I've specified the correct seed and
>> > ListenAddress (I'm using the exact same storage-conf.xml as the nodes
>> > in the cluster).
>> >
>> > I'm sure I'm missing something obvious in the configuration or my
>> > setup, but since I'm new to Cassandra, I can't see what it is.
>> >
>> > Any help appreciated,
>> > Joost
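For anyone who does go the fat client route despite the caveats above: the symptoms Joost describes (StorageProxy.mutate() silently doing nothing, mutateBlocking() failing with no connection) usually mean the process never joined the ring in client mode. The gist of what contrib/bmt_example does before mutating, sketched here for the 0.6 API -- treat the exact calls as assumptions to verify against the example itself:

    import org.apache.cassandra.service.StorageService;

    public class FatClientBootstrap
    {
        public static void joinRing() throws Exception
        {
            // storage-conf.xml must be reachable (e.g. via -Dstorage-config);
            // its seed list is how gossip discovers the ring.
            StorageService.instance.initClient();

            // Give gossip a few seconds to learn the ring; until it does,
            // StorageProxy has no live endpoints to send mutations to.
            Thread.sleep(10000);
        }
    }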
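And for the row-count job that started the thread: once ColumnFamilyInputFormat is configured, the mapper itself is tiny. A sketch against the 0.6-era input types (String row key, SortedMap<byte[], IColumn> columns -- verify the generics in your build):

    import java.io.IOException;
    import java.util.SortedMap;

    import org.apache.cassandra.db.IColumn;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Emits ("rows", 1) for every input row; a single reducer (or a
    // combiner plus reducer) sums the ones into the total row count.
    public class RowCountMapper
            extends Mapper<String, SortedMap<byte[], IColumn>, Text, LongWritable>
    {
        private static final Text ROWS = new Text("rows");
        private static final LongWritable ONE = new LongWritable(1);

        @Override
        protected void map(String key, SortedMap<byte[], IColumn> columns,
                           Context context)
                throws IOException, InterruptedException
        {
            context.write(ROWS, ONE);
        }
    }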