the latter, if you are retrieving multiple supercolumns.
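
To be concrete: with get_slice you can name sub-columns only when the
ColumnParent pins a single supercolumn.  Rough sketch against the 0.6
Thrift API, untested -- the keyspace, CF, and column names are
placeholders, and "client" is assumed to be an already-open
Cassandra.Client:

    // one supercolumn pinned: the predicate selects its sub-columns
    ColumnParent parent = new ColumnParent();
    parent.setColumn_family("Super1");
    parent.setSuper_column("sc1".getBytes());
    SlicePredicate pred = new SlicePredicate();
    pred.setColumn_names(Arrays.asList("a".getBytes(), "b".getBytes()));
    client.get_slice("Keyspace1", "somekey", parent, pred, ConsistencyLevel.ONE);

    // no supercolumn pinned: the predicate selects whole supercolumns,
    // and each one comes back with all of its sub-columns
    SlicePredicate superPred = new SlicePredicate();
    superPred.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 10));
    ColumnParent parent2 = new ColumnParent();
    parent2.setColumn_family("Super1");
    client.get_slice("Keyspace1", "somekey", parent2, superPred, ConsistencyLevel.ONE);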

On Mon, Apr 19, 2010 at 8:10 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
> hmm, might be too much data.  In the case of a supercolumn, how do I
> specify which sub-columns to retrieve?  Or can I only retrieve entire
> supercolumns?
>
> On Mon, Apr 19, 2010 at 8:47 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>> Possibly you are asking it to retrieve too many columns per row.
>>
>> Possibly there is something else causing poor performance, like swapping.
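>>
>> If it's the former, tighten the job's predicate so each row comes back
>> in smaller bites.  Untested sketch against 0.6's ConfigHelper; the cap
>> of 1000 is arbitrary:
>>
>>     SlicePredicate pred = new SlicePredicate();
>>     // fetch at most the first 1000 columns of each row
>>     pred.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 1000));
>>     ConfigHelper.setSlicePredicate(job.getConfiguration(), pred);
>>
>> Raising RpcTimeoutInMillis in storage-conf.xml can also buy some
>> headroom while you tune.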
>>
>> On Mon, Apr 19, 2010 at 7:12 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
>> > I'm slowly getting somewhere with Cassandra... I have successfully
>> > imported 1.5 million rows using MapReduce.  This took about 8 minutes
>> > on an 8-node cluster, which is comparable to the time it takes with
>> > HBase.
>> >
>> > Now I'm having trouble scanning this data.  I've created a simple
>> > MapReduce job that counts rows in my ColumnFamily.  The job fails
>> > with most tasks throwing the following exception.  Anyone have any
>> > ideas what's going wrong?
>> >
>> > java.lang.RuntimeException: TimedOutException()
>> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:165)
>> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:215)
>> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:97)
>> >     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135)
>> >     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130)
>> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:91)
>> >     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
>> >     at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>> >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>> >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>> >     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>> > Caused by: TimedOutException()
>> >     at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:11015)
>> >     at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:623)
>> >     at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:597)
>> >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:142)
>> >     ... 11 more
>> >
>> > On Sun, Apr 18, 2010 at 6:01 PM, Stu Hood <stu.h...@rackspace.com> wrote:
>> >>
>> >> In 0.6.0 and trunk, it is located at
>> >> src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java
>> >>
>> >> You might be using a pre-release version of 0.6 if you are seeing a
>> >> fat-client-based InputFormat.
>> >>
>> >> -----Original Message-----
>> >> From: "Joost Ouwerkerk" <jo...@openplaces.org>
>> >> Sent: Sunday, April 18, 2010 4:53pm
>> >> To: user@cassandra.apache.org
>> >> Subject: Re: Help with MapReduce
>> >>
>> >> Where is the ColumnFamilyInputFormat that uses Thrift?  I don't
>> >> actually have a preference about client; I just want to be
>> >> consistent with ColumnFamilyInputFormat.
>> >>
>> >> On Sun, Apr 18, 2010 at 5:37 PM, Stu Hood <stu.h...@rackspace.com> wrote:
>> >>
>> >> > ColumnFamilyInputFormat no longer uses the fat client API, and
>> >> > instead uses Thrift.  There are still some significant problems
>> >> > with the fat client, so it shouldn't be used without a good
>> >> > understanding of those problems.
>> >> >
>> >> > If you still want to use it, check out contrib/bmt_example, but
>> >> > I'd recommend that you use Thrift for now.
>> >> >
>> >> > -----Original Message-----
>> >> > From: "Joost Ouwerkerk" <jo...@openplaces.org>
>> >> > Sent: Sunday, April 18, 2010 2:59pm
>> >> > To: user@cassandra.apache.org
>> >> > Subject: Help with MapReduce
>> >> >
>> >> > I'm a Cassandra noob trying to validate Cassandra as a viable
>> >> > alternative to HBase (which we've been using for over a year) for
>> >> > our application.  So far, I've had no success getting Cassandra
>> >> > working with MapReduce.
>> >> >
>> >> > My first step is inserting data into Cassandra.  I've created a
>> >> > MapReduce job based on the fat client API.  I'm using the fat
>> >> > client (StorageProxy) because that's what ColumnFamilyInputFormat
>> >> > uses and I want to use the same API for both read and write jobs.
>> >> >
>> >> > When I call StorageProxy.mutate(), nothing happens.  The job
>> >> > completes as if it had done something, but in fact nothing has
>> >> > changed in the cluster.  When I call StorageProxy.mutateBlocking(),
>> >> > I get an IOException complaining that there is no connection to
>> >> > the cluster.  I've concluded with the debugger that StorageService
>> >> > is not connecting to the cluster, even though I've specified the
>> >> > correct seed and ListenAddress (I'm using the exact same
>> >> > storage-conf.xml as the nodes in the cluster).
>> >> >
>> >> > I'm sure I'm missing something obvious in the configuration or my
>> >> > setup, but since I'm new to Cassandra, I can't see what it is.
>> >> >
>> >> > Any help appreciated,
>> >> > Joost
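>> >> >
>> >> > P.S. For reference, the write side boils down to roughly this
>> >> > (simplified sketch, not the actual job code -- it's the 0.6
>> >> > internal API, and the keyspace/CF/column names here are
>> >> > placeholders):
>> >> >
>> >> >     RowMutation rm = new RowMutation("Keyspace1", key);
>> >> >     rm.add(new QueryPath("MyCF", null, "name".getBytes()),
>> >> >            value, System.currentTimeMillis());
>> >> >     // returns without error, but nothing changes in the cluster:
>> >> >     StorageProxy.mutate(Arrays.asList(rm));
>> >> >     // throws IOException, "no connection to the cluster":
>> >> >     // StorageProxy.mutateBlocking(Arrays.asList(rm), ConsistencyLevel.ONE);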