yes
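[Editor's note: the "yes" above confirms that when reading a single supercolumn you can ask for specific sub-columns. The selection semantics can be modeled with a plain sorted map — this is a conceptual sketch only, not the Cassandra API; the class and sub-column names are illustrative:]

```java
import java.util.*;

public class SliceSketch {
    // Select only the named sub-columns (conceptually, what a predicate
    // listing explicit column names does).
    static NavigableMap<String, String> byNames(NavigableMap<String, String> sc, List<String> names) {
        NavigableMap<String, String> out = new TreeMap<>();
        for (String n : names) {
            if (sc.containsKey(n)) out.put(n, sc.get(n));
        }
        return out;
    }

    // Select a contiguous run of sub-columns between start and finish,
    // inclusive (conceptually, what a start/finish slice range does).
    static NavigableMap<String, String> byRange(NavigableMap<String, String> sc, String start, String finish) {
        return new TreeMap<>(sc.subMap(start, true, finish, true));
    }

    public static void main(String[] args) {
        // A supercolumn modeled as sorted sub-column name -> value pairs.
        NavigableMap<String, String> sc = new TreeMap<>();
        sc.put("city", "Montreal");
        sc.put("name", "Joost");
        sc.put("zip", "H2X");

        System.out.println(byNames(sc, Arrays.asList("city", "zip")).keySet()); // [city, zip]
        System.out.println(byRange(sc, "city", "name").keySet());               // [city, name]
    }
}
```

Retrieving by explicit names skips everything not listed; a range slice pulls every sub-column between the bounds, which is why asking for a wide range on a fat supercolumn can time out.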
On 4/19/10, Joost Ouwerkerk &lt;jo...@openplaces.org&gt; wrote:
> And when retrieving only one supercolumn? Can I further specify which
> subcolumns to retrieve?

On Mon, Apr 19, 2010 at 9:29 PM, Jonathan Ellis &lt;jbel...@gmail.com&gt; wrote:
> the latter, if you are retrieving multiple supercolumns.

On Mon, Apr 19, 2010 at 8:10 PM, Joost Ouwerkerk &lt;jo...@openplaces.org&gt; wrote:
> hmm, might be too much data. In the case of a supercolumn, how do I
> specify which sub-columns to retrieve? Or can I only retrieve entire
> supercolumns?

On Mon, Apr 19, 2010 at 8:47 PM, Jonathan Ellis &lt;jbel...@gmail.com&gt; wrote:
> Possibly you are asking it to retrieve too many columns per row.
>
> Possibly there is something else causing poor performance, like
> swapping.

On Mon, Apr 19, 2010 at 7:12 PM, Joost Ouwerkerk &lt;jo...@openplaces.org&gt; wrote:
> I'm slowly getting somewhere with Cassandra... I have successfully
> imported 1.5 million rows using MapReduce. This took about 8 minutes
> on an 8-node cluster, which is comparable to the time it takes with
> HBase.
>
> Now I'm having trouble scanning this data. I've created a simple
> MapReduce job that counts rows in my ColumnFamily. The job fails with
> most tasks throwing the following exception. Anyone have any ideas
> what's going wrong?
> java.lang.RuntimeException: TimedOutException()
>         at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:165)
>         at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:215)
>         at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:97)
>         at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135)
>         at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130)
>         at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:91)
>         at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
>         at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: TimedOutException()
>         at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:11015)
>         at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:623)
>         at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:597)
>         at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:142)
>         ... 11 more

On Sun, Apr 18, 2010 at 6:01 PM, Stu Hood &lt;stu.h...@rackspace.com&gt; wrote:
> In 0.6.0 and trunk, it is located at
> src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java
>
> You might be using a pre-release version of 0.6 if you are seeing a
> fat client based InputFormat.

-----Original Message-----
From: "Joost Ouwerkerk" &lt;jo...@openplaces.org&gt;
Sent: Sunday, April 18, 2010 4:53pm
To: user@cassandra.apache.org
Subject: Re: Help with MapReduce

> Where is the ColumnFamilyInputFormat that uses Thrift? I don't
> actually have a preference about client, I just want to be consistent
> with ColumnInputFormat.

On Sun, Apr 18, 2010 at 5:37 PM, Stu Hood &lt;stu.h...@rackspace.com&gt; wrote:
> ColumnFamilyInputFormat no longer uses the fat client API, and instead
> uses Thrift. There are still some significant problems with the fat
> client, so it shouldn't be used without a good understanding of those
> problems.
>
> If you still want to use it, check out contrib/bmt_example, but I'd
> recommend that you use thrift for now.

-----Original Message-----
From: "Joost Ouwerkerk" &lt;jo...@openplaces.org&gt;
Sent: Sunday, April 18, 2010 2:59pm
To: user@cassandra.apache.org
Subject: Help with MapReduce

> I'm a Cassandra noob trying to validate Cassandra as a viable
> alternative to HBase (which we've been using for over a year) for our
> application. So far, I've had no success getting Cassandra working
> with MapReduce.
>
> My first step is inserting data into Cassandra.
> I've created a MapReduce job based on the fat client API. I'm using
> the fat client (StorageProxy) because that's what
> ColumnFamilyInputFormat uses and I want to use the same API for both
> read and write jobs.
>
> When I call StorageProxy.mutate(), nothing happens. The job completes
> as if it had done something, but in fact nothing has changed in the
> cluster. When I call StorageProxy.mutateBlocking(), I get an
> IOException complaining that there is no connection to the cluster.
> I've concluded with the debugger that StorageService is not connecting
> to the cluster, even though I've specified the correct seed and
> ListenAddress (I'm using the exact same storage-conf.xml as the nodes
> in the cluster).
>
> I'm sure I'm missing something obvious in the configuration or my
> setup, but since I'm new to Cassandra, I can't see what it is.
>
> Any help appreciated,
> Joost
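[Editor's note: the TimedOutException in the trace above is the server-side RPC timeout firing while get_range_slices gathers a large slice. Besides narrowing the slice predicate as suggested in the thread, a common workaround on 0.6 is to raise the timeout in storage-conf.xml on each node — a sketch, with an illustrative value:]

```xml
<!-- storage-conf.xml (Cassandra 0.6): how long a coordinator waits for
     replicas before throwing TimedOutException. 10000 ms is an
     illustrative value; tune to your row widths and hardware. -->
<RpcTimeoutInMillis>10000</RpcTimeoutInMillis>
```

Raising the timeout only buys headroom; if single rows carry very many (sub)columns, trimming the predicate is the more durable fix.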