I wrote some Iterable<*> methods to do this for column families that share key structure under OrderPreservingPartitioner (OPP). They are on the Hector examples page. Caveat emptor.
It does iterative chunking of the working set for each column family, so that you can set the nominal transfer size when you construct the Iterator/Iterable. I've been very happy with the performance of it, even over large ranges of keys. This is with OrderPreservingPartitioner because of other requirements, so it may not be a good example for comparison with a random partitioner, which is preferred.

Doing joins as such on the server works against the basic design of Cassandra. The server does a few things very well only because it isn't overloaded with extra faucets and kitchen sinks. However, I'd like to be able to load auxiliary classes into the server runtime in a modular way, just for things like this. Maybe we'll get that someday.

My impression is that there is much more common key structure in a workable Cassandra storage layout than in a conventional ER model. This is the nature of the beast when you are organizing your information more according to access patterns than fully normal relationships. That is one of the fundamental design trade-offs of using a hash structure over a schema.

Having something that lets you deploy a fully normal schema on a hash store can be handy, but it can also obscure the way that your application indirectly exercises the storage layer. The end-result may be that the layout is less friendly to the underlying mechanisms of Cassandra. I'm not saying that it is bad to have a tool to do this, only that it can make it easy to avoid thinking about Cassandra storage in terms of what it really is.

There may be ways to optimize the OCM queries, but that takes you down the road of query optimization, which can be quite nebulous. My gut instinct is to focus more on the layout, using aggregate keys and common key structure where you can, so that you can take advantage of the parallel queries more of the time.
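
For anyone who wants the general shape of the chunked iteration mentioned at the top, here is a rough sketch. It is not the code from the Hector examples page and it does not use the Hector API. RangeFetcher, chunked, and inMemory are names I made up for illustration, and the in-memory TreeSet stands in for a key-ordered column family. It assumes range starts are inclusive and that keys come back in order, which is what you get with an order-preserving partitioner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.TreeSet;

class ChunkedKeys {

    /** Hypothetical stand-in for a key-range query against one column family. */
    interface RangeFetcher {
        // Returns up to `limit` keys >= startKey, in key order (inclusive start).
        List<String> fetch(String startKey, int limit);
    }

    /** Walks keys in order, asking for a nominal chunkSize keys per fetch. */
    static Iterable<String> chunked(RangeFetcher fetcher, String firstKey, int chunkSize) {
        if (chunkSize < 1) throw new IllegalArgumentException("chunkSize must be >= 1");
        return () -> new Iterator<String>() {
            private List<String> chunk = fetcher.fetch(firstKey, chunkSize);
            private int pos = 0;
            // A short chunk means the range has run out of keys.
            private boolean exhausted = chunk.size() < chunkSize;

            @Override public boolean hasNext() {
                if (pos < chunk.size()) return true;
                if (exhausted || chunk.isEmpty()) return false;
                // The start key is inclusive, so request one extra row and
                // drop the key we have already handed out.
                String last = chunk.get(chunk.size() - 1);
                List<String> next = fetcher.fetch(last, chunkSize + 1);
                exhausted = next.size() < chunkSize + 1;
                int skip = (!next.isEmpty() && next.get(0).equals(last)) ? 1 : 0;
                chunk = next.subList(skip, next.size());
                pos = 0;
                return !chunk.isEmpty();
            }

            @Override public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                return chunk.get(pos++);
            }
        };
    }

    /** In-memory fetcher used only to exercise the iterator in this sketch. */
    static RangeFetcher inMemory(TreeSet<String> keys) {
        return (start, limit) -> {
            List<String> out = new ArrayList<>();
            for (String k : keys.tailSet(start, true)) {
                if (out.size() >= limit) break;
                out.add(k);
            }
            return out;
        };
    }

    public static void main(String[] args) {
        TreeSet<String> keys = new TreeSet<>(Arrays.asList("a", "b", "c", "d", "e"));
        for (String k : chunked(inMemory(keys), "", 2)) {
            System.out.println(k);
        }
    }
}
```

The chunk size is fixed when the Iterable is constructed, so the caller just writes a for-each loop and the transfer size stays a tuning knob rather than something the loop body has to know about.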

On Wed, May 26, 2010 at 3:13 PM, Charlie Mason <charlie....@gmail.com> wrote:
> On Wed, May 26, 2010 at 7:45 PM, Dodong Juan <dodongj...@gmail.com> wrote:
>>
>> So I am not sure if you guys are familiar with OCM . Basically it is an ORM
>> for Cassandra. Been testing it
>>
>
> In case anyone is interested I have posted a reply on the OCM issue
> tracker where this was also raised.
>
> http://github.com/charliem/OCM/issues/closed#issue/5/comment/254717
>
>
> Charlie M
>