I have a similar issue, but I can't create a CF per type, because types are an open-ended set in my case (they are geographical locations). So I wanted to have one CF for types, with a supercolumn for each type and the record keys as columns inside each supercolumn.
Is it a problem for me to have millions of columns in a supercolumn?

On Tue, May 11, 2010 at 4:29 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> multiget performs in O(N) with the number of rows requested. So will
> range scanning.
>
> If you want to query millions of records of one type, I would create a
> CF per type and use Hadoop to parallelize the computation.
>
> On Fri, May 7, 2010 at 6:16 PM, James <rent.lupin.r...@gmail.com> wrote:
> > Hi all,
> > Apologies if I'm still stuck in RDBMS mentality - first project using
> > Cassandra!
> > I'll be using Cassandra to store quite a lot (tens of millions) of
> > records, each of which has a type.
> > I'll want to query the records to get all of a certain type; it's an
> > analogous situation to the TaggedPosts schema from Arin's blog post
> > (http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model).
> > The thing is, each type (or tag) row key will be pointing at millions
> > of records. I know I can use multiget_slice with all those record IDs
> > as one request, but is this The Right Way of "filtering" a large
> > column family by type?
> > Coming from an RDBMS-ingrained mindset, it seems kind of awkward...
> > Thanks!
> > James
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
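For anyone comparing the two layouts in this thread, here is a toy in-memory sketch in plain Python dicts (this is not a Cassandra client, and names like `types_cf` and `slice_columns` are made up for illustration). The practical concern with millions of subcolumns is that, in the Cassandra of this era, subcolumns of a supercolumn are not indexed, so touching one means deserializing the whole supercolumn; a wide row of plain columns can instead be read back in pages with column slices:

```python
# Toy model of the two data shapes discussed above. NOT a Cassandra
# client -- it only illustrates the layouts and slice-style paging.

# Layout A (the supercolumn idea): one CF for types, one supercolumn
# per type, record keys as subcolumns. Reading any part of a
# supercolumn pulls in the whole thing.
types_cf = {
    "locations": {                          # row key
        "london": {"rec1": "", "rec2": ""},  # supercolumn -> subcolumns
        "paris":  {"rec3": ""},
    }
}

# Layout B: one wide row per type, record keys as ordinary columns,
# which can be sliced a page at a time instead of loaded whole.
type_rows = {
    "london": {"rec1": "", "rec2": ""},
    "paris":  {"rec3": ""},
}

def slice_columns(row, start="", count=100):
    """Return up to `count` column names >= `start`, in sorted order
    (roughly what a get_slice with a SliceRange does)."""
    cols = sorted(c for c in row if c >= start)
    return cols[:count]

def all_keys(row, page_size=100):
    """Page through every record key in a wide row."""
    keys, start = [], ""
    while True:
        page = slice_columns(row, start=start, count=page_size)
        if not page:
            break
        keys.extend(page)
        # next page starts just past the last column seen
        start = page[-1] + "\x00"
    return keys
```

With layout B, `all_keys(type_rows["london"])` walks the row in bounded pages, which is the access pattern that survives when a type grows to millions of records.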