A multiget is O(N) in the number of rows requested, and so is a range scan.
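To make that cost concrete, here is a toy in-memory model in plain Python. It is only a sketch, not Cassandra client code: `CountingStore`, `multiget`, `range_scan`, and `demo` are hypothetical names invented for illustration, and the store is just a dict that counts how many rows each call touches.

```python
from typing import Dict, Iterable, Tuple


class CountingStore:
    """Hypothetical dict-backed stand-in for a column family; counts row reads."""

    def __init__(self, rows: Dict[str, dict]) -> None:
        self.rows = dict(rows)
        self.reads = 0

    def multiget(self, keys: Iterable[str]) -> Dict[str, dict]:
        """Fetch each requested key individually: one read per key."""
        result = {}
        for key in keys:
            self.reads += 1
            if key in self.rows:
                result[key] = self.rows[key]
        return result

    def range_scan(self, start: str, end: str) -> Dict[str, dict]:
        """Return rows with start <= key < end: one read per row returned.

        Sorting the keys here is a shortcut for the toy model only.
        """
        result = {}
        for key in sorted(self.rows):
            if start <= key < end:
                self.reads += 1
                result[key] = self.rows[key]
        return result


def demo(n: int) -> Tuple[int, int]:
    """Return (rows read by multiget, rows read by range scan) for n records."""
    keys = [f"rec{i:06d}" for i in range(n)]
    store = CountingStore({key: {"type": "a"} for key in keys})

    store.multiget(keys)
    multiget_reads = store.reads

    store.reads = 0
    store.range_scan("rec000000", f"rec{n:06d}")
    return multiget_reads, store.reads


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, demo(n))
```

Both counts grow linearly with the number of rows you ask for, so neither access pattern makes fetching millions of rows cheap.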
If you want to query millions of records of one type, I would create a column family (CF) per type and use Hadoop to parallelize the computation.

On Fri, May 7, 2010 at 6:16 PM, James <rent.lupin.r...@gmail.com> wrote:
> Hi all,
> Apologies if I'm still stuck in RDBMS mentality - first project using
> Cassandra!
> I'll be using Cassandra to store quite a lot (10s of millions) of records,
> each of which has a type.
> I'll want to query the records to get all of a certain type; it's an
> analogous situation to the TaggedPosts schema from Arin's blog post
> (http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model).
> The thing is, each type (or tag) row key will be pointing at millions of
> records. I know I can use multiget_slice with all those record IDs as one
> request, but is this The Right Way of "filtering" a large column family by
> type?
> Coming from an RDBMS-ingrained mindset, it seems kind of awkward...
> Thanks!
> James

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com