multiget is O(N) in the number of rows requested, and so is a range
scan. Neither one gets cheaper per row as the request grows.

If you want to query millions of records of one type, I would create a
CF per type and use Hadoop to parallelize the computation.
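
To make the shape of that concrete, here is a rough sketch in plain Python.
It is not Cassandra or Hadoop code: the dicts stand in for column families,
the thread pool stands in for Hadoop tasks, and the names insert, split and
scan_type are made up for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory model: one "column family" per record type,
# each a dict of row key -> record. Not a real Cassandra API.
column_families = defaultdict(dict)

def insert(record_id, record_type, payload):
    """Write the record into the column family named after its type."""
    column_families[record_type][record_id] = payload

def split(keys, parts):
    """Cut a key list into roughly equal contiguous chunks (like input splits)."""
    size = max(1, -(-len(keys) // parts))  # ceiling division
    return [keys[i:i + size] for i in range(0, len(keys), size)]

def scan_type(record_type, map_fn, workers=4):
    """Scan every row of one type in parallel and sum the per-chunk results."""
    cf = column_families[record_type]
    chunks = split(sorted(cf), workers)

    def work(chunk):
        # Each worker touches only its own slice of the key range.
        return sum(map_fn(cf[key]) for key in chunk)

    # The thread pool is only a stand-in for a set of Hadoop map tasks.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(work, chunks))
```

The point of the layout is that a query for one type scans that type's CF
from end to end and never needs a list of millions of record IDs; the total
work is still O(N), it is just spread across workers.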

On Fri, May 7, 2010 at 6:16 PM, James <rent.lupin.r...@gmail.com> wrote:
> Hi all,
> Apologies if I'm still stuck in RDBMS mentality - first project using
> Cassandra!
> I'll be using Cassandra to store quite a lot (10s of millions) of records,
> each of which has a type.
> I'll want to query the records to get all of a certain type; it's an
> analogous situation to the TaggedPosts schema from Arin's blog post
> (http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model).
> The thing is, each type (or tag) row key will be pointing at millions of
> records. I know I can use multiget_slice with all those record IDs as one
> request, but is this The Right Way of "filtering" a large column family by
> type?
> Coming from an RDBMS-ingrained mindset, it seems kind of awkward...
> Thanks!
> James



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
