> Is it a problem for me to have millions of columns in a supercolumn?

Yes, you will have a problem: a supercolumn has no index for its subcolumns, so reading even one subcolumn means deserializing the whole supercolumn.

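To make the cost concrete, here is a rough toy model in Python. It is only a sketch: the length-prefixed byte layout and all function names (pack_all, read_unindexed, read_indexed) are made up for illustration and are not Cassandra's actual on-disk format or API. It just contrasts decoding every entry to find one with jumping to a recorded offset.

```python
import struct


def pack_entry(name, value):
    # Length-prefixed name and value: a made-up toy format, not Cassandra's.
    return (struct.pack(">H", len(name)) + name
            + struct.pack(">I", len(value)) + value)


def unpack_entry(blob, pos):
    # Decode one entry starting at pos; return (name, value, next_pos).
    (name_len,) = struct.unpack_from(">H", blob, pos)
    pos += 2
    name = blob[pos:pos + name_len]
    pos += name_len
    (value_len,) = struct.unpack_from(">I", blob, pos)
    pos += 4
    value = blob[pos:pos + value_len]
    return name, value, pos + value_len


def pack_all(columns):
    # Serialize entries back to back and record where each one starts.
    blob = bytearray()
    offsets = {}
    for name, value in columns:
        offsets[name] = len(blob)
        blob += pack_entry(name, value)
    return bytes(blob), offsets


def read_unindexed(blob, wanted):
    # No index: decode every entry, then pick out the one we want.
    decoded = {}
    pos = 0
    while pos < len(blob):
        name, value, pos = unpack_entry(blob, pos)
        decoded[name] = value
    return decoded.get(wanted), len(decoded)


def read_indexed(blob, offsets, wanted):
    # With an index: jump straight to the entry and decode only that one.
    if wanted not in offsets:
        return None, 0
    _, value, _ = unpack_entry(blob, offsets[wanted])
    return value, 1


if __name__ == "__main__":
    columns = [(b"key%06d" % i, b"v%d" % i) for i in range(100_000)]
    blob, offsets = pack_all(columns)
    # Each call returns (value, number of entries decoded).
    print(read_unindexed(blob, b"key000042"))         # (b'v42', 100000)
    print(read_indexed(blob, offsets, b"key000042"))  # (b'v42', 1)
```

In this model the unindexed read does work proportional to the total number of entries, no matter which one you ask for, and that is the behaviour to expect from a supercolumn holding millions of subcolumns.
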
On Tue, May 11, 2010 at 10:03 PM, David Boxenhorn <da...@lookin2.com> wrote:
> I have a similar issue, but I can't create a CF per type, because types are
> an open-ended set in my case (they are geographical locations). So I wanted
> to have one CF for types, and a supercolumn for each type, with the keys as
> columns per supercolumn.
>
> Is it a problem for me to have millions of columns in a supercolumn?
>
>
> On Tue, May 11, 2010 at 4:29 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>
>> multiget performs in O(N) with the number of rows requested. so will
>> range scanning.
>>
>> if you want to query millions of records of one type i would create a
>> CF per type and use hadoop to parallelize the computation.
>>
>> On Fri, May 7, 2010 at 6:16 PM, James <rent.lupin.r...@gmail.com> wrote:
>> > Hi all,
>> > Apologies if I'm still stuck in RDBMS mentality - first project using
>> > Cassandra!
>> > I'll be using Cassandra to store quite a lot (10s of millions) of
>> > records, each of which has a type.
>> > I'll want to query the records to get all of a certain type; it's an
>> > analogous situation to the TaggedPosts schema from Arin's blog post
>> > (http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model).
>> > The thing is, each type (or tag) row key will be pointing at millions of
>> > records. I know I can use multiget_slice with all those record IDs as
>> > one request, but is this The Right Way of "filtering" a large column
>> > family by type?
>> > Coming from an RDBMS-ingrained mindset, it seems kind of awkward...
>> > Thanks!
>> > James
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com