Agreed, that's what I meant by "there are a lot of simple ways to split it up over multiple rows", assuming it's necessary.
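For illustration, here is a minimal Python sketch that contrasts the single-row index from the original question with the row-per-value layout Konstantin describes, where the row key is the index name plus the value and the column name is the matching main-CF key. The user keys, the node count and the `node_for` placement function are hypothetical. MD5 modulo the node count is a simplified stand-in for Cassandra's token ring, not its actual placement logic or client API.

```python
import hashlib
from collections import Counter

# Hypothetical cluster size, for illustration only.
NUM_NODES = 4


def node_for(row_key: str, num_nodes: int = NUM_NODES) -> int:
    """Simplified placement: MD5 of the row key modulo the node count.

    Real Cassandra maps tokens onto ranges of a ring; this stand-in only shows
    that every column of one row lands wherever that single row key hashes.
    """
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_nodes


def single_row_index(users):
    """Original schema: one row per index, column name = last name."""
    row = {}
    for user_key, last_name in users:
        # Column names are unique within a row, so a second user with the
        # same last name overwrites the first in this layout.
        row[last_name] = user_key
    return {"User_Keys_By_Last_Name": row}


def sharded_index(users, prefix="User_Keys_By_Last_Name_"):
    """Row per indexed value; column name = the matching main-CF row key."""
    rows = {}
    for user_key, last_name in users:
        rows.setdefault(prefix + last_name, {})[user_key] = ""  # empty value
    return rows


def rows_per_node(index):
    """Count how many index rows land on each (hypothetical) node."""
    return Counter(node_for(row_key) for row_key in index)


# Hypothetical user keys; the real ones in the thread are elided.
users = [
    ("user-1", "adams"), ("user-2", "alden"), ("user-3", "anderson"),
    ("user-4", "davis"), ("user-5", "doe"), ("user-6", "adams"),
]

single = single_row_index(users)
sharded = sharded_index(users)
print(rows_per_node(single))   # one row key, so one node holds the whole index
print(rows_per_node(sharded))  # five row keys, each placed by its own hash
```

The design point is that placement is decided per row key. The single-row layout pins every index entry to one node and its replicas. With one row per value, the index spreads out as the number of distinct values grows, and multiple users with the same last name can share a row.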
On Thu, Aug 25, 2011 at 4:24 PM, Konstantin Naryshkin <konstant...@a-bb.net> wrote:

> Why are you keeping all your indexes in the same row? We do a similar thing
> (maintain several indexes over the same data) and we just have an index
> column family with keys like "dest192.168.0.1", meaning the destination
> index of 192.168.0.1. You can do rows like User_Keys_By_Last_Name_adams and
> User_Keys_By_Last_Name_alden, and keep the matching main column family key
> as the column name. This will ensure that your index is evenly distributed
> throughout your cluster.
>
> ----- Original Message -----
> From: "Ed Anuff" <e...@anuff.com>
> To: user@cassandra.apache.org
> Sent: Thursday, August 25, 2011 12:48:49 PM
> Subject: Re: Customized Secondary Index Schema
>
> How many unique last names do you anticipate having? How many characters
> of the last name do you anticipate keeping in your index? You can easily
> do the math to figure out how many you could fit on a node. I think you'll
> find that the ceiling might be quite a bit higher than you think. If you
> have more than a couple of hundred million users, it might not be the best
> approach. There are a lot of very simple ways to split it up over multiple
> rows. As is the case with most things regarding Cassandra, off-the-cuff
> assumptions only get you so far before you have to do some math and run
> some tests.
>
> As I mentioned in my talk, for simple use cases like this you probably
> should just start with the built-in secondary indexes, but I assume you
> have already explored those.
>
> Ed
>
> On Thu, Aug 25, 2011 at 9:27 AM, Alvin UW <alvi...@gmail.com> wrote:
>
> Yes, this is what I am worrying about.
>
> 2011/8/24 Ryan King <r...@twitter.com>
>
> On Tue, Aug 23, 2011 at 10:03 AM, Alvin UW <alvi...@gmail.com> wrote:
> > Hello,
> >
> > As mentioned by Ed Anuff in his blog and slides, one way to build a
> > customized secondary index is to use one CF with each row representing
> > a secondary index, and the secondary index name as the row key.
> > For example,
> >
> > Indexes = {
> >     "User_Keys_By_Last_Name" : {
> >         "adams" : "e5d61f2b-…",
> >         "alden" : "e80a17ba-…",
> >         "anderson" : "e5d61f2b-…",
> >         "davis" : "e719962b-…",
> >         "doe" : "e78ece0f-…",
> >         "franks" : "e66afd40-…",
> >         … : …,
> >     }
> > }
> >
> > But because of the row key, the whole secondary index is stored on a
> > single node. All the queries against this secondary index will go to
> > this node (and, of course, its replica nodes).
> >
> > Do you think this is a scalability problem, or is there a better
> > solution for it?
>
> It's certainly a scalability problem in that this solution has a hard
> ceiling (this index can't get larger than the capacity of any single
> node). It will probably work on small datasets, but if your dataset is
> small then why are you using cassandra?
>
> -ryan
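Ed suggests doing the math to find where the single-row index hits the hard ceiling Ryan mentions. A rough sketch of that arithmetic follows. Every input is an assumption: the 200 million entries, the 8-byte average name, the 16-byte binary UUID value, and the 100 MB target row size are hypothetical. The 15-byte per-column overhead follows commonly cited Cassandra sizing guidance and should be checked against your version's storage format.

```python
# Back-of-the-envelope sizing for one index row. All inputs are assumptions;
# the per-column overhead should be verified for your Cassandra version.
COLUMN_OVERHEAD_BYTES = 15


def index_row_bytes(num_columns: int, avg_name_bytes: int, value_bytes: int,
                    overhead: int = COLUMN_OVERHEAD_BYTES) -> int:
    """Approximate raw size of one row holding num_columns index entries."""
    return num_columns * (avg_name_bytes + value_bytes + overhead)


def rows_needed(total_bytes: int, target_row_bytes: int) -> int:
    """Rows the index must be split into to keep each under a target size."""
    return (total_bytes + target_row_bytes - 1) // target_row_bytes  # ceiling division


# Hypothetical inputs: 200 million entries, 8-byte names, 16-byte binary UUIDs.
size = index_row_bytes(200_000_000, avg_name_bytes=8, value_bytes=16)
print(size)                            # 7800000000 bytes of raw data on each replica
print(rows_needed(size, 100_000_000))  # 78 rows at a hypothetical 100 MB per row
```

Under these assumptions, a single row of a few gigabytes is well below the disk capacity of a typical node, which supports Ed's point that the ceiling may be higher than expected. However, every read of that index still goes to the same replicas.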