Hi, we had a similar use case. Just do the filtering client-side, the #2 example performs horribly, secondary indexes on something dividing the set into two roughly the same size subsets just don't work.
Give it a try on localhost with just a couple of records (150.000), you will see. regards, ondrej On Wed, Jan 29, 2014 at 5:17 AM, Jimmy Lin <y2klyf+w...@gmail.com> wrote: > in my #2 example: > select * from people where company_id='xxx' and gender='male' > > I already specify the first part of the primary key(row key) in my where > clause, so how does the secondary indexed column gender='male" help > determine which row to return? It is more like filtering a list of column > from a row(which is exactly I can do that in #1 example). > But then if I don't create index first, the cql statement will run into > syntax error. > > > > > On Tue, Jan 28, 2014 at 11:37 AM, Mullen, Robert < > robert.mul...@pearson.com> wrote: > >> I would do #2. Take a look at this blog which talks about secondary >> indexes, cardinality, and what it means for cassandra. Secondary indexes >> in cassandra are a different beast, so often old rules of thumb about >> indexes don't apply. http://www.wentnet.com/blog/?p=77 >> >> >> On Tue, Jan 28, 2014 at 10:41 AM, Edward Capriolo >> <edlinuxg...@gmail.com>wrote: >> >>> Generally indexes on binary fields true/false male/female are not >>> terrible effective. >>> >>> >>> On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin <y2klyf+w...@gmail.com>wrote: >>> >>>> I have a simple column family like the following >>>> >>>> create table people( >>>> company_id text, >>>> employee_id text, >>>> gender text, >>>> primary key(company_id, employee_id) >>>> ); >>>> >>>> if I want to find out all the "male" employee given a company id, I can >>>> do >>>> >>>> 1/ >>>> select * from people where company_id=xxxx' >>>> and loop through the result efficiently to pick the employee who has >>>> gender column value equal to "male" >>>> >>>> 2/ >>>> add a seconday index >>>> create index gender_index on people(gender) >>>> select * from people where company_id='xxx' and gender='male' >>>> >>>> >>>> I though #2 seems more appropriate, but I also thought the secondary >>>> index is helping only locating the primary row key, with the select clause >>>> in #2, is it more efficient than #1 where application responsible loop >>>> through the result and filter the right content? >>>> >>>> ( >>>> It totally make sense if I only need to find out all the male >>>> employee(and not within a company) by using >>>> select * from people where gender='male" >>>> ) >>>> >>>> thanks >>>> >>> >>> >> >