Thanks guys. I am working with Andy on this project. Further questions on the secondary indexes:
Assuming we have 1000 columns in 1 row of the column family, and about 900 of them have NamedColumn1=1, and of those 900 only 10 also have NamedColumn2=1. If I query for columns which have NamedColumn1=1 && NamedColumn2=1, does Cassandra optimize this in any way by fetching only the 10, versus fetching the 900 and then filtering down to the 10 I really need?

We ran some tests that show that if only 10 of the 18 NamedColumns are inserted, only those 10 internal index column families are updated. In this case the sizes of the user-defined column family and the internal index column families will differ. It is then expected that the user-defined column family will get compacted more often and have its tombstones removed more often. How does this affect the internal index column families, and how often will they get compacted?

About expirations: if a column in the user-defined column family expires, can I safely assume that its related indexes will also expire?

Here is the link that talks about the rebuild:
http://www.datastax.com/docs/1.1/ddl/indexes

    Maintaining Secondary Indexes
    When a node starts up, Cassandra rebuilds the secondary index of the
    stored rows. To perform a hot rebuild of a secondary index, use the
    nodetool utility rebuild_index command.

We'll look further into Solr, but at the moment it may not fit our need/schedule.

> - Will that result in Cassandra creating 18 new column families,
> one for each index?
Inserts will be slower, as each insert will potentially result in 18 additional inserts. This is just the same as an RDBMS: more indexes == more insert work.

> - If a given column is not specified in any rows, will Cassandra
> still create an index column family?
Yes

> - The documentation says that indexes are rebuilt with every
> Cassandra restart. Why is that needed? What does the rebuild do? Does it
> read the whole column family into memory at once?
That is not correct, do you have a link for the docs?

As Moshe said, standard Cassandra is not a great fit for faceting.
Consider Solr or DataStax Enterprise: http://3.datastax.com/datastax-enterprise.php

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/03/2013, at 10:17 PM, moshe.kr...@barclays.com wrote:

> I do not think this is a good use case for Cassandra alone, assuming the
> queries can be any combination of the 18 columns.
> I would consider using some combination of Cassandra and Solr, where Solr
> provides the indexing/search, and Cassandra provides the bulk store.
>
> From: Andy Stec [mailto:andys...@gmail.com <andys...@gmail.com>]
> Sent: Saturday, March 16, 2013 12:10 AM
> To: user@cassandra.apache.org
> Subject: Secondary Indexes
>
> We need to provide search capability based on a field that is a bitmap
> combination of 18 possible values. We want to use secondary indexes to
> improve performance. One possible solution is to create a named column for
> each value and have a secondary index for each of the 18 columns.
> Questions we have are:
>
> - Will that result in Cassandra creating 18 new column families,
> one for each index?
>
> - If a given column is not specified in any rows, will Cassandra
> still create an index column family?
>
> - The documentation says that indexes are rebuilt with every
> Cassandra restart. Why is that needed? What does the rebuild do? Does it
> read the whole column family into memory at once?
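
For readers following the thread, the mapping proposed in the original question (one named column per value of the 18-value bitmap field) could be sketched roughly as below. This is only an illustration, not code from anyone on the thread: the function name, the bit ordering (bit 0 maps to NamedColumn1), and the choice to write the value 1 are assumptions, with the column names borrowed from the NamedColumn1/NamedColumn2 examples above. It emits a column only for bits that are set, which matches the test result mentioned earlier that only the inserted columns touch their index column families.

```python
NUM_FLAGS = 18  # the thread describes a bitmap of 18 possible values


def bitmap_to_named_columns(bitmap):
    """Return {column_name: 1} for every bit that is set in the bitmap.

    Hypothetical helper: bit i (0-based) is assumed to map to the
    column "NamedColumn<i+1>". Unset bits produce no column at all.
    """
    if not 0 <= bitmap < (1 << NUM_FLAGS):
        raise ValueError("bitmap must fit in %d bits" % NUM_FLAGS)
    return {
        "NamedColumn%d" % (i + 1): 1
        for i in range(NUM_FLAGS)
        if bitmap & (1 << i)
    }


if __name__ == "__main__":
    # Bits 0 and 1 set: only NamedColumn1 and NamedColumn2 would be written.
    print(bitmap_to_named_columns(0b11))
```

The resulting dictionary would then be passed to whatever client insert call is in use; that step is left out here because the thread does not say which client library is involved.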