Re: Secondary Indexes

aaron morton Tue, 19 Mar 2013 21:34:34 -0700

> Assuming we have 1000 columns in 1 row of the column family and about 900 of
> them have NamedColumn1=1, and of those 900 only 10 also have NamedColumn2=1.
I am assuming you mean 1,000 rows, not columns.


> does Cassandra optimize this in any way, fetching only the 10 rather than
> fetching the 900 and then filtering out the 10 I really need?
The most selective term is used first. All the rows that match that term are
read (in batches) and then filtered against the remaining terms.
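A toy model of the strategy described above: pick the most selective term, read every row that matches it in batches, then filter on the remaining terms. The indexes here are plain dicts and "selectivity" is an exact per-value count; both are simplifying assumptions (a real server works from estimates), and none of these functions are Cassandra APIs.

```python
# Toy model of an AND-ed secondary index query. This is NOT Cassandra's code:
# each index is assumed to be a dict {value: set(row_keys)}, and exact
# per-value counts stand in for the server's selectivity estimates.
from typing import Dict, Iterator, List, Set, Tuple

Row = Dict[str, int]
Index = Dict[int, Set[str]]  # indexed value -> keys of rows holding it


def build_indexes(rows: Dict[str, Row], columns: List[str]) -> Dict[str, Index]:
    """Build one index per column (cf. one hidden index CF per indexed column)."""
    indexes: Dict[str, Index] = {col: {} for col in columns}
    for key, row in rows.items():
        for col in columns:
            if col in row:  # rows without the column get no index entry
                indexes[col].setdefault(row[col], set()).add(key)
    return indexes


def batched(keys: List[str], size: int) -> Iterator[List[str]]:
    """Yield keys in fixed-size batches (a real scan pages through the index)."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


def query(rows: Dict[str, Row], indexes: Dict[str, Index],
          clauses: List[Tuple[str, int]],
          batch_size: int = 100) -> Tuple[List[str], int]:
    """Return (matching keys, number of rows read) for equality clauses."""
    def hits(clause: Tuple[str, int]) -> int:
        col, val = clause
        return len(indexes[col].get(val, set()))

    # 1. The most selective clause (fewest index entries) drives the scan.
    best = min(range(len(clauses)), key=lambda i: hits(clauses[i]))
    col, val = clauses[best]
    others = clauses[:best] + clauses[best + 1:]

    matches: List[str] = []
    rows_read = 0
    # 2. Read every candidate row in batches, 3. filter on the other clauses.
    for batch in batched(sorted(indexes[col].get(val, set())), batch_size):
        for key in batch:
            rows_read += 1
            if all(rows[key].get(c) == v for c, v in others):
                matches.append(key)
    return matches, rows_read


# 1,000 rows: 900 have NamedColumn1=1, and only 10 have NamedColumn2=1.
rows = {f"row{i}": {"NamedColumn1": int(i < 900), "NamedColumn2": int(i < 10)}
        for i in range(1000)}
indexes = build_indexes(rows, ["NamedColumn1", "NamedColumn2"])
matches, rows_read = query(rows, indexes,
                           [("NamedColumn1", 1), ("NamedColumn2", 1)])
print(len(matches), rows_read)  # 10 10
```

With these numbers only the 10 candidate rows from the NamedColumn2 index are read. Had the NamedColumn1 term driven the scan, all 900 rows would have been read and 890 of them discarded by the filter.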

> How does this affect the internal index column family and how often will it
> get compacted?
The index CF is compacted when it needs it, independently of when the parent
CF is compacted.
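A hypothetical model of the write fan-out behind Mayank's test result: each insert writes the base CF plus one index CF per indexed column actually present in that insert, so index CFs for columns that are never written receive nothing. The column names and the "<cf>.<column>_idx" naming are illustrative assumptions, not Cassandra's internal names, and the sketch only counts writes; how often each CF is then compacted is decided per CF by its own compaction strategy.

```python
# Hypothetical model of write fan-out to index column families. Column names
# (flag_0..flag_17) and the "<cf>.<column>_idx" CF names are assumptions.
from collections import Counter
from typing import Dict, Iterable, List

INDEXED_COLUMNS = {f"flag_{n}" for n in range(18)}  # 18 indexed columns


def cfs_written(base_cf: str, columns: Dict[str, int]) -> List[str]:
    """CFs touched by one insert: the base CF plus one index CF per indexed
    column that is actually present in the insert."""
    targets = [base_cf]
    targets += [f"{base_cf}.{col}_idx" for col in columns if col in INDEXED_COLUMNS]
    return targets


def tally_writes(base_cf: str, inserts: Iterable[Dict[str, int]]) -> Counter:
    """Count how many writes each CF receives over a series of inserts."""
    counts: Counter = Counter()
    for columns in inserts:
        counts.update(cfs_written(base_cf, columns))
    return counts


# 100 inserts that each set only 10 of the 18 indexed columns.
inserts = [{f"flag_{n}": 1 for n in range(10)} for _ in range(100)]
counts = tally_writes("users", inserts)
print(counts["users"], counts["users.flag_0_idx"], counts["users.flag_17_idx"])
# 100 100 0
```

Each insert here costs 11 writes (1 base + 10 index), and the 8 unused index CFs stay empty, which is why the base CF and the index CFs end up with different sizes.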

> About expirations: if the column in the user defined column family expires,
> can I safely assume that its related indexes will also expire?
Yes

> 
> http://www.datastax.com/docs/1.1/ddl/indexes
> Maintaining Secondary Indexes
I've emailed to ask for clarification about that page.

Cheers


-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 20/03/2013, at 5:08 AM, Mayank <4may...@gmail.com> wrote:

> Thanks guys. I am working with Andy on this project.
> 
> Further questions on the secondary indexes:
> Assuming we have 1000 columns in 1 row of the column family and about 900 of
> them have NamedColumn1=1, and of those 900 only 10 also have NamedColumn2=1:
> if I query for columns which have NamedColumn1=1 && NamedColumn2=1, does
> Cassandra optimize this in any way, fetching only the 10 rather than fetching
> the 900 and then filtering out the 10 I really need?
> 
> 
> We ran some tests that show that if only 10 of the 18 NamedColumns are
> inserted, only those 10 internal index column families are updated. In this
> case the sizes of the user defined column family and the internal index
> column families will differ. We would then expect the user defined column
> family to be compacted, and its tombstones removed, more often. How does this
> affect the internal index column family, and how often will it get compacted?
> 
> 
> About expirations: if the column in the user defined column family expires,
> can I safely assume that its related indexes will also expire?
> 
> 
> Here is the link that talks about the rebuild:
> 
> http://www.datastax.com/docs/1.1/ddl/indexes
> Maintaining Secondary Indexes
> 
> When a node starts up, Cassandra rebuilds the secondary index of the stored 
> rows. To perform a hot rebuild of a secondary index, use the nodetool utility 
> rebuild_index command.
> 
> 
> We'll look further into Solr, but at the moment it may not fit our needs or
> schedule.
> 
> > - Will that result in Cassandra creating 18 new column families,
> > one for each index?
> Inserts will be slower, as each insert can result in up to 18 additional
> inserts (one per index). This is the same as in an RDBMS: more indexes ==
> more insert work.
> 
> > - If a given column is not specified in any rows, will Cassandra
> > still create an index column family?
> Yes
> 
> > - The documentation says that indexes are rebuilt with every
> > Cassandra restart. Why is that needed? What does the rebuild do? Does it
> > read the whole column family into memory at once?
> That is not correct. Do you have a link to the docs?
> 
> As Moshe said, standard Cassandra is not a great fit for faceting. Consider
> Solr or DataStax Enterprise:
> http://3.datastax.com/datastax-enterprise.php
> 
> 
> Cheers
>   
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> 
> http://www.thelastpickle.com
> 
> 
> On 17/03/2013, at 10:17 PM, moshe.kr...@barclays.com wrote:
> 
> > I do not think this is a good use case for Cassandra alone, assuming the 
> > queries can be any combination of the 18 columns.
> > I would consider using some combination of Cassandra and Solr, where Solr 
> > provides the indexing/search, and Cassandra provides the bulk store.
> >  
> > From: Andy Stec [mailto:andys...@gmail.com]
> > Sent: Saturday, March 16, 2013 12:10 AM
> > To: user@cassandra.apache.org
> > Subject: Secondary Indexes
> >  
> > We need to provide search capability based on a field that is a bitmap
> > combination of 18 possible values. We want to use secondary indexes to
> > improve performance. One possible solution is to create a named column for
> > each value and have a secondary index for each of the 18 columns.
> > Questions we have are:
> > 
> > 
> > - Will that result in Cassandra creating 18 new column families,
> > one for each index?
> > 
> > - If a given column is not specified in any rows, will Cassandra
> > still create an index column family?
> > 
> > - The documentation says that indexes are rebuilt with every
> > Cassandra restart. Why is that needed? What does the rebuild do? Does it
> > read the whole column family into memory at once?
> > 
> 
>
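The layout proposed in the original question (one named column per bitmap value, each with its own secondary index) can be sketched as follows. The column names flag_0..flag_17 and both helpers are hypothetical; Cassandra itself only ever sees the resulting named columns, never the bitmap.

```python
# Sketch of the "one named column per bitmap value" layout. Column names
# flag_0..flag_17 and both helpers are hypothetical illustrations.
from typing import Dict, List, Tuple

NUM_FLAGS = 18


def bitmap_to_columns(mask: int) -> Dict[str, int]:
    """Expand an 18-bit mask into one column (value 1) per set bit.
    Unset bits produce no column, and therefore no index entry."""
    if not 0 <= mask < (1 << NUM_FLAGS):
        raise ValueError("mask must fit in 18 bits")
    return {f"flag_{n}": 1 for n in range(NUM_FLAGS) if (mask >> n) & 1}


def query_clauses(required: int) -> List[Tuple[str, int]]:
    """Equality clauses meaning 'every bit in `required` is set'."""
    return list(bitmap_to_columns(required).items())


row = bitmap_to_columns(0b100101)  # bits 0, 2 and 5 set
print(row)                         # {'flag_0': 1, 'flag_2': 1, 'flag_5': 1}
print(len(row))                    # 3 index CFs updated by this insert
print(query_clauses(0b100001))     # [('flag_0', 1), ('flag_5', 1)]
```

A query for "bits 0 and 5 set" becomes two equality clauses, and the number of index updates per insert equals the number of set bits. With this layout a "bit not set" condition has no index entry to look up, which is one reason a search engine such as Solr was suggested for arbitrary combinations.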
