Re: Secondary Indexes

aaron morton Thu, 21 Mar 2013 10:02:03 -0700

> When I query for user_id = "user1" and order_attr1 = 1991 I want to get the 
> order_num. Is this possible without super columns?
If you only have a few hundred columns you can read them all back and filter 
client side.
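
A minimal sketch of that client-side approach, in Python. It assumes the row's columns have already been read into a dict by whatever client is in use. It also assumes a hypothetical column naming scheme in which each attribute column carries the order number it belongs to (for example "Order100:attr1"). Neither the naming scheme nor the helper name comes from the schema in this thread.

```python
def orders_matching(columns, attr_name, attr_value):
    """Return order numbers whose attribute column equals attr_value.

    columns: dict of column name -> value for ONE row (one user_id),
    as already fetched from the cluster by whatever client is in use.
    Assumed (hypothetical) naming: "<order_num>:<attr_name>".
    """
    matches = []
    for name, value in columns.items():
        order_num, sep, attr = name.partition(":")
        # Skip plain order columns such as "Order100" (no separator).
        if sep and attr == attr_name and value == attr_value:
            matches.append(order_num)
    return sorted(matches)


# Stand-in for the columns of row "user1"; real data would come from a read.
user1_columns = {
    "Order100": "Order100Info",
    "Order100:attr1": 1991,
    "Order101": "Order101Info",
    "Order101:attr1": 2015,
    "Order102": "Order102Info",
    "Order102:attr1": 1991,
}

print(orders_matching(user1_columns, "attr1", 1991))
```

With a few hundred columns per row the whole row is read in one request and the loop above is cheap. The point of the naming assumption is that the link between an order and its attribute has to be encoded somewhere the client can see it.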


Secondary indexes are used when you do not know the row you want to get back. 
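
As a rough illustration of how a query on several indexed columns proceeds (per the description quoted further down: the most selective term is used first, then the matching rows are read in batches and filtered), here is a toy in-memory simulation in Python. It is not Cassandra code. The data structures, the batch_size parameter and the function name are invented for the sketch, and how Cassandra actually estimates selectivity is not modelled.

```python
def indexed_query(rows, indexes, predicates, batch_size=100):
    """Toy model of a multi-predicate secondary index query.

    rows:       dict row_key -> dict of column -> value
    indexes:    dict column -> dict value -> list of row keys
    predicates: dict column -> required value (all must match)
    """
    # Pick the predicate whose index entry lists the fewest row keys.
    def candidate_count(col):
        return len(indexes[col].get(predicates[col], []))

    driving_col = min(predicates, key=candidate_count)
    candidates = indexes[driving_col].get(predicates[driving_col], [])

    results = []
    rows_read = 0
    # Read candidate rows in batches and filter on the remaining predicates.
    for start in range(0, len(candidates), batch_size):
        for key in candidates[start:start + batch_size]:
            rows_read += 1
            row = rows[key]
            if all(row.get(c) == v for c, v in predicates.items()):
                results.append(key)
    return results, rows_read


# 1000 rows: 900 have NamedColumn1=1, and 10 of those also have NamedColumn2=1.
rows = {}
for i in range(1000):
    row = {}
    if i < 900:
        row["NamedColumn1"] = 1
    if i < 10:
        row["NamedColumn2"] = 1
    rows["row%d" % i] = row

# Build the toy indexes: column -> value -> row keys.
indexes = {"NamedColumn1": {}, "NamedColumn2": {}}
for key, row in rows.items():
    for col, value in row.items():
        indexes[col].setdefault(value, []).append(key)

matches, rows_read = indexed_query(
    rows, indexes, {"NamedColumn1": 1, "NamedColumn2": 1})
print(len(matches), rows_read)
```

In this toy model the query is driven by the NamedColumn2 index, so only the 10 candidate rows are read rather than the 900 that match NamedColumn1.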

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 21/03/2013, at 5:27 AM, Mayank <4may...@gmail.com> wrote:

> Hi Aaron,
> 
> I did mean 1000 columns. But I see your point.
> The current CF schema has user_id as the row key and unnamed column order_num 
> = order info as the col-val pair. The plan is to add named columns 
> order_attr1, order_attr2... order_attr18. When I query for user_id = "user1" 
> and order_attr1 = 1991 I want to get the order_num. Is this possible without 
> super columns? Can I make Cassandra link the order_num to order_attr1 in any 
> way?
> 
> An Example:
> RowKey: user1
> => (column=Order100, value=Order100Info, timestamp=1363789981864000)
> => (column=Order101, value=Order101Info, timestamp=1363789985382000)
> => (column=Order102, value=Order102Info, timestamp=1363789990118000)
> => (column=OrderAttr1, value=1991, timestamp=1363790016318000)
> => (column=OrderAttr2, value=2015, timestamp=1363789936073000)
> => (column=OrderAttr3, value=4090, timestamp=1363789946805000)
> 
> Thanks.
>   
> 
> 
> On Tue, Mar 19, 2013 at 11:34 PM, aaron morton <aa...@thelastpickle.com> 
> wrote:
> > Assuming we have 1000 columns in 1 row of the column family and about 900
> > of them have NamedColumn1=1 and of those 900 only 10 of them also have
> > NamedColumn2=1.
> Am assuming you mean 1,000 rows not columns.
> 
> > does Cassandra
> > optimize this in any way by fetching only the 10 versus the 900 and then
> > filtering out the 10 I really need?
> The most selective term is used first. All the rows that match that term are 
> read (in batches) and then filtered.
> 
> >   How does this affect the internal index column
> > family and how often will it get compacted?
> It is compacted when it needs it, not when the parent CF is compacted.
> 
> > About expirations: if the column in the user defined column family expires
> > can I safely assume that its related indexes will also expire?
> Yes
> 
> >
> > http://www.datastax.com/docs/1.1/ddl/indexes
> > Maintaining Secondary Indexes
> Emailed to ask for clarification.
> 
> Cheers
> 
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 20/03/2013, at 5:08 AM, Mayank <4may...@gmail.com> wrote:
> 
> > Thanks guys. I am working with Andy on this project.
> >
> > Further questions on the secondary indexes:
> > Assuming we have 1000 columns in 1 row of the column family and about 900
> > of them have NamedColumn1=1 and of those 900 only 10 of them also have
> > NamedColumn2=1. If I query for columns which have NamedColumn1=1 &&
> > NamedColumn2=1 does Cassandra optimize this in any way by fetching only
> > the 10 versus the 900 and then filtering out the 10 I really need?
> >
> >
> > We ran some tests that show if only 10 of the 18 NamedColumns were inserted
> > only those 10 internal index column families are updated. In this case the
> > size of the user defined column families and the internal index column
> > families will differ. It is then expected that the user defined column
> > family will get compacted more often and tombstones removed more often.
> > How does this affect the internal index column family and how often will
> > it get compacted?
> >
> >
> > About expirations: if the column in the user defined column family expires
> > can I safely assume that its related indexes will also expire?
> >
> >
> > Here is the link that talks about the rebuild:
> >
> > http://www.datastax.com/docs/1.1/ddl/indexes
> > Maintaining Secondary Indexes
> >
> > When a node starts up, Cassandra rebuilds the secondary index of the stored 
> > rows. To perform a hot rebuild of a secondary index, use the nodetool 
> > utility rebuild_index command.
> >
> >
> > We'll look further into Solr but at the moment it may not fit our 
> > need/schedule.
> >
> > > - Will that result in Cassandra creating 18 new column families,
> > > one for each index?
> > Inserts will be slower, as each insert will potentially result in 18
> > additional inserts. This is just the same as an RDBMS: more indexes ==
> > more insert work.
> >
> > > - If a given column is not specified in any rows, will Cassandra
> > > still create an index column family?
> > Yes
> >
> > > - The documentation says that indexes are rebuilt with every
> > > Cassandra restart. Why is that needed? What does the rebuild do? Does it
> > > read the whole column family into memory at once?
> > That is not correct. Do you have a link for the docs?
> >
> > As Moshe said, standard Cassandra is not a great fit for faceting. Consider
> > Solr or DataStax:
> > http://3.datastax.com/datastax-enterprise.php
> >
> >
> > Cheers
> >
> > -----------------
> > Aaron Morton
> > Freelance Cassandra Consultant
> > New Zealand
> >
> > @aaronmorton
> >
> > http://www.thelastpickle.com
> >
> >
> > On 17/03/2013, at 10:17 PM, moshe.kr...@barclays.com wrote:
> >
> > > I do not think this is a good use case for Cassandra alone, assuming the
> > > queries can be any combination of the 18 columns.
> > > I would consider using some combination of Cassandra and Solr, where Solr
> > > provides the indexing/search, and Cassandra provides the bulk store.
> > >
> > > From: Andy Stec [mailto:andys...@gmail.com]
> > > Sent: Saturday, March 16, 2013 12:10 AM
> > > To: user@cassandra.apache.org
> > > Subject: Secondary Indexes
> > >
> > > We need to provide search capability based on a field that is a bitmap
> > > combination of 18 possible values. We want to use secondary indexes to
> > > improve performance. One possible solution is to create a named column for
> > > each value and have a secondary index for each of the 18 columns.
> > > Questions we have are:
> > >
> > >
> > > - Will that result in Cassandra creating 18 new column families,
> > > one for each index?
> > >
> > > - If a given column is not specified in any rows, will Cassandra
> > > still create an index column family?
> > >
> > > - The documentation says that indexes are rebuilt with every
> > > Cassandra restart. Why is that needed? What does the rebuild do? Does it
> > > read the whole column family into memory at once?
> > >
> > >
> > >
> >
> >
> 
>
