> When I query for user_id = "user1" and order_attr1 = 1991 I want to get the
> order_num. Is this possible without super columns?

If you only have a few hundred columns you can read them all back and filter client side.
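A minimal sketch of that client-side filtering, using a plain dict as a stand-in for the columns read back from one row. The layout is an assumption made only for illustration: it supposes each order column's value carries its own attributes, which is not how the example row in the quoted message is laid out. `filter_orders` is a hypothetical helper, not part of any Cassandra client.

```python
# Hypothetical in-memory stand-in for the columns of row "user1".
# Assumption: each order column's value holds that order's attributes,
# so an order can be matched without a secondary index or super columns.
row = {
    "Order100": {"order_attr1": 1991, "order_attr2": 2015},
    "Order101": {"order_attr1": 1987, "order_attr2": 2015},
    "Order102": {"order_attr1": 1991, "order_attr2": 4090},
}


def filter_orders(columns, **wanted):
    """Return the names of the columns whose attributes match every wanted value."""
    return sorted(
        name
        for name, attrs in columns.items()
        if all(attrs.get(key) == value for key, value in wanted.items())
    )


# Read the whole row back, then filter on the client.
matching_orders = filter_orders(row, order_attr1=1991)
print(matching_orders)
```

With a few hundred columns per row the whole row fits comfortably in one read, so the filter costs little compared with maintaining an index.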
Secondary indexes are used when you do not know the row you want to get back.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 21/03/2013, at 5:27 AM, Mayank <4may...@gmail.com> wrote:

> Hi Aaron,
>
> I did mean 1000 columns. But I see your point.
> The current CF schema has user_id as the row key and unnamed column order_num
> = order info as the col-val pair. The plan is to add named columns
> order_attr1, order_attr2... order_attr18. When I query for user_id = "user1"
> and order_attr1 = 1991 I want to get the order_num. Is this possible without
> super columns? Can I make Cassandra link the order_num to order_attr1 in any
> way?
>
> An Example:
> RowKey: user1
> => (column=Order100, value=Order100Info, timestamp=1363789981864000)
> => (column=Order101, value=Order101Info, timestamp=1363789985382000)
> => (column=Order102, value=Order102Info, timestamp=1363789990118000)
> => (column=OrderAttr1, value=1991, timestamp=1363790016318000)
> => (column=OrderAttr2, value=2015, timestamp=1363789936073000)
> => (column=OrderAttr3, value=4090, timestamp=1363789946805000)
>
> Thanks.
>
> On Tue, Mar 19, 2013 at 11:34 PM, aaron morton <aa...@thelastpickle.com>
> wrote:
> > Assuming we have 1000 columns in 1 row of the column family and about 900
> > of them have NamedColumn1=1 and of those 900 only 10 of them also have
> > NamedColumn2=1.
> I am assuming you mean 1,000 rows, not columns.
>
> > does Cassandra optimize this in any way by fetching only the 10 versus
> > the 900 and then filtering out the 10 I really need?
> The most selective term is used first. All the rows that match that term are
> read (in batches) and then filtered.
>
> > How does this affect the internal index column family and how often will
> > it get compacted?
> It is compacted when it needs it, not when the parent CF is compacted.
>
> > About expirations: if the column in the user defined column family expires
> > can I safely assume that its related indexes will also expire?
> Yes
>
> > http://www.datastax.com/docs/1.1/ddl/indexes
> > Maintaining Secondary Indexes
> Emailed to ask for clarification.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 20/03/2013, at 5:08 AM, Mayank <4may...@gmail.com> wrote:
>
> > Thanks guys. I am working with Andy on this project.
> >
> > Further questions on the secondary indexes:
> > Assuming we have 1000 columns in 1 row of the column family and about 900
> > of them have NamedColumn1=1 and of those 900 only 10 of them also have
> > NamedColumn2=1. If I query for columns which have NamedColumn1=1 &&
> > NamedColumn2=1 does Cassandra optimize this in any way by fetching only
> > the 10 versus the 900 and then filtering out the 10 I really need?
> >
> > We ran some tests that show if only 10 of the 18 NamedColumns were inserted
> > only those 10 internal index column families are updated. In this case the
> > size of the user defined column families and the internal index column
> > families will differ. It is then expected that the user defined column
> > family will get compacted more often and tombstones removed more often.
> > How does this affect the internal index column family and how often will
> > it get compacted?
> >
> > About expirations: if the column in the user defined column family expires
> > can I safely assume that its related indexes will also expire?
> >
> > Here is the link that talks about the rebuild:
> >
> > http://www.datastax.com/docs/1.1/ddl/indexes
> > Maintaining Secondary Indexes
> >
> > When a node starts up, Cassandra rebuilds the secondary index of the stored rows.
> > To perform a hot rebuild of a secondary index, use the nodetool utility
> > rebuild_index command.
> >
> > We'll look further into Solr but at the moment it may not fit our
> > need/schedule.
> >
> > > - Will that result in Cassandra creating 18 new column families,
> > > one for each index?
> > Inserts will be slower, as each insert will potentially result in 18
> > additional inserts. This is just the same as a RDBMS, more indexes == more
> > insert work.
> >
> > > - If a given column is not specified in any rows, will Cassandra
> > > still create an index column family?
> > Yes
> >
> > > - The documentation says that indexes are rebuilt with every
> > > Cassandra restart. Why is that needed? What does the rebuild do? Does it
> > > read the whole column family into memory at once?
> > That is not correct, do you have a link for the docs?
> >
> > As Moshe said, standard Cassandra is not a great fit for faceting. Consider
> > Solr or DataStax
> > http://3.datastax.com/datastax-enterprise.php
> >
> > Cheers
> >
> > -----------------
> > Aaron Morton
> > Freelance Cassandra Consultant
> > New Zealand
> >
> > @aaronmorton
> > http://www.thelastpickle.com
> >
> > On 17/03/2013, at 10:17 PM, moshe.kr...@barclays.com wrote:
> >
> > > I do not think this is a good use case for Cassandra alone, assuming the
> > > queries can be any combination of the 18 columns.
> > > I would consider using some combination of Cassandra and Solr, where Solr
> > > provides the indexing/search, and Cassandra provides the bulk store.
> > >
> > > From: Andy Stec [mailto:andys...@gmail.com]
> > > Sent: Saturday, March 16, 2013 12:10 AM
> > > To: user@cassandra.apache.org
> > > Subject: Secondary Indexes
> > >
> > > We need to provide search capability based on a field that is a bitmap
> > > combination of 18 possible values. We want to use secondary indexes to
> > > improve performance.
> > > One possible solution is to create a named column for each value and
> > > have a secondary index for each of the 18 columns.
> > > Questions we have are:
> > >
> > > - Will that result in Cassandra creating 18 new column families,
> > > one for each index?
> > >
> > > - If a given column is not specified in any rows, will Cassandra
> > > still create an index column family?
> > >
> > > - The documentation says that indexes are rebuilt with every
> > > Cassandra restart. Why is that needed? What does the rebuild do? Does it
> > > read the whole column family into memory at once?
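
As a rough illustration of the "most selective term is used first" behaviour described earlier in the thread, here is a toy in-memory model. It is not Cassandra's implementation: the index is a plain dict, selectivity is taken from exact entry counts (an assumption), batching is omitted, and `build_indexes` and `indexed_query` are hypothetical names. The data mirrors the numbers in the question: 1,000 rows, 900 with NamedColumn1=1, and 10 of those also with NamedColumn2=1.

```python
from collections import defaultdict


def build_indexes(rows, indexed_columns):
    """Map (column, value) to the set of row keys holding that value."""
    indexes = defaultdict(set)
    for key, columns in rows.items():
        for column in indexed_columns:
            if column in columns:
                indexes[(column, columns[column])].add(key)
    return indexes


def indexed_query(rows, indexes, predicates):
    """Scan the most selective indexed term, then filter on the remaining terms."""

    def candidates(column):
        return indexes.get((column, predicates[column]), set())

    # The term with the fewest index entries is the most selective one.
    first_term = min(predicates, key=lambda column: len(candidates(column)))
    scanned = candidates(first_term)
    matches = sorted(
        key
        for key in scanned
        if all(rows[key].get(c) == v for c, v in predicates.items())
    )
    return first_term, len(scanned), matches


# 1,000 rows: 900 have NamedColumn1=1, and 10 of those also have NamedColumn2=1.
rows = {}
for i in range(1000):
    columns = {}
    if i < 900:
        columns["NamedColumn1"] = 1
    if i < 10:
        columns["NamedColumn2"] = 1
    rows[f"row{i:04d}"] = columns

indexes = build_indexes(rows, ["NamedColumn1", "NamedColumn2"])
first_term, rows_read, matches = indexed_query(
    rows, indexes, {"NamedColumn1": 1, "NamedColumn2": 1}
)
print(first_term, rows_read, len(matches))
```

In this model only the 10 rows behind the NamedColumn2 entry are read and then checked against NamedColumn1, rather than the 900 rows behind the less selective term.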