Thanks guys. I am working with Andy on this project.

Further questions on the secondary indexes:

Assuming we have 1000 columns in 1 row of the column family, about 900 of
them have NamedColumn1=1, and of those 900 only 10 also have NamedColumn2=1.
If I query for columns which have NamedColumn1=1 && NamedColumn2=1, does
Cassandra optimize this in any way by fetching only the 10, rather than
fetching the 900 and then filtering down to the 10 I really need?
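
To make the question concrete, here is a toy Python sketch of the two
strategies we are asking about. It is only a model of the question, not of how
Cassandra actually executes the query; the function names and the in-memory
dict layout are made up for illustration.

```python
# Toy model of the question only; this is not Cassandra's query path.
# All names below (build_entries, build_index, ...) are hypothetical.

def build_entries():
    """1000 entries: 900 have NamedColumn1=1, 10 of those also have NamedColumn2=1."""
    return {
        i: {"NamedColumn1": 1 if i < 900 else 0,
            "NamedColumn2": 1 if i < 10 else 0}
        for i in range(1000)
    }

def build_index(entries, column):
    """Map each value of `column` to the set of entry ids holding that value."""
    index = {}
    for key, cols in entries.items():
        index.setdefault(cols[column], set()).add(key)
    return index

def fetch_then_filter(entries, index1):
    """Strategy A: read every NamedColumn1=1 candidate, then filter on NamedColumn2."""
    candidates = index1.get(1, set())
    matches = {k for k in candidates if entries[k]["NamedColumn2"] == 1}
    return matches, len(candidates)

def most_selective_first(entries, index1, index2):
    """Strategy B: start from the smaller candidate set, then filter on the other column."""
    by_column = {"NamedColumn1": index1.get(1, set()),
                 "NamedColumn2": index2.get(1, set())}
    start = min(by_column, key=lambda c: len(by_column[c]))
    other = "NamedColumn2" if start == "NamedColumn1" else "NamedColumn1"
    candidates = by_column[start]
    matches = {k for k in candidates if entries[k][other] == 1}
    return matches, len(candidates)

entries = build_entries()
index1 = build_index(entries, "NamedColumn1")
index2 = build_index(entries, "NamedColumn2")
matches_a, read_a = fetch_then_filter(entries, index1)
matches_b, read_b = most_selective_first(entries, index1, index2)
print(len(matches_a), read_a)
print(len(matches_b), read_b)
```

In this model both strategies return the same 10 matches, but the first reads
900 candidates and the second reads only 10. What we would like to know is
which of these is closer to what Cassandra does.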


We ran some tests that show if only 10 of the 18 NamedColumns were inserted
only those 10 internal index column families are updated. In this case the
size of the user defined column families and the internal index column
families will differ. It is then expected that the user defined column family
will get compacted more often and tombstones removed more often. How does this
affect the internal index column family and how often will it get compacted?
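
The test result above can be pictured with a small Python sketch. This is a
toy model of what we observed, not Cassandra code; the column names and the
helper function are hypothetical.

```python
# Toy model of the observed behaviour: one index update per indexed column
# that is actually present in the insert. Names are hypothetical.
INDEXED_COLUMNS = ["NamedColumn%d" % i for i in range(1, 19)]  # 18 indexed columns

def index_updates_for(insert):
    """Return the indexed columns an insert touches (one index update each)."""
    return [c for c in INDEXED_COLUMNS if c in insert]

# An insert that sets only 10 of the 18 NamedColumns.
partial_insert = {"NamedColumn%d" % i: 1 for i in range(1, 11)}
touched = index_updates_for(partial_insert)
print(len(touched), "of", len(INDEXED_COLUMNS), "index column families updated")
```

Under this model the other 8 index column families receive no writes for that
insert, which is why we expect their sizes to diverge from the user defined
column family.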

About expirations: if a column in the user defined column family expires, can
I safely assume that its related indexes will also expire?


Here is the link that talks about the rebuild:
http://www.datastax.com/docs/1.1/ddl/indexes

Maintaining Secondary Indexes

When a node starts up, Cassandra rebuilds the secondary index of the stored
rows. To perform a hot rebuild of a secondary index, use the nodetool
utility rebuild_index command.


We'll look further into Solr but at the moment it may not fit our
need/schedule.


> - Will that result in Cassandra creating 18 new column families,
> one for each index?
Inserts will be slower, as each insert will potentially result in 18 additional
inserts. This is just the same as an RDBMS: more indexes == more insert work.

> - If a given column is not specified in any rows, will Cassandra
> still create an index column family?
Yes

> - The documentation says that indexes are rebuilt with every
> Cassandra restart. Why is that needed? What does the rebuild do? Does it
> read the whole column family into memory at once?
That is not correct. Do you have a link for the docs?

As Moshe said, standard Cassandra is not a great fit for faceting. Consider Solr
or DataStax http://3.datastax.com/datastax-enterprise.php

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/03/2013, at 10:17 PM, moshe.kr...@barclays.com wrote:

> I do not think this is a good use case for Cassandra alone, assuming the
> queries can be any combination of the 18 columns.
> I would consider using some combination of Cassandra and Solr, where Solr
> provides the indexing/search, and Cassandra provides the bulk store.
>
> From: Andy Stec [mailto:andys...@gmail.com <andys...@gmail.com>]
> Sent: Saturday, March 16, 2013 12:10 AM
> To: user@cassandra.apache.org
> Subject: Secondary Indexes
>
> We need to provide search capability based on a field that is a bitmap
> combination of 18 possible values. We want to use secondary indexes to
> improve performance. One possible solution is to create a named column for
> each value and have a secondary index for each of the 18 columns.
> Questions we have are:
>
>
> - Will that result in Cassandra creating 18 new column families,
> one for each index?
>
> - If a given column is not specified in any rows, will Cassandra
> still create an index column family?
>
> - The documentation says that indexes are rebuilt with every
> Cassandra restart. Why is that needed? What does the rebuild do? Does it
> read the whole column family into memory at once?
>
>
