RE: Secondary index or dedicated CF?

Leleu Eric Fri, 22 Aug 2014 10:47:40 -0700

Thanks you for your feedbacks.

De : Mark Reddy [mailto:[email protected]]
Envoyé : vendredi 22 août 2014 17:08
À : [email protected]
Objet : Re: Secondary index or dedicated CF?


Hi,

As a general rule of thumb I would steer clear of secondary indexes, this is 
also the official stand that DataStax take (see p5 of their best practices doc: 
http://www.datastax.com/wp-content/uploads/2014/04/WP-DataStax-Enterprise-Best-Practices.pdf).

"It is best to avoid using Cassandra's built-in secondary indexes where 
possible. Instead, it is recommended to denormalize data and manually maintain 
a dynamic table as a form of an index instead of using a secondary index. If 
and when secondary indexes are to be used, they should be created only on 
columns containing low-cardinality data (for example: fields with less than 
1000 states)."

Mark

On 22 Aug 2014, at 15:58, DuyHai Doan 
<[email protected]<mailto:[email protected]>> wrote:


Hello Eric

"Under the hood what is the difference of the both solutions?"

 1. Cassandra secondary index: distributed index, supports better high volume 
of data, the index itself is distributed so there is no bottleneck. The 
tradeoff is that depending on the cardinality of data having the same 
"bucketname+tenantID" the performance may drop sharply. Please read this: 
http://www.datastax.com/documentation/cql/3.1/cql/ddl/ddl_when_use_index_c.html?scroll=concept_ds_sgh_yzz_zj__when-no-index.
 There are several restrictions to secondary index

2. Manual index: easy to design, but potentially wide row and not well balance 
if  data having the same "bucketname+tenantID" is very large. Furthermore you 
need to manage index consistency manually so that it is synced with source data 
updates.

 The best thing to do is to benchmark both solutions and takes the approach 
giving you the best results. Be careful with benchmarks, it should be 
representative of the data pattern you likely have in production.

On Fri, Aug 22, 2014 at 7:47 AM, Leleu Eric 
<[email protected]<mailto:[email protected]>> wrote:
Hi,


I'm new with Cassandra and I wondering what is the best design for my case.

I have a set of buckets that contain one or thousands of contents.

Here is my Content CF :

    CREATE TABLE IF NOT EXISTS contents (tenantID varchar,
     key varchar,
    type varchar,
    bucket varchar,
    owner varchar,
    workspace varchar,
     public_read boolean  PRIMARY KEY ((key, tenantID), type, workspace));


To retrieve all contents that belong to a bucket, I have created an index on 
the bucket column.

    CREATE INDEX IF NOT EXISTS bucket_to_contents ON contents (bucket);

The column value "bucket" is concatenated with the tenantId (bucket = 
bucketname+tenantID) in order to avoid filtering on the tenantID on my 
application.

Is it the rights way to do or should I create another column family to link 
each content to the bucket ?

    CREATE TABLE IF NOT EXISTS bucket_to_contents (tenantID varchar,
     key varchar,
    type varchar,
    bucket varchar,
    owner varchar,
    workspace varchar,
     public_read boolean  PRIMARY KEY ((bucket, tenantID), key));

Under the hood what is the difference of the both solutions?

According to my understanding, the result will be the same. Both will have the 
rowkey equals to the "bucketname"  and the "tenantID".
Excepted that the secondary index can have a replication delay...

Can you help me on this point?

Regards,
Eric


________________________________

Ce message et les pièces jointes sont confidentiels et réservés à l'usage 
exclusif de ses destinataires. Il peut également être protégé par le secret 
professionnel. Si vous recevez ce message par erreur, merci d'en avertir 
immédiatement l'expéditeur et de le détruire. L'intégrité du message ne pouvant 
être assurée sur Internet, la responsabilité de Worldline ne pourra être 
recherchée quant au contenu de ce message. Bien que les meilleurs efforts 
soient faits pour maintenir cette transmission exempte de tout virus, 
l'expéditeur ne donne aucune garantie à cet égard et sa responsabilité ne 
saurait être recherchée pour tout dommage résultant d'un virus transmis.

This e-mail and the documents attached are confidential and intended solely for 
the addressee; it may also be privileged. If you receive this e-mail in error, 
please notify the sender immediately and destroy it. As its integrity cannot be 
secured on the Internet, the Worldline liability cannot be triggered for the 
message content. Although the sender endeavours to maintain a computer 
virus-free network, the sender does not warrant that this transmission is 
virus-free and will not be liable for any damages resulting from any virus 
transmitted.



________________________________

Ce message et les pièces jointes sont confidentiels et réservés à l'usage 
exclusif de ses destinataires. Il peut également être protégé par le secret 
professionnel. Si vous recevez ce message par erreur, merci d'en avertir 
immédiatement l'expéditeur et de le détruire. L'intégrité du message ne pouvant 
être assurée sur Internet, la responsabilité de Worldline ne pourra être 
recherchée quant au contenu de ce message. Bien que les meilleurs efforts 
soient faits pour maintenir cette transmission exempte de tout virus, 
l'expéditeur ne donne aucune garantie à cet égard et sa responsabilité ne 
saurait être recherchée pour tout dommage résultant d'un virus transmis.

This e-mail and the documents attached are confidential and intended solely for 
the addressee; it may also be privileged. If you receive this e-mail in error, 
please notify the sender immediately and destroy it. As its integrity cannot be 
secured on the Internet, the Worldline liability cannot be triggered for the 
message content. Although the sender endeavours to maintain a computer 
virus-free network, the sender does not warrant that this transmission is 
virus-free and will not be liable for any damages resulting from any virus 
transmitted.

RE: Secondary index or dedicated CF?

Reply via email to