Re: Secondary index or dedicated CF?

DuyHai Doan Fri, 22 Aug 2014 07:59:55 -0700

Hello Eric

"Under the hood what is the difference of the both solutions?"


 1. Cassandra secondary index: distributed index, supports better high
volume of data, the index itself is distributed so there is no bottleneck.
The tradeoff is that depending on the cardinality of data having the same "
bucketname+tenantID" the performance may drop sharply. Please read this:
http://www.datastax.com/documentation/cql/3.1/cql/ddl/ddl_when_use_index_c.html?scroll=concept_ds_sgh_yzz_zj__when-no-index.
There are several restrictions to secondary index

2. Manual index: easy to design, but potentially wide row and not well
balance if  data having the same "bucketname+tenantID" is very large.
Furthermore you need to manage index consistency manually so that it is
synced with source data updates.

 The best thing to do is to benchmark both solutions and takes the approach
giving you the best results. Be careful with benchmarks, it should be
representative of the data pattern you likely have in production.


On Fri, Aug 22, 2014 at 7:47 AM, Leleu Eric <eric.le...@worldline.com>
wrote:

>  Hi,
>
>
>
>
>
> I’m new with Cassandra and I wondering what is the best design for my case.
>
>
>
> I have a set of buckets that contain one or thousands of contents.
>
>
>
> Here is my Content CF :
>
>
>
>     CREATE TABLE IF NOT EXISTS contents (tenantID varchar,
>
>      key varchar,
>
>     type varchar,
>
>     bucket varchar,
>
>     owner varchar,
>
>     workspace varchar,
>
>      public_read boolean  PRIMARY KEY ((key, tenantID), type, workspace));
>
>
>
>
>
> To retrieve all contents that belong to a bucket, I have created an index
> on the bucket column.
>
>
>
>     CREATE INDEX IF NOT EXISTS bucket_to_contents ON contents (bucket);
>
>
>
> The column value “bucket” is concatenated with the tenantId (bucket =
> bucketname+tenantID) in order to avoid filtering on the tenantID on my
> application.
>
>
>
> Is it the rights way to do or should I create another column family to
> link each content to the bucket ?
>
>
>
>     CREATE TABLE IF NOT EXISTS bucket_to_contents (tenantID varchar,
>
>      key varchar,
>
>     type varchar,
>
>     bucket varchar,
>
>     owner varchar,
>
>     workspace varchar,
>
>      public_read boolean  PRIMARY KEY ((bucket, tenantID), key));
>
>
>
> Under the hood what is the difference of the both solutions?
>
>
>
> According to my understanding, the result will be the same. Both will have
> the rowkey equals to the “bucketname”  and the “tenantID”.
>
> Excepted that the secondary index can have a replication delay…
>
>
>
> Can you help me on this point?
>
>
>
> Regards,
>
> Eric
>
>
>
> ------------------------------
>
> Ce message et les pièces jointes sont confidentiels et réservés à l'usage
> exclusif de ses destinataires. Il peut également être protégé par le secret
> professionnel. Si vous recevez ce message par erreur, merci d'en avertir
> immédiatement l'expéditeur et de le détruire. L'intégrité du message ne
> pouvant être assurée sur Internet, la responsabilité de Worldline ne pourra
> être recherchée quant au contenu de ce message. Bien que les meilleurs
> efforts soient faits pour maintenir cette transmission exempte de tout
> virus, l'expéditeur ne donne aucune garantie à cet égard et sa
> responsabilité ne saurait être recherchée pour tout dommage résultant d'un
> virus transmis.
>
> This e-mail and the documents attached are confidential and intended
> solely for the addressee; it may also be privileged. If you receive this
> e-mail in error, please notify the sender immediately and destroy it. As
> its integrity cannot be secured on the Internet, the Worldline liability
> cannot be triggered for the message content. Although the sender endeavours
> to maintain a computer virus-free network, the sender does not warrant that
> this transmission is virus-free and will not be liable for any damages
> resulting from any virus transmitted.
>

Re: Secondary index or dedicated CF?

Reply via email to