You could make the tenant a column in the clustering key instead of the 
partition key. That way a single large tenant does not end up in one 
oversized partition. 
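
To make that concrete, here is a rough sketch in plain Python (no Cassandra 
driver involved) that counts rows per partition under two layouts. The tenant 
names, the hourly bucket and the row counts are all made up for illustration; 
the right bucket size would depend on your real data volume.

```python
from collections import Counter

# Hypothetical event stream of (tenant, hour_bucket) pairs.
# Tenant "big" writes 100x more rows than tenant "small".
events = [("big", hour) for hour in range(4) for _ in range(1000)]
events += [("small", hour) for hour in range(4) for _ in range(10)]


def partition_sizes(rows, partition_key):
    """Count rows per partition for a given partition-key function."""
    return Counter(partition_key(tenant, hour) for tenant, hour in rows)


# Layout A: tenant alone is the partition key, so each tenant is one partition.
by_tenant = partition_sizes(events, lambda tenant, hour: (tenant,))

# Layout B: the hour bucket is the partition key. Tenant would be the first
# clustering column, so one tenant's rows sit together inside each partition.
by_hour = partition_sizes(events, lambda tenant, hour: (hour,))

largest_a = max(by_tenant.values())
largest_b = max(by_hour.values())
print("largest partition, tenant as partition key:", largest_a)
print("largest partition, tenant as clustering column:", largest_b)
```

In layout A the largest partition keeps growing for as long as the big tenant 
keeps writing. In layout B each partition is bounded by what arrives within 
one bucket, whichever tenant it comes from.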

On 12 Jun 2017, at 07:14, Erick Ramirez <flightc...@gmail.com> wrote:

>> Given my use case is cassandra the best suited one or is there any other 
>> database which suits my requirement better?
> 
> Probably not the right forum for that question. It's like walking into a Ford 
> dealership and asking if the Mustang is the best car for you. 😄
> 
> In any case, you would choose Cassandra because you require:
> - high availability
> - very fast reads
> - no single-point-of-failure
> - no downtime
> - you have a scale problem
> - etc
> 
>> What would be best way to implement multi-tenancy?
> 
> The "best" way is whatever works for your use case, based on testing you've 
> done. As you are already aware from the example you provided, adding a 
> column as the tenant indicator could lead to large partitions, so you need 
> to be careful about how you model your data.
> 
> Some implementations completely side-step this by distributing tenants across 
> keyspaces but that may not suit your needs.
> 
>> Given that I need to query by multiple dimensions would denormalized tables 
>> work better or should I be using materialized views?
> 
> With denormalised tables, your application needs to implement the logic for 
> batching the updates together.
> 
> With materialised views, that complexity is managed for you by C* but you 
> need to be aware of the performance impact associated with it. For example, 
> with RF=3 on the base table, an MV is maintained as an additional table with 
> its own RF=3, so each write is effectively replicated 3+3 times. A second MV 
> makes that 3+3+3, and so on.
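
The replication arithmetic above can be written out as a small Python helper. 
This is only a sketch of the RF=3+3+3 reasoning: `replica_writes` is a 
hypothetical name, and it leaves out other costs such as the read the base 
replica performs before updating a view.

```python
def replica_writes(rf: int, num_views: int) -> int:
    """Replica writes per logical write: rf for the base table plus rf per view.

    Assumes every view uses the same replication factor as the base table.
    """
    if rf < 1 or num_views < 0:
        raise ValueError("rf must be >= 1 and num_views must be >= 0")
    return rf * (1 + num_views)


# Base table only, then one and two materialised views, all at RF=3.
for views in range(3):
    print(views, "view(s):", replica_writes(3, views), "replica writes")
```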
> 
>> Anything else that I need to consider based on your experiences with 
>> cassandra?
> 
> Multi-tenancy can be difficult particularly for complex use cases. Test, test 
> and test. And make sure you always correctly size your cluster with enough 
> nodes.
> 
> You need to limit the number of tables to about 200 at the most (regardless 
> of the number of keyspaces). Having too many tables puts pressure on the heap 
> of each node.
> 
> Good luck!
> 
>> On Sun, Jun 11, 2017 at 2:07 AM, Govindarajan Srinivasaraghavan 
>> <govindragh...@gmail.com> wrote:
>> Hi All,
>> 
>> Just to give some background, I'm working on a project where I need to 
>> store fast incoming time series data and have REST APIs to query and serve 
>> the data to users when needed. Each record is a single JSON document about 
>> 1 KB in size, and the data has to be purged after a specific time period 
>> (say a few weeks or months). The incoming rate would be approximately 100k 
>> messages per second, and the biggest challenge is that the data should be 
>> queryable by multiple dimensions with sorting, paging and data dump options. 
>> 
>> I started looking into database options and felt that Cassandra might be a 
>> good choice for my use case since the requirement calls for fast writes. In 
>> order to query by multiple dimensions I had to insert the same record into 
>> multiple denormalized tables (around 8 tables). Now I need to implement 
>> multi-tenancy, and having an extra column in the partition key to query by 
>> tenant will not work since some tenants will have huge amounts of data 
>> compared to the rest. My other option is to append the tenant identifier to 
>> the table names so that I can perform per-tenant queries easily. 
>> 
>> Here are my questions for which I need some help.
>> - Given my use case is cassandra the best suited one or is there any other 
>> database which suits my requirement better?
>> - What would be best way to implement multi-tenancy?
>> - Given that I need to query by multiple dimensions would denormalized 
>> tables work better or should I be using materialized views?
>> - Anything else that I need to consider based on your experiences with 
>> cassandra?
>> 
>> Thanks
> 
