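You could make the tenant a clustering column instead of putting it in the partition key. That way partition size no longer depends on how much data a single tenant has, which avoids the large partitions.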
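
To make that concrete, here is a minimal Python sketch that just counts rows per partition under the two layouts. Everything in it is an assumption for illustration: the tenant names, the row counts, and the idea of using a time bucket as the partition key (the thread does not say what the partition key would be).

```python
from collections import Counter


def make_rows(rows_per_tenant, n_buckets):
    """Build (tenant, bucket) rows, spreading each tenant's rows
    round-robin over n_buckets hypothetical time buckets."""
    return [
        (tenant, i % n_buckets)
        for tenant, count in rows_per_tenant.items()
        for i in range(count)
    ]


def partition_sizes(rows, partition_key):
    """Count rows per partition for a given partition-key function."""
    return Counter(partition_key(row) for row in rows)


# Assumed skew: one large tenant and nine small ones.
rows_per_tenant = {"big": 900, **{f"small{i}": 10 for i in range(1, 10)}}
rows = make_rows(rows_per_tenant, n_buckets=30)

# Layout A: tenant is the partition key, so one partition per tenant.
by_tenant = partition_sizes(rows, lambda row: row[0])

# Layout B: time bucket is the partition key, tenant is a clustering column.
by_bucket = partition_sizes(rows, lambda row: row[1])

print("largest partition, tenant as partition key:", max(by_tenant.values()))
print("largest partition, tenant as clustering column:", max(by_bucket.values()))
```

In layout A the largest partition grows with the largest tenant. In layout B it is bounded by whatever the partition key is, so that key still has to be chosen fine-grained enough for the overall write rate.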

On 12 Jun 2017, at 07:14, Erick Ramirez <flightc...@gmail.com> wrote:

>> Given my use case is cassandra the best suited one or is there any other
>> database which suits my requirement better?
>
> Probably not the right forum for that question. It's like walking into a Ford
> dealership and asking if the Mustang is the best car for you. 😄
>
> In any case, you would choose Cassandra because you require:
> - high availability
> - very fast reads
> - no single-point-of-failure
> - no downtime
> - you have a scale problem
> - etc
>
>> What would be best way to implement multi-tenancy?
>
> The "best" way is what works for your use case based on testing you've done.
> As you already are aware in the example you provided, adding a column as the
> tenant indicator could lead to large partitions so you need to be careful
> about how you model your data.
>
> Some implementations completely side-step this by distributing tenants across
> keyspaces but that may not suit your needs.
>
>> Given that I need to query by multiple dimensions would denormalized tables
>> work better or should I be using materialized views?
>
> With denormalised tables, your application needs to implement the logic for
> batching the updates together.
>
> With materialised views, that complexity is managed for you by C* but you
> need to be aware of the performance impact associated with it. For example
> with RF=3 on the base table, MV adds another RF=3 for an additional table so
> RF=3+3. A second MV increases RF=3+3+3 and so on.
>
>> Anything else that I need to consider based on your experiences with
>> cassandra?
>
> Multi-tenancy can be difficult particularly for complex use cases. Test, test
> and test. And make sure you always correctly size your cluster with enough
> nodes.
>
> You need to limit the number of tables to about 200 at the most (regardless
> of the number of keyspaces). Having too many tables puts pressure on the heap
> of each node.
>
> Good luck!
>
>> On Sun, Jun 11, 2017 at 2:07 AM, Govindarajan Srinivasaraghavan
>> <govindragh...@gmail.com> wrote:
>> Hi All,
>>
>> Just to give a background I'm working on a project where I need to store
>> fast incoming time series data and have REST APIs to query and serve the
>> data to users when needed. The data as such is a single JSON which is 1 KB in
>> size and the data has to be purged after a specific time period (say a few
>> weeks or months). The incoming rate would be approximately 100k messages per
>> second and the biggest challenge is the data should be query-able by
>> multiple dimensions with sorting, paging and data dump options.
>>
>> I started looking into database options and felt like Cassandra might be a
>> good choice for my use case since the requirement needs faster writes. In
>> order to query by multiple dimensions I had to insert the same record into
>> multiple denormalized tables (around 8 tables). Now I need to implement
>> multitenancy and having an extra column in the partition key to query by
>> tenant will not work since there will be some tenants with huge amounts of
>> data compared to the rest. My other option is to have the tenant identifier
>> appended to the table names so that I can perform per-tenant queries easily.
>>
>> Here are my questions for which I need some help.
>> - Given my use case is Cassandra the best suited one or is there any other
>> database which suits my requirement better?
>> - What would be best way to implement multi-tenancy?
>> - Given that I need to query by multiple dimensions would denormalized
>> tables work better or should I be using materialized views?
>> - Anything else that I need to consider based on your experiences with
>> Cassandra?
>>
>> Thanks
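
For reference, the numbers quoted in the thread above can be worked through with a small Python sketch. The function names are hypothetical, and the figures are only the ones stated in the messages (RF=3, 100k messages per second, 1 KB per message, around 8 tables). It ignores compression, compaction and any other overhead, so treat it as back-of-the-envelope arithmetic rather than a sizing guide.

```python
def total_replicas(rf: int, views: int) -> int:
    """Copies written per row: RF for the base table plus RF for each
    materialized view, following the RF=3+3+3 reasoning in the thread."""
    return rf * (1 + views)


def ingest_kb_per_sec(messages_per_sec: int, kb_per_message: int, tables: int) -> int:
    """Raw write volume per second when every message is written to
    `tables` denormalized tables, before replication is applied."""
    return messages_per_sec * kb_per_message * tables


# Figures taken from the thread.
print(total_replicas(rf=3, views=1))        # base table plus one view
print(total_replicas(rf=3, views=2))        # base table plus two views
print(ingest_kb_per_sec(100_000, 1, 8))     # KB per second across 8 tables
```

Serving 8 query patterns from one base table plus 7 views would, by the same reasoning, mean `total_replicas(3, 7)` copies of every row.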