Hi all,

For background: I'm working on a project where I need to store fast-incoming time series data and expose REST APIs to query and serve that data to users when needed. Each record is a single JSON document of about 1 KB, and the data has to be purged after a specific retention period (say a few weeks or months). The incoming rate would be approximately 100k messages per second. The biggest challenge is that the data must be queryable by multiple dimensions, with sorting, paging, and data dump options.
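
To give a rough sense of scale, here is a back-of-the-envelope sketch of the raw volume those numbers imply. It assumes 1 KB means 1000 bytes and uses a 28-day retention purely as an illustrative value. The function names and the `copies` parameter are hypothetical, and the figures ignore replication, compression, and storage overhead.

```python
# Rough sizing from the stated rate and record size.
# Assumptions (not measured): 1 KB = 1000 bytes, no compression, no replication.
MSG_PER_SEC = 100_000      # approximate incoming rate
MSG_BYTES = 1_000          # approximate size of one JSON record
SECONDS_PER_DAY = 86_400


def raw_bytes_per_day(msg_per_sec: int = MSG_PER_SEC, msg_bytes: int = MSG_BYTES) -> int:
    """Raw bytes ingested in one day for a single copy of each record."""
    return msg_per_sec * msg_bytes * SECONDS_PER_DAY


def retained_bytes(retention_days: int, copies: int = 1) -> int:
    """Raw bytes held at steady state; `copies` is a hypothetical multiplier
    for any duplication of each record (1 means a single copy)."""
    return raw_bytes_per_day() * retention_days * copies


if __name__ == "__main__":
    print(f"ingest rate: {MSG_PER_SEC * MSG_BYTES / 1e6:.0f} MB/s")
    print(f"per day:     {raw_bytes_per_day() / 1e12:.2f} TB")
    print(f"28 days:     {retained_bytes(28) / 1e12:.2f} TB")
```

So a single copy of the data is on the order of 100 MB/s of ingest and several TB per day before any duplication or replication is counted.
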
I started looking into database options and felt that Cassandra might be a good choice, since my use case needs fast writes. To query by multiple dimensions, I had to insert the same record into multiple denormalized tables (around 8). Now I need to implement multi-tenancy. Adding an extra column to the partition key to query by tenant will not work, since some tenants will have huge amounts of data compared to the rest. My other option is to append the tenant identifier to the table names so that I can run per-tenant queries easily.

Here are the questions I need help with:

- Given my use case, is Cassandra the best fit, or is there another database that suits my requirements better?
- What would be the best way to implement multi-tenancy?
- Given that I need to query by multiple dimensions, would denormalized tables work better, or should I be using materialized views?
- Is there anything else I need to consider, based on your experience with Cassandra?

Thanks