Hi All,

For background, I'm working on a project that needs to store fast incoming
time series data and expose REST APIs to query it and serve it to users on
demand. Each record is a single JSON document of about 1 KB, and the data
has to be purged after a fixed retention period (say, a few weeks or
months). The incoming rate will be approximately 100k messages per second.
The biggest challenge is that the data must be queryable by multiple
dimensions, with sorting, paging and data dump options.
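
To put rough numbers on that, here is a back-of-the-envelope sketch of the
raw ingest volume. The message rate and size come from the description
above; the 30-day retention is only an assumption standing in for "a few
weeks or months", and the figures ignore replication, compaction overhead
and any denormalized copies.

```python
# Rough sizing sketch. Only the rate and message size come from the
# description above; RETENTION_DAYS is an assumed placeholder value.
MESSAGES_PER_SECOND = 100_000   # stated ingest rate
MESSAGE_SIZE_BYTES = 1_000      # "1 KB" JSON, taken as 1000 bytes
RETENTION_DAYS = 30             # assumption: actual retention is undecided
SECONDS_PER_DAY = 86_400

# Raw payload volume, before replication or denormalization.
bytes_per_second = MESSAGES_PER_SECOND * MESSAGE_SIZE_BYTES
bytes_per_day = bytes_per_second * SECONDS_PER_DAY
retained_bytes = bytes_per_day * RETENTION_DAYS

print(f"ingest:   {bytes_per_second / 1e6:.0f} MB/s")
print(f"per day:  {bytes_per_day / 1e12:.2f} TB")
print(f"retained: {retained_bytes / 1e12:.1f} TB over {RETENTION_DAYS} days")
```

Under these assumptions that is about 100 MB/s, 8.64 TB per day, and
roughly 259 TB of raw payload held at any one time.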

I started looking into database options and felt that Cassandra might be a
good choice for my use case, since the requirement calls for fast writes.
To query by multiple dimensions I had to insert the same record into
multiple denormalized tables (around 8 tables). Now I need to implement
multi-tenancy. Adding the tenant as an extra column in the partition key
will not work, since some tenants will have huge amounts of data compared
to the rest. My other option is to append the tenant identifier to the
table names so that I can run per-tenant queries easily.
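
To illustrate the second option, here is a minimal sketch of how one
incoming record would fan out into per-tenant denormalized tables. The
dimension names, column names, table naming scheme and TTL value are all
hypothetical placeholders, not my actual schema, and the code only builds
CQL strings rather than talking to a cluster.

```python
import re

# Hypothetical query dimensions; the real project has around 8 of these.
DIMENSIONS = ["device", "region", "status"]

# Assumed retention of 30 days, expressed in seconds for CQL's USING TTL.
TTL_SECONDS = 30 * 86_400

# Conservative whitelist (an assumption) so that a tenant id can be
# embedded in a table name without quoting.
_SAFE_ID = re.compile(r"[a-z0-9_]+")


def table_name(tenant_id: str, dimension: str) -> str:
    """Build a per-tenant table name, e.g. 'events_by_device_acme'."""
    if not _SAFE_ID.fullmatch(tenant_id):
        raise ValueError(f"unsafe tenant id: {tenant_id!r}")
    return f"events_by_{dimension}_{tenant_id}"


def fan_out_inserts(tenant_id: str) -> list[str]:
    """Return one parameterized CQL INSERT per denormalized table."""
    return [
        f"INSERT INTO {table_name(tenant_id, dim)} "
        f"({dim}, event_time, payload) VALUES (?, ?, ?) "
        f"USING TTL {TTL_SECONDS}"
        for dim in DIMENSIONS
    ]


for statement in fan_out_inserts("acme"):
    print(statement)
```

Each incoming message becomes one write per table, so with around 8 tables
the 100k messages per second turn into roughly 800k writes per second, and
the total number of tables grows as tenants times dimensions.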

Here are the questions I need some help with:
- Given my use case, is Cassandra the best fit, or is there another
database that suits my requirements better?
- What would be the best way to implement multi-tenancy?
- Given that I need to query by multiple dimensions, would denormalized
tables work better, or should I be using materialized views?
- Is there anything else I need to consider, based on your experience with
Cassandra?

Thanks
