What’s the root cause of this many queries? Is it multi-tenancy or multiple processes?
It’s possible to logically group some of this data by using collections / sets inside a column. That works if the data has a similar structure and serves a similar query. It’s “semi-normalization”: you leverage the collection / set to store the structure, and the table to partition and cluster the data. You may also need some “index” tables that you query first to get the partitions you need.

Would you benefit from creating separate logical clusters? How much data do these queries return? If not a lot, consider materializing the output into more general “cache” tables with set / collection columns that are refreshed whenever the data is updated, via triggers or Spark. (Rough CQL sketches of both ideas, plus the write-rate arithmetic from the quoted message, follow below the quote.)

--
Rahul Singh
rahul.si...@anant.us
Anant Corporation

On Feb 18, 2018, 6:38 AM -0500, onmstester onmstester <onmstes...@zoho.com>, wrote:
> I have a single structured row as input, arriving at a rate of 10K per second. Each row has 20 columns. Several queries must be answered over this input. Because most queries need a different WHERE, GROUP BY, or ORDER BY, the final data model ended up like this:
>
> primary key for table of query1: ((column1,column2),column3,column4)
>
> primary key for table of query2: ((column3,column4),column2,column1)
>
> and so on.
>
> I am aware of the limit on the number of tables in a Cassandra data model (200 triggers a warning and 500 would fail). Because every input row must be inserted into every table, the final writes per second become big * big data!:
>
> writes per second = 10K (input) * number of tables (queries) * replication factor
>
> The main question: am I on the right path? Is it normal to have a table per query even when the input rate is already this high? Shouldn't I use something like Spark or Hadoop on top instead of relying on the bare data model, or even HBase instead of Cassandra?
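To make the semi-normalization suggestion concrete, here is a rough CQL sketch. All names in it (source_id, hour_bucket, attributes, and the tables themselves) are hypothetical stand-ins, assuming the 20-column rows can be bucketed by some natural key:

    -- Hypothetical semi-normalized table: the partition key still drives
    -- placement, while a frozen map holds the remaining structured fields.
    CREATE TABLE readings_by_source (
        source_id   text,
        hour_bucket timestamp,
        event_time  timestamp,
        attributes  frozen<map<text, text>>,  -- remaining columns folded into one cell
        PRIMARY KEY ((source_id, hour_bucket), event_time)
    );

    -- Hypothetical "index" table, queried first to discover which
    -- partitions of readings_by_source hold the data for a given day.
    CREATE TABLE source_hours_by_day (
        day         date,
        source_id   text,
        hour_bucket timestamp,
        PRIMARY KEY ((day), source_id, hour_bucket)
    );

You'd read source_hours_by_day for the day in question, then issue targeted queries against readings_by_source for exactly those partitions.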
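Likewise, a sketch of the "cache" table idea: a small query result materialized into a collection column, refreshed by a Spark job or a trigger on write rather than computed at read time. Again, every name here is a hypothetical placeholder:

    -- Hypothetical cache table: one partition per logical query key,
    -- with the small result set materialized into a set column.
    CREATE TABLE query_cache (
        query_key   text,
        result_ids  set<text>,
        updated_at  timestamp,
        PRIMARY KEY (query_key)
    );

    -- The refresh job appends to the set when source data changes:
    UPDATE query_cache
    SET result_ids = result_ids + {'row-123'},
        updated_at = toTimestamp(now())
    WHERE query_key = 'query1:foo:bar';

This only pays off when results are small, since a collection cell is read in its entirety; larger results belong in regular clustered rows.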
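For reference, the table-per-query model from the quoted message would look roughly like this in CQL (using the question's own column names; the remaining 16 columns are elided):

    -- Two of the per-query tables described in the question,
    -- the same data duplicated under different primary keys:
    CREATE TABLE query1 (
        column1 text,
        column2 text,
        column3 text,
        column4 text,
        PRIMARY KEY ((column1, column2), column3, column4)
    );

    CREATE TABLE query2 (
        column1 text,
        column2 text,
        column3 text,
        column4 text,
        PRIMARY KEY ((column3, column4), column2, column1)
    );

The write amplification then follows directly from the formula in the question: assuming, say, 10 query tables and replication factor 3 (both numbers illustrative, not from the original message), 10,000 rows/s * 10 tables * 3 replicas = 300,000 replica writes per second.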