Hello, our company has a huge table in a relational database which keeps statistics of financial operations. It looks like the following:

- SERVER_ID - the server which served the transaction
- ACCOUNT_FROM - account 1
- ACCOUNT_TO - account 2
- HOUR - the time range this statistics row covers (from minute 0 to minute 59 of the hour)
- SERVICE_ID - a service
- GATEWAY_ID - a gateway
- TRANSACTIONS_COUNT - transactions count for the hour
- AMOUNT - $ amount for the hour
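In SQL terms the table looks roughly like this (the column names are the real ones; the table name and the types are just my guesses):

    CREATE TABLE hourly_stats (
        server_id          INT,           -- server which served the transaction
        account_from       BIGINT,
        account_to         BIGINT,
        hour               TIMESTAMP,     -- truncated to the hour
        service_id         INT,
        gateway_id         INT,
        transactions_count INT,           -- transactions count for the hour
        amount             DECIMAL(18,2)  -- $ amount for the hour
    );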
There are different reports which use data from this table; examples are: show the amount per account, show the transactions count per server, and so on. Only the TRANSACTIONS_COUNT and AMOUNT columns are actually "data" columns; the others act as "filters" to group the results by.

The table is huge, and some queries run for more than 2 minutes. This is unacceptable: we need near-real-time responses. Moreover, the cost of supporting the database servers grows and grows. So there is an idea to move statistical data of this kind to Cassandra.

The question is: what is the correct data model for Cassandra? I'm thinking of creating a SuperColumn consisting of all the columns described above and mapping an auto-generated key (ROW_ID) to it, then creating index column families like SERVER_ID -> ROW_ID, ACCOUNT_FROM -> ROW_ID, and so on. But it looks a little bit weird to me. How do I do things right? )
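To make the idea concrete, here is a rough sketch in CQL terms (I realize SuperColumns belong to the Thrift API; this per-filter "index table" layout is just the closest table-based equivalent I could write down, and all names and types here are made up):

    -- main column family, keyed by the auto-generated ROW_ID
    CREATE TABLE stats (
        row_id             uuid PRIMARY KEY,
        server_id          int,
        account_from       bigint,
        account_to         bigint,
        hour               timestamp,
        service_id         int,
        gateway_id         int,
        transactions_count int,
        amount             decimal
    );

    -- one "index" column family per filter column,
    -- e.g. SERVER_ID -> ROW_ID:
    CREATE TABLE stats_by_server (
        server_id int,
        row_id    uuid,
        PRIMARY KEY (server_id, row_id)
    );
    -- ...and similar tables for ACCOUNT_FROM, ACCOUNT_TO,
    -- SERVICE_ID and GATEWAY_ID.

As far as I can see, every report would then need two round trips: first fetch the ROW_IDs from an index table, then fetch the stats rows themselves - which is part of why this feels weird to me.

Thanks in advance, Alex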