Hello, our company has a huge table in a relational database which keeps statistics of financial operations. It looks like the following:

- SERVER_ID - the server which served the transaction
- ACCOUNT_FROM - account 1
- ACCOUNT_TO - account 2
- HOUR - the time range this statistics row covers (from minute 0 to minute 59 of the hour)
- SERVICE_ID - a service
- GATEWAY_ID - a gateway
- TRANSACTIONS_COUNT - transactions count for the hour
- AMOUNT - $ amount for the hour
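In SQL terms the table looks roughly like this (the column names are the real ones; the table name and the types are just my guesses):

    CREATE TABLE hourly_stats (
        server_id          INT,           -- server which served the transaction
        account_from       BIGINT,
        account_to         BIGINT,
        hour               TIMESTAMP,     -- truncated to the hour
        service_id         INT,
        gateway_id         INT,
        transactions_count INT,           -- transactions count for the hour
        amount             DECIMAL(18,2)  -- $ amount for the hour
    );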
There are different reports which use data from this table; examples are: show the amount per account, show the transactions count per server, and so on. Only the TRANSACTIONS_COUNT and AMOUNT columns are actually "data" columns; the others act as "filters" to group the results by.

The table is huge, and some queries run for more than 2 minutes. This is unacceptable: we need near-real-time responses. Moreover, the cost of supporting the database servers grows and grows. So there is an idea to move statistical data of this kind to Cassandra.

The question is: what is the correct data model for Cassandra? I'm thinking of creating a SuperColumn consisting of all the columns described above and mapping an auto-generated key (ROW_ID) to it, then creating index column families like SERVER_ID -> ROW_ID, ACCOUNT_FROM -> ROW_ID, and so on. But it looks a little bit weird to me. How do I do things right? )
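To make the idea concrete, here is a rough sketch in CQL terms (I realize SuperColumns belong to the Thrift API; this per-filter "index table" layout is just the closest table-based equivalent I could write down, and all names and types here are made up):

    -- main column family, keyed by the auto-generated ROW_ID
    CREATE TABLE stats (
        row_id             uuid PRIMARY KEY,
        server_id          int,
        account_from       bigint,
        account_to         bigint,
        hour               timestamp,
        service_id         int,
        gateway_id         int,
        transactions_count int,
        amount             decimal
    );

    -- one "index" column family per filter column,
    -- e.g. SERVER_ID -> ROW_ID:
    CREATE TABLE stats_by_server (
        server_id int,
        row_id    uuid,
        PRIMARY KEY (server_id, row_id)
    );
    -- ...and similar tables for ACCOUNT_FROM, ACCOUNT_TO,
    -- SERVICE_ID and GATEWAY_ID.

As far as I can see, every report would then need two round trips: first fetch the ROW_IDs from an index table, then fetch the stats rows themselves - which is part of why this feels weird to me.

Thanks in advance, Alex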