Hi all. I'm thinking about implementing a real-time WA tool using Cassandra as my storage, but I have some questions first.
I'm considering Cassandra because of its excellent write performance, horizontal scalability, and tunable consistency levels.

- My first thought is to have two CFs: one for raw client requests (~10 million+ per day) and another for metrics aggregated over defined time intervals such as 1 min, 5 min, 15 min... Is this a good approach?
- Is it a good idea to use an OrderPreservingPartitioner to maintain the order of my requests in the raw data CF? Or is the overhead too big?
- Initially the cluster will contain only three nodes. Is that a problem (too few, maybe)?
- I think the best way to do the aggregation is through a Hadoop MapReduce job. Is that right? Are there other ways to consider?
- Is Cassandra really suitable for this? Maybe HBase is better in this case?

If there is anything else you want to make me aware of, please do.

Thanks,
Paulo Poiati
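
P.S. To make the aggregation question more concrete, here is a rough sketch of the kind of roll-up I have in mind. It is plain Python with no Cassandra or Hadoop involved. The names (BUCKET_SECONDS, bucket_start, aggregate) and the (timestamp, page) shape of a raw request are only assumptions for illustration, not a real schema.

```python
from collections import Counter

# Hypothetical interval widths, in seconds, for the aggregated metrics CF.
BUCKET_SECONDS = {"1min": 60, "5min": 300, "15min": 900}


def bucket_start(ts, width):
    """Floor a Unix timestamp (in seconds) to the start of its bucket."""
    return ts - (ts % width)


def aggregate(requests):
    """Count requests per (interval name, bucket start, page) key.

    `requests` is an iterable of (unix_timestamp, page) tuples, an assumed
    stand-in for rows read from the raw requests CF.
    """
    counts = Counter()
    for ts, page in requests:
        for name, width in BUCKET_SECONDS.items():
            counts[(name, bucket_start(ts, width), page)] += 1
    return counts
```

Each key in the result would map to one counter in the aggregated CF, for example a row per interval and page with one column per bucket start. Whether this is done in a MapReduce job or at write time is exactly what I am unsure about.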