Real-time Web Analysis tool using Cassandra. Doubts...

Paulo Gabriel Poiati Tue, 11 May 2010 11:53:41 -0700

Hi all.

I'm thinking about implementing a real-time web analysis (WA) tool using
Cassandra as my storage, but I have some questions first.


I'm considering Cassandra because of its excellent write performance,
horizontal scalability and its tunable consistency level.

- First of all, my initial idea is to have two column families (CFs): one
for raw client requests (10+ million per day) and another for metrics
aggregated over fixed time intervals such as 1 min, 5 min, 15 min... Is
this a good approach?

- Is it a good idea to use an OrderPreservingPartitioner to maintain the
order of my requests in the raw data CF? Or is the overhead too big?

- Initially the cluster will contain only three nodes. Is that a problem
(too few, maybe)?

- I think the best way to do the aggregation is through a Hadoop
MapReduce job. Right? Is there any other way to consider?

- Is Cassandra really suitable for this? Maybe HBase is better in this case?
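
To make the aggregated-metrics idea from the first point concrete, here is
a minimal sketch in plain Python of what I mean by bucketing requests into
fixed intervals. It does not touch Cassandra or Hadoop at all; the names
INTERVALS, bucket_start and aggregate are made up for illustration, and I'm
assuming each raw request carries a Unix timestamp in seconds.

```python
from collections import Counter

# Assumed bucket widths in seconds: 1 min, 5 min, 15 min.
INTERVALS = (60, 300, 900)


def bucket_start(timestamp, width):
    """Round a Unix timestamp (seconds) down to the start of its bucket."""
    return timestamp - (timestamp % width)


def aggregate(timestamps, intervals=INTERVALS):
    """Count requests per (interval width, bucket start) pair."""
    counts = Counter()
    for ts in timestamps:
        for width in intervals:
            counts[(width, bucket_start(ts, width))] += 1
    return counts


if __name__ == "__main__":
    # Illustrative request timestamps, not real data.
    sample = [0, 30, 61, 299, 300, 905]
    for (width, start), hits in sorted(aggregate(sample).items()):
        print(width, start, hits)
```

Each (width, bucket start) pair would presumably map to one entry in the
aggregated CF, whether the counting is done by a MapReduce job or some
other way.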

If there is anything else you think I should be aware of, please let me
know.

Thanks,
Paulo Poiati.
