Hi,
we have developed software that stores logs from mail servers in MySQL,
but for very large environments we are developing a version that stores
this data in HBase. Once a day the raw logs are first normalized, so the
output looks like this:
username,date of login, IP Address, protocol
username,date of login, IP Address, protocol
username,date of login, IP Address, protocol
[...]
and then inserted into the database.
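To make the normalization step concrete, here is a minimal Python sketch
that parses one normalized line into a record before it is inserted. The
`Login` type and `parse_line` function are hypothetical names. The date
format is not stated above, so ISO 8601 is assumed.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Login:
    username: str
    when: datetime
    ip: str
    protocol: str


def parse_line(line: str) -> Login:
    # Split on commas and trim the stray spaces seen in the sample output.
    # Raises ValueError if the line does not have exactly four fields.
    username, date_str, ip, protocol = (f.strip() for f in line.split(","))
    # Assumption: the date field is ISO 8601, e.g. "2011-03-14T08:15:00".
    return Login(username, datetime.fromisoformat(date_str), ip, protocol)


print(parse_line("alice, 2011-03-14T08:15:00, 192.0.2.10, imap"))
```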
As I was saying, for very large installations (from 1 to 10 million
logins per day, kept for 12 months) we are working with HBase, but I
would also like to consider Cassandra.
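For a sense of scale, the retention figures above work out as follows,
assuming one stored row per login and approximating 12 months as 365
days. The `rows_retained` helper is a hypothetical name used only for
this arithmetic.

```python
DAYS_KEPT = 365  # "keep for 12 months", approximated as 365 days


def rows_retained(logins_per_day: int, days: int = DAYS_KEPT) -> int:
    # Assumption: each login becomes exactly one stored row.
    return logins_per_day * days


for per_day in (1_000_000, 10_000_000):
    print(f"{per_day:,} logins/day -> {rows_retained(per_day):,} rows")
```

So the database has to hold roughly 365 million to 3.65 billion rows.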
The advantage of HBase is MapReduce, which makes searching the logs very
fast by splitting the "query" across multiple hosts in parallel.
Queries will be launched from a web interface (only a few requests per
day), and the search keys are user and time range.
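Since the search keys are user and time range, one common approach is a
composite key of (username, login time) in a store that keeps keys
sorted: HBase row keys are sorted, and in Cassandra a per-user row
(partition) keeps its columns sorted. A user plus time-range lookup then
reads one contiguous slice of keys. The sketch below only simulates that
idea in memory with `bisect`; `LoginIndex` is a hypothetical class, not
an HBase or Cassandra API.

```python
import bisect
from datetime import datetime


class LoginIndex:
    """In-memory stand-in for a sorted key-value store (hypothetical)."""

    def __init__(self):
        # Sorted list of (username, login_time, ip, protocol) tuples.
        self._keys = []

    def insert(self, username, when, ip, protocol):
        bisect.insort(self._keys, (username, when, ip, protocol))

    def query(self, username, start, end):
        """Return logins for `username` with start <= time < end."""
        # (username, start) sorts just before any full key with that
        # username and time >= start, so these bounds delimit one slice.
        lo = bisect.bisect_left(self._keys, (username, start))
        hi = bisect.bisect_left(self._keys, (username, end))
        return self._keys[lo:hi]


idx = LoginIndex()
idx.insert("alice", datetime(2011, 3, 1, 8, 0), "192.0.2.10", "imap")
idx.insert("alice", datetime(2011, 3, 20, 9, 0), "192.0.2.11", "pop3")
idx.insert("bob", datetime(2011, 3, 5, 10, 0), "192.0.2.12", "smtp")
hits = idx.query("alice", datetime(2011, 3, 1), datetime(2011, 3, 15))
print(len(hits))  # 1
```

Only alice's login on 1 March falls in the range; bob's rows and alice's
later login are outside the slice.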
But Cassandra seems less complex to manage and simpler to run, so I want
to evaluate it instead of HBase.
My question is: can Cassandra also split a "query" over the cluster, like
MapReduce does? From what I read online, Cassandra seems fast at inserting
data but slower than HBase at querying it. Is that really so?
We do not want to install Hadoop on top of Cassandra.
Any suggestion is welcome :-)
--
Alessio Cecchi is:
@ ILS -> http://www.linux.it/~alessice/
on LinkedIn -> http://www.linkedin.com/in/alessice
GNU/Linux systems support -> http://www.cecchi.biz/
@ PLUG -> former President, now senator for life, http://www.prato.linux.it
@ LOLUG -> Member http://www.lolug.net