Are there plans to build some sort of map-reduce framework into Cassandra and CQL? It seems that users should be able to apply a Java method to selected rows in parallel on the distributed Cassandra JVMs. I believe Solandra uses such an integration.
Don
________________________________________
From: Alessio Cecchi [ales...@skye.it]
Sent: Friday, February 17, 2012 4:42 AM
To: user@cassandra.apache.org
Subject: General questions about Cassandra

Hi,

we have developed software that stores logs from mail servers in MySQL, but for huge environments we are developing a version that stores this data in HBase.

Once a day, the raw logs are first normalized, so the output looks like this:

username,date of login, IP Address, protocol
username,date of login, IP Address, protocol
username,date of login, IP Address, protocol
[...]

and then inserted into the database.

As I was saying, for huge installations (from 1 to 10 million logins per day, kept for 12 months) we are working with HBase, but I would also consider Cassandra. The advantage of HBase is MapReduce, which makes searching the logs very fast by splitting the "query" and running it concurrently on multiple hosts.

Queries will be launched from a web interface (there will be only a few requests per day), and the search keys are user and time range.

But Cassandra seems less complex to manage and simpler to run, so I want to evaluate it instead of HBase.

My question is: can Cassandra also split a "query" over the cluster like MapReduce? From what I have read online, Cassandra seems fast at inserting data but slower than HBase at "querying". Is that really so?

We do not want to install Hadoop on top of Cassandra.

Any suggestion is welcome :-)

--
Alessio Cecchi is:
@ ILS -> http://www.linux.it/~alessice/
on LinkedIn -> http://www.linkedin.com/in/alessice
Assistenza Sistemi GNU/Linux -> http://www.cecchi.biz/
@ PLUG -> former president, now senator for life, http://www.prato.linux.it
@ LOLUG -> member, http://www.lolug.net