You might find it better to use Elasticsearch for your aggregate queries and analytics. Cassandra is more of a plain data store.

On Apr 22, 2015 4:42 PM, "Matthew Johnson" <matt.john...@algomi.com> wrote:
> Hi all,
>
> Currently we are setting up a “big” data cluster. We will only have a couple of servers to start with, but we need to be able to scale out quickly when usage ramps up. Previously we have used Hadoop/HBase for our big data cluster, but since we are starting this one on only two nodes I think Cassandra will be a much better fit, as Hadoop and HBase really need at least 3 nodes to achieve any sort of resilience (ZooKeeper quorum etc.).
>
> My question is this:
>
> I have used Apache Phoenix as a JDBC layer on top of HBase, which allows me to issue ad hoc SQL-style queries (e.g. count the number of times users have clicked on a certain button after clicking a different button in the last 3 weeks). My understanding is that CQL does not support this style of ad hoc aggregate querying out of the box. Is there a recommended way to do count, sum, average etc. without writing client code (in my case Java) every time I want to run one? I have been looking at projects like Drill, Spark etc. that could potentially sit on top of Cassandra, but without actually setting everything up and testing them it is difficult to figure out what they would give us.
>
> Does anyone else interactively issue ad hoc aggregate queries to Cassandra, and if so, what stack do you use?
>
> Thanks!
>
> Matt
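
For illustration only, here is a rough sketch of the kind of hand-written client-side aggregation the question is trying to avoid, using the "clicked one button after another in the last 3 weeks" example. It is written in Python for brevity rather than Java, and everything in it is an assumption: the `(user, button, timestamp)` event shape, the function name `count_followup_clicks`, and the in-memory list that stands in for rows already fetched from the data store. It does not use any Cassandra, Phoenix, or Elasticsearch API.

```python
from datetime import datetime, timedelta


def count_followup_clicks(events, first, second, now, window=timedelta(weeks=3)):
    """Count clicks on `second` made by a user who had already clicked
    `first`, looking only at events inside the window ending at `now`.

    `events` is a hypothetical list of (user, button, timestamp) tuples.
    """
    cutoff = now - window
    seen_first = set()  # users who clicked `first` inside the window
    count = 0
    # Walk the events in time order so that "after" is well defined.
    for user, button, ts in sorted(events, key=lambda e: e[2]):
        if ts < cutoff or ts > now:
            continue  # outside the time window
        if button == first:
            seen_first.add(user)
        elif button == second and user in seen_first:
            count += 1
    return count


# Made-up sample data, standing in for rows read from the cluster.
now = datetime(2015, 4, 22)
events = [
    ("alice", "search", now - timedelta(days=2)),
    ("alice", "buy", now - timedelta(days=1)),
    ("bob", "buy", now - timedelta(days=3)),        # no earlier "search"
    ("carol", "search", now - timedelta(days=40)),  # outside the 3-week window
    ("carol", "buy", now - timedelta(days=1)),
]
result = count_followup_clicks(events, "search", "buy", now)
```

Every new question of this kind needs another function like this one, plus the code to pull the rows to the client, which is the overhead an ad hoc query layer is meant to remove.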