Hi all,


We are currently setting up a “big” data cluster. We will only have a
couple of servers to start with, but we need to be able to scale out
quickly when usage ramps up. Previously we have used Hadoop/HBase for our
big data cluster, but since we are starting this one on only two nodes I
think Cassandra will be a much better fit, as Hadoop and HBase really need
at least 3 nodes to achieve any sort of resilience (ZooKeeper quorum etc.).



My question is this:



I have used Apache Phoenix as a JDBC layer on top of HBase, which allows me
to issue ad-hoc SQL-style queries (e.g. count the number of times users have
clicked on a certain button after clicking a different button in the last 3
weeks). My understanding is that CQL does not support this style of ad-hoc
aggregate querying out of the box. Is there a recommended way to do count,
sum, average etc. without writing client code (in my case Java) every time
I want to run one? I have been looking at projects like Drill and Spark
that could potentially sit on top of Cassandra, but without actually
setting everything up and testing them it is difficult to figure out what
they would give us.
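
To make the button example concrete, here is a rough sketch of the kind of
one-off client code I would like to avoid writing for every question. The
names (ClickFunnel, Click, countFollowUps) are made up for illustration, and
the in-memory list stands in for rows that would really be read from
Cassandra through a driver; it is only meant to show the shape of the
aggregation, not a real schema.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ClickFunnel {

    // Hypothetical event shape: one row per button click.
    static final class Click {
        final String userId;
        final String button;
        final Instant at;

        Click(String userId, String button, Instant at) {
            this.userId = userId;
            this.button = button;
            this.at = at;
        }
    }

    // Counts clicks on `second` inside the window [now - window, now) side
    // that were made by a user who had already clicked `first` earlier in
    // that same window.
    static long countFollowUps(List<Click> clicks, String first, String second,
                               Instant now, Duration window) {
        Instant cutoff = now.minus(window);

        // Pass 1: earliest in-window click on `first`, per user.
        Map<String, Instant> earliestFirst = new HashMap<>();
        for (Click c : clicks) {
            if (c.button.equals(first) && !c.at.isBefore(cutoff)) {
                earliestFirst.merge(c.userId, c.at, (a, b) -> a.isBefore(b) ? a : b);
            }
        }

        // Pass 2: in-window clicks on `second` that come after that click.
        long count = 0;
        for (Click c : clicks) {
            if (!c.button.equals(second) || c.at.isBefore(cutoff)) {
                continue;
            }
            Instant firstSeen = earliestFirst.get(c.userId);
            if (firstSeen != null && firstSeen.isBefore(c.at)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-01-22T00:00:00Z");
        List<Click> clicks = Arrays.asList(
            new Click("u1", "search", Instant.parse("2024-01-10T00:00:00Z")),
            new Click("u1", "buy", Instant.parse("2024-01-11T00:00:00Z")),
            new Click("u2", "buy", Instant.parse("2024-01-12T00:00:00Z")));
        System.out.println(
            countFollowUps(clicks, "search", "buy", now, Duration.ofDays(21)));
    }
}
```

Every new question (a different pair of buttons, a sum instead of a count,
a different time window) means another method like this, which is exactly
what a SQL layer such as Phoenix saved me from on HBase.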



Does anyone else interactively issue ad-hoc aggregate queries to Cassandra,
and if so, what stack do you use?



Thanks!

Matt
