Best approach in Cassandra (+ Spark?) for Continuous Queries?

Hugo José Pinto Sat, 03 Jan 2015 02:49:27 -0800

Hello.

We're currently using Hazelcast (http://hazelcast.org/) as a distributed
in-memory data grid. That's been working sort-of-well for us, but a purely
in-memory approach has run its course for our use case, and we're
considering porting our application to a persistent NoSQL store. After the
usual comparisons and evaluations, we're close to picking Cassandra, plus
eventually Spark for analytics.


Nonetheless, there is a gap in our architectural needs that we still don't
see how to solve in Cassandra (with or without Spark): Hazelcast allows us
to create a Continuous Query such that, whenever a row is added to, removed
from, or modified in the clause's result set, Hazelcast calls us back with
the corresponding notification. We use this to continuously update the
clients via AJAX streaming with the new/changed rows.
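
To make the behaviour we depend on concrete, here is a minimal sketch of the
continuous-query semantics described above. It is a plain in-memory stand-in,
not Hazelcast's or Cassandra's API: the class ContinuousQueryStore, its
methods, the event names, and the "region" field are all hypothetical names
chosen for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]
Listener = Callable[[str, str, Row], None]  # (event, key, row)


class ContinuousQueryStore:
    """Hypothetical in-memory store; illustrates the semantics only."""

    def __init__(self) -> None:
        self._rows: Dict[str, Row] = {}
        self._queries: List[Tuple[Predicate, Listener]] = []

    def register(self, predicate: Predicate, listener: Listener) -> None:
        # A "continuous query" is a clause plus a callback.
        self._queries.append((predicate, listener))

    def put(self, key: str, row: Row) -> None:
        old = self._rows.get(key)
        self._rows[key] = row
        for predicate, listener in self._queries:
            was_in = old is not None and predicate(old)
            is_in = predicate(row)
            if is_in and not was_in:
                listener("added", key, row)      # row entered the result set
            elif is_in and was_in:
                listener("modified", key, row)   # row changed inside it
            elif was_in and not is_in:
                listener("removed", key, old)    # row left the result set

    def remove(self, key: str) -> None:
        old = self._rows.pop(key, None)
        if old is None:
            return
        for predicate, listener in self._queries:
            if predicate(old):
                listener("removed", key, old)


# Usage: the listener would push each event to the AJAX stream.
events: List[Tuple[str, str]] = []
store = ContinuousQueryStore()
store.register(
    lambda row: row["region"] == "california_coast",
    lambda event, key, row: events.append((event, key)),
)
store.put("ship-1", {"region": "california_coast", "lat": 34.0})
store.put("ship-1", {"region": "california_coast", "lat": 34.1})
store.put("ship-1", {"region": "oregon_coast", "lat": 42.5})
store.put("ship-2", {"region": "oregon_coast", "lat": 43.0})  # never matches
```

The point of the sketch is that notifications are driven by writes, so the
streamer does no work for rows outside the clauses its clients registered.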

This is probably a conceptual mismatch on our part, so: how can we best
address this use case in Cassandra (with or without Spark's help)? Is there
something in the API that allows for Continuous Queries on key/clause
changes (we haven't found it)? Is there some other way to get a stream of
key/clause updates? Events of some sort?

I'm aware that we could, as a fallback, periodically poll Cassandra, but in
our use case, the client is potentially interested in a large number of
table clause notifications (think "all changes to Ship positions on
California's coastline"), and repeatedly reading them out of the store would
kill the streamer's scalability.

Hence, the magic question: what are we missing? Is Cassandra the wrong tool
for the job? Is there a part of the API, or an external library inside or
outside the Apache ecosystem, that would allow for this?

Many thanks for any assistance!

Hugo
