>
>
> My application is a real-time application. It monitors devices in the
> network and displays the top N devices for various parameters averaged over
> a time period. A query may involve anywhere from 10 to 50k devices, and
> anywhere from 5 to 2000 intervals. We expect a query to take less than 2
> seconds.
>
>
>
> My impression was that Spark is aimed at larger scale analytics.
>
>
>
> I am ok with the limitation on “group by”. I am intending to use async
> queries and token-aware load balancing to partition the query and execute
> it in parallel on each node.
>
>
>

This sounds a lot more like a use case for a streaming system (run in
parallel with Cassandra).
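
To make that a bit more concrete, here is a minimal sketch in plain Python
of the idea behind the streaming approach: keep a running sum per device as
samples arrive, so a top N query only has to rank averages that are already
maintained. This is not Flink or Beam code, and the class and method names
(TopNAverager, record, top_n) are made up for illustration; it also assumes
one value per device per interval and a single parameter.

```python
import heapq
from collections import defaultdict, deque


class TopNAverager:
    """Hypothetical helper: rolling per-device averages over recent intervals."""

    def __init__(self, max_intervals):
        self.max_intervals = max_intervals
        self.samples = defaultdict(deque)  # device -> values still in the window
        self.sums = defaultdict(float)     # device -> sum of those values

    def record(self, device, value):
        """Add one interval's value and evict the oldest if the window is full."""
        window = self.samples[device]
        window.append(value)
        self.sums[device] += value
        if len(window) > self.max_intervals:
            self.sums[device] -= window.popleft()

    def top_n(self, n):
        """Return the n devices with the highest windowed average, best first."""
        averages = (
            (self.sums[device] / len(window), device)
            for device, window in self.samples.items()
            if window
        )
        return [(device, avg) for avg, device in heapq.nlargest(n, averages)]
```

With a window of 2 intervals, recording 10, 20, 30 for one device leaves an
average of 25 for it, since the oldest sample has been evicted. A real
streaming job would shard this state by device across workers and merge the
per-worker top N lists at query time.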

Apache Flink might be one avenue to explore - their Cassandra integration
works fine, btw.

A lot of folks are doing similar things with Apache Beam as well, since it
has quite an elegant paradigm for the use case you describe, particularly
if you need to combine batching with streaming. (FYI, their "CassandraIO"
is about to be merged into master:
https://github.com/apache/beam/pull/592#issuecomment-306618338).


-- 
-----------------
Nate McCall
Wellington, NZ
@zznate

CTO
Apache Cassandra Consulting
http://www.thelastpickle.com
