Hi DuyHai,

This is in reply to the other points you raised.

My application is a real-time application. It monitors devices in the network 
and displays the top N devices for various parameters averaged over a time 
period. A query may involve anywhere from 10 to 50k devices, and anywhere from 
5 to 2000 intervals. We expect a query to take less than 2 seconds.

My impression was that Spark is aimed at larger scale analytics.

I am OK with the limitation on “group by”. I intend to use async queries and
token-aware load balancing to partition the query and execute it in parallel
on each node.
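A minimal sketch of the client-side half of that plan. It assumes each per-node query returns a partial (sum, count) per device rather than a finished average, so that a device whose rows span several partitions (for example, hypothetical time-bucketed partitions) is still averaged correctly. The node names, sample rows and `fetch_partial` are hypothetical stand-ins for async CQL queries; no Cassandra driver is used here.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data: rows (device, value) owned by each node.
# In a real system these would come from async CQL queries routed by a
# token-aware load-balancing policy; here they are in-memory lists.
NODE_ROWS = {
    "node-a": [("dev1", 10.0), ("dev1", 30.0), ("dev2", 5.0)],
    "node-b": [("dev3", 50.0), ("dev3", 70.0), ("dev4", 1.0)],
    "node-c": [("dev2", 15.0), ("dev5", 40.0)],
}


def fetch_partial(node):
    """Hypothetical per-node query: return {device: (sum, count)}."""
    partial = {}
    for device, value in NODE_ROWS[node]:
        s, c = partial.get(device, (0.0, 0))
        partial[device] = (s + value, c + 1)
    return partial


def top_n_by_average(nodes, n):
    """Query all nodes in parallel, merge partial aggregates, return top N."""
    merged = {}
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        for partial in pool.map(fetch_partial, nodes):
            # Merge sums and counts, never averages of averages.
            for device, (s, c) in partial.items():
                ms, mc = merged.get(device, (0.0, 0))
                merged[device] = (ms + s, mc + c)
    averages = {d: s / c for d, (s, c) in merged.items()}
    # Equivalent of ORDER BY avg DESC LIMIT n, done on the client.
    return heapq.nlargest(n, averages.items(), key=lambda kv: kv[1])
```

`top_n_by_average(list(NODE_ROWS), 3)` returns `[("dev3", 60.0), ("dev5", 40.0), ("dev1", 20.0)]`. Merging sums and counts keeps the averages exact when a device's rows are split unevenly across nodes, and `heapq.nlargest` avoids fully sorting every device when only the top N are needed.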

Thanks…

Roger


From: DuyHai Doan [mailto:doanduy...@gmail.com]
Sent: Tuesday, June 06, 2017 12:31 AM
To: Roger Fischer (CW) <rfis...@brocade.com>
Cc: user@cassandra.apache.org
Subject: Re: Order by for aggregated values

First, GROUP BY is only allowed on partition key and clustering columns, not on
arbitrary columns. The internal implementation of GROUP BY fetches data in
clustering order to avoid having to re-sort it in memory, which would be very
expensive.
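To illustrate why reading in clustering order matters, here is a sketch (not Cassandra's actual implementation) of a streaming aggregation: because rows arrive already grouped by the leading clustering column, `itertools.groupby` can emit each group's average in a single pass, without buffering or sorting. The rows and the (device, ts, value) layout are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows of one partition, already in clustering order (device, ts).
ROWS = [
    ("dev1", 1, 10.0),
    ("dev1", 2, 30.0),
    ("dev2", 1, 5.0),
    ("dev2", 2, 15.0),
    ("dev2", 3, 25.0),
]


def streaming_avg(rows):
    """Yield (device, average) per group in one pass.

    No sort is needed because consecutive rows already share the
    grouping key; groupby only groups adjacent items.
    """
    for device, group in groupby(rows, key=itemgetter(0)):
        total = 0.0
        count = 0
        for _, _, value in group:
            total += value
            count += 1
        yield device, total / count
```

`list(streaming_avg(ROWS))` yields `[("dev1", 20.0), ("dev2", 15.0)]` in clustering order. Ordering that output by the average instead would require holding every group in memory and sorting it, which is the extra cost the engine avoids.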

Second, GROUP BY works best when restricted to a single partition; otherwise it
forces Cassandra to do a range scan, which performs poorly.


For all of those reasons, I don't expect ORDER BY on aggregated values to be
available anytime soon.

Furthermore, Cassandra is optimised for real-time transactional scenarios;
GROUP BY/ORDER BY/LIMIT is a classic analytics scenario. I would recommend
using the appropriate tool, such as Spark, for that.


On 6 June 2017 04:00, "Roger Fischer (CW)" <rfis...@brocade.com> wrote:
Hello,

Is there any intent to support “order by” and “limit” on aggregated values?

For time series data, top n queries are quite common. Group-by was the first 
step towards supporting such queries, but ordering by value and limiting the 
results are also required.

Thanks…

Roger
