[ https://issues.apache.org/jira/browse/CASSANDRA-8826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366504#comment-14366504 ]
Benedict commented on CASSANDRA-8826:
-------------------------------------

I disagree that this is outside of scope for Cassandra. Spark and Hadoop are not designed for answering realtime aggregation queries over medium-sized timeseries, and nor are coordinator-level aggregations. IMO this is something we should be aiming to do sooner rather than later, and it is probably not _that_ hard with the repair-aware consistency levels suggested by [~tjake].

Whether or not we deliver this now, users are likely to abuse aggregations without realising they are only acceptable for a very constrained kind of workload (very small slices), and we will find ourselves again behind the user demand curve (trying to deliver a feature that works as used, rather than as first envisaged).

NB: Pushing aggregations lower into the engine may also permit some very significant optimisations at a later date, especially with columnar storage, so even single-node queries would be helped.

> Distributed aggregates
> ----------------------
>
>                 Key: CASSANDRA-8826
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8826
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Robert Stupp
>            Priority: Minor
>
> Aggregations have been implemented in CASSANDRA-4914.
> All calculation is performed on the coordinator. This means that all data is
> pulled by the coordinator and processed there.
> This ticket is about distributing aggregates to make them more efficient.
> Some related tickets (esp. CASSANDRA-8099) are currently in progress - we
> should wait for them to land before talking about implementation.
> Another playground (not covered by this ticket) that might be related is
> _distributed filtering_.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
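
For readers of this thread, the contrast between coordinator-level aggregation (all rows pulled to the coordinator) and pushing the aggregate down to the nodes can be sketched roughly as below. This is only an illustration, not Cassandra code: `PartialAvg`, `coordinator_avg`, `node_partial` and `distributed_avg` are hypothetical names, and the sketch assumes each row lives on exactly one node, so it deliberately ignores replication and the consistency questions raised in the comment.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PartialAvg:
    """Hypothetical mergeable state for an average: running sum and row count."""
    total: float = 0.0
    count: int = 0

    def merge(self, other: "PartialAvg") -> "PartialAvg":
        # Merging is associative, so partial states can be combined in any order.
        return PartialAvg(self.total + other.total, self.count + other.count)

    def result(self) -> float:
        if self.count == 0:
            raise ValueError("average of zero rows is undefined")
        return self.total / self.count


def coordinator_avg(nodes: List[List[float]]) -> float:
    """Coordinator-level model: every row is shipped to the coordinator first."""
    rows = [value for node_rows in nodes for value in node_rows]
    return sum(rows) / len(rows)


def node_partial(node_rows: Iterable[float]) -> PartialAvg:
    """Would run on each node: reduce local rows to one small partial state."""
    state = PartialAvg()
    for value in node_rows:
        state = state.merge(PartialAvg(value, 1))
    return state


def distributed_avg(nodes: List[List[float]]) -> float:
    """Distributed model: only one PartialAvg per node reaches the coordinator."""
    merged = PartialAvg()
    for node_rows in nodes:
        merged = merged.merge(node_partial(node_rows))
    return merged.result()


# Assumption for the sketch: three nodes, each holding a disjoint set of rows.
nodes = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
```

Both functions return the same average for `nodes`; the difference is that `coordinator_avg` moves six rows to the coordinator while `distributed_avg` moves three small states. With replicated data the partial states could double-count rows, which is where the consistency-level discussion above comes in.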