Thanks for the pointer on internal paging, Tyler; I missed that one. It does raise some questions, though:

1. Is it possible to "tune" the page size, or is it hard-coded internally?
2. Is read-repair performed on EACH page, or on the whole set of requested rows once they have all been fetched?

Question 2 is relevant in scenarios where the user is reading at CL QUORUM (or higher) and some replicas are out of sync. Even when aggregating over a single partition, if that partition is wide and spans many fetch pages, the time the coordinator spends on read-repair and reconciliation across the QUORUM replicas may cause the query to time out very quickly.

On Fri, Dec 18, 2015 at 5:26 PM, Tyler Hobbs <ty...@datastax.com> wrote:

>
> On Fri, Dec 18, 2015 at 9:17 AM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> Cassandra will perform a full table scan and fetch all the data in memory
>> to apply the aggregate function.
>
>
> Just to clarify for others on the list: when executing aggregation
> functions, Cassandra *will* use paging internally, so at most one page
> worth of data will be held in memory at a time. However, if your
> aggregation function retains a large amount of data, this may contribute to
> heap pressure.
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>
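
For others following the thread, here is a rough sketch of the behaviour Tyler describes, as I understand it. It is plain Python and not Cassandra code: `fetch_pages`, `paged_sum`, and `paged_collect` are hypothetical names I made up, and the in-memory list stands in for a partition. It only illustrates why a page-at-a-time aggregate keeps memory bounded unless the aggregate's own state grows with the input.

```python
from typing import Iterable, Iterator, List


def fetch_pages(rows: List[int], page_size: int) -> Iterator[List[int]]:
    """Hypothetical stand-in for an internal pager: yields one page at a time."""
    for start in range(0, len(rows), page_size):
        yield rows[start:start + page_size]


def paged_sum(pages: Iterable[List[int]]) -> int:
    """Aggregate with constant-size state: each page is folded in, then dropped."""
    total = 0
    for page in pages:
        total += sum(page)
    return total


def paged_collect(pages: Iterable[List[int]]) -> List[int]:
    """Aggregate whose state retains every value: pages are dropped,
    but the state itself grows with the input (the heap-pressure case)."""
    state: List[int] = []
    for page in pages:
        state.extend(page)
    return state


if __name__ == "__main__":
    partition = list(range(10))
    print(paged_sum(fetch_pages(partition, page_size=3)))
    print(len(paged_collect(fetch_pages(partition, page_size=3))))
```

Note that this sketch says nothing about where read-repair happens relative to each page, which is exactly what question 2 asks.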