Hi,

On 2018-06-04 22:18:56 -0700, Jeff Davis wrote:
> On Mon, 2018-06-04 at 11:52 -0700, Andres Freund wrote:
> > I wonder whether, at least for aggregates, the better fix wouldn't be
> > to switch to feeding the tuples into tuplesort upon memory exhaustion
> > and doing a sort based aggregate. We have most of the infrastructure
> > to do
>
> That's an interesting idea, but it seems simpler to stick to hashing
> rather than using a combination strategy. It also seems like it would
> take less CPU effort.

Isn't the locality of access going to be considerably better with the
sort based approach?

> What advantages do you have in mind? My patch partitions the spilled
> data, so it should have similar disk costs as a sort approach.

I think one part of it is that the amount of code is going to be lower -
we essentially already have all the code to handle sort based aggs, and
to have both sort and hash based aggs in the same query. We'd mostly
need a way to scan the hashtable and stuff it into a tuplesort, and
that's not hard. nodeAgg.c is already more than complex enough, and I'm
not sure that full blown partitioning is worth the cost.

Greetings,

Andres Freund
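
For illustration only, the fallback described above (hash until memory runs out, then scan the hashtable into a sort and finish as a sort based aggregate) could be sketched roughly as follows. This is a toy Python model, not nodeAgg.c code: the function name `hybrid_sum_agg`, the `max_groups` limit (a stand-in for a memory budget counted in groups), and the plain list standing in for a tuplesort are all assumptions, and it only handles a sum, where partial states and raw values combine the same way.

```python
from itertools import groupby
from operator import itemgetter


def hybrid_sum_agg(rows, max_groups):
    """Sum values per key; hash first, fall back to sorting when the table is full.

    rows: iterable of (key, value) pairs with orderable keys.
    max_groups: hypothetical memory limit, expressed as a number of groups.
    """
    table = {}     # in-memory hash table: key -> partial sum
    spill = None   # list standing in for a tuplesort; None while hashing suffices

    for key, value in rows:
        if spill is not None:
            # Already switched strategies: every further tuple goes to the sort.
            spill.append((key, value))
        elif key in table or len(table) < max_groups:
            table[key] = table.get(key, 0) + value
        else:
            # Memory exhausted: scan the hash table into the sort, then
            # add the tuple that did not fit.
            spill = list(table.items())
            table.clear()
            spill.append((key, value))

    if spill is None:
        return dict(table)

    # Sort based aggregation: sort by key, then combine adjacent entries.
    spill.sort(key=itemgetter(0))
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(spill, key=itemgetter(0))
    }
```

Calling `hybrid_sum_agg(rows, max_groups=2)` on input with three distinct keys exercises the switch-over path, while a large `max_groups` stays entirely in the hash table. Aggregates whose transition state cannot be merged by the same operation as raw input would need separate handling, which this sketch does not attempt.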