On Tue, 2018-06-05 at 07:04 +0200, Tomas Vondra wrote: > I expect the eviction strategy to be the primary design challenge of > this patch. The other bits will be mostly determined by this one > piece.
Not sure I agree that this is the primary challenge. The cases that benefit from eviction are probably a minority. I see two categories that would benefit: * Natural clustering in the heap. This sounds fairly common, but a lot of the cases that come to mind are too low-cardinality to be compelling; e.g. timestamps grouped by hour/day/month. If someone has run into a high-cardinality natural grouping case, let me know. * ARRAY_AGG (or similar): individual state values can be large enough that we need to evict to avoid exceeding work_mem even if not adding any new groups. In either case, it seems like a fairly simple eviction strategy would work. For instance, we could just evict the entire hash table if work_mem is exceeded or if the hit rate on the hash table falls below a certain threshold. If there was really something important that should have stayed in the hash table, it will go back in soon anyway. So why should eviction be a major driver for the entire design? I agree it should be an area of improvement for the future, so let me know if you see a major problem, but I haven't been as focused on eviction. > While the primary goal of the patch is addressing the OOM risks in > hash > aggregate, I think it'd be a mistake to see it just that way. I see > it > could allow us to perform hash aggregate more often, even if we know > the > groups won't fit into work_mem. If we could estimate how much of the > aggregate state we'll have to spill to disk, we could still prefer > hashagg over groupagg. We would pay the price for eviction, but on > large > data sets that can be massively cheaper than having to do sort. Agreed. I ran some tests of my patch in the last round, and they strongly supported choosing HashAgg a lot more often. A lot of sort improvements have been made though, so I really need to run some new numbers. > far away), but it would be unfortunate to make this improvement > impossible/more difficult in the future. If you see anything that would make this difficult in the future, let me know. Regards, Jeff Davis