On Fri, 2019-08-02 at 14:44 +0800, Adam Lee wrote:
> I'm late to the party.
You are welcome to join any time!

> These two approaches both spill the input tuples, what if the skewed
> groups are not encountered before the hash table fills up? The spill
> files' size and disk I/O could be downsides.

Let's say the worst case is that we encounter 10 million groups of size
one first; just enough to fill up memory. Then, we encounter a single
additional group of size 20 million, and need to write out all of those
20 million raw tuples. That's still not worse than Sort+GroupAgg, which
would need to write out all 30 million raw tuples (in practice Sort is
pretty fast, so it may still win in some cases, but not by any huge
amount).

> Greenplum spills all the groups by writing the partial aggregate
> states, reset the memory context, process incoming tuples and build
> in-memory hash table, then reload and combine the spilled partial
> states at last, how does this sound?

That can be done as an add-on to approach #1 by evicting the entire
hash table (writing out the partial states), then resetting the memory
context.

It does add to the complexity, though, and would only work for
aggregates that support serializing and combining partial states. It
also might be a net loss to do the extra work of initializing and
evicting a partial state if we don't have large enough groups to
benefit.

Given that the worst case isn't worse than Sort+GroupAgg, I think it
should be left as a future optimization. That would give us time to
tune the process to work well in a variety of cases.

Regards,
    Jeff Davis
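
PS: For concreteness, a back-of-the-envelope sketch of the spill
volumes in that worst case. The group counts are the ones from the
example above; the assumption that the small groups fit exactly in
memory is just illustrative:

    # Illustrative only: compare spilled-tuple counts for the worst case above.
    # Assumes memory holds exactly the 10 million small group entries and each
    # input tuple is spilled at most once.

    small_groups = 10_000_000      # groups of size one, seen first; they fill memory
    big_group_tuples = 20_000_000  # one skewed group arriving after memory is full

    # HashAgg with spilling: the small groups stay in memory; only the tuples
    # of the late, skewed group have to be written out.
    hashagg_spilled = big_group_tuples

    # Sort+GroupAgg: every input tuple has to be written out to be sorted.
    sort_spilled = small_groups + big_group_tuples

    print(f"HashAgg spills       {hashagg_spilled:,} tuples")
    print(f"Sort+GroupAgg spills {sort_spilled:,} tuples")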
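
And a rough sketch of the evict-everything-as-partial-states idea
described above. This is plain Python for illustration, not executor
code; trans/serialize/deserialize/combine stand in for an aggregate's
transition, serial, and combine support, and the memory limit is a
pretend one:

    # Rough sketch: when the table fills up, evict the whole hash table as
    # serialized partial states and start over with an empty table, then
    # reload and combine the spilled partials at the end.

    MAX_GROUPS = 3          # pretend memory limit: at most 3 groups in the table
    spill_file = []         # stands in for a spill tape of (key, serialized state)

    def trans(state, value):            # transition function: here, a running sum
        return (state or 0) + value

    def serialize(state):               # serialfunc stand-in
        return str(state)

    def deserialize(blob):              # deserialfunc stand-in
        return int(blob)

    def combine(a, b):                  # combinefunc stand-in
        return a + b

    def hash_agg(tuples):
        table = {}
        for key, value in tuples:
            if key not in table and len(table) >= MAX_GROUPS:
                # Table is full: evict *everything* as partial states and
                # "reset the memory context" by starting with an empty table.
                spill_file.extend((k, serialize(s)) for k, s in table.items())
                table = {}
            table[key] = trans(table.get(key), value)

        # Final pass: reload the spilled partial states and combine them with
        # whatever is still in memory.
        for key, blob in spill_file:
            partial = deserialize(blob)
            table[key] = combine(table[key], partial) if key in table else partial
        return table

    print(hash_agg([("a", 1), ("b", 1), ("c", 1), ("d", 1), ("a", 1), ("b", 1)]))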