On Thu, Mar 26, 2020 at 05:56:56PM +0800, Richard Guo wrote:
> Hello,
>
> When calculating the disk costs of hash aggregation that spills to disk,
> there is something wrong with how we determine depth:
>
>     depth = ceil( log(nbatches - 1) / log(num_partitions) );
>
> If nbatches is some number between 1.0 and 2.0, we would have a negative
> depth. As a result, we may have a negative cost for hash aggregation
> plan node, as described in [1].
>
> I don't think 'log(nbatches - 1)' is what we want here. Should it be
> just '(nbatches - 1)'?

I think using log() is correct, but why should we allow fractional
nbatches values between 1.0 and 2.0? You either have 1 batch or 2
batches; you can't have 1.5 batches. So I think the issue is here:

    nbatches = Max((numGroups * hashentrysize) / mem_limit,
                   numGroups / ngroups_limit);

and we should probably do

    nbatches = ceil(nbatches);

right after it.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services