On Sun, 2020-08-30 at 17:03 +0200, Tomas Vondra wrote:
> So I'm wondering if the hashagg is not ignoring similar non-I/O costs
> for the spilling case. In particular, the initial section computing
> startup_cost seems to ignore that we may need to do some of the stuff
> repeatedly - for example we'll repeat hash lookups for spilled tuples,
> and so on.

To fix that, we'd also need to change the cost of in-memory HashAgg,
right?
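If I'm following, the extra charge would look something like the sketch
below. This is only an illustration of the shape of such a fix, not
committed code: spill_fraction is a made-up placeholder, and the depth
calculation just mirrors roughly what the existing disk-cost estimate
does with nbatches and num_partitions.

    #include <math.h>

    /*
     * Illustration only, not the costsize.c code: extra CPU cost if
     * every spilled tuple is re-hashed and re-looked-up once per level
     * of spill recursion.  The cost GUCs are passed in as parameters so
     * the sketch stands alone; spill_fraction is a placeholder for
     * whatever estimate we'd use of the share of tuples that spill.
     */
    static double
    hashagg_respill_cpu_cost(double input_tuples,
                             double spill_fraction,  /* placeholder */
                             double nbatches,
                             double num_partitions,
                             int numGroupCols,
                             double cpu_operator_cost,
                             double cpu_tuple_cost)
    {
        /* expected recursion depth; clamp fan-out to avoid log(1) */
        double depth = ceil(log(nbatches) /
                            log(fmax(num_partitions, 2.0)));

        /* each pass repeats the hash computation and table lookup */
        return input_tuples * spill_fraction * depth *
            (cpu_operator_cost * numGroupCols + cpu_tuple_cost);
    }

(The in-memory section already charges the first lookup per input
tuple, so a term like this would only cover the repeats, and the two
paths would have to be kept consistent.)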
> The other thing is that sort seems to be doing only about half the
> physical I/O (as measured by iosnoop) compared to hashagg, even though
> the estimates of pages / input_bytes are exactly the same. For hashagg
> the iosnoop shows 5921MB reads and 7185MB writes, while sort only does
> 2895MB reads and 3655MB writes. Which kinda matches the observed sizes
> of temp files in the two cases, so the input_bytes for sort seems to be
> a bit overestimated.

Hmm, interesting.

How reasonable is it to be making these kinds of changes to the cost
model right now? I think your analysis is solid, but I'm worried about
making more intrusive changes very late in the cycle.

I had originally tried to limit the cost model changes to the new plans
I am introducing -- that is, HashAgg plans expected to require disk.
That's why I came up with a rather arbitrary penalty.

Regards,
	Jeff Davis