Now you can say that'd be solved by bumping the cost up, sure. But
obviously the row / cost model is pretty much out of whack here; I don't
see how we can make reasonable decisions when a trivial query is
misestimated by five orders of magnitude.
Before JIT, it didn't matter whether the costing was wrong, provided
that the path with the lowest cost was the cheapest path (or at least
close enough to the cheapest path not to bother anyone). Now it does.
If the intended path is chosen but the costing is higher than it
should be, JIT will erroneously activate. If you had designed this in
such a way that we added separate paths for the JIT and non-JIT
versions and the JIT version had a bigger startup cost but a reduced
runtime cost, then you probably would not have run into this issue, or
at least not to the same degree. But as it is, JIT activates when the
plan looks expensive, regardless of whether activating JIT will do
anything to make it cheaper. As a blindingly obvious example, turning
on JIT to mitigate the effects of disable_cost is senseless, but as
you point out, that's exactly what happens right now.
I'd guess that, as you read this, you're thinking, well, but if I'd
added JIT and non-JIT paths for every option, it would have doubled
the number of paths, and that would have slowed the planner down way
too much. That's certainly true, but my point is just that the
problem is probably not as simple as "the defaults are too low". I
think the problem is more fundamentally that the model you've chosen
is kinda broken. I'm not saying I know how you could have done any
better, but I do think we're going to have to try to figure out
something to do about it, because saying, "check-pg_upgrade is 4x
slower, but that's just because of all those bad estimates" is not
going to fly. Those bad estimates were harmlessly bad before, and now
they are harmfully bad, and similar bad estimates are going to exist
in real-world queries, and those are going to be harmful now too.
Blaming the bad costing is a red herring. The problem is that you've
made the costing matter in a way that it previously didn't.
My 0.02€ on this interesting subject.
Historically, external IOs, aka rotating disk accesses, have been the main
cost (by several orders of magnitude) of executing database queries, and
CPU costs are relatively very low in most queries. The point of the query
planner is mostly to avoid very bad paths wrt IOs.
Now, even with significantly faster IOs, e.g. SSDs, IOs are still a few
orders of magnitude slower, but less so, so CPU may matter more.
Then again, for small databases the data are often in memory and stay
there, in which case CPU is the only cost.
This would suggest the following approach to evaluating costs in the
planner:
(1) Are the needed data already in memory? If so, use CPU-only costs.
This implies that the planner would know about it... which is probably
not the case.
(2) If not, then optimise for IOs first, because they are likely to
be the main cost driver anyway.
(3) Once an "IO-optimal" (i.e. not too bad) plan is selected, consider
whether to apply JIT to parts of it: apply it where CPU costs are
significant and some parts are likely to be executed a lot, with a
significantly high margin, because JIT itself has a cost.
Basically, I'm suggesting, as a second stage, to reevaluate the selected
plan with JIT costs, without changing its structure, to decide where JIT
would improve it.
--
Fabien.