Hi Tomas and others,

2015-03-02 21:29 GMT+01:00 Tomas Vondra <tomas.von...@2ndquadrant.com>:
> Hi David ;-)
>
> On 2.3.2015 20:19, David Kubečka wrote:
> >
> > The question is why the optimizer, or rather the cost estimator,
> > produced such different estimates upon a very small change in the
> > input. Moreover, it seems that the main culprit of the bad estimates
> > isn't actually directly related to the outer table, but rather to the
> > inner table. Just compare the estimates for the two index scans:
> >
> > With 'random_fk_dupl':
> >   ->  Index Scan using facts_fk_idx on facts  (cost=0.42..5.75
> >       rows=100 width=15) (actual time=0.009..0.117 rows=98 loops=100)
> > With 'random_fk_uniq':
> >   ->  Index Scan using facts_fk_idx on facts  (cost=0.42..214.26
> >       rows=100 width=15) (actual time=0.007..0.109 rows=98 loops=100)
> >
> > I have read the optimizer README file and also looked briefly at the
> > code, but this seems to be something not related to a particular
> > algorithm implementation (e.g. nested loop). Perhaps it's the way
> > cost estimates are propagated down (or sideways? that would be
> > weird...) the query tree. But I am really not sure, since this is my
> > first time looking at the optimizer code base. I should also add that
> > I have reproduced this behaviour on all versions of Pg from 9.2 up
> > to the current devel.
>
> Interesting. I've only spent a few minutes looking at this, but it
> certainly is a bit strange that using a smaller table with unique values
> results in a slower plan than using a table with duplicates (but the
> same number of unique values).

Yep. And I think I have found the cause of the strangeness. I have traced
the planner code, and this is how cost_index from costsize.c is called
when computing the cost of the index scan on the outer (facts) table,
when the inner table is the one with duplicate entries (random_fk_dupl):

  cost_index (path=0x27f8fb8, root=0x27670e8, loop_count=9920)

The important thing here is the value of loop_count, which is the
(perfect estimate of the) number of rows of random_fk_dupl (I have
recreated the table, so it differs slightly from the previously posted
version (9893)). Now the predominant part of the run cost is computed by
this expression:

  run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);

Since csquared (indexCorrelation * indexCorrelation) is very small in
this case, the biggest term here is max_IO_cost, which is computed as
follows:

  max_IO_cost = (pages_fetched * spc_random_page_cost) / loop_count;

The division by loop_count is there to model the predicted effect of
caching, and it is exactly this division which makes the run_cost for a
single index lookup so low compared with the query version using
random_fk_uniq. So the main problem is that the planner calls cost_index
with loop_count equal to the number of rows of the inner table, *but it
ignores the semi-join semantics*, i.e. it doesn't account for the number
of *unique* rows in the inner table, which is what the number of loops
will actually be.
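To put rough numbers on this (back-of-the-envelope only, assuming
pages_fetched is about the same in both cases): for random_fk_dupl the
division is by loop_count = 9920, while for random_fk_uniq it is by only
~99 (the number of rows there, all of them unique). So the max_IO_cost
term comes out roughly a hundred times smaller in the duplicate case,
which is the bulk of the gap between the 5.75 and 214.26 index scan
estimates quoted above (the csquared * min_IO_cost part and the CPU costs
make the observed ratio somewhat smaller). The two candidate values for
loop_count can be seen directly (assuming the tables from my original
post):

  -- what cost_index currently receives as loop_count, vs. the number of
  -- loops the semi-join will actually perform
  SELECT count(*) AS total_rows, count(DISTINCT fk) AS distinct_fks
  FROM random_fk_dupl;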
Since this is my first time looking into the optimizer (and in fact any
Postgres) code, I am not yet able to locate the exact place where this
should be repaired, but I hope that in a few days I will have a patch :-)

> Another observation is that the planner does see other plans, because
> tweaking the cost variables like this:
>
>   SET random_page_cost = 2;
>
> produces this plan (which on my machine runs just a tad slower than the
> first query in your post):
>
>                              QUERY PLAN
>  ---------------------------------------------------------------------
>  HashAggregate  (cost=10564.81..10688.47 rows=9893 width=15)
>    Group Key: facts.fk
>    ->  Nested Loop  (cost=5.42..10515.34 rows=9893 width=15)
>          ->  HashAggregate  (cost=2.24..3.23 rows=99 width=4)
>                Group Key: random_fk_uniq.fk
>                ->  Seq Scan on random_fk_uniq  (cost=0.00..1.99 ...)
>          ->  Bitmap Heap Scan on facts  (cost=3.18..105.18 rows=100 ...)
>                Recheck Cond: (fk = random_fk_uniq.fk)
>                ->  Bitmap Index Scan on facts_fk_idx  (cost=0.00..3.15 ...)
>                      Index Cond: (fk = random_fk_uniq.fk)

Yeah, this now makes perfect sense to me. From the run_cost and
max_IO_cost formulas you can see that (in this case!) random_page_cost is
almost exactly the linear factor here. So eventually the result is the
complete opposite of what I wished for at first, i.e. instead of tweaking
Postgres to produce lower costs for random_fk_uniq it will start
producing higher costs for random_fk_dupl. But I now believe that this is
correct.

Regards,

-David

> I can get similar results by setting cpu_operator_cost=0.005. And
> further lowering random_page_cost to 1.1 actually gets me this:
>
>                              QUERY PLAN
>  ---------------------------------------------------------------------
>  HashAggregate  (cost=6185.41..6309.08 rows=9893 width=15)
>    Group Key: facts.fk
>    ->  Nested Loop  (cost=2.66..6135.95 rows=9893 width=15)
>          ->  HashAggregate  (cost=2.24..3.23 rows=99 width=4)
>                Group Key: random_fk_uniq.fk
>                ->  Seq Scan on random_fk_uniq  (cost=0.00..1.99 ...)
>          ->  Index Scan using facts_fk_idx on facts  (cost=0.42..60.95 ...)
>                Index Cond: (fk = random_fk_uniq.fk)
>
> which is exactly the first plan from your post.
>
> I still don't understand why using the smaller table produces a less
> efficient plan, but the estimated costs are very close (20013 vs. 18147
> is a very small difference in this context), and the estimator shoots
> for minimal average error. So it may seem 'weirdly pessimistic' but it
> might be equally optimistic for other queries.
>
> Also, keep in mind that the cost model is just a simplification of
> reality, so it can't be correct 100% of the time.
>
> The fact that this small difference in costs results in a significant
> duration difference suggests that the default optimizer cost values do
> not reflect your environment. For example, you assume that this is very
> fast:
>
>    NestLoop
>       -> Seq Scan on A
>       -> Index Scan using B_Y_IDX on B
>             Index Condition: B.Y = A.X
>
> but index scans often produce a lot of random I/O, and that may be quite
> expensive if it really hits the storage. And that's what the default
> value (random_page_cost=4) is set for. But if you're on a system with
> lots of RAM, or an SSD, ... that simply is not the case, and the default
> may penalize index scans (and prefer more sequential workloads).
>
> The other thing you might do is create an index on (fk,f), which on
> newer releases produces this:
>
>                               QUERY PLAN
>  -------------------------------------------------------------------------
>  HashAggregate  (cost=783.77..907.43 rows=9893 width=15)
>    Group Key: facts.fk
>    ->  Nested Loop  (cost=2.66..734.30 rows=9893 width=15)
>          ->  HashAggregate  (cost=2.24..3.23 rows=99 width=4)
>                Group Key: random_fk_uniq.fk
>                ->  Seq Scan on random_fk_uniq  (cost=0.00..1.99 ...)
>          ->  Index Only Scan using facts_2_idx on facts  (cost=...)
>                Index Cond: (fk = random_fk_uniq.fk)
>
> which limits the amount of random I/O by only scanning the index, and is
> even faster than the first query.
>
> regards
>
> --
> Tomas Vondra                   http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
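P.S. For the record, this is roughly the recipe I will use to try out
both suggestions (the index name facts_2_idx is simply taken from the
plan above; the VACUUM is my own addition, since index-only scans only
pay off when the visibility map is reasonably fresh):

  -- make random I/O cheaper in the cost model, to better match
  -- RAM/SSD-backed storage (1.1 is one of the values Tomas tried above)
  SET random_page_cost = 1.1;

  -- covering index so the whole aggregate can be answered by an
  -- index-only scan
  CREATE INDEX facts_2_idx ON facts (fk, f);

  -- index-only scans rely on the visibility map, so keep it (and the
  -- statistics) fresh
  VACUUM ANALYZE facts;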