2018-08-01 16:59 GMT+02:00 David Rowley <david.row...@2ndquadrant.com>:

> On 2 August 2018 at 02:48, Guillaume Lelarge <guilla...@lelarge.info>
> wrote:
> > EXPLAIN (ANALYZE) SELECT DISTINCT * FROM gleu2;
> >
> >                                                        QUERY PLAN
> > ------------------------------------------------------------
> ---------------------------------------------------------
> >  Unique  (cost=12005.97..12049.20 rows=1 width=1430) (actual
> > time=20055.294..20323.348 rows=1 loops=1)
> >    ->  Sort  (cost=12005.97..12006.30 rows=132 width=1430) (actual
> > time=20055.290..20105.738 rows=60000 loops=1)
> >          Sort Key: (... 130 columns ...)
> >          Sort Method: external sort  Disk: 84464kB
> >          ->  Seq Scan on gleu2  (cost=0.00..12001.32 rows=132 width=1430)
> > (actual time=0.109..114.142 rows=60000 loops=1)
> >  Planning time: 10.012 ms
> >  Execution time: 20337.854 ms
> > (7 rows)
> >
> > That looks quite good. The total cost is 12049, so I expect this plan to
> > have the smaller cost as it's the choosen plan. Now, I'm disabling Sort,
> and
> > here is what I get:
> >
> > SET enable_sort TO off;
> > EXPLAIN (ANALYZE) SELECT DISTINCT * FROM gleu2;
> >
> >                                                        QUERY PLAN
> > ------------------------------------------------------------
> ---------------------------------------------------------
> >  HashAggregate  (cost=12044.22..12044.23 rows=1 width=1430) (actual
> > time=508.342..508.343 rows=1 loops=1)
> >    Group Key: (... 130 columns ...)
> >    ->  Seq Scan on gleu2  (cost=0.00..12001.32 rows=132 width=1430)
> (actual
> > time=0.036..57.088 rows=60000 loops=1)
> >  Planning time: 1.335 ms
> >  Execution time: 508.529 ms
> > (5 rows)
>
> When the planner adds a new path it compares the cost not exactly, but
> with a 'fuzz factor'.  It's very likely that the hashagg path did not
> make it as it was not fuzzily any cheaper than the unique path. By
> default, this fuzz factor is 1%.
>
>
That may definitely be it. There's not much of a difference in the total
costs.

It seems in your case the costs don't quite match reality which is
> quite likely due to the poor row estimates on "gleu2".  Has that table
> been analyzed recently? or is there some reason that auto-vacuum is
> not getting to it?
>
>
This is a small test case of a much bigger query joining a large number of
tables, materialized views, views (calling functions), etc. The actual plan
contains 84 nodes (32 scans, lots of joins, and a few other nodes). The
mis-estimate is to follow what the big query gives me.

There's a bit more reading of what I'm talking about in
> https://github.com/postgres/postgres/blob/master/src/
> backend/optimizer/util/pathnode.c#L141
>
>
I'm gonna read that. Thank you.


-- 
Guillaume.

Reply via email to