Re: Query became very slow after 9.6 -> 10 upgrade

Dmitry Shalashov Thu, 23 Nov 2017 06:59:20 -0800

We tried to apply the patch on 10.1 source, but something is wrong it seems:


patch -p1 < ../1.patch
(Stripping trailing CRs from patch; use --binary to disable.)
patching file src/backend/optimizer/plan/analyzejoins.c
(Stripping trailing CRs from patch; use --binary to disable.)
patching file src/backend/utils/adt/selfuncs.c
Hunk #1 succeeded at 3270 (offset -91 lines).
Hunk #2 succeeded at 3304 (offset -91 lines).
Hunk #3 succeeded at 3313 (offset -91 lines).
Hunk #4 succeeded at 3393 (offset -91 lines).
patch unexpectedly ends in middle of line
Hunk #5 succeeded at 3570 with fuzz 1 (offset -91 lines).


Dmitry Shalashov, relap.io & surfingbird.ru

2017-11-23 2:07 GMT+03:00 Tom Lane <[email protected]>:

> Dmitry Shalashov <[email protected]> writes:
> > Turns out we had not 9.6 but 9.5.
>
> I'd managed to reproduce the weird planner behavior locally in the
> regression database:
>
> regression=# create table foo (f1 int[], f2 int);
> CREATE TABLE
> regression=# explain select * from tenk1 where unique2 in (select distinct
> unnest(f1) from foo where f2=1);
>                                     QUERY PLAN
> ------------------------------------------------------------
> -----------------------
>  Nested Loop  (cost=30.85..80.50 rows=6 width=244)
>    ->  HashAggregate  (cost=30.57..30.63 rows=6 width=4)
>          Group Key: (unnest(foo.f1))
>          ->  HashAggregate  (cost=30.42..30.49 rows=6 width=4)
>                Group Key: unnest(foo.f1)
>                ->  ProjectSet  (cost=0.00..28.92 rows=600 width=4)
>                      ->  Seq Scan on foo  (cost=0.00..25.88 rows=6
> width=32)
>                            Filter: (f2 = 1)
>    ->  Index Scan using tenk1_unique2 on tenk1  (cost=0.29..8.30 rows=1
> width=244)
>          Index Cond: (unique2 = (unnest(foo.f1)))
> (10 rows)
>
> Digging into it, the reason for the duplicate HashAggregate step was that
> query_supports_distinctness() punted on SRFs-in-the-targetlist, basically
> on the argument that it wasn't worth extra work to handle that case.
> Thinking a bit harder, it seems to me that the correct analysis is:
> 1. If we are proving distinctness on the grounds of a DISTINCT clause,
> then it doesn't matter whether there are any SRFs, because DISTINCT
> removes duplicates after tlist SRF expansion.
> 2. But tlist SRFs break the ability to prove distinctness on the grounds
> of GROUP BY, unless all of them are within grouping columns.
> It still seems like detecting the second case is harder than it's worth,
> but we can trivially handle the first case, with little more than some
> code rearrangement.
>
> The other problem is that the output rowcount of the sub-select (ie, of
> the HashAggregate) is being estimated as though the SRF weren't there.
> This turns out to be because estimate_num_groups() doesn't consider the
> possibility of SRFs in the grouping columns.  It never has, but in 9.6 and
> before the problem was masked by the fact that grouping_planner scaled up
> the result rowcount by tlist_returns_set_rows() *after* performing
> grouping.  Now we're effectively doing that in the other order, which is
> more correct, but that means estimate_num_groups() has to apply some sort
> of adjustment.  I suggest that it just multiply its old estimate by the
> maximum of the SRF expansion counts.  That's likely to be an overestimate,
> but it's really hard to do better without specific knowledge of the
> individual SRF's behavior.
>
> In short, I propose the attached fixes.  I've checked this and it seems
> to fix Dmitry's original problem according to the test case he sent
> off-list.
>
>                         regards, tom lane
>
>

Re: Query became very slow after 9.6 -> 10 upgrade

Reply via email to