Hi Michael,

Thanks a lot for having a look at the query once again in more detail. In short, you are right: I fired the liquibase scripts and observed the exact query that was hanging in pg_stat_activity (the monitoring query is sketched in the P.S. below). The query was:
SELECT
    FK.TABLE_NAME      AS "TABLE_NAME",
    CU.COLUMN_NAME     AS "COLUMN_NAME",
    PK.TABLE_NAME      AS "REFERENCED_TABLE_NAME",
    PT.COLUMN_NAME     AS "REFERENCED_COLUMN_NAME",
    C.CONSTRAINT_NAME  AS "CONSTRAINT_NAME"
FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS C
INNER JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS FK
    ON C.CONSTRAINT_NAME = FK.CONSTRAINT_NAME
INNER JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS PK
    ON C.UNIQUE_CONSTRAINT_NAME = PK.CONSTRAINT_NAME
INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE CU
    ON C.CONSTRAINT_NAME = CU.CONSTRAINT_NAME
INNER JOIN (
    SELECT i1.TABLE_NAME, i2.COLUMN_NAME
    FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS i1
    INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE i2
        ON i1.CONSTRAINT_NAME = i2.CONSTRAINT_NAME
    WHERE i1.CONSTRAINT_TYPE = 'PRIMARY KEY'
) PT
    ON PT.TABLE_NAME = PK.TABLE_NAME
WHERE lower(FK.TABLE_NAME) = 'secrole_condcollection'

I reran this query twice: once with set enable_hashjoin = false; and once with set enable_hashjoin = true; (the psql sequence is sketched in the P.S. below). The join order was very, very similar between the hash and the index plans. I then repeated both runs with random_page_cost set to 2, 1.5, and 1.0 and observed no difference whatsoever: the planner always chose the hash joins over sort/index nested loops. seq_page_cost is at its default value of 1.

The tables behind the views have no more than 10K rows and do not exceed 400KB of space. work_mem is set to 256MB, effective_cache_size to 9GB, the machine has somewhere around 32-64GB of RAM, an SSD as the primary drive, and 140 default connections.

The query planner, of course, thinks that the overall nested loop including the hashes is better: cost=2174.36..13670.47 (hash) vs cost=1736.10..18890.44 (index/sort join). But I think there's a problem there, because I don't think the pre-computed hashes can be reused over and over again, while the sort/index joins end up hitting the same buffers, or am I wrong?

More details about the query plans, as well as the complete set of settings, can be found in the original email at https://www.postgresql.org/message-id/CAAUL%3DcFcvUo%3D7b4T-K5PqiqrF6etp59qcgv77DyK2Swa4VhYuQ%40mail.gmail.com

If you could have another look into what's going on, I'd appreciate it a lot. On Postgres 9.6 our setup goes through the liquibase scripts in 5 minutes; on PG 12 with hash joins it may take up to 1.5 hours.

Cheers,
Arturas

On Mon, Sep 27, 2021 at 4:12 PM Michael Lewis <mle...@entrata.com> wrote:

> I'm unclear what you changed to get the planner to choose one vs the
> other. Did you disable hashjoins? Without the full plan to review, it is
> tough to agree with any conclusion that these particular nodes are
> troublesome. It might be that this was the right choice for that part of
> that plan, but improper estimates at an earlier step were problematic.
>
> What configs have you changed, such as work_mem, random_page_cost, and
> such? If random_page_cost & seq_page_cost are still default values,
> then the planner will tend to do more seq scans I believe, and hash them to
> join with large sets of data, rather than do nested loop index scans. I
> think that's how that works. With the lack of flexibility to change the
> query, you might be able to set a few configs for the user that runs these
> schema checks, if you can find changes that make an overall improvement.
>
> *Michael Lewis | Database Engineer*
> *Entrata*
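P.S. The hanging statement was spotted by polling pg_stat_activity while the liquibase run was in progress; a minimal sketch of that monitoring query (only standard pg_stat_activity columns, nothing setup-specific assumed):

    -- active statements, longest-running first; excludes the monitoring session itself
    SELECT pid,
           now() - query_start AS runtime,
           wait_event_type,
           wait_event,
           query
    FROM pg_stat_activity
    WHERE state = 'active'
      AND pid <> pg_backend_pid()
    ORDER BY runtime DESC;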
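And the hash vs. sort/index comparison boiled down to roughly this sequence in psql (a sketch; the elided SELECT is the information_schema query quoted above, and the cost figures I mentioned come from the resulting plans):

    -- PG 12's default choice: hash joins
    SET enable_hashjoin = true;
    EXPLAIN (ANALYZE, BUFFERS) SELECT ... ;   -- the information_schema query above

    -- force the sort/index nested-loop path for comparison
    SET enable_hashjoin = false;
    EXPLAIN (ANALYZE, BUFFERS) SELECT ... ;   -- same query

    -- both runs were then repeated with random_page_cost at 2, 1.5 and 1.0, e.g.:
    SET random_page_cost = 1.5;

    -- back to session defaults afterwards
    RESET enable_hashjoin;
    RESET random_page_cost;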