Hi Michael,

Thanks a lot for having a look at the query once again in more detail. In short, you are right: I fired the liquibase scripts and observed the exact query that was hanging in pg_stat_activity (the monitoring query is sketched in the P.S. below). The query was:
SELECT
    FK.TABLE_NAME      AS "TABLE_NAME",
    CU.COLUMN_NAME     AS "COLUMN_NAME",
    PK.TABLE_NAME      AS "REFERENCED_TABLE_NAME",
    PT.COLUMN_NAME     AS "REFERENCED_COLUMN_NAME",
    C.CONSTRAINT_NAME  AS "CONSTRAINT_NAME"
FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS C
INNER JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS FK
    ON C.CONSTRAINT_NAME = FK.CONSTRAINT_NAME
INNER JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS PK
    ON C.UNIQUE_CONSTRAINT_NAME = PK.CONSTRAINT_NAME
INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE CU
    ON C.CONSTRAINT_NAME = CU.CONSTRAINT_NAME
INNER JOIN (
    SELECT i1.TABLE_NAME, i2.COLUMN_NAME
    FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS i1
    INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE i2
        ON i1.CONSTRAINT_NAME = i2.CONSTRAINT_NAME
    WHERE i1.CONSTRAINT_TYPE = 'PRIMARY KEY'
) PT
    ON PT.TABLE_NAME = PK.TABLE_NAME
WHERE lower(FK.TABLE_NAME) = 'secrole_condcollection'

I reran this query twice: once with set enable_hashjoin = false; and once with set enable_hashjoin = true; (the psql sequence is sketched in the P.S. below). The join order was very, very similar between the hash and the index plans. I then repeated both runs with random_page_cost set to 2, 1.5, and 1.0 and observed no difference whatsoever: the planner always chose the hash joins over sort/index nested loops. seq_page_cost is at its default value of 1.

The tables behind the views have no more than 10K rows and do not exceed 400KB of space. work_mem is set to 256MB, effective_cache_size to 9GB, the machine has somewhere around 32-64GB of RAM, an SSD as the primary drive, and 140 default connections.

The query planner, of course, thinks that the overall nested loop including the hashes is better: cost=2174.36..13670.47 (hash) vs cost=1736.10..18890.44 (index/sort join). But I think there's a problem there, because I don't think the pre-computed hashes can be reused over and over again, while the sort/index joins end up hitting the same buffers, or am I wrong?

More details about the query plans, as well as the complete set of settings, can be found in the original email at https://www.postgresql.org/message-id/CAAUL%3DcFcvUo%3D7b4T-K5PqiqrF6etp59qcgv77DyK2Swa4VhYuQ%40mail.gmail.com

If you could have another look into what's going on, I'd appreciate it a lot. On Postgres 9.6 our setup goes through the liquibase scripts in 5 minutes; on PG 12 with hash joins it may take up to 1.5 hours.

Cheers,
Arturas

On Mon, Sep 27, 2021 at 4:12 PM Michael Lewis <mle...@entrata.com> wrote:

> I'm unclear what you changed to get the planner to choose one vs the
> other. Did you disable hashjoins? Without the full plan to review, it is
> tough to agree with any conclusion that these particular nodes are
> troublesome. It might be that this was the right choice for that part of
> that plan, but improper estimates at an earlier step were problematic.
>
> What configs have you changed, such as work_mem, random_page_cost, and
> such? If random_page_cost & seq_page_cost are still default values,
> then the planner will tend to do more seq scans I believe, and hash them to
> join with large sets of data, rather than do nested loop index scans. I
> think that's how that works. With the lack of flexibility to change the
> query, you might be able to set a few configs for the user that runs these
> schema checks, if you can find changes that make an overall improvement.
>
> *Michael Lewis | Database Engineer*
> *Entrata*
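P.S. The hanging statement was spotted by polling pg_stat_activity while the liquibase run was in progress; a minimal sketch of that monitoring query (only standard pg_stat_activity columns, nothing setup-specific assumed):

    -- active statements, longest-running first; excludes the monitoring session itself
    SELECT pid,
           now() - query_start AS runtime,
           wait_event_type,
           wait_event,
           query
    FROM pg_stat_activity
    WHERE state = 'active'
      AND pid <> pg_backend_pid()
    ORDER BY runtime DESC;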
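And the hash vs. sort/index comparison boiled down to roughly this sequence in psql (a sketch; the elided SELECT is the information_schema query quoted above, and the cost figures I mentioned come from the resulting plans):

    -- PG 12's default choice: hash joins
    SET enable_hashjoin = true;
    EXPLAIN (ANALYZE, BUFFERS) SELECT ... ;   -- the information_schema query above

    -- force the sort/index nested-loop path for comparison
    SET enable_hashjoin = false;
    EXPLAIN (ANALYZE, BUFFERS) SELECT ... ;   -- same query

    -- both runs were then repeated with random_page_cost at 2, 1.5 and 1.0, e.g.:
    SET random_page_cost = 1.5;

    -- back to session defaults afterwards
    RESET enable_hashjoin;
    RESET random_page_cost;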