po 30. 3. 2020 v 8:47 odesÃlatel Silvio Moioli <m...@suse.de> napsal:
> Dear list, > > here is a pretty contrived case where increasing work_mem produces a worse > plan, with much worse overall query time. I wonder why that is the case. > > > Problem: INSERTing a thousand new rows in a table which can easily have > one million rows. PK is "id", which comes from a table, and we have two > columns (called "name" and "version") which do not admit duplicates. > > Schema here: > https://github.com/uyuni-project/uyuni/blob/Uyuni-2020.03/schema/spacewalk/common/tables/rhnPackageCapability.sql > Indices here: > https://github.com/uyuni-project/uyuni/blob/Uyuni-2020.03/schema/spacewalk/postgres/tables/rhnPackageCapability_index.sql > > > We want one command that returns IDs given (name, version) couples. If > they are already in the table, they should be SELECTed, if they are not, > they should be INSERTed. > > Version is NULLable and NULL should be treated as a value. > > We use: > > WITH wanted_capability(ordering, name, version) AS ( > VALUES (1, 'first_name', '1.0.0'), (2, 'first_name', '1.0.1'), (1, > 'second_name', '1.0.0'), ...998 more... > ) > missing_capability AS ( > SELECT wanted_capability.* > FROM wanted_capability LEFT JOIN rhnPackageCapability > ON wanted_capability.name = rhnPackageCapability.name > AND wanted_capability.version IS NOT DISTINCT FROM > rhnPackageCapability.version > WHERE rhnPackageCapability.id IS NULL > ), > inserted_capability AS ( > INSERT INTO rhnPackageCapability(id, name, version) > SELECT nextval('rhn_pkg_capability_id_seq'), name, version FROM > missing_capability ON CONFLICT DO NOTHING > RETURNING id, name, version > ) > SELECT wanted_capability.ordering, inserted_capability.id > FROM wanted_capability JOIN inserted_capability > ON wanted_capability.name = inserted_capability.name > AND wanted_capability.version IS NOT DISTINCT FROM > inserted_capability.version > UNION ( > SELECT wanted_capability.ordering, rhnPackageCapability.id > FROM wanted_capability JOIN rhnPackageCapability > ON wanted_capability.name = rhnPackageCapability.name > AND wanted_capability.version IS NOT DISTINCT FROM > rhnPackageCapability.version > ) > ORDER BY ordering > ; > > > Behavior at work_mem = 5 MB is pretty good, query finishes in 200ms. Plan: > https://explain.dalibo.com/plan/4u > > Behavior at work_mem = 80 MB seems not equally good, query takes more than > 13s. Two expensive SORTs and MERGE JOINs are done instead of HASH JOINs. > Plan: thttps://explain.dalibo.com/plan/ORd please, can you send explain in text form? Probably, there is a problem in wrong estimation. What can be expected because CTE is optimization fence in this version Regards Pavel > > Adding one more INDEX on rhnCapability.name fixes the issue. > > My question is: why are SORTs chosen if more work_mem is available, and > why can't the planner predict query will be slower that way? > > All of the above is reproducible on openSUSE Leap and PostgreSQL 10.12. > > Ideas welcome, and thanks in advance! > > Regards, > -- > Silvio Moioli > SUSE Manager Development Team > > >