po 30. 3. 2020 v 8:47 odesílatel Silvio Moioli <m...@suse.de> napsal:

> Dear list,
>
> here is a pretty contrived case where increasing work_mem produces a worse
> plan, with much worse overall query time. I wonder why that is the case.
>
>
> Problem: INSERTing a thousand new rows in a table which can easily have
> one million rows. PK is "id", which comes from a table, and we have two
> columns (called "name" and "version") which do not admit duplicates.
>
> Schema here:
> https://github.com/uyuni-project/uyuni/blob/Uyuni-2020.03/schema/spacewalk/common/tables/rhnPackageCapability.sql
> Indices here:
> https://github.com/uyuni-project/uyuni/blob/Uyuni-2020.03/schema/spacewalk/postgres/tables/rhnPackageCapability_index.sql
>
>
> We want one command that returns IDs given (name, version) couples. If
> they are already in the table, they should be SELECTed, if they are not,
> they should be INSERTed.
>
> Version is NULLable and NULL should be treated as a value.
>
> We use:
>
> WITH wanted_capability(ordering, name, version) AS (
>   VALUES (1, 'first_name', '1.0.0'), (2, 'first_name', '1.0.1'), (1,
> 'second_name', '1.0.0'), ...998 more...
> )
> missing_capability AS (
>   SELECT wanted_capability.*
>     FROM wanted_capability LEFT JOIN rhnPackageCapability
>       ON wanted_capability.name = rhnPackageCapability.name
>         AND wanted_capability.version IS NOT DISTINCT FROM
> rhnPackageCapability.version
>     WHERE rhnPackageCapability.id IS NULL
> ),
> inserted_capability AS (
>   INSERT INTO rhnPackageCapability(id, name, version)
>     SELECT nextval('rhn_pkg_capability_id_seq'), name, version FROM
> missing_capability ON CONFLICT DO NOTHING
>     RETURNING id, name, version
> )
> SELECT wanted_capability.ordering, inserted_capability.id
>   FROM wanted_capability JOIN inserted_capability
>     ON wanted_capability.name = inserted_capability.name
>       AND wanted_capability.version IS NOT DISTINCT FROM
> inserted_capability.version
>     UNION (
>       SELECT wanted_capability.ordering, rhnPackageCapability.id
>         FROM wanted_capability JOIN rhnPackageCapability
>           ON wanted_capability.name = rhnPackageCapability.name
>             AND wanted_capability.version IS NOT DISTINCT FROM
> rhnPackageCapability.version
>     )
>   ORDER BY ordering
> ;
>
>
> Behavior at work_mem = 5 MB is pretty good, query finishes in 200ms. Plan:
> https://explain.dalibo.com/plan/4u
>
> Behavior at work_mem = 80 MB seems not equally good, query takes more than
> 13s. Two expensive SORTs and MERGE JOINs are done instead of HASH JOINs.
> Plan: thttps://explain.dalibo.com/plan/ORd


please, can you send explain in text form?

Probably, there is a problem in wrong estimation. What can be expected
because CTE is optimization fence in this version


Regards

Pavel



>
> Adding one more INDEX on rhnCapability.name fixes the issue.
>
> My question is: why are SORTs chosen if more work_mem is available, and
> why can't the planner predict query will be slower that way?
>
> All of the above is reproducible on openSUSE Leap and PostgreSQL 10.12.
>
> Ideas welcome, and thanks in advance!
>
> Regards,
> --
> Silvio Moioli
> SUSE Manager Development Team
>
>
>

Reply via email to