On Thu, Nov 21, 2019 at 5:19 PM Michael Korbakov <rmih...@gmail.com> wrote:
> Hi everybody.
>
> I stumbled upon a weird problem with the query planner. I have a query
> on a typical EAV schema:
>
> SELECT contacts.id
> FROM contacts
>     LEFT OUTER JOIN (SELECT DISTINCT contacts_values.company_id AS company_id,
>                                      contacts_values.id AS id
>                      FROM contacts_values
>                      WHERE contacts_values.field_id = '\x000000000000000000000000'
>                        AND contacts_values.field_name = 'facebook'
>                        AND nimble_contact_value_normalize(nimble_skip_long_values(contacts_values.value)) =
>                            nimble_contact_value_normalize('http://www.facebook.com/jon.ferrara')) AS anon_1
>         ON anon_1.company_id = contacts.company_id AND anon_1.id = contacts.id
>     LEFT OUTER JOIN (SELECT DISTINCT contacts_values.company_id AS company_id,
>                                      contacts_values.id AS id
>                      FROM contacts_values
>                      WHERE contacts_values.field_id = '\x000000000000000000000000'
>                        AND contacts_values.field_name = 'last_name'
>                        AND nimble_contact_value_normalize(nimble_skip_long_values(contacts_values.value)) =
>                            nimble_contact_value_normalize('Ferrara')) AS anon_2
>         ON anon_2.company_id = contacts.company_id AND anon_2.id = contacts.id
>     JOIN contacts__aggregated AS contacts__aggregated_1
>         ON contacts__aggregated_1.company_id = contacts.company_id AND
>            contacts__aggregated_1.contact_id = contacts.id AND
>            contacts__aggregated_1.field_name = 'names'
> WHERE contacts.company_id = '\x4c2118ad54397f271b000000'
>   AND (anon_1.id IS NOT NULL OR anon_2.id IS NOT NULL)
> ORDER BY contacts__aggregated_1.value ASC
> LIMIT 30 OFFSET 0;
>
> My problem is that the LIMIT clause in this query makes the planner
> choose a bad plan with nested loops: https://explain.depesz.com/s/Mute.
> Running the same query after SET ENABLE_NESTLOOP TO OFF, I get a much
> more efficient plan: https://explain.depesz.com/s/b5kn. Removing the
> LIMIT from the query results in a similar plan:
> https://explain.depesz.com/s/wDqE.
>
> One thing that concerns me is that the cost of the LIMIT node in the
> bad nested loop plan is just a fraction of the cost of its subnode,
> while in the better merge join plan the LIMIT node has the same cost as
> its subnode. How can that be? And what can I do to make the planner
> pick a better plan?
>
> We are running PostgreSQL 10.10.

There are sometimes problems with a LIMIT clause, because it decreases the
estimated cost too much. The planner expects that the required rows will be
found quickly, but when this premise does not hold, the chosen plan can be bad.

A typical solution is wrapping the query in a subquery and using OFFSET 0
(which acts as an optimizer fence):

SELECT * FROM foo WHERE x = 10 LIMIT 10

-- should be transformed to

SELECT * FROM (SELECT * FROM foo WHERE x = 10 OFFSET 0) s LIMIT 10;

But you can also see very bad estimates in the EXPLAIN output: the left join
is estimated at 918K rows, while the actual result is just 83 rows. There is
also a relatively large number of rows removed in the top part of the query:
"Rows Removed by Filter: 1043891".

It's an EAV table - an unoptimized pattern :/

Pavel

> Sincerely,
>
> -- Michael Korbakov
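
For illustration only, the fence applied to the query from the question might
look roughly like the sketch below. This is untested; the aliases "fenced" and
"sort_value" are invented for the example. The ORDER BY and LIMIT stay outside
the fenced subquery, so the join should be planned for all rows and the sort
and LIMIT are applied on top of its result:

-- untested sketch of the OFFSET 0 fence applied to the original query
SELECT fenced.id
FROM (SELECT contacts.id,
             contacts__aggregated_1.value AS sort_value
      FROM contacts
          LEFT OUTER JOIN (SELECT DISTINCT contacts_values.company_id AS company_id,
                                           contacts_values.id AS id
                           FROM contacts_values
                           WHERE contacts_values.field_id = '\x000000000000000000000000'
                             AND contacts_values.field_name = 'facebook'
                             AND nimble_contact_value_normalize(nimble_skip_long_values(contacts_values.value)) =
                                 nimble_contact_value_normalize('http://www.facebook.com/jon.ferrara')) AS anon_1
              ON anon_1.company_id = contacts.company_id AND anon_1.id = contacts.id
          LEFT OUTER JOIN (SELECT DISTINCT contacts_values.company_id AS company_id,
                                           contacts_values.id AS id
                           FROM contacts_values
                           WHERE contacts_values.field_id = '\x000000000000000000000000'
                             AND contacts_values.field_name = 'last_name'
                             AND nimble_contact_value_normalize(nimble_skip_long_values(contacts_values.value)) =
                                 nimble_contact_value_normalize('Ferrara')) AS anon_2
              ON anon_2.company_id = contacts.company_id AND anon_2.id = contacts.id
          JOIN contacts__aggregated AS contacts__aggregated_1
              ON contacts__aggregated_1.company_id = contacts.company_id AND
                 contacts__aggregated_1.contact_id = contacts.id AND
                 contacts__aggregated_1.field_name = 'names'
      WHERE contacts.company_id = '\x4c2118ad54397f271b000000'
        AND (anon_1.id IS NOT NULL OR anon_2.id IS NOT NULL)
      OFFSET 0) AS fenced    -- optimizer fence: prevents flattening into the outer query
ORDER BY fenced.sort_value ASC
LIMIT 30;

Whether this actually helps depends on the data and on the estimates shown in
the EXPLAIN output; it only removes the effect of the LIMIT on the join
planning, it does not fix the misestimates themselves.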