On Thu, Nov 21, 2019 at 5:19 PM Michael Korbakov <rmih...@gmail.com> wrote:
> Hi everybody.
>
> I stumbled upon a weird problem with the query planner. I have a query
> on a typical EAV schema:
>
> SELECT contacts.id
> FROM contacts
>     LEFT OUTER JOIN (SELECT DISTINCT contacts_values.company_id AS company_id,
>                                      contacts_values.id AS id
>                      FROM contacts_values
>                      WHERE contacts_values.field_id = '\x000000000000000000000000'
>                        AND contacts_values.field_name = 'facebook'
>                        AND nimble_contact_value_normalize(nimble_skip_long_values(contacts_values.value)) =
>                            nimble_contact_value_normalize('http://www.facebook.com/jon.ferrara')) AS anon_1
>         ON anon_1.company_id = contacts.company_id AND anon_1.id = contacts.id
>     LEFT OUTER JOIN (SELECT DISTINCT contacts_values.company_id AS company_id,
>                                      contacts_values.id AS id
>                      FROM contacts_values
>                      WHERE contacts_values.field_id = '\x000000000000000000000000'
>                        AND contacts_values.field_name = 'last_name'
>                        AND nimble_contact_value_normalize(nimble_skip_long_values(contacts_values.value)) =
>                            nimble_contact_value_normalize('Ferrara')) AS anon_2
>         ON anon_2.company_id = contacts.company_id AND anon_2.id = contacts.id
>     JOIN contacts__aggregated AS contacts__aggregated_1
>         ON contacts__aggregated_1.company_id = contacts.company_id AND
>            contacts__aggregated_1.contact_id = contacts.id AND
>            contacts__aggregated_1.field_name = 'names'
> WHERE contacts.company_id = '\x4c2118ad54397f271b000000'
>   AND (anon_1.id IS NOT NULL OR anon_2.id IS NOT NULL)
> ORDER BY contacts__aggregated_1.value ASC
> LIMIT 30 OFFSET 0;
>
> My problem is that the LIMIT clause in this query makes the planner
> choose a bad plan with nested loops: https://explain.depesz.com/s/Mute.
> Running the same query after SET ENABLE_NESTLOOP TO OFF, I get a much
> more efficient plan: https://explain.depesz.com/s/b5kn. Removing the
> LIMIT from the query results in a similar plan:
> https://explain.depesz.com/s/wDqE.
>
> One thing that concerns me is that the cost of the LIMIT node in the
> bad nested loop plan is just a fraction of the cost of its subnode,
> while in the better merge join plan the LIMIT node has the same cost as
> its subnode. How can that be? And what can I do to make the planner
> pick a better plan?
>
> We are running PostgreSQL 10.10.

There are sometimes problems with a LIMIT clause, because it decreases the
estimated cost too much. The planner expects that the required rows will be
found quickly, but when this premise does not hold, the chosen plan can be bad.

A typical solution is wrapping the query in a subquery and using OFFSET 0
(which acts as an optimizer fence):

SELECT * FROM foo WHERE x = 10 LIMIT 10

-- should be transformed to

SELECT * FROM (SELECT * FROM foo WHERE x = 10 OFFSET 0) s LIMIT 10;

But you can also see very bad estimates in the EXPLAIN output: the left join
is estimated at 918K rows, while the actual result is just 83 rows. There is
also a relatively large number of rows removed in the top part of the query:
"Rows Removed by Filter: 1043891".

It's an EAV table - an unoptimized pattern :/

Pavel

> Sincerely,
>
> -- Michael Korbakov
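
For illustration only, the fence applied to the query from the question might
look roughly like the sketch below. This is untested; the aliases "fenced" and
"sort_value" are invented for the example. The ORDER BY and LIMIT stay outside
the fenced subquery, so the join should be planned for all rows and the sort
and LIMIT are applied on top of its result:

-- untested sketch of the OFFSET 0 fence applied to the original query
SELECT fenced.id
FROM (SELECT contacts.id,
             contacts__aggregated_1.value AS sort_value
      FROM contacts
          LEFT OUTER JOIN (SELECT DISTINCT contacts_values.company_id AS company_id,
                                           contacts_values.id AS id
                           FROM contacts_values
                           WHERE contacts_values.field_id = '\x000000000000000000000000'
                             AND contacts_values.field_name = 'facebook'
                             AND nimble_contact_value_normalize(nimble_skip_long_values(contacts_values.value)) =
                                 nimble_contact_value_normalize('http://www.facebook.com/jon.ferrara')) AS anon_1
              ON anon_1.company_id = contacts.company_id AND anon_1.id = contacts.id
          LEFT OUTER JOIN (SELECT DISTINCT contacts_values.company_id AS company_id,
                                           contacts_values.id AS id
                           FROM contacts_values
                           WHERE contacts_values.field_id = '\x000000000000000000000000'
                             AND contacts_values.field_name = 'last_name'
                             AND nimble_contact_value_normalize(nimble_skip_long_values(contacts_values.value)) =
                                 nimble_contact_value_normalize('Ferrara')) AS anon_2
              ON anon_2.company_id = contacts.company_id AND anon_2.id = contacts.id
          JOIN contacts__aggregated AS contacts__aggregated_1
              ON contacts__aggregated_1.company_id = contacts.company_id AND
                 contacts__aggregated_1.contact_id = contacts.id AND
                 contacts__aggregated_1.field_name = 'names'
      WHERE contacts.company_id = '\x4c2118ad54397f271b000000'
        AND (anon_1.id IS NOT NULL OR anon_2.id IS NOT NULL)
      OFFSET 0) AS fenced    -- optimizer fence: prevents flattening into the outer query
ORDER BY fenced.sort_value ASC
LIMIT 30;

Whether this actually helps depends on the data and on the estimates shown in
the EXPLAIN output; it only removes the effect of the LIMIT on the join
planning, it does not fix the misestimates themselves.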