čt 21. 11. 2019 v 22:04 odesílatel Michael Korbakov <rmih...@gmail.com> napsal:
> On November 21, 2019 at 19:14:33, Pavel Stehule (pavel.steh...@gmail.com) > wrote: > > > > čt 21. 11. 2019 v 17:19 odesílatel Michael Korbakov <rmih...@gmail.com> > napsal: > >> Hi everybody. >> >> I stumbled upon a weird problem with the query planner. I have a query >> on a typical EAV schema: >> >> SELECT contacts.id >> FROM contacts >> LEFT OUTER JOIN (SELECT DISTINCT contacts_values.company_id AS >> company_id, contacts_values.id AS id >> FROM contacts_values >> WHERE contacts_values.field_id = >> '\x000000000000000000000000' >> AND contacts_values.field_name = 'facebook' >> AND >> >> nimble_contact_value_normalize(nimble_skip_long_values(contacts_values.value)) >> = >> >> nimble_contact_value_normalize('http://www.facebook.com/jon.ferrara')) >> AS anon_1 >> ON anon_1.company_id = contacts.company_id AND >> anon_1.id = contacts.id >> LEFT OUTER JOIN (SELECT DISTINCT contacts_values.company_id AS >> company_id, contacts_values.id AS id >> FROM contacts_values >> WHERE contacts_values.field_id = >> '\x000000000000000000000000' >> AND contacts_values.field_name = 'last_name' >> AND >> >> nimble_contact_value_normalize(nimble_skip_long_values(contacts_values.value)) >> = >> nimble_contact_value_normalize('Ferrara')) AS >> anon_2 >> ON anon_2.company_id = contacts.company_id AND >> anon_2.id = contacts.id >> JOIN contacts__aggregated AS contacts__aggregated_1 >> ON contacts__aggregated_1.company_id = contacts.company_id AND >> contacts__aggregated_1.contact_id = contacts.id AND >> contacts__aggregated_1.field_name = 'names' >> WHERE contacts.company_id = '\x4c2118ad54397f271b000000' >> AND (anon_1.id IS NOT NULL OR anon_2.id IS NOT NULL) >> ORDER BY contacts__aggregated_1.value ASC >> LIMIT 30 OFFSET 0; >> >> My problem is that the LIMIT clause in this query makes the planner >> choose a bad plan >> with nested loops: https://explain.depesz.com/s/Mute. Running the same >> query after >> SET ENABLE_NESTLOOP TO OFF I am getting a much more efficient plan: >> https://explain.depesz.com/s/b5kn. Removing the LIMIT from the query >> results in a >> similar plan: https://explain.depesz.com/s/wDqE. >> >> One thing that concerns me is that the cost of the LIMIT node in the >> bad nested loop >> plan is just a fraction of the cost of its subnode. But for the better >> merge join >> plan LIMIT node has the same cost as its subnode. How could it be this >> way? And what >> can I do to make the planner pick up a better plan? >> >> We are running PostgreSQL 10.10. >> > > Sometimes are problems with LIMIT clause, because it too much decrease > costs. The system expects so necessary values are found quickly - but if > this premise is not valid, then this plan can be bad. > > LIMIT node with nested loop decreases the cost proportionally to requested > number of rows versus estimated number. That may be too aggressive, but the > same LIMIT node with merge join does not decrease total cost at all. I do > not understand why is it happening. > > > typical solution is wrapping to subquery and using OFFSET 0 (that is > optimizer fence) > > SELECT * FROM foo WHERE x = 10 LIMIT 10 -- should be transformed > > SELECT * FROM (SELECT * FROM foo WHERE x = 10 OFFSET 0) s LIMIT 10; > > Unfortunately, that has not worked for me ( > https://explain.depesz.com/s/slsM). Looks like OFFSET 0 is not working as > a fence. However, I managed to get some success with wrapping everything > except LIMIT into a CTE: https://explain.depesz.com/s/n7c4. > > > But you can see in explain very bad estimations - left join is estimated > to 918K rows and result is just 83 rows > > There is relative very high rows removed in top part of query "Rows > Removed by Filter: 1043891" > > It's EAV table - it is unoptimized pattern :/ > > Is there any way to tune planner to choose better plan for such queries? I > tried increasing default_statistics_target to 1000 and creating extended > statistics: CREATE STATISTICS contacts_values_company_id_field > (dependencies) ON company_id, field_id, field_name FROM contacts_values. > After running ANALYZE on all relevant tables I noticed no changes in > planner's behavior. > you can try increase a value FROM_COLLAPSE_LIMIT and JOIN_COLLAPSE_LIMIT and maybe geqo_threshold or rewrite query to push some conditions deeper manually > > Pavel > > >> Sincerely, >> >> -- Michael Korbakov >> >> >>