On Thu, Nov 21, 2019 at 22:04, Michael Korbakov <rmih...@gmail.com> wrote:

> On November 21, 2019 at 19:14:33, Pavel Stehule (pavel.steh...@gmail.com)
> wrote:
>
>
>
> On Thu, Nov 21, 2019 at 17:19, Michael Korbakov <rmih...@gmail.com> wrote:
>
>> Hi everybody.
>>
>> I stumbled upon a weird problem with the query planner. I have a query
>> on a typical EAV schema:
>>
>> SELECT contacts.id
>> FROM contacts
>>     LEFT OUTER JOIN (SELECT DISTINCT contacts_values.company_id AS company_id,
>>                                      contacts_values.id AS id
>>                      FROM contacts_values
>>                      WHERE contacts_values.field_id = '\x000000000000000000000000'
>>                        AND contacts_values.field_name = 'facebook'
>>                        AND nimble_contact_value_normalize(nimble_skip_long_values(contacts_values.value)) =
>>                            nimble_contact_value_normalize('http://www.facebook.com/jon.ferrara')) AS anon_1
>>         ON anon_1.company_id = contacts.company_id AND anon_1.id = contacts.id
>>     LEFT OUTER JOIN (SELECT DISTINCT contacts_values.company_id AS company_id,
>>                                      contacts_values.id AS id
>>                      FROM contacts_values
>>                      WHERE contacts_values.field_id = '\x000000000000000000000000'
>>                        AND contacts_values.field_name = 'last_name'
>>                        AND nimble_contact_value_normalize(nimble_skip_long_values(contacts_values.value)) =
>>                            nimble_contact_value_normalize('Ferrara')) AS anon_2
>>         ON anon_2.company_id = contacts.company_id AND anon_2.id = contacts.id
>>     JOIN contacts__aggregated AS contacts__aggregated_1
>>         ON contacts__aggregated_1.company_id = contacts.company_id
>>            AND contacts__aggregated_1.contact_id = contacts.id
>>            AND contacts__aggregated_1.field_name = 'names'
>> WHERE contacts.company_id = '\x4c2118ad54397f271b000000'
>>   AND (anon_1.id IS NOT NULL OR anon_2.id IS NOT NULL)
>> ORDER BY contacts__aggregated_1.value ASC
>> LIMIT 30 OFFSET 0;
>>
>> My problem is that the LIMIT clause in this query makes the planner choose a
>> bad plan with nested loops: https://explain.depesz.com/s/Mute. Running the
>> same query after SET ENABLE_NESTLOOP TO OFF, I get a much more efficient
>> plan: https://explain.depesz.com/s/b5kn. Removing the LIMIT from the query
>> results in a similar plan: https://explain.depesz.com/s/wDqE.
>>
>> One thing that concerns me is that in the bad nested-loop plan the cost of
>> the LIMIT node is just a fraction of the cost of its subnode, while in the
>> better merge-join plan the LIMIT node has the same cost as its subnode. How
>> can that be? And what can I do to make the planner pick a better plan?
>>
>> We are running PostgreSQL 10.10.
>>
>
> Sometimes there are problems with the LIMIT clause, because it decreases the
> estimated cost too much. The planner assumes that the needed rows will be
> found quickly; if that premise does not hold, the plan can be bad.
>
> The LIMIT node above the nested loop decreases the cost proportionally to the
> requested number of rows versus the estimated number. That may be too
> aggressive, but the same LIMIT node above the merge join does not decrease the
> total cost at all. I do not understand why that is happening.
>
>
> A typical solution is wrapping the query in a subquery and using OFFSET 0
> (which acts as an optimizer fence):
>
> SELECT * FROM foo WHERE x = 10 LIMIT 10 -- should be transformed into
>
> SELECT * FROM (SELECT * FROM foo WHERE x = 10 OFFSET 0) s LIMIT 10;
>
> Unfortunately, that has not worked for me (
> https://explain.depesz.com/s/slsM). It looks like OFFSET 0 is not working as
> a fence. However, I managed to get some success with wrapping everything
> except the LIMIT into a CTE: https://explain.depesz.com/s/n7c4.
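For context, on PostgreSQL 10 a CTE is always materialized and therefore acts as an optimizer fence. A minimal sketch of that rewrite, using the same hypothetical table foo from Pavel's example:

```sql
-- PostgreSQL 10: every CTE is materialized, so the planner cannot push
-- the outer LIMIT down into the plan for the CTE body.
WITH s AS (
    SELECT * FROM foo WHERE x = 10   -- the full query, without the LIMIT
)
SELECT * FROM s
LIMIT 10;
```

Note that PostgreSQL 12 changed this behavior: CTEs may be inlined, and `WITH s AS MATERIALIZED (...)` is required to keep the fence.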
>
>
> But you can see very bad estimates in the explain output: the left join is
> estimated at 918K rows while the actual result is just 83 rows.
>
> There is also a relatively high number of rows removed by the filter in the
> top part of the query: "Rows Removed by Filter: 1043891".
>
> It's an EAV table, which is a hard-to-optimize pattern :/
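A common mitigation for this kind of EAV lookup is a partial expression index per hot field. A sketch against the schema above; it assumes both normalization functions are declared IMMUTABLE, which expression indexes require:

```sql
-- Hypothetical index matching the 'facebook' predicate in the query above;
-- only valid if both functions are IMMUTABLE.
CREATE INDEX contacts_values_facebook_norm_idx
    ON contacts_values (company_id,
        nimble_contact_value_normalize(nimble_skip_long_values(value)))
    WHERE field_name = 'facebook';
```

With such an index, the per-field filter can become an index scan instead of filtering a large share of the table.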
>
> Is there any way to tune the planner to choose a better plan for such
> queries? I tried increasing default_statistics_target to 1000 and creating
> extended statistics: CREATE STATISTICS contacts_values_company_id_field
> (dependencies) ON company_id, field_id, field_name FROM contacts_values.
> After running ANALYZE on all relevant tables I noticed no change in the
> planner's behavior.
>

You can try increasing FROM_COLLAPSE_LIMIT and JOIN_COLLAPSE_LIMIT, and maybe
geqo_threshold.

Or rewrite the query to push some conditions deeper manually.
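These settings can be adjusted per session; a sketch (the defaults are 8, 8, and 12 respectively):

```sql
-- Allow the planner to flatten larger subquery/join lists into a single
-- search space before choosing a join order.
SET from_collapse_limit = 12;
SET join_collapse_limit = 12;
-- Number of joined items at which the genetic query optimizer (GEQO)
-- replaces exhaustive join-order search.
SET geqo_threshold = 16;
```

Raising the collapse limits lets the planner consider merging the subqueries into the outer query, at the cost of longer planning time.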



>
> Pavel
>
>
>> Sincerely,
>>
>> -- Michael Korbakov
>>
>>
>>
