Re: Different execution plan between PostgreSQL 8.4 and 12.11

Josef Šimánek Tue, 11 Oct 2022 02:53:07 -0700
út 11. 10. 2022 v 11:17 odesílatel gzh <[email protected]> napsal:
>
> Thank you for all your assistance.
>
>
> By communicating with my customer, we have adopted the following solution to 
> fix the problem.
>
>
> set enable_seqscan = off
>
This can make some queries fail since there will be no way to gather
data without seqscan.
>
>
>
>
> At 2022-10-11 16:21:42, "Pavel Stehule" <[email protected]> wrote:
>
>
>
> út 11. 10. 2022 v 10:01 odesílatel gzh <[email protected]> napsal:
>>
>> Thank you for providing the requested information.
>>
>>
>> The WebSite has been used for many years, and this upgrade is only a version 
>> upgrade of the PostgreSQL database.
>>
>> My customer does not want to modify the SQL because it will increase the 
>> cost of the project(All SQL that contains a LIMIT clause needs to be 
>> analyzed and checked).
>>
>> Is there no other way to solve the problem?
>
>
> I don't know about any alternative
>
> Regards
>
> Pavel
>
>>
>>
>>
>>
>>
>> At 2022-10-11 13:24:12, "Pavel Stehule" <[email protected]> wrote:
>>
>>
>>
>> út 11. 10. 2022 v 7:08 odesílatel gzh <[email protected]> napsal:
>>>
>>> Hi, Pavel
>>>
>>>
>>> > The LIMIT clause changes total cost.  This is a very aggressive clause. 
>>> > And
>>>
>>> > although it is absolutely useless in this case, Postgres does not have any
>>>
>>> > logic for removing it. Postgres doesn't try to fix developer's mistakes.
>>>
>>> Sorry,I didn't understand what you mean.
>>>
>>> Couldn't the LIMIT clause be used like the SQL statement below?
>>>
>>>
>>> >> new=# explain analyze select 2 from analyze_word_reports where (cseid =
>>>
>>> >> 94) limit 1;
>>
>>
>> there was query
>>
>> SELECT aggregate() FROM xx LIMIT 1
>>
>>>
>>> This SQL statement is no problem under PostgreSQL 8.4, the index works well.
>>>
>>>
>>
>> The optimizer is under nonstop change. And you can expect from any new 
>> release
>>
>> 75% queries are without change, 20% queries are faster, and 5% queries are 
>> slower
>>
>> The optimization is based on statistics and estimations, and searching for 
>> the optimal solution in space of all solutions. In any version there are 
>> smaller or bigger changes of estimation methods, and between old 8.4 and 12 
>> there are big changes in possibilities of how the query can be executed. So 
>> there is a higher possibility to find some really fast queries, but there is 
>> a higher possibility to find some local optimum or slow query too.  Usually 
>> the optimizer is smarter (what is the benefit), but more sensitive too (what 
>> is the cost). You cannot expect the same result, when the data and algorithm 
>> is changed in any version. Postgres doesn't garant the immutability of 
>> execution plans.
>>
>> The clause LIMIT with low LIMIT value can be problematic in more cases. The 
>> model in Postgres expects data are uniformly stored in the table (heap), but 
>> the reality can be different. The common trick in these cases is using 
>> OFFSET 0 clause like
>>
>> SELECT * FROM (SELECT * FROM foo LIMIT 1000 OFFSET 0) s LIMIT 10.
>>
>>
>>
>>
>>
>>
>>>
>>>
>>>
>>>
>>>
>>> At 2022-10-11 12:13:47, "Pavel Stehule" <[email protected]> wrote:
>>>
>>>
>>>
>>> út 11. 10. 2022 v 6:05 odesílatel gzh <[email protected]> napsal:
>>>>
>>>>
>>>> Hi, Pavel
>>>>
>>>> Thank you for your reply.
>>>>
>>>>
>>>> > the LIMIT clause is in this case totally useless and messy, and maybe can
>>>>
>>>> > negative impacts optimizer
>>>>
>>>> Yes. After removing the LIMIT clause, the performance is improved.
>>>>
>>>> The execution plan shows that the index worked.
>>>>
>>>> We've noticed it, but I don't want to fix the problem by modifying the SQL 
>>>> until I find the cause.
>>>
>>>
>>> The LIMIT clause changes total cost.  This is a very aggressive clause. And 
>>> although it is absolutely useless in this case, Postgres does not have any 
>>> logic for removing it. Postgres doesn't try to fix developer's mistakes.
>>>
>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> At 2022-10-11 11:32:48, "Pavel Stehule" <[email protected]> wrote:
>>>>
>>>>
>>>>
>>>> út 11. 10. 2022 v 5:13 odesílatel gzh <[email protected]> napsal:
>>>>>
>>>>> Hi, Tom
>>>>> Thank you for your reply.
>>>>>
>>>>> > When you're asking for help, please don't give us vague statements
>>>>>
>>>>> > like "doesn't seem to work".
>>>>>
>>>>> I understand.
>>>>>
>>>>>
>>>>> > Did the plan (including rowcount
>>>>>
>>>>> > estimates) change at all?  To what?  How far off is that rowcount
>>>>>
>>>>> > estimate, anyway --- that is, how many rows actually have cseid = 94?
>>>>>
>>>>> Please refer to the new execution plan (PostgreSQL 12.11) below.
>>>>>
>>>>>
>>>>> new=# show enable_seqscan;
>>>>>
>>>>>  enable_seqscan
>>>>>
>>>>> ----------------
>>>>>
>>>>>  on
>>>>>
>>>>> (1 行)
>>>>>
>>>>>
>>>>> new=# select count(*) from analyze_word_reports;
>>>>>
>>>>>   count
>>>>>
>>>>> ----------
>>>>>
>>>>>  21331980
>>>>>
>>>>> (1 行)
>>>>>
>>>>>
>>>>> new=# select count(*) from analyze_word_reports where (cseid = 94);
>>>>>
>>>>>   count
>>>>>
>>>>> ---------
>>>>>
>>>>>  1287156
>>>>>
>>>>> (1 行)
>>>>>
>>>>>
>>>>> new=# explain analyze select count(2) from analyze_word_reports where 
>>>>> (cseid = 94) limit 1;
>>>>>
>>>>>                                                                           
>>>>>                        QUERY PLAN
>>>>>
>>>>>
>>>>
>>>>
>>>> the LIMIT clause is in this case totally useless and messy, and maybe can 
>>>> negative impacts optimizer
>>>>
>>>> Regards
>>>>
>>>> Pavel
>>>>
>>>>
>>>>>
>>>>> ----------------------------------------------------------------------------------------------------------------------------------
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>>  Limit  (cost=65184.06..65184.07 rows=1 width=8) (actual 
>>>>> time=123.713..133.035 rows=1 loops=1)
>>>>>
>>>>>    ->  Finalize Aggregate  (cost=65184.06..65184.07 rows=1 width=8) 
>>>>> (actual time=123.712..133.033 rows=1 loops=1)
>>>>>
>>>>>          ->  Gather  (cost=65183.85..65184.06 rows=2 width=8) (actual 
>>>>> time=123.548..133.024 rows=3 loops=1)
>>>>>
>>>>>                Workers Planned: 2
>>>>>
>>>>>                Workers Launched: 2
>>>>>
>>>>>                ->  Partial Aggregate  (cost=64183.85..64183.86 rows=1 
>>>>> width=8) (actual time=119.495..119.496 rows=1 loops=3)
>>>>>
>>>>>                      ->  Parallel Index Only Scan using 
>>>>> analyze_word_reports_index_cseid on analyze_word_reports  (cost=0.56..6290
>>>>>
>>>>> 9.34 rows=509805 width=0) (actual time=0.031..98.706 rows=429052 loops=3)
>>>>>
>>>>>                            Index Cond: (cseid = 94)
>>>>>
>>>>>                            Heap Fetches: 1287156  Planning Time: 0.122 ms 
>>>>>  Execution Time: 133.069 ms
>>>>>
>>>>> (11 行)
>>>>>
>>>>>
>>>>> new=# explain analyze select 2 from analyze_word_reports where (cseid = 
>>>>> 94) limit 1;
>>>>>
>>>>>                                                              QUERY PLAN
>>>>>
>>>>>
>>>>>
>>>>> ----------------------------------------------------------------------------------------------------------------------------------
>>>>>
>>>>> ---
>>>>>
>>>>>  Limit  (cost=0.00..0.43 rows=1 width=4) (actual time=2156.964..2156.966 
>>>>> rows=1 loops=1)
>>>>>
>>>>>    ->  Seq Scan on analyze_word_reports  (cost=0.00..528550.75 
>>>>> rows=1223533 width=4) (actual time=2156.962..2156.964 rows=1 loops=
>>>>>
>>>>> 1)
>>>>>
>>>>>          Filter: (cseid = 94)
>>>>>
>>>>>          Rows Removed by Filter: 18320180  Planning Time: 0.086 ms  
>>>>> Execution Time: 2156.985 ms
>>>>>
>>>>> (6 行)
>>>>>
>>>>>
>>>>>
>>>>> > If the estimate is far off, then increasing the table's statistics
>>>>>
>>>>> > target might help.
>>>>>
>>>>> Thank you for your advice.
>>>>>
>>>>> Please tell me how to set the table's statistics up to improve 
>>>>> performance.
>>>>>
>>>>>
>>>>> new=#  select oid from pg_class where relname = 'analyze_word_reports';
>>>>>
>>>>>   oid
>>>>>
>>>>> -------
>>>>>
>>>>> 16429
>>>>>
>>>>> (1 行)
>>>>>
>>>>>
>>>>> new=# select attrelid,attname,attstattarget from pg_attribute where 
>>>>> attrelid=16429 and attname='cseid';
>>>>>
>>>>> attrelid | attname | attstattarget
>>>>>
>>>>> ----------+---------+---------------
>>>>>
>>>>>     16429 | cseid   |            -1
>>>>>
>>>>> (1 行)
>>>>>
>>>>>
>>>>> > Another thing that would be worth checking is whether
>>>>>
>>>>> > "set enable_seqscan = off" prods it to choose the plan you want.
>>>>>
>>>>> > If not, then there's something else going on besides poor estimates.
>>>>>
>>>>> "set enable_seqscan = off" works, and the performance is greatly 
>>>>> improved, which is almost the same as PostgreSQL 8.4.
>>>>>
>>>>> The enable_seqscan(PostgreSQL 8.4) is on, will this change have an 
>>>>> unknown effect on other queries?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> At 2022-10-10 10:45:54, "Tom Lane" <[email protected]> wrote:
>>>>> >gzh  <[email protected]> writes:
>>>>> >> I've run analyze(not vacuum analyze), but it doesn't seem to work.
>>>>> >
>>>>> >When you're asking for help, please don't give us vague statements
>>>>> >like "doesn't seem to work".  Did the plan (including rowcount
>>>>> >estimates) change at all?  To what?  How far off is that rowcount
>>>>> >estimate, anyway --- that is, how many rows actually have cseid = 94?
>>>>> >
>>>>> >If the estimate is far off, then increasing the table's statistics
>>>>> >target might help.
>>>>> >
>>>>> >Another thing that would be worth checking is whether
>>>>> >"set enable_seqscan = off" prods it to choose the plan you want.
>>>>> >If not, then there's something else going on besides poor estimates.
>>>>> >
>>>>> > regards, tom lane
Re: Different execution plan between PostgreSQL 8.4 and 12.11

Reply via email to