Hi all,

This is my first post on this mailing list; I really enjoy it. I wanted to
add some details and answers to this discussion.
On Feb 17, 2021, at 17:52, Dane Foster <studdu...@gmail.com> wrote:
>
> A small update (see below/inline).
>
> On Tue, Feb 16, 2021 at 2:11 PM Dane Foster <studdu...@gmail.com> wrote:
>
>> Short conclusion:
>> Switching from CTEs to temporary tables and analyzing reduced the runtime
>> from 15 minutes to about 1.5 minutes.

The attempt_scores table is pretty big and it is scanned 3 times; try to
rewrite the query to reduce the number of passes over this table. For
example:

    EXPLAIN (ANALYZE, BUFFERS)
    WITH reports AS (
        SELECT student_id,
               assignment_id,
               max(score_value) FILTER (WHERE score_name = 'student_performance_index'),
               max(attempt_report_id) maxid,
               max(score_value) spi
        FROM attempt_scores
        GROUP BY student_id, assignment_id
        HAVING max(score_value) > 0
           AND max(score_value) FILTER (WHERE score_name = 'student_performance_index') = max(score_value)
    )
    SELECT avg(spi) spi,
           avg(CASE score_name WHEN 'digital_clinical_experience' THEN score_value END) dce,
           avg(CASE score_name WHEN 'tier1_subjective_data_collection' THEN score_value END) sdc
    FROM attempt_scores
    JOIN reports ON reports.maxid = attempt_scores.attempt_report_id;

Also, I would continue to increase work_mem up to 200MB until the external
merge is no longer required:

    SET work_mem = '200MB';  -- changes the setting only at the session level

>> Longer conclusion:
>>
>> @Justin Pryzby <pry...@telsasoft.com>
>>
>> - I experimented w/ materializing the CTEs and it helped at the
>>   margins but did not significantly contribute to a reduction in runtime.
>> - No clustering was required because once I switched to temporary
>>   tables the new plan no longer used the for_upsert index.
>>
>> @Michael Lewis <mle...@entrata.com>
>>
>> - Increasing work_mem to 100MB (up from 4MB) helped at the margins
>>   (i.e., some hundreds of milliseconds of improvement) but did not
>>   represent a significant reduction in the runtime.
>> - It wasn't obvious to me which window function would be appropriate
>>   for the problem I was trying to solve, therefore I didn't experiment
>>   w/ that approach.
>>
>> I want to update/correct this statement:
>>
>> - The selectivity of score_name='student_performance_index' was not
>>   enough for the planner to choose an index over doing a FTS.
>
> I added a partial index (WHERE
> score_name='student_performance_index'::citext) and that had a *dramatic*
> impact. That part of the query went from ~12 seconds to ~1 second.

Another way to generate performance gains on this query:

    CREATE INDEX ON attempt_scores USING hash (score_name);

since score_name doesn't seem to have a high cardinality.

>> Finally, thank you both for helping me bring this poorly performing
>> query to heel. Your insights were helpful and greatly appreciated.
>>
>> Sincerely,
>>
>> Dane
>>
>> On Tue, Feb 16, 2021 at 10:25 AM Dane Foster <studdu...@gmail.com> wrote:
>>
>>> On Tue, Feb 16, 2021 at 10:13 AM Michael Lewis <mle...@entrata.com>
>>> wrote:
>>>
>>>> Sort Method: external merge Disk: 30760kB
>>>> Worker 0: Sort Method: external merge Disk: 30760kB
>>>> Worker 1: Sort Method: external merge Disk: 30760kB
>>>>
>>>> If you can increase work_mem, even setting it temporarily higher for
>>>> the session or transaction, that may dramatically change the plan.
>>>
>>> I will try increasing work_mem for the session later today.
>>>
>>>> The advice given by Justin, particularly about row estimates, would
>>>> be wise to pursue.
>>>>
>>>> I'd wonder how selective that condition of score_name =
>>>> 'student_performance_index' is in filtering out many of the 9.3
>>>> million tuples in that table, and if an index with that as the
>>>> leading column, or just an index on that column, would be helpful.
>>>
>>> There are 1,206,355 rows where score_name='student_performance_index'.
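For reference, the partial index Dane describes above would look something
like the sketch below. The index name and the choice of indexed columns are
my own guesses (based on the query's GROUP BY); only the table, the WHERE
clause, and the citext cast come from the thread:

```sql
-- Partial index covering only the 'student_performance_index' rows,
-- so it stays much smaller than a full index on the 9.3M-row table.
-- spi_partial_idx and the column list are hypothetical choices.
CREATE INDEX spi_partial_idx
    ON attempt_scores (student_id, assignment_id, score_value)
    WHERE score_name = 'student_performance_index'::citext;
```

With roughly 1.2M matching rows out of 9.3M, the partial index both shrinks
the structure the planner has to traverse and makes the index-vs-seqscan
choice easier for it.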
>>>> You'd need to look at pg_stats for the table and see how many
>>>> distinct values there are, and whether student_performance_index is
>>>> relatively high or low (or not present) in the MCVs list.
>>>
>>> I will look into that.
>>>
>>>> I am not sure if your query does what you want it to do, as I admit I
>>>> didn't follow your explanation of the desired behavior. My hunch is
>>>> that you want to make use of a window function and get rid of one of
>>>> the CTEs.
>>>
>>> If you could tell me what part(s) are unclear I would appreciate it so
>>> that I can write a better comment.
>>>
>>> Thank you sooo much for all the feedback. It is *greatly* appreciated!
>>>
>>> Sincerely,
>>>
>>> Dane

--
Regards,
Yo.
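P.S. A minimal sketch of the pg_stats lookup suggested above, to check the
cardinality and most-common-values list for score_name (the table and
column names are the ones discussed in the thread):

```sql
-- How many distinct values score_name has, and how frequent each of the
-- most common ones is, according to the planner's statistics.
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'attempt_scores'
  AND attname = 'score_name';
```

If 'student_performance_index' shows up in most_common_vals with a large
frequency, the planner already knows the predicate is not very selective,
which explains its preference for a sequential scan before the partial
index was added.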