Short conclusion:
Switching from CTEs to temporary tables and analyzing reduced the runtime
from 15 minutes to about 1.5 minutes.


Longer conclusion:

@Justin Pryzby <pry...@telsasoft.com>

   - I experimented w/ materializing the CTEs and it helped at the margins
   but did not significantly contribute to a reduction in runtime.
   - No clustering was required because once I switched to temporary tables
   the new plan no longer used the for_upsert index.

@Michael Lewis <mle...@entrata.com>

   - Increasing work_mem to 100MB (up from 4MB) helped at the margins
   (i.e., some 100's of millisecond improvement) but did not represent a
   significant reduction in the runtime.
   - It wasn't obvious to me which window function would be appropriate for
   the problem I was trying to solve therefore I didn't experiment w/ that
   approach.
   - The selectivity of score_name='student_performance_index' was not
   enough for the planner to choose an index over doing a FTS.

Finally, thank you both for helping me bring this poor performing query to
heel. Your insights were helpful and greatly appreciated.

Sincerely,

Dane


On Tue, Feb 16, 2021 at 10:25 AM Dane Foster <studdu...@gmail.com> wrote:

>
> On Tue, Feb 16, 2021 at 10:13 AM Michael Lewis <mle...@entrata.com> wrote:
>
>>                                                    Sort Method: external
>>> merge  Disk: 30760kB
>>>                                                    Worker 0:  Sort
>>> Method: external merge  Disk: 30760kB
>>>                                                    Worker 1:  Sort
>>> Method: external merge  Disk: 30760kB
>>>
>>
>> If you can increase work_mem, even setting it temporarily higher for the
>> session or transaction, that may dramatically change the plan.
>>
> I will try increasing work_mem for the session later today.
>
>> The advice given by Justin particularly about row estimates would be wise
>> to pursue.
>>
>
>
>> I'd wonder how selective that condition of score_name =
>> 'student_performance_index' is in filtering out many of the 9.3 million
>> tuples in that table and if an index with that as the leading column, or
>> just an index on that column would be helpful.
>>
> There are 1,206,355 rows where score_name='student_performance_idex'.
>
>> You'd need to look at pg_stats for the table and see how many distinct
>> values, and if student_performance_index is relatively high or low (or not
>> present) in the MCVs list.
>>
> I will look into that.
>
>
>> I am not sure if your query does what you want it to do as I admit I
>> didn't follow your explanation of the desired behavior. My hunch is that
>> you want to make use of a window function and get rid of one of the CTEs.
>>
> If you could tell me what part(s) are unclear I would appreciate it so
> that I can write a better comment.
>
> Thank you sooo much for all the feedback. It is *greatly* appreciated!
> Sincerely,
>
> Dane
>
>

Reply via email to