Short conclusion: Switching from CTEs to temporary tables and analyzing reduced the runtime from 15 minutes to about 1.5 minutes.
Longer conclusion: @Justin Pryzby <pry...@telsasoft.com> - I experimented w/ materializing the CTEs and it helped at the margins but did not significantly contribute to a reduction in runtime. - No clustering was required because once I switched to temporary tables the new plan no longer used the for_upsert index. @Michael Lewis <mle...@entrata.com> - Increasing work_mem to 100MB (up from 4MB) helped at the margins (i.e., some 100's of millisecond improvement) but did not represent a significant reduction in the runtime. - It wasn't obvious to me which window function would be appropriate for the problem I was trying to solve therefore I didn't experiment w/ that approach. - The selectivity of score_name='student_performance_index' was not enough for the planner to choose an index over doing a FTS. Finally, thank you both for helping me bring this poor performing query to heel. Your insights were helpful and greatly appreciated. Sincerely, Dane On Tue, Feb 16, 2021 at 10:25 AM Dane Foster <studdu...@gmail.com> wrote: > > On Tue, Feb 16, 2021 at 10:13 AM Michael Lewis <mle...@entrata.com> wrote: > >> Sort Method: external >>> merge Disk: 30760kB >>> Worker 0: Sort >>> Method: external merge Disk: 30760kB >>> Worker 1: Sort >>> Method: external merge Disk: 30760kB >>> >> >> If you can increase work_mem, even setting it temporarily higher for the >> session or transaction, that may dramatically change the plan. >> > I will try increasing work_mem for the session later today. > >> The advice given by Justin particularly about row estimates would be wise >> to pursue. >> > > >> I'd wonder how selective that condition of score_name = >> 'student_performance_index' is in filtering out many of the 9.3 million >> tuples in that table and if an index with that as the leading column, or >> just an index on that column would be helpful. >> > There are 1,206,355 rows where score_name='student_performance_idex'. > >> You'd need to look at pg_stats for the table and see how many distinct >> values, and if student_performance_index is relatively high or low (or not >> present) in the MCVs list. >> > I will look into that. > > >> I am not sure if your query does what you want it to do as I admit I >> didn't follow your explanation of the desired behavior. My hunch is that >> you want to make use of a window function and get rid of one of the CTEs. >> > If you could tell me what part(s) are unclear I would appreciate it so > that I can write a better comment. > > Thank you sooo much for all the feedback. It is *greatly* appreciated! > Sincerely, > > Dane > >