A small update (see below/inline).
On Tue, Feb 16, 2021 at 2:11 PM Dane Foster <studdu...@gmail.com> wrote: > Short conclusion: > Switching from CTEs to temporary tables and analyzing reduced the runtime > from 15 minutes to about 1.5 minutes. > > > Longer conclusion: > > @Justin Pryzby <pry...@telsasoft.com> > > - I experimented w/ materializing the CTEs and it helped at the > margins but did not significantly contribute to a reduction in runtime. > - No clustering was required because once I switched to temporary > tables the new plan no longer used the for_upsert index. > > @Michael Lewis <mle...@entrata.com> > > - Increasing work_mem to 100MB (up from 4MB) helped at the margins > (i.e., some 100's of millisecond improvement) but did not represent a > significant reduction in the runtime. > - It wasn't obvious to me which window function would be appropriate > for the problem I was trying to solve therefore I didn't experiment w/ that > approach. > > I want to update/correct this statement: > > - The selectivity of score_name='student_performance_index' was not > enough for the planner to choose an index over doing a FTS. > > I added a partial index (WHERE score_name='student_performance_index'::citext) and that had a *dramatic* impact. That part of the query went from ~12 seconds to ~1 second. > Finally, thank you both for helping me bring this poor performing query to > heel. Your insights were helpful and greatly appreciated. > > Sincerely, > > Dane > > > On Tue, Feb 16, 2021 at 10:25 AM Dane Foster <studdu...@gmail.com> wrote: > >> >> On Tue, Feb 16, 2021 at 10:13 AM Michael Lewis <mle...@entrata.com> >> wrote: >> >>> Sort Method: external >>>> merge Disk: 30760kB >>>> Worker 0: Sort >>>> Method: external merge Disk: 30760kB >>>> Worker 1: Sort >>>> Method: external merge Disk: 30760kB >>>> >>> >>> If you can increase work_mem, even setting it temporarily higher for the >>> session or transaction, that may dramatically change the plan. >>> >> I will try increasing work_mem for the session later today. >> >>> The advice given by Justin particularly about row estimates would be >>> wise to pursue. >>> >> >> >>> I'd wonder how selective that condition of score_name = >>> 'student_performance_index' is in filtering out many of the 9.3 million >>> tuples in that table and if an index with that as the leading column, or >>> just an index on that column would be helpful. >>> >> There are 1,206,355 rows where score_name='student_performance_idex'. >> >>> You'd need to look at pg_stats for the table and see how many distinct >>> values, and if student_performance_index is relatively high or low (or not >>> present) in the MCVs list. >>> >> I will look into that. >> >> >>> I am not sure if your query does what you want it to do as I admit I >>> didn't follow your explanation of the desired behavior. My hunch is that >>> you want to make use of a window function and get rid of one of the CTEs. >>> >> If you could tell me what part(s) are unclear I would appreciate it so >> that I can write a better comment. >> >> Thank you sooo much for all the feedback. It is *greatly* appreciated! >> Sincerely, >> >> Dane >> >>