Hi all,

This is my first post on this mailing list; I really enjoy it. I wanted to
add some details and answers to this discussion.
On Feb 17, 2021, at 17:52, Dane Foster <studdu...@gmail.com> wrote:
>
> A small update (see below/inline).
>
> On Tue, Feb 16, 2021 at 2:11 PM Dane Foster <studdu...@gmail.com> wrote:
>
>> Short conclusion:
>> Switching from CTEs to temporary tables and analyzing reduced the runtime
>> from 15 minutes to about 1.5 minutes.

The attempt_scores table is pretty big and it is scanned 3 times; try to
rewrite the query to reduce the number of passes over this table. For
example:

    EXPLAIN (ANALYZE, BUFFERS)
    WITH reports AS (
        SELECT student_id,
               assignment_id,
               max(score_value) FILTER (WHERE score_name = 'student_performance_index'),
               max(attempt_report_id) maxid,
               max(score_value) spi
        FROM attempt_scores
        GROUP BY student_id, assignment_id
        HAVING max(score_value) > 0
           AND max(score_value) FILTER (WHERE score_name = 'student_performance_index') = max(score_value)
    )
    SELECT avg(spi) spi,
           avg(CASE score_name WHEN 'digital_clinical_experience' THEN score_value END) dce,
           avg(CASE score_name WHEN 'tier1_subjective_data_collection' THEN score_value END) sdc
    FROM attempt_scores
    JOIN reports ON reports.maxid = attempt_scores.attempt_report_id;

Also, I would continue to increase work_mem up to 200MB until the external
merge is no longer required:

    SET work_mem = '200MB';  -- changes the setting only at the session level

>> Longer conclusion:
>>
>> @Justin Pryzby <pry...@telsasoft.com>
>>
>> - I experimented w/ materializing the CTEs and it helped at the
>>   margins but did not significantly contribute to a reduction in runtime.
>> - No clustering was required because once I switched to temporary
>>   tables the new plan no longer used the for_upsert index.
>>
>> @Michael Lewis <mle...@entrata.com>
>>
>> - Increasing work_mem to 100MB (up from 4MB) helped at the margins
>>   (i.e., some hundreds of milliseconds of improvement) but did not
>>   represent a significant reduction in the runtime.
>> - It wasn't obvious to me which window function would be appropriate
>>   for the problem I was trying to solve, therefore I didn't experiment
>>   w/ that approach.
>>
>> I want to update/correct this statement:
>>
>> - The selectivity of score_name='student_performance_index' was not
>>   enough for the planner to choose an index over doing a FTS.
>
> I added a partial index (WHERE
> score_name='student_performance_index'::citext) and that had a *dramatic*
> impact. That part of the query went from ~12 seconds to ~1 second.

Another way to generate performance gains on this query:

    CREATE INDEX ON attempt_scores USING hash (score_name);

since score_name doesn't seem to have a high cardinality.

>> Finally, thank you both for helping me bring this poorly performing
>> query to heel. Your insights were helpful and greatly appreciated.
>>
>> Sincerely,
>>
>> Dane
>>
>> On Tue, Feb 16, 2021 at 10:25 AM Dane Foster <studdu...@gmail.com> wrote:
>>
>>> On Tue, Feb 16, 2021 at 10:13 AM Michael Lewis <mle...@entrata.com>
>>> wrote:
>>>
>>>> Sort Method: external merge Disk: 30760kB
>>>> Worker 0: Sort Method: external merge Disk: 30760kB
>>>> Worker 1: Sort Method: external merge Disk: 30760kB
>>>>
>>>> If you can increase work_mem, even setting it temporarily higher for
>>>> the session or transaction, that may dramatically change the plan.
>>>
>>> I will try increasing work_mem for the session later today.
>>>
>>>> The advice given by Justin, particularly about row estimates, would
>>>> be wise to pursue.
>>>>
>>>> I'd wonder how selective that condition of score_name =
>>>> 'student_performance_index' is in filtering out many of the 9.3
>>>> million tuples in that table, and if an index with that as the
>>>> leading column, or just an index on that column, would be helpful.
>>>
>>> There are 1,206,355 rows where score_name='student_performance_index'.
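For reference, the partial index Dane describes above would look something
like the sketch below. The index name and the choice of indexed columns are
my own guesses (based on the query's GROUP BY); only the table, the WHERE
clause, and the citext cast come from the thread:

```sql
-- Partial index covering only the 'student_performance_index' rows,
-- so it stays much smaller than a full index on the 9.3M-row table.
-- spi_partial_idx and the column list are hypothetical choices.
CREATE INDEX spi_partial_idx
    ON attempt_scores (student_id, assignment_id, score_value)
    WHERE score_name = 'student_performance_index'::citext;
```

With roughly 1.2M matching rows out of 9.3M, the partial index both shrinks
the structure the planner has to traverse and makes the index-vs-seqscan
choice easier for it.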
>>>> You'd need to look at pg_stats for the table and see how many
>>>> distinct values there are, and whether student_performance_index is
>>>> relatively high or low (or not present) in the MCVs list.
>>>
>>> I will look into that.
>>>
>>>> I am not sure if your query does what you want it to do, as I admit I
>>>> didn't follow your explanation of the desired behavior. My hunch is
>>>> that you want to make use of a window function and get rid of one of
>>>> the CTEs.
>>>
>>> If you could tell me what part(s) are unclear I would appreciate it so
>>> that I can write a better comment.
>>>
>>> Thank you sooo much for all the feedback. It is *greatly* appreciated!
>>>
>>> Sincerely,
>>>
>>> Dane

--
Regards,
Yo.
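P.S. A minimal sketch of the pg_stats lookup suggested above, to check the
cardinality and most-common-values list for score_name (the table and
column names are the ones discussed in the thread):

```sql
-- How many distinct values score_name has, and how frequent each of the
-- most common ones is, according to the planner's statistics.
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'attempt_scores'
  AND attname = 'score_name';
```

If 'student_performance_index' shows up in most_common_vals with a large
frequency, the planner already knows the predicate is not very selective,
which explains its preference for a sequential scan before the partial
index was added.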