*You are totally right, the max(score_value) FILTER (WHERE score_name = 'student_performance_index') in the SELECT clause is redundant.*
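The point is easier to see side by side. A minimal sketch of the reports CTE with the redundant SELECT-list expression dropped (same attempt_scores columns as in the query quoted below; this is an illustration, not the exact query Dane ran):

```sql
-- Sketch only: the FILTER expression is removed from the SELECT list.
-- The HAVING clause already guarantees that each surviving group's
-- max(score_value) is its 'student_performance_index' score, so
-- selecting it a second time with FILTER adds nothing.
WITH reports AS (
  SELECT student_id,
         assignment_id,
         max(attempt_report_id) AS maxid,
         max(score_value)       AS spi
  FROM attempt_scores
  GROUP BY student_id, assignment_id
  HAVING max(score_value) > 0
     AND max(score_value) FILTER (WHERE score_name = 'student_performance_index')
         = max(score_value)
)
SELECT * FROM reports;
```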
On Wed, Feb 17, 2021 at 21:33, Dane Foster <studdu...@gmail.com> wrote:

> On Wed, Feb 17, 2021 at 1:37 PM Yoan SULTAN <yey...@gmail.com> wrote:
>
>> *Hi all,*
>>
>> *This is my first post on this mailing list, I really enjoy it.*
>> *I wanted to add some details and answers to this discussion.*
>>
> I'm happy you've decided to join the conversation and opened up an
> entirely new avenue for me to investigate and learn from. I feel like
> I'm about to level up my SQL-fu! 😊
>
>
>> On Feb 17, 2021 at 17:52, Dane Foster <studdu...@gmail.com> wrote:
>>
>>> A small update (see below/inline).
>>>
>>> On Tue, Feb 16, 2021 at 2:11 PM Dane Foster <studdu...@gmail.com> wrote:
>>>
>>>> Short conclusion:
>>>> Switching from CTEs to temporary tables and analyzing reduced the
>>>> runtime from 15 minutes to about 1.5 minutes.
>>>>
>> *The attempt_scores table is pretty big and it is scanned 3 times; try
>> to rewrite the query in order to reduce the number of passes over this table.
>> For example:*
>>
>> *EXPLAIN (ANALYZE, BUFFERS)
>> WITH reports AS (
>>     SELECT student_id,
>>            assignment_id,
>>            max(score_value) FILTER (WHERE score_name = 'student_performance_index'),
>>            max(attempt_report_id) maxid,
>>            max(score_value) spi
>>     FROM attempt_scores
>>     GROUP BY student_id, assignment_id
>>     HAVING max(score_value) > 0
>>        AND max(score_value) FILTER (WHERE score_name = 'student_performance_index')
>>            = max(score_value)
>> )
>> SELECT avg(spi) spi,
>>        avg(CASE score_name WHEN 'digital_clinical_experience' THEN score_value END) dce,
>>        avg(CASE score_name WHEN 'tier1_subjective_data_collection' THEN score_value END) sdc
>> FROM attempt_scores
>> JOIN reports ON reports.maxid = attempt_scores.attempt_report_id;*
>>
> Given: HAVING max(score_value) > 0 AND max(score_value) FILTER (WHERE
> score_name = 'student_performance_index') = max(score_value)
>
> Why is there a FILTER clause on: max(score_value) FILTER (WHERE
> score_name = 'student_performance_index') but no FILTER clause on:
> max(attempt_report_id)?
>
> Some context for my question. I'm new to aggregate expressions, therefore
> I don't have a strong mental model of what's happening. So let me tell
> you what I *think* is happening and you can correct me.
>
> The new HAVING clause that you've added ensures that for each
> student/assignment group we are selecting the max spi value (i.e.,
> score_name = 'student_performance_index'). Therefore, isn't the FILTER
> clause in the SELECT section redundant? And if it's *not* redundant,
> then why isn't it necessary for: max(attempt_report_id)?
>
>> *Also, I would continue to increase work_mem up to 200MB until the
>> external merge is no longer required.*
>> *SET work_mem = '200MB'; -- to change it only at session level*
>>
>>>> Longer conclusion:
>>>>
>>>> @Justin Pryzby <pry...@telsasoft.com>
>>>>
>>>> - I experimented w/ materializing the CTEs and it helped at the
>>>> margins but did not significantly contribute to a reduction in runtime.
>>>> - No clustering was required because once I switched to temporary
>>>> tables the new plan no longer used the for_upsert index.
>>>>
>>>> @Michael Lewis <mle...@entrata.com>
>>>>
>>>> - Increasing work_mem to 100MB (up from 4MB) helped at the margins
>>>> (i.e., some hundreds of milliseconds of improvement) but did not
>>>> represent a significant reduction in the runtime.
>>>> - It wasn't obvious to me which window function would be appropriate
>>>> for the problem I was trying to solve, therefore I didn't experiment
>>>> w/ that approach.
>>>>
>>> I want to update/correct this statement:
>>>
>>>> - The selectivity of score_name = 'student_performance_index' was not
>>>> enough for the planner to choose an index over doing a FTS.
>>>>
>>> I added a partial index (WHERE
>>> score_name = 'student_performance_index'::citext) and that had a
>>> *dramatic* impact. That part of the query went from ~12 seconds to ~1
>>> second.
>>>
>> *Another way to generate perf. gains on this query: CREATE INDEX ON
>> attempt_scores USING hash (score_name); -- since score_name doesn't
>> seem to have a high cardinality*
>>
>>>> Finally, thank you both for helping me bring this poorly performing
>>>> query to heel. Your insights were helpful and greatly appreciated.
>>>>
>>>> Sincerely,
>>>>
>>>> Dane
>>>>
>>>> On Tue, Feb 16, 2021 at 10:25 AM Dane Foster <studdu...@gmail.com>
>>>> wrote:
>>>>
>>>>> On Tue, Feb 16, 2021 at 10:13 AM Michael Lewis <mle...@entrata.com>
>>>>> wrote:
>>>>>
>>>>>>> Sort Method: external merge  Disk: 30760kB
>>>>>>> Worker 0:  Sort Method: external merge  Disk: 30760kB
>>>>>>> Worker 1:  Sort Method: external merge  Disk: 30760kB
>>>>>>
>>>>>> If you can increase work_mem, even setting it temporarily higher for
>>>>>> the session or transaction, that may dramatically change the plan.
>>>>>>
>>>>> I will try increasing work_mem for the session later today.
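For readers following the thread, the two index ideas discussed above can be sketched as follows. The index names are illustrative (the thread does not name them), and note that PostgreSQL spells a hash index `CREATE INDEX ... USING hash`; there is no `CREATE HASH INDEX` statement:

```sql
-- Partial B-tree index covering only the rows the query actually reads;
-- this is the kind of index Dane reports took that part of the query
-- from ~12 seconds to ~1 second. Index name and key columns are
-- illustrative assumptions.
CREATE INDEX attempt_scores_spi_partial_idx
    ON attempt_scores (student_id, assignment_id)
    WHERE score_name = 'student_performance_index'::citext;

-- Hash index on the low-cardinality score_name column, as Yoan suggests.
-- Hash indexes support only equality lookups, which is all this query needs.
CREATE INDEX attempt_scores_score_name_hash_idx
    ON attempt_scores USING hash (score_name);
```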
>>>>>
>>>>>> The advice given by Justin, particularly about row estimates, would
>>>>>> be wise to pursue.
>>>>>>
>>>>>
>>>>>> I'd wonder how selective that condition of score_name =
>>>>>> 'student_performance_index' is in filtering out many of the 9.3
>>>>>> million tuples in that table, and if an index with that as the
>>>>>> leading column, or just an index on that column, would be helpful.
>>>>>>
>>>>> There are 1,206,355 rows where score_name = 'student_performance_index'.
>>>>>
>>>>>> You'd need to look at pg_stats for the table and see how many
>>>>>> distinct values there are, and whether student_performance_index is
>>>>>> relatively high or low (or not present) in the MCVs list.
>>>>>>
>>>>> I will look into that.
>>>>>
>>>>>> I am not sure if your query does what you want it to do, as I admit
>>>>>> I didn't follow your explanation of the desired behavior. My hunch
>>>>>> is that you want to make use of a window function and get rid of one
>>>>>> of the CTEs.
>>>>>>
>>>>> If you could tell me what part(s) are unclear, I would appreciate it
>>>>> so that I can write a better comment.
>>>>>
>>>>> Thank you sooo much for all the feedback. It is *greatly* appreciated!
>>>>>
>>>>> Sincerely,
>>>>>
>>>>> Dane
>>>>>
>>
>> --
>> Regards,
>> Yo.
>>
> Again, thanks for joining the conversation. I look forward to hearing
> from you.
>
> Sincerely,
>
> Dane

-- 
Regards,
Yo.
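For reference, the pg_stats check Michael suggests earlier in the thread can be written as follows (n_distinct, most_common_vals, and most_common_freqs are real columns of the pg_stats view; run it after the table has been analyzed):

```sql
-- How many distinct score_name values the planner believes exist, and
-- whether 'student_performance_index' shows up in the most-common-values
-- (MCV) list with a meaningful frequency.
SELECT n_distinct,
       most_common_vals,
       most_common_freqs
FROM pg_stats
WHERE tablename = 'attempt_scores'
  AND attname   = 'score_name';
```

If the value is present in most_common_vals, the planner's selectivity estimate for score_name = 'student_performance_index' comes straight from the matching entry in most_common_freqs.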