James R Skaggs <james.r.ska...@seagate.com> wrote:

> Agreed, we shouldn't have so many dead rows. Our autovacuum is
> set on but has default parameters. We are clustering today.
> This, of course, means downtime and inconvenience to the users.
Right, which is why it's important to figure out why the bloat happened. Sometimes it is unavoidable, like when you delete 90% of the rows in your table or a long-lived "idle in transaction" connection prevents autovacuum from doing its work normally. To prevent further downtime, it is important to figure out what happened and make appropriate changes to your monitoring or vacuuming.

> Here is the troublesome query:
>
>> select
>>   sum(case when t1.cdlc < 0 then 0 else t1.cdlc end) as code_line_cnt,
>>   sum(case when t1.cmlc < 0 then 0 else t1.cmlc end) as comment_line_cnt,
>>   sum(case when t1.bllc < 0 then 0 else t1.bllc end) as blank_line_cnt
>> from
>>   ( select
>>       max(sf.current_code_line_count) as cdlc,
>>       max(sf.current_comment_line_count) as cmlc,
>>       max(sf.current_blank_line_count) as bllc
>>     from
>>       stream_file sf
>>     group by sf.file_path_id, sf.current_source_md5
>>   ) as t1;

Well, I don't see that the planner has a lot of choice there besides whether to use a sort or a hash to do the inner aggregation. Are you saying that prior to the bloat it used a hash aggregation, and that was faster? And that you feel it should still be using that even with the bloat? That the dead rows seem to be getting included in the statistics, driving it to the slower plan, and you feel they should be omitted?

Note that I'm not arguing one way or another on these points at the moment; I'm just trying to understand your point clearly.

-Kevin
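To make the monitoring and plan-comparison suggestions above concrete, here is a minimal sketch of the kinds of checks involved, assuming PostgreSQL 9.2 or later (for the state column in pg_stat_activity); the thresholds and intervals are illustrative only:

    -- Tables accumulating dead rows, and when autovacuum last touched them:
    SELECT relname,
           n_live_tup,
           n_dead_tup,
           last_autovacuum,
           last_autoanalyze
      FROM pg_stat_user_tables
     WHERE n_dead_tup > 100000          -- illustrative threshold
     ORDER BY n_dead_tup DESC;

    -- Long-lived "idle in transaction" sessions that keep vacuum from
    -- reclaiming dead rows:
    SELECT pid, usename, xact_start, query
      FROM pg_stat_activity
     WHERE state = 'idle in transaction'
       AND xact_start < now() - interval '15 minutes';

    -- To see which inner-aggregation strategy the planner chooses for the
    -- query under discussion, and how the alternative performs:
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT max(sf.current_code_line_count),
           max(sf.current_comment_line_count),
           max(sf.current_blank_line_count)
      FROM stream_file sf
     GROUP BY sf.file_path_id, sf.current_source_md5;

    SET enable_hashagg = off;  -- session-local; undo with RESET enable_hashagg
    -- ...then re-run the EXPLAIN and compare the HashAggregate and
    -- GroupAggregate (sort-based) timings.

Comparing the two plans side by side would show whether the slowdown really comes from the planner switching away from hash aggregation once the dead-row-inflated statistics are in place.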