> On Jan 13, 2020, at 3:19 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> 
> Israel Brewster <ijbrews...@alaska.edu> writes:
>> In looking at the explain analyze output, I noticed that it had an “external 
>> merge Disk” sort going on, accounting for about 1 second of the runtime 
>> (explain analyze output here: https://explain.depesz.com/s/jx0q 
>> <https://explain.depesz.com/s/jx0q>). Since the machine has plenty of RAM 
>> available, I went ahead and increased the work_mem parameter. Whereupon the 
>> query plan got much simpler, and performance of said query completely 
>> tanked, increasing to about 15.5 seconds runtime 
>> (https://explain.depesz.com/s/Kl0S <https://explain.depesz.com/s/Kl0S>), 
>> most of which was in a HashAggregate.
>> How can I fix this? Thanks.
> 
> Well, the brute-force way not to get that plan is "set enable_hashagg =
> false".  But it'd likely be a better idea to try to improve the planner's
> rowcount estimates.  The problem here seems to be lack of stats for
> either "time_bucket('1 week', read_time)" or "read_time::date".
> In the case of the latter, do you really need a coercion to date?
> If it's a timestamp column, I'd think not.  As for the former,
> if the table doesn't get a lot of updates then creating an expression
> index on that expression might be useful.
> 

Thanks for the suggestions. Disabling hash aggregates actually made things even 
worse: (https://explain.depesz.com/s/cjDg <https://explain.depesz.com/s/cjDg>), 
so even if that wasn’t a brute-force option, it doesn’t appear to be a good 
one. Creating an index on the time_bucket expression didn’t seem to make any 
difference, and my data does get a lot of additions (though virtually no 
changes) anyway (about 1 additional record per second). As far as coercion to 
date, that’s so I can do queries bounded by date, and actually have all results 
from said date included. That said, I could of course simply make sure that 
when I get a query parameter of, say, 2020-1-13, I expand that into a full 
date-time for the end of the day. However, doing so for a test query didn’t 
seem to make much of a difference either: https://explain.depesz.com/s/X5VT 
<https://explain.depesz.com/s/X5VT>

So, to summarise:

Set enable_hasagg=off: worse
Index on time_bucket expression: no change in execution time or query plan that 
I can see
Get rid of coercion to date: *slight* improvement. 14.692 seconds instead of 
15.5 seconds. And it looks like the row count estimates were actually worse.
Lower work_mem, forcing a disk sort and completely different query plan: Way, 
way better (around 6 seconds)

…so so far, it looks like the best option is to lower the work_mem, run the 
query, then set it back?
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

>                       regards, tom lane

Reply via email to