> On Jan 13, 2020, at 3:46 PM, Rob Sargent <robjsarg...@gmail.com> wrote:
>
>> On Jan 13, 2020, at 5:41 PM, Israel Brewster <ijbrews...@alaska.edu> wrote:
>>
>>> On Jan 13, 2020, at 3:19 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>>
>>> Israel Brewster <ijbrews...@alaska.edu> writes:
>>>> In looking at the explain analyze output, I noticed that it had an
>>>> “external merge Disk” sort going on, accounting for about 1 second of
>>>> the runtime (explain analyze output here:
>>>> https://explain.depesz.com/s/jx0q). Since the machine has plenty of
>>>> RAM available, I went ahead and increased the work_mem parameter.
>>>> Whereupon the query plan got much simpler, and performance of said
>>>> query completely tanked, increasing to about 15.5 seconds runtime
>>>> (https://explain.depesz.com/s/Kl0S), most of which was in a
>>>> HashAggregate.
>>>> How can I fix this? Thanks.
>>>
>>> Well, the brute-force way not to get that plan is "set enable_hashagg =
>>> false". But it'd likely be a better idea to try to improve the planner's
>>> rowcount estimates. The problem here seems to be lack of stats for
>>> either "time_bucket('1 week', read_time)" or "read_time::date".
>>> In the case of the latter, do you really need a coercion to date?
>>> If it's a timestamp column, I'd think not. As for the former,
>>> if the table doesn't get a lot of updates then creating an expression
>>> index on that expression might be useful.
>>
>> Thanks for the suggestions. Disabling hash aggregates actually made
>> things even worse (https://explain.depesz.com/s/cjDg), so even if that
>> weren’t a brute-force option, it doesn’t appear to be a good one.
>> Creating an index on the time_bucket expression didn’t seem to make any
>> difference, and my data does get a lot of additions anyway (about one
>> new record per second, though virtually no changes). As for the coercion
>> to date, that’s so I can run queries bounded by date and actually have
>> all results from that date included. That said, I could of course simply
>> make sure that when I get a query parameter of, say, 2020-1-13, I expand
>> it into a full date-time for the end of the day. However, doing so for a
>> test query didn’t seem to make much of a difference either:
>> https://explain.depesz.com/s/X5VT
>>
>> So, to summarise:
>>
>> Set enable_hashagg=off: worse
>> Index on time_bucket expression: no change in execution time or query
>> plan that I can see
>> Get rid of coercion to date: *slight* improvement (14.692 seconds
>> instead of 15.5), and it looks like the row count estimates were
>> actually worse
>> Lower work_mem, forcing a disk sort and a completely different query
>> plan: way, way better (around 6 seconds)
>>
>> …so, so far it looks like the best option is to lower the work_mem, run
>> the query, then set it back?
>> ---
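On the date-bounding point in my summary above: rather than expanding the
parameter to the end of the day, the variant that seems cleaner is a
half-open range on the raw timestamp column, which avoids the ::date
coercion entirely. A rough sketch (the table name and the bounds are
stand-ins for my real ones; read_time is the actual column):

    -- Instead of: WHERE read_time::date <= '2020-01-13'
    SELECT *
    FROM readings                               -- stand-in table name
    WHERE read_time >= timestamp '2020-01-01'
      AND read_time <  date '2020-01-13' + 1;   -- everything through the end of Jan 13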
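And on the “lower the work_mem, run the query, then set it back” question:
if that does end up being the least-bad option, it looks like it can at
least be scoped to a single transaction with SET LOCAL instead of being
toggled server-wide. A rough sketch (the 4MB value and the query body are
placeholders, not my real ones):

    BEGIN;
    SET LOCAL work_mem = '4MB';  -- placeholder: just low enough to force the disk-sort plan
    SELECT time_bucket('1 week', read_time) AS bucket, count(*)
    FROM readings                -- stand-in table name
    GROUP BY bucket
    ORDER BY bucket;
    COMMIT;                      -- SET LOCAL reverts automatically at transaction end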
>
> I don’t see that you’ve updated the statistics?

Ummmm….no. I know nothing about that :-) Some research tells me that a) it
should happen as part of the autovacuum process, and b) I may not be running
autovacuum often enough, since it is a large table and doesn’t change often.
But I don’t really know. (A sketch of what I plan to try is in the P.S.
below.)

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145
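P.S. For the archives, here’s what I’m planning to try on the statistics
front; the table name is a stand-in for my real one, and the scale factor
is a guess on my part:

    -- One-off refresh of the planner stats (this also gathers stats on the
    -- time_bucket expression index, which only exist after an ANALYZE):
    ANALYZE readings;

    -- Make autoanalyze fire more often on a big, insert-mostly table:
    -- re-analyze after ~1% of rows change instead of the default 10%.
    ALTER TABLE readings SET (autovacuum_analyze_scale_factor = 0.01);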