> On Jan 13, 2020, at 5:41 PM, Israel Brewster <ijbrews...@alaska.edu> wrote:
> 
>> On Jan 13, 2020, at 3:19 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> 
>> Israel Brewster <ijbrews...@alaska.edu> writes:
>>> In looking at the explain analyze output, I noticed that it had an
>>> “external merge Disk” sort going on, accounting for about 1 second of the
>>> runtime (explain analyze output here: https://explain.depesz.com/s/jx0q).
>>> Since the machine has plenty of RAM available, I went ahead and increased
>>> the work_mem parameter. Whereupon the query plan got much simpler, and
>>> performance of said query completely tanked, increasing to about 15.5
>>> seconds runtime (https://explain.depesz.com/s/Kl0S), most of which was in
>>> a HashAggregate.
>>> How can I fix this? Thanks.
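>>> 
>>> (For reference, “increased the work_mem parameter” just means something
>>> along these lines, the value being a placeholder:
>>> 
>>>     SET work_mem = '256MB';
>>> 
>>> followed by re-running EXPLAIN ANALYZE on the same query.)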
>> 
>> Well, the brute-force way not to get that plan is "set enable_hashagg =
>> false".  But it'd likely be a better idea to try to improve the planner's
>> rowcount estimates.  The problem here seems to be lack of stats for
>> either "time_bucket('1 week', read_time)" or "read_time::date".
>> In the case of the latter, do you really need a coercion to date?
>> If it's a timestamp column, I'd think not.  As for the former,
>> if the table doesn't get a lot of updates then creating an expression
>> index on that expression might be useful.
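>> 
>> Concretely, that would be something like this (the table and column
>> names here are guesses based on your query, so adjust to your schema):
>> 
>>     SET enable_hashagg = false;  -- the brute-force option, session-local
>> 
>>     CREATE INDEX readings_week_idx
>>         ON readings (time_bucket('1 week', read_time));
>>     ANALYZE readings;  -- gather stats on the indexed expression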
>> 
> 
> Thanks for the suggestions. Disabling hash aggregates actually made things
> even worse (https://explain.depesz.com/s/cjDg), so even if that weren’t a
> brute-force option, it doesn’t appear to be a good one. Creating an index
> on the time_bucket expression didn’t seem to make any difference, and my
> data does get a lot of additions anyway (about one additional record per
> second, though virtually no changes). As for the coercion to date, that’s
> so I can run queries bounded by date and actually have all results from
> said date included. That said, I could of course simply make sure that when
> I get a query parameter of, say, 2020-1-13, I expand that into a full
> date-time for the end of the day. However, doing so for a test query didn’t
> seem to make much of a difference either: https://explain.depesz.com/s/X5VT
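> 
> (By “expand” I mean replacing the date cast with a half-open timestamp
> range, along these lines; read_time is from the discussion above, and the
> literal values are illustrative:
> 
>     WHERE read_time >= '2020-01-13 00:00:00'
>       AND read_time <  '2020-01-14 00:00:00'
> 
> rather than comparing read_time::date against the date.)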
> 
> So, to summarise:
> 
> Set enable_hashagg=off: worse
> Index on time_bucket expression: no change in execution time or query plan 
> that I can see
> Get rid of coercion to date: *slight* improvement. 14.692 seconds instead of 
> 15.5 seconds. And it looks like the row count estimates were actually worse.
> Lower work_mem, forcing a disk sort and completely different query plan: Way, 
> way better (around 6 seconds)
> 
> …so, so far, it looks like the best option is to lower work_mem, run the
> query, then set it back?
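> 
> If it comes to that, the change can at least be scoped to a single
> transaction so nothing else is affected. A sketch, with a placeholder
> value and a stand-in query:
> 
>     BEGIN;
>     SET LOCAL work_mem = '4MB';     -- the lower value goes here
>     SELECT count(*) FROM readings;  -- stand-in for the real query
>     COMMIT;                         -- work_mem reverts automatically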
> ---

I don’t see that you’ve updated the statistics?
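
For example (the table name here is a stand-in):

    ANALYZE readings;

Until ANALYZE (or autovacuum’s analyze) runs after the expression index is
created, the planner has no statistics for the time_bucket expression, which
would explain why the new index made no visible difference to the plan.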

