> On Sep 22, 2021, at 5:10 PM, David Rowley <dgrowle...@gmail.com> wrote: > > On Thu, 23 Sept 2021 at 08:27, David G. Johnston > <david.g.johns...@gmail.com> wrote: >> >> On Wed, Sep 22, 2021 at 1:21 PM Michael Lewis <mle...@entrata.com> wrote: >>> If there is correlation between station & channel, then you might look at >>> creating a multivariate statistics object and analyzing the table so the >>> planner can make better choices >> >> >> There is no where clause so I'm doubtful there is much to be gained going >> down this path. The Index-Only scan seems like an optimal way to obtain >> this data and the existing query already does that. The aggregation path >> might vary though it seems like that shouldn't be the case here. > > ndistinct extended statistics would be used to estimate the number of > groups in the GROUP BY clause for the version of the query that > performs GROUP BY station, channel. We've not seen the EXPLAIN > ANALYZE for that query, so don't know if there's any use for extended > statistics there. However, if the planner was to think there were > more groups than there actually are, then it would be less inclined to > do parallelise the GROUP BY. I think writing the query in such a way > that allows it to be parallelised is likely going to result in some > quite good performance improvements. i.e: > > select station, array_agg(channel) as channels > FROM( > SELECT station,channel FROM data GROUP BY station,channel > ) AS sub > group by station; >
Ah, yes indeed. That version runs in about 30 seconds rather than 5 minutes! See the explain analyze output here: https://explain.depesz.com/s/L5Bf <https://explain.depesz.com/s/L5Bf> It looks more complicated, but being able to run parallel definitely makes a difference, and there may be some other improvements in there that I’m not aware of as well! Still not quite fast enough for real-time queries, but certainly fast enough to keep a materialized view updated. And this is why I love postgresql and this community - when something isn’t working as well as I would like, there is usually a way to improve it drastically :-) Thanks! --- Israel Brewster Software Engineer Alaska Volcano Observatory Geophysical Institute - UAF 2156 Koyukuk Drive Fairbanks AK 99775-7320 Work: 907-474-5172 cell: 907-328-9145 > which is just the same as Michael's version but without DISTINCT. > Also, Tom's point about lots of heap fetches is going to count for > quite a bit too, especially so if I/O plays a large part in the total > query time. > > David