Heh, I honestly forgot about the recursive CTE. Certainly worth a try and wouldn't require installing other extensions.
This is what depesz is referring to: https://wiki.postgresql.org/wiki/Loose_indexscan On Thu, Sep 23, 2021 at 3:04 AM hubert depesz lubaczewski <dep...@depesz.com> wrote: > On Wed, Sep 22, 2021 at 12:05:22PM -0800, Israel Brewster wrote: > > I was wondering if there was any way to improve the performance of this > query: > > > > SELECT station,array_agg(distinct(channel)) as channels FROM data GROUP > BY station; > > > > The explain execution plan can be found here: > > https://explain.depesz.com/s/mtxB#html < > https://explain.depesz.com/s/mtxB#html> > > > > and it looks pretty straight forward. It does an index_only scan, > followed by an aggregate, to produce a result that is a list of stations > along with a list of channels associated with each (there can be anywhere > from 1 to 3 channels associated with each station). This query takes around > 5 minutes to run. > > > > To work around the issue, I created a materialized view that I can > update periodically, and of course I can query said view in no time flat. > However, I’m concerned that as the dataset grows, the time it takes to > refresh the view will also grow (correct me if I am wrong there). > > > > This is running PostgreSQL 13, and the index referenced is a two-column > index on data(station, channel) > > It looks that there is ~ 170 stations, and ~ 800 million rows int he > table. > > can you tell us how many rows has this: > > select distinct station, channel from data; > > If this is not huge, then you can make the query run much faster using > skip scan - recursive cte. > > Best regards, > > depesz > > > >