> On Sep 22, 2021, at 1:50 PM, Ryan Booz <r...@timescale.com> wrote: > > Cool. I'd be interested to see the explain on it if you ever try it again. On > that cardinality, I'd expect it to be really fast, so I'm interested to see > if the (SkipScan) nodes were actually used.
With timescaledb extension installed, the explain is what I posted in the original message (https://explain.depesz.com/s/mtxB#html <https://explain.depesz.com/s/mtxB#html>). Without timescaledb installed, the explain looks the same, except it takes twice as long to run. Unless I missed something in your message, i.e. some sort of tweak to the query to get it to use the timescaledb features? --- Israel Brewster Software Engineer Alaska Volcano Observatory Geophysical Institute - UAF 2156 Koyukuk Drive Fairbanks AK 99775-7320 Work: 907-474-5172 cell: 907-328-9145 > > On Wed, Sep 22, 2021 at 5:35 PM Israel Brewster <ijbrews...@alaska.edu > <mailto:ijbrews...@alaska.edu>> wrote: > >> On Sep 22, 2021, at 12:49 PM, Ryan Booz <r...@timescale.com >> <mailto:r...@timescale.com>> wrote: >> >> [Timescale Dev Advocate here] >> I realize this might not be the most accepted answer (could be interpreted >> as trying to "sell" something), but feels like an opportunity to talk about >> DISTINCT queries and opportunities. Because you have that index, Timescale >> 2.3 added a "Skip Scan" query planner node that works on regular BTree >> indexes (it doesn't have to be time-series/TimescaleDB Hypertable data at >> all). In this case, your distinct query would likely run in a few >> milliseconds based on the counts you mention (170 stations, 3 channels per >> station), and then the outer aggregation would do the GROUP BY. So, you >> **could** add the TimescaleDB extension to your database (or a copy of) and >> give it a try. You don't actually need to use any TimescaleDB features >> otherwise. > > I had actually already done that, as I was considering, in spite of past > negative experiences with timescaledb, experimenting with it on this DB to > see if it worked any better with this data. Out of curiosity, I tried > removing the timescaledb extension, whereupon the query in question took > roughly twice as long. So you are right that installing timescaledb speeds > things up, even when not using any timescaledb specific functions. So that > was a good call. Thanks! > > --- > Israel Brewster > Software Engineer > Alaska Volcano Observatory > Geophysical Institute - UAF > 2156 Koyukuk Drive > Fairbanks AK 99775-7320 > Work: 907-474-5172 > cell: 907-328-9145 > >> >> A writeup of why this kind of DISTINCT query is slow in PostgreSQL (for now) >> and what we did to overcome it: >> https://blog.timescale.com/blog/how-we-made-distinct-queries-up-to-8000x-faster-on-postgresql/ >> >> <https://blog.timescale.com/blog/how-we-made-distinct-queries-up-to-8000x-faster-on-postgresql/> >> Plans for a similar feature in PostgreSQL proper that we'd totally support >> but hasn't made forward progress yet: >> https://commitfest.postgresql.org/19/1741/ >> <https://commitfest.postgresql.org/19/1741/> >> Anyway, it might be worth a shot. HTH >> >> Ryan B >> >> On Wed, Sep 22, 2021 at 4:27 PM David G. Johnston >> <david.g.johns...@gmail.com <mailto:david.g.johns...@gmail.com>> wrote: >> On Wed, Sep 22, 2021 at 1:21 PM Michael Lewis <mle...@entrata.com >> <mailto:mle...@entrata.com>> wrote: >> In the future, please share the plan returned by explain analyze, and some >> data about how many rows in the involved tables, >> >> I believe we consider it acceptable to link to an explain viewer, which is >> what the OP did. Reading explain output in email has its own challenges, >> and I'd rather have the website than a text attachment. >> >> >> How does the below work? It should do a very simple index scan only, then >> aggregate the relative few rows after the fact. >> >> select station, array_agg(distinct(channel)) as channels >> FROM( >> SELECT station,channel FROM data GROUP BY station,channel >> ) AS sub >> group by station; >> >> Yeah, am pondering this too, though seems like the queries should be >> identical so the plan/execution should be the same either way. >> >> >> If there is correlation between station & channel, then you might look at >> creating a multivariate statistics object and analyzing the table so the >> planner can make better choices >> >> There is no where clause so I'm doubtful there is much to be gained going >> down this path. The Index-Only scan seems like an optimal way to obtain >> this data and the existing query already does that. The aggregation path >> might vary though it seems like that shouldn't be the case here. >> >> David J. >