On Tue, Mar 23, 2021 at 2:52 AM Paul McGarry <p...@paulmcgarry.com> wrote:
> On Tue, 23 Mar 2021 at 16:13, Justin Pryzby <pry...@telsasoft.com> wrote:
>> On Tue, Mar 23, 2021 at 03:00:38PM +1100, Paul McGarry wrote:
>> > I have a query where PostgreSQL (11.9 at the moment) is making an odd
>> > plan choice, choosing to use index scans which require filtering out
>> > millions of rows, rather than "just" doing an aggregate over the rows
>> > the WHERE clause targets, which is much faster.
>> > AFAICT it isn't a statistics problem; at least, increasing the stats
>> > target and analyzing the table doesn't fix the problem.
>> >
>> > explain analyze select min(risk_id),max(risk_id) from risk where
>> > time>='2020-01-20 15:00:07+00' and time < '2020-01-21 15:00:08+00';
>>
>> I'm guessing the time and ID columns are highly correlated...
>>
>> So the planner thinks it can get the smallest ID by scanning the ID
>> index, but then ends up rejecting the first 161e6 rows, for which the
>> time is too low and the >= condition fails.
>>
>> And it thinks it can get the greatest ID by backward-scanning the ID
>> index, but ends up rejecting/filtering the first 41e6 rows, for which
>> the time is too high, failing the < condition.
>
> Yes, the columns are highly correlated, but that alone doesn't seem like
> it should be sufficient criteria to choose this plan.
> I.e. the selection criteria (one day of data, about a year ago) have a
> year-plus of data after them and probably a decade of data before them,
> so anything walking a correlated index from the top or the bottom is
> going to have to walk past a lot of data before it reaches rows that
> fit the criteria.

I assume you have a statistics object on the correlated columns, i.e.
`CREATE STATISTICS`?

If you can't use partitioning on your date column, can you use partial
indexes instead? Or a functional index with min() over the day and max()
over the day? Rough sketches of both ideas below.
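For the statistics idea, an untested sketch of what I mean (the object
name is made up; note that on 11.x only the ndistinct and dependencies
kinds exist, the mcv kind arrived in 12):

    -- Extended statistics on the correlated columns (hypothetical name).
    -- On PostgreSQL 11.x only ndistinct and dependencies are available.
    CREATE STATISTICS risk_time_id_stats (ndistinct, dependencies)
        ON time, risk_id FROM risk;
    -- Extended statistics are only populated by ANALYZE.
    ANALYZE risk;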
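And for the index ideas, something along these lines (index names and
the UTC day boundary are my assumptions, adjust to your schema):

    -- Partial index covering just a hot date range, so a min()/max()
    -- scan never has to step over rows outside the range. The planner
    -- will only use it when the query's WHERE clause implies the
    -- index predicate.
    CREATE INDEX risk_id_20200120_idx ON risk (risk_id)
        WHERE time >= '2020-01-20 15:00:07+00'
          AND time <  '2020-01-21 15:00:08+00';

    -- Or a composite functional index keyed on the (UTC) day, so the
    -- min/max risk_id for any one day is a short index descent.
    -- (time AT TIME ZONE 'UTC' yields a plain timestamp, and both it
    -- and date_trunc on a timestamp are immutable, so this is indexable.)
    CREATE INDEX risk_day_id_idx ON risk
        (date_trunc('day', time AT TIME ZONE 'UTC'), risk_id);

The second one would want the query rewritten to filter on
date_trunc('day', time AT TIME ZONE 'UTC') so the planner can match the
index expression.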