I increased (and decreased) the stats target for the column and re-analyzed. Didn't make a difference.
Is it possible that the row estimate is off because of a column other than time? I looked at the # of events in that time period and 1.8 million is actually a good estimate. What about the ((strpos(other_events_1004175222.hierarchy, '#close_onborading;'::text) <> 0) condition in the filter? It makes sense that Postgres wouldn't have a way to estimate how selective this condition is. On Tue, Aug 17, 2021 at 2:52 PM Justin Pryzby <pry...@telsasoft.com> wrote: > On Mon, Aug 16, 2021 at 11:22:44AM -0400, Matt Dupree wrote: > > > Is either half of the AND estimated correctly? If you do a query > > > with only ">=", and a query with only "<=", do either of them give an > > > accurate rowcount estimate ? > > > > Dropping >= results in the correct index being used. Dropping <= doesn't > > have this effect. > > This doesn't answer the question though: are the rowcount estimes accurate > (say > within 10%). > > It sounds like interpolating the histogram is giving a poor result, at > least > over that range of values. It'd be interesting to see the entire > histogram. > > You might try increasing (or decreasing) the stats target for that column, > and > re-analyzing. > > Your histogram bounds are for ~38 months of data, and your query is for the > previous month (July). > > $ date -d @1530186399 > Thu Jun 28 06:46:39 CDT 2018 > $ date -d @1629125609 > Mon Aug 16 09:53:29 CDT 2021 > > $ date -d @1627369200 > Tue Jul 27 02:00:00 CDT 2021 > $ date -d @1624777200 > Sun Jun 27 02:00:00 CDT 2021 > > The timestamp column has ndistinct near -1, similar to a continuous > distribution, so I'm not sure why the estimate would be so bad. > > -- > Justin > -- K. Matt Dupree Data Science Engineer 321.754.0526 | matt.dup...@heap.io