I increased (and decreased) the stats target for the column and
re-analyzed. Didn't make a difference.

Is it possible that the row estimate is off because of a column other than
time? I looked at the # of events in that time period and 1.8 million is
actually a good estimate. What about the
((strpos(other_events_1004175222.hierarchy, '#close_onborading;'::text) <>
0) condition in the filter? It makes sense that Postgres wouldn't have a
way to estimate how selective this condition is.

On Tue, Aug 17, 2021 at 2:52 PM Justin Pryzby <pry...@telsasoft.com> wrote:

> On Mon, Aug 16, 2021 at 11:22:44AM -0400, Matt Dupree wrote:
> > > Is either half of the AND estimated correctly?  If you do a query
> > > with only ">=", and a query with only "<=", do either of them give an
> > > accurate rowcount estimate ?
> >
> > Dropping >= results in the correct index being used. Dropping <= doesn't
> > have this effect.
>
> This doesn't answer the question though: are the rowcount estimes accurate
> (say
> within 10%).
>
> It sounds like interpolating the histogram is giving a poor result, at
> least
> over that range of values.  It'd be interesting to see the entire
> histogram.
>
> You might try increasing (or decreasing) the stats target for that column,
> and
> re-analyzing.
>
> Your histogram bounds are for ~38 months of data, and your query is for the
> previous month (July).
>
> $ date -d @1530186399
> Thu Jun 28 06:46:39 CDT 2018
> $ date -d @1629125609
> Mon Aug 16 09:53:29 CDT 2021
>
> $ date -d @1627369200
> Tue Jul 27 02:00:00 CDT 2021
> $ date -d @1624777200
> Sun Jun 27 02:00:00 CDT 2021
>
> The timestamp column has ndistinct near -1, similar to a continuous
> distribution, so I'm not sure why the estimate would be so bad.
>
> --
> Justin
>


-- 

K. Matt Dupree

Data Science Engineer
321.754.0526  |  matt.dup...@heap.io

Reply via email to