Dear Michael,

On Thu, Oct 10, 2019 at 5:20 PM Michael Lewis <mle...@entrata.com> wrote:

> Since the optimizer is choosing a seq scan over index scan when it seems
> like it has good row estimates in both cases, to me that may mean costs of
> scanning index are expected to be high. Is this workload on SSD? Has the
> random_page_cost config been decreased from default 4 (compared with cost
> of 1 unit for sequential scan)?
>

It's 1.5


> Your buffer hits aren't great. What is shared_buffers set to? How much ram
> on this cluster?
>

shared_buffers is 4GB.  It has 500G of RAM, but server has several clusters
on it.


>
> With this table being insert only, one assumes correlation is very high on
> the data in this column as shown in pg_stats, but have your confirmed?
>

Yes, but the issue isn't with the BRIN index performing badly or being
fragmented.  It's that it performs great (7x faster than the seq scan) but
postgres doesn't pick using it.  I have seen this same issue also in other
attempts I have made to use BRIN.


> To me, distinct ON is often a bad code smell and probably can be
> re-written to be much more efficient with GROUP BY, lateral & order by, or
> some other tool. Same with the window function. It is a powerful tool, but
> sometimes not the right one.
>

I don't really agree, but it's beside the point because the issue is not in
aggregation.  It's pre-aggregation.  Indeed if I run my query as a simple
select (as I tried) it's the exact same planning issue.  (In my experience,
distinct on for given example is the fastest.  Same with window functions
which prevent inefficient self-joins)


> Is "source" a function that is called on field1? What is it doing/how is
> it defined?
>

I can't see how that matters either, but the "source" function is a mask
for a built-in pg function that is trivial.  This whole query is masked so
as not to expose our actual prod query, but I hope it's still
understandable enough :).

My question is not how to make this query faster in general.  It's that I
want to use BRIN indexes very much, but I'm not sure I can trust they will
scale with the right query plan like I know BTREE will.

Thanks!
Jeremy

Reply via email to