On Fri, Oct 11, 2019 at 09:08:05AM -0500, Jeremy Finzel wrote:
On Thu, Oct 10, 2019 at 7:22 PM David Rowley <david.row...@2ndquadrant.com>
wrote:

The planner might be able to get a better estimate on the number of
matching rows if the now() - interval '10 days' expression was
replaced with 'now'::timestamptz - interval '10 days'. However, care
would need to be taken to ensure the plan is never prepared since
'now' is evaluated during parse. The same care must be taken when
creating views, functions, stored procedures and the like.
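
For illustration, a minimal sketch of that parse-time behaviour (the prepared statement name is made up; log_table and rec_insert_time are taken from the query below):

PREPARE recent_rows AS  -- hypothetical statement name
  SELECT * FROM log_table
  WHERE rec_insert_time >= 'now'::timestamptz - interval '10 days';
-- 'now' was coerced to a timestamptz constant while the PREPARE was
-- parsed, so every EXECUTE reuses that frozen timestamp; now() would
-- instead be re-evaluated at each execution.
EXECUTE recent_rows;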


I think you are on to something here with the now() function, even if the
above suggestion is not exactly right, as you said further down. I am finding
that a hard-coded timestamp gives the right query plan. I also tested the
same query with an even bigger window (the last 16 days), and it still
chooses the BRIN index.

foo_prod=# EXPLAIN
foo_prod-# SELECT
foo_prod-#  category, source, MIN(rec_insert_time) OVER (partition by source order by rec_insert_time) AS first_source_time, MAX(rec_insert_time) OVER (partition by source order by rec_insert_time) AS last_source_time
foo_prod-# FROM (SELECT DISTINCT ON (brand_id, last_change, log_id)
foo_prod(# category, source(field1) AS source, rec_insert_time
foo_prod(# FROM log_table l
foo_prod(# INNER JOIN public.small_join_table filter ON filter.category = l.category
foo_prod(# WHERE field1 IS NOT NULL AND l.category = 'music'
foo_prod(# AND l.rec_insert_time >= now() - interval '10 days'
foo_prod(# ORDER BY brand_id, last_change, log_id, rec_insert_time DESC)
unique_cases;

  QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=24436329.10..24436343.56 rows=643 width=120)
   ->  Sort  (cost=24436329.10..24436330.70 rows=643 width=104)
         Sort Key: unique_cases.source, unique_cases.rec_insert_time
         ->  Subquery Scan on unique_cases  (cost=24436286.24..24436299.10 rows=643 width=104)
               ->  Unique  (cost=24436286.24..24436292.67 rows=643 width=124)
                     ->  Sort  (cost=24436286.24..24436287.85 rows=643 width=124)
                           Sort Key: l.brand_id, l.last_change, l.log_id, l.rec_insert_time DESC
                           ->  Nested Loop  (cost=0.00..24436256.25 rows=643 width=124)
                                 Join Filter: ((l.category)::text = filter.category)
                                 ->  Seq Scan on small_join_table filter  (cost=0.00..26.99 rows=1399 width=8)
                                 ->  Materialize  (cost=0.00..24420487.02 rows=643 width=99)
                                       ->  Seq Scan on log_table l  (cost=0.00..24420483.80 rows=643 width=99)
                                             Filter: ((field1 IS NOT NULL) AND (category = 'music'::name) AND (rec_insert_time >= (now() - '10 days'::interval)))
(13 rows)

foo_prod=# SELECT now() - interval '10 days';
          ?column?
-------------------------------
2019-10-01 08:20:38.115471-05
(1 row)

foo_prod=# EXPLAIN
SELECT
category, source, MIN(rec_insert_time) OVER (partition by source order by rec_insert_time) AS first_source_time, MAX(rec_insert_time) OVER (partition by source order by rec_insert_time) AS last_source_time
FROM (SELECT DISTINCT ON (brand_id, last_change, log_id)
category, source(field1) AS source, rec_insert_time
FROM log_table l
INNER JOIN public.small_join_table filter ON filter.category = l.category
WHERE field1 IS NOT NULL AND l.category = 'music'
AND l.rec_insert_time >= '2019-10-01 08:20:38.115471-05'
ORDER BY brand_id, last_change, log_id, rec_insert_time DESC) unique_cases;

 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=19664576.17..19664590.63 rows=643 width=120)
   ->  Sort  (cost=19664576.17..19664577.77 rows=643 width=104)
         Sort Key: unique_cases.source, unique_cases.rec_insert_time
         ->  Subquery Scan on unique_cases  (cost=19664533.31..19664546.17 rows=643 width=104)
               ->  Unique  (cost=19664533.31..19664539.74 rows=643 width=124)
                     ->  Sort  (cost=19664533.31..19664534.92 rows=643 width=124)
                           Sort Key: l.brand_id, l.last_change, l.log_id, l.rec_insert_time DESC
                           ->  Nested Loop  (cost=3181.19..19664503.32 rows=643 width=124)
                                 ->  Gather  (cost=3180.91..19662574.92 rows=643 width=99)
                                       Workers Planned: 3
                                       ->  Parallel Bitmap Heap Scan on log_table l  (cost=2180.91..19661510.62 rows=207 width=99)
                                             Recheck Cond: (rec_insert_time >= '2019-10-01 08:20:38.115471-05'::timestamp with time zone)
                                             Filter: ((field1 IS NOT NULL) AND (category = 'music'::name))
                                             ->  Bitmap Index Scan on rec_insert_time_brin_1000  (cost=0.00..2180.75 rows=142602171 width=0)
                                                   Index Cond: (rec_insert_time >= '2019-10-01 08:20:38.115471-05'::timestamp with time zone)


Let me know if this rings any bells!  I will respond to other comments with
other replies.


My guess: it's (at least partially) due to the cpu_operator_cost associated
with the now() call. When now() is replaced with a literal, that cost
disappears, and so the total query cost decreases.
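
One quick way to sanity-check that guess (a sketch against a simplified stand-in query on log_table, assuming default planner settings; the SET is session-local only):

SHOW cpu_operator_cost;   -- 0.0025 by default; charged per operator/function call per row
SET cpu_operator_cost = 0;
EXPLAIN SELECT count(*) FROM log_table
WHERE rec_insert_time >= now() - interval '10 days';
-- if the estimated cost gap vanishes here, the now() call's operator
-- cost was indeed the difference
RESET cpu_operator_cost;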


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
