Maciek Sakrejda <mac...@pganalyze.com> writes:
> In a blog post [1], Bruce Momjian notes that expression indexes can
> help with planning even if they're not used directly. But the examples
> cited in that post are vague (i.e., they improve stats, but it's not
> clear how they could change plans), and Bruce's answer to a comment
> [2] suggests that this is not documented.

> Is there any more info on this mechanism? Specifically, if one has
> unused expression indexes (according to pg_stat_user_indexes), is it
> safe to drop them? Or could they be providing statistics that
> materially affect query planning even though the indexes themselves
> are unused?

Expression indexes definitely can affect planning, because ANALYZE
collects stats on the values of those expressions.  As a trivial
example,

regression=# create table foo (x1 float8);
CREATE TABLE
regression=# insert into foo select 10 * random() from generate_series(1,10000);
INSERT 0 10000
regression=# analyze foo;
ANALYZE
regression=# explain analyze select * from foo where sqrt(x1) < 1;
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Seq Scan on foo  (cost=0.00..195.00 rows=3333 width=8) (actual time=0.009..0.546 rows=1028 loops=1)
   Filter: (sqrt(x1) < '1'::double precision)
   Rows Removed by Filter: 8972
 Planning Time: 0.065 ms
 Execution Time: 0.572 ms
(5 rows)

The planner has no info about the values of sqrt(x1), so you get a
default estimate (one-third) of the selectivity of the WHERE clause.
But watch this:

regression=# create index on foo (sqrt(x1));
CREATE INDEX
regression=# analyze foo;
ANALYZE
regression=# explain analyze select * from foo where sqrt(x1) < 1;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=24.24..84.63 rows=1026 width=8) (actual time=0.078..0.229 rows=1028 loops=1)
   Recheck Cond: (sqrt(x1) < '1'::double precision)
   Heap Blocks: exact=45
   ->  Bitmap Index Scan on foo_sqrt_idx  (cost=0.00..23.98 rows=1026 width=0) (actual time=0.068..0.068 rows=1028 loops=1)
         Index Cond: (sqrt(x1) < '1'::double precision)
 Planning Time: 0.113 ms
 Execution Time: 0.259 ms
(7 rows)

Now there are stats about the values of sqrt(x1), allowing a far more
accurate selectivity estimate to be made.  In this particular example
there's no change of plan (it would have used the index anyway), but
certainly a different rowcount estimate can make a big difference.
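
(The expression's stats end up stored under the index's name, so a
quick way to confirm that ANALYZE gathered them is to look at the
pg_stats view for the index.  This should return one row for the
sqrt(x1) expression; the exact values will of course vary with the
random data:

regression=# select attname, n_distinct, histogram_bounds is not null as has_histogram
regression-# from pg_stats where tablename = 'foo_sqrt_idx';
)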

This mechanism is quite ancient, and in principle it's now superseded
by extended statistics.  For example, I can drop this index and
instead do

regression=# drop index foo_sqrt_idx;
DROP INDEX
regression=# create statistics foostats on sqrt(x1) from foo;
CREATE STATISTICS
regression=# analyze foo;
ANALYZE
regression=# explain analyze select * from foo where sqrt(x1) < 1;
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Seq Scan on foo  (cost=0.00..195.00 rows=1026 width=8) (actual time=0.006..0.479 rows=1028 loops=1)
   Filter: (sqrt(x1) < '1'::double precision)
   Rows Removed by Filter: 8972
 Planning Time: 0.079 ms
 Execution Time: 0.503 ms
(5 rows)

So the accurate rowcount estimate is still obtained in this example;
and we're not incurring any index maintenance costs, only ANALYZE
costs that are going to be roughly the same either way.

However, I am not certain that extended statistics are plugged into
all the places where the older mechanism applies.  Tomas Vondra might
have a better idea than I of where gaps remain in that.

			regards, tom lane