query plan using partial index expects a much larger number of rows than is possible

Olivier Poquet Wed, 28 Oct 2020 15:47:48 -0700

Hello,
I have a large table (about 13 million rows) full of customer order 
information. Since most of that information is for orders that have already 
been fulfilled, I have a partial index to help quickly zero in on rows that 
have not been fulfilled. This works well, but I noticed today when joining with 
another large table using its primary key that even though the planner was 
using my partial index, it decided to do a merge join to the second large table 
instead of the nested loop I would have expected.


Looking at it in more detail, I found that the planner is assuming that I'll 
get millions of rows back even when I do a simple query that does an index scan 
on my partial index:

=> \d orderitems_committed_unfulfilled 
Index "public.orderitems_committed_unfulfilled"
 Column |  Type  | Key? | Definition 
--------+--------+------+------------
 id     | bigint | yes  | id
btree, for table "public.orderitems", predicate (LEAST(committed, quantity) > 
fulfilled)

=> explain (analyze, buffers) select oi.id from orderitems oi where 
LEAST(oi.committed, oi.quantity) > oi.fulfilled;
                                                                            
QUERY PLAN                                                                      
      
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using orderitems_committed_unfulfilled on orderitems oi  
(cost=0.41..31688.23 rows=2861527 width=8) (actual time=0.039..2.092 rows=2274 
loops=1)
   Heap Fetches: 2493
   Buffers: shared hit=1883
 Planning Time: 0.110 ms
 Execution Time: 2.255 ms
(5 rows)

So nice and quick, but the planner thought it would get back 2861527 rows 
instead of the 2274 I actually have.  That explains why it thought it would 
make sense to do a merge join with my other large table instead of the nested 
loop over the 2k rows.  I would have expected the planner to know that there's 
no way it'll get back 2 million rows though, given that:

=> select relname, relpages, reltuples from pg_class where relname = 
'orderitems_committed_unfulfilled';
             relname              | relpages | reltuples 
----------------------------------+----------+-----------
 orderitems_committed_unfulfilled |     3051 |      2112
(1 row)

It knows there's only 2k-ish of them from the index. The 2 mil number is the 
same as what the planner expects if I disable using indexes and it does a seq 
scan, so I'm assuming it's just the guess from the column statistics and the 
planner is not using the size of the partial index.

I'm running:
=> select version();
                                          version                               
           
-------------------------------------------------------------------------------------------
 PostgreSQL 12.4 on x86_64-pc-linux-gnu, compiled by gcc, a 97d579287 p 
0be2109a97, 64-bit
(1 row)

I'm wondering if the behavior that I'm seeing is expected in 12.4, and if so if 
it changes in a later version or if I should file an enhancement request? Or if 
it's not expected is there's something I'm doing wrong, or should file a bug?

Thanks for your time.

-- 
  Olivier Poquet
  opoq...@plumdev.com

query plan using partial index expects a much larger number of rows than is possible

Reply via email to