On Wed, Mar 18, 2026 at 2:12 PM Alexey Ermakov <[email protected]>
wrote:
> On 2026-03-19 00:38, Merlin Moncure wrote:
>
>> Interesting... In that case a functional index should help (as it also
>> creates statistics for the planner):
>>
>> create index concurrently task_task_execution_state_idx on async.task
>> using btree ((async.task_execution_state(task)));
>>
>> analyze async.task;
>>
>>
> This can't help performance, as the index...
> CREATE INDEX ON async.task(concurrency_pool, priority, entered)
> WHERE async.task_execution_state(task) = 'READY';
>
> ...is very precisely configured to provide exactly what's needed; I need
> tasks for that exact pool in that exact order if and only if ready. The
> partial predicate is designed to keep the index nice and small since only a
> small portion of tasks would be eligible at any specific time.
>
> The index I suggested was not intended to be used by such queries; it's
> only a way to provide statistics for the planner, since `create statistics`
> on an expression is not working here.
>
> It might not be enough and might require additional columns (in that case
> it would be a replacement for your index), or perhaps an elevated
> statistics target.
> It could even make things worse but I'm sure you have ways to test that
> before putting it on an important database.
>
> It should help if the planner underestimates the number of rows, but even
> then the total estimate won't be perfect when we have two conditions on
> state that are obviously statistically dependent. Combining both conditions
> on the application side would work much better, if that is possible...
>
> What might also help here: the output of `explain (analyze, buffers)` for
> a query that really had a bad plan and executed in seconds, along with the
> sizes of the indexes. And the same output with `set enable_sort = off`, to
> see the plan that is supposed to be better. And, just in case, the number
> of live/dead tuples in that table from pg_stat_user_tables.
>
Roger. Turns out I don't have access to pg_statistic (working on that).
Here are the plans with buffers:
orchestrator_service_user@orchestrator=> explain (analyze, buffers) select
> * from async.task where async.task_execution_state(task.*) =
> 'READY'::async.task_execution_state_t and concurrency_pool =
> '065.laqjjj_live' order by priority, entered limit 10;
>
> QUERY PLAN
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=25.88..25.90 rows=10 width=563) (actual time=35.024..35.026
> rows=0 loops=1)
> Buffers: shared hit=12542
> -> Sort (cost=25.88..26.33 rows=179 width=563) (actual
> time=35.023..35.024 rows=0 loops=1)
> Sort Key: priority, entered
> Sort Method: quicksort Memory: 25kB
> Buffers: shared hit=12542
> -> Bitmap Heap Scan on task (cost=9.23..22.01 rows=179
> width=563) (actual time=34.989..34.990 rows=0 loops=1)
> Recheck Cond: ((async.task_execution_state(task.*) = ANY
> ('{READY,RUNNING,YIELDED}'::async.task_execution_state_t[])) AND
> (concurrency_pool = '065.laqjjj_live'::text) AND
> (async.task_execution_state(task.*
> Buffers: shared hit=12536
> -> BitmapAnd (cost=9.23..9.23 rows=3 width=0) (actual
> time=34.979..34.980 rows=0 loops=1)
> Buffers: shared hit=12536
> -> Bitmap Index Scan on task_task_id_idx
> (cost=0.00..4.38 rows=575191 width=0) (actual time=34.882..34.883 rows=97
> loops=1)
> Buffers: shared hit=12502
> -> Bitmap Index Scan on
> task_concurrency_pool_priority_entered_idx (cost=0.00..4.51 rows=179
> width=0) (actual time=0.092..0.093 rows=0 loops=1)
> Index Cond: (concurrency_pool =
> '065.laqjjj_live'::text)
> Buffers: shared hit=34
> Planning:
> Buffers: shared hit=350
> Planning Time: 1.571 ms
> Execution Time: 35.091 ms
> (20 rows)
orchestrator_service_user@orchestrator=> set enable_sort to false;
> SET
>
> orchestrator_service_user@orchestrator=> explain (analyze, buffers)
> select * from async.task where async.task_execution_state(task.*) =
> 'READY'::async.task_execution_state_t and concurrency_pool =
> '065.laqjjj_live' order by priority, entered limit 10;
>
> QUERY PLAN
>
> -------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=0.50..39.87 rows=10 width=563) (actual time=0.091..0.092
> rows=0 loops=1)
> Buffers: shared hit=34
> -> Index Scan using task_concurrency_pool_priority_entered_idx on task
> (cost=0.50..705.21 rows=179 width=563) (actual time=0.090..0.091 rows=0
> loops=1)
> Index Cond: (concurrency_pool = '065.laqjjj_live'::text)
> Buffers: shared hit=34
> Planning Time: 0.251 ms
> Execution Time: 0.110 ms
> (7 rows)
What I'm driving at here is that while statistics contribute to the issue,
it's somewhat baffling that postgres estimates the 'good' plan as slower than
the 'bad' plan. Note that it estimated ~575k rows in the bitmap index scan on
task_task_id_idx, yet still used it to avoid an estimated 179 heap fetches;
even if scanning the index itself were a wash, there's a non-trivial recheck
on top. So I see this as a planning problem because, even taking the
estimated stats at face value, the plan makes no sense.
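(As for the live/dead tuple counts: pg_stat_user_tables is a statistics view,
not the pg_statistic catalog, so something along these lines ought to work
even without that access; untested here:)

```sql
-- live/dead tuple counts and vacuum/analyze recency for async.task
SELECT n_live_tup, n_dead_tup, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE schemaname = 'async' AND relname = 'task';
```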
What seems to have suppressed the bad plan is:
REINDEX INDEX async.task_task_id_idx;
yielding this plan:
orchestrator_service_user@orchestrator=> explain (analyze, buffers) select
* from async.task where async.task_execution_state(task.*) =
'READY'::async.task_execution_state_t and concurrency_pool =
'065.laqjjj_live' order by priority, entered limit 10;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.50..39.88 rows=10 width=563) (actual time=0.111..0.112
rows=0 loops=1)
Buffers: shared hit=34
-> Index Scan using task_concurrency_pool_priority_entered_idx on task
(cost=0.50..685.64 rows=174 width=563) (actual time=0.110..0.110 rows=0
loops=1)
Index Cond: (concurrency_pool = '065.laqjjj_live'::text)
Buffers: shared hit=34
Planning:
Buffers: shared hit=311 read=41
I/O Timings: shared read=23.052
Planning Time: 24.972 ms
Execution Time: 0.152 ms
...with no planner tweaks.
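(For the index sizes Alexey asked about, something like this should show the
before/after effect of the REINDEX; I didn't capture the "before" numbers:)

```sql
-- physical sizes of the two indexes involved
SELECT pg_size_pretty(pg_relation_size('async.task_task_id_idx')) AS task_id_idx,
       pg_size_pretty(pg_relation_size('async.task_concurrency_pool_priority_entered_idx')) AS pool_idx;
```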
Forcing the bad plan:
orchestrator_service_user@orchestrator=> set enable_indexscan to false;
SET
Time: 98.377 ms
orchestrator_service_user@orchestrator=> explain (analyze, buffers) select
* from async.task where async.task_execution_state(task.*) =
'READY'::async.task_execution_state_t and concurrency_pool =
'065.laqjjj_live' order by priority, entered limit 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=745.70..745.73 rows=10 width=563) (actual time=0.103..0.103
rows=0 loops=1)
Buffers: shared hit=40
-> Sort (cost=745.70..746.14 rows=174 width=563) (actual
time=0.102..0.102 rows=0 loops=1)
Sort Key: priority, entered
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=40
-> Bitmap Heap Scan on task (cost=4.55..741.94 rows=174
width=563) (actual time=0.068..0.069 rows=0 loops=1)
Recheck Cond: ((concurrency_pool = '065.laqjjj_live'::text)
AND (async.task_execution_state(task.*) =
'READY'::async.task_execution_state_t))
Buffers: shared hit=34
-> Bitmap Index Scan on
task_concurrency_pool_priority_entered_idx (cost=0.00..4.51 rows=174
width=0) (actual time=0.064..0.064 rows=0 loops=1)
Index Cond: (concurrency_pool =
'065.laqjjj_live'::text)
Buffers: shared hit=34
Planning:
Buffers: shared hit=1
Planning Time: 0.258 ms
Execution Time: 0.136 ms
Point being, bloat is definitely a factor here. However, the database seems
to be doing the opposite of what's expected: in the presence of bloat, it
veers toward a more bloat-sensitive plan.
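(If the pgstattuple extension happens to be installed, pgstatindex should
quantify the bloat on that index directly; I haven't run this on this
database, so treat it as a sketch:)

```sql
-- requires: CREATE EXTENSION pgstattuple;
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('async.task_task_id_idx');
```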
merlin