Re: postgres chooses objectively wrong index

Merlin Moncure Wed, 18 Mar 2026 11:38:36 -0700

On Tue, Mar 17, 2026 at 11:27 PM Alexey Ermakov <[email protected]>
wrote:

> On 2026-03-18 04:52, Merlin Moncure wrote:
>
> Hello. I think planner doesn't have information about distribution of
>> *async.task_execution_state(task)* unless it's part of any full index. I
>> would try to give that with extended statistics (postgresql 14+):
>>
>> create statistics (mcv) task_task_execution_state_stat on 
>> ((async.task_execution_state(task))) from async.task;
>> analyze async.task;
>>
>> If that won't help - please show distribution from pg_stats_ext view for
>> extended statistic above.
>>
>
> This unfortunately fails, probably because the table type includes system
> columns (despite not using them).
>
> orchestrator_service_user@orchestrator=> create statistics  task_stats
> (mcv) on (async.task_execution_state(task)) from async.task;
> ERROR:  statistics creation on system columns is not supported
>
> This would require some refactoring to fix.
>
> Interesting... In that case functional index should help (as it also makes
> statistic for the planner):
>
> create index concurrently on task_task_execution_state_idx async.task using 
> btree ((async.task_execution_state(task)));
>
> analyze async.task;
>
>
This can't help performance, as the index...
CREATE INDEX ON async.task(concurrency_pool, priority, entered)
WHERE async.task_execution_state(task) = 'READY';

...is very precisely configured to provide exactly what's needed; I need
tasks for that exact pool in that exact order if and only if ready.  The
partial predicate is designed to keep the index nice and small since only a
small portion of tasks would be eligible at any specific time.

@Tom Lane <[email protected]> I'm pretty sure you were following me, but my
abstraction earlier was a bit off;

Simplified, the query is:
> SELECT * FROM foo WHERE a=? AND b=K ORDER BY c, d LIMIT N;
> CREATE INDEX ON foo(a,b,c) WHERE b=K;

Should have been:
SELECT * FROM foo WHERE a=? AND d=K ORDER BY b, c LIMIT N;
CREATE INDEX ON foo(a,b,c) WHERE d=K;

Point being, the index match in on (=, order, order). If a contains any
less than 100% of the total records, and N is small relative to table size,
this ought to be the ideal index for just about any case, the exact match
on partial qual is just gravy.

I think the planner is not giving enough bonus for an exact match versus an
inexact match on partial index mathcing,  (A=A should be better than A
IN(A,B,C)), and it's unclear why the planner things bitmap heap + sort is
outperforming a raw read off the index base on marginal estimated row
counts.  Lowering random_page_cost definitely biases the plan I like,
but it skews both estimates.

@Alexey Ermakov <[email protected]>
If you're interested in more context, see:

pgasync <https://github.com/merlinm/pgasync>
pgflow <https://github.com/merlinm/pgflow>
graph example <https://imgur.com/a/LZNpTC1>

merlin

Re: postgres chooses objectively wrong index

Reply via email to