On Tue, Mar 17, 2026 at 11:27 PM Alexey Ermakov <[email protected]> wrote:
> On 2026-03-18 04:52, Merlin Moncure wrote: > > Hello. I think planner doesn't have information about distribution of >> *async.task_execution_state(task)* unless it's part of any full index. I >> would try to give that with extended statistics (postgresql 14+): >> >> create statistics (mcv) task_task_execution_state_stat on >> ((async.task_execution_state(task))) from async.task; >> analyze async.task; >> >> If that won't help - please show distribution from pg_stats_ext view for >> extended statistic above. >> > > This unfortunately fails, probably because the table type includes system > columns (despite not using them). > > orchestrator_service_user@orchestrator=> create statistics task_stats > (mcv) on (async.task_execution_state(task)) from async.task; > ERROR: statistics creation on system columns is not supported > > This would require some refactoring to fix. > > Interesting... In that case functional index should help (as it also makes > statistic for the planner): > > create index concurrently on task_task_execution_state_idx async.task using > btree ((async.task_execution_state(task))); > > analyze async.task; > > This can't help performance, as the index... CREATE INDEX ON async.task(concurrency_pool, priority, entered) WHERE async.task_execution_state(task) = 'READY'; ...is very precisely configured to provide exactly what's needed; I need tasks for that exact pool in that exact order if and only if ready. The partial predicate is designed to keep the index nice and small since only a small portion of tasks would be eligible at any specific time. @Tom Lane <[email protected]> I'm pretty sure you were following me, but my abstraction earlier was a bit off; Simplified, the query is: > SELECT * FROM foo WHERE a=? AND b=K ORDER BY c, d LIMIT N; > CREATE INDEX ON foo(a,b,c) WHERE b=K; Should have been: SELECT * FROM foo WHERE a=? AND d=K ORDER BY b, c LIMIT N; CREATE INDEX ON foo(a,b,c) WHERE d=K; Point being, the index match in on (=, order, order). If a contains any less than 100% of the total records, and N is small relative to table size, this ought to be the ideal index for just about any case, the exact match on partial qual is just gravy. I think the planner is not giving enough bonus for an exact match versus an inexact match on partial index mathcing, (A=A should be better than A IN(A,B,C)), and it's unclear why the planner things bitmap heap + sort is outperforming a raw read off the index base on marginal estimated row counts. Lowering random_page_cost definitely biases the plan I like, but it skews both estimates. @Alexey Ermakov <[email protected]> If you're interested in more context, see: pgasync <https://github.com/merlinm/pgasync> pgflow <https://github.com/merlinm/pgflow> graph example <https://imgur.com/a/LZNpTC1> merlin
