On 2026-03-19 00:38, Merlin Moncure wrote:
On Tue, Mar 17, 2026 at 11:27 PM Alexey Ermakov <[email protected]> wrote:On 2026-03-18 04:52, Merlin Moncure wrote:Hello. I think planner doesn't have information about distribution of *async.task_execution_state(task)* unless it's part of any full index. I would try to give that with extended statistics (postgresql 14+): create statistics (mcv) task_task_execution_state_stat on ((async.task_execution_state(task))) from async.task; analyze async.task; If that won't help - please show distribution from pg_stats_ext view for extended statistic above. This unfortunately fails, probably because the table type includes system columns (despite not using them). orchestrator_service_user@orchestrator=> create statistics task_stats (mcv) on (async.task_execution_state(task)) from async.task; ERROR: statistics creation on system columns is not supported This would require some refactoring to fix.Interesting... In that case functional index should help (as it also makes statistic for the planner): create index concurrently on task_task_execution_state_idx async.task using btree ((async.task_execution_state(task))); analyze async.task; This can't help performance, as the index... CREATE INDEX ON async.task(concurrency_pool, priority, entered) WHERE async.task_execution_state(task) = 'READY';...is very precisely configured to provide exactly what's needed; I need tasks for that exact pool in that exact order if and only if ready. The partial predicate is designed to keep the index nice and small since only a small portion of tasks would be eligible at any specific time.
The index I suggested was not intended to be used by such queries, it's only a way to provide statistics for the planner as `create statistics` on expression is not working here.
It might not be enough and require additional columns (in that case it will be replacement for your index) or perhaps elevated statistics target. It could even make things worse but I'm sure you have ways to test that before putting it on an important database.
It should help if planner underestimate number of rows but even then total estimation won't be perfect when we have 2 conditions on state that obviously statistically dependent. Combining both conditions on application side would work much better if that is possible...
What would also might help there - output of `explain (analyze, buffers)` of a query that really had a bad plan and executes in seconds with sizes of indexes. And same output but with `set enable_sort = off` to see plan that supposed to be better. And just in case number of live/dead tuples in that table from pg_stat_user_tables.
-- Alexey Ermakov
