2017-06-29 7:17 GMT+02:00 Yevhenii Kurtov <yevhenii.kur...@gmail.com>:
> Hello folks, > > Thank you very much for analysis and suggested - there is a lot to learn > here. I just tried UNION queries and got following error: > > ERROR: FOR UPDATE is not allowed with UNION/INTERSECT/EXCEPT > it is sad :( maybe bitmap index scan can work postgres=# create table test(id int, started date, failed date, status int); CREATE TABLE postgres=# create index on test(id) where status = 0; CREATE INDEX postgres=# create index on test(started) where status = 1; CREATE INDEX postgres=# create index on test(failed ) where status = 2; CREATE INDEX postgres=# explain select id from test where (status = 0 and id in (1,2,3,4,5)) or (status = 1 and started < current_date) or (status = 2 and failed > current_date); ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── │ QUERY PLAN ╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════ │ Bitmap Heap Scan on test (cost=12.93..22.50 rows=6 width=4) │ Recheck Cond: (((id = ANY ('{1,2,3,4,5}'::integer[])) AND (status = 0)) OR ((started < CURRENT_DATE) AND (status = 1)) OR ((faile │ Filter: (((status = 0) AND (id = ANY ('{1,2,3,4,5}'::integer[]))) OR ((status = 1) AND (started < CURRENT_DATE)) OR ((status = 2) │ -> BitmapOr (cost=12.93..12.93 rows=6 width=0) │ -> Bitmap Index Scan on test_id_idx (cost=0.00..4.66 rows=1 width=0) │ Index Cond: (id = ANY ('{1,2,3,4,5}'::integer[])) │ -> Bitmap Index Scan on test_started_idx (cost=0.00..4.13 rows=3 width=0) │ Index Cond: (started < CURRENT_DATE) │ -> Bitmap Index Scan on test_failed_idx (cost=0.00..4.13 rows=3 width=0) │ Index Cond: (failed > CURRENT_DATE) └──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── (10 rows) > > I made a table dump for anyone who wants to give it a spin > https://app.box.com/s/464b12glmlk5o4gvzz7krc4c8s2fxlwr > and here is the gist for the original commands https://gist.github. > com/lessless/33215d0c147645db721e74e07498ac53 > > On Wed, Jun 28, 2017 at 8:10 PM, Brad DeJong <brad.dej...@infor.com> > wrote: > >> >> >> On 2017-06-28, Pavel Stehule wrote ... >> > On 2017-06-28, Yevhenii Kurtov wrote ... >> >> On 2017-06-28, Pavel Stehule wrote ... >> >>> On 2017-06-28, Yevhenii Kurtov wrote ... >> >>>> We have a query that is run almost each second and it's very >> important to squeeze every other ms out of it. The query is: >> >>>> ... >> >>>> I added following index: CREATE INDEX ON campaign_jobs(id, status, >> failed_at, started_at, priority DESC, times_failed); >> >>>> ... >> >>> There are few issues >> >>> a) parametrized LIMIT >> >>> b) complex predicate with lot of OR >> >>> c) slow external sort >> >>> >> >>> b) signalize maybe some strange in design .. try to replace "OR" by >> "UNION" query >> >>> c) if you can and you have good enough memory .. try to increase >> work_mem .. maybe 20MB >> >>> >> >>> if you change query to union queries, then you can use conditional >> indexes >> >>> >> >>> create index(id) where status = 0; >> >>> create index(failed_at) where status = 2; >> >>> create index(started_at) where status = 1; >> >> >> >> Can you please give a tip how to rewrite the query with UNION clause? >> > >> > SELECT c0."id" FROM "campaign_jobs" AS c0 >> > WHERE (((c0."status" = $1) AND NOT (c0."id" = ANY($2)))) >> > UNION SELECT c0."id" FROM "campaign_jobs" AS c0 >> > WHERE ((c0."status" = $3) AND (c0."failed_at" > $4)) >> > UNION SELECT c0."id" FROM "campaign_jobs" AS c0 >> > WHERE ((c0."status" = $5) AND (c0."started_at" < $6)) >> > ORDER BY c0."priority" DESC, c0."times_failed" >> > LIMIT $7 >> > FOR UPDATE SKIP LOCKED >> >> >> Normally (at least for developers I've worked with), that kind of query >> structure is used when the "status" values don't overlap and don't change >> from query to query. Judging from Pavel's suggested conditional indexes >> (i.e. "where status = <constant>"), he also thinks that is likely. >> >> Give the optimizer that information so that it can use it. Assuming $1 = >> 0 and $3 = 2 and $5 = 1, substitute literals. Substitute literal for $7 in >> limit. Push order by and limit to each branch of the union all (or does >> Postgres figure that out automatically?) Replace union with union all (not >> sure about Postgres, but allows other dbms to avoid sorting and merging >> result sets to eliminate duplicates). (Use of UNION ALL assumes that "id" >> is unique across rows as implied by only "id" being selected with FOR >> UPDATE. If multiple rows can have the same "id", then use UNION to >> eliminate the duplicates.) >> >> SELECT "id" FROM "campaign_jobs" WHERE "status" = 0 AND NOT "id" = ANY($1) >> UNION ALL >> SELECT "id" FROM "campaign_jobs" WHERE "status" = 2 AND "failed_at" > $2 >> UNION ALL >> SELECT "id" FROM "campaign_jobs" WHERE "status" = 1 AND "started_at" < $3 >> ORDER BY "priority" DESC, "times_failed" >> LIMIT 100 >> FOR UPDATE SKIP LOCKED >> >> >> Another thing that you could try is to push the ORDER BY and LIMIT to the >> branches of the UNION (or does Postgres figure that out automatically?) and >> use slightly different indexes. This may not make sense for all the >> branches but one nice thing about UNION is that each branch can be tweaked >> independently. Also, there are probably unmentioned functional dependencies >> that you can use to reduce the index size and/or improve your match rate. >> Example - if status = 1 means that the campaign_job has started but not >> failed or completed, then you may know that started_at is set, but >> failed_at and ended_at are null. The < comparison in and of itself implies >> that only rows where "started_at" is not null will match the condition. >> >> SELECT c0."id" FROM "campaign_jobs" AS c0 WHERE (((c0."status" = 0) AND >> NOT (c0."id" = ANY($1)))) ORDER BY c0."priority" DESC, c0."times_failed" >> LIMIT 100 >> UNION ALL >> SELECT c0."id" FROM "campaign_jobs" AS c0 WHERE ((c0."status" = 2) AND >> (c0."failed_at" > $2)) ORDER BY c0."priority" DESC, c0."times_failed" LIMIT >> 100 >> UNION ALL >> SELECT c0."id" FROM "campaign_jobs" AS c0 WHERE ((c0."status" = 1) AND >> (c0."started_at" < $3)) ORDER BY c0."priority" DESC, c0."times_failed" >> LIMIT 100 >> ORDER BY c0."priority" DESC, c0."times_failed" >> LIMIT 100 >> FOR UPDATE SKIP LOCKED >> >> Including the "priority", "times_failed" and "id" columns in the indexes >> along with "failed_at"/"started_at" allows the optimizer to do index only >> scans. (May still have to do random I/O to the data page to determine tuple >> version visibility but I don't think that can be eliminated.) >> >> create index ... ("priority" desc, "times_failed", "id") >> where "status" = 0; >> create index ... ("priority" desc, "times_failed", "id", "failed_at") >> where "status" = 2 and "failed_at" is not null; >> create index ... ("priority" desc, "times_failed", "id", "started_at") >> where "status" = 1 and "started_at" is not null; -- and ended_at is null >> and ... >> >> >> I'm assuming that the optimizer knows that "where status = 1 and >> started_at < $3" implies "and started_at is not null" and will consider the >> conditional index. If not, then the "and started_at is not null" needs to >> be explicit. >> >> -- >> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org >> ) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-performance >> > >