On Tue, Sep 1, 2020 at 9:22 AM Kyotaro Horiguchi <horikyota....@gmail.com> wrote:
> At Mon, 31 Aug 2020 16:04:43 +0200, Pavel Stehule <pavel.steh...@gmail.com> wrote in
> > On Mon, Aug 31, 2020 at 1:29 PM Thomas Kellerer <sham...@gmx.net> wrote:
> > > Thorsten Schöning wrote on 31.08.2020 at 12:37:
> > > > So for what query size or number of IDs to compare in IN would you
> > > > consider a different approach at all?
> > >
> > > In my experience "hundreds" of IDs tend to be quite slow if used with an
> > > IN clause.
> > >
> > > Rewriting the IN to a JOIN against a VALUES clause is very often faster.
> > >
> > > So instead of:
> > >
> > >     select *
> > >     from t
> > >     where id in (1,2,3, .... ,500);
> > >
> > > using this:
> > >
> > >     select *
> > >     from t
> > >       join (
> > >         values (1),(2),(3),...(500)
> > >       ) as x(id) on x.id = t.id
> > >
> > > produces more often than not a more efficient execution plan (assuming no
> > > values are duplicated in the IN list).
> > >
> > > Obviously I don't know if such a re-write is even feasible though.
> >
> > yes - this query probably will have a slow start, but the execution will be
> > fast. Unfortunately, there are no statistics available.
>
> FWIW, the attached is the dusted-off version of a part of a stalled
> development of mine, which unconditionally(!) creates on-the-fly
> statistics on a VALUES list. It seems to work for certain cases,
> although the planning time increases significantly.
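The rewrite Thomas describes above can be sketched as application-side query building (Python here; the helper names are mine, not from the thread):

```python
# Sketch of the rewrite discussed above: turning a long IN list into a
# JOIN against a VALUES clause. Function names are hypothetical.

def in_query(ids):
    """Build the naive form: WHERE id IN (...)."""
    placeholders = ",".join(str(i) for i in ids)
    return f"select * from t where id in ({placeholders});"

def values_join_query(ids):
    """Build the rewritten form: JOIN (VALUES ...) AS x(id).
    Assumes ids contains no duplicates, as noted in the thread."""
    rows = ",".join(f"({i})" for i in ids)
    return f"select t.* from t join (values {rows}) as x(id) on x.id = t.id;"

print(in_query([1, 2, 3]))
print(values_join_query([1, 2, 3]))
```

(In real code the IDs would of course go through bind parameters rather than string interpolation; the literal form is used here only to mirror the queries quoted above.)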
>
> =$ CREATE TABLE t1 AS SELECT a, a * 2 AS b FROM generate_series(0, 99999) a;
> =$ CREATE INDEX ON t1 (a);
>
> perl q.pl(*) | psql
>
> *: q.pl:
> print "explain analyze select b from t1 join (values ";
> foreach $i (0..10000) {
>     print ", " if ($i > 0);
>     printf("(%d)", $i/10 + 1000);
> }
> print ") as v(v) on (v.v = t1.a);";
>
> patched:
>  Merge Join  (cost=824.25..1005.19 rows=10001 width=4) (actual time=13.513..24.285 rows=10001 loops=1)
>    Merge Cond: (t1.a = "*VALUES*".column1)
>    ->  Index Scan using t1_a_idx on t1  (cost=0.29..3050.29 rows=100000 width=8) (actual time=0.033..1.629 rows=2002 loops=1)
>    ->  Sort  (cost=789.47..814.47 rows=10001 width=4) (actual time=12.557..14.546 rows=10001 loops=1)
>          Sort Key: "*VALUES*".column1
>          Sort Method: quicksort  Memory: 931kB
>          ->  Values Scan on "*VALUES*"  (cost=0.00..125.01 rows=10001 width=4) (actual time=0.002..8.271 rows=10001 loops=1)
>  Planning Time: 17.290 ms
>  Execution Time: 26.344 ms
> (9 rows)
>
> master:
>  Hash Join  (cost=250.03..2168.03 rows=10001 width=4) (actual time=14.482..77.205 rows=10001 loops=1)
>    Hash Cond: (t1.a = "*VALUES*".column1)
>    ->  Seq Scan on t1  (cost=0.00..1443.00 rows=100000 width=8) (actual time=0.017..23.540 rows=100000 loops=1)
>    ->  Hash  (cost=125.01..125.01 rows=10001 width=4) (actual time=13.786..13.788 rows=10001 loops=1)
>          Buckets: 16384  Batches: 1  Memory Usage: 480kB
>          ->  Values Scan on "*VALUES*"  (cost=0.00..125.01 rows=10001 width=4) (actual time=0.002..8.503 rows=10001 loops=1)
>  Planning Time: 12.365 ms
>  Execution Time: 78.567 ms
> (8 rows)
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center

nice :)

Pavel
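For readers without Perl handy, the q.pl generator quoted above can be translated to Python (my translation, not part of the patch; note that Perl's `printf("%d", $i/10 + 1000)` truncates, so the Python version uses integer division):

```python
# Python equivalent of Horiguchi's q.pl: emits an EXPLAIN ANALYZE query
# joining t1 against a 10001-row VALUES list with values 1000..2000,
# each repeated ten times.

def build_query(n=10001):
    # Perl's "foreach $i (0..10000)" is inclusive, i.e. 10001 iterations.
    values = ", ".join(f"({i // 10 + 1000})" for i in range(n))
    return (f"explain analyze select b from t1 join (values {values}) "
            f"as v(v) on (v.v = t1.a);")

if __name__ == "__main__":
    print(build_query())  # pipe into psql, as with "perl q.pl | psql"
```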