Hi čt 28. 11. 2019 v 3:56 odesílatel David Fetter <da...@fetter.org> napsal:
> On Wed, Nov 27, 2019 at 08:47:56AM +0100, Pavel Stehule wrote: > > Hi > > > > I have a report from my customer about migration his application from > > Oracle to Postgres. > > > > The most significant issue was missing correct estimation for coalesce > > function. He had to rewrite coalesce(var, X) = X to "var IS NULL or var = > > X". Then the result was very satisfactory. > > > > Example: > > > > create table xxx(a int); > > insert into xxx select null from generate_series(1,10000); > > insert into xxx select 1 from generate_series(1,1000); > > insert into xxx select 0 from generate_series(1,1000); > > analyze xxx; > > > > postgres=# explain analyze select * from xxx where coalesce(a, 0) = 0; > > QUERY PLAN > > > > > ---------------------------------------------------------------------------------------------------- > > Seq Scan on xxx (cost=0.00..194.00 rows=60 width=4) (actual > > time=0.041..4.276 rows=11000 loops=1) > > Filter: (COALESCE(a, 0) = 0) > > Rows Removed by Filter: 1000 > > Planning Time: 0.099 ms > > Execution Time: 5.412 ms > > (5 rows) > > > > postgres=# explain analyze select * from xxx where a is null or a = 0; > > QUERY PLAN > > > > > ------------------------------------------------------------------------------------------------------- > > Seq Scan on xxx (cost=0.00..194.00 rows=10167 width=4) (actual > > time=0.052..5.891 rows=11000 loops=1) > > Filter: ((a IS NULL) OR (a = 0)) > > Rows Removed by Filter: 1000 > > Planning Time: 0.136 ms > > Execution Time: 7.522 ms > > (5 rows) > > > > I think so pattern coalesce(var, X) = X is very common so can be very > > interesting to support it better. > > Better support sounds great! > > How specifically might this be better supported? On this relatively > short table, I see planning times considerably longer, I assume > because they need to take a function call into account, and execution > times longer but not all that much longer. I tried with 3 million > rows, and got the representative samples below: > > shackle@[local]:5413/ctest(13devel)(149711) # EXPLAIN ANALYZE SELECT * > FROM xxx WHERE COALESCE(a, 0)=0; > QUERY PLAN > > > ══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════ > Gather (cost=1000.00..30391.00 rows=15000 width=4) (actual > time=1.315..346.406 rows=999772 loops=1) > Workers Planned: 2 > Workers Launched: 2 > -> Parallel Seq Scan on xxx (cost=0.00..27891.00 rows=6250 width=4) > (actual time=0.029..216.419 rows=333257 loops=3) > Filter: (COALESCE(a, 0) = 0) > Rows Removed by Filter: 666743 > Planning Time: 0.204 ms > Execution Time: 389.307 ms > (8 rows) > > Time: 391.394 ms > > shackle@[local]:5413/ctest(13devel)(149711) # EXPLAIN ANALYZE SELECT * > FROM xxx WHERE a IS NULL OR a = 0; > QUERY PLAN > > > ═════════════════════════════════════════════════════════════════════════════════════════════════════════════ > Seq Scan on xxx (cost=0.00..49766.00 rows=995700 width=4) (actual > time=0.043..524.401 rows=999772 loops=1) > Filter: ((a IS NULL) OR (a = 0)) > Rows Removed by Filter: 2000228 > Planning Time: 0.106 ms > Execution Time: 560.593 ms > (5 rows) > > Time: 561.186 ms > I didn't thing about rewriting. The correct solution should be via own selectivity function. Now for coalesce is used 5% estimation (like for other functions). Probably it should not be hard code because coalesce is a node already. But it is part of code that I never modified. Pavel > > Best, > David. > -- > David Fetter <david(at)fetter(dot)org> http://fetter.org/ > Phone: +1 415 235 3778 > > Remember to vote! > Consider donating to Postgres: http://www.postgresql.org/about/donate >