On Tue, Apr 17, 2018 at 6:49 AM, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
> > > On 04/16/2018 10:42 PM, Hackety Man wrote: > >> ... >> The first thing I did was to run some baseline tests using the basic >> queries inside of the IF() checks found in each of the functions to >> see how the query planner handled them. I ran the following two >> queries. >> >> EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM zz_noidx1 WHERE >> LOWER(text_distinct) = LOWER('Test5000001'); >> EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE >> LOWER(text_distinct) = LOWER('Test5000001'); >> >> > Those are not the interesting plans, though. The EXISTS only cares about > the first row, so you should be looking at > > EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE > LOWER(text_distinct) = LOWER('Test5000001') LIMIT 1; Okay. I tested this query and it did return an execution time on par with my tests of the "zz_spx_ifexists_noidx" function. > > > I moved on to test the other function with the following query... >> >> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM >> zz_spx_ifcount_noidx('Test5000001'); >> >> and I got the following "auto_explain" results... >> >> 2018-04-16 14:58:34.134 EDT [12616] LOG: duration: 426.279 ms >> plan: >> Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE >> LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) >> Result (cost=4.08..4.09 rows=1 width=1) (actual >> time=426.274..426.274 rows=1 loops=1) >> Buffers: shared read=5406 >> InitPlan 1 (returns $0) >> -> Seq Scan on zz_noidx1 (cost=0.00..20406.00 rows=5000 >> width=0) (actual time=426.273..426.273 rows=0 loops=1) >> Filter: (lower(text_distinct) = 'test5000001'::text) >> Rows Removed by Filter: 1000000 >> Buffers: shared read=5406 >> 2018-04-16 14:58:34.134 EDT [12616] CONTEXT: SQL statement >> "SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE >> LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))" >> PL/pgSQL function zz_spx_ifexists_noidx(text) line 4 at IF >> 2018-04-16 14:58:34.134 EDT [12616] LOG: duration: 428.077 ms >> plan: >> Query Text: explain (analyze, buffers) select * from >> zz_spx_ifexists_noidx('Test5000001') >> Function Scan on zz_spx_ifexists_noidx (cost=0.25..0.26 >> rows=1 width=32) (actual time=428.076..428.076 rows=1 loops=1) >> Buffers: shared hit=30 read=5438 >> >> Definitely not the execution time, or query plan, results I was >> expecting. As we can see, no workers were employed here and my >> guess was that this was the reason or the large execution time >> difference between these 2 tests? 199 milliseconds versus 428 >> milliseconds, which is a big difference. Why are workers not being >> employed here like they were when I tested the query found inside of >> the IF() check in a standalone manner? But then I ran another test >> and the results made even less sense to me. >> >> > The plan difference is due to not realizing the EXISTS essentially implies > LIMIT 1. Secondly, it expects about 5000 rows matching the condition, > uniformly spread through the table. But it apparently takes much longer to > find the first one, hence the increased duration. > Ah. I did not know that. So EXISTS inherently applies a LIMIT 1, even though it doesn't show in the query plan, correct? Is it not possible for parallel scans to be implemented while applying an implicit, or explicit, LIMIT 1? > > How did you generate the data? I used generate_series() to create 1 million records in sequence at the same time that I created the table using the following script... CREATE TABLE zz_noidx1 AS SELECT generate_series(0, 999999) AS int_distinct, 'Test'::text || generate_series(0, 999999)::text AS text_distinct; > > When I ran the above query the first 5 times after starting my >> Postgres service, I got the same results each time (around 428 >> milliseconds), but when running the query 6 or more times, the >> execution time jumps up to almost double that. Here are the >> "auto_explain" results running this query a 6th time... >> >> > This is likely due to generating a generic plan after the fifth execution. > There seems to be only small difference in costs, though. > > > --"auto_explain" results after running the same query 6 or more >> times. >> 2018-04-16 15:01:51.635 EDT [12616] LOG: duration: 761.847 ms >> plan: >> Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE >> LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) >> Result (cost=4.58..4.59 rows=1 width=1) (actual >> time=761.843..761.843 rows=1 loops=1) >> Buffers: shared hit=160 read=5246 >> InitPlan 1 (returns $0) >> -> Seq Scan on zz_noidx1 (cost=0.00..22906.00 rows=5000 >> width=0) (actual time=761.841..761.841 rows=0 loops=1) >> Filter: (lower(text_distinct) = lower($1)) >> Rows Removed by Filter: 1000000 >> Buffers: shared hit=160 read=5246 >> 2018-04-16 15:01:51.635 EDT [12616] CONTEXT: SQL statement >> "SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE >> LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))" >> PL/pgSQL function zz_spx_ifexists_noidx(text) line 4 at IF >> 2018-04-16 15:01:51.635 EDT [12616] LOG: duration: 762.156 ms >> plan: >> Query Text: explain (analyze, buffers) select * from >> zz_spx_ifexists_noidx('Test5000001') >> Function Scan on zz_spx_ifexists_noidx (cost=0.25..0.26 >> rows=1 width=32) (actual time=762.154..762.155 rows=1 loops=1) >> Buffers: shared hit=160 read=5246 >> >> As you can see, the execution time jumps up to about 762 >> milliseonds. I can see in the sequence scan node that the LOWER() >> function shows up on the right side of the equal operator, whereas >> in the first 5 runs of this test query the plan did not show this. >> Why is this? >> >> > It doesn't really matter on which side it shows, it's more about a generic > plan built without knowledge of the parameter value. > Right. I was more wondering why it switched over to a generic plan, as you've stated, like clockwork starting with the 6th execution run. > > > regards > > -- > Tomas Vondra http://www.2ndQuadrant.com > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >