Re: Unexplainable execution time difference between two test functions...one using IF (SELECT COUNT(*) FROM...) and the other using IF EXISTS (SELECT 1 FROM...)

Hackety Man Tue, 17 Apr 2018 08:40:19 -0700

On Tue, Apr 17, 2018 at 6:49 AM, Tomas Vondra <[email protected]>
wrote:


>
>
> On 04/16/2018 10:42 PM, Hackety Man wrote:
>
>> ...
>>     The first thing I did was to run some baseline tests using the basic
>>     queries inside of the IF() checks found in each of the functions to
>>     see how the query planner handled them.  I ran the following two
>>     queries.
>>
>>         EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM zz_noidx1 WHERE
>>         LOWER(text_distinct) = LOWER('Test5000001');
>>         EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE
>>         LOWER(text_distinct) = LOWER('Test5000001');
>>
>>
> Those are not the interesting plans, though. The EXISTS only cares about
> the first row, so you should be looking at
>
>     EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE
>     LOWER(text_distinct) = LOWER('Test5000001') LIMIT 1;



Okay.  I tested this query and it did return an execution time on par with
my tests of the "zz_spx_ifexists_noidx" function.



>
>
>     I moved on to test the other function with the following query...
>>
>>         EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM
>>         zz_spx_ifcount_noidx('Test5000001');
>>
>>     and I got the following "auto_explain" results...
>>
>>         2018-04-16 14:58:34.134 EDT [12616] LOG:  duration: 426.279 ms
>>      plan:
>>           Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
>>         LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))
>>           Result  (cost=4.08..4.09 rows=1 width=1) (actual
>>         time=426.274..426.274 rows=1 loops=1)
>>             Buffers: shared read=5406
>>             InitPlan 1 (returns $0)
>>            ->  Seq Scan on zz_noidx1  (cost=0.00..20406.00 rows=5000
>>         width=0) (actual time=426.273..426.273 rows=0 loops=1)
>>               Filter: (lower(text_distinct) = 'test5000001'::text)
>>               Rows Removed by Filter: 1000000
>>               Buffers: shared read=5406
>>         2018-04-16 14:58:34.134 EDT [12616] CONTEXT:  SQL statement
>>         "SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
>>         LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))"
>>           PL/pgSQL function zz_spx_ifexists_noidx(text) line 4 at IF
>>         2018-04-16 14:58:34.134 EDT [12616] LOG:  duration: 428.077 ms
>>      plan:
>>           Query Text: explain (analyze, buffers) select * from
>>         zz_spx_ifexists_noidx('Test5000001')
>>           Function Scan on zz_spx_ifexists_noidx  (cost=0.25..0.26
>>         rows=1 width=32) (actual time=428.076..428.076 rows=1 loops=1)
>>             Buffers: shared hit=30 read=5438
>>
>>     Definitely not the execution time, or query plan, results I was
>>     expecting.  As we can see, no workers were employed here and my
>>     guess was that this was the reason or the large execution time
>>     difference between these 2 tests?  199 milliseconds versus 428
>>     milliseconds, which is a big difference.  Why are workers not being
>>     employed here like they were when I tested the query found inside of
>>     the IF() check in a standalone manner?  But then I ran another test
>>     and the results made even less sense to me.
>>
>>
> The plan difference is due to not realizing the EXISTS essentially implies
> LIMIT 1. Secondly, it expects about 5000 rows matching the condition,
> uniformly spread through the table. But it apparently takes much longer to
> find the first one, hence the increased duration.
>


Ah.  I did not know that.  So EXISTS inherently applies a LIMIT 1, even
though it doesn't show in the query plan, correct?  Is it not possible for
parallel scans to be implemented while applying an implicit, or explicit,
LIMIT 1?



>
> How did you generate the data?



I used generate_series() to create 1 million records in sequence at the
same time that I created the table using the following script...

CREATE TABLE zz_noidx1 AS SELECT generate_series(0, 999999) AS
int_distinct, 'Test'::text || generate_series(0, 999999)::text AS
text_distinct;


>
>     When I ran the above query the first 5 times after starting my
>>     Postgres service, I got the same results each time (around 428
>>     milliseconds), but when running the query 6 or more times, the
>>     execution time jumps up to almost double that.  Here are the
>>     "auto_explain" results running this query a 6th time...
>>
>>
> This is likely due to generating a generic plan after the fifth execution.
> There seems to be only small difference in costs, though.
>
>
>         --"auto_explain" results after running the same query 6 or more
>>         times.
>>         2018-04-16 15:01:51.635 EDT [12616] LOG:  duration: 761.847 ms
>>      plan:
>>           Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
>>         LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))
>>           Result  (cost=4.58..4.59 rows=1 width=1) (actual
>>         time=761.843..761.843 rows=1 loops=1)
>>             Buffers: shared hit=160 read=5246
>>             InitPlan 1 (returns $0)
>>            ->  Seq Scan on zz_noidx1  (cost=0.00..22906.00 rows=5000
>>         width=0) (actual time=761.841..761.841 rows=0 loops=1)
>>               Filter: (lower(text_distinct) = lower($1))
>>               Rows Removed by Filter: 1000000
>>               Buffers: shared hit=160 read=5246
>>         2018-04-16 15:01:51.635 EDT [12616] CONTEXT:  SQL statement
>>         "SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
>>         LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))"
>>           PL/pgSQL function zz_spx_ifexists_noidx(text) line 4 at IF
>>         2018-04-16 15:01:51.635 EDT [12616] LOG:  duration: 762.156 ms
>>      plan:
>>           Query Text: explain (analyze, buffers) select * from
>>         zz_spx_ifexists_noidx('Test5000001')
>>           Function Scan on zz_spx_ifexists_noidx  (cost=0.25..0.26
>>         rows=1 width=32) (actual time=762.154..762.155 rows=1 loops=1)
>>             Buffers: shared hit=160 read=5246
>>
>>     As you can see, the execution time jumps up to about 762
>>     milliseonds.  I can see in the sequence scan node that the LOWER()
>>     function shows up on the right side of the equal operator, whereas
>>     in the first 5 runs of this test query the plan did not show this.
>>  Why is this?
>>
>>
> It doesn't really matter on which side it shows, it's more about a generic
> plan built without knowledge of the parameter value.
>


Right.  I was more wondering why it switched over to a generic plan, as
you've stated, like clockwork starting with the 6th execution run.



>
>
> regards
>
> --
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Re: Unexplainable execution time difference between two test functions...one using IF (SELECT COUNT(*) FROM...) and the other using IF EXISTS (SELECT 1 FROM...)

Reply via email to