Re: Unexplainable execution time difference between two test functions...one using IF (SELECT COUNT(*) FROM...) and the other using IF EXISTS (SELECT 1 FROM...)

Hackety Man Tue, 17 Apr 2018 08:40:38 -0700

Hi Pavel,

Thanks for sharing that information.  I was not aware that the parallel
query functionality was not yet fully implemented.


Thanks,
Ryan

On Tue, Apr 17, 2018 at 1:17 AM, Pavel Stehule <[email protected]>
wrote:

> Hi
>
> 2018-04-16 22:42 GMT+02:00 Hackety Man <[email protected]>:
>
>> *A description of what you are trying to achieve and what results you
>> expect.:*
>>
>> My end goal was to test the execution time difference between using an
>> IF(SELECT COUNT(*)...) and an IF EXISTS() when no indexes were used and
>> when a string match was not found.  My expectation was that my 2 functions
>> would behave fairly similarly, but they most certainly did not.  Here are
>> the table, functions, test queries, and test query results I received, as
>> well as comments as I present the pieces and talk about the results from my
>> perspective.
>>
>> This is the table and data that I used for my tests.  A table with 1
>> million sequenced records.  No indexing on any columns.  I ran ANALYZE on
>> this table and a VACUUM on the entire database, just to be sure.
>>
>> CREATE TABLE zz_noidx1 AS SELECT generate_series(0, 999999) AS
>> int_distinct, 'Test'::text || generate_series(0, 999999)::text AS
>> text_distinct;
>>
>> These are the 2 functions that I ran my final tests with.  My goal was to
>> determine which function would perform the fastest and my expectation was
>> that they would still be somewhat close in execution time comparison.
>>
>> --Test Function #1
>> CREATE OR REPLACE FUNCTION zz_spx_ifcount_noidx(p_findme text)
>>  RETURNS text
>>  LANGUAGE 'plpgsql'
>>  STABLE
>> AS $$
>>
>> BEGIN
>>  IF (SELECT COUNT(*) FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct)
>> = LOWER(p_findme)) > 0 THEN
>>   RETURN 'Found';
>>  ELSE
>>   RETURN 'Not Found';
>>  END IF;
>> END;
>> $$;
>>
>> --Test Function #2
>> CREATE OR REPLACE FUNCTION zz_spx_ifexists_noidx(p_findme text)
>>  RETURNS text
>>  LANGUAGE 'plpgsql'
>>  STABLE
>> AS $$
>>
>> BEGIN
>>  IF EXISTS (SELECT 1 FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct)
>> = LOWER(p_findme)) THEN
>>   RETURN 'Found';
>>  ELSE
>>   RETURN 'Not Found';
>>  END IF;
>> END;
>> $$;
>>
>> The first thing I did was to run some baseline tests using the basic
>> queries inside of the IF() checks found in each of the functions to see how
>> the query planner handled them.  I ran the following two queries.
>>
>> EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM zz_noidx1 WHERE
>> LOWER(text_distinct) = LOWER('Test5000001');
>> EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE
>> LOWER(text_distinct) = LOWER('Test5000001');
>>
>> The execution time results and query plans for these two were very
>> similar, as expected.  In the results I can see that 2 workers were
>> employed for each query plan.
>>
>> --Results for the SELECT COUNT(*) query.
>> QUERY PLAN
>>
>> ------------------------------------------------------------
>> ------------------------------------------------------------
>> ----------------
>> Finalize Aggregate  (cost=12661.42..12661.43 rows=1 width=8) (actual
>> time=172.105..172.105 rows=1 loops=1)
>>   Buffers: shared read=1912
>>
>>
>>   ->  Gather  (cost=12661.21..12661.42 rows=2 width=8) (actual
>> time=172.020..172.099 rows=3 loops=1)
>>   Workers Planned: 2
>>
>>
>>   Workers Launched: 2
>>
>>
>>   Buffers: shared read=1912
>>
>>
>>   ->  Partial Aggregate  (cost=11661.21..11661.22 rows=1 width=8) (actual
>> time=155.123..155.123 rows=1 loops=3)
>>      Buffers: shared read=5406
>>
>>
>>      ->  Parallel Seq Scan on zz_noidx1  (cost=0.00..11656.00 rows=2083
>> width=0) (actual time=155.103..155.103 rows=0 loops=3)
>>      Filter: (lower(text_distinct) = 'test5000001'::text)
>>
>>      Rows Removed by Filter: 333333
>>
>>      Buffers: shared read=5406
>>
>> Planning time: 0.718 ms
>>
>>
>> Execution time: 187.601 ms
>>
>> --Results for the SELECT 1 query.
>> QUERY PLAN
>>
>> ------------------------------------------------------------
>> ----------------------------------------------------------------
>> Gather  (cost=1000.00..13156.00 rows=5000 width=4) (actual
>> time=175.682..175.682 rows=0 loops=1)
>>   Workers Planned: 2
>>
>>
>>   Workers Launched: 2
>>
>>
>>   Buffers: shared read=2021
>>
>>
>>   ->  Parallel Seq Scan on zz_noidx1  (cost=0.00..11656.00 rows=2083
>> width=4) (actual time=159.769..159.769 rows=0 loops=3)
>>   Filter: (lower(text_distinct) = 'test5000001'::text)
>>
>>   Rows Removed by Filter: 333333
>>
>>   Buffers: shared read=5406
>>
>> Planning time: 0.874 ms
>>
>> Execution time: 192.045 ms
>>
>> After running these baseline tests and viewing the fairly similar
>> results, right or wrong, I expected my queries that tested the functions to
>> behave similarly.  I started with the following query...
>>
>> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM zz_spx_ifcount_noidx('Test5000
>> 001');
>>
>> and I got the following "auto_explain" results...
>>
>> 2018-04-16 14:57:22.624 EDT [17812] LOG:  duration: 155.239 ms  plan:
>>  Query Text: SELECT (SELECT COUNT(*) FROM zz_noidx1 WHERE
>> LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) > 0
>>  Partial Aggregate  (cost=11661.21..11661.22 rows=1 width=8) (actual
>> time=155.230..155.230 rows=1 loops=1)
>>    Buffers: shared read=1682
>>    ->  Parallel Seq Scan on zz_noidx1  (cost=0.00..11656.00 rows=2083
>> width=0) (actual time=155.222..155.222 rows=0 loops=1)
>>    Filter: (lower(text_distinct) = 'test5000001'::text)
>>    Rows Removed by Filter: 311170
>>    Buffers: shared read=1682
>> 2018-04-16 14:57:22.624 EDT [9096] LOG:  duration: 154.603 ms  plan:
>>  Query Text: SELECT (SELECT COUNT(*) FROM zz_noidx1 WHERE
>> LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) > 0
>>  Partial Aggregate  (cost=11661.21..11661.22 rows=1 width=8) (actual
>> time=154.576..154.576 rows=1 loops=1)
>>    Buffers: shared read=1682
>>    ->  Parallel Seq Scan on zz_noidx1  (cost=0.00..11656.00 rows=2083
>> width=0) (actual time=154.570..154.570 rows=0 loops=1)
>>    Filter: (lower(text_distinct) = 'test5000001'::text)
>>    Rows Removed by Filter: 311061
>>    Buffers: shared read=1682
>> 2018-04-16 14:57:22.642 EDT [15132] LOG:  duration: 197.260 ms  plan:
>>  Query Text: SELECT (SELECT COUNT(*) FROM zz_noidx1 WHERE
>> LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) > 0
>>  Result  (cost=12661.43..12661.45 rows=1 width=1) (actual
>> time=179.561..179.561 rows=1 loops=1)
>>    Buffers: shared read=2042
>>    InitPlan 1 (returns $1)
>>   ->  Finalize Aggregate  (cost=12661.42..12661.43 rows=1 width=8)
>> (actual time=179.559..179.559 rows=1 loops=1)
>>      Buffers: shared read=2042
>>      ->  Gather  (cost=12661.21..12661.42 rows=2 width=8) (actual
>> time=179.529..179.556 rows=3 loops=1)
>>      Workers Planned: 2
>>      Workers Launched: 2
>>      Buffers: shared read=2042
>>      ->  Partial Aggregate  (cost=11661.21..11661.22 rows=1 width=8)
>> (actual time=162.831..162.831 rows=1 loops=3)
>>         Buffers: shared read=5406
>>         ->  Parallel Seq Scan on zz_noidx1  (cost=0.00..11656.00
>> rows=2083 width=0) (actual time=162.824..162.824 rows=0 loops=3)
>>         Filter: (lower(text_distinct) = 'test5000001'::text)
>>         Rows Removed by Filter: 333333
>>         Buffers: shared read=5406
>> 2018-04-16 14:57:22.642 EDT [15132] CONTEXT:  SQL statement "SELECT
>> (SELECT COUNT(*) FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) =
>> LOWER(p_findme)) > 0"
>>  PL/pgSQL function zz_spx_ifcount_noidx(text) line 4 at IF
>> 2018-04-16 14:57:22.642 EDT [15132] LOG:  duration: 199.371 ms  plan:
>>  Query Text: explain (analyze, buffers) select * from
>> zz_spx_ifcount_noidx('Test5000001')
>>  Function Scan on zz_spx_ifcount_noidx  (cost=0.25..0.26 rows=1 width=32)
>> (actual time=199.370..199.370 rows=1 loops=1)
>>    Buffers: shared hit=218 read=5446
>>
>> Here I could see that the 2 workers were getting employed again, which is
>> great.  Just what I expected.  And the execution time was in the same
>> ballpark as my first baseline test using just the query found inside of the
>> IF() check.  199 milliseonds.  Okay.
>>
>> I moved on to test the other function with the following query...
>>
>> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM zz_spx_ifcount_noidx('Test5000
>> 001');
>>
>> and I got the following "auto_explain" results...
>>
>> 2018-04-16 14:58:34.134 EDT [12616] LOG:  duration: 426.279 ms  plan:
>>  Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
>> LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))
>>  Result  (cost=4.08..4.09 rows=1 width=1) (actual time=426.274..426.274
>> rows=1 loops=1)
>>    Buffers: shared read=5406
>>    InitPlan 1 (returns $0)
>>   ->  Seq Scan on zz_noidx1  (cost=0.00..20406.00 rows=5000 width=0)
>> (actual time=426.273..426.273 rows=0 loops=1)
>>      Filter: (lower(text_distinct) = 'test5000001'::text)
>>      Rows Removed by Filter: 1000000
>>      Buffers: shared read=5406
>> 2018-04-16 14:58:34.134 EDT [12616] CONTEXT:  SQL statement "SELECT
>> EXISTS (SELECT 1 FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) =
>> LOWER(p_findme))"
>>  PL/pgSQL function zz_spx_ifexists_noidx(text) line 4 at IF
>> 2018-04-16 14:58:34.134 EDT [12616] LOG:  duration: 428.077 ms  plan:
>>  Query Text: explain (analyze, buffers) select * from
>> zz_spx_ifexists_noidx('Test5000001')
>>  Function Scan on zz_spx_ifexists_noidx  (cost=0.25..0.26 rows=1
>> width=32) (actual time=428.076..428.076 rows=1 loops=1)
>>    Buffers: shared hit=30 read=5438
>>
>> Definitely not the execution time, or query plan, results I was
>> expecting.  As we can see, no workers were employed here and my guess was
>> that this was the reason or the large execution time difference between
>> these 2 tests?  199 milliseconds versus 428 milliseconds, which is a big
>> difference.  Why are workers not being employed here like they were when I
>> tested the query found inside of the IF() check in a standalone manner?
>> But then I ran another test and the results made even less sense to me.
>>
>> When I ran the above query the first 5 times after starting my Postgres
>> service, I got the same results each time (around 428 milliseconds), but
>> when running the query 6 or more times, the execution time jumps up to
>> almost double that.  Here are the "auto_explain" results running this query
>> a 6th time...
>>
>> --"auto_explain" results after running the same query 6 or more times.
>> 2018-04-16 15:01:51.635 EDT [12616] LOG:  duration: 761.847 ms  plan:
>>  Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
>> LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))
>>  Result  (cost=4.58..4.59 rows=1 width=1) (actual time=761.843..761.843
>> rows=1 loops=1)
>>    Buffers: shared hit=160 read=5246
>>    InitPlan 1 (returns $0)
>>   ->  Seq Scan on zz_noidx1  (cost=0.00..22906.00 rows=5000 width=0)
>> (actual time=761.841..761.841 rows=0 loops=1)
>>      Filter: (lower(text_distinct) = lower($1))
>>      Rows Removed by Filter: 1000000
>>      Buffers: shared hit=160 read=5246
>> 2018-04-16 15:01:51.635 EDT [12616] CONTEXT:  SQL statement "SELECT
>> EXISTS (SELECT 1 FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) =
>> LOWER(p_findme))"
>>  PL/pgSQL function zz_spx_ifexists_noidx(text) line 4 at IF
>> 2018-04-16 15:01:51.635 EDT [12616] LOG:  duration: 762.156 ms  plan:
>>  Query Text: explain (analyze, buffers) select * from
>> zz_spx_ifexists_noidx('Test5000001')
>>  Function Scan on zz_spx_ifexists_noidx  (cost=0.25..0.26 rows=1
>> width=32) (actual time=762.154..762.155 rows=1 loops=1)
>>    Buffers: shared hit=160 read=5246
>>
>> As you can see, the execution time jumps up to about 762 milliseonds.  I
>> can see in the sequence scan node that the LOWER() function shows up on the
>> right side of the equal operator, whereas in the first 5 runs of this test
>> query the plan did not show this.  Why is this?
>>
>> I tried increasing the "work_mem" setting to 1GB to see if this made any
>> difference, but the results were the same.
>>
>> So those were the tests that I performed and the results I received,
>> which left me with many questions.  If anyone is able to help me understand
>> this behavior, I'd greatly appreciate it.  This is my first post to the
>> email list, so I hope I did a good enough job providing all the information
>> needed.
>>
>> Thanks!
>> Ryan
>>
>> *PostgreSQL version number you are running:*
>>
>> PostgreSQL 10.2, compiled by Visual C++ build 1800, 64-bit
>>
>> *How you installed PostgreSQL:*
>>
>> Using the Enterprise DB installer.
>>
>> I have also installed Enterprise DB's Postgres Enterprise Manager (PEM)
>> 7.2.0 software and Enterprise DB's SQL Profiler PG10-7.2.0 software.  The
>> PEM Agent service that gets installed is currently turned off.
>>
>> *Changes made to the settings in the postgresql.conf file:  see Server
>> Configuration for a quick way to list them all.*
>>
>> name                               |current_setting
>> |source
>> -----------------------------------|------------------------
>> ---------------|---------------------
>> application_name                   |DBeaver 5.0.3 -
>> Main                   |session
>> auto_explain.log_analyze           |on
>> |configuration file
>> auto_explain.log_buffers           |on
>> |configuration file
>> auto_explain.log_min_duration      |0
>> |configuration file
>> auto_explain.log_nested_statements |on
>> |configuration file
>> auto_explain.log_triggers          |on
>> |configuration file
>> client_encoding                    |UTF8
>> |client
>> DateStyle                          |ISO, MDY
>> |client
>> default_text_search_config         |pg_catalog.english
>> |configuration file
>> dynamic_shared_memory_type         |windows
>> |configuration file
>> extra_float_digits                 |3
>> |session
>> lc_messages                        |English_United
>> States.1252             |configuration file
>> lc_monetary                        |English_United
>> States.1252             |configuration file
>> lc_numeric                         |English_United
>> States.1252             |configuration file
>> lc_time                            |English_United
>> States.1252             |configuration file
>> listen_addresses                   |*
>> |configuration file
>> log_destination                    |stderr
>> |configuration file
>> log_timezone                       |US/Eastern
>> |configuration file
>> logging_collector                  |on
>> |configuration file
>> max_connections                    |100
>> |configuration file
>> max_stack_depth                    |2MB
>> |environment variable
>> port                               |5432
>> |configuration file
>> shared_buffers                     |128MB
>> |configuration file
>> shared_preload_libraries           |$libdir/sql-profiler.dll,
>> auto_explain |configuration file
>> ssl                                |on
>> |configuration file
>> ssl_ca_file                        |root.crt
>> |configuration file
>> ssl_cert_file                      |server.crt
>> |configuration file
>> ssl_crl_file                       |root.crl
>> |configuration file
>> ssl_key_file                       |server.key
>> |configuration file
>> TimeZone                           |America/New_York
>> |client
>>
>> *Operating system and version:*
>>
>> Windows 10 Pro 64-bit, Version 1709 (Build 16299.309)
>>
>> *Hardware:*
>>
>> Processor - Intel Core i7-7820HQ @ 2.90GHz
>> RAM - 16GB
>> RAID? - No
>> Hard Drive - Samsung 512 GB SSD M.2 PCIe NVMe Opal2
>>
>> *What program you're using to connect to PostgreSQL:*
>>
>> DBeaver Community Edition v5.0.3
>>
>> *Is there anything relevant or unusual in the PostgreSQL server logs?:*
>>
>> Not that I noticed.
>>
>> *For questions about any kind of error:*
>>
>> N/A
>>
>> *What you were doing when the error happened / how to cause the error:*
>>
>> N/A
>>
>> *The EXACT TEXT of the error message you're getting, if there is one:
>> (Copy and paste the message to the email, do not send a screenshot)*
>>
>> N/A
>>
>>
> A support of parallel query execution is not complete -  it doesn't work
> in PostgreSQL 11 too. So although EXISTS variant can be faster (but can be
> - the worst case of EXISTS is same like COUNT), then due disabled parallel
> execution the COUNT(*) is faster now. It is unfortunate, because I believe
> so this issue will be fixed in few years.
>
> Regards
>
> Pavel
>

Re: Unexplainable execution time difference between two test functions...one using IF (SELECT COUNT(*) FROM...) and the other using IF EXISTS (SELECT 1 FROM...)

Reply via email to