Hi Pavel, Thanks for sharing that information. I was not aware that the parallel query functionality was not yet fully implemented.
Thanks, Ryan On Tue, Apr 17, 2018 at 1:17 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote: > Hi > > 2018-04-16 22:42 GMT+02:00 Hackety Man <hackety...@gmail.com>: > >> *A description of what you are trying to achieve and what results you >> expect.:* >> >> My end goal was to test the execution time difference between using an >> IF(SELECT COUNT(*)...) and an IF EXISTS() when no indexes were used and >> when a string match was not found. My expectation was that my 2 functions >> would behave fairly similarly, but they most certainly did not. Here are >> the table, functions, test queries, and test query results I received, as >> well as comments as I present the pieces and talk about the results from my >> perspective. >> >> This is the table and data that I used for my tests. A table with 1 >> million sequenced records. No indexing on any columns. I ran ANALYZE on >> this table and a VACUUM on the entire database, just to be sure. >> >> CREATE TABLE zz_noidx1 AS SELECT generate_series(0, 999999) AS >> int_distinct, 'Test'::text || generate_series(0, 999999)::text AS >> text_distinct; >> >> These are the 2 functions that I ran my final tests with. My goal was to >> determine which function would perform the fastest and my expectation was >> that they would still be somewhat close in execution time comparison. >> >> --Test Function #1 >> CREATE OR REPLACE FUNCTION zz_spx_ifcount_noidx(p_findme text) >> RETURNS text >> LANGUAGE 'plpgsql' >> STABLE >> AS $$ >> >> BEGIN >> IF (SELECT COUNT(*) FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) >> = LOWER(p_findme)) > 0 THEN >> RETURN 'Found'; >> ELSE >> RETURN 'Not Found'; >> END IF; >> END; >> $$; >> >> --Test Function #2 >> CREATE OR REPLACE FUNCTION zz_spx_ifexists_noidx(p_findme text) >> RETURNS text >> LANGUAGE 'plpgsql' >> STABLE >> AS $$ >> >> BEGIN >> IF EXISTS (SELECT 1 FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) >> = LOWER(p_findme)) THEN >> RETURN 'Found'; >> ELSE >> RETURN 'Not Found'; >> END IF; >> END; >> $$; >> >> The first thing I did was to run some baseline tests using the basic >> queries inside of the IF() checks found in each of the functions to see how >> the query planner handled them. I ran the following two queries. >> >> EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM zz_noidx1 WHERE >> LOWER(text_distinct) = LOWER('Test5000001'); >> EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE >> LOWER(text_distinct) = LOWER('Test5000001'); >> >> The execution time results and query plans for these two were very >> similar, as expected. In the results I can see that 2 workers were >> employed for each query plan. >> >> --Results for the SELECT COUNT(*) query. >> QUERY PLAN >> >> ------------------------------------------------------------ >> ------------------------------------------------------------ >> ---------------- >> Finalize Aggregate (cost=12661.42..12661.43 rows=1 width=8) (actual >> time=172.105..172.105 rows=1 loops=1) >> Buffers: shared read=1912 >> >> >> -> Gather (cost=12661.21..12661.42 rows=2 width=8) (actual >> time=172.020..172.099 rows=3 loops=1) >> Workers Planned: 2 >> >> >> Workers Launched: 2 >> >> >> Buffers: shared read=1912 >> >> >> -> Partial Aggregate (cost=11661.21..11661.22 rows=1 width=8) (actual >> time=155.123..155.123 rows=1 loops=3) >> Buffers: shared read=5406 >> >> >> -> Parallel Seq Scan on zz_noidx1 (cost=0.00..11656.00 rows=2083 >> width=0) (actual time=155.103..155.103 rows=0 loops=3) >> Filter: (lower(text_distinct) = 'test5000001'::text) >> >> Rows Removed by Filter: 333333 >> >> Buffers: shared read=5406 >> >> Planning time: 0.718 ms >> >> >> Execution time: 187.601 ms >> >> --Results for the SELECT 1 query. >> QUERY PLAN >> >> ------------------------------------------------------------ >> ---------------------------------------------------------------- >> Gather (cost=1000.00..13156.00 rows=5000 width=4) (actual >> time=175.682..175.682 rows=0 loops=1) >> Workers Planned: 2 >> >> >> Workers Launched: 2 >> >> >> Buffers: shared read=2021 >> >> >> -> Parallel Seq Scan on zz_noidx1 (cost=0.00..11656.00 rows=2083 >> width=4) (actual time=159.769..159.769 rows=0 loops=3) >> Filter: (lower(text_distinct) = 'test5000001'::text) >> >> Rows Removed by Filter: 333333 >> >> Buffers: shared read=5406 >> >> Planning time: 0.874 ms >> >> Execution time: 192.045 ms >> >> After running these baseline tests and viewing the fairly similar >> results, right or wrong, I expected my queries that tested the functions to >> behave similarly. I started with the following query... >> >> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM zz_spx_ifcount_noidx('Test5000 >> 001'); >> >> and I got the following "auto_explain" results... >> >> 2018-04-16 14:57:22.624 EDT [17812] LOG: duration: 155.239 ms plan: >> Query Text: SELECT (SELECT COUNT(*) FROM zz_noidx1 WHERE >> LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) > 0 >> Partial Aggregate (cost=11661.21..11661.22 rows=1 width=8) (actual >> time=155.230..155.230 rows=1 loops=1) >> Buffers: shared read=1682 >> -> Parallel Seq Scan on zz_noidx1 (cost=0.00..11656.00 rows=2083 >> width=0) (actual time=155.222..155.222 rows=0 loops=1) >> Filter: (lower(text_distinct) = 'test5000001'::text) >> Rows Removed by Filter: 311170 >> Buffers: shared read=1682 >> 2018-04-16 14:57:22.624 EDT [9096] LOG: duration: 154.603 ms plan: >> Query Text: SELECT (SELECT COUNT(*) FROM zz_noidx1 WHERE >> LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) > 0 >> Partial Aggregate (cost=11661.21..11661.22 rows=1 width=8) (actual >> time=154.576..154.576 rows=1 loops=1) >> Buffers: shared read=1682 >> -> Parallel Seq Scan on zz_noidx1 (cost=0.00..11656.00 rows=2083 >> width=0) (actual time=154.570..154.570 rows=0 loops=1) >> Filter: (lower(text_distinct) = 'test5000001'::text) >> Rows Removed by Filter: 311061 >> Buffers: shared read=1682 >> 2018-04-16 14:57:22.642 EDT [15132] LOG: duration: 197.260 ms plan: >> Query Text: SELECT (SELECT COUNT(*) FROM zz_noidx1 WHERE >> LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) > 0 >> Result (cost=12661.43..12661.45 rows=1 width=1) (actual >> time=179.561..179.561 rows=1 loops=1) >> Buffers: shared read=2042 >> InitPlan 1 (returns $1) >> -> Finalize Aggregate (cost=12661.42..12661.43 rows=1 width=8) >> (actual time=179.559..179.559 rows=1 loops=1) >> Buffers: shared read=2042 >> -> Gather (cost=12661.21..12661.42 rows=2 width=8) (actual >> time=179.529..179.556 rows=3 loops=1) >> Workers Planned: 2 >> Workers Launched: 2 >> Buffers: shared read=2042 >> -> Partial Aggregate (cost=11661.21..11661.22 rows=1 width=8) >> (actual time=162.831..162.831 rows=1 loops=3) >> Buffers: shared read=5406 >> -> Parallel Seq Scan on zz_noidx1 (cost=0.00..11656.00 >> rows=2083 width=0) (actual time=162.824..162.824 rows=0 loops=3) >> Filter: (lower(text_distinct) = 'test5000001'::text) >> Rows Removed by Filter: 333333 >> Buffers: shared read=5406 >> 2018-04-16 14:57:22.642 EDT [15132] CONTEXT: SQL statement "SELECT >> (SELECT COUNT(*) FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = >> LOWER(p_findme)) > 0" >> PL/pgSQL function zz_spx_ifcount_noidx(text) line 4 at IF >> 2018-04-16 14:57:22.642 EDT [15132] LOG: duration: 199.371 ms plan: >> Query Text: explain (analyze, buffers) select * from >> zz_spx_ifcount_noidx('Test5000001') >> Function Scan on zz_spx_ifcount_noidx (cost=0.25..0.26 rows=1 width=32) >> (actual time=199.370..199.370 rows=1 loops=1) >> Buffers: shared hit=218 read=5446 >> >> Here I could see that the 2 workers were getting employed again, which is >> great. Just what I expected. And the execution time was in the same >> ballpark as my first baseline test using just the query found inside of the >> IF() check. 199 milliseonds. Okay. >> >> I moved on to test the other function with the following query... >> >> EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM zz_spx_ifcount_noidx('Test5000 >> 001'); >> >> and I got the following "auto_explain" results... >> >> 2018-04-16 14:58:34.134 EDT [12616] LOG: duration: 426.279 ms plan: >> Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE >> LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) >> Result (cost=4.08..4.09 rows=1 width=1) (actual time=426.274..426.274 >> rows=1 loops=1) >> Buffers: shared read=5406 >> InitPlan 1 (returns $0) >> -> Seq Scan on zz_noidx1 (cost=0.00..20406.00 rows=5000 width=0) >> (actual time=426.273..426.273 rows=0 loops=1) >> Filter: (lower(text_distinct) = 'test5000001'::text) >> Rows Removed by Filter: 1000000 >> Buffers: shared read=5406 >> 2018-04-16 14:58:34.134 EDT [12616] CONTEXT: SQL statement "SELECT >> EXISTS (SELECT 1 FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = >> LOWER(p_findme))" >> PL/pgSQL function zz_spx_ifexists_noidx(text) line 4 at IF >> 2018-04-16 14:58:34.134 EDT [12616] LOG: duration: 428.077 ms plan: >> Query Text: explain (analyze, buffers) select * from >> zz_spx_ifexists_noidx('Test5000001') >> Function Scan on zz_spx_ifexists_noidx (cost=0.25..0.26 rows=1 >> width=32) (actual time=428.076..428.076 rows=1 loops=1) >> Buffers: shared hit=30 read=5438 >> >> Definitely not the execution time, or query plan, results I was >> expecting. As we can see, no workers were employed here and my guess was >> that this was the reason or the large execution time difference between >> these 2 tests? 199 milliseconds versus 428 milliseconds, which is a big >> difference. Why are workers not being employed here like they were when I >> tested the query found inside of the IF() check in a standalone manner? >> But then I ran another test and the results made even less sense to me. >> >> When I ran the above query the first 5 times after starting my Postgres >> service, I got the same results each time (around 428 milliseconds), but >> when running the query 6 or more times, the execution time jumps up to >> almost double that. Here are the "auto_explain" results running this query >> a 6th time... >> >> --"auto_explain" results after running the same query 6 or more times. >> 2018-04-16 15:01:51.635 EDT [12616] LOG: duration: 761.847 ms plan: >> Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE >> LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) >> Result (cost=4.58..4.59 rows=1 width=1) (actual time=761.843..761.843 >> rows=1 loops=1) >> Buffers: shared hit=160 read=5246 >> InitPlan 1 (returns $0) >> -> Seq Scan on zz_noidx1 (cost=0.00..22906.00 rows=5000 width=0) >> (actual time=761.841..761.841 rows=0 loops=1) >> Filter: (lower(text_distinct) = lower($1)) >> Rows Removed by Filter: 1000000 >> Buffers: shared hit=160 read=5246 >> 2018-04-16 15:01:51.635 EDT [12616] CONTEXT: SQL statement "SELECT >> EXISTS (SELECT 1 FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = >> LOWER(p_findme))" >> PL/pgSQL function zz_spx_ifexists_noidx(text) line 4 at IF >> 2018-04-16 15:01:51.635 EDT [12616] LOG: duration: 762.156 ms plan: >> Query Text: explain (analyze, buffers) select * from >> zz_spx_ifexists_noidx('Test5000001') >> Function Scan on zz_spx_ifexists_noidx (cost=0.25..0.26 rows=1 >> width=32) (actual time=762.154..762.155 rows=1 loops=1) >> Buffers: shared hit=160 read=5246 >> >> As you can see, the execution time jumps up to about 762 milliseonds. I >> can see in the sequence scan node that the LOWER() function shows up on the >> right side of the equal operator, whereas in the first 5 runs of this test >> query the plan did not show this. Why is this? >> >> I tried increasing the "work_mem" setting to 1GB to see if this made any >> difference, but the results were the same. >> >> So those were the tests that I performed and the results I received, >> which left me with many questions. If anyone is able to help me understand >> this behavior, I'd greatly appreciate it. This is my first post to the >> email list, so I hope I did a good enough job providing all the information >> needed. >> >> Thanks! >> Ryan >> >> *PostgreSQL version number you are running:* >> >> PostgreSQL 10.2, compiled by Visual C++ build 1800, 64-bit >> >> *How you installed PostgreSQL:* >> >> Using the Enterprise DB installer. >> >> I have also installed Enterprise DB's Postgres Enterprise Manager (PEM) >> 7.2.0 software and Enterprise DB's SQL Profiler PG10-7.2.0 software. The >> PEM Agent service that gets installed is currently turned off. >> >> *Changes made to the settings in the postgresql.conf file: see Server >> Configuration for a quick way to list them all.* >> >> name |current_setting >> |source >> -----------------------------------|------------------------ >> ---------------|--------------------- >> application_name |DBeaver 5.0.3 - >> Main |session >> auto_explain.log_analyze |on >> |configuration file >> auto_explain.log_buffers |on >> |configuration file >> auto_explain.log_min_duration |0 >> |configuration file >> auto_explain.log_nested_statements |on >> |configuration file >> auto_explain.log_triggers |on >> |configuration file >> client_encoding |UTF8 >> |client >> DateStyle |ISO, MDY >> |client >> default_text_search_config |pg_catalog.english >> |configuration file >> dynamic_shared_memory_type |windows >> |configuration file >> extra_float_digits |3 >> |session >> lc_messages |English_United >> States.1252 |configuration file >> lc_monetary |English_United >> States.1252 |configuration file >> lc_numeric |English_United >> States.1252 |configuration file >> lc_time |English_United >> States.1252 |configuration file >> listen_addresses |* >> |configuration file >> log_destination |stderr >> |configuration file >> log_timezone |US/Eastern >> |configuration file >> logging_collector |on >> |configuration file >> max_connections |100 >> |configuration file >> max_stack_depth |2MB >> |environment variable >> port |5432 >> |configuration file >> shared_buffers |128MB >> |configuration file >> shared_preload_libraries |$libdir/sql-profiler.dll, >> auto_explain |configuration file >> ssl |on >> |configuration file >> ssl_ca_file |root.crt >> |configuration file >> ssl_cert_file |server.crt >> |configuration file >> ssl_crl_file |root.crl >> |configuration file >> ssl_key_file |server.key >> |configuration file >> TimeZone |America/New_York >> |client >> >> *Operating system and version:* >> >> Windows 10 Pro 64-bit, Version 1709 (Build 16299.309) >> >> *Hardware:* >> >> Processor - Intel Core i7-7820HQ @ 2.90GHz >> RAM - 16GB >> RAID? - No >> Hard Drive - Samsung 512 GB SSD M.2 PCIe NVMe Opal2 >> >> *What program you're using to connect to PostgreSQL:* >> >> DBeaver Community Edition v5.0.3 >> >> *Is there anything relevant or unusual in the PostgreSQL server logs?:* >> >> Not that I noticed. >> >> *For questions about any kind of error:* >> >> N/A >> >> *What you were doing when the error happened / how to cause the error:* >> >> N/A >> >> *The EXACT TEXT of the error message you're getting, if there is one: >> (Copy and paste the message to the email, do not send a screenshot)* >> >> N/A >> >> > A support of parallel query execution is not complete - it doesn't work > in PostgreSQL 11 too. So although EXISTS variant can be faster (but can be > - the worst case of EXISTS is same like COUNT), then due disabled parallel > execution the COUNT(*) is faster now. It is unfortunate, because I believe > so this issue will be fixed in few years. > > Regards > > Pavel >