I have encountered what seems to be a bad decision made by the query planner, on a quite simple query. I have attached a shell script which creates a large (180MB) sql script, to demonstrate the problem. To reproduce, execute the bash script, and pipe the stdout into psql. Redirect the stdout of psql somewhere, as it isn't very interesting. psql produces several EXPLAIN ANALYSE results on its stderr. I have also attached a copy of the results I get.
The test creates three tables: jointest3_1 (keya int, infoa int) jointest3_2 (keya int, keyb int) jointest3_3 (keyb int, infob int) It then fills tables jointest3_1 and jointest3_3 each with 1 million rows, where the key value is sequential (0 to 999999), and the info value is a random zero or positive integer below 10000. It then fills table jointest3_2 with 2 million rows, each with random 6-digit integers in both of its columns. Therefore, table jointest3_2 represents a many-to-many mapping between jointest3_1 and jointest3_3, and the natural join of all three tables should have 2 million rows. The test then creates lots of indexes, clusters on the indexes, and does a vacuum analyse. The test does six EXPLAIN ANALYSE commands. There are actually three distinct commands, each of which is performed twice, to get a before-cached and after-cached performance indication. The first command does a natural join of all three tables, and restricts the results on jointest3_1.infoa. This is performed quickly, as there is an index on infoa which returns approximately 100 rows. The second distinct command also does a natural join of all three tables, but restricts the results by jointest3_3.infob. The query planner appears to make a bad decision, and the query takes a long time to return. The third distinct command does the same query as the second distinct command, but it forces the query planner to join tables jointest3_2 and jointest3_3 together before joining with jointest3_1 by doing a subselect. This performs as quickly as the first distinct command. I would have expected the query planner to find the fast query plan, given there are only three tables being joined together. I am using PostgreSQL version 7.2.1, running on Debian unstable/testing, reasonably up to date. If you need more details, please email back. select version(); gives the following: PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.95.4 PostgreSQL was installed as a Debian package. The machine is a Athlon XP1900, with 512MB of RAM, a 40GB hard drive. Kernel: Linux 2.4.19-686 libc6: 2.2.5-14.1 Matthew Wakeling
#!/bin/bash echo "create table jointest3_1 (keya int, infoa int);" echo "create table jointest3_2 (keya int, keyb int);" echo "create table jointest3_3 (keyb int, infob int);" echo "BEGIN;" hexdump -d </dev/urandom \ | sed -e "s/^[0123456789abcdefABCDEF]\+ \+[^ ]\([0123456789]\+\) \+[^ ]\([0123456789]\+\) \+[^ ]\([0123456789]\+\) \+[^ ]\([0123456789]\+\) \+[^ ]\([0123456789]\+\) \+[^ ]\([0123456789]\+\) \+[^ ]\([0123456789]\+\) \+[^ ]\([0123456789]\+\).*$/\1\2\3\4\5\6\7\8/" \ | sed -e "=" | sed -e "/^[0123456789]\{1,7\}$/h;/^[0123456789]\{1,7\}$/d;G;s/^\(.*\)\n\(.*\)/\2 \1/;s/^1000000/0/" \ | sed -e "s/^\([0123456789]*\) \([0123456789]\{4\}\)\([0123456789]\{4\}\)\([0123456789]\{6\}\)\([0123456789]\{6\}\)\([0123456789]\{6\}\)\([0123456789]\{6\}\).*$/insert into jointest3_1 values (\1, \2); insert into jointest3_3 values (\1, \3);insert into jointest3_2 values (\4, \5);insert into jointest3_2 values (\6, \7);/" \ | head -n 1000000 echo "COMMIT;" echo "create index indexjointest3_1_keya on jointest3_1 (keya);" echo "cluster indexjointest3_1_keya on jointest3_1;" echo "create index indexjointest3_1_infoa on jointest3_1 (infoa);" echo "create index indexjointest3_2_keya on jointest3_2 (keya);" echo "cluster indexjointest3_2_keya on jointest3_2;" echo "create index indexjointest3_2_keyb on jointest3_2 (keyb);" echo "create index indexjointest3_3_keyb on jointest3_3 (keyb);" echo "cluster indexjointest3_3_keyb on jointest3_3;" echo "create index indexjointest3_3_infob on jointest3_3 (infob);" echo "vacuum analyse;" echo "explain analyse select infoa, keya, keyb, infob from jointest3_1 natural join jointest3_2 natural join jointest3_3 where infoa = 1234;" echo "explain analyse select infoa, keya, keyb, infob from jointest3_1 natural join jointest3_2 natural join jointest3_3 where infoa = 1234;" echo "explain analyse select infoa, keya, keyb, infob from jointest3_1 natural join jointest3_2 natural join jointest3_3 where infob = 1234;" echo "explain analyse select infoa, keya, keyb, infob from jointest3_1 natural join jointest3_2 natural join jointest3_3 where infob = 1234;" echo "explain analyse select infoa, keya, keyb, infob from jointest3_1 natural join (select * from jointest3_2 natural join jointest3_3 where infob = 1234) as s;" echo "explain analyse select infoa, keya, keyb, infob from jointest3_1 natural join (select * from jointest3_2 natural join jointest3_3 where infob = 1234) as s;"
NOTICE: Skipping "pg_group" --- only table or database owner can VACUUM it NOTICE: Skipping "pg_database" --- only table or database owner can VACUUM it NOTICE: Skipping "pg_shadow" --- only table or database owner can VACUUM it NOTICE: QUERY PLAN: Nested Loop (cost=0.00..1343.81 rows=199 width=24) (actual time=101.51..3583.62 rows=181 loops=1) -> Nested Loop (cost=0.00..741.80 rows=199 width=16) (actual time=71.32..1990.54 rows=181 loops=1) -> Index Scan using indexjointest3_1_infoa on jointest3_1 (cost=0.00..420.65 rows=105 width=8) (actual time=25.48..856.21 rows=92 loops=1) -> Index Scan using indexjointest3_2_keya on jointest3_2 (cost=0.00..3.03 rows=2 width=8) (actual time=12.31..12.32 rows=2 loops=92) -> Index Scan using indexjointest3_3_keyb on jointest3_3 (cost=0.00..3.01 rows=1 width=8) (actual time=8.79..8.80 rows=1 loops=181) Total runtime: 3583.96 msec NOTICE: QUERY PLAN: Nested Loop (cost=0.00..1343.81 rows=199 width=24) (actual time=0.32..23.74 rows=181 loops=1) -> Nested Loop (cost=0.00..741.80 rows=199 width=16) (actual time=0.25..10.50 rows=181 loops=1) -> Index Scan using indexjointest3_1_infoa on jointest3_1 (cost=0.00..420.65 rows=105 width=8) (actual time=0.16..3.49 rows=92 loops=1) -> Index Scan using indexjointest3_2_keya on jointest3_2 (cost=0.00..3.03 rows=2 width=8) (actual time=0.07..0.07 rows=2 loops=92) -> Index Scan using indexjointest3_3_keyb on jointest3_3 (cost=0.00..3.01 rows=1 width=8) (actual time=0.07..0.07 rows=1 loops=181) Total runtime: 23.96 msec NOTICE: QUERY PLAN: Hash Join (cost=426.08..101497.21 rows=201 width=24) (actual time=319.68..15399.50 rows=224 loops=1) -> Merge Join (cost=0.00..86902.39 rows=1888831 width=16) (actual time=28.48..13852.19 rows=2000000 loops=1) -> Index Scan using indexjointest3_1_keya on jointest3_1 (cost=0.00..18599.00 rows=1000000 width=8) (actual time=28.30..2474.90 rows=1000000 loops=1) -> Index Scan using indexjointest3_2_keya on jointest3_2 (cost=0.00..37193.00 rows=2000000 width=8) (actual time=0.15..5663.64 rows=2000000 loops=1) -> Hash (cost=425.81..425.81 rows=107 width=8) (actual time=217.25..217.25 rows=0 loops=1) -> Index Scan using indexjointest3_3_infob on jointest3_3 (cost=0.00..425.81 rows=107 width=8) (actual time=24.98..217.09 rows=112 loops=1) Total runtime: 15420.55 msec NOTICE: QUERY PLAN: Hash Join (cost=426.08..101497.21 rows=201 width=24) (actual time=363.25..14638.78 rows=224 loops=1) -> Merge Join (cost=0.00..86902.39 rows=1888831 width=16) (actual time=36.47..13099.69 rows=2000000 loops=1) -> Index Scan using indexjointest3_1_keya on jointest3_1 (cost=0.00..18599.00 rows=1000000 width=8) (actual time=28.30..2486.89 rows=1000000 loops=1) -> Index Scan using indexjointest3_2_keya on jointest3_2 (cost=0.00..37193.00 rows=2000000 width=8) (actual time=8.14..4948.04 rows=2000000 loops=1) -> Hash (cost=425.81..425.81 rows=107 width=8) (actual time=246.15..246.15 rows=0 loops=1) -> Index Scan using indexjointest3_3_infob on jointest3_3 (cost=0.00..425.81 rows=107 width=8) (actual time=53.91..245.99 rows=112 loops=1) Total runtime: 14639.31 msec NOTICE: QUERY PLAN: Nested Loop (cost=0.00..4010.50 rows=201 width=24) (actual time=56.30..893.31 rows=224 loops=1) -> Nested Loop (cost=0.00..3365.04 rows=213 width=16) (actual time=56.13..876.68 rows=224 loops=1) -> Index Scan using indexjointest3_3_infob on jointest3_3 (cost=0.00..425.81 rows=107 width=8) (actual time=0.18..4.18 rows=112 loops=1) -> Index Scan using indexjointest3_2_keyb on jointest3_2 (cost=0.00..27.48 rows=6 width=8) (actual time=7.75..7.78 rows=2 loops=112) -> Index Scan using indexjointest3_1_keya on jointest3_1 (cost=0.00..3.01 rows=1 width=8) (actual time=0.07..0.07 rows=1 loops=224) Total runtime: 893.62 msec NOTICE: QUERY PLAN: Nested Loop (cost=0.00..4010.50 rows=201 width=24) (actual time=0.39..40.16 rows=224 loops=1) -> Nested Loop (cost=0.00..3365.04 rows=213 width=16) (actual time=0.32..16.79 rows=224 loops=1) -> Index Scan using indexjointest3_3_infob on jointest3_3 (cost=0.00..425.81 rows=107 width=8) (actual time=0.14..4.05 rows=112 loops=1) -> Index Scan using indexjointest3_2_keyb on jointest3_2 (cost=0.00..27.48 rows=6 width=8) (actual time=0.07..0.11 rows=2 loops=112) -> Index Scan using indexjointest3_1_keya on jointest3_1 (cost=0.00..3.01 rows=1 width=8) (actual time=0.10..0.10 rows=1 loops=224) Total runtime: 40.39 msec
---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html