Dear Tom, I am writing to you because you are the maintainer of the query optimizer and planner. I have found a very significant performance degradation between PostgreSQL 6.5.3 and 7.1beta3, which will severely impact two large applications that I have developed and maintain for several clients. The performance difference is seen with the use of indices in SELECT statements, whereby the current release does not make effective use of the indices and 6.5.3 does. All of these tests were run on a SGI R10000 Indigo2 system running Irix 6.5. All the regression tests passed as expected for both versions. I have followed the discussion in pgsql-hackers over the previous months and others have noted some performance problems, and the response has typically been to VACUUM the tables. Unfortunately, this is not a practical option for my applications. They are very large -- I have one table that is 17GB in length, and the applications are used frequently. More importantly, PostgreSQL 6.5.3 works very, very well without VACUUM'ing. In order to assist you to diagnosing and correcting this problem, I have prepared a test database that shows the problems. I will attach three files; the test script, the log from running it on version 6.5.3, and the log from running it on version 7.1beta3. In addition, I have setup an anonymous FTP directory on ftp.congen.com:/pub/pg_perf which contains all of these files as well as the compressed table dumps used to build the test database. (When you have finished copying the files, please let me know.) The test script creates the database including the necessary indexing, and then runs EXPLAIN on each of the queries followed by actually executing the queries with "timex" commands to report elapsed times. The striking difference in the query plans is that 7.1 uses only sequential searches for the SELECT's whereas 6.5.3 uses index scans. As a result, 7.1 is almost two orders of magnitude slower than 6.5.3 with exactly the same data, schema, and queries. I plead with you to revisit this question of performance and fix PostgreSQL 7.1 to work as well as PostgreSQL 6.5.3 does. I depend upon PostgreSQL for much of my work, and I do not want to abandon it because of this performance problem which arose only recently. Thank you. +----------------------------------+------------------------------------+ | Robert E. Bruccoleri, Ph.D. | Phone: 609 737 6383 | | President, Congenomics, Inc. | Fax: 609 737 7528 | | 114 W Franklin Ave, Suite K1,4,5 | email: [EMAIL PROTECTED] | | P.O. Box 314 | URL: http://www.congen.com/~bruc | | Pennington, NJ 08534 | | +----------------------------------+------------------------------------+
#!/bin/csh createdb perf_test gunzip <proteins.dmp.gz | timex psql -e perf_test gunzip <comparisons_4.dmp.gz | timex psql -e perf_test gunzip <concordance_91.dmp.gz | timex psql -e perf_test psql -e perf_test <<EOF explain select * from comparisons_4 where name1 = 'HI0001'; explain select count(*) from comparisons_4 where code = 80003; explain select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name; explain select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name = p.name; EOF timex psql -e -c "select * from comparisons_4 where name1 = 'HI0001'" perf_test timex psql -e -c "select count(*) from comparisons_4 where code = 80003" perf_test timex psql -e -c "select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name" perf_test timex psql -e -c "select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name = p.name" perf_test
CREATE TABLE "proteins" ( "name" character varying(16), "organism" text, "start_position" int4, "last_position" int4, "seq" text, "purpose" text, "alternate_key" character varying(16), "comment" text, "compared" bool, "complement" bool, "chromosome" character varying(4), "essentiality" float8); QUERY: CREATE TABLE "proteins" ( "name" character varying(16), "organism" text, "start_position" int4, "last_position" int4, "seq" text, "purpose" text, "alternate_key" character varying(16), "comment" text, "compared" bool, "complement" bool, "chromosome" character varying(4), "essentiality" float8); COPY "proteins" FROM stdin; QUERY: COPY "proteins" FROM stdin; CREATE INDEX "protein_names" on "proteins" using btree ( "name" "varchar_ops" ); QUERY: CREATE INDEX "protein_names" on "proteins" using btree ( "name" "varchar_ops" ); CREATE INDEX "protein_organism" on "proteins" using btree ( "organism" "text_ops" ); QUERY: CREATE INDEX "protein_organism" on "proteins" using btree ( "organism" "text_ops" ); CREATE CREATE CREATE EOF real 1:11.42 user 3.15 sys 0.53 CREATE TABLE "comparisons_4" ( "name1" character varying(16), "name2" character varying(16), "z_score" float8, "expected" float8, "local_overlap_ratio" float8, "local_overlap_count" int4, "overlap_ratio" float8, "code" int4); QUERY: CREATE TABLE "comparisons_4" ( "name1" character varying(16), "name2" character varying(16), "z_score" float8, "expected" float8, "local_overlap_ratio" float8, "local_overlap_count" int4, "overlap_ratio" float8, "code" int4); COPY "comparisons_4" FROM stdin; QUERY: COPY "comparisons_4" FROM stdin; CREATE INDEX "comparisons_4_name1" on "comparisons_4" using btree ( "name1" "varchar_ops" ); QUERY: CREATE INDEX "comparisons_4_name1" on "comparisons_4" using btree ( "name1" "varchar_ops" ); CREATE INDEX "comparisons_4_name2" on "comparisons_4" using btree ( "name2" "varchar_ops" ); QUERY: CREATE INDEX "comparisons_4_name2" on "comparisons_4" using btree ( "name2" "varchar_ops" ); CREATE INDEX "comparisons_4_code" on "comparisons_4" using btree ( "code" "int4_ops" ); QUERY: CREATE INDEX "comparisons_4_code" on "comparisons_4" using btree ( "code" "int4_ops" ); CREATE CREATE CREATE CREATE EOF real 16:42.13 user 5.86 sys 0.96 CREATE TABLE "concordance_91" ( "target_name" character varying(16), "matched_name" character varying(16), "score" text); QUERY: CREATE TABLE "concordance_91" ( "target_name" character varying(16), "matched_name" character varying(16), "score" text); REVOKE ALL on "concordance_91" from PUBLIC; QUERY: REVOKE ALL on "concordance_91" from PUBLIC; GRANT ALL on "concordance_91" to PUBLIC; QUERY: GRANT ALL on "concordance_91" to PUBLIC; COPY "concordance_91" FROM stdin; QUERY: COPY "concordance_91" FROM stdin; CREATE CHANGE CHANGE EOF real 0.30 user 0.02 sys 0.04 explain select * from comparisons_4 where name1 = 'HI0001'; QUERY: explain select * from comparisons_4 where name1 = 'HI0001'; NOTICE: QUERY PLAN: Index Scan using comparisons_4_name1 on comparisons_4 (cost=2.05 rows=1 width=64) explain select count(*) from comparisons_4 where code = 80003; QUERY: explain select count(*) from comparisons_4 where code = 80003; NOTICE: QUERY PLAN: Aggregate (cost=2.05 rows=1 width=12) -> Index Scan using comparisons_4_code on comparisons_4 (cost=2.05 rows=1 width=12) explain select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name; QUERY: explain select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name; NOTICE: QUERY PLAN: Nested Loop (cost=4.10 rows=1 width=36) -> Index Scan using comparisons_4_name1 on comparisons_4 c (cost=2.05 rows=1 width=12) -> Index Scan using protein_names on proteins p (cost=2.05 rows=36840 width=24) explain select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name = p.name; QUERY: explain select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name = p.name; NOTICE: QUERY PLAN: Nested Loop (cost=2093.00 rows=36840 width=60) -> Seq Scan on concordance_91 c (cost=43.00 rows=1000 width=36) -> Index Scan using protein_names on proteins p (cost=2.05 rows=36840 width=24) EXPLAIN EXPLAIN EXPLAIN EXPLAIN EOF QUERY: select * from comparisons_4 where name1 = 'HI0001' name1 |name2 |z_score|expected|local_overlap_ratio|local_overlap_count|overlap_ratio| code ------+---------------+-------+--------+-------------------+-------------------+-------------+----- HI0001|PDB2DBV_O | 1217.4| 0| 0.56716| 335| 0.560468|30012 HI0001|PDB4DBV_O | 1207| 0| 0.56418| 335| 0.557523|30012 HI0001|PDB2GD1_P | 1226.4| 0| 0.57015| 335| 0.563423|30012 HI0001|PDB1GAE_O | 1861.8| 0| 0.83133| 332| 0.814164|30012 HI0001|PDB4GPD_1 | 1357.8| 0| 0.64865| 333| 0.637169|30012 HI0001|HP1346 | 850.3| 6.9e-41| 0.39222| 334| 0.386435|30005 HI0001|TP0844 | 780.3| 5.8e-37| 0.46307| 352| 0.465716|30014 HI0001|PDB1HDG_O | 1020.4| 0| 0.48024| 329| 0.466074|30012 HI0001|SCPIR-DEBYG1 | 1405.2| 0| 0.6497| 334| 0.640117|30000 HI0001|Rv1436 | 970.4| 0| 0.49558| 339| 0.49558|30010 HI0001|PDB1CER_O | 949.7| 0| 0.47734| 331| 0.466075|30012 HI0001|PDB1NLH_ | 935.1| 0| 0.46847| 333| 0.458825|30012 HI0001|PDB1GGA_A | 918| 0| 0.52125| 353| 0.51397|30012 HI0001|PDB1GAD_O | 1869.5| 0| 0.83434| 332| 0.817112|30012 HI0001|PDB1GYP_A | 900.1| 0| 0.51275| 353| 0.505589|30012 HI0001|MG301 | 866.7| 0| 0.43155| 336| 0.427731|30004 HI0001|SCSW-G3P1_YEAST| 1425.3| 0| 0.65868| 334| 0.648965|30000 HI0001|ScTDH1 | 1424.6| 0| 0.65868| 334| 0.648965|30013 HI0001|ScTDH2 | 1405.2| 0| 0.6497| 334| 0.640117|30013 HI0001|SCSW-G3P3_YEAST| 1417.5| 0| 0.65868| 334| 0.648965|30000 HI0001|ScTDH3 | 1416.8| 0| 0.65868| 334| 0.648965|30013 HI0001|SCGP-3720 | 1416.8| 0| 0.66168| 334| 0.651921|30000 HI0001|SCGP-E243731 | 1416.8| 0| 0.65868| 334| 0.648965|30000 HI0001|SCSW-G3P2_YEAST| 1405.9| 0| 0.6497| 334| 0.640117|30000 HI0001|SCGP-1008189 | 1424.6| 0| 0.65868| 334| 0.648965|30000 HI0001|SCGP-3726 | 1398.7| 0| 0.6497| 334| 0.640117|30000 HI0001|PDB3GPD_R | 1432.2| 0| 0.63772| 334| 0.628314|30012 HI0001|HP0921 | 762.6| 5.6e-36| 0.40407| 344| 0.41003|30005 HI0001|MJ1146 | 124.7| 1.9| 0.25094| 267| 0.195338|30007 HI0001|SCGP-3724 | 1371.5| 0| 0.63772| 334| 0.628314|30000 (30 rows) real 0.18 user 0.02 sys 0.03 QUERY: select count(*) from comparisons_4 where code = 80003 count ----- 3231 (1 row) real 0.34 user 0.02 sys 0.03 QUERY: select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name name |purpose -------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- MG263 |hypothetical protein HP0652 |phosphoserine phosphatase MJ1594 |phosphoserine phosphatase MG125 |hypothetical protein TP0290 |conserved hypothetical protein HI1033 |phosphoserine phosphatase (o-phosphoserine phosphohydrolase) HI0597 |hypothetical protein Rv3813c|(MTCY409.17), len: 273. Unknown, similar to many hypothetical proteins eg. YXEH_BACSU P54947 hypothetical 30.2 kd protein in idh-deor (270 aa), fasta results; opt: 329 z-score: 456.0 E(): 2.2e-18, 32.2% identity in 267 aa overlap Rv3042c|(MTV012.57c), len: 409. The C-terminal domain (150-409) is highly similar to several SerB proteins e.g. P06862|SERB_ECOLI. N-terminus (1-150) shows no similarity, FASTA score: sp|P06862|SERB_ECOLI PHOSPHOSERINE PHOSPHATASE (EC 3.1 (322 aa) opt: 628 z-score: 753.3 E(): 0; 46.8%identity in 235 aa overlap. TBparse score is 0.884 MG265 |hypothetical protein (10 rows) real 0.24 user 0.02 sys 0.03 QUERY: select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name = p.name target_name|matched_name| score|purpose -----------+------------+--------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ECinfA |BSInfA |0.680556|initiation factor IF-1 ECinfA |HI0548 | 0.80952|initiation factor IF-1 ECinfA |HP1298 | 0.61111|translation initiation factor EF-1 ECinfA |Rv3462c |0.684936|(MTCY13E12.15c), len: 73 aa. infA. Probable initiation factor IF-1. FASTA results: identical to IF1_MYCBO P45957 initiation factor if-1 (72 aa) ECrpmA |BB0780 |0.635297|ribosomal protein L27 ECrpmA |HI0879 | 0.87059|ribosomal protein L27 ECrpmA |HP0297 |0.613632|ribosomal protein L27 ECrpmA |Rv2441c |0.616278|(MTCY428.05), len: 86. Probable rpmA, similar to eg RL27_ECOLI P02427 50s ribosomal protein l27, (84 aa), fasta scores, opt: 328, E(): 7.1e-17, (64.2% identity in 81 aa overlap); contains PS00831 Ribosomal protein L27 signature (8 rows) real 0.17 user 0.02 sys 0.03
CREATE DATABASE CREATE TABLE "proteins" ( "name" character varying(16), "organism" text, "start_position" int4, "last_position" int4, "seq" text, "purpose" text, "alternate_key" character varying(16), "comment" text, "compared" bool, "complement" bool, "chromosome" character varying(4), "essentiality" float8); CREATE COPY "proteins" FROM stdin; CREATE INDEX "protein_names" on "proteins" using btree ( "name" "varchar_ops" ); CREATE CREATE INDEX "protein_organism" on "proteins" using btree ( "organism" "text_ops" ); CREATE real 1:04.49 user 3.14 sys 0.57 CREATE TABLE "comparisons_4" ( "name1" character varying(16), "name2" character varying(16), "z_score" float8, "expected" float8, "local_overlap_ratio" float8, "local_overlap_count" int4, "overlap_ratio" float8, "code" int4); CREATE COPY "comparisons_4" FROM stdin; CREATE INDEX "comparisons_4_name1" on "comparisons_4" using btree ( "name1" "varchar_ops" ); CREATE CREATE INDEX "comparisons_4_name2" on "comparisons_4" using btree ( "name2" "varchar_ops" ); CREATE CREATE INDEX "comparisons_4_code" on "comparisons_4" using btree ( "code" "int4_ops" ); CREATE real 7:04.43 user 5.87 sys 1.03 CREATE TABLE "concordance_91" ( "target_name" character varying(16), "matched_name" character varying(16), "score" text); CREATE REVOKE ALL on "concordance_91" from PUBLIC; CHANGE GRANT ALL on "concordance_91" to PUBLIC; CHANGE COPY "concordance_91" FROM stdin; real 0.60 user 0.01 sys 0.03 explain select * from comparisons_4 where name1 = 'HI0001'; NOTICE: QUERY PLAN: Seq Scan on comparisons_4 (cost=0.00..15640.81 rows=5918 width=64) EXPLAIN explain select count(*) from comparisons_4 where code = 80003; NOTICE: QUERY PLAN: Aggregate (cost=15655.61..15655.61 rows=1 width=0) -> Seq Scan on comparisons_4 (cost=0.00..15640.81 rows=5918 width=0) EXPLAIN explain select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name; NOTICE: QUERY PLAN: Merge Join (cost=22495.22..23029.70 rows=2180283 width=36) -> Sort (cost=16011.62..16011.62 rows=5918 width=12) -> Seq Scan on comparisons_4 c (cost=0.00..15640.81 rows=5918 width=12) -> Sort (cost=6483.60..6483.60 rows=36840 width=24) -> Seq Scan on proteins p (cost=0.00..3247.40 rows=36840 width=24) EXPLAIN explain select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name = p.name; NOTICE: QUERY PLAN: Merge Join (cost=6553.43..7026.43 rows=368400 width=60) -> Sort (cost=69.83..69.83 rows=1000 width=36) -> Seq Scan on concordance_91 c (cost=0.00..20.00 rows=1000 width=36) -> Sort (cost=6483.60..6483.60 rows=36840 width=24) -> Seq Scan on proteins p (cost=0.00..3247.40 rows=36840 width=24) EXPLAIN select * from comparisons_4 where name1 = 'HI0001' name1 | name2 | z_score | expected | local_overlap_ratio | local_overlap_count | overlap_ratio | code --------+-----------------+---------+----------+---------------------+---------------------+---------------+------- HI0001 | PDB1GAD_O | 1869.5 | 0 | 0.83434 | 332 | 0.817112 | 30012 HI0001 | PDB1GAE_O | 1861.8 | 0 | 0.83133 | 332 | 0.814164 | 30012 HI0001 | PDB3GPD_R | 1432.2 | 0 | 0.63772 | 334 | 0.628314 | 30012 HI0001 | SCSW-G3P1_YEAST | 1425.3 | 0 | 0.65868 | 334 | 0.648965 | 30000 HI0001 | SCGP-1008189 | 1424.6 | 0 | 0.65868 | 334 | 0.648965 | 30000 HI0001 | ScTDH1 | 1424.6 | 0 | 0.65868 | 334 | 0.648965 | 30013 HI0001 | SCSW-G3P3_YEAST | 1417.5 | 0 | 0.65868 | 334 | 0.648965 | 30000 HI0001 | ScTDH3 | 1416.8 | 0 | 0.65868 | 334 | 0.648965 | 30013 HI0001 | SCGP-3720 | 1416.8 | 0 | 0.66168 | 334 | 0.651921 | 30000 HI0001 | SCGP-E243731 | 1416.8 | 0 | 0.65868 | 334 | 0.648965 | 30000 HI0001 | SCSW-G3P2_YEAST | 1405.9 | 0 | 0.6497 | 334 | 0.640117 | 30000 HI0001 | ScTDH2 | 1405.2 | 0 | 0.6497 | 334 | 0.640117 | 30013 HI0001 | SCPIR-DEBYG1 | 1405.2 | 0 | 0.6497 | 334 | 0.640117 | 30000 HI0001 | SCGP-3726 | 1398.7 | 0 | 0.6497 | 334 | 0.640117 | 30000 HI0001 | SCGP-3724 | 1371.5 | 0 | 0.63772 | 334 | 0.628314 | 30000 HI0001 | PDB4GPD_1 | 1357.8 | 0 | 0.64865 | 333 | 0.637169 | 30012 HI0001 | PDB2GD1_P | 1226.4 | 0 | 0.57015 | 335 | 0.563423 | 30012 HI0001 | PDB2DBV_O | 1217.4 | 0 | 0.56716 | 335 | 0.560468 | 30012 HI0001 | PDB4DBV_O | 1207 | 0 | 0.56418 | 335 | 0.557523 | 30012 HI0001 | PDB1HDG_O | 1020.4 | 0 | 0.48024 | 329 | 0.466074 | 30012 HI0001 | Rv1436 | 970.4 | 0 | 0.49558 | 339 | 0.49558 | 30010 HI0001 | PDB1CER_O | 949.7 | 0 | 0.47734 | 331 | 0.466075 | 30012 HI0001 | PDB1NLH_ | 935.1 | 0 | 0.46847 | 333 | 0.458825 | 30012 HI0001 | PDB1GGA_A | 918 | 0 | 0.52125 | 353 | 0.51397 | 30012 HI0001 | PDB1GYP_A | 900.1 | 0 | 0.51275 | 353 | 0.505589 | 30012 HI0001 | MG301 | 866.7 | 0 | 0.43155 | 336 | 0.427731 | 30004 HI0001 | HP1346 | 850.3 | 6.9e-41 | 0.39222 | 334 | 0.386435 | 30005 HI0001 | TP0844 | 780.3 | 5.8e-37 | 0.46307 | 352 | 0.465716 | 30014 HI0001 | HP0921 | 762.6 | 5.6e-36 | 0.40407 | 344 | 0.41003 | 30005 HI0001 | MJ1146 | 124.7 | 1.9 | 0.25094 | 267 | 0.195338 | 30007 (30 rows) real 22.68 user 0.01 sys 0.03 select count(*) from comparisons_4 where code = 80003 count ------- 3231 (1 row) real 21.49 user 0.01 sys 0.03 select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name name | purpose ---------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- HI0597 | hypothetical protein HI1033 | phosphoserine phosphatase (o-phosphoserine phosphohydrolase) HP0652 | phosphoserine phosphatase MG125 | hypothetical protein MG263 | hypothetical protein MG265 | hypothetical protein MJ1594 | phosphoserine phosphatase Rv3042c | (MTV012.57c), len: 409. The C-terminal domain (150-409) is highly similar to several SerB proteins e.g. P06862|SERB_ECOLI. N-terminus (1-150) shows no similarity, FASTA score: sp|P06862|SERB_ECOLI PHOSPHOSERINE PHOSPHATASE (EC 3.1 (322 aa) opt: 628 z-score: 753.3 E(): 0; 46.8%identity in 235 aa overlap. TBparse score is 0.884 Rv3813c | (MTCY409.17), len: 273. Unknown, similar to many hypothetical proteins eg. YXEH_BACSU P54947 hypothetical 30.2 kd protein in idh-deor (270 aa), fasta results; opt: 329 z-score: 456.0 E(): 2.2e-18, 32.2% identity in 267 aa overlap TP0290 | conserved hypothetical protein (10 rows) real 23.13 user 0.01 sys 0.03 select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name = p.name target_name | matched_name | score | purpose -------------+--------------+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ECrpmA | BB0780 | 0.635297 | ribosomal protein L27 ECinfA | BSInfA | 0.680556 | initiation factor IF-1 ECinfA | HI0548 | 0.80952 | initiation factor IF-1 ECrpmA | HI0879 | 0.87059 | ribosomal protein L27 ECrpmA | HP0297 | 0.613632 | ribosomal protein L27 ECinfA | HP1298 | 0.61111 | translation initiation factor EF-1 ECrpmA | Rv2441c | 0.616278 | (MTCY428.05), len: 86. Probable rpmA, similar to eg RL27_ECOLI P02427 50s ribosomal protein l27, (84 aa), fasta scores, opt: 328, E(): 7.1e-17, (64.2% identity in 81 aa overlap); contains PS00831 Ribosomal protein L27 signature ECinfA | Rv3462c | 0.684936 | (MTCY13E12.15c), len: 73 aa. infA. Probable initiation factor IF-1. FASTA results: identical to IF1_MYCBO P45957 initiation factor if-1 (72 aa) (8 rows) real 11.16 user 0.01 sys 0.03