[BUGS] BUG #4899: Open parenthesis breaks query plan
The following bug has been logged online: Bug reference: 4899 Logged by: Peter Headland Email address: pheadl...@actuate.com PostgreSQL version: 8.4.0 Operating system: Windows Description:Open parenthesis breaks query plan Details: In a moderate-size table (~400,000 rows), an equality match on an unindexed varchar column to a string that contains an open parenthesis '(' prevents the optimizer from using an obvious index. Changing the open parenthesis to another character, such as ')' allows the obvious index to be used. I have been unable to reproduce this on simple test data so far, so it is obviously fairly subtle. Abstract example of the issue: o table t has a composite index i comprising columns c1, c2, c3 o column t.c4 is not indexed Illustration of the queries: -- Full table scan SELECT COUNT(*) FROM t WHERE t.c1 = 123 AND t.c4 = '('; -- Uses index i SELECT COUNT(*) FROM t WHERE t.c1 = 123 AND t.c4 = ')'; I am really hoping that this defect can be found by inspection of the source, because trying to reproduce it is fast getting me nowhere. Unfortunately, the data involved are customer confidential, so I cannot provide the original table. -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
[BUGS] BUG #4900: Query planner misses obvious optimization on ordered UNION DISTINCT
The following bug has been logged online: Bug reference: 4900 Logged by: Peter Headland Email address: pheadl...@actuate.com PostgreSQL version: 8.4.0 Operating system: Windows Description:Query planner misses obvious optimization on ordered UNION DISTINCT Details: Consider the following union: SELECT a, b, c FROM t WHERE d = 1 UNION DISTINCT SELECT a, b, c FROM t WHERE d = 2 ORDER BY b, c; I have a table for which the plan for the above is ...->sort->unique->sort. I infer that the first sort is a,b,c. The obvious optimization is to reorder the columns used in the first sort to eliminate the need for the second sort. To illustrate this, I change the query to SELECT b, c, a FROM t WHERE d = 1 UNION DISTINCT SELECT b, c, a FROM t WHERE d = 2 ORDER BY b, c; the plan now becomes ...->sort->distinct. -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #4899: Open parenthesis breaks query plan
While noodling around some more, I found that a comparison to '()' allows use of the index, as does '(abc)' and even '(a(b(c)d)e)'. It appears that mismatched open/close paren pairs trigger the bug. Obviously something is parsing the string literal and mishandling parentheses. I don't understand why parentheses should be significant inside a string literal in the first place. Also, just to be 100% clear, the open paren can be anywhere in the string, so a comparison to 'abcdefgh(ijklmnop' still triggers the bug. -- Peter Headland Architect - e.Reports Actuate Corporation -----Original Message- From: Peter Headland Sent: Saturday, July 04, 2009 18:03 To: pgsql-bugs@postgresql.org Subject: BUG #4899: Open parenthesis breaks query plan The following bug has been logged online: Bug reference: 4899 Logged by: Peter Headland Email address: pheadl...@actuate.com PostgreSQL version: 8.4.0 Operating system: Windows Description:Open parenthesis breaks query plan Details: In a moderate-size table (~400,000 rows), an equality match on an unindexed varchar column to a string that contains an open parenthesis '(' prevents the optimizer from using an obvious index. Changing the open parenthesis to another character, such as ')' allows the obvious index to be used. I have been unable to reproduce this on simple test data so far, so it is obviously fairly subtle. Abstract example of the issue: o table t has a composite index i comprising columns c1, c2, c3 o column t.c4 is not indexed Illustration of the queries: -- Full table scan SELECT COUNT(*) FROM t WHERE t.c1 = 123 AND t.c4 = '('; -- Uses index i SELECT COUNT(*) FROM t WHERE t.c1 = 123 AND t.c4 = ')'; I am really hoping that this defect can be found by inspection of the source, because trying to reproduce it is fast getting me nowhere. Unfortunately, the data involved are customer confidential, so I cannot provide the original table. -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #4899: Open parenthesis breaks query plan
> your theory is pretty much nonsense ... > What I think is happening is that '(' is a sufficiently common value that > the planner thinks a seqscan is superior to an indexscan for it. Your theory is also "pretty much nonsense" if you read the detailed description I gave in my initial post, in which I explain that presence of a '(' character anywhere at all in the string literal triggers the problem. For example 'abc(def'. It also totally fails to explain the way that matched parentheses prevent the problem (which makes it obvious that _something_ somewhere is doing enough parsing to count parentheses). Now that I know about EXPLAIN ANALYZE, I got these (I apologise for the redactions and obfuscation, which represent the downside of self-documenting column names): "Aggregate (cost=534.40..534.41 rows=1 width=0) (actual time=0.442..0.444 rows=1 loops=1)" " Output: count(*)" " -> Bitmap Heap Scan on a_table (cost=9.49..534.39 rows=1 width=0) (actual time=0.412..0.412 rows=0 loops=1)" "Output: ... 21 columns ..." "Recheck Cond: (an_integer_column = 65)" "Filter: ((a_varchar_column)::text = 'abc(def'::text)" "-> Bitmap Index Scan on an_index (cost=0.00..9.49 rows=146 width=0) (actual time=0.118..0.118 rows=197 loops=1)" " Index Cond: (an_integer_column = 65)" "Total runtime: 0.611 ms" "Aggregate (cost=534.40..534.41 rows=1 width=0) (actual time=0.418..0.421 rows=1 loops=1)" " Output: count(*)" " -> Bitmap Heap Scan on a_table (cost=9.49..534.39 rows=1 width=0) (actual time=0.395..0.395 rows=0 loops=1)" "Output: ... 21 columns ..." "Recheck Cond: (an_integer_column = 65)" "Filter: ((a_varchar_column)::text = 'abc()def'::text)" "-> Bitmap Index Scan on an_index (cost=0.00..9.49 rows=146 width=0) (actual time=0.108..0.108 rows=197 loops=1)" " Index Cond: (an_integer_column = 65)" "Total runtime: 0.563 ms" This puzzles me, because it seems to say that the plan is the same in both cases, but the graphical display of the plan in pgAdmin III looks different for the two queries (is there some way/somewhere I can post screen grabs?). I think the issue is a bug in the way pgAdmin III parses the output from EXPLAIN. My inability to reproduce the issue with dummy data would be down to the fact I'd have to get the optimizer to choose the same plan, which is beyond my ability at this stage. I also just realized that the graphical display of the plan in pgAdmin III does not show a full table scan for the mismatched parentheses case; it shows something that looks like a variant of the index scan symbol, but with the name of the table underneath. The difference between display of the two plans is that the initial symbol with the name of the index underneath vanishes when there is an unmatched open parenthesis. I have been unable to find an explanation of the symbols used in pgAdmin III - is there such a thing anywhere? If we are agreed that the issue is a bug in pgAdmin III, please advise where I should report such things. -- Peter Headland Architect - e.Reports Actuate Corporation -Original Message- From: Tom Lane [mailto:t...@sss.pgh.pa.us] Sent: Sunday, July 05, 2009 08:39 To: Peter Headland Cc: pgsql-bugs@postgresql.org Subject: Re: [BUGS] BUG #4899: Open parenthesis breaks query plan "Peter Headland" writes: > While noodling around some more, I found that a comparison to '()' > allows use of the index, as does '(abc)' and even '(a(b(c)d)e)'. It > appears that mismatched open/close paren pairs trigger the > bug. Obviously something is parsing the string literal and mishandling > parentheses. This isn't "obvious" at all, and in fact your theory is pretty much nonsense. What I think is happening is that '(' is a sufficiently common value that the planner thinks a seqscan is superior to an indexscan for it. However, since you have not shown us EXPLAIN output (much less EXPLAIN ANALYZE output), that's just a guess. regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #4899: Open parenthesis breaks query plan
As I said further down my previous e-mail, it looks as if the optimizer is just fine, and the problem is simply a bug in the way pgAdmin III parses and displays EXPLAIN ANALYZE output in its graphical view. -- Peter Headland Architect - e.Reports Actuate Corporation -Original Message- From: gsst...@gmail.com [mailto:gsst...@gmail.com] On Behalf Of Greg Stark Sent: Monday, July 06, 2009 10:35 To: Peter Headland Cc: Tom Lane; pgsql-bugs@postgresql.org Subject: Re: BUG #4899: Open parenthesis breaks query plan On Mon, Jul 6, 2009 at 5:40 PM, Peter Headland wrote: > presence > of a '(' character anywhere at all in the string literal triggers the > problem. For example 'abc(def'. Except according to that explain analyze 'abc(def' ran exactly the speed as 'abc()def'. This all seems much more likely to depend on the c1/an_integer_column value you're querying for than on the string. Keep trying different values for both columns until you find one that triggers the problem and send the explain analyze result for that. It could be that '(' was such a value yesterday but not today if autovacuum has run analyze since. -- greg http://mit.edu/~gsstark/resume.pdf -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs