[BUGS] BUG #4899: Open parenthesis breaks query plan

2009-07-05 Thread Peter Headland

The following bug has been logged online:

Bug reference:  4899
Logged by:  Peter Headland
Email address:  pheadl...@actuate.com
PostgreSQL version: 8.4.0
Operating system:   Windows
Description:Open parenthesis breaks query plan
Details: 

In a moderate-size table (~400,000 rows), an equality match on an unindexed
varchar column to a string that contains an open parenthesis '(' prevents
the optimizer from using an obvious index. Changing the open parenthesis to
another character, such as ')' allows the obvious index to be used. I have
been unable to reproduce this on simple test data so far, so it is obviously
fairly subtle.

Abstract example of the issue:

o table t has a composite index i comprising columns c1, c2, c3

o column t.c4 is not indexed

Illustration of the queries:

-- Full table scan
SELECT COUNT(*) FROM t WHERE t.c1 = 123 AND t.c4 = '(';

-- Uses index i
SELECT COUNT(*) FROM t WHERE t.c1 = 123 AND t.c4 = ')';

I am really hoping that this defect can be found by inspection of the
source, because trying to reproduce it is fast getting me nowhere.
Unfortunately, the data involved are customer confidential, so I cannot
provide the original table.

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


[BUGS] BUG #4900: Query planner misses obvious optimization on ordered UNION DISTINCT

2009-07-05 Thread Peter Headland

The following bug has been logged online:

Bug reference:  4900
Logged by:  Peter Headland
Email address:  pheadl...@actuate.com
PostgreSQL version: 8.4.0
Operating system:   Windows
Description:Query planner misses obvious optimization on ordered
UNION DISTINCT
Details: 

Consider the following union:

SELECT a, b, c FROM t WHERE d = 1
UNION DISTINCT
SELECT a, b, c FROM t WHERE d = 2
ORDER BY b, c;

I have a table for which the plan for the above is ...->sort->unique->sort.
I infer that the first sort is a,b,c.

The obvious optimization is to reorder the columns used in the first sort to
eliminate the need for the second sort. To illustrate this, I change the
query to 

SELECT b, c, a FROM t WHERE d = 1
UNION DISTINCT
SELECT b, c, a FROM t WHERE d = 2
ORDER BY b, c;

the plan now becomes ...->sort->distinct.

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #4899: Open parenthesis breaks query plan

2009-07-05 Thread Peter Headland
While noodling around some more, I found that a comparison to '()' allows use 
of the index, as does '(abc)' and even '(a(b(c)d)e)'. It appears that 
mismatched open/close paren pairs trigger the bug. Obviously something is 
parsing the string literal and mishandling parentheses. I don't understand why 
parentheses should be significant inside a string literal in the first place.

Also, just to be 100% clear, the open paren can be anywhere in the string, so a 
comparison to 'abcdefgh(ijklmnop' still triggers the bug.

-- 
Peter Headland
Architect - e.Reports
Actuate Corporation

-----Original Message-
From: Peter Headland 
Sent: Saturday, July 04, 2009 18:03
To: pgsql-bugs@postgresql.org
Subject: BUG #4899: Open parenthesis breaks query plan


The following bug has been logged online:

Bug reference:  4899
Logged by:  Peter Headland
Email address:  pheadl...@actuate.com
PostgreSQL version: 8.4.0
Operating system:   Windows
Description:Open parenthesis breaks query plan
Details: 

In a moderate-size table (~400,000 rows), an equality match on an unindexed
varchar column to a string that contains an open parenthesis '(' prevents
the optimizer from using an obvious index. Changing the open parenthesis to
another character, such as ')' allows the obvious index to be used. I have
been unable to reproduce this on simple test data so far, so it is obviously
fairly subtle.

Abstract example of the issue:

o table t has a composite index i comprising columns c1, c2, c3

o column t.c4 is not indexed

Illustration of the queries:

-- Full table scan
SELECT COUNT(*) FROM t WHERE t.c1 = 123 AND t.c4 = '(';

-- Uses index i
SELECT COUNT(*) FROM t WHERE t.c1 = 123 AND t.c4 = ')';

I am really hoping that this defect can be found by inspection of the
source, because trying to reproduce it is fast getting me nowhere.
Unfortunately, the data involved are customer confidential, so I cannot
provide the original table.

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #4899: Open parenthesis breaks query plan

2009-07-06 Thread Peter Headland
> your theory is pretty much nonsense
...
> What I think is happening is that '(' is a sufficiently common value
that
> the planner thinks a seqscan is superior to an indexscan for it.

Your theory is also "pretty much nonsense" if you read the detailed
description I gave in my initial post, in which I explain that presence
of a '(' character anywhere at all in the string literal triggers the
problem. For example 'abc(def'. It also totally fails to explain the way
that matched parentheses prevent the problem (which makes it obvious
that _something_ somewhere is doing enough parsing to count
parentheses).

Now that I know about EXPLAIN ANALYZE, I got these (I apologise for the
redactions and obfuscation, which represent the downside of
self-documenting column names):


"Aggregate  (cost=534.40..534.41 rows=1 width=0) (actual
time=0.442..0.444 rows=1 loops=1)"
"  Output: count(*)"
"  ->  Bitmap Heap Scan on a_table  (cost=9.49..534.39 rows=1 width=0)
(actual time=0.412..0.412 rows=0 loops=1)"
"Output: ... 21 columns ..."
"Recheck Cond: (an_integer_column = 65)"
"Filter: ((a_varchar_column)::text = 'abc(def'::text)"
"->  Bitmap Index Scan on an_index  (cost=0.00..9.49 rows=146
width=0) (actual time=0.118..0.118 rows=197 loops=1)"
"  Index Cond: (an_integer_column = 65)"
"Total runtime: 0.611 ms"



"Aggregate  (cost=534.40..534.41 rows=1 width=0) (actual
time=0.418..0.421 rows=1 loops=1)"
"  Output: count(*)"
"  ->  Bitmap Heap Scan on a_table  (cost=9.49..534.39 rows=1 width=0)
(actual time=0.395..0.395 rows=0 loops=1)"
"Output: ... 21 columns ..."
"Recheck Cond: (an_integer_column = 65)"
"Filter: ((a_varchar_column)::text = 'abc()def'::text)"
"->  Bitmap Index Scan on an_index  (cost=0.00..9.49 rows=146
width=0) (actual time=0.108..0.108 rows=197 loops=1)"
"  Index Cond: (an_integer_column = 65)"
"Total runtime: 0.563 ms"


This puzzles me, because it seems to say that the plan is the same in
both cases, but the graphical display of the plan in pgAdmin III looks
different for the two queries (is there some way/somewhere I can post
screen grabs?). I think the issue is a bug in the way pgAdmin III parses
the output from EXPLAIN. My inability to reproduce the issue with dummy
data would be down to the fact I'd have to get the optimizer to choose
the same plan, which is beyond my ability at this stage.

I also just realized that the graphical display of the plan in pgAdmin
III does not show a full table scan for the mismatched parentheses case;
it shows something that looks like a variant of the index scan symbol,
but with the name of the table underneath. The difference between
display of the two plans is that the initial symbol with the name of the
index underneath vanishes when there is an unmatched open parenthesis. I
have been unable to find an explanation of the symbols used in pgAdmin
III - is there such a thing anywhere?

If we are agreed that the issue is a bug in pgAdmin III, please advise
where I should report such things.

-- 
Peter Headland
Architect - e.Reports
Actuate Corporation


-Original Message-
From: Tom Lane [mailto:t...@sss.pgh.pa.us] 
Sent: Sunday, July 05, 2009 08:39
To: Peter Headland
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #4899: Open parenthesis breaks query plan 

"Peter Headland"  writes:
> While noodling around some more, I found that a comparison to '()'
> allows use of the index, as does '(abc)' and even '(a(b(c)d)e)'. It
> appears that mismatched open/close paren pairs trigger the
> bug. Obviously something is parsing the string literal and mishandling
> parentheses.

This isn't "obvious" at all, and in fact your theory is pretty much
nonsense.  What I think is happening is that '(' is a sufficiently
common value that the planner thinks a seqscan is superior to an
indexscan for it.  However, since you have not shown us EXPLAIN output
(much less EXPLAIN ANALYZE output), that's just a guess.

regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #4899: Open parenthesis breaks query plan

2009-07-06 Thread Peter Headland
As I said further down my previous e-mail, it looks as if the optimizer
is just fine, and the problem is simply a bug in the way pgAdmin III
parses and displays EXPLAIN ANALYZE output in its graphical view.

-- 
Peter Headland
Architect - e.Reports
Actuate Corporation


-Original Message-
From: gsst...@gmail.com [mailto:gsst...@gmail.com] On Behalf Of Greg
Stark
Sent: Monday, July 06, 2009 10:35
To: Peter Headland
Cc: Tom Lane; pgsql-bugs@postgresql.org
Subject: Re: BUG #4899: Open parenthesis breaks query plan

On Mon, Jul 6, 2009 at 5:40 PM, Peter Headland
wrote:
> presence
> of a '(' character anywhere at all in the string literal triggers the
> problem. For example 'abc(def'.

Except according to that explain analyze 'abc(def' ran exactly the
speed as 'abc()def'.

This all seems much more likely to depend on the c1/an_integer_column
value you're querying for than on the string. Keep trying different
values for both columns until you find one that triggers the problem
and send the explain analyze result for that. It could be that '(' was
such a value yesterday but not today if autovacuum has run analyze
since.


-- 
greg
http://mit.edu/~gsstark/resume.pdf

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs