Hi,

philipp resiner wrote a mail about this problem yesterday. I've now traced it 
down to something that looks like a bug in the
query planer to me. Please have a look at this and let me know if this is a bug 
or I am compleatly wrong..

(this is done right after a complete ANALYZE over the database, so the 
statistics are up-to-date)

        sd-beta=> select n_tup_ins, n_tup_del from pg_stat_user_tables where 
relname = 'contractelements';
         n_tup_ins | n_tup_del
        -----------+-----------
             91821 |         0
        (1 row)

        sd-beta=> select n_distinct, most_common_vals, most_common_freqs from 
pg_stats where tablename = 'contractelements' and attname = 'isactiv';
         n_distinct | most_common_vals |  most_common_freqs
        ------------+------------------+----------------------
                  2 | {Y,N}            | {0.966467,0.0335333}
        (1 row)

        sd-beta=> explain analyze select 1 from contractelements where isActiv 
= 'Y';
                                                              QUERY PLAN
        
-----------------------------------------------------------------------------------------------------------------------
         Seq Scan on contractelements  (cost=0.00..4963.76 rows=88742 width=0) 
(actual time=0.014..137.930 rows=88838 loops=1)
           Filter: ((isactiv)::text = 'Y'::text)
         Total runtime: 153.543 ms
        (3 rows)

The query planner estimates that isActiv = 'Y' will match 88742 rows. This is 
reasonable (91821 * 0.966467 = 88741.966407) and
correct. However, the following case causes some troubles:

        sd-beta=> explain analyze select 1 from contractelements where 
upper(isActiv) = 'Y';
                                                             QUERY PLAN
        
---------------------------------------------------------------------------------------------------------------------
         Seq Scan on contractelements  (cost=0.00..5193.32 rows=459 width=0) 
(actual time=0.030..198.493 rows=88838 loops=1)
           Filter: (upper((isactiv)::text) = 'Y'::text)
         Total runtime: 214.035 ms
        (3 rows)

Here we match on upper(isActiv) = 'Y' (which is totally braindead, but the 
query is auto-generated by a customer-supplied
application, so I can not change it). Shouldn't the query planner execute 
upper(isActiv) for both values in pg_stats
and so come to the same conclusion as in the first case?

It doesn't. Led by this misapprehension the query planner generates pretty 
creative, but unfortunately very suboptimal
query plans.

A 'CREATE INDEX clifford_temp ON contractelements ( upper(isActiv) )' followed 
by an 'ANALYZE contractelements' solves the
problem in this particular case. But this is not a solution to the problem in 
general..

Shouldn't the query planner be able to do the right thing without the index? 
Where does the magic 'rows=459' come from?

yours,
 - clifford

-- 
: Clifford Wolf                                 Tel +43-1-8178292-00 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria    http://www.linbit.com :

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Reply via email to