[BUGS] LIKE predicate and ERROR: 42P22: could not determine which collation to use for string comparison - HINT: Use the COLLATE clause ...

John Lumby Fri, 09 Dec 2011 11:39:07 -0800

postgresql 9.1.2 running on linux debian squeeze on 32-bit x86

I have encountered behaviour relating to subject line which I'm fairly sure is 
a bug,  although I'm not sure exactly what.


My database was created with LC_LOCALE C,   and loaded using Pg-9.1.1.   it 
looks like

psql -d proteusdb -c "select datname, datdba, encoding, datcollate, datctype, 
datistemplate, datallowconn, datconnlimit, datlastsysoid, datfrozenxid, 
dattablespace from pg_database where datname = 'proteusdb'";
  datname  | datdba | encoding | datcollate | datctype | datistemplate | 
datallowconn | datconnlimit | datlastsysoid | datfrozenxid | dattablespace 
-----------+--------+----------+------------+----------+---------------+--------------+--------------+---------------+--------------+---------------
 proteusdb |     10 |        6 | C          | C        | f             | 
t            |           -1 |         11910 |          670 |          1663
(1 row)


 I have a table which as a column named discriminator :

CREATE TABLE entity (
    id bigint DEFAULT nextval('entity_id_seq'::regclass) NOT NULL,
    discriminator character varying(255) NOT NULL,
    version integer NOT NULL,
    inherit_right boolean,
    name text,
    parent_id bigint,
    association_id bigint,
    association2_id bigint,
    long1 bigint,
    long2 bigint
);

which shows up looking like this in the catalog

psql -d proteusdb -c "select N.nspname , C.relname, A.attrelid , A.attname , 
A.atttypid , T.typname , L.collname from pg_attribute A, pg_type T, pg_class C 
, pg_namespace N , pg_collation L where C.oid = A.attrelid and T.oid = 
A.atttypid and A.attcollation = L.oid and C.relnamespace = N.oid and N.nspname 
= 'public' and C.relname = 'entity' and A.attname = 'discriminator'";
 nspname | relname | attrelid |    attname    | atttypid | typname | collname 
---------+---------+----------+---------------+----------+---------+----------
  public  | entity  |    25349 | discriminator |     1043 | varchar | default
(1 row)

and a query like so :

psql -d proteusdb -c "select count(*) from entity  as e1 where ( 
e1.association_id IN ( select id from entity as e2 where 
e2.parent_id=3043508)   OR e1.association_id=3043508 ) and e1.discriminator 
LIKE 'DEPLOY%'";
ERROR:  could not determine which collation to use for string comparison
HINT:  Use the COLLATE clause to set the collation explicitly.

explain gives a clue what is going on :
psql -d proteusdb -c "explain select count(*) from entity  as e1 where ( 
e1.association_id IN ( select id from entity as e2 where 
e2.parent_id=3043508)   OR e1.association_id=3043508 ) and e1.discriminator 
LIKE 'DEPLOY%'";
                                                     QUERY 
PLAN                                                      
---------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=85045.70..85045.71 rows=1 width=0)
   ->  Index Scan using entity_id_association_id_discriminator on entity e1  
(cost=523.55..85045.61 rows=36 width=0)
         Index Cond: (((discriminator)::text >= 'DEPLOY'::text) AND 
((discriminator)::text < 'DEPLOZ'::text))
         Filter: (((discriminator)::text ~~ 'DEPLOY%'::text) AND ((hashed 
SubPlan 1) OR (association_id = 3043508)))
         SubPlan 1
           ->  Index Scan using entity_parent_id on entity e2  
(cost=0.00..522.84 rows=281 width=8)
                 Index Cond: (parent_id = 3043508)
(7 rows)


So Pg has rewritten my LIKE pred to add the pair of range predicates so as to 
be able to exploit my index ... Good.
However  ---   It has then discovered that there is a mismatch between the 
collation of the column (default) and the constant (I assume C,  inherited from 
the database).

Well   --    I am not an expert on collation,   but I am fairly sure that the 
semantics of the LIKE predicate do not involve order,   so an error relating to 
ordering or collation should be impossible.     LIKE involves only exact 
(equality) matching of parts of the column value against the pattern.     If 
the query rewriter has satisified itself that it is a correct transformation of 
the query to add the two range predicates,   then it must  (should?)  have 
checked that any mismatch of collation does not affect correctness, and 
therefore no other part of postgres (planner,  runtime) should then object on 
those grounds.

So  - I *think* there is a  bug,   either that the query should not have been 
rewritten  (if collation does indeed make the rewrite incorrect),  or else it 
should have planned and executed the rewritten query.

I re-ran this on PG 9.1.2 to check and the same happens there as well.

Hoping that some expert can make a ruling on this.

John Lumby
                                          
-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

[BUGS] LIKE predicate and ERROR: 42P22: could not determine which collation to use for string comparison - HINT: Use the COLLATE clause ...

Reply via email to