I've been looking into bug #6053, in which Regina Obe complains that hash-based DISTINCT queries fail for type "citext". The cause is not far to seek: the header comment for execGrouping.c states
    * Note: we currently assume that equality and hashing functions are not
    * collation-sensitive, so the code in this file has no support for passing
    * collation settings through from callers.  That may have to change someday.

and indeed the failure comes directly from the fact that citext's hash function *does* expect a collation to be passed to it. I'm a bit embarrassed to not have noticed that citext was a counterexample for this assumption, especially since I already fixed one bug that should have clued me in (commit a0b75a41a907e1582acdb8aa6ebb9cacca39d7d8).

Now, removing this assumption from execGrouping.c is already a pretty sizable task --- for starters, at least plan node types Agg, Group, SetOp, Unique, and WindowAgg would need collation attributes that they don't have today. But the assumption that equality operators are not collation-sensitive is baked into a number of other places too; for instance

    nodeAgg.c @ line 600
    indxpath.c @ line 2200
    prepunion.c @ line 640
    ri_triggers.c @ line 3000

and that's just places where there's a comment about it :-(. It's worth noting also that in many of these places, paying attention to collation is not merely going to need more coding; it will directly translate to a performance hit, one that is entirely unnecessary for the normal case where collation doesn't affect equality.

So this leaves us between a rock and a hard place. I think there's just about no chance of fixing all these things without a serious fresh slip in the 9.1 schedule. Also, I'm *not* prepared to fix these things personally. I already regret the amount of time I put into collations this past winter/spring, and am not willing to drop another several weeks down that sinkhole right now.

The most workable alternative that I can see is to lobotomize citext so that it always does lower-casing according to the database's "default" collation, which would allow us to pretend that its notion of equality is not collation-sensitive after all.
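To make the failure and the proposed workaround concrete, here is a minimal standalone sketch. It is not PostgreSQL's code: the function names (fold_and_hash, citext_hash_needs_collation, citext_hash_default_only) are hypothetical, Oid/InvalidOid/DEFAULT_COLLATION_OID are local stand-ins, FNV-1a replaces the real hash_any(), and ASCII tolower() stands in for lower-casing under the database default collation. The first hash function refuses to run without a collation, which is what happens when execGrouping.c calls it with none; the second ignores whatever collation it is given, which is the "lobotomized" behavior described above.

```c
#include <ctype.h>
#include <stdint.h>

/* Hypothetical stand-ins for the backend's definitions. */
typedef unsigned int Oid;
#define InvalidOid            ((Oid) 0)
#define DEFAULT_COLLATION_OID ((Oid) 100)

/* Toy FNV-1a hash over the case-folded string.  ASCII tolower() plays
 * the role of lower-casing under the database default collation. */
uint32_t fold_and_hash(const char *s)
{
    uint32_t h = 2166136261u;

    for (; *s; s++)
    {
        h ^= (uint32_t) tolower((unsigned char) *s);
        h *= 16777619u;
    }
    return h;
}

/* Current behavior (sketch): the hash needs a collation to decide how
 * to lower-case, so it fails when the caller supplies none, as
 * execGrouping.c does.  Returns 0 on success, -1 where real code would
 * raise an error. */
int citext_hash_needs_collation(const char *s, Oid collation, uint32_t *out)
{
    if (collation == InvalidOid)
        return -1;
    /* A real implementation would fold case according to 'collation';
     * this toy treats every valid collation the same way. */
    *out = fold_and_hash(s);
    return 0;
}

/* Proposed workaround (sketch): ignore the passed collation and always
 * fold case per the default collation, so hashing and equality no
 * longer depend on collation and callers need not pass one. */
uint32_t citext_hash_default_only(const char *s, Oid collation)
{
    (void) collation;           /* deliberately unused */
    return fold_and_hash(s);
}
```

A grouping routine that, like execGrouping.c, has no collation to hand over can call citext_hash_default_only(value, InvalidOid) safely, while citext_hash_needs_collation(value, InvalidOid, &h) hits the error path. The price of the workaround is that any collation attached to a citext value is ignored for case folding.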
We could hope to improve this in future release cycles, but not till we've done the infrastructure work outlined above. One bit of infrastructure that might be a good idea is a flag to indicate whether an equality operator's behavior is potentially collation-dependent, so that we could avoid taking performance hits in the normal case.

Comments, other ideas?

			regards, tom lane