On Wed, 07 Jun 2000 22:22:06 -0400  Tom Lane wrote:

> Since '\341' and '\342' are two different accented forms of 'a'
> (if I'm looking at the right character set), this is perhaps not so
> improbable as all that.  Evidently the collation rule is that different
> accent forms sort the same unless the strings would otherwise be
> considered equal, in which case an ordering is assigned to them.

I thought that was common, but while I've worked on
internationalisation issues sometimes I'm no linguist.

> So, the rule we thought we had for generating index bounds falls flat,
> and we're back to the same old question: given a proposed prefix string,
> how can we generate bounds that are certain to be considered <= and >=
> all strings starting with that prefix?

To confess ignorance, why does PostgreSQL need to generate such
bounds?  Complete string comparisons with a locale aware function such
as strcoll() are safe.  Using less than a full string is tricky
indeed, and I'm not sure is possible in general although it might be.

Other problematic cases are likely to include one-to-two collations (ß
in German, for example) and two-to-one collations (the reverse, but
I've forgotten my example.  Anyone?)

Then there are wide characters, including some encodings that are
stateful.

Regards,

Giles


Reply via email to