On Sun, Nov 21, 2010 at 6:22 PM, Andrew Dunstan <and...@dunslane.net> wrote:
> On 11/21/2010 06:09 PM, Robert Haas wrote:
>> I think that's fair.  It actually doesn't seem like it should be that
>> hard if we knew that the server encoding were UTF8 - it's just a big
>> translation table somewhere, no?
> No, it's far more complex. See for example
> <http://unicode.org/reports/tr21/tr21-3.html>, which says:
> There are a number of complications to case mappings that occur once the
> repertoire of characters is expanded beyond ASCII.
> Because of the inclusion of certain composite characters for compatibility,
> such as 01F1 "DZ" capital dz, there is a third case, called titlecase, which
> is used where the first letter of a word is to be capitalized (e.g.
> Titlecase, vs. UPPERCASE, or lowercase).
> For example, the title case of the example character is 01F2 "Dz" capital d
> with small z.
> Case mappings may produce strings of different length than the original.
> For example, the German character 00DF "ß" small letter sharp s expands when
> uppercased to the sequence of two characters "SS". This also occurs where
> there is no precomposed character corresponding to a case mapping, such as
> with 0149 "ʼn" latin small letter n preceded by apostrophe.
> Characters may also have different case mappings, depending on the context.
> For example, 03A3 "Σ" capital sigma lowercases to 03C3 "σ" small sigma if it
> is followed by another letter, but lowercases to 03C2 "ς" small final sigma
> if it is not.
> Characters may have case mappings that depend on the locale.
> For example, in Turkish the letter 0049 "I" capital letter i lowercases to
> 0131 "ı" small dotless i.
> Case mappings are not, in general, reversible.
> For example, once the string "McGowan" has been uppercased, lowercased or
> titlecased, the original cannot be recovered by applying another uppercase,
> lowercase, or titlecase operation.
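For illustration, most of the cases quoted above can be reproduced with Python 3's built-in str methods. This is only a sketch: the function name case_mapping_examples is hypothetical, the results assume Python 3.3 or later (which applies the full Unicode case mappings), and none of it says anything about how PostgreSQL does or should behave.

```python
def case_mapping_examples():
    """Return a few case-mapping results that a 1:1 translation table cannot express."""
    return {
        # Titlecase is a third case: U+01F1 (DZ) titlecases to U+01F2 (Dz).
        "titlecase_dz": "\u01f1".title(),
        # Length changes: U+00DF (sharp s) uppercases to the two characters "SS".
        "upper_sharp_s": "\u00df".upper(),
        # No precomposed uppercase exists for U+0149, so it expands to U+02BC + "N".
        "upper_n_apostrophe": "\u0149".upper(),
        # Context dependence: capital sigma lowercases to U+03C3 inside a word
        # and to final sigma U+03C2 at the end of a word.
        "lower_sigma_word": "\u03a3\u0391\u03a3".lower(),
        # Locale dependence is NOT handled by str.lower(): Turkish would want
        # U+0131 (dotless i) here, but Python always returns plain "i".
        "lower_capital_i": "I".lower(),
        # Not reversible: the inner capital of "McGowan" is lost for good.
        "round_trip": "McGowan".upper().title(),
    }


if __name__ == "__main__":
    for name, value in case_mapping_examples().items():
        print(f"{name}: {value!r} (length {len(value)})")
```

The Turkish case is the one that a locale-independent function cannot get right at all, since the correct answer depends on information that is not in the string.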

Yikes.  So what do people do about this?

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
