On Sun, Nov 21, 2010 at 6:22 PM, Andrew Dunstan <and...@dunslane.net> wrote:
>
> On 11/21/2010 06:09 PM, Robert Haas wrote:
>
> I think that's fair. It actually doesn't seem like it should be that
> hard if we knew that the server encoding were UTF8 - it's just a big
> translation table somewhere, no?
>
> No, it's far more complex. See for example
> <http://unicode.org/reports/tr21/tr21-3.html>, which says:
>
> There are a number of complications to case mappings that occur once the
> repertoire of characters is expanded beyond ASCII.
>
> Because of the inclusion of certain composite characters for compatibility,
> such as 01F1 "Ǳ" capital dz, there is a third case, called titlecase, which
> is used where the first letter of a word is to be capitalized (e.g.
> Titlecase, vs. UPPERCASE, or lowercase).
>
> For example, the title case of the example character is 01F2 "ǲ" capital d
> with small z.
>
> Case mappings may produce strings of different length than the original.
>
> For example, the German character 00DF "ß" small letter sharp s expands when
> uppercased to the sequence of two characters "SS". This also occurs where
> there is no precomposed character corresponding to a case mapping, such as
> with 0149 "ŉ" latin small letter n preceded by apostrophe.
>
> Characters may also have different case mappings, depending on the context.
>
> For example, 03A3 "Σ" capital sigma lowercases to 03C3 "σ" small sigma if it
> is followed by another letter, but lowercases to 03C2 "ς" small final sigma
> if it is not.
>
> Characters may have case mappings that depend on the locale.
>
> For example, in Turkish the letter 0049 "I" capital letter i lowercases to
> 0131 "ı" small dotless i.
>
> Case mappings are not, in general, reversible.
>
> For example, once the string "McGowan" has been uppercased, lowercased or
> titlecased, the original cannot be recovered by applying another uppercase,
> lowercase, or titlecase operation.
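
To make the quoted cases concrete, here is a rough sketch using Python 3's built-in str methods. It is only an illustration of the Unicode behaviour described above, not anything PostgreSQL does; the variable names are made up for the example, and it assumes a reasonably recent Python 3 whose str methods apply the full Unicode case mappings.

```python
# Illustrative only: Python's built-in str methods, not PostgreSQL code.

# Titlecase is a third case: U+01F1 (capital DZ) titlecases to U+01F2 (Dz).
dz_title = "\u01f1".title()

# Length can change: U+00DF (sharp s) uppercases to the two characters "SS".
sharp_s_upper = "\u00df".upper()

# U+0149 has no precomposed uppercase, so it expands to U+02BC followed by "N".
n_apostrophe_upper = "\u0149".upper()

# Context-dependent: a capital sigma at the end of a word lowercases to the
# final form U+03C2, while one followed by a letter lowercases to U+03C3.
sigma_final = "\u039f\u0394\u039f\u03a3".lower()
sigma_medial = "\u03a3\u0391".lower()

# Locale-dependent: str.lower() takes no locale argument, so the Turkish
# mapping of "I" to dotless i (U+0131) is not applied here.
i_lower = "I".lower()

# Not reversible: after uppercasing, titlecasing cannot restore "McGowan".
round_trip = "McGowan".upper().title()

for name, value in [
    ("dz_title", dz_title),
    ("sharp_s_upper", sharp_s_upper),
    ("n_apostrophe_upper", n_apostrophe_upper),
    ("sigma_final", sigma_final),
    ("sigma_medial", sigma_medial),
    ("i_lower", i_lower),
    ("round_trip", round_trip),
]:
    # Show each result together with its code points.
    print(name, value, [f"U+{ord(c):04X}" for c in value])
```

Note that the Turkish case is the one a plain table cannot cover at all: the right answer depends on the locale, which a locale-unaware function like str.lower() never sees.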

Yikes. So what do people do about this?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company