I use a few of them, and in my opinion there is a distinct group of
characters, at least in the 8859-1 character set, which have both a lower
and an upper case form. The ranges are 0xC0 to 0xDD for upper case and
0xE0 to 0xFD for lower case (with the exception of 0xD0, 0xD7, 0xF0, and
0xF7); each lower case byte is its upper case counterpart plus 0x20.
I haven't examined all the relevant docs, so I might be wrong. The lists
are based on my own observations of the characters in question.
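To make the ranges concrete, here is a rough sketch in plain C. The function names are made up for this example and are not part of PostgreSQL; the code only encodes the ranges as I observed them and counts how many byte values fall into each.

```c
#include <stdbool.h>

/* Hypothetical helpers, named for illustration only. */

/* True for the accented upper case bytes 0xC0..0xDD, minus 0xD0 and 0xD7. */
static bool latin1_is_accented_upper(unsigned char c)
{
    return c >= 0xC0 && c <= 0xDD && c != 0xD0 && c != 0xD7;
}

/* True for the accented lower case bytes 0xE0..0xFD, minus 0xF0 and 0xF7. */
static bool latin1_is_accented_lower(unsigned char c)
{
    return c >= 0xE0 && c <= 0xFD && c != 0xF0 && c != 0xF7;
}

/* Count how many byte values in 0..255 satisfy a predicate. */
static int latin1_count(bool (*pred)(unsigned char))
{
    int n = 0;
    for (int c = 0; c <= 0xFF; c++)
        if (pred((unsigned char) c))
            n++;
    return n;
}
```

Both predicates should accept the same number of bytes, since every upper case byte in the range has a partner 0x20 above it.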
There is probably no harm in sending a few extra bytes, so I am appending
a related function below. If someone finds a flaw with the function,
please tell me; that would be greatly appreciated.
I am also including a list of related characters. This email is going
out with 8859-1 as the charset, so I hope you are able to view them.
UPPER CASE:
192: À (0xc0) 193: Á (0xc1) 194: Â (0xc2) 195: Ã (0xc3)
196: Ä (0xc4) 197: Å (0xc5) 198: Æ (0xc6) 199: Ç (0xc7)
200: È (0xc8) 201: É (0xc9) 202: Ê (0xca) 203: Ë (0xcb)
204: Ì (0xcc) 205: Í (0xcd) 206: Î (0xce) 207: Ï (0xcf)
209: Ñ (0xd1) 210: Ò (0xd2) 211: Ó (0xd3)
212: Ô (0xd4) 213: Õ (0xd5) 214: Ö (0xd6)
216: Ø (0xd8) 217: Ù (0xd9) 218: Ú (0xda) 219: Û (0xdb)
220: Ü (0xdc) 221: Ý (0xdd)
LOWER CASE:
224: à (0xe0) 225: á (0xe1) 226: â (0xe2) 227: ã (0xe3)
228: ä (0xe4) 229: å (0xe5) 230: æ (0xe6) 231: ç (0xe7)
232: è (0xe8) 233: é (0xe9) 234: ê (0xea) 235: ë (0xeb)
236: ì (0xec) 237: í (0xed) 238: î (0xee) 239: ï (0xef)
241: ñ (0xf1) 242: ò (0xf2) 243: ó (0xf3)
244: ô (0xf4) 245: õ (0xf5) 246: ö (0xf6)
248: ø (0xf8) 249: ù (0xf9) 250: ú (0xfa) 251: û (0xfb)
252: ü (0xfc) 253: ý (0xfd)
SKIPPED:
208: Ð (0xd0)
215: × (0xd7)
222: Þ (0xde)
240: ð (0xf0)
247: ÷ (0xf7)
254: þ (0xfe)
CREATE FUNCTION lower8859_1 (text) RETURNS text
AS '/usr/include/pgsql/lib/str8859_1.so'
LANGUAGE 'C';
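/* No warranty of any kind, use at your own risk. Use freely.
*/
#include "postgres.h"   /* text, VARSIZE, VARDATA, VARHDRSZ, palloc */
#include <ctype.h>      /* isupper */
#include <string.h>     /* memset */

text * lower8859_1 (text * str1) {
    text * result;
    int32 len1 = 0, i;
    unsigned char * p, * p2, c;
    unsigned char upper_min = 0xC0;   /* first accented upper case byte */
    unsigned char upper_max = 0xDD;   /* last accented upper case byte */

    /* Length of the payload, without the varlena header. */
    len1 = VARSIZE(str1) - VARHDRSZ;
    if (len1 <= 0)
        return str1;

    /* palloc reports its own error on failure; the check is only defensive. */
    result = (text *) palloc (len1 + 2 + VARHDRSZ);
    if (! result)
        return str1;
    memset (result, 0, len1 + 2 + VARHDRSZ);

    p = (unsigned char *) VARDATA(result);
    p2 = (unsigned char *) VARDATA(str1);

    /* Bytes that isupper() reports as upper case, plus the accented range
       minus 0xD0 and 0xD7, sit 0x20 below their lower case forms. */
    for (i=0; i < len1; i++) {
        c = p2[i];
        if (isupper(c) || (c >= upper_min && c <= upper_max && c != 0xD0 && c != 0xD7))
            p[i] = c + 0x20;
        else
            p[i] = c;
    }

    /* Old-style size assignment; PostgreSQL 8.3 and later use SET_VARSIZE(). */
    VARSIZE(result) = len1 + VARHDRSZ;
    return result;
}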
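
If you want to try the conversion loop without loading anything into the server, a stand-alone version along these lines might help. This is only a sketch: lower8859_1_buf is a hypothetical name, it works on plain byte buffers instead of text values, and it tests 'A' to 'Z' explicitly instead of calling isupper(), so it does not depend on the locale the way the function above does.

```c
#include <stddef.h>

/* Hypothetical stand-alone counterpart of the loop in lower8859_1().
   Copies len bytes from src to dst, lowering ASCII A-Z and the accented
   upper case bytes 0xC0..0xDD (except 0xD0 and 0xD7) by adding 0x20. */
static void lower8859_1_buf(const unsigned char *src, unsigned char *dst,
                            size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = src[i];
        int ascii_upper = (c >= 'A' && c <= 'Z');
        int accented_upper =
            (c >= 0xC0 && c <= 0xDD && c != 0xD0 && c != 0xD7);

        dst[i] = (ascii_upper || accented_upper)
                     ? (unsigned char) (c + 0x20)
                     : c;
    }
}
```

Bytes from the SKIPPED list, such as 0xD7 and 0xDE, pass through unchanged, which matches the lists above.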
Troy
> "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> writes:
> > If upper() and lower() operate on characters in 8859-1 and other character
> > sets when the appropriate locale is set, then a difference in the behavior
> > of upper() and lower() would seem like a bug.
>
> Au contraire ... upper() and lower() are not symmetric operations in
> quite a few non-English locales. I'll let those who regularly work with
> them give specific details, but handling of accents, German esstet (sp?),
> etc are the gotchas that I recall.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html
>