I use a few of them, and in my opinion there is a distinct group of
characters, at least in the 8859-1 character set, which have both a lower
and an upper case instance. The ranges are 0xC0 to 0xDD for upper case and
0xE0 to 0xFD for lower case (with the exception of 0xD0, 0xD7, 0xF0, and
0xF7). Each lower case character is its upper case counterpart plus 0x20.
I haven't examined all the relevant docs, so I might be wrong.  The lists
are based on my own observations of the characters in question.

There is probably no harm in sending a few extra bytes, so I am appending
a related function below. If someone finds a flaw with the function,
please tell me; that would be greatly appreciated.

I am also including a list of related characters.  This email is going
out with 8859-1 as the charset, so I hope you are able to view them.




UPPER CASE:
192: À (0xc0)  193: Á (0xc1)  194: Â (0xc2)  195: Ã (0xc3)
196: Ä (0xc4)  197: Å (0xc5)  198: Æ (0xc6)  199: Ç (0xc7)
200: È (0xc8)  201: É (0xc9)  202: Ê (0xca)  203: Ë (0xcb)
204: Ì (0xcc)  205: Í (0xcd)  206: Î (0xce)  207: Ï (0xcf)
               209: Ñ (0xd1)  210: Ò (0xd2)  211: Ó (0xd3)
212: Ô (0xd4)  213: Õ (0xd5)  214: Ö (0xd6)
216: Ø (0xd8)  217: Ù (0xd9)  218: Ú (0xda)  219: Û (0xdb)
220: Ü (0xdc)  221: Ý (0xdd)

LOWER CASE:
224: à (0xe0)  225: á (0xe1)  226: â (0xe2)  227: ã (0xe3)
228: ä (0xe4)  229: å (0xe5)  230: æ (0xe6)  231: ç (0xe7)
232: è (0xe8)  233: é (0xe9)  234: ê (0xea)  235: ë (0xeb)
236: ì (0xec)  237: í (0xed)  238: î (0xee)  239: ï (0xef)
               241: ñ (0xf1)  242: ò (0xf2)  243: ó (0xf3)
244: ô (0xf4)  245: õ (0xf5)  246: ö (0xf6)
248: ø (0xf8)  249: ù (0xf9)  250: ú (0xfa)  251: û (0xfb)
252: ü (0xfc)  253: ý (0xfd)

SKIPPED
208: Ð (0xd0)
215: × (0xd7)
222: Þ (0xde)
240: ð (0xf0)
247: ÷ (0xf7)
254: þ (0xfe)
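
As a rough cross-check of the lists above, the upper case group can be
regenerated from the stated range and exceptions. This is only a sketch:
the helper names in_upper_group and list_upper_group are made up for
illustration and are not part of the function posted below.

```c
#include <stddef.h>

/* Hypothetical helper: nonzero if byte c falls in the upper case group
 * listed above (0xC0..0xDD, skipping 0xD0 and 0xD7). */
static int in_upper_group(unsigned char c)
{
    return c >= 0xC0 && c <= 0xDD && c != 0xD0 && c != 0xD7;
}

/* Hypothetical helper: store every byte of the upper case group in out[]
 * and return how many were stored. out needs room for 30 bytes at most. */
static size_t list_upper_group(unsigned char *out)
{
    size_t n = 0;
    unsigned int c;   /* wider than a byte, so the loop bound is safe */

    for (c = 0xC0; c <= 0xDD; c++)
        if (in_upper_group((unsigned char) c))
            out[n++] = (unsigned char) c;
    return n;
}
```

Under these assumptions list_upper_group should report 28 bytes, matching
the 28 entries in the UPPER CASE list; adding 0x20 to each one gives the
corresponding entry in the LOWER CASE list.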



CREATE FUNCTION lower8859_1 (text) RETURNS text
   AS '/usr/include/pgsql/lib/str8859_1.so'
   LANGUAGE 'C';




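/* No warranty of any kind, use at your own risk. Use freely.
 */

#include "postgres.h"

#include <ctype.h>
#include <string.h>

/*
 * Return a lower-cased copy of str1, treating its bytes as ISO 8859-1.
 * Bytes in 0xC0..0xDD (except 0xD0 and 0xD7) and bytes that isupper()
 * accepts are shifted by 0x20; all other bytes are copied unchanged.
 */
text * lower8859_1 (text * str1) {
   text * result;
   int32 len1  = 0, i;
   unsigned char * p, * p2, c;
   unsigned char upper_min = 0xC0;
   unsigned char upper_max = 0xDD;

   len1 = VARSIZE(str1) - VARHDRSZ;

   /* Nothing to convert in an empty value. */
   if (len1 <= 0)
      return str1;

   result = (text *) palloc (len1 + 2 + VARHDRSZ);
   if (! result)
      return str1;

   memset (result, 0, len1 + 2 + VARHDRSZ);

   p = (unsigned char *) VARDATA(result);
   p2 = (unsigned char *) VARDATA(str1);

   for (i=0; i < len1; i++) {
      c = p2[i];
      /* isupper() covers ASCII and, depending on the locale, possibly more. */
      if (isupper(c) || (c >= upper_min && c <= upper_max && c != 0xD0 && c != 0xD7))
         p[i] = c + 0x20;
      else
         p[i] = c;
   }

   /* Newer PostgreSQL releases use SET_VARSIZE() instead of this assignment. */
   VARSIZE(result) = len1 + VARHDRSZ;

   return result;
}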
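
To try the mapping without loading anything into the server, the same
loop can be run over a plain byte buffer. The sketch below is an
assumption-laden variant, not the function above: the name
lower8859_1_buf is hypothetical, it works in place, and it tests for
ASCII 'A'..'Z' directly instead of calling isupper(), so the result does
not depend on the current locale.

```c
#include <stddef.h>

/* Hypothetical stand-alone variant of the loop in lower8859_1:
 * lower-cases len bytes of buf in place, treating them as ISO 8859-1. */
static void lower8859_1_buf(unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        unsigned char c = buf[i];

        /* ASCII upper case, or the 0xC0..0xDD group minus 0xD0 and 0xD7 */
        if ((c >= 'A' && c <= 'Z') ||
            (c >= 0xC0 && c <= 0xDD && c != 0xD0 && c != 0xD7))
            buf[i] = (unsigned char) (c + 0x20);
    }
}
```

With this variant 0xC9 should come back as 0xE9, while 0xD0, 0xD7 and
0xDE should come back unchanged, since they are outside the group that
the function converts.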




Troy








 
> "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> writes:
> > If upper() and lower() operate on characters in 8859-1 and other character
> > sets when the appropriate locale is set, then a difference in the behavior
> > of upper() and lower() would seem like a bug.
> 
> Au contraire ... upper() and lower() are not symmetric operations in
> quite a few non-English locales.  I'll let those who regularly work with
> them give specific details, but handling of accents, German esstet (sp?),
> etc are the gotchas that I recall.
> 
>                       regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
> 
> http://www.postgresql.org/users-lounge/docs/faq.html
> 

