On Jun 15 09:33, Tom Lane wrote:
> Volkan YAZICI <[EMAIL PROTECTED]> writes:
> > Also, if you'd wish, I can prepare an ad-hoc regression tests patch
> > for LATIN5 and UTF-8 support of Turkish characters.
>
> We know it's broken. What's needed is a patch.

I don't understand why you're being so aggressive. I'm just trying to help. And, IMNSHO, the posted test results are quite helpful for pinpointing the exact problem.

As I understand from the tests, ILIKE and ~* don't work properly under UTF-8, even though lower() and upper() work without any problem. Therefore, I've tried to imitate the code of lower() to form a working iwchareq() function. [Related patch is attached.] It succeeded in all of my previous tests (and also in the regression tests).

As you can see, it's quite an ad-hoc patch. (No win32 support added yet.) It also needs a HAVE_MBTOWC definition. I just wanted to get a quick V0 out.

I think the like.c and oracle_compat.c files should be rewritten from scratch by somebody with more experience. They look outdated in some respects. (For instance, like.c still uses CHARMAX even though Bruce generalized it as HIGHBIT in c.h.)

Regards.

Index: src/backend/utils/adt/like.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/like.c,v
retrieving revision 1.64
diff -c -r1.64 like.c
*** src/backend/utils/adt/like.c 5 Mar 2006 15:58:42 -0000 1.64
--- src/backend/utils/adt/like.c 15 Jun 2006 15:36:19 -0000
***************
*** 19,25 ****
--- 19,33 ----

  #include <ctype.h>

+ #ifdef HAVE_WCHAR_H
+ #include <wchar.h>
+ #endif
+ #ifdef HAVE_WCTYPE_H
+ #include <wctype.h>
+ #endif
+
  #include "mb/pg_wchar.h"
+ #include "utils/pg_locale.h"
  #include "utils/builtins.h"

***************
*** 70,109 ****
   * If they match, returns 1 otherwise returns 0.
   *--------------------
   */
- #define CHARMAX 0x80

  static int
  iwchareq(char *p1, char *p2)
  {
!     pg_wchar    c1[2],
!                 c2[2];
!     int         l;

!     /*
!      * short cut. if *p1 and *p2 is lower than CHARMAX, then we could assume
!      * they are ASCII
       */
!     if ((unsigned char) *p1 < CHARMAX && (unsigned char) *p2 < CHARMAX)
!         return (tolower((unsigned char) *p1) == tolower((unsigned char) *p2));

!     /*
!      * if one of them is an ASCII while the other is not, then they must be
!      * different characters
!      */
!     else if ((unsigned char) *p1 < CHARMAX || (unsigned char) *p2 < CHARMAX)
!         return 0;

!     /*
!      * ok, p1 and p2 are both > CHARMAX, then they must be multibyte
!      * characters
!      */
!     l = pg_mblen(p1);
!     (void) pg_mb2wchar_with_len(p1, c1, l);
!     c1[0] = tolower(c1[0]);
!     l = pg_mblen(p2);
!     (void) pg_mb2wchar_with_len(p2, c2, l);
!     c2[0] = tolower(c2[0]);
!     return (c1[0] == c2[0]);
  }

  #define CHAREQ(p1, p2) wchareq(p1, p2)
--- 78,119 ----
   * If they match, returns 1 otherwise returns 0.
   *--------------------
   */

  static int
  iwchareq(char *p1, char *p2)
  {
! #if defined(HAVE_WCSTOMBS) && defined(HAVE_TOWLOWER)
      /*
!      * While lowercasing, applying same rules as in lower().
       */
!     if (pg_database_encoding_max_length() > 1 && !lc_ctype_is_c())
!     {
!         wchar_t     c1, c2;
!         int         l1 = mbtowc(&c1, p1, MB_CUR_MAX);
!         int         l2 = mbtowc(&c2, p2, MB_CUR_MAX);
!
!         if (l1 == (size_t) -1 || l2 == (size_t) -1)
!         {
!             /*
!              * Invalid multibyte character encountered. (Shouldn't happen.)
!              */
!             ereport(ERROR,
!                     (errcode(ERRCODE_CHARACTER_NOT_IN_REPERTOIRE),
!                      errmsg("invalid multibyte character for locale")));
!         }
!         Assert(l1 <= (size_t) MB_CUR_MAX && l2 <= (size_t) MB_CUR_MAX);
!
!         return (towlower(c1) == towlower(c2));
!     }
!     else
! #endif
!     {
!         char c1 = tolower((unsigned char) *p1);
!         char c2 = tolower((unsigned char) *p2);
!
!         return (c1 == c2);
!     }
  }

  #define CHAREQ(p1, p2) wchareq(p1, p2)
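
For anyone who wants to try the idea outside the backend, here is a minimal standalone sketch of the same comparison: decode one multibyte character from each string with mbtowc() and compare the towlower() results. The function name mb_char_ieq and its -1 return value for invalid input are my own assumptions for illustration. They are not part of the patch, which reports the error through ereport() instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper (not in the patch): compare the first multibyte
 * character of p1 and p2, ignoring case according to the current
 * LC_CTYPE locale.
 *
 * Returns 1 if they match, 0 if they differ, and -1 if either
 * sequence is not a valid multibyte character in the current locale.
 */
static int
mb_char_ieq(const char *p1, const char *p2)
{
    wchar_t c1, c2;
    int     l1, l2;

    assert(p1 != NULL && p2 != NULL);

    /* Reset mbtowc()'s internal shift state before decoding. */
    (void) mbtowc(NULL, NULL, 0);

    l1 = mbtowc(&c1, p1, MB_CUR_MAX);
    l2 = mbtowc(&c2, p2, MB_CUR_MAX);

    /* mbtowc() returns -1 for an invalid or incomplete sequence. */
    if (l1 < 0 || l2 < 0)
        return -1;

    return towlower((wint_t) c1) == towlower((wint_t) c2);
}
```

The caller has to select a locale first, for example with setlocale(LC_CTYPE, ""), otherwise the program runs in the "C" locale and only ASCII is handled reliably. What a Turkish locale does with pairs such as "I" and "i" depends on the platform's locale data, so I'm not quoting expected results for those here.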