Re: [BUGS] BUG #1931: ILIKE and LIKE fails on Turkish locale

Volkan YAZICI Thu, 15 Jun 2006 08:56:22 -0700

On Jun 15 09:33, Tom Lane wrote:
> Volkan YAZICI <[EMAIL PROTECTED]> writes:
> > Also, if you'd wish, I can prepare an ad-hoc regression tests patch
> > for LATIN5 and UTF-8 support of Turkish characters.
> 
> We know it's broken.  What's needed is a patch.


I don't understand why you're being so aggressive; I'm just trying to
help. And, IMNSHO, the posted test results are quite helpful for
determining the exact problem.

As I understand from the tests, ILIKE and ~* don't work properly with
UTF-8, even though lower() and upper() work without any problem.
Therefore, I've tried to imitate the code of lower() to build a working
iwchareq() function. [The related patch is attached.] It succeeded in
all of my previous tests (and in the regression tests as well).
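
To illustrate the approach outside the backend, here is a minimal
standalone sketch of a locale-aware, case-insensitive comparison of two
multibyte characters. The helper name mb_char_ieq() and its -1 return
value are assumptions made for this example; the attached patch reports
an error through ereport() instead.

```c
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper (not part of PostgreSQL): compare the first
 * multibyte character of p1 and p2 case-insensitively under the
 * current LC_CTYPE setting.
 *
 * Returns 1 if they match, 0 if they differ, and -1 if either input
 * does not start with a valid multibyte character.
 */
static int
mb_char_ieq(const char *p1, const char *p2)
{
    wchar_t c1, c2;
    int     l1, l2;

    /* Reset the conversion state; this matters only for stateful encodings. */
    (void) mbtowc(NULL, NULL, 0);

    /* mbtowc() returns an int, and -1 signals an invalid sequence. */
    l1 = mbtowc(&c1, p1, MB_CUR_MAX);
    l2 = mbtowc(&c2, p2, MB_CUR_MAX);
    if (l1 < 0 || l2 < 0)
        return -1;

    /* Lowercase both wide characters with the locale's rules, then compare. */
    return towlower((wint_t) c1) == towlower((wint_t) c2);
}
```

Without a prior setlocale() call this runs in the C locale and behaves
like a plain ASCII comparison. After something like
setlocale(LC_CTYPE, "") the result depends on the platform's locale
data, which is exactly what the Turkish cases exercise.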

As you can see, it's quite an ad-hoc patch. (No win32 support has been
added yet.) It also needs a HAVE_MBTOWC definition, since it calls
mbtowc() but is guarded only by HAVE_WCSTOMBS and HAVE_TOWLOWER. I just
wanted to get a V0 out quickly.

I think the like.c and oracle_compat.c files should be rewritten from
scratch by somebody with more experience. They look outdated in some
respects. (For instance, like.c still uses CHARMAX even though Bruce
generalized it as HIGHBIT in c.h.)
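
As a rough sketch of what the ASCII shortcut could look like with a
high-bit test instead of CHARMAX: the two macros below are local
stand-ins, defined here only so the example compiles on its own, and
their exact form is an assumption about what c.h provides. The helper
ascii_ieq() is hypothetical as well.

```c
#include <ctype.h>

/* Local stand-ins for the generalized macros (assumed, not copied from c.h). */
#define HIGHBIT             0x80
#define IS_HIGHBIT_SET(ch)  ((unsigned char) (ch) & HIGHBIT)

/*
 * Hypothetical helper: case-insensitive comparison for the ASCII-only
 * case. Returns 1 if the bytes match, 0 if they differ, and -1 if
 * either byte has its high bit set, meaning the caller has to take
 * the multibyte path instead.
 */
static int
ascii_ieq(const char *p1, const char *p2)
{
    if (IS_HIGHBIT_SET(*p1) || IS_HIGHBIT_SET(*p2))
        return -1;

    return tolower((unsigned char) *p1) == tolower((unsigned char) *p2);
}
```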


Regards.

Index: src/backend/utils/adt/like.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/like.c,v
retrieving revision 1.64
diff -c -r1.64 like.c
*** src/backend/utils/adt/like.c        5 Mar 2006 15:58:42 -0000       1.64
--- src/backend/utils/adt/like.c        15 Jun 2006 15:36:19 -0000
***************
*** 19,25 ****
--- 19,33 ----
  
  #include <ctype.h>
  
+ #ifdef HAVE_WCHAR_H
+ #include <wchar.h>
+ #endif
+ #ifdef HAVE_WCTYPE_H
+ #include <wctype.h>
+ #endif
+ 
  #include "mb/pg_wchar.h"
+ #include "utils/pg_locale.h"
  #include "utils/builtins.h"
  
  
***************
*** 70,109 ****
   * If they match, returns 1 otherwise returns 0.
   *--------------------
   */
- #define CHARMAX 0x80
  
  static int
  iwchareq(char *p1, char *p2)
  {
!       pg_wchar        c1[2],
!                               c2[2];
!       int                     l;
! 
        /*
!        * short cut. if *p1 and *p2 is lower than CHARMAX, then we could assume
!        * they are ASCII
         */
!       if ((unsigned char) *p1 < CHARMAX && (unsigned char) *p2 < CHARMAX)
!               return (tolower((unsigned char) *p1) == tolower((unsigned char) *p2));
  
!       /*
!        * if one of them is an ASCII while the other is not, then they must be
!        * different characters
!        */
!       else if ((unsigned char) *p1 < CHARMAX || (unsigned char) *p2 < CHARMAX)
!               return 0;
  
!       /*
!        * ok, p1 and p2 are both > CHARMAX, then they must be multibyte
!        * characters
!        */
!       l = pg_mblen(p1);
!       (void) pg_mb2wchar_with_len(p1, c1, l);
!       c1[0] = tolower(c1[0]);
!       l = pg_mblen(p2);
!       (void) pg_mb2wchar_with_len(p2, c2, l);
!       c2[0] = tolower(c2[0]);
!       return (c1[0] == c2[0]);
  }
  
  #define CHAREQ(p1, p2) wchareq(p1, p2)
--- 78,119 ----
   * If they match, returns 1 otherwise returns 0.
   *--------------------
   */
  
  static int
  iwchareq(char *p1, char *p2)
  {
! #if defined(HAVE_WCSTOMBS) && defined(HAVE_TOWLOWER)
        /*
!        * While lowercasing, applying same rules as in lower().
         */
!       if (pg_database_encoding_max_length() > 1 && !lc_ctype_is_c())
!       {
!               wchar_t c1, c2;
!               int             l1 = mbtowc(&c1, p1, MB_CUR_MAX);
!               int             l2 = mbtowc(&c2, p2, MB_CUR_MAX);
  
!               if (l1 == (size_t) -1 || l2 == (size_t) -1)
!               {
!                       /*
!                        * Invalid multibyte character encountered. (Shouldn't happen.)
!                        */
!                       ereport(ERROR,
!                                       (errcode(ERRCODE_CHARACTER_NOT_IN_REPERTOIRE),
!                                        errmsg("invalid multibyte character for locale")));
!               }
  
!               Assert(l1 <= (size_t) MB_CUR_MAX && l2 <= (size_t) MB_CUR_MAX);
! 
!               return (towlower(c1) == towlower(c2));
!       }
!       else
! #endif
!       {
!               char    c1 = tolower((unsigned char) *p1);
!               char    c2 = tolower((unsigned char) *p2);
! 
!               return (c1 == c2);
!       }
  }
  
  #define CHAREQ(p1, p2) wchareq(p1, p2)

