Re: [HACKERS] A thought about regex versus multibyte character sets

2009-12-01 Thread Tom Lane

Alvaro Herrera writes:
> Tom Lane wrote:
>> I just spent a bit of time considering what we might do to fix this.
>> The idea mentioned in the above thread was to switch over to using
>> wchar_t in the regex code, but that seems to have a number of problems.
>> One showstopper is that on some platf

Re: [HACKERS] A thought about regex versus multibyte character sets

2009-12-01 Thread Alvaro Herrera

Tom Lane wrote:
> I just spent a bit of time considering what we might do to fix this.
> The idea mentioned in the above thread was to switch over to using
> wchar_t in the regex code, but that seems to have a number of problems.
> One showstopper is that on some platforms wchar_t is only 16 bits

Re: [HACKERS] A thought about regex versus multibyte character sets

2009-11-30 Thread Tom Lane

I wrote:
> I therefore propose the following idea: if the database encoding is
> UTF8, allow the regc_locale.c functions to call the
> functions, assuming that wchar_t and pg_wchar_t share the same
> representation. On platforms where wchar_t is only 16 bits, we can do
> this up to U+ and be
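
A rough sketch of the idea in the preview above, for illustration only. It is not the code that was proposed or committed. The names `codepoint_t`, `fits_in_wchar`, and `sketch_isalpha` are hypothetical, the `encoding_is_utf8` flag stands in for however the server would learn the database encoding, and the ASCII-only fallback is an assumption of this sketch rather than something the message states. It relies on the same assumption the message makes: that `wchar_t` holds the same code point values as the regex engine's own wide-character type.

```c
#include <ctype.h>
#include <stdbool.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical stand-in for the regex engine's wide-character type. */
typedef unsigned int codepoint_t;

/*
 * True if the code point can be passed to the wide-character classifiers
 * without truncation. Where wchar_t is only 16 bits, WCHAR_MAX caps the
 * range that can be handled this way.
 */
static bool
fits_in_wchar(codepoint_t c)
{
    return (unsigned long) c <= (unsigned long) WCHAR_MAX;
}

/*
 * Classify a code point as alphabetic.
 *
 * With a UTF8 database encoding (an input flag here, by assumption) and a
 * code point that fits in wchar_t, defer to the locale-aware iswalpha().
 * Otherwise fall back to the single-byte classifier for ASCII only; that
 * fallback is an assumption of this sketch.
 */
static bool
sketch_isalpha(codepoint_t c, bool encoding_is_utf8)
{
    if (encoding_is_utf8 && fits_in_wchar(c))
        return iswalpha((wint_t) c) != 0;

    if (c <= 0x7F)
        return isalpha((unsigned char) c) != 0;

    return false;
}
```

What `iswalpha()` returns for non-ASCII code points depends on the process locale, so a call such as `sketch_isalpha(0x00E9, true)` is only meaningful once a suitable UTF-8 locale has been selected with `setlocale()`.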