Re: [HACKERS] LIKE optimization in UTF-8 and locale-C

2007-03-25 Thread Hannu Krosing
One fine day, Fri, 2007-03-23 at 06:10, Andrew - Supernews wrote: > On 2007-03-23, ITAGAKI Takahiro <[EMAIL PROTECTED]> wrote: > > Thanks, it all made sense to me. My proposal was completely wrong. > > Actually, I think your proposal is fundamentally correct, merely incomplete. > > Doing

Re: [HACKERS] LIKE optimization in UTF-8 and locale-C

2007-03-22 Thread Andrew - Supernews
On 2007-03-23, ITAGAKI Takahiro <[EMAIL PROTECTED]> wrote: > Thanks, it all made sense to me. My proposal was completely wrong. Actually, I think your proposal is fundamentally correct, merely incomplete. Doing octet-based rather than character-based matching of strings is a _design goal_ of UTF8
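
The point about octet-based matching can be illustrated with a small sketch. It is not PostgreSQL code; the helper names `is_char_boundary` and `byte_find` are hypothetical and exist only for this illustration. It assumes both strings are valid UTF-8 and relies on the fact that UTF-8 continuation bytes (0x80 to 0xBF) never overlap with lead bytes or ASCII, so a byte-level search for a valid needle can only hit at a character boundary.

```python
def is_char_boundary(data: bytes, i: int) -> bool:
    """True if offset i does not point into the middle of a UTF-8 sequence."""
    # Continuation bytes have the bit pattern 10xxxxxx.
    return i == len(data) or (data[i] & 0xC0) != 0x80


def byte_find(text: str, needle: str) -> int:
    """Find needle with a plain byte search; return its character index or -1.

    Hypothetical helper: assumes both arguments encode to valid UTF-8.
    """
    text_bytes = text.encode("utf-8")
    needle_bytes = needle.encode("utf-8")
    pos = text_bytes.find(needle_bytes)  # purely octet-based search
    if pos < 0:
        return -1
    # The hit is expected to be character-aligned for valid UTF-8 input.
    assert is_char_boundary(text_bytes, pos)
    # Convert the byte offset back to a character index for comparison.
    return len(text_bytes[:pos].decode("utf-8"))


sample = "na\u00efve caf\u00e9"  # contains two 2-byte characters
print(byte_find(sample, "\u00e9") == sample.find("\u00e9"))
```

The byte search and the character search agree on where the match is, which is why literal comparison does not need to decode characters first.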

Re: [HACKERS] LIKE optimization in UTF-8 and locale-C

2007-03-22 Thread Andrew - Supernews
On 2007-03-22, Tom Lane <[EMAIL PROTECTED]> wrote: > ITAGAKI Takahiro <[EMAIL PROTECTED]> writes: >> I found LIKE operators are slower on multi-byte encoding databases >> than on single-byte encoding ones. It comes from the difference between >> MatchText() and MBMatchText(). > >> We've had an optimizatio

Re: [HACKERS] LIKE optimization in UTF-8 and locale-C

2007-03-22 Thread Dennis Bjorklund
ITAGAKI Takahiro wrote: I guess it works well for % but not for _; the latter has to know how many bytes the current (multibyte) character covers. Yes, % is not used in trailing bytes for all encodings, but _ is used in some of them. I think we can use the optimization for all of the server

Re: [HACKERS] LIKE optimization in UTF-8 and locale-C

2007-03-22 Thread ITAGAKI Takahiro
Dennis Bjorklund <[EMAIL PROTECTED]> wrote: > The problem with the like pattern _ is that it has to know how long the > single character is that it should pass over. Say you have a UTF-8 string > with 2 characters encoded in 3 bytes ('ÖA'). Where the first character > is 2 bytes: > > 0xC3 0x96
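
The `_` issue described here can be sketched as follows. This is a minimal illustration, not the MatchText() or MBMatchText() implementation; `utf8_char_len` and `like_match` are hypothetical names, escape characters are not handled, and the input is assumed to be valid UTF-8. Literal bytes and `%` are processed one octet at a time, and only `_` looks at the lead byte to decide how many bytes to skip.

```python
def utf8_char_len(lead: int) -> int:
    """Byte length of the UTF-8 sequence that starts with this lead byte."""
    if lead < 0x80:
        return 1
    if 0xC0 <= lead < 0xE0:
        return 2
    if 0xE0 <= lead < 0xF0:
        return 3
    if 0xF0 <= lead < 0xF8:
        return 4
    raise ValueError("not a UTF-8 lead byte")


def like_match(text: bytes, pattern: bytes) -> bool:
    """Octet-based LIKE over UTF-8 bytes (sketch; no escape handling)."""
    if not pattern:
        return not text
    head, rest = pattern[0], pattern[1:]
    if head == ord("%"):
        # '%' may stop at any byte offset; the remaining pattern rejects
        # offsets that fall inside a multibyte character.
        return any(like_match(text[i:], rest) for i in range(len(text) + 1))
    if not text:
        return False
    if head == ord("_"):
        if (text[0] & 0xC0) == 0x80:
            return False  # a continuation byte is not the start of a character
        # Skip one whole character, however many bytes it occupies.
        return like_match(text[utf8_char_len(text[0]):], rest)
    return text[0] == head and like_match(text[1:], rest)


value = "\u00d6A".encode("utf-8")  # 'ÖA' is the three bytes 0xC3 0x96 0x41
print(like_match(value, b"_A"), like_match(value, b"___"))
```

With this approach `_A` matches the three-byte value, while `___` does not, because `_` consumes the two-byte character as a single unit instead of one octet.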

Re: [HACKERS] LIKE optimization in UTF-8 and locale-C

2007-03-22 Thread ITAGAKI Takahiro
Hannu Krosing <[EMAIL PROTECTED]> wrote: > > > We've had an optimization for single-byte encodings using > > > pg_database_encoding_max_length() == 1 test. I'll propose to extend it > > > in UTF-8 with locale-C case. > > > > If this works for UTF8, won't it work for all the backend-legal > > enc

Re: [HACKERS] LIKE optimization in UTF-8 and locale-C

2007-03-22 Thread Hannu Krosing
One fine day, Thu, 2007-03-22 at 11:08, Tom Lane wrote: > ITAGAKI Takahiro <[EMAIL PROTECTED]> writes: > > I found LIKE operators are slower on multi-byte encoding databases > > than on single-byte encoding ones. It comes from the difference between > > MatchText() and MBMatchText(). > > > We've

Re: [HACKERS] LIKE optimization in UTF-8 and locale-C

2007-03-22 Thread Tom Lane
ITAGAKI Takahiro <[EMAIL PROTECTED]> writes: > I found LIKE operators are slower on multi-byte encoding databases > than on single-byte encoding ones. It comes from the difference between > MatchText() and MBMatchText(). > We've had an optimization for single-byte encodings using > pg_database_encoding_