I have committed part of Patrice's patches with minor fixes.
The uncommitted changes relate to the backend side; the reason
can be found in the previous discussions (basically, the current
regex code does not support UTF-8 chars >= 0x1). Instead pg_verifymbstr()
> * Bruce Momjian <[EMAIL PROTECTED]> [011011 22:49]:
> >
> > Can I ask about the status of this?
>
> I sent a patch a few days ago solving the client-side issue (on
> the pgsql-patches mailing list) for review. I think Tatsuo said it
> looked OK, however he should confirm or deny this.
OK,
> * Bruce Momjian <[EMAIL PROTECTED]> [011011 22:49]:
> >
> > Can I ask about the status of this?
>
> I sent a patch a few days ago solving the client-side issue (on
> the pgsql-patches mailing list) for review. I think Tatsuo said it
> looked OK, however he should confirm or deny this.
I'v
* Bruce Momjian <[EMAIL PROTECTED]> [011011 22:49]:
>
> Can I ask about the status of this?
I sent a patch a few days ago solving the client-side issue (on
the pgsql-patches mailing list) for review. I think Tatsuo said it
looked OK, however he should confirm or deny this.
There is still th
Can I ask about the status of this?
> Hi all,
>
> while working on a new project involving PostgreSQL and making some
> tests, I have come up with the following output from psql :
>
> lang | length | length | text  | text
> -----+--------+--------+-------+------
> isl  |
> > Ok. I ran the modified test (now the iteration is reduced to 10 in
> > liketest()). As you can see, there's a huge difference. MB seems up to
> > ~8 times slower :-< There seem to be some problems in the
> > implementation. Considering REGEX is not so slow, maybe we should
> > employ the
> I think that we were supposed to go beta a month ago, and so this is
> no time to start adding new features to this release. Let's plan to
> make this happen (one way or the other) in 7.3, instead.
Agreed.
--
Tatsuo Ishii
---------------------------(end of broadcast)---------------------------
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> What do you think?
I think that we were supposed to go beta a month ago, and so this is
no time to start adding new features to this release. Let's plan to
make this happen (one way or the other) in 7.3, instead.
regards, tom la
> What's your feeling now about the original question: whether to enable
> multibyte by default now, or not? I'm still thinking that Peter's
> counsel is the wisest: plan to do it in 7.3, not today. But this fix
> seems to eliminate the only hard reason we have not to do it today ...
If SQL99's
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> To accomplish this, I moved MatchText etc. to a separate file and now
> like.c includes it *twice* (similar technique used in regexec()). This
> makes like.o a little bit larger, but I believe this is worth it for the
> optimization.
That sounds great.
Wha
> Ok. I ran the modified test (now the iteration is reduced to 10 in
> liketest()). As you can see, there's a huge difference. MB seems up to
> ~8 times slower :-< There seem to be some problems in the
> implementation. Considering REGEX is not so slow, maybe we should
> employ the same desig
> Tom Lane writes:
>
> > In the meantime, we still have the question of whether to enable
> > multibyte in the default configuration.
>
> This would make more sense if all of multibyte, locale, and NLS became
> defaults in one release. I haven't quite sold people on the second item
> yet, altho
Peter Eisentraut <[EMAIL PROTECTED]> writes:
> Tom Lane writes:
>> In the meantime, we still have the question of whether to enable
>> multibyte in the default configuration.
> Perhaps we could make it a release goal for 7.3
Yeah, that's probably the best way to proceed... it's awfully late
in t
Tom Lane writes:
> In the meantime, we still have the question of whether to enable
> multibyte in the default configuration.
This would make more sense if all of multibyte, locale, and NLS became
defaults in one release. I haven't quite sold people on the second item
yet, although I have a des
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Added to TODO:
>
> > * Use wide characters to evaluate regular expressions, for performance
> > (Tatsuo)
>
> Regexes are fine; it's LIKE that's slow.
Oops, thanks. Changed to:
* Use wide characters to evaluate LIKE, for performance (Tatsuo)
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Added to TODO:
> * Use wide characters to evaluate regular expressions, for performance
> (Tatsuo)
Regexes are fine; it's LIKE that's slow.
regards, tom lane
> Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> > ... There seem to be some problems in the
> > implementation. Considering REGEX is not so slow, maybe we should
> > employ the same design as REGEX, i.e. using wide characters, not
> > multibyte streams...
>
> Seems like a good thing to put on the
> Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> > ... There seem to be some problems in the
> > implementation. Considering REGEX is not so slow, maybe we should
> > employ the same design as REGEX, i.e. using wide characters, not
> > multibyte streams...
>
> Seems like a good thing to put on th
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> ... There seem to be some problems in the
> implementation. Considering REGEX is not so slow, maybe we should
> employ the same design as REGEX, i.e. using wide characters, not
> multibyte streams...
Seems like a good thing to put on the to-do list.
> I don't think your search string is sufficient for a test.
> With 'aaa' it actually knows that it only needs to look at the
> first three characters of a. Imho you need to try something
> like liketest(a,'%aaa%').
Ok. I ran the modified test (now the iteration is reduced to 10 in
liketes
> - shell script ---
> for i in 32 64 128 256 512 1024 2048 4096 8192
> do
> psql -c "explain analyze select liketest(a,'aaa') from
> (select substring('very_long_text' from 0 for $i) as a) as a" test
> done
> - shell script ---
I don't th
> Maybe something like this: declare a plpgsql function that takes two
> text parameters and has a body like
>
> for (i = 0 to a million)
> boolvar := $1 like $2;
>
> Then call it with strings of different lengths and see how the runtime
> varies. You need to apply the LIKE
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
>> I'd feel more confident if the measurements were done using operators
>> repeated enough times to yield multiple-second runtimes.
> Any idea to do that?
Maybe something like this: declare a plpgsql function that takes two
text parameters and has a body
> > LIKE with MB seemed to be reasonably fast, but REGEX with MB seemed a
> > little bit slow. Probably this is due to the wide character conversion
> > overhead.
>
> Could this conversion be optimized to recognize when it's dealing with a
> single-byte character encoding?
Not sure, will look into..
> Yeah, I suspect there's 10% or more noise in these numbers. But then
> one could read the results as saying we can't reliably measure any
> difference at all ...
>
> I'd feel more confident if the measurements were done using operators
> repeated enough times to yield multiple-second runtimes.
Tatsuo Ishii writes:
> LIKE with MB seemed to be reasonably fast, but REGEX with MB seemed a
> little bit slow. Probably this is due to the wide character conversion
> overhead.
Could this conversion be optimized to recognize when it's dealing with a
single-byte character encoding?
--
Peter Eisent
Bruce Momjian <[EMAIL PROTECTED]> writes:
> But the strange thing is that LIKE is faster, perhaps meaning his
> measurements can't even see the difference,
Yeah, I suspect there's 10% or more noise in these numbers. But then
one could read the results as saying we can't reliably measure any
diff
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > If no one can find a case where multibyte is slower, I think we should
> > enable it by default. Comments?
>
> Well, he just did point out such a case:
>
> >>           no MB      with MB
> >> LIKE      0.09 msec  0.08 msec
> >> REGEX
Bruce Momjian <[EMAIL PROTECTED]> writes:
> If no one can find a case where multibyte is slower, I think we should
> enable it by default. Comments?
Well, he just did point out such a case:
>>           no MB      with MB
>> LIKE      0.09 msec  0.08 msec
>> REGEX     0.09 ms
If no one can find a case where multibyte is slower, I think we should
enable it by default. Comments?
> > Also, have we decided if multibyte should be the configure default now?
>
> Not sure.
>
> Anyway I have tested LIKE/REGEX query test using current. The query
> executed is:
>
> explain
> Also, have we decided if multibyte should be the configure default now?
Not sure.
Anyway I have tested LIKE/REGEX query test using current. The query
executed is:
explain analyze select '000 5089 474e...' (a 16475-byte text
containing only 0-9a-z chars) like 'aaa';
and
explain analyz
> > Can someone give me TODO items for this discussion?
>
> What about:
>
> Improve Unicode combined character handling
Done. I can't update the web version because I don't have permission.
Also, have we decided if multibyte should be the configure default now?
--
Bruce Momjian
> Can someone give me TODO items for this discussion?
What about:
Improve Unicode combined character handling
--
Tatsuo Ishii
> > > So, this shows two problems :
> > >
> > > - length() on the server side doesn't handle Unicode correctly [I have
> > > the same result with char_length()], and
Can someone give me TODO items for this discussion?
> > So, this shows two problems :
> >
> > - length() on the server side doesn't handle Unicode correctly [I have
> > the same result with char_length()], and returns the number of chars
> > (as it is however advertised to do), rather than the l
> BTW, I see "CHARACTER SET" in gram.y. Does current already support
> that syntax?
Yes and no. gram.y knows about CHARACTER SET, but only for the long
form, the clause is in the wrong position (it precedes the length
specification) and it does not do much useful (generates a data type
based on t
> > I would like to see SQL99's charset, collate functionality for 7.3 (or
> > later). If this happens, current multibyte implementation would be
> > dramatically changed...
>
> I'm *still* interested in working on this (an old story I know). I'm
> working on date/time stuff for 7.2, but hopefull
> I would like to see SQL99's charset, collate functionality for 7.3 (or
> later). If this happens, current multibyte implementation would be
> dramatically changed...
I'm *still* interested in working on this (an old story I know). I'm
working on date/time stuff for 7.2, but hopefully 7.3 will s
> > > - length() on the server side doesn't handle Unicode correctly [I
> > > have the same result with char_length()], and returns the number
> > > of chars (as it is however advertised to do), rather than the length
> > > of the string.
> >
> > This is a known limitation.
>
> To solve this, w
Hi,
* Tatsuo Ishii <[EMAIL PROTECTED]> [010925 18:18]:
> > So, this shows two problems :
> >
> > - length() on the server side doesn't handle Unicode correctly [I
> > have the same result with char_length()], and returns the number
> > of chars (as it is however advertised to do), rather than the
Looks like a good project for 7.3
Probably the best starting point would be to develop contrib/unicode
with a smooth transition to core.
Oleg
On Mon, 24 Sep 2001, Patrice Hédé wrote:
> Hi all,
>
> while working on a new project involving PostgreSQL and making some
> tests, I
> So, this shows two problems :
>
> - length() on the server side doesn't handle Unicode correctly [I have
> the same result with char_length()], and returns the number of chars
> (as it is however advertised to do), rather than the length of the
> string.
This is a known limitation.
> - the p
Hi all,
while working on a new project involving PostgreSQL and making some
tests, I have come up with the following output from psql :
 lang | length | length | text  | text
------+--------+--------+-------+-------
 isl  |      7 |      6 | álíta | áleit
 isl  |      7 |      7 |