I have committed part of Patrice's patches with minor fixes.
The uncommitted changes relate to the backend side; the reason
can be found in the previous discussions (basically, the current
regex code does not support UTF-8 chars >= 0x1). Instead pg_verifymbstr()
> * Bruce Momjian <[EMAIL PROTECTED]> [011011 22:49]:
> >
> > Can I ask about the status of this?
>
> I sent a patch a few days ago solving the client-side issue (on
> the pgsql-patches mailing list) for review. I think Tatsuo said it
> looked OK, however he should confirm or deny this.
OK,
> * Bruce Momjian <[EMAIL PROTECTED]> [011011 22:49]:
> >
> > Can I ask about the status of this?
>
> I sent a patch a few days ago solving the client-side issue (on
> the pgsql-patches mailing list) for review. I think Tatsuo said it
> looked OK, however he should confirm or deny this.
I'v
* Bruce Momjian <[EMAIL PROTECTED]> [011011 22:49]:
>
> Can I ask about the status of this?
I sent a patch a few days ago solving the client-side issue (on
the pgsql-patches mailing list) for review. I think Tatsuo said it
looked OK, however he should confirm or deny this.
There is still th
Can I ask about the status of this?
> Hi all,
>
> while working on a new project involving PostgreSQL and making some
> tests, I have come up with the following output from psql :
>
> lang | length | length | text  | text
> -----+--------+--------+-------+------
> isl  |
> > Ok. I ran the modified test (now the iteration is reduced to 10 in
> > liketest()). As you can see, there's a huge difference. MB seems up to
> > ~8 times slower :-< There seem to be some problems in the
> > implementation. Considering REGEX is not so slow, maybe we should
> > employ the
> I think that we were supposed to go beta a month ago, and so this is
> no time to start adding new features to this release. Let's plan to
> make this happen (one way or the other) in 7.3, instead.
Agreed.
--
Tatsuo Ishii
---------------------------(end of broadcast)---------------------------
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> What do you think?
I think that we were supposed to go beta a month ago, and so this is
no time to start adding new features to this release. Let's plan to
make this happen (one way or the other) in 7.3, instead.
regards, tom la
> What's your feeling now about the original question: whether to enable
> multibyte by default now, or not? I'm still thinking that Peter's
> counsel is the wisest: plan to do it in 7.3, not today. But this fix
> seems to eliminate the only hard reason we have not to do it today ...
If SQL99's
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> To accomplish this, I moved MatchText etc. to a separate file and now
> like.c includes it *twice* (similar technique used in regexec()). This
> makes like.o a little bit larger, but I believe this is worth it for the
> optimization.
That sounds great.
Wha
> Ok. I ran the modified test (now the iteration is reduced to 10 in
> liketest()). As you can see, there's a huge difference. MB seems up to
> ~8 times slower :-< There seem to be some problems in the
> implementation. Considering REGEX is not so slow, maybe we should
> employ the same desig
> Tom Lane writes:
>
> > In the meantime, we still have the question of whether to enable
> > multibyte in the default configuration.
>
> This would make more sense if all of multibyte, locale, and NLS became
> defaults in one release. I haven't quite sold people on the second item
> yet, altho
Peter Eisentraut <[EMAIL PROTECTED]> writes:
> Tom Lane writes:
>> In the meantime, we still have the question of whether to enable
>> multibyte in the default configuration.
> Perhaps we could make it a release goal for 7.3
Yeah, that's probably the best way to proceed... it's awfully late
in t
Tom Lane writes:
> In the meantime, we still have the question of whether to enable
> multibyte in the default configuration.
This would make more sense if all of multibyte, locale, and NLS became
defaults in one release. I haven't quite sold people on the second item
yet, although I have a des
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Added to TODO:
>
> > * Use wide characters to evaluate regular expressions, for performance
> > (Tatsuo)
>
> Regexes are fine; it's LIKE that's slow.
Oops, thanks. Changed to:
* Use wide characters to evaluate LIKE, for performance (Tatsuo)
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Added to TODO:
> * Use wide characters to evaluate regular expressions, for performance
> (Tatsuo)
Regexes are fine; it's LIKE that's slow.
regards, tom lane
> Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> > ... There seem to be some problems in the
> > implementation. Considering REGEX is not so slow, maybe we should
> > employ the same design as REGEX, i.e. using wide characters, not
> > multibyte streams...
>
> Seems like a good thing to put on the
> Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> > ... There seem to be some problems in the
> > implementation. Considering REGEX is not so slow, maybe we should
> > employ the same design as REGEX, i.e. using wide characters, not
> > multibyte streams...
>
> Seems like a good thing to put on th
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> ... There seem to be some problems in the
> implementation. Considering REGEX is not so slow, maybe we should
> employ the same design as REGEX, i.e. using wide characters, not
> multibyte streams...
Seems like a good thing to put on the to-do list.
> I don't think your search string is sufficient for a test.
> With 'aaa' it actually knows that it only needs to look at the
> first three characters of a. Imho you need to try something
> like liketest(a,'%aaa%').
Ok. I ran the modified test (now the iteration is reduced to 10 in
liketes
> - shell script ---
> for i in 32 64 128 256 512 1024 2048 4096 8192
> do
> psql -c "explain analyze select liketest(a,'aaa') from
> (select substring('very_long_text' from 0 for $i) as a) as a" test
> done
> - shell script ---
I don't th
> Maybe something like this: declare a plpgsql function that takes two
> text parameters and has a body like
>
> for (i = 0 to a million)
> boolvar := $1 like $2;
>
> Then call it with strings of different lengths and see how the runtime
> varies. You need to apply the LIKE
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
>> I'd feel more confident if the measurements were done using operators
>> repeated enough times to yield multiple-second runtimes.
> Any idea to do that?
Maybe something like this: declare a plpgsql function that takes two
text parameters and has a body
> > LIKE with MB seemed to be reasonably fast, but REGEX with MB seemed a
> > little bit slow. Probably this is due to the wide character conversion
> > overhead.
>
> Could this conversion be optimized to recognize when it's dealing with a
> single-byte character encoding?
Not sure, will look into..
> Yeah, I suspect there's 10% or more noise in these numbers. But then
> one could read the results as saying we can't reliably measure any
> difference at all ...
>
> I'd feel more confident if the measurements were done using operators
> repeated enough times to yield multiple-second runtimes.
Tatsuo Ishii writes:
> LIKE with MB seemed to be reasonably fast, but REGEX with MB seemed a
> little bit slow. Probably this is due to the wide character conversion
> overhead.
Could this conversion be optimized to recognize when it's dealing with a
single-byte character encoding?
--
Peter Eisent
Bruce Momjian <[EMAIL PROTECTED]> writes:
> But the strange thing is that LIKE is faster, perhaps meaning his
> measurements can't even see the difference,
Yeah, I suspect there's 10% or more noise in these numbers. But then
one could read the results as saying we can't reliably measure any
diff
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > If no one can find a case where multibyte is slower, I think we should
> > enable it by default. Comments?
>
> Well, he just did point out such a case:
>
> >>           no MB      with MB
> >> LIKE      0.09 msec  0.08 msec
> >> REGEX
Bruce Momjian <[EMAIL PROTECTED]> writes:
> If no one can find a case where multibyte is slower, I think we should
> enable it by default. Comments?
Well, he just did point out such a case:
>>           no MB      with MB
>> LIKE      0.09 msec  0.08 msec
>> REGEX     0.09 ms
If no one can find a case where multibyte is slower, I think we should
enable it by default. Comments?
> > Also, have we decided if multibyte should be the configure default now?
>
> Not sure.
>
> Anyway I have tested LIKE/REGEX query test using current. The query
> executed is:
>
> explain
> Also, have we decided if multibyte should be the configure default now?
Not sure.
Anyway I have tested LIKE/REGEX query test using current. The query
executed is:
explain analyze select '000 5089 474e...' (a 16475-byte text
containing only 0-9a-z chars) like 'aaa';
and
explain analyz
> > Can someone give me TODO items for this discussion?
>
> What about:
>
> Improve Unicode combined character handling
Done. I can't update the web version because I don't have permission.
Also, have we decided if multibyte should be the configure default now?
--
Bruce Momjian
> Can someone give me TODO items for this discussion?
What about:
Improve Unicode combined character handling
--
Tatsuo Ishii
> > > So, this shows two problems :
> > >
> > > - length() on the server side doesn't handle Unicode correctly [I have
> > > the same result with char_length()], and
Can someone give me TODO items for this discussion?
> > So, this shows two problems :
> >
> > - length() on the server side doesn't handle Unicode correctly [I have
> > the same result with char_length()], and returns the number of chars
> > (as it is however advertised to do), rather than the l
> BTW, I see "CHARACTER SET" in gram.y. Does current already support
> that syntax?
Yes and no. gram.y knows about CHARACTER SET, but only for the long
form, the clause is in the wrong position (it precedes the length
specification) and it does not do much useful (generates a data type
based on t
> > I would like to see SQL99's charset, collate functionality for 7.3 (or
> > later). If this happens, current multibyte implementation would be
> > dramatically changed...
>
> I'm *still* interested in working on this (an old story I know). I'm
> working on date/time stuff for 7.2, but hopefull
> I would like to see SQL99's charset, collate functionality for 7.3 (or
> later). If this happens, current multibyte implementation would be
> dramatically changed...
I'm *still* interested in working on this (an old story I know). I'm
working on date/time stuff for 7.2, but hopefully 7.3 will s
> > > - length() on the server side doesn't handle Unicode correctly [I
> > > have the same result with char_length()], and returns the number
> > > of chars (as it is however advertised to do), rather than the length
> > > of the string.
> >
> > This is a known limitation.
>
> To solve this, w
Hi,
* Tatsuo Ishii <[EMAIL PROTECTED]> [010925 18:18]:
> > So, this shows two problems :
> >
> > - length() on the server side doesn't handle Unicode correctly [I
> > have the same result with char_length()], and returns the number
> > of chars (as it is however advertised to do), rather than the
Looks like a good project for 7.3
Probably the best starting point would be to develop contrib/unicode
with a smooth transition to core.
Oleg
On Mon, 24 Sep 2001, Patrice Hédé wrote:
> Hi all,
>
> while working on a new project involving PostgreSQL and making some
> tests, I
> So, this shows two problems :
>
> - length() on the server side doesn't handle Unicode correctly [I have
> the same result with char_length()], and returns the number of chars
> (as it is however advertised to do), rather than the length of the
> string.
This is a known limitation.
> - the p
Hi all,
while working on a new project involving PostgreSQL and making some
tests, I have come up with the following output from psql :
 lang | length | length | text  | text
------+--------+--------+-------+-------
 isl  |      7 |      6 | álíta | áleit
 isl  |      7 |      7 |