On Thu, Apr 2, 2020 at 3:51 PM Peter Eisentraut
wrote:
>
> On 2020-03-26 18:41, John Naylor wrote:
> > We don't have a trie implementation in Postgres, but we do have a
> > perfect hash implementation. Doing that would bring the tables back to
> > 64 bits per entry, but would likely be noticeably
On 2020-03-26 18:41, John Naylor wrote:
We don't have a trie implementation in Postgres, but we do have a
perfect hash implementation. Doing that would bring the tables back to
64 bits per entry, but would likely be noticeably faster than binary
search. Since v4 has left out the biggest tables en
On 2020-03-26 08:25, Peter Eisentraut wrote:
On 2020-03-24 10:20, Peter Eisentraut wrote:
Now I have some concerns about the size of the new table in
unicode_normprops_table.h, and the resulting binary size. At the very
least, we should probably make that #ifndef FRONTEND or something like
that
I wrote:
>
> Regression tests pass, but I haven't measured performance yet.
Using a test similar to one upthread:
select count(*) from (select md5(i::text) as t from
generate_series(1,10) as i) s where t is nfc normalized ;
I get (median of three)
v4 419ms
v5 310ms
with binary size
v4 HE
On 2020-03-23 17:26, Daniel Verite wrote:
Peter Eisentraut wrote:
What is the status of this patch set? I think we have nailed down the
behavior, but there were some concerns about certain performance
characteristics. Do people feel that those are required to be addressed
in this cyc
Peter Eisentraut wrote:
> What is the status of this patch set? I think we have nailed down the
> behavior, but there were some concerns about certain performance
> characteristics. Do people feel that those are required to be addressed
> in this cycle?
Not finding any other issue w
On 3/19/20 3:41 PM, Peter Eisentraut wrote:
What is the status of this patch set? I think we have nailed down the
behavior, but there were some concerns about certain performance
characteristics. Do people feel that those are required to be addressed
in this cycle?
Personally I would rathe
What is the status of this patch set? I think we have nailed down the
behavior, but there were some concerns about certain performance
characteristics. Do people feel that those are required to be addressed
in this cycle?
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreS
On 2020-02-17 20:08, Daniel Verite wrote:
Concerning execution speed, there's excessive CPU usage when
normalizing into NFC or NFKC. Looking at pre-existing code, it looks
like recompose_code() in unicode_norm.c looping over the
UnicodeDecompMain array might be very costly.
Yes, this is a kn
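To illustrate the cost concern quoted above, here is a minimal sketch of recomposition done by a linear scan over a decomposition table. It is not the code in unicode_norm.c: the type demo_decomp, the three-entry demo_table, and the function demo_recompose are hypothetical names made up for this example. The point is only that every attempted pair walks the table from the start, so with a table of several thousand entries the work per input character grows accordingly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, tiny stand-in for a decomposition table (illustration only). */
typedef struct
{
	uint32_t	composed;	/* precomposed code point */
	uint32_t	first;		/* starter code point */
	uint32_t	second;		/* combining mark */
} demo_decomp;

static const demo_decomp demo_table[] = {
	{0x00C9, 0x0045, 0x0301},	/* E + combining acute */
	{0x00E9, 0x0065, 0x0301},	/* e + combining acute */
	{0x00F1, 0x006E, 0x0303},	/* n + combining tilde */
};

/*
 * Try to recompose (first, second) into one code point by scanning the
 * whole table.  Returns 1 and sets *result on success, 0 otherwise.
 * Each call is O(table size), which is the cost being discussed.
 */
static int
demo_recompose(uint32_t first, uint32_t second, uint32_t *result)
{
	size_t		n = sizeof(demo_table) / sizeof(demo_table[0]);

	for (size_t i = 0; i < n; i++)
	{
		if (demo_table[i].first == first && demo_table[i].second == second)
		{
			*result = demo_table[i].composed;
			return 1;
		}
	}
	return 0;
}
```

A pair that has no composition is the worst case here, since the scan only gives up after visiting every entry.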
On 2020-02-17 20:14, Daniel Verite wrote:
The comment in full says:
/*
* unicode_normalize - Normalize a Unicode string to the specified form.
*
* The
One nitpick:
Around this hunk:
- * unicode_normalize_kc - Normalize a Unicode string to NFKC form.
+ * unicode_normalize - Normalize a Unicode string to the specified form.
*
* The input is a 0-terminated array of codepoints.
*
@@ -304,8 +306,10 @@ decompose_code(pg_wchar code, pg_wchar **
Hi,
I've checked the v3 patch against the results of the normalization
done by ICU [1] on my test data again, and they're identical
(as they were with v1; v2 had the bug discussed upthread, now fixed).
Concerning execution speed, there's excessive CPU usage when
normalizing into NFC or NFKC.
On 2020-02-13 01:23, Andreas Karlsson wrote:
A potential optimization would be to merge utf8_to_unicode() and
pg_utf_mblen() into one function in unicode_normalize_func() since
utf8_to_unicode() already knows the length of the character. Probably not
worth it though.
This would also require untangl
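As a rough sketch of the merge suggested in the quoted text, a single helper can decode a UTF-8 sequence and report its byte length at the same time, since the lead byte determines both. The function decode_utf8_with_len below is a hypothetical name, not an existing PostgreSQL function, and it assumes the input has already been validated as UTF-8.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a PostgreSQL function): decode one UTF-8
 * sequence starting at s and store its byte length in *len.
 * Assumes s points at valid UTF-8; no validation is performed.
 */
static uint32_t
decode_utf8_with_len(const unsigned char *s, int *len)
{
	if ((s[0] & 0x80) == 0)
	{
		/* 0xxxxxxx: one byte (ASCII) */
		*len = 1;
		return (uint32_t) s[0];
	}
	if ((s[0] & 0xE0) == 0xC0)
	{
		/* 110xxxxx 10xxxxxx: two bytes */
		*len = 2;
		return (uint32_t) (((s[0] & 0x1F) << 6) | (s[1] & 0x3F));
	}
	if ((s[0] & 0xF0) == 0xE0)
	{
		/* 1110xxxx 10xxxxxx 10xxxxxx: three bytes */
		*len = 3;
		return (uint32_t) (((s[0] & 0x0F) << 12) |
						   ((s[1] & 0x3F) << 6) |
						   (s[2] & 0x3F));
	}
	/* 11110xxx followed by three continuation bytes: four bytes */
	*len = 4;
	return (uint32_t) (((s[0] & 0x07) << 18) |
					   ((s[1] & 0x3F) << 12) |
					   ((s[2] & 0x3F) << 6) |
					   (s[3] & 0x3F));
}
```

A caller walking a string would advance its pointer by the returned length after each call instead of computing the length separately. Whether that saving is measurable is exactly the open question in the quoted text.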
On Thu, Feb 13, 2020 at 01:23:41AM +0100, Andreas Karlsson wrote:
> On 1/28/20 9:21 PM, Peter Eisentraut wrote:
>> You're right, this didn't make any sense. Here is a new patch set with
>> that fixed.
>
> Thanks for this patch. This is a feature which has been on my personal todo
> list for a whi
On 1/28/20 9:21 PM, Peter Eisentraut wrote:
You're right, this didn't make any sense. Here is a new patch set with
that fixed.
Thanks for this patch. This is a feature which has been on my personal
todo list for a while and something which I have wished to have a couple
of times.
I took a
Peter Eisentraut wrote:
> Here is an updated patch set that now also implements the "quick check"
> algorithm from UTR #15 for making IS NORMALIZED very fast in many cases,
> which I had mentioned earlier in the thread.
I found a bug in unicode_is_normalized_quickcheck() which is
trigge
On 2020-01-06 17:00, Daniel Verite wrote:
Peter Eisentraut wrote:
Also, there is a way to optimize the "is normalized" test for common
cases, described in UTR #15. For that we'll need an additional data
file from Unicode. In order to simplify that, I would like my patch
"Add support f
Peter Eisentraut wrote:
> Also, there is a way to optimize the "is normalized" test for common
> cases, described in UTR #15. For that we'll need an additional data
> file from Unicode. In order to simplify that, I would like my patch
> "Add support for automatically updating Unicode