Re: Optimization for lower(), upper(), casefold() functions.

2025-03-18 Thread Tom Lane
Jeff Davis writes:
> On Tue, 2025-03-18 at 11:11 -0400, Tom Lane wrote:
>> Also, probably better to make it const:
>>
>> -static const pg_wchar *casekind_map[NCaseKind] =
>> +static const pg_wchar * const casekind_map[NCaseKind] =
> Was this a general suggestion, or did you see something in part
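The const suggestion above hinges on where each qualifier applies. Below is a minimal sketch, assuming pg_wchar is a 32-bit code point type; the enum members, the per-kind arrays, and casekind_lookup() are hypothetical stand-ins, not the patch's definitions. The first const protects the mapped values and the second protects the pointer slots themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: pg_wchar modeled as a 32-bit unsigned code point. */
typedef uint32_t pg_wchar;

/* Hypothetical case kinds; only NCaseKind's role mirrors the thread. */
typedef enum
{
    CaseLower,
    CaseTitle,
    CaseUpper,
    CaseFold,
    NCaseKind
} CaseKind;

/* Hypothetical per-kind mappings: index 0 is 'a', index 1 is 'B'. */
static const pg_wchar case_map_lower[] = {0x0061, 0x0062};
static const pg_wchar case_map_title[] = {0x0041, 0x0042};
static const pg_wchar case_map_upper[] = {0x0041, 0x0042};
static const pg_wchar case_map_fold[] = {0x0061, 0x0062};

/*
 * "const pg_wchar *" alone makes the mapped values read-only, but the
 * pointer slots of casekind_map could still be reassigned.  The second
 * const makes the slots read-only too, so the compiler may treat the
 * whole table as constant data.
 */
static const pg_wchar *const casekind_map[NCaseKind] = {
    [CaseLower] = case_map_lower,
    [CaseTitle] = case_map_title,
    [CaseUpper] = case_map_upper,
    [CaseFold] = case_map_fold,
};

static pg_wchar
casekind_lookup(CaseKind kind, int idx)
{
    assert(kind < NCaseKind && idx >= 0 && idx < 2);
    /* casekind_map[kind] = case_map_fold;  <- now a compile error */
    return casekind_map[kind][idx];
}
```

Defining such a table static in the one .c file that uses it, rather than in a header, also avoids the unused-variable warnings headerscheck reports for translation units that include the header without referencing the table.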

Re: Optimization for lower(), upper(), casefold() functions.

2025-03-18 Thread Jeff Davis
On Tue, 2025-03-18 at 11:11 -0400, Tom Lane wrote:
> It's not apparent to me why that table needs to be in a header
> file and not in the sole user .c file?
Thank you, fixed.
> Also, probably better to make it const:
>
> -static const pg_wchar *casekind_map[NCaseKind] =
> +static const pg_wchar

Re: Optimization for lower(), upper(), casefold() functions.

2025-03-18 Thread Tom Lane
One more thing: I observe that headerscheck is now unhappy:
$ src/tools/pginclude/headerscheck
In file included from /tmp/headerscheck.yOpahZ/test.c:2:
./src/include/common/unicode_case_table.h:8598:24: warning: 'casekind_map' defined but not used [-Wunused-variable]
static const pg_wchar *case

Re: Optimization for lower(), upper(), casefold() functions.

2025-03-15 Thread Alexander Borisov
15.03.2025 23:07, Jeff Davis wrote:
On Fri, 2025-03-14 at 15:00 +0300, Alexander Borisov wrote:
I tried adding a loop to create tables, and everything looks fine (v7).
[...]
I prefer to generalize when we have the other code in place. As it was, it was a bit confusing why the extra arguments

Re: Optimization for lower(), upper(), casefold() functions.

2025-03-15 Thread Tom Lane
Jeff Davis writes:
> On Sat, Mar 15, 2025 at 1:11 PM Tom Lane wrote:
>> crake doesn't like your perl style:
>> ./src/common/unicode/generate-unicode_case_table.pl: Loop iterator is not
>> lexical at line 638, column 2. See page 108 of PBP.
> I suppose pgperltidy didn't catch that. I will fix it

Re: Optimization for lower(), upper(), casefold() functions.

2025-03-15 Thread Jeff Davis
On Sat, Mar 15, 2025 at 1:11 PM Tom Lane wrote:
> Jeff Davis writes:
> > Committed. Thank you!
>
> crake doesn't like your perl style:
>
> ./src/common/unicode/generate-unicode_case_table.pl: Loop iterator is not
> lexical at line 638, column 2. See page 108 of PBP.
I suppose pgperltidy didn'

Re: Optimization for lower(), upper(), casefold() functions.

2025-03-15 Thread Tom Lane
Jeff Davis writes:
> Committed. Thank you!
crake doesn't like your perl style:
./src/common/unicode/generate-unicode_case_table.pl: Loop iterator is not lexical at line 638, column 2. See page 108 of PBP. ([Variables::RequireLexicalLoopIterators] Severity: 5)
regard

Re: Optimization for lower(), upper(), casefold() functions.

2025-03-15 Thread Jeff Davis
On Fri, 2025-03-14 at 15:00 +0300, Alexander Borisov wrote:
> I tried adding a loop to create tables, and everything looks fine
> (v7).
> Also removed unnecessary (hanging) global variables.
Changed. I used a loop more similar to your first one (hash of arrays), and I left case_map_special outside

Re: Optimization for lower(), upper(), casefold() functions.

2025-03-15 Thread Heikki Linnakangas
On 14/03/2025 05:43, Jeff Davis wrote:
On Wed, 2025-03-12 at 23:39 +0300, Alexander Borisov wrote:
v5 attached.
Attached v6j.
* marked arrays as "static const" rather than just "static"
* ran pgindent
* changed data types where appropriate (uint32->pg_wchar)
* modified perl code so that it pr

Re: Optimization for lower(), upper(), casefold() functions.

2025-03-14 Thread Jeff Davis
On Fri, 2025-03-14 at 13:16 +0200, Heikki Linnakangas wrote:
> Attached are fixes for those and some other minor things.
Thank you, I agree and I have applied your changes.
Regards,
Jeff Davis

Re: Optimization for lower(), upper(), casefold() functions.

2025-03-12 Thread Jeff Davis
On Wed, 2025-03-12 at 19:55 +0300, Alexander Borisov wrote:
> 1. Added static for casemap() function. Otherwise the compiler could
> not optimize the code and the performance dropped significantly.
Oops, it was static, but I made it external just to see what code it generated. I didn't intend to

Re: Optimization for lower(), upper(), casefold() functions.

2025-03-12 Thread Alexander Borisov
12.03.2025 19:55, Alexander Borisov wrote:
[...]
A couple questions:
* Is there a reason the fast-path for codepoints < 0x80 is in unicode_case.c rather than unicode_case_func.h?
Yes, this is an important optimization, below are benchmarks that [...]
I forgot to add the benchmark:
Benchm
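The fast path for code points below 0x80, together with keeping casemap() static, can be illustrated roughly as follows. This is a sketch, not the patch's code: the patch serves 0x00..0x7F from table entries indexed directly by code point, whereas this version uses arithmetic to stay short, and slow_path_upper() and sketch_toupper() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t pg_wchar;      /* assumption: 32-bit code point */

/*
 * Hypothetical stand-in for the generated lookup; it knows only two
 * non-ASCII mappings, just enough to exercise the slow path.
 */
static pg_wchar
slow_path_upper(pg_wchar c)
{
    switch (c)
    {
        case 0x00E9:            /* LATIN SMALL LETTER E WITH ACUTE */
            return 0x00C9;
        case 0x03B1:            /* GREEK SMALL LETTER ALPHA */
            return 0x0391;
        default:
            return c;
    }
}

/*
 * "static" gives internal linkage, so the compiler sees every caller and
 * is free to inline this; ASCII input never reaches the search.
 */
static inline pg_wchar
sketch_toupper(pg_wchar c)
{
    if (c < 0x80)
        return (c >= 'a' && c <= 'z') ? c - ('a' - 'A') : c;
    return slow_path_upper(c);
}

/* Example use. */
static void
example(void)
{
    assert(sketch_toupper('q') == 'Q');
    assert(sketch_toupper(0x03B1) == 0x0391);
}
```

The design point is that mostly-ASCII text pays for one comparison per character instead of a range search.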

Re: Optimization for lower(), upper(), casefold() functions.

2025-03-02 Thread Alexander Borisov
19.02.2025 01:56, Jeff Davis wrote:
On Wed, 2025-02-19 at 01:54 +0300, Alexander Borisov wrote:
In proposing the patch for v3, I struck a balance between improving performance and reducing binary size, without sacrificing code clarity.
Fair enough. I will continue reviewing v3.
Did you have

Re: Optimization for lower(), upper(), casefold() functions.

2025-02-18 Thread Jeff Davis
On Wed, 2025-02-19 at 01:54 +0300, Alexander Borisov wrote:
> In proposing the patch for v3, I struck a balance between improving
> performance and reducing binary size, without sacrificing code
> clarity.
Fair enough. I will continue reviewing v3.
Regards,
Jeff Davis

Re: Optimization for lower(), upper(), casefold() functions.

2025-02-18 Thread Alexander Borisov
19.02.2025 01:02, Jeff Davis wrote:
On Tue, 2025-02-11 at 23:08 +0300, Alexander Borisov wrote:
I tried the approach via a range table. The result was worse than without the table. With branching in a function, the result is better.
Patch v3: ranges binary search by branches.
Patch v4: ranges

Re: Optimization for lower(), upper(), casefold() functions.

2025-02-18 Thread Jeff Davis
On Tue, 2025-02-11 at 23:08 +0300, Alexander Borisov wrote:
> I tried the approach via a range table. The result was worse than
> without the table. With branching in a function, the result is
> better.
>
> Patch v3: ranges binary search by branches.
> Patch v4: ranges binary search by table.
T
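To make the "binary search by table" variant (v4) concrete, here is a rough sketch: a sorted array of code point ranges, each mapping to a run of consecutive slots in a main case table, with slot 0 meaning "no entry". The range boundaries, offsets, and all names are hypothetical illustrations, not the generated data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t pg_wchar;      /* assumption: 32-bit code point */

/* One run of consecutive code points that all have table entries. */
typedef struct
{
    pg_wchar    first;          /* first code point in the run */
    pg_wchar    last;           /* last code point in the run (inclusive) */
    uint16_t    offset;         /* main-table index of 'first' */
} case_range;

/* Hypothetical sorted, non-overlapping ranges. */
static const case_range case_ranges[] = {
    {0x0041, 0x005A, 1},        /* 26 entries: 1..26 */
    {0x0061, 0x007A, 27},       /* 26 entries: 27..52 */
    {0x00C0, 0x00DE, 53},       /* 31 entries: 53..83 */
    {0x0391, 0x03A9, 84},       /* 25 entries: 84..108 */
};

/* Returns the main-table index for c, or 0 if c has no entry. */
static uint16_t
case_index_table(pg_wchar c)
{
    size_t      lo = 0;
    size_t      hi = sizeof(case_ranges) / sizeof(case_ranges[0]);

    /* Binary search over the half-open interval [lo, hi). */
    while (lo < hi)
    {
        size_t      mid = lo + (hi - lo) / 2;
        const case_range *r = &case_ranges[mid];

        if (c < r->first)
            hi = mid;
        else if (c > r->last)
            lo = mid + 1;
        else
            return (uint16_t) (r->offset + (c - r->first));
    }
    return 0;
}

/* Example use. */
static void
example(void)
{
    assert(case_index_table('a') == 27);
    assert(case_index_table(0x00C0) == 53);
}
```

Each probe here loads a range record from memory; per the benchmarks described in the thread, this table-driven shape came out slower than the branch-based v3.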

Re: Optimization for lower(), upper(), casefold() functions.

2025-02-12 Thread Jeff Davis
On Tue, 2025-02-11 at 23:08 +0300, Alexander Borisov wrote:
> What's the result?
>
> I would use Range Binary in Unicode case/normalization. The algorithm
> shows good results. Plus it can be customized (increasing/decreasing)
> the table by allowing empty values.
>
> Also, I got a strong feeling

Re: Optimization for lower(), upper(), casefold() functions.

2025-02-06 Thread Alexander Borisov
06.02.2025 22:08, Jeff Davis wrote:
On Thu, 2025-02-06 at 18:39 +0300, Alexander Borisov wrote:
Since I started to improve Unicode Case, I used the same approach, essentially a binary search, only not by individual values, but by ranges.
I considered it a 4th approach because of the generated

Re: Optimization for lower(), upper(), casefold() functions.

2025-02-06 Thread Jeff Davis
On Thu, 2025-02-06 at 18:39 +0300, Alexander Borisov wrote:
> Since I started to improve Unicode Case, I used the same approach,
> essentially a binary search, only not by individual values, but by
> ranges.
I considered it a 4th approach because of the generated branches in case_index(). Case_ind
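The "generated branches" form of a range binary search unrolls the search into nested comparisons that a generator could emit as straight-line code, so no range table is read at run time. This sketch hard-codes four hypothetical ranges; case_index_branches() is an illustrative name, and the real generated case_index() is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t pg_wchar;      /* assumption: 32-bit code point */

/*
 * Hypothetical ranges (main-table offset in parentheses):
 *   U+0041..U+005A (1), U+0061..U+007A (27),
 *   U+00C0..U+00DE (53), U+0391..U+03A9 (84).
 * The comparisons form a balanced binary tree over those ranges.
 */
static uint16_t
case_index_branches(pg_wchar c)
{
    if (c < 0x00C0)             /* first two ranges */
    {
        if (c < 0x0061)
        {
            if (c >= 0x0041 && c <= 0x005A)
                return (uint16_t) (1 + (c - 0x0041));
        }
        else if (c <= 0x007A)
            return (uint16_t) (27 + (c - 0x0061));
    }
    else                        /* last two ranges */
    {
        if (c < 0x0391)
        {
            if (c <= 0x00DE)
                return (uint16_t) (53 + (c - 0x00C0));
        }
        else if (c <= 0x03A9)
            return (uint16_t) (84 + (c - 0x0391));
    }
    return 0;                   /* no entry */
}

/* Example use. */
static void
example(void)
{
    assert(case_index_branches('A') == 1);
    assert(case_index_branches(0x0391) == 84);
    assert(case_index_branches(0x0100) == 0);
}
```

The trade-off is code size for data access: every range boundary becomes an immediate operand in the instruction stream.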

Re: Optimization for lower(), upper(), casefold() functions.

2025-02-06 Thread Alexander Borisov
Hi Jeff,
06.02.2025 00:46, Jeff Davis wrote:
On Tue, 2025-02-04 at 23:19 +0300, Alexander Borisov wrote:
I've done many different experiments and everywhere the result is within the margin of the v2 patch result.
Great, thank you for working on this! There doesn't appear to be a downside.
Ev

Re: Optimization for lower(), upper(), casefold() functions.

2025-02-05 Thread Jeff Davis
On Tue, 2025-02-04 at 23:19 +0300, Alexander Borisov wrote:
> I've done many different experiments and everywhere the result is
> within the margin of the v2 patch result.
Great, thank you for working on this! There doesn't appear to be a downside. Even though it's more complex, we have exhaust

Re: Optimization for lower(), upper(), casefold() functions.

2025-01-31 Thread Alexander Borisov
31.01.2025 01:43, Heikki Linnakangas wrote:
Hi Heikki,
Did you consider using a radix tree? We use that method in src/backend/utils/mb/Unicode/convutils.pm. I'm not sure if that's better or worse than what's proposed here, but it would seem like a more standard technique at least. Or if this
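For comparison, the radix-tree suggestion can be sketched as a two-level lookup table, the simplest kind of radix tree: the high bits of a code point select a leaf block and the low bits index into it, with all empty blocks sharing one zero-filled leaf. The 7-bit split, the U+03FF coverage limit, and all names are assumptions for illustration; the layout convutils.pm actually generates may differ.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t pg_wchar;      /* assumption: 32-bit code point */

#define LOW_BITS    7                       /* 128 code points per leaf */
#define BLOCK_SIZE  (1u << LOW_BITS)
#define MAX_CP      0x0400                  /* sketch covers U+0000..U+03FF */
#define NBLOCKS     (MAX_CP >> LOW_BITS)    /* 8 root slots */

/*
 * Leaf blocks of main-table indexes.  Leaf 0 stays all zeros and is
 * shared by every root slot that has no mappings at all.
 */
static uint16_t leaf[1 + NBLOCKS][BLOCK_SIZE];

/* Root level: (code point >> LOW_BITS) -> leaf number. */
static uint8_t root[NBLOCKS];
static uint8_t nleaves = 1;     /* leaf 0 is the shared empty block */

/* Build step; a generator would instead emit these as static const data. */
static void
radix_set(pg_wchar c, uint16_t idx)
{
    pg_wchar    hi = c >> LOW_BITS;

    assert(c < MAX_CP);
    if (root[hi] == 0)
        root[hi] = nleaves++;
    leaf[root[hi]][c & (BLOCK_SIZE - 1)] = idx;
}

/* Lookup: one bounds check and two array reads, no search loop. */
static uint16_t
radix_lookup(pg_wchar c)
{
    if (c >= MAX_CP)
        return 0;
    return leaf[root[c >> LOW_BITS]][c & (BLOCK_SIZE - 1)];
}
```

Lookup cost is constant regardless of how many ranges exist; the table size instead grows with the number of distinct leaf blocks that contain at least one mapping.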

Re: Optimization for lower(), upper(), casefold() functions.

2025-01-30 Thread Heikki Linnakangas
On 30/01/2025 15:39, Alexander Borisov wrote:
The code is fixed, now the patch passes all tests.
Change from the original patch (v1):
Reduce the main table from 3003 to 1677 (duplicates removed) records.
Added records from 0x00 to 0x80 for fast path.
Renamed get_case() function to pg_unicode_cas

Re: Optimization for lower(), upper(), casefold() functions.

2025-01-29 Thread Alexander Borisov
Sorry, I made a mistake in the code. It's not worth looking at this patch yet.
29.01.2025 23:23, Alexander Borisov wrote:
Hi, hackers!
I propose to consider a simple optimization for Unicode case tables. The main changes affect the generate-unicode_case_table.pl file. Because of the modified app