Re: Proposal to add a new URL data type.

2024-12-06 Thread Alexander Borisov
05.12.2024 17:59, Peter Eisentraut wrote: On 05.12.24 15:01, Alexander Borisov wrote: Postgres users often store URLs in the database. As an example, they provide links to their pages on the web, analyze users' posts and get links for further storage and analysis. Naturally, there is a need to

Re: Proposal to add a new URL data type.

2024-12-09 Thread Alexander Borisov
06.12.2024 21:04, Matthias van de Meent: On Thu, 5 Dec 2024 at 15:02, Alexander Borisov wrote: [..] I'd be extremely annoyed if URLs I wrote into the database didn't return in identical manner when fetched from the database. See also how numeric has different representations o

Re: Optimization for lower(), upper(), casefold() functions.

2025-02-06 Thread Alexander Borisov
06.02.2025 22:08, Jeff Davis wrote: On Thu, 2025-02-06 at 18:39 +0300, Alexander Borisov wrote: Since I started to improve Unicode Case, I used the same approach, essentially a binary search, only not by individual values, but by ranges. I considered it a 4th approach because of the generated
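As a rough illustration of binary search "not by individual values, but by ranges", here is a minimal C sketch. The `case_range` struct, the `lower_ranges` table, and the single per-range `delta` are hypothetical; the table holds only a handful of real uppercase ranges, and this is not the code produced by the patch's generator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range entry: every code point in [first, last] maps to
 * code point + delta. Real case tables need more than a single delta. */
typedef struct
{
    uint32_t first;
    uint32_t last;
    int32_t  delta;
} case_range;

/* A tiny, sorted, non-overlapping subset of uppercase -> lowercase ranges. */
static const case_range lower_ranges[] = {
    {0x0041, 0x005A, 32},   /* A..Z -> a..z */
    {0x00C0, 0x00D6, 32},   /* U+00C0..U+00D6 (U+00D7 is excluded) */
    {0x00D8, 0x00DE, 32},   /* U+00D8..U+00DE */
    {0x0391, 0x03A1, 32},   /* Greek capitals */
    {0x0410, 0x042F, 32},   /* Cyrillic capitals */
};

#define N_LOWER_RANGES (sizeof(lower_ranges) / sizeof(lower_ranges[0]))

/* Binary search is only valid if the ranges are sorted and disjoint. */
static void check_ranges_sorted(void)
{
    for (size_t i = 1; i < N_LOWER_RANGES; i++)
        assert(lower_ranges[i - 1].last < lower_ranges[i].first);
}

/* Return the lowercase mapping of cp, or cp itself if no range covers it. */
static uint32_t to_lower_cp(uint32_t cp)
{
    size_t lo = 0;
    size_t hi = N_LOWER_RANGES;   /* search the half-open interval [lo, hi) */

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (cp < lower_ranges[mid].first)
            hi = mid;
        else if (cp > lower_ranges[mid].last)
            lo = mid + 1;
        else
            return (uint32_t) ((int32_t) cp + lower_ranges[mid].delta);
    }
    return cp;
}
```

One range entry stands in for many per-code-point entries, so both the table and the search depth shrink; the cost is that irregular mappings need extra handling that this sketch omits.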

Re: Optimization for lower(), upper(), casefold() functions.

2025-02-06 Thread Alexander Borisov
Hi Jeff, 06.02.2025 00:46, Jeff Davis wrote: On Tue, 2025-02-04 at 23:19 +0300, Alexander Borisov wrote: I've done many different experiments and everywhere the result is within the margin of the v2 patch result. Great, thank you for working on this! There doesn't appear to be

Re: Optimization for lower(), upper(), casefold() functions.

2025-01-31 Thread Alexander Borisov
by uint8*n. Thanks, after the weekend I'll send an updated patch that takes into account the comments/advice. -- SberTech Alexander Borisov

Re: Proposal to add a new URL data type.

2024-12-11 Thread Alexander Borisov
10.12.2024 13:59, Victor Yegorov wrote: On Thu, 5 Dec 2024 at 17:02, Alexander Borisov <lex.bori...@gmail.com> wrote: [..] Hey, I had a look at this patch and found its functionality mature and performant. As Peter mentioned pguri, I used it to compare with the proposed

Re: Proposal to add a new URL data type.

2024-12-06 Thread Alexander Borisov
Hi Daniel, 06.12.2024 16:46, Daniel Gustafsson wrote: On 6 Dec 2024, at 13:59, Alexander Borisov wrote: As I've written before, there is a difference between parsing URLs according to the RFC 3986 specification and WHATWG URLs. This is especially true for host. Here are a couple

Re: Optimization for lower(), upper(), casefold() functions.

2025-01-29 Thread Alexander Borisov
Sorry, I made a mistake in the code. This patch isn't worth looking at yet. 29.01.2025 23:23, Alexander Borisov wrote: Hi, hackers! I propose to consider a simple optimization for Unicode case tables. The main changes affect the generate-unicode_case_table.pl file. Because of the mod

Re: Optimization for lower(), upper(), casefold() functions.

2025-02-18 Thread Alexander Borisov
19.02.2025 01:02, Jeff Davis wrote: On Tue, 2025-02-11 at 23:08 +0300, Alexander Borisov wrote: I tried the approach via a range table. The result was worse than without the table. With branching in a function, the result is better. Patch v3: binary search over ranges, implemented with branches. Patch v4
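To make "binary search over ranges, implemented with branches" concrete, here is a hypothetical sketch for five illustrative uppercase ranges. The search tree is unrolled into nested comparisons, so no range table is loaded at lookup time. The function name and the ranges are assumptions for illustration, not the v3 patch's generated code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical "generated" lookup: a binary search over five uppercase
 * ranges, unrolled into comparisons instead of reading a range table.
 * Ranges covered: A..Z, U+00C0..U+00D6, U+00D8..U+00DE,
 * Greek U+0391..U+03A1, Cyrillic U+0410..U+042F (all map by +32).
 */
static uint32_t to_lower_branchy(uint32_t cp)
{
    assert(cp <= 0x10FFFF);     /* expect a valid code point */

    if (cp < 0x00D8)
    {
        /* left half: A..Z and U+00C0..U+00D6 */
        if (cp < 0x00C0)
            return (cp >= 0x0041 && cp <= 0x005A) ? cp + 32 : cp;
        return (cp <= 0x00D6) ? cp + 32 : cp;
    }
    if (cp <= 0x00DE)           /* middle range U+00D8..U+00DE */
        return cp + 32;
    if (cp < 0x0410)            /* Greek capitals */
        return (cp >= 0x0391 && cp <= 0x03A1) ? cp + 32 : cp;
    return (cp <= 0x042F) ? cp + 32 : cp;   /* Cyrillic capitals */
}
```

Whether branches beat a table depends on the compiler and CPU (branch prediction versus memory loads); the measurements in the thread are not reproduced by this sketch.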

Re: Optimization for lower(), upper(), casefold() functions.

2025-03-02 Thread Alexander Borisov
19.02.2025 01:56, Jeff Davis wrote: On Wed, 2025-02-19 at 01:54 +0300, Alexander Borisov wrote: In proposing the patch for v3, I struck a balance between improving performance and reducing binary size, without sacrificing code clarity. Fair enough. I will continue reviewing v3. Did you have

Re: Optimization for lower(), upper(), casefold() functions.

2025-03-12 Thread Alexander Borisov
12.03.2025 19:55, Alexander Borisov wrote: [...] A couple questions: * Is there a reason the fast-path for codepoints < 0x80 is in unicode_case.c rather than unicode_case_func.h? Yes, this is an important optimization, below are benchmarks that [...] I forgot to add the benchm
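To illustrate what a fast path for code points below 0x80 means, here is a minimal sketch: ASCII is handled with a comparison and an add, and only other code points reach the general lookup. `lower_general()` is a hypothetical stand-in that only knows Cyrillic capitals, and where the check should live (unicode_case.c or unicode_case_func.h) is not modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical general-case lookup; a real one would consult generated tables. */
static uint32_t lower_general(uint32_t cp)
{
    if (cp >= 0x0410 && cp <= 0x042F)   /* Cyrillic capitals -> small letters */
        return cp + 0x20;
    return cp;
}

static uint32_t lower_cp(uint32_t cp)
{
    assert(cp <= 0x10FFFF);             /* expect a valid code point */

    /* Fast path: ASCII never needs the table lookup. */
    if (cp < 0x80)
        return (cp >= 'A' && cp <= 'Z') ? cp + ('a' - 'A') : cp;

    return lower_general(cp);
}

/* Lowercase a buffer of code points in place. */
static void lower_buffer(uint32_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = lower_cp(buf[i]);
}
```

For mostly-ASCII input the extra branch is cheap and predictable, which is the usual argument for such a fast path; the thread's benchmarks are what actually quantify it.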

Re: Optimization for lower(), upper(), casefold() functions.

2025-03-15 Thread Alexander Borisov
15.03.2025 23:07, Jeff Davis wrote: On Fri, 2025-03-14 at 15:00 +0300, Alexander Borisov wrote: I tried adding a loop to create tables, and everything looks fine (v7). [...] I prefer to generalize when we have the other code in place. As it was, it was a bit confusing why the extra

Re: PG 18 release notes draft committed

2025-05-04 Thread Alexander Borisov
me from the commit message nor from skimming the original thread, whether the perf improvement numbers listed by Alexander also apply to lower() and upper(), or if they only apply to casefold(): On Sun, 4 May 2025 at 00:32, Alexander Borisov wrote: ASCII by ≈10%, Cyrillic by ≈80%, Unicode in general by

Re: PG 18 release notes draft committed

2025-05-05 Thread Alexander Borisov
e in this area. But again, I'm new to the Postgres community and I'm getting to know what's going on here and how it works. Thank you for paying attention to it! -- Regards, Alexander Borisov

Re: PG 18 release notes draft committed

2025-05-03 Thread Alexander Borisov
d want to understand. Commit: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=27bdec06841d1bb004ca7627eac97808b08a7ac7 I am now actively working on a major improvement to Unicode Normalization Forms. Thanks! -- Regards, Alexander Borisov

Re: PG 18 release notes draft committed

2025-05-03 Thread Alexander Borisov
algorithms. As a result, the functions lower(), upper(), and casefold() got a significant boost. -- Regards, Alexander Borisov

Re: PG 18 release notes draft committed

2025-05-03 Thread Alexander Borisov
u for clarifying! Users are not interested in performance gains. Then it's not worth considering. Sorry to interrupt. -- Regards, Alexander Borisov