Re: Add CASEFOLD() function.

2025-01-25 Thread Jeff Davis
On Sat, 2025-01-25 at 00:00 -0500, Tom Lane wrote: > Found characters that cannot be output in the PDF document;  see > README.non-ASCII Thank you, fixed. > Not sure about a good workaround for this.  Are there any characters > within LATIN-1 that have interesting case-folding behavior? I just r

Re: Add CASEFOLD() function.

2025-01-24 Thread Tom Lane
Jeff Davis writes: > v6 attached. I plan to commit this soon. The documentation for this function is giving the PDF docs build indigestion: [WARN] FOUserAgent - Glyph "?" (0x3a3, Sigma) not available in font "Courier". [WARN] FOUserAgent - Glyph "?" (0x3c3, sigma) not available in font "Courier"

Re: Add CASEFOLD() function.

2025-01-23 Thread Jeff Davis
On Fri, 2025-01-17 at 16:34 -0800, Jeff Davis wrote: > v5 attached. v6 attached. I plan to commit this soon. A couple things to note: * The ICU API for lower/title/uppercasing is slightly different from folding. The former accept a locale, while the latter just has an option which is relevant on

Re: Add CASEFOLD() function.

2025-01-08 Thread Jeff Davis
On Thu, 2024-12-19 at 09:51 -0800, Jeff Davis wrote: > But there's a problem: full case folding doesn't preserve the normal > form, so even if the input is NFC normalized, the output might not > be. > If we solve this problem, then we can just say that CASEFOLD() > preserves the normal form, consis
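
The point that full case folding does not preserve the normal form can be illustrated with a minimal sketch. It uses Python's `str.casefold()` (which applies Unicode full case folding) and the standard `unicodedata` module as stand-ins; it is not the patch's implementation, and the helper name `is_nfc` and the choice of U+01F0 as the sample character are assumptions made for illustration only.

```python
import unicodedata

def is_nfc(s: str) -> bool:
    """Return True if s is already in Normalization Form C (hypothetical helper)."""
    return unicodedata.normalize("NFC", s) == s

# U+01F0 LATIN SMALL LETTER J WITH CARON: a single precomposed code point.
J_WITH_CARON = "\u01f0"

# Full case folding maps it to "j" followed by U+030C COMBINING CARON.
folded = J_WITH_CARON.casefold()

print("input is NFC:   ", is_nfc(J_WITH_CARON))
print("folded is NFC:  ", is_nfc(folded))
print("folded sequence:", [f"U+{ord(c):04X}" for c in folded])

# Re-normalizing after folding restores a normalized result.
renormalized = unicodedata.normalize("NFC", folded)
print("renormalized is NFC:", is_nfc(renormalized))
```

Here the input is NFC but the folded output is a decomposed sequence, so a caller who needs normalized output would have to normalize again after folding.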

Re: Add CASEFOLD() function.

2024-12-19 Thread Jeff Davis
On Thu, 2024-12-19 at 17:18 +0100, Peter Eisentraut wrote: > Can you explain this in further detail?  I don't quite follow why > this > would be required. I am unsure now. My initial reasoning was based on the idea that users would want to use CASEFOLD(t) in a unique expression index as an impro

Re: Add CASEFOLD() function.

2024-12-19 Thread Peter Eisentraut
On 16.12.24 18:49, Jeff Davis wrote: One question I have is whether we want this function to normalize the output. I believe most use cases would want the output normalized, because normalization differences (e.g. "a" U+0061 followed by "combining acute" U+0301 vs "a with acute" U+00E1) are more
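
The normalization difference mentioned in the quoted text (U+0061 followed by U+0301 versus U+00E1) can be shown with a short sketch. It uses Python's standard `unicodedata` module purely as an illustration; the variable names and the helper `equal_after_nfc` are assumptions and do not come from the thread or the patch.

```python
import unicodedata

# "a" followed by U+0301 COMBINING ACUTE ACCENT (two code points).
decomposed = "a\u0301"
# U+00E1 LATIN SMALL LETTER A WITH ACUTE (one code point).
precomposed = "\u00e1"

def equal_after_nfc(x: str, y: str) -> bool:
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", x) == unicodedata.normalize("NFC", y)

# The two spellings render the same but compare unequal code point by code point.
print("raw comparison:       ", decomposed == precomposed)
# After normalization they compare equal.
print("normalized comparison:", equal_after_nfc(decomposed, precomposed))
```

A case-insensitive comparison that folds case without normalizing would still treat these two spellings as different strings.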

Re: Add CASEFOLD() function.

2024-12-18 Thread Jeff Davis
On Mon, 2024-12-16 at 16:27 -0500, Joe Conway wrote: > > SQL 2023 seems to include the NORMALIZE syntax, but the only case > folding considered is UPPER and LOWER. As such, I think it ought to > be a > function but not part of the grammar. Should the standard support something like the Unicode

Re: Add CASEFOLD() function.

2024-12-17 Thread Andreas Karlsson
On 12/12/24 10:00 AM, Jeff Davis wrote: Patch attached. I have not looked at the patch yet but +1 to the idea. I am leaning towards thinking that having the function optionally normalize the code points would be handy too, since I think that is what most use cases want. Otherwise people would have to al

Re: Add CASEFOLD() function.

2024-12-16 Thread Joe Conway
On 12/16/24 12:49, Jeff Davis wrote: One question I have is whether we want this function to normalize the output. I believe most use cases would want the output normalized, because normalization differences (e.g. "a" U+0061 followed by "combining acute" U+0301 vs "a with acute" U+00E1) are more

Re: Add CASEFOLD() function.

2024-12-12 Thread Joe Conway
On 12/12/24 13:30, Jeff Davis wrote: On Thu, 2024-12-12 at 21:52 +0900, Ian Lawrence Barwick wrote: and it seems to work as advertised, except the function is named "FOLDCASE()" in the patch, so I'm wondering which is intended? Thank you for looking into this, I went back and forth on the name

Re: Add CASEFOLD() function.

2024-12-12 Thread Jeff Davis
On Thu, 2024-12-12 at 21:52 +0900, Ian Lawrence Barwick wrote: > and it seems to work as advertised, except the function is named > "FOLDCASE()" > in the patch, so I'm wondering which is intended? Thank you for looking into this, I went back and forth on the name, and mistyped it a few times. ICU

Re: Add CASEFOLD() function.

2024-12-12 Thread Ian Lawrence Barwick
Hi. On Thu, 12 Dec 2024 at 18:00, Jeff Davis wrote: > > Unicode case folding is a way to convert a string to a canonical case > for the purpose of case-insensitive matching. > > Users have long used LOWER() for that purpose, but there are a few edge > case problems: > > * Some characters have more than two cased
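
The difference between lowercasing and case folding for case-insensitive matching can be sketched as follows. Python's `str.lower()` and `str.casefold()` are used as stand-ins and are not PostgreSQL's LOWER() or CASEFOLD(); the helper names and the choice of Greek sigma (which has an uppercase form and two lowercase forms) as the sample are assumptions for illustration.

```python
# Stand-in comparison helpers (hypothetical names, not from the patch).
def matches_lower(a: str, b: str) -> bool:
    """Case-insensitive match by lowercasing both sides."""
    return a.lower() == b.lower()

def matches_casefold(a: str, b: str) -> bool:
    """Case-insensitive match by case folding both sides."""
    return a.casefold() == b.casefold()

SIGMA_UPPER = "\u03a3"  # GREEK CAPITAL LETTER SIGMA
SIGMA_LOWER = "\u03c3"  # GREEK SMALL LETTER SIGMA
SIGMA_FINAL = "\u03c2"  # GREEK SMALL LETTER FINAL SIGMA

# Lowercasing leaves the two lowercase sigma forms distinct.
print("lower:   ", matches_lower(SIGMA_LOWER, SIGMA_FINAL))
# Case folding maps all three forms to the same code point.
print("casefold:", matches_casefold(SIGMA_LOWER, SIGMA_FINAL))
print("casefold:", matches_casefold(SIGMA_UPPER, SIGMA_FINAL))
```

With lowercasing, the two lowercase sigma forms do not match each other, while case folding treats all three forms as equal.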