On Thu, 19 Jun 2025 at 18:39, David E. Wheeler wrote:
>
> On Jun 19, 2025, at 12:59, Thom Brown wrote:
>
> > No. But given the options, I would personally choose nondeterministic
> > collations now that they are available. I just wish they were more
> > user-friendly as I suspect the majority o
On Jun 19, 2025, at 12:59, Thom Brown wrote:
> No. But given the options, I would personally choose nondeterministic
> collations now that they are available. I just wish they were more
> user-friendly as I suspect the majority of people either won't know about
> them, or won't know how to use
On Thu, 19 Jun 2025, 17:33 Jeff Davis wrote:
> On Thu, 2025-06-19 at 16:36 +0100, Thom Brown wrote:
> > Ease of use, perhaps. It seems easier to use:
> >
> > column_name cftext
> >
> > rather than:
> >
> > CREATE COLLATION case_insensitive_collation (
> > PROVIDER = icu,
> > LOCALE = 'un
On Thu, 2025-06-19 at 18:21 +0200, Vik Fearing wrote:
> >
> > The SQL standard also says in a few other places that normalization
> > should be applied, and we do none of those, so this is probably not
> > a
> > reason to change CASEFOLD at this point.
> >
>
> Works for me.
Sounds good. We can
On Thu, Jun 19, 2025 at 12:33 PM Jeff Davis wrote:
>
> On Thu, 2025-06-19 at 16:36 +0100, Thom Brown wrote:
> > Ease of use, perhaps. It seems easier to use:
> >
> > column_name cftext
> >
> > rather than:
> >
> > CREATE COLLATION case_insensitive_collation (
> > PROVIDER = icu,
> > LOCALE
On Thu, 2025-06-19 at 16:36 +0100, Thom Brown wrote:
> Ease of use, perhaps. It seems easier to use:
>
> column_name cftext
>
> rather than:
>
> CREATE COLLATION case_insensitive_collation (
> PROVIDER = icu,
> LOCALE = 'und-u-ks-level2',
> DETERMINISTIC = FALSE
> );
We could auto-c
On 19/06/2025 16:47, Peter Eisentraut wrote:
On 17.06.25 17:37, Vik Fearing wrote:
For <fold> (which includes LOWER() and UPPER()), the text says in
Section 6.35 GR 7.e:
If the character set of is UTF8, UTF16, or UTF32,
then FR is replaced by
Case:
i) If the S IS NORMALIZED eval
On Thu, Jun 19, 2025 at 11:37 AM Thom Brown wrote:
> On Thu, 19 Jun 2025 at 15:51, Peter Eisentraut wrote:
> > On 19.06.25 06:03, Thom Brown wrote:
> > > Late to the party, but is there an argument for porting this to the
> > > citext type? Or supplementing the extension with an additional type
>
On Thu, 19 Jun 2025 at 15:51, Peter Eisentraut wrote:
>
> On 19.06.25 06:03, Thom Brown wrote:
> > Late to the party, but is there an argument for porting this to the
> > citext type? Or supplementing the extension with an additional type
> > ("cftext"? *shrug*). It currently uses lower(), so our
On 19.06.25 06:03, Thom Brown wrote:
Late to the party, but is there an argument for porting this to the
citext type? Or supplementing the extension with an additional type
("cftext"? *shrug*). It currently uses lower(), so our current
recommendation for dealing with all Unicode characters is t
On 17.06.25 17:37, Vik Fearing wrote:
For <fold> (which includes LOWER() and UPPER()), the text says in
Section 6.35 GR 7.e:
If the character set of is UTF8, UTF16, or UTF32,
then FR is replaced by
Case:
i) If the S IS NORMALIZED evaluates to True,
then NORMALIZE (FR)
ii
On Thu, 2025-06-19 at 05:03 +0100, Thom Brown wrote:
> Late to the party, but is there an argument for porting this to the
> citext type? Or supplementing the extension with an additional type
> ("cftext"? *shrug*).
CASEFOLD() addresses a lot of the problems with using LOWER(), so that
sounds like
On Thu, 19 Jun 2025, 03:53 Jeff Davis wrote:
> On Wed, 2025-06-18 at 19:09 +0200, Vik Fearing wrote:
> > I don't know. I am just pointing out what the Standard says. I
> > think
> > we should either comply, or say that we don't do it for LOWER and
> > UPPER
> > so let's keep things implementat
On Wed, 2025-06-18 at 19:09 +0200, Vik Fearing wrote:
> I don't know. I am just pointing out what the Standard says. I
> think
> we should either comply, or say that we don't do it for LOWER and
> UPPER
> so let's keep things implementation-consistent.
For the standard, I see two potential phi
On 17/06/2025 20:14, Jeff Davis wrote:
On Tue, 2025-06-17 at 17:37 +0200, Vik Fearing wrote:
If the character set of is UTF8, UTF16, or UTF32,
then FR is replaced by
Case:
i) If the S IS NORMALIZED evaluates to
True, then NORMALIZE (FR)
ii) Otherwise, FR.
I read th
On Tue, 2025-06-17 at 17:37 +0200, Vik Fearing wrote:
> If the character set of is UTF8, UTF16, or UTF32,
> then FR is replaced by
> Case:
> i) If the S IS NORMALIZED evaluates to
> True, then NORMALIZE (FR)
> ii) Otherwise, FR.
I read that as "if the input is normalized,
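The quoted rule can be read as a small decision procedure. The following is a minimal Python sketch of one reading of it, not a description of what PostgreSQL does. It assumes NFC as the normal form and uses str.casefold() as a stand-in for the fold operation; the helper name fold_per_rule is hypothetical.

```python
import unicodedata


def fold_per_rule(s: str, form: str = "NFC") -> str:
    """Hypothetical helper: fold s, then normalize only if s was normalized."""
    fr = s.casefold()  # FR: the folded result (stand-in for the fold operation)
    if unicodedata.is_normalized(form, s):      # "S IS NORMALIZED" is True
        return unicodedata.normalize(form, fr)  # i) NORMALIZE (FR)
    return fr                                   # ii) Otherwise, FR


# Input already in NFC: the folded result is normalized again.
normalized_case = fold_per_rule("\u01f0")
# Input not in NFC ("A" + combining acute): the folded result is left as-is.
unnormalized_case = fold_per_rule("A\u0301")
```

Under this reading, normalization is applied to the output only when the input was already in the normal form, so an unnormalized input is never silently recomposed.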
On 16/12/2024 18:49, Jeff Davis wrote:
One question I have is whether we want this function to normalize the
output.
Yes, we do.
I am sorry that I am so late to the party, but I am currently writing
the Change Proposal for the SQL Standard for this function.
For <fold> (which includes LOWER()
On Sat, 2025-01-25 at 00:00 -0500, Tom Lane wrote:
> Found characters that cannot be output in the PDF document; see
> README.non-ASCII
Thank you, fixed.
> Not sure about a good workaround for this. Are there any characters
> within LATIN-1 that have interesting case-folding behavior?
I just r
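One way to look for candidates for the question above is to scan the Latin-1 range for characters whose full case folding differs from plain lowercasing. This is a rough sketch using Python's built-in Unicode tables as a stand-in; results may differ from PostgreSQL's depending on the Unicode version, and the variable name interesting is only illustrative.

```python
# Characters in the Latin-1 range (U+0000..U+00FF) whose full case folding
# differs from plain lowercasing, according to Python's Unicode tables.
interesting = [
    (f"U+{cp:04X}", chr(cp), chr(cp).lower(), chr(cp).casefold())
    for cp in range(0x100)
    if chr(cp).casefold() != chr(cp).lower()
]

for code, ch, lowered, folded in interesting:
    print(code, ch, "lower:", lowered, "casefold:", folded)
```

With Python's tables this flags U+00DF (sharp s, which folds to "ss") and U+00B5 (micro sign, which folds to U+03BC, a character outside Latin-1). Only the first keeps both input and output inside Latin-1.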
Jeff Davis writes:
> v6 attached. I plan to commit this soon.
The documentation for this function is giving the PDF docs build
indigestion:
[WARN] FOUserAgent - Glyph "Σ" (0x3a3, Sigma) not available in font "Courier".
[WARN] FOUserAgent - Glyph "σ" (0x3c3, sigma) not available in font "Courier"
On Fri, 2025-01-17 at 16:34 -0800, Jeff Davis wrote:
> v5 attached.
v6 attached. I plan to commit this soon.
A couple things to note:
* The ICU API for lower/title/uppercasing is slightly different from
folding. The former accept a locale, while the latter just has an
option which is relevant on
On Thu, 2024-12-19 at 09:51 -0800, Jeff Davis wrote:
> But there's a problem: full case folding doesn't preserve the normal
> form, so even if the input is NFC normalized, the output might not
> be.
> If we solve this problem, then we can just say that CASEFOLD()
> preserves the normal form, consis
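A concrete instance of the problem described above, sketched in Python. It uses str.casefold() (which implements full case folding) as a stand-in for CASEFOLD(), and the choice of U+01F0 as the example character is mine rather than something taken from the thread.

```python
import unicodedata

s = "\u01f0"           # LATIN SMALL LETTER J WITH CARON, a single code point
folded = s.casefold()  # full case folding expands it to "j" + U+030C

input_is_nfc = unicodedata.is_normalized("NFC", s)
output_is_nfc = unicodedata.is_normalized("NFC", folded)

# Normalizing the folded output recomposes it into the original code point.
renormalized = unicodedata.normalize("NFC", folded)
```

Here the input is in NFC but the folded output is not, and normalizing the output to NFC yields a string that is no longer in folded form. That is the tension between "folded" and "normalized" that the message points at.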
On Thu, 2024-12-19 at 17:18 +0100, Peter Eisentraut wrote:
> Can you explain this in further detail? I don't quite follow why
> this
> would be required.
I am unsure now.
My initial reasoning was based on the idea that users would want to use
CASEFOLD(t) in a unique expression index as an impro
On 16.12.24 18:49, Jeff Davis wrote:
One question I have is whether we want this function to normalize the
output.
I believe most use cases would want the output normalized, because
normalization differences (e.g. "a" U+0061 followed by "combining
acute" U+0301 vs "a with acute" U+00E1) are more
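The normalization difference mentioned above can be shown with the two spellings from the message. A minimal Python sketch, assuming NFC as the target form:

```python
import unicodedata

decomposed = "a\u0301"  # "a" U+0061 followed by "combining acute" U+0301
precomposed = "\u00e1"  # "a with acute" U+00E1

# The raw code point sequences differ, so a plain comparison fails...
raw_equal = decomposed == precomposed

# ...but both spellings normalize to the same NFC string.
nfc_equal = (
    unicodedata.normalize("NFC", decomposed)
    == unicodedata.normalize("NFC", precomposed)
)
```

The two strings render identically, which is why a mismatch here tends to surprise users more than a difference in case.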
On Mon, 2024-12-16 at 16:27 -0500, Joe Conway wrote:
>
> SQL 2023 seems to include the NORMALIZE syntax, but the only case
> folding considered is UPPER and LOWER. As such, I think it ought to
> be a
> function but not part of the grammar.
Should the standard support something like the Unicode
On 12/12/24 10:00 AM, Jeff Davis wrote:
Patch attached.
I have not looked at the patch yet, but +1 to the idea. I am leaning
towards thinking that having the function optionally normalize the code
points would be handy too, since I think that is what most use cases want.
Otherwise people would have to al
On 12/16/24 12:49, Jeff Davis wrote:
One question I have is whether we want this function to normalize the
output.
I believe most use cases would want the output normalized, because
normalization differences (e.g. "a" U+0061 followed by "combining
acute" U+0301 vs "a with acute" U+00E1) are more
On 12/12/24 13:30, Jeff Davis wrote:
On Thu, 2024-12-12 at 21:52 +0900, Ian Lawrence Barwick wrote:
and it seems to work as advertised, except the function is named
"FOLDCASE()"
in the patch, so I'm wondering which is intended?
Thank you for looking into this. I went back and forth on the name
On Thu, 2024-12-12 at 21:52 +0900, Ian Lawrence Barwick wrote:
> and it seems to work as advertised, except the function is named
> "FOLDCASE()"
> in the patch, so I'm wondering which is intended?
Thank you for looking into this. I went back and forth on the name and
mistyped it a few times.
ICU
Hi
On Thu, 12 Dec 2024 at 18:00, Jeff Davis wrote:
>
> Unicode case folding is a way to convert a string to a canonical case
> for the purpose of case-insensitive matching.
>
> Users have long used LOWER() for that purpose, but there are a few edge
> case problems:
>
> * Some characters have more than two cased
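As an illustration of why lowercasing can fall short for case-insensitive matching, here is a small Python sketch using Greek sigma. It uses str.lower() and str.casefold() as stand-ins for LOWER() and CASEFOLD(), so it shows the Unicode behavior rather than PostgreSQL's exact output.

```python
# Greek sigma has three cased forms: capital, small, and final small.
capital, small, final = "\u03a3", "\u03c3", "\u03c2"

# Lowercasing leaves the two small forms distinct...
lower_match = small.lower() == final.lower()

# ...while case folding maps all three forms to the same string.
fold_match = capital.casefold() == small.casefold() == final.casefold()
```

Comparing lowercased values would treat the small and final forms as different strings, whereas comparing folded values treats all three as equal.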