On Sat, 2023-07-01 at 10:31 -0700, Noah Misch wrote:
> As of commit b8c3f6d, InstallCheck-C got daticulocale=en-US-u-va-
> posix. Check
> got daticulocale=NULL.
With the same test setup, that locale takes about 8.6 seconds (opening
it 10M times), about 2.5X slower than "en-US" and about 7X slower
On Sat, May 20, 2023 at 10:19:30AM -0700, Jeff Davis wrote:
> On Tue, 2023-05-02 at 07:29 -0700, Noah Misch wrote:
> > On Thu, Mar 30, 2023 at 08:59:41AM +0200, Peter Eisentraut wrote:
> > > On 30.03.23 04:33, Jeff Davis wrote:
> > > > Attached is a new version of the final patch, which performs
>
On Tue, 2023-05-02 at 07:29 -0700, Noah Misch wrote:
> On Thu, Mar 30, 2023 at 08:59:41AM +0200, Peter Eisentraut wrote:
> > On 30.03.23 04:33, Jeff Davis wrote:
> > > Attached is a new version of the final patch, which performs
> > > canonicalization. I'm not 100% sure that it's wanted, but it
> >
On Thu, Mar 30, 2023 at 08:59:41AM +0200, Peter Eisentraut wrote:
> On 30.03.23 04:33, Jeff Davis wrote:
> >Attached is a new version of the final patch, which performs
> >canonicalization. I'm not 100% sure that it's wanted, but it still
> >seems like a good idea to get the locales into a standard
MSVC now says this on master:
[17:48:12.446] c:\cirrus\src\backend\utils\adt\pg_locale.c(2912) :
warning C4715: 'icu_language_tag': not all control paths return a
value
CI doesn't currently fail for MSVC warnings, so it's a bit hidden.
FWIW cfbot does show this with a ⚠ sign with its new system f
On 31.03.23 12:11, Jeff Davis wrote:
On Thu, 2023-03-30 at 08:59 +0200, Peter Eisentraut wrote:
I don't think we should show the notice when the canonicalization
doesn't change anything. This is not useful:
+NOTICE: using language tag "und-u-kf-upper" for locale "und-u-kf-
upper"
Done.
Al
On Thu, 2023-03-30 at 08:59 +0200, Peter Eisentraut wrote:
> I don't think we should show the notice when the canonicalization
> doesn't change anything. This is not useful:
>
> +NOTICE: using language tag "und-u-kf-upper" for locale "und-u-kf-
> upper"
Done.
> Also, the message should be phr
On Thu, 2023-03-30 at 08:59 +0200, Peter Eisentraut wrote:
> I don't think the special handling of IsBinaryUpgrade is needed or
> wanted. I would hope that with this feature, all old-style locale
> IDs
> would go away, but this way we would keep them forever. If we
> believe
> that canonicali
On 30.03.23 04:33, Jeff Davis wrote:
Attached is a new version of the final patch, which performs
canonicalization. I'm not 100% sure that it's wanted, but it still
seems like a good idea to get the locales into a standard format in the
catalogs, and if a lot more people start using ICU in v16 (b
On Tue, 2023-03-28 at 08:41 +0200, Peter Eisentraut wrote:
> [PATCH v9 1/5] Fix error inconsistency in older ICU versions.
>
> ok
Committed 0001.
> [PATCH v9 2/5] initdb: replace check_icu_locale() with
> default_icu_locale().
>
> I would keep the #ifdef USE_ICU inside the lower-level functio
[PATCH v9 1/5] Fix error inconsistency in older ICU versions.
ok
[PATCH v9 2/5] initdb: replace check_icu_locale() with
default_icu_locale().
I would keep the #ifdef USE_ICU inside the lower-level function
default_icu_locale(), like it was before, so that the higher-level
setlocales() doesn
On Fri, 2023-03-24 at 10:10 +0100, Peter Eisentraut wrote:
> Couldn't we do this in a simpler way by just freeing the collator
> before
> the ereport() calls.
I committed a tiny patch to do this.
We still need to address the error inconsistency though. The problem is
that, in older ICU versions,
On 24.03.23 07:39, Jeff Davis wrote:
On Thu, 2023-03-23 at 10:16 -0700, Jeff Davis wrote:
I could get rid of the SQL-callable function and move the rest of the
changes into 0006. I'll see if that arrangement works better, and
that
way we can add the SQL-callable function later (or perhaps not at
On 23.03.23 18:16, Jeff Davis wrote:
In 0002, the error "opening default collator is not supported",
should
that be an assert or an elog? Is it reachable by the user?
It's not reachable by the user, but could catch a bug if we
accidentally read a NULL field from the catalog or something like th
On Thu, 2023-03-23 at 10:16 -0700, Jeff Davis wrote:
> I could get rid of the SQL-callable function and move the rest of the
> changes into 0006. I'll see if that arrangement works better, and
> that
> way we can add the SQL-callable function later (or perhaps not at all
> if it's not desired).
At
On Thu, 2023-03-23 at 07:27 +0100, Peter Eisentraut wrote:
> So, does uloc_canonicalize() always convert to ICU locale IDs? What
> if
> you pass a language tag, does it convert it to ICU locale ID as well?
Yes.
The documentation is not clear on that point, but my testing shows that
it does. And
On 22.03.23 19:05, Jeff Davis wrote:
On Tue, 2023-03-21 at 10:35 +0100, Peter Eisentraut wrote:
[PATCH v6 1/6] Support language tags in older ICU versions (53 and
earlier).
In pg_import_system_collations(), this is now redundant and can be
simplified:
- if (!pg_is_ascii(langta
On Tue, 2023-03-21 at 10:35 +0100, Peter Eisentraut wrote:
> [PATCH v6 1/6] Support language tags in older ICU versions (53 and
> earlier).
>
> In pg_import_system_collations(), this is now redundant and can be
> simplified:
>
> - if (!pg_is_ascii(langtag) || !pg_is_ascii(iculocs
On 17.03.23 18:55, Jeff Davis wrote:
On Wed, 2023-03-15 at 15:18 -0700, Jeff Davis wrote:
I left out the validation patch for now, and I'm evaluating a
different
approach that will attempt to match to the locales retrieved with
uloc_countAvailable()/uloc_getAvailable().
I like this approach, a
On Wed, 2023-03-15 at 15:18 -0700, Jeff Davis wrote:
> I left out the validation patch for now, and I'm evaluating a
> different
> approach that will attempt to match to the locales retrieved with
> uloc_countAvailable()/uloc_getAvailable().
I like this approach, attached new patch series with tha
On Tue, 2023-03-14 at 23:47 -0700, Jeff Davis wrote:
> On Tue, 2023-03-14 at 10:10 -0700, Jeff Davis wrote:
> > One loose end is that we really should support language tags like
> > "und"
> > in those older versions (54 and earlier). Your commit d72900bded
> > avoided the problem, but perhaps we sh
On Tue, 2023-03-14 at 10:10 -0700, Jeff Davis wrote:
> One loose end is that we really should support language tags like
> "und"
> in those older versions (54 and earlier). Your commit d72900bded
> avoided the problem, but perhaps we should fix it by looking for
> "und"
> and replacing it with "roo
On Tue, 2023-03-14 at 08:08 +0100, Peter Eisentraut wrote:
> Another issue that came to mind: Right now, you can, say, develop
> SQL
> schemas on a newer ICU version, say, your laptop, and then deploy
> them
> on a server running an older ICU version. If we have a cutoff beyond
> which we conve
On 13.03.23 16:31, Jeff Davis wrote:
What we had discussed a while ago in one of these threads is that ICU
before version 54 do not support keyword lists, and we have custom
code
to do that parsing ourselves, but we don't have code to do the same
for
language tags. Therefore, if I understand thi
On Mon, 2023-03-13 at 08:25 +0100, Peter Eisentraut wrote:
> For clarification, I wasn't complaining about the notice, but about
> the
> automatic conversion from old-style ICU locale ID to language tag.
Canonicalization means that we pick one format, and automatically
convert to it, right?
> Wh
On 09.03.23 21:17, Jeff Davis wrote:
Personally, I'm not on board with this behavior:
=> CREATE COLLATION test (provider = icu, locale =
'de@collation=phonebook');
NOTICE: 0: using language tag "de-u-co-phonebk" for locale
"de@collation=phonebook"
I mean, maybe that is a thing we want to d
On Thu, 2023-03-09 at 09:46 +0100, Peter Eisentraut wrote:
> This patch appears to do about three things at once, and it's not
> clear
> exactly where the boundaries are between them and which ones we might
> actually want. And I think the terminology also gets mixed up a bit,
> which makes follo
On 28.02.23 06:57, Jeff Davis wrote:
On Mon, 2023-02-20 at 15:23 -0800, Jeff Davis wrote:
New patch attached. The new patch also includes a GUC that (when
enabled) validates that the collator is actually found.
New patch attached.
Now it always preserves the exact locale string during pg_upg
On Mon, 2023-02-20 at 15:23 -0800, Jeff Davis wrote:
>
> New patch attached. The new patch also includes a GUC that (when
> enabled) validates that the collator is actually found.
New patch attached.
Now it always preserves the exact locale string during pg_upgrade, and
does not attempt to canon
New patch attached. The new patch also includes a GUC that (when
enabled) validates that the collator is actually found.
On Mon, 2023-02-20 at 15:46 +0100, Peter Eisentraut wrote:
> a) BCP47 tags are preferred, and
Agreed.
> b) They don't work with ICU versions before 54.
I tried in versions 5
On 10.02.23 18:53, Jeff Davis wrote:
To represent ICU locale strings in the catalog consistently, we have
two choices, which as far as I can tell are equivalent:
1. ICU format Locale IDs. These are more readable, and still specified
(albeit non-standard).
2. BCP47 language tags. These are stand
On Thu, 2023-02-09 at 14:09 -0800, Jeff Davis wrote:
> It feels like BCP 47 is the right catalog representation. We are
> already using it for the import of initial collations, and it's a
> standard, and there seems to be good support in ICU.
Patch attached.
We should have been canonicalizing all
On Fri, 2023-02-10 at 22:50 -0500, Robert Haas wrote:
> The fact that you're figuring out how it all works from reading the
> source code does not give me a warm feeling.
Right. On the other hand, the behavior is quite well documented, it was
just the keyword that was undocumented (or I didn't fin
On Fri, Feb 10, 2023 at 12:54 PM Jeff Davis wrote:
> In our tests you can see colstrength=primary is used to mean "case
> insensitive". That's where I picked up the "colstrength" keyword, which
> is also present in the ICU sources, but now that you ask I'm embarassed
> that I don't see the keyword
On Fri, 2023-02-10 at 09:43 -0500, Robert Haas wrote:
> On Thu, Feb 9, 2023 at 5:09 PM Jeff Davis wrote:
> > I do like the ICU format locale IDs from a readability standpoint.
> > "en_US@colstrength=primary" is more meaningful to me than "en-US-u-
> > ks-
> > level1" (the equivalent language tag).
On Fri, 2023-02-10 at 07:42 +0100, Peter Eisentraut wrote:
> It turns out that 'de_AT' is actually a distinct collation from 'de'
> in
> CLDR, so that was not the best example. What behavior do you see for
> 'de_CH'?
The canonicalized form is de_CH and the bcp47 tag is de-CH.
uloc_canonicalize(
On Thu, Feb 9, 2023 at 5:09 PM Jeff Davis wrote:
> I do like the ICU format locale IDs from a readability standpoint.
> "en_US@colstrength=primary" is more meaningful to me than "en-US-u-ks-
> level1" (the equivalent language tag).
Sadly, neither of those means a whole lot to me? :-(
How did you
On 09.02.23 22:15, Jeff Davis wrote:
On Thu, 2023-02-09 at 15:44 +0100, Peter Eisentraut wrote:
One use case is that if a user specifies a locale, say, of 'de-AT',
this
might canonicalize to 'de' today,
Canonicalization should not lose useful information, it should just
rearrange it, so I don't
On 2/10/23 02:22, Jeff Davis wrote:
We will still allow the ICU format locale IDs for input; we would just
convert them to BCP47 before storing them in the catalog. And there's
an inverse function, so it's easy enough to offer a view that shows the
ICU format locale IDs in addition to the BCP 47
On Fri, 2023-02-10 at 01:04 +0100, Andreas Karlsson wrote:
> I have the same feeling one is readable and the other unreadable but
> the
> unreadable one is standardized. Hard call.
>
> And in general I agree, if we are going to make ICU default it needs
> to
> be more user friendly than it is no
On 2/9/23 23:09, Jeff Davis wrote:
I do like the ICU format locale IDs from a readability standpoint.
"en_US@colstrength=primary" is more meaningful to me than "en-US-u-ks-
level1" (the equivalent language tag). And the format is specified[1],
even though it's not an independent standard. But I t
On Thu, 2023-02-09 at 10:53 -0500, Robert Haas wrote:
> Unfortunately, I have no idea whether your specific ideas about how
> to
> make that happen are any good or not. But I hope they are, because
> the
> current situation is pessimal.
It feels like BCP 47 is the right catalog representation. We
On Thu, 2023-02-09 at 15:44 +0100, Peter Eisentraut wrote:
> One use case is that if a user specifies a locale, say, of 'de-AT',
> this
> might canonicalize to 'de' today,
Canonicalization should not lose useful information, it should just
rearrange it, so I don't see a risk here based on what I
On Wed, Feb 8, 2023 at 2:59 AM Jeff Davis wrote:
> We do check that the value is accepted by ICU, but ICU seems to accept
> anything and use some fallback logic. Bogus strings will typically end
> up as the "root" locale (spelled "root" or "").
I've noticed this, and I think it's really frustrati
On 08.02.23 08:59, Jeff Davis wrote:
The overall benefit here is that we keep our catalogs consistently
using an independent standard format for ICU locale strings, rather
than whatever the user specifies. That makes it less likely that ICU
needs to use any fallback logic when trying to open a co
Right now, ICU locales are not validated:
initdb ... --locale-provider=icu --icu-locale=anything
CREATE COLLATION foo (PROVIDER=icu, LOCALE='anything');
CREATE DATABASE anythingdb ICU_LOCALE 'anything';
all succeed.
We do check that the value is accepted by ICU, but ICU seems to accept
a
46 matches
Mail list logo