On Tue, Jun 20, 2023 at 6:48 AM Jeff Davis wrote:
> On Sat, 2023-06-17 at 17:54 +1200, Thomas Munro wrote:
> > > Would it be correct to interpret LC_COLLATE=C.UTF-8 as LC_COLLATE=C,
> > > but leave LC_CTYPE=C.UTF-8 as-is?
> >
> > Yes. The basic idea, at least for these two OSes, is that eve
Thomas Munro wrote:
> What could we do that would be helpful here, without affecting users
> of the "true" C.UTF-8 for the rest of time? This is a Debian (+
> downstream distro) only problem as far as we know so far, and only
> for Debian 11 and older.
It seems to include RedHat-based di
On Sat, 2023-06-17 at 17:54 +1200, Thomas Munro wrote:
>
> > Would it be correct to interpret LC_COLLATE=C.UTF-8 as LC_COLLATE=C,
> > but leave LC_CTYPE=C.UTF-8 as-is?
>
> Yes. The basic idea, at least for these two OSes, is that every
> category behaves as if set to C, except LC_CTYPE.
If
On Sat, Jun 17, 2023 at 10:03 AM Jeff Davis wrote:
> On Thu, 2023-06-15 at 19:15 +1200, Thomas Munro wrote:
> > Hmm, OK let's explore that. What could we do that would be helpful
> > here, without affecting users of the "true" C.UTF-8 for the rest of
> > time?
>
> Where is the "true" C.UTF-8 defi
On Thu, 2023-06-15 at 19:15 +1200, Thomas Munro wrote:
> Hmm, OK let's explore that. What could we do that would be helpful
> here, without affecting users of the "true" C.UTF-8 for the rest of
> time?
Where is the "true" C.UTF-8 defined?
I assume you mean that the collation order can't (shouldn
On Sun, Apr 23, 2023 at 5:22 AM Daniel Verite wrote:
> I understand that my proposal to version C.* like any other collation
> might be erring on the side of caution, but ignoring these collation
> changes on at least one major OS does not feel right either.
> Maybe we should consider doing platfo
On Wed, 2023-06-07 at 23:28 +0200, Peter Eisentraut wrote:
> On 06.06.23 21:23, Jeff Davis wrote:
> > What about ICU? How should provider=icu locale=C.UTF-8 behave? We
> > could:
>
> It should be an error.
>
> > a. Just pass it to the provider and see what happens (older versions of
> > ICU w
On 06.06.23 21:23, Jeff Davis wrote:
> What about ICU? How should provider=icu locale=C.UTF-8 behave? We
> could:

It should be an error.

> a. Just pass it to the provider and see what happens (older versions of
> ICU would interpret it as en-US-u-va-posix; newer versions would give
> the root locale).
I wrote:
> Consider matching '\d' in a regexp. With C.UTF-8 (glibc-2.35), we
> only match ASCII characters 0-9, or 10 codepoints. With
> "en-US-u-va-posix-x-icu" we match 660 codepoints comprising all the
> digit characters in all languages, plus a bunch of variants for
> mathematical sym
Jeff Davis wrote:
> What about ICU? How should provider=icu locale=C.UTF-8 behave? We
> could:
>
> a. Just pass it to the provider and see what happens (older versions of
> ICU would interpret it as en-US-u-va-posix; newer versions would give
> the root locale).
>
> b. Consistently inter
On 6/6/23 15:23, Jeff Davis wrote:
> On Mon, 2023-06-05 at 19:43 +0200, Daniel Verite wrote:
> > But in the meantime, personally I don't quite see why Postgres should
> > start forcing C.UTF-8 to sort differently in the database than in the
> > OS.
> I can see both points of view. It could be surprising to us
On Mon, 2023-06-05 at 19:43 +0200, Daniel Verite wrote:
> But in the meantime, personally I don't quite see why Postgres should
> start forcing C.UTF-8 to sort differently in the database than in the
> OS.
I can see both points of view. It could be surprising to users if
C.UTF-8 does not sort like
Jeff Davis wrote:
> > For libc: this change may affect any user who happened to have
> > LANG=C.UTF-8 in their environment at initdb time, which is probably a
> > lot of users, and some buildfarm members. However, the average risk
> > seems to be much lower, because we've gone a long tim
On Fri, 2023-05-26 at 10:43 -0700, Jeff Davis wrote:
> We still need to consider backwards compatibility. If someone has a
> collation with locale name C.UTF-8 in an earlier version, any change to
> the interpretation of that locale name after an upgrade carries a
> corruption risk. The risks are
On Thu, 2023-05-25 at 14:48 -0400, Tom Lane wrote:
> Jeff Davis writes:
> > What should we do with locales like C.UTF-8 in both libc and ICU?
>
> I vote for passing those to the existing C-specific code paths,
Great, this would be a big step toward solving the ICU usability issues
in this threa
Jeff Davis writes:
> What should we do with locales like C.UTF-8 in both libc and ICU?
I vote for passing those to the existing C-specific code paths,
wherever we have any (not sure that we do for functionality).
The semantics are quite well-defined and I can see no good coming of
allowing eit
On Wed, 2023-04-19 at 14:07 +1200, Thomas Munro wrote:
> That strengthens my opinion that C.UTF-8 (the real C.UTF-8 supplied by
> the glibc project) isn't supposed to be versioned, but it's extremely
> unfortunate that a bunch of OSes (Debian and maybe more) have been
> sorting text in some other
Thomas Munro wrote:
> It looks like for technical reasons
> inside glibc, that couldn't be done before 2.35:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=17318
>
> That strengthens my opinion that C.UTF-8 (the real C.UTF-8 supplied
> by the glibc project) isn't supposed to be vers
On Wed, Apr 19, 2023 at 1:30 PM Jeff Davis wrote:
> On Wed, 2023-04-19 at 07:48 +1200, Thomas Munro wrote:
> > Many OSes have a locale with this name. I don't know this history,
> > who did it first etc, but now I am wondering if they all took the
> > "obvious" interpretation, that it should be c
On Wed, 2023-04-19 at 07:48 +1200, Thomas Munro wrote:
> Many OSes have a locale with this name. I don't know this history,
> who did it first etc, but now I am wondering if they all took the
> "obvious" interpretation, that it should be code-point based,
> extrapolating from "C" (really memcmp or
On Wed, Apr 19, 2023 at 12:36 AM Daniel Verite wrote:
> This seems to be based on the idea that C.* collations provide an
> immutable sort like "C", but it appears that it's not the case.
Hmm. It seems I added that exemption initially for FreeBSD only in
ca051d8b101, and then merged the cases fo
Hi,
get_collation_actual_version() in pg_locale.c currently
excludes C.UTF-8 (and more generally C.*) from versioning,
which leaves pg_collation.collversion empty for these
collations.
char *
get_collation_actual_version(char collprovider, const char *collcollate)
{
if (collpr