On Sun, Apr 06, 2025 at 09:16:17AM -0500, Nathan Bossart wrote:
> Coverity is unhappy about these. I think we should at least do something
> like the following. I'll commit this when I have an opportunity.
Committed.
--
nathan
On Wed, Mar 05, 2025 at 02:33:42PM -0600, Nathan Bossart wrote:
> + /*
> + * The builtin provider did not exist prior to version 17. While there are
> + * still problems that could potentially be caught from earlier versions,
> + * such as an index on NORMALIZE(), we don't
On Tue, Mar 18, 2025 at 10:33 PM Jeff Davis wrote:
> If we compare the following two problems:
>
> A. With glibc or ICU, every text index, including primary keys, is
> highly vulnerable to inconsistencies after an OS upgrade, even if
> there's no Postgres upgrade; vs.
>
> B. With the builtin
On 17.03.25 19:54, Jeff Davis wrote:
On Thu, 2025-03-13 at 14:49 +0100, Peter Eisentraut wrote:
I think these test result changes are incorrect. AFAICT, nothing has
changed in the upstream data that would explain such a change.
I didn't get such test differences in my original patch. Did you
On Tue, Mar 18, 2025 at 2:55 PM Jeff Davis wrote:
> Continuing on with Unicode 15.1 and accepting the unassigned code point
> *cannot* prevent breakage.
Under your definition, this is true, but I think Jeremy would define
breakage differently. His primary concern, I expect, is *stability*.
Breaka
On Wed, Mar 19, 2025 at 5:47 PM Jeff Davis wrote:
> Do you have a sketch of what the ideal Unicode version management
> experience might look like? Very high level, like "this is what happens
> by default during an upgrade" and "this is how a user discovers that
> they might want to update Un
On Fri, 21 Mar 2025 13:45:24 -0700
Jeff Davis wrote:
> > Maybe we should actually move in the direction of having encodings
> > that are essentially specific versions of Unicode. Instead of just
> > having UTF-8 that accepts whatever, you could have UTF-8.v16.0.0 or
> > whatever, which would only
On Thu, 2025-03-20 at 08:45 -0400, Robert Haas wrote:
> * When the collation/ctype/whatever definitions upon which you are
> relying change, you can either decide to switch to the new ones
> without rebuilding your indexes and risk wrong results until you
> reindex, or you can decide to create new
On Fri, Mar 21, 2025 at 2:27 PM Jeff Davis wrote:
> > Those who are now trying to make the builtin collation provider have
> > properties that it does not have and was not intended to have when it
> > was added, they would need to arrange the work to make it have those
> > properties if they want
On Fri, 2025-03-21 at 14:54 -0400, Robert Haas wrote:
> And nobody has refuted the argument that refusing to
> update the Unicode tables will cause other problems (such as not
> knowing what to do with new code points that are added in the other
> places where those tables are used).
The argument
On Fri, 2025-03-21 at 10:45 -0400, Robert Haas wrote:
> We might need a way for ALTER DATABASE to allow the
> database default to be adjusted. I'm not quite sure here, but my
> general feeling is that Unicode version feels like part of the
> collation and that we should avoid introducing a separate
On Fri, 2025-03-21 at 17:15 +0100, Peter Eisentraut wrote:
> And we knew at the time the builtin collation
> provider was added that it would have certain problems with the
> Unicode table updates, and we accepted it with the understanding that
> this would not change our procedures.
Correc
On Fri, Mar 21, 2025 at 2:45 AM Jeff Davis wrote:
> On Thu, 2025-03-20 at 08:45 -0400, Robert Haas wrote:
> > * When the collation/ctype/whatever definitions upon which you are
> > relying change, you can either decide to switch to the new ones
> > without rebuilding your indexes and risk wrong re
On 15.03.25 07:54, Jeremy Schneider wrote:
in favor of leaving it alone because ICU is there for when I need
up-to-date unicode versions.
From my perspective, the whole point of the builtin collation was to
have one option that avoids these problems that come with updating both ICU
and glibc.
So I
On Wed, 2025-03-19 at 14:33 -0400, Robert Haas wrote:
> I strongly believe users want to control what happens, not have
> the system try to fix it for them automatically without their
> knowledge.
Do you have a sketch of what the ideal Unicode version management
experience might look like? Very h
On Wed, 2025-03-19 at 08:46 -0400, Robert Haas wrote:
> I see your point, but most people don't use the builtin collation
> provider.
The other providers aren't affected by us updating Unicode, so I think
we got off track somehow. I suppose what I meant was:
"If you are concerned about inconsiste
On Wed, Mar 19, 2025 at 1:39 PM Jeff Davis wrote:
> On Wed, 2025-03-19 at 08:46 -0400, Robert Haas wrote:
> > I see your point, but most people don't use the builtin collation
> > provider.
>
> The other providers aren't affected by us updating Unicode, so I think
> we got off track somehow. I sup
On Wed, Mar 19, 2025 at 1:25 AM Jeremy Schneider
wrote:
> Maybe Postgres can be the first database to always ship support for the
> latest Unicode with each major version
Shipping the latest Unicode with our latest major version is existing
policy, as I understand it. I don't think we're breaking
On Tue, 18 Mar 2025 19:33:00 -0700
Jeff Davis wrote:
> If we compare the following two problems:
>
> A. With glibc or ICU, every text index, including primary keys, is
> highly vulnerable to inconsistencies after an OS upgrade, even if
> there's no Postgres upgrade; vs.
>
> B. With the bui
On Tue, 2025-03-18 at 21:34 -0400, Robert Haas wrote:
> But I could not disagree more strongly with the idea that this
> problem is 99% solved. That doesn't seem remotely true to me. I'm not
> sure the problem is 1% solved.
If we compare the following two problems:
A. With glibc or ICU, eve
On Tue, Mar 18, 2025 at 5:09 PM Jeff Davis wrote:
> I am not trying to be dismissive of the concerns raised in this thread,
> but I'd like others to understand that what they are asking for is a
> lot of work, and that the builtin collation provider solves 99% of it
> already. All this effort is t
On 3/18/25 16:30, Robert Haas wrote:
On Tue, Mar 18, 2025 at 3:50 PM Tom Lane wrote:
That approach works only if you sit on Unicode 15.1 *forever*.
The impracticality of that seems obvious to me. Sooner or later
you will need to update, and then you are going to suffer pain.
I completely agr
On Tue, 18 Mar 2025 08:53:56 -0700
Jeff Davis wrote:
> What do you think of Tom's argument that waiting to update Unicode is
> what creates the problem in the first place?
>
> "by then they might well have instances of the newly-assigned code
> points in their database"[1]
>
> [1]
> https://www
On Tue, 2025-03-18 at 14:45 -0400, Robert Haas wrote:
> I think Joe has the right idea. The way to actually provide the
> stability that people want here is to continue supporting old versions
> while adding support for new versions. Anything else we do works
> subject to assumptions: you can eit
On Tue, Mar 18, 2025 at 3:50 PM Tom Lane wrote:
> That approach works only if you sit on Unicode 15.1 *forever*.
> The impracticality of that seems obvious to me. Sooner or later
> you will need to update, and then you are going to suffer pain.
I completely agree.
> The short answer is that "im
Robert Haas writes:
> On Tue, Mar 18, 2025 at 2:55 PM Jeff Davis wrote:
>> Continuing on with Unicode 15.1 and accepting the unassigned code point
>> *cannot* prevent breakage.
> Under your definition, this is true, but I think Jeremy would define
> breakage differently. His primary concern, I e
On Tue, 2025-03-18 at 09:28 -0700, Jeremy Schneider wrote:
> We think case-insensitive indexes are probably uncommon, so as
> long as it's "rare" we can let them break.
Let's define "break" in this context to mean that the constraints are
not enforced, or that the query doesn't return the results t
On Tue, Mar 18, 2025 at 11:54 AM Jeff Davis wrote:
> What do you think of Tom's argument that waiting to update Unicode is
> what creates the problem in the first place?
>
> "by then they might well have instances of the newly-assigned code
> points in their database"[1]
I know you weren't asking
On Sat, 2025-03-15 at 10:14 -0400, Joe Conway wrote:
> In the long term I think we should figure out how to support newer
> versions of unicode for the builtin, but in my mind that might involve
> the necessity of supporting multiple versions of unicode such that the
> choice remains to rema
On Mon, 17 Mar 2025 at 23:03, Jeff Davis wrote:
>
> On Sun, 2025-03-16 at 19:10 +0530, vignesh C wrote:
> > We currently have two Commitfest entries for the same thread at [1]
> > and [2]. Are both still necessary, or can we consolidate tracking
> > into a single entry?
>
> I'm fine removing m
On Sat, 2025-03-15 at 12:15 -0400, Tom Lane wrote:
> In fact, on the analogy of timezones, I think we should not only
> adopt newly-published Unicode versions pretty quickly but push
> them into released branches as well.
That approach suggests that we consider something like my previous
STRICT_UN
Jeff Davis writes:
> That was discussed a few times, but:
> (a) That doesn't exactly solve the problem, because people still need
> indexes on LOWER() or CASEFOLD(); and
> (b) If we change IMMUTABLE to mean "returns the same results on every
> platform for all time", that would be too strict for
On Sun, 2025-03-16 at 19:10 +0530, vignesh C wrote:
> We currently have two Commitfest entries for the same thread at [1]
> and [2]. Are both still necessary, or can we consolidate tracking
> into a single entry?
I'm fine removing my CF entry, but unfortunately there's no "withdrawn
-- duplicate
On Sat, 2025-03-15 at 18:23 -0700, Jeremy Schneider wrote:
> Is the simple answer that functions & clauses related to both time
> zones and character semantics should just all be considered STABLE
> instead of IMMUTABLE?
That was discussed a few times, but:
(a) That doesn't exactly solve the prob
On Sat, 8 Mar 2025 at 02:41, Jeff Davis wrote:
>
> On Wed, 2025-03-05 at 20:43 -0600, Nathan Bossart wrote:
> > I see. Do we provide any suggested next steps for users to assess
> > the potentially-affected relations?
>
> I don't know exactly where we should document it, but I've attached a
>
> On Mar 15, 2025, at 10:22 AM, Jeff Davis wrote:
>
> On Sat, 2025-03-15 at 12:15 -0400, Tom Lane wrote:
>> On the other hand, if we keep up with the Joneses by updating the
>> Unicode data, we can hopefully put those behavioral changes into
>> effect *before* they'd affect any real data.
>
>
On Fri, 2025-03-14 at 23:54 -0700, Jeremy Schneider wrote:
> From my perspective, the whole point of the builtin collation was to
> have one option that avoids these problems that come with updating both
> ICU and glibc.
>
> So I guess the main point of the builtin provider is just that it's
> faster
>
On Sat, 2025-03-15 at 12:15 -0400, Tom Lane wrote:
> On the other hand, if we keep up with the Joneses by updating the
> Unicode data, we can hopefully put those behavioral changes into
> effect *before* they'd affect any real data.
That's a good point.
Regards,
Jeff Davis
Jeremy Schneider writes:
> On Fri, 07 Mar 2025 13:11:18 -0800
> Jeff Davis wrote:
>> The change in Unicode that I'm focusing on is the addition of U+A7DC,
>> which is unassigned in Unicode 15.1 and assigned in Unicode 16, which
>> lowercases to U+019B. The examples assume that the user is using
>
On 3/15/25 03:26, Laurenz Albe wrote:
On Fri, 2025-03-14 at 23:54 -0700, Jeremy Schneider wrote:
On Fri, 07 Mar 2025 13:11:18 -0800
It seems the consensus is to update unicode in core... FWIW, I'm still
in favor of leaving it alone because ICU is there for when I need
up-to-date unicode versions
On Fri, 2025-03-14 at 23:54 -0700, Jeremy Schneider wrote:
> On Fri, 07 Mar 2025 13:11:18 -0800
> It seems the consensus is to update unicode in core... FWIW, I'm still
> in favor of leaving it alone because ICU is there for when I need
> up-to-date unicode versions.
Me too.
> From my perspective
On Fri, 07 Mar 2025 13:11:18 -0800
Jeff Davis wrote:
> On Wed, 2025-03-05 at 20:43 -0600, Nathan Bossart wrote:
> > I see. Do we provide any suggested next steps for users to assess
> > the potentially-affected relations?
>
> I don't know exactly where we should document it, but I've atta
On 17.02.25 20:46, Jeff Davis wrote:
Note that the Unicode update has a few test diffs for NORMALIZE(),
please check to see if the tests themselves need an update.
I think these test result changes are incorrect. AFAICT, nothing has
changed in the upstream data that would explain such a chang
On Wed, 2025-03-05 at 20:43 -0600, Nathan Bossart wrote:
> I see. Do we provide any suggested next steps for users to assess
> the potentially-affected relations?
I don't know exactly where we should document it, but I've attached a
SQL file that demonstrates what can happen for a PG17->PG18 up
On Wed, Mar 05, 2025 at 03:34:06PM -0800, Jeff Davis wrote:
> On Wed, 2025-03-05 at 14:33 -0600, Nathan Bossart wrote:
>> + report_status(PG_WARNING, "warning");
>> + pg_log(PG_WARNING, "Your installation contains relations that may be affected by a new version of Uni
On Wed, 2025-03-05 at 14:33 -0600, Nathan Bossart wrote:
> + report_status(PG_WARNING, "warning");
> + pg_log(PG_WARNING, "Your installation contains relations that may be affected by a new version of Unicode.\n"
> + "A list of potentially-affe
On Mon, Feb 17, 2025 at 11:46:43AM -0800, Jeff Davis wrote:
> Attached a version that rebases both patches. In my patch, I added a
> report_status().
I briefly looked at v2-0002, and the UpgradeTask usage looks correct to me.
Did you find it easy enough to use?
+ /*
+ * The builtin p
On Mon, Feb 17, 2025 at 11:39:14AM -0800, Jeff Davis wrote:
> On Mon, 2024-11-18 at 13:58 +0900, Michael Paquier wrote:
>> Worth noting that unaccent.rules is unchanged after switching to
>> 16.0.0:
>> cd contrib/unaccent && make update-unicode
>
> What diffs are you seeing? I don't see any diffs
24 07:21:48 +0100
Subject: [PATCH v2 1/2] Update Unicode data to Unicode 16.0.0
---
 src/Makefile.global.in                  |  2 +-
 src/common/unicode/meson.build          |  2 +-
 src/include/common/unicode_case_table.h | 56 +-
 src/include/common/unicode_categor
On Mon, 2024-11-18 at 13:58 +0900, Michael Paquier wrote:
> Worth noting that unaccent.rules is unchanged after switching to
> 16.0.0:
> cd contrib/unaccent && make update-unicode
What diffs are you seeing? I don't see any diffs to unaccent.rules
since Unicode 14.0.0.
Aside: it looks like that ta
On 05.02.25 22:47, Jeff Davis wrote:
(b) we should make reasonable attempts to mitigate potential
problems.
One idea for (b) resurfaced, which was to make a best-effort check at
pg_upgrade time for affected indexes. The check would not be
bulletproof, because we can't catch dependencies that
On Mon, 2024-11-11 at 07:27 +0100, Peter Eisentraut wrote:
> Here is the patch to update the Unicode data to version 16.0.0.
>
> Normally, this would have been routine, but a few months ago there
> was some debate about how this should be handled. [0] AFAICT, the
> consensus was to go ahead
On Wed, 2025-01-22 at 19:03 +0100, Peter Eisentraut wrote:
> Building a collation provider on this came much later. It was
> possibly a mistake how that was done.
It wasn't a mistake. "Stability within a PG major version" was called a
*benefit* near the top of the first email on the subject[1]
On Wed, 2025-01-22 at 19:08 +0100, Peter Eisentraut wrote:
> But I don't think it would be a compile-time decision. I think it
> would be a run-time selection, similar to the theorized
> multiple-ICU-versions feature. (Those two features might even go
> together, since a given ICU versi
On 20.01.25 22:39, Jeff Davis wrote:
On Fri, 2024-11-15 at 17:09 +0100, Peter Eisentraut wrote:
The practice of regularly updating the Unicode files is older than the
builtin collation provider. It is similar to updating the time zone
files, the encoding conversion files, the snowball files, et
On 21.01.25 02:06, Jeremy Schneider wrote:
FWIW, after adding ICU support I personally don't think there's a
pressing need to continue updating the tables anymore.
That appears to ignore what these tables are actually used for. They
are used for Unicode normalization, which is used by SCRAM.
On Mon, 2025-01-20 at 17:06 -0800, Jeremy Schneider wrote:
> On the user side, my main concerns are the same as they've always
> been: 100% confidence that Postgres updates will not corrupt any data
> or cause incorrect query results
I'll add that, while 100% may be a good goal, it hasn't been the
On Mon, 2025-01-20 at 17:06 -0800, Jeremy Schneider wrote:
> FWIW, after adding ICU support I personally don't think there's a
> pressing need to continue updating the tables anymore.
I agree that it's not a pressing concern.
> If Postgres does go the path of multiple tables, does the community
>
On Mon, 20 Jan 2025 13:39:35 -0800
Jeff Davis wrote:
> On Fri, 2024-11-15 at 17:09 +0100, Peter Eisentraut wrote:
> > The practice of regularly updating the Unicode files is older than
> > the builtin collation provider. It is similar to updating the time
> > zone files, the encoding conver
On Fri, 2024-11-15 at 17:09 +0100, Peter Eisentraut wrote:
> The practice of regularly updating the Unicode files is older than
> the builtin collation provider. It is similar to updating the time zone
> files, the encoding conversion files, the snowball files, etc. We need
> to move all o
On Wed, 2024-11-20 at 06:41 +0100, Laurenz Albe wrote:
> That looks like a nice idea, since it obviates the need to build
> PostgreSQL yourself if you want to use a non-standard copy of - say -
> the ICU library. You still have to build your own ICU library,
> though.
It would work with the built
On Tue, 2024-11-19 at 13:42 -0800, Jeff Davis wrote:
> On Tue, 2024-11-12 at 10:40 +0100, Laurenz Albe wrote:
> > I want to reiterate what I said in the above thread:
> > If that means that indexes on strings using the "builtin" collation
> > provider need to be reindexed after an upgrade, I am ver
On Tue, 2024-11-12 at 10:40 +0100, Laurenz Albe wrote:
> I want to reiterate what I said in the above thread:
> If that means that indexes on strings using the "builtin" collation
> provider need to be reindexed after an upgrade, I am very much
> against it.
How would you feel if there was a bette
On Mon, Nov 11, 2024 at 07:27:53AM +0100, Peter Eisentraut wrote:
> Here is the patch to update the Unicode data to version 16.0.0.
>
> Normally, this would have been routine, but a few months ago there was some
> debate about how this should be handled. [0] AFAICT, the consensus was to
> go ahea
On 12.11.24 10:40, Laurenz Albe wrote:
On Mon, 2024-11-11 at 14:52 -0500, Joe Conway wrote:
On 11/11/24 01:27, Peter Eisentraut wrote:
Here is the patch to update the Unicode data to version 16.0.0.
Normally, this would have been routine, but a few months ago there was
some debate about how th
On Mon, 2024-11-11 at 14:52 -0500, Joe Conway wrote:
> On 11/11/24 01:27, Peter Eisentraut wrote:
> > Here is the patch to update the Unicode data to version 16.0.0.
> >
> > Normally, this would have been routine, but a few months ago there was
> > some debate about how this should be handled. [0]
On 11/11/24 01:27, Peter Eisentraut wrote:
Here is the patch to update the Unicode data to version 16.0.0.
Normally, this would have been routine, but a few months ago there was
some debate about how this should be handled. [0] AFAICT, the consensus
was to go ahead with it, but I just wanted to