ith no users).
Beyond that, there seems to be some danger: if the syntax for rules is
not perfectly compatible between ICU versions, the user might run into
big problems.
Regards,
Jeff Davis
0001 and 0002), but I'd like to
hear what others think.
For historical reasons, users may assume that LC_COLLATE controls the
default collation order because that's true in libc. And if their
provider is ICU, they may be surprised that it doesn't. I believe w
On Tue, 2023-05-02 at 07:29 -0700, Noah Misch wrote:
> On Thu, Mar 30, 2023 at 08:59:41AM +0200, Peter Eisentraut wrote:
> > On 30.03.23 04:33, Jeff Davis wrote:
> > > Attached is a new version of the final patch, which performs
> > > canonicalization. I'm not 10
directly and rely on the server environment. But in those cases,
there's no way to set a provider at all, it's just relying on the
server environment. There aren't many of these cases, and hopefully we
can eliminate the reliance on the server environment over time.
If I'm missing something, let me know what cases you have in mind.
Regards,
Jeff Davis
ome back and forth, like checking
datlocprovider, then looking in the right fields and ignoring the wrong
ones.
Regards,
Jeff Davis
atter in some cases.
I'd just say that they are too confusing (likely to be misused), and
becoming obsolete (or less relevant), or something along those lines.
Otherwise, this is fine with me. I didn't do a detailed review because
it's just mechanical.
Regards,
Jeff Davis
indexes
or something.
> > * I don't understand what "kc" means if "ks" is not set to
> > "level1".
>
> There is an example here:
> https://peter.eisentraut.org/blog/2023/05/16/overview-of-icu-collation-settings#colcaselevel
Interesting, thank you.
Regards,
Jeff Davis
E
>
>
> In practice we're probably getting the "und" ICU locale whereas "fr"
> would
> be appropriate.
This is a good point and illustrates that ICU is not a drop-in
replacement for libc in all cases.
I don't see a solution here that doesn't involve some rough edges,
though. "Locale" is a generic term, and if we continue to insist that
it really means a libc locale, then ICU will never be on an equal
footing with libc, let alone the preferred provider.
Regards,
Jeff Davis
hich is not great for such a common locale name). ICU versions
63 and earlier recognize C.UTF-8 as en-US-u-va-posix (a.k.a.
en_US_POSIX), which has some adjustments to match expectations of C
sorting (e.g. upper case first).
* libc: problems as raised in this thread.
Regards,
Jeff Davis
s, where you can make better use of those concepts. I feel like
there are some interesting things that can be done with rules, but I
haven't had a chance to really dig in yet.
Regards,
Jeff Davis
On Thu, 2023-05-25 at 14:48 -0400, Tom Lane wrote:
> Jeff Davis writes:
> > What should we do with locales like C.UTF-8 in both libc and ICU?
>
> I vote for passing those to the existing C-specific code paths,
Great, this would be a big step toward solving the ICU usability
s of reasons). But I'm open to suggestion if someone knows a good
way to do it.
--
Jeff Davis
PostgreSQL Contributor Team - AWS
From 5c6d707a88c887641d551ed9a6983c74d6a82a7a Mon Sep 17 00:00:00 2001
From: Jeff Davis
Date: Tue, 18 Apr 2023 10:45:51 -0700
Subject: [PATCH] Fix search_path
On Fri, 2023-05-26 at 10:43 -0700, Jeff Davis wrote:
> We still need to consider backwards compatibility. If someone has a
> collation with locale name C.UTF-8 in an earlier version, any change
> to
> the interpretation of that locale name after an upgrade carries a
> corruption
ctly what I did in v6 of this series: I created a "none"
provider, and when someone specified provider=icu iculocale=C, it would
change the provider to "none":
https://www.postgresql.org/message-id/5f9bf4a0b040428c5db2dc1f23cc3ad96acb5672.camel%40j-davis.com
I'm fine with either approach.
Regards,
Jeff Davis
see what happens (older versions of
ICU would interpret it as en-US-u-va-posix; newer versions would give
the root locale).
b. Consistently interpret it as en-US-u-va-posix.
c. Don't pass it to the provider at all and treat it with memcmp
semantics.
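
For reference, a minimal sketch of what option (c) implies, assuming the
strings are UTF-8 encoded: comparing the raw bytes, as memcmp() would,
orders strings by code point, so upper case sorts before lower case and no
linguistic rules apply. This is a Python illustration rather than
PostgreSQL code, and the helper name memcmp_key is hypothetical.

```python
def memcmp_key(s: str) -> bytes:
    """Sort key imitating memcmp() on the UTF-8 encoding of s (illustrative helper)."""
    return s.encode("utf-8")


words = ["banana", "Zebra", "apple", "éclair", "Apple"]

# Byte-wise ordering, as a memcmp-based comparison would produce.
by_bytes = sorted(words, key=memcmp_key)

# Python compares str values by code point, which matches UTF-8 byte order.
by_code_point = sorted(words)
```

Because UTF-8 preserves code point order, the two orderings agree, which is
why this behavior does not depend on any provider or provider version.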
Regards,
Jeff Davis
On Wed, 2023-06-07 at 23:28 +0200, Peter Eisentraut wrote:
> On 06.06.23 21:23, Jeff Davis wrote:
> > What about ICU? How should provider=icu locale=C.UTF-8 behave? We
> > could:
>
> It should be an error.
>
> > a. Just pass it to the provider and see what happens
subthread.
It also leaves the fundamental problem in place that LOCALE only
applies to the libc provider, which multiple people have agreed is not
acceptable.
Regards,
Jeff Davis
On Thu, 2023-06-08 at 00:11 +0200, Peter Eisentraut wrote:
> On 05.06.23 19:54, Jeff Davis wrote:
> > New patch series attached.
>
> Could you clarify what here is intended for 16 and what is for later?
I apologize about the patch churn here. I implemented several
approaches to se
ror. It's hard for me to estimate how
many users might be inconvenienced by that, but it sounds like a risk.
Perhaps for this specific case, and only in initdb, we change
C.anything and POSIX.anything to the builtin provider? CREATE DATABASE
and CREATE COLLATION could still reject such locales.
eems cleaner.
You also suggested that we consider switching the provider to libc any
time ICU doesn't support something. I'm not sure whether you meant a
static list (C, C.UTF-8, POSIX, ...?) or some kind of dynamic test. I'm
skeptical of being too smart here, but I'd like
> I'm inclined to agree that this is reasonable to desupport.
Committed.
> I bet we could skip forcing the search_path for maintenance commands
> run as
> the table owner, but such a discrepancy seems likely to cause far
> more
> confusion than anything else.
Agreed.
Regards,
Jeff Davis
ommand-line option, or GUC, etc. That way we can
> mark the old behaviour "deprecated", with a workaround for those who
> may desperately need it, and in another release or so, finally pull
> the plug on old behaviour.
That sounds wise, though others may not like the idea of a GUC just for
this change.
Regards,
Jeff Davis
ms,
best to take that out and reconsider in 17 if worthwhile.
Regards,
Jeff Davis
://www.postgresql.org/message-id/87sfb4gwgv.fsf%40news-spur.riddles.org.uk
[2]
https://www.postgresql.org/message-id/8a3dc06f-9b9d-4ed7-9a12-2070d8b01...@manitou-mail.org
--
Jeff Davis
PostgreSQL Contributor Team - AWS
From 065cdf57239280ef121b51d2616c0729946af9dd Mon Sep 17 00:00:00 2001
From: Je
fault
for "CREATE DATABASE ... TEMPLATE template0", which then becomes the
default provider for "CREATE COLLATION (LOCALE='...')".
--
Jeff Davis
PostgreSQL Contributor Team - AWS
From 329e32bfe5e1883a2cfd6e224c1d512b67256870 Mon Sep 17 00:00:00 2001
From: Jeff Dav
On Mon, 2023-06-12 at 23:04 +0200, Peter Eisentraut wrote:
> I object to adding a new provider for PG16 (patch 0001).
Added to July CF for 17.
> > 2. Patch 0004 is possibly out of scope for 16
> Also clearly a new feature.
Added to July CF for 17.
Regards,
Jeff Davis
On Fri, 2023-06-16 at 16:50 +0200, Peter Eisentraut wrote:
> This looks good to me.
>
> Attached is small fixup patch with some documentation tweaks and
> simplifying some test code (also includes pgperltidy).
Thank you. Committed with your fixups.
Regards,
Jeff Davis
Patch attached. Currently, the Makefile specifies NO_LOCALE=1, and the
meson.build does not.
--
Jeff Davis
PostgreSQL Contributor Team - AWS
From 1775c98badb94a2ee185d7a6bd11482a4e5db58a Mon Sep 17 00:00:00 2001
From: Jeff Davis
Date: Fri, 16 Jun 2023 11:51:00 -0700
Subject: [PATCH v1
ovider needs to be explicitly requested (as in
the current patch), it's still useful, so I don't think we need to
decide now.
We should also keep in mind that whatever provider is selected at
initdb time also becomes the default for future databases.
Regards,
Jeff Davis
,
but leave LC_CTYPE=C.UTF-8 as-is?
Regards,
Jeff Davis
ing
to the "true" semantics, if they are truly simple and well-defined and
stable. But I don't think ctype=C.UTF-8 is actually stable because new
characters can be added, right?
Regards,
Jeff Davis
later. But if the default collation provider goes
back to libc, the risk of ICU validation errors goes way down, so I
don't object if Peter would like to change it back to an ERROR.
Regards,
Jeff Davis
le from the table's owner is
an edge case in behavior and both make sense to me.
In the absence of a use case, I'd be inclined towards just being
consistent with the other privileges.
Regards,
Jeff Davis
the check for !skip_privs but need to add it to the flags in
vacuum_is_permitted_for_relation().
Regards,
Jeff Davis
fusing to users.
Regards,
Jeff Davis
On Tue, 2023-06-20 at 10:56 -0700, Nathan Bossart wrote:
> On Tue, Jun 20, 2023 at 10:49:27AM -0700, Nathan Bossart wrote:
> > Patch incoming...
>
> Attached.
Looks good to me.
Regards,
Jeff Davis
On Tue, 2023-06-20 at 12:16 -0400, Tom Lane wrote:
> Jeff Davis writes:
> > Status on collation loose ends:
>
> This all sounds good to me.
Patches attached.
0001 also removes the code to get a default locale when ICU is being
used, because that was a part of the same commit t
we could add some
explanation along the way about how the rule is constructed to match
EBCDIC, which would reduce the shock of a long rule like that.
I wonder why the rule syntax is such that it cannot be broken up? Would
it be incorrect for us to allow some whitespace in there?
Regards,
Jeff Davis
10) TO (20);
CREATE INDEX p_idx ON p (i);
CREATE INDEX special_idx ON p0 (j);
GRANT MAINTAIN ON p TO foo;
\c - foo
REINDEX TABLE p;
That would reindex p0_i_idx and p1_i_idx, but skip special_idx. That
might be too confusing, but feels a bit more consistent permissions-
wise.
Regards,
Jeff Davis
, we might also consider making REINDEX work a bit more like
> VACUUM
> and ANALYZE and emit a WARNING for any relations that the user is not
> permitted to process. But this probably deserves its own thread, and
> it
> might even need to wait until v17.
Yes, we can revisit for 17.
Regards,
Jeff Davis
ch is new, so no breakage. And if someone is using the MAINTAIN
privilege, they wouldn't be able to abuse the search_path, so it would
close the hole.
Patch attached (created a bit quickly, but seems to work).
Regards,
Jeff Davis
[1]
https://postgr.es/m/CAKFQuwaVJkM9u%
table.
At some point in the very near future (though I realize that point may
come after version 16), we need to lock down the search path in a lot
of cases (not just maintenance commands), and I don't see any way
around that.
Regards,
Jeff Davis
.build does not need to, either.
Regards,
Jeff Davis
if (!vacuum_is_relation_owner(relid, classForm,
options))
+ continue;
in get_all_vacuum_rels() whereas your patch left it out -- double-check
that we're doing the right thing there.
Also remember to bump the catversion. Other than that, it looks good to
me.
Regards,
Jeff Davis
'@' < \' < '=' < '"'
> < a < b < c < d < e < f < g < h < i
> < j < k < l < m < n < o < p < q < r
> < '~' < s < t < u < v < w < x < y < z
> < '[' < '^' < ']'
> < '{' < A < B < C < D < E < F < G < H < I
> < '}' < J < K < L < M < N < O < P < Q < R
> < '\' < S < T < U < V < W < X < Y < Z
> < 0 < 1 < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9
> $$);
That looks much nicer and would go nicely in the documentation along
with some explanation.
Regards,
Jeff Davis
On Fri, 2023-05-26 at 16:21 -0700, Jeff Davis wrote:
> Maintenance commands (ANALYZE, CLUSTER, REFRESH MATERIALIZED VIEW,
> REINDEX, and VACUUM) currently run as the table owner, and as a
> SECURITY_RESTRICTED_OPERATION.
>
> I propose that we also fix the search_path to "
evel I suspect we want lexical scoping, which is what most of us
> have in our programming languages, in the database; but the database
> has many elements of dynamic scoping, and changing that is both a
> compatibility break and requires significant changes in the way the
> database is designed.
Does that suggest another approach?
Regards,
Jeff Davis
is accepting it?
If some libc implementations are too permissive, I might need to just
disable this test. But if we can find a locale that is consistently
acceptable in ICU but invalid in libc, then I can keep it... perhaps
"und@colStrength=primary"?
Regards,
Jeff Davis
bly
indicates a user mistake). I don't think this is a practical problem
any more.
Regards,
Jeff Davis
of weirdness.
Also I'm not quite sure how quickly my search_path fix will be
committed. Hopefully soon, because the current state is not great, but
it's hard for me to say for sure.
Regards,
Jeff Davis
or a
non-C locale.
A GUC might be a better default, and we could have CREATE COLLATION
default to ICU if the server is built with ICU and if PROVIDER,
LC_COLLATE and LC_CTYPE are unspecified.
Regards,
Jeff Davis
On Sat, 2023-07-08 at 07:04 +1200, Thomas Munro wrote:
> Doesn't look too hopeful: https://man.openbsd.org/setlocale.3
Hmm. I could try using a bogus encoding, but that may be too clever.
I'll just remove the test.
Regards,
Jeff Davis
be a table of methods,
which means we can add an extension hook to provide a different method
table. That still requires more work, I'm just mentioning it here for
context.
Regards,
Jeff Davis
From 6f0c0a9e05039cd295c6c090b3d98d381244b35c Mon Sep 17 00:00:00 2001
From: Jeff Davis
Date
yway (which also materializes it; see
tts_virtual_copyslot()) at heapam.c:2710?
* After correcting the memory issues, can you get updated performance
numbers for COPY?
Regards,
Jeff Davis
ator
over the buffered tuples to the caller. The caller can then use the
iterator to insert into indexes, return a tuple to the executor, etc.,
and then release the iterator when done (freeing the buffer). That
control flow is less convenient for most callers, though, so perhaps
that should be optional?
Regards,
Jeff Davis
ad-of-row triggers, and volatile
functions in the query. We could also just consider RETURNING another
restriction, which could be lifted later by implementing the logic in
the callback (as described above) without an API change.
Regards,
Jeff Davis
using the
callback to copy tuples into the caller's context.
In 0003, why do you need the global insert_modify_buffer_flush_context?
0004 is the only place that calls table_modify_buffer_flush(). Is that
really necessary, or is automatic flushing enough?
Regards,
Jeff Davis
On Mon, 2024-08-26 at 14:18 -0700, Jeff Davis wrote:
> 0001 implementation issues:
>
> * We need default implementations for AMs that don't implement the
> new
> APIs, so that the AM will still function even if it only defines the
> single-tuple APIs. If we need
useful and seems relatively easy -- A JOIN B or B
JOIN A (though there's some nuance about when you try to make that
decision).
The latter requires controlling an explosion of possibilities, and
would be an entirely different kind of hook.
Regards,
Jeff Davis
where there's enough context to
know what's happening. There could be many such hooks, but I suspect
only a handful of important ones.
This idea allows the extension author to preserve the right paths long
enough to use set_rel_pathlist_hook/set_join_pathlist_hook, which can
editorialize on costs or do its own pruning.
Regards,
Jeff Davis
On Wed, 2024-08-28 at 16:35 -0400, Robert Haas wrote:
> On Wed, Aug 28, 2024 at 4:29 PM Jeff Davis wrote:
> > Preserving a path for the right amount of time seems like the
> > primary
> > challenge for most of the use cases you raised (removing paths is
> > easier tha
nd it still requires a solution for #4).
Regards,
Jeff Davis
[1]
https://www.postgresql.org/docs/devel/trigger-datachanges.html
to
hold onto multiple paths for longer, similar to pathkeys, which might
offer some benefits or simplifications.
Regards,
Jeff Davis
[1]
https://www.postgresql.org/message-id/CA+TgmoZQyVxnRU--4g2bJonJ8RyJqNi2CHpy-=nwwbtnpaj...@mail.gmail.com
formance goes, I'm only looking at the branch in add_path()
that calls compare_pathkeys(). Do you have some example queries which
would be a worst case for that path?
In general if you can post some details about how you are measuring,
that would be helpful.
Regards,
Jeff Davis
On Wed, 2024-08-28 at 18:43 +0200, Andreas Karlsson wrote:
> On 8/15/24 12:55 AM, Jeff Davis wrote:
> > This overlaps a bit with what Peter already proposed here:
> >
> > https://www.postgresql.org/message-id/4f562d84-87f4-44dc-8946-01d6c437936f%40eisentraut.org
> >
Committed v2-0001.
On Tue, 2024-09-03 at 22:04 -0700, Jeff Davis wrote:
> * This patch may change the handling of collation oid 0, and I'm not
> sure whether that was intentional or not. lc_collate_is_c(0) returned
> false, whereas pg_newlocale_from_collation(0)->collate_is_c r
Regards,
Jeff Davis
On Mon, 2024-07-15 at 13:44 -0400, Robert Haas wrote:
> But ... why? I mean, what's the point of prohibiting that?
Agreed. We ignore all kinds of stuff in search_path that doesn't make
sense, like non-existent schemas. Simpler is better.
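
As a toy model of that behavior (a sketch under stated assumptions, not
PostgreSQL's actual lookup code), name resolution can simply skip
search_path entries that do not exist. The resolve() function and the
catalog dictionary below are hypothetical and exist only for illustration.

```python
def resolve(name, search_path, catalog):
    """Return the first existing schema in search_path that contains name, else None."""
    for schema in search_path:
        objects = catalog.get(schema)
        if objects is None:
            # Non-existent schema: ignored rather than treated as an error.
            continue
        if name in objects:
            return schema
    return None


# Hypothetical catalog: schema name -> set of object names.
catalog = {"public": {"t1"}, "pg_catalog": {"pg_class"}}
found = resolve("t1", ["no_such_schema", "public"], catalog)
```

Here the bogus entry is skipped and the lookup falls through to the next
schema, which is the "simpler is better" behavior described above.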
Regards,
Jeff Davis
for the
session, or on a function that's not part of an extension.
On re-reading, I see that you mean it should work if they explicitly
set it as a part of a function that *is* part of an extension. And I
agree with that -- just make it work.
Regards,
Jeff Davis
ctions.
We've been following (A), and that's the de facto policy today[3][4].
Noah and Laurenz argued[5] that the policy starting in version 18
should be (B). Given that it's a policy decision that affects more than
just the builtin collation provider, I'd like to discus
o in a complete way, and hard to do with much
accuracy. I don't oppose it though -- if someone finds a way to provide
enough information to be useful, then that's fine with me.
Regards,
Jeff Davis
b2c332c47e3e0a67f0640b49c.ca...@j-davis.com
Regards,
Jeff Davis
bd45b2c332c47e3e0a67f0640b49c.camel%40j-davis.com
which seems like a more direct (and more complete) path to a resolution
of your concerns. I speak only for myself, but I assure you that I have
an open mind in that discussion, and that I have no intention to force a
Unicode update past objections.
't engage in the version 18 policy
discussion.
> Maybe someone will change
> something in v18 so it's not like that, but don't count on it.
That's backwards. If nothing happens in v18, then there will be no
breaking Unicode change. It takes an active step by a
above is an accurate characterization. There's plenty of opportunity
for deliberation and compromise in version 18, and my mind is still
open to pretty much everything, up to and including freezing Unicode
updates if necessary[3].
Regards,
Jeff Davis
[1]
https://www.postgresql.org/m
version 18
like normal, because there's no actual problem now, I see no reason
your objections would be taken less seriously later.
Regards,
Jeff Davis
[1]
https://www.postgresql.org/message-id/d75d2d0d1d2bd45b2c332c47e3e0a67f0640b49c.camel%40j-davis.com
e
more that you say so in the policy thread here:
https://www.postgresql.org/message-id/d75d2d0d1d2bd45b2c332c47e3e0a67f0640b49c.camel%40j-davis.com
which would get broader visibility and I believe provide you with
stronger assurances that *everyone* will be careful with Unicode
updates.
Regards,
In other words, it would be reviewed like any other
change.
Ideally, some new developments would make it less worrisome, and
Unicode updates could become more routine. I have some ideas, which I
can propose in separate threads. But for now, I don't see a reason to
rush Unicode updates.
Regards,
Jeff Davis
lways completely replaced, but the way you
can call pg_set_attribute_stats() doesn't imply that -- calling
pg_set_attribute_stats(..., most_common_vals => ..., most_common_freqs
=> ...) looks like it would just replace the most_common_vals+freqs and
leave histogram_bounds as it was, but it actually clears
histogram_bounds, right? Should we make that work or should we document
better that it doesn't?
Regards,
Jeff Davis
's mostly a
theoretical problem because, at least in my experience, I can't recall
ever seeing an index that would be affected.
Regards,
Jeff Davis
rings and EXECUTE them.
Though perhaps not impossible if we use some kind of runtime detection.
We could have some kind of global context that tracks, at runtime, when
an expression is executing for the purposes of an index. If a function
depends on a versioned collation, then mark the index or add a version
somewhere.
Regards,
Jeff Davis
obody has commented yet.)
Regards,
Jeff Davis
't be imported from
an old version into a new version because it's either gone or the
meaning has changed too much. But that argument doesn't apply to a
bogus call, where the name/value pairs get misaligned or something.
Regards,
Jeff Davis
ffers code point order collation combined with
Unicode ctype semantics.
With PG17, between ICU and the builtin provider, there's little
remaining reason to use libc (aside from legacy).
Regards,
Jeff Davis
tes, so primary keys will never be affected. The risks
we are talking about are for expression indexes, e.g. on LOWER(). Even
if you do have such expression indexes, the types of changes Unicode
makes to casing and character properties are typically much more mild.
Regards,
Jeff Davis
code are intolerable,
and only for PG_C_UTF8?
Regards,
Jeff Davis
rité documented[1] cases where the libc C.UTF-8 locale changed
the *sort* behavior, thereby affecting primary keys.
Regards,
Jeff Davis
[1]
https://www.postgresql.org/message-id/8a3dc06f-9b9d-4ed7-9a12-2070d8b0165f%40manitou-mail.org
's much more tractable to review your expression indexes and look for
problems (not ideal, but better). Also, as Peter points out, CTYPE
changes are typically more narrow, so there's a good chance that
there's no problem at all.
Regards,
Jeff Davis
re you
rebuild/fix objects to use the new collation, and when that's done then
you change the default so that queries use version 2. How does all that
work?
Regards,
Jeff Davis
logy doesn't quite capture this distinction. I don't mean to
over-emphasize this point, but I do think we need to keep some
perspective here.
But I agree with your general point that we shouldn't dismiss the
problem just because it's minor. We should expect the problem to
surface at some point and be reasonably prepared.
Regards,
Jeff Davis
On Wed, 2024-07-24 at 14:47 -0400, Robert Haas wrote:
> On Wed, Jul 24, 2024 at 1:45 PM Jeff Davis wrote:
> > There's a qualitative difference between a collation update which
> > can
> > break your PKs and FKs, and a ctype update which definitely will
> > not.
>
stics and control, the control parameters will be:
I don't like the idea of mixing statistics and control parameters in
the same list.
I do like the idea of returning a set, but I think it should be the
positive set (effectively a representation of what is now in the
pg_stats view) and any ignored settings would be output as WARNINGs.
Regards,
Jeff Davis
pg_database will be locale-
related?
Regards,
Jeff Davis
> > {
> >
> > The patch sequencing might be a bit tricky here. Maybe it's ok if
> > patch 0004 stays as is in this respect if 0006 were to fix it back.
Addressed in v3-0006.
> > * v2-0005-Avoid-setlocale-in-lc_collate_is_c-and-lc_ctype_i.patch
> >
> >
> Also, is there any reason you do not squash the 4th and the 6th patch?
Done. I had to rearrange the patch ordering a bit because prior to the
cache refactoring patch, it's unsafe to call
pg_newlocale_from_collation() without checking lc_collate_is_c() or
lc_
On Thu, 2024-06-20 at 17:07 +0700, John Naylor wrote:
> On Sat, Jun 15, 2024 at 6:46 AM Jeff Davis wrote:
> > Attached is a patch to use simplehash.h instead, which speeds
> > things up
> > enough to make them fairly close (from around 15% slower to around
> > 8%).
>
nds of
functions between releases. Even if the signatures remain the same, the
parse structures may change, which creates similar incompatibilities.
So let's just get rid of the 'params' argument from both functions.
Regards,
Jeff Davis
On Thu, 2024-07-25 at 13:29 -0700, Jeff Davis wrote:
> it may be a good idea to version collation and ctype
> separately. The ctype version is, more or less, the Unicode version,
> and we know what that is for the builtin provider as well as ICU.
Attached a rough patch for the pu
is
> ready for committer.
Committed, thank you.
> And then we can discuss after committing if an additional cache of
> the
> last locale is worth it or not.
Yeah, I'm holding off on that until refactoring in the area settles,
and we'll see if it's still worth it.
Regards,
Jeff Davis
etlocale(). I
changed this to look up the collation and then use pg_strxfrm(). That
should improve histogram selectivity estimates because it uses the
correct provider, rather than relying on setlocale(), right?
New series attached.
Regards,
Jeff Davis
From 5b903c82f34f5da9cab58ecd0a268345