ference
limitation, and that needs to be solved regardless.
After the type inference figures out what the right type is, then I
think you're right that an OID is not required to track it, and however
we do track it should be able to reuse some of the existing
infrastructure for dealing with rec
On Thu, 2023-10-26 at 16:49 +0200, Peter Eisentraut wrote:
> On 25.10.23 20:32, Jeff Davis wrote:
> > But what should the result of UPPER('á' COLLATE UCS_BASIC) be? In
> > Postgres, the answer is 'á', but intuitively, one could reasonably
> > expect the an
On Thu, 2023-10-26 at 09:21 -0700, Jeff Davis wrote:
> Our initcap() is not defined in the standard, and we document that it
> only differentiates between alphanumeric and non-alphanumeric
> characters, so we could get that behavior pretty easily as well. If we
> wanted to do it
ing_8h.html#aa64fbd4ad23af84d01c931d7cfa25f89
See also the part about tailorings here:
https://www.unicode.org/versions/Unicode15.1.0/ch03.pdf#G33992
Regards,
Jeff Davis
"builtin"
provider I proposed earlier. If the behavior does change with a new
Unicode version it would be easier to see and less likely to affect on-
disk structures than a collation change.
Regards,
Jeff Davis
On Thu, 2023-10-26 at 16:28 -0500, Nathan Bossart wrote:
> On Fri, Aug 18, 2023 at 02:44:31PM -0700, Jeff Davis wrote:
> > + SET search_path = admin, "!pg_temp";
>
> I think it's unfortunate that these new identifiers must be quoted. I
> wonder i
d be good to comment that the function still works past
the flush pointer, and that it will be safe to remove later (right?).
* An "Assert(!RecoveryInProgress())" would be more appropriate than
an error. Perhaps we will remove even that check in the future to
achieve cascaded replication of unflushed data.
Regards,
Jeff Davis
On Thu, 2023-09-21 at 14:33 -0700, Jeff Davis wrote:
> I have attached an updated patch. Changes:
Withdrawing this from CF due to lack of consensus.
I'm happy to resume this discussion if someone sees a path forward to
make it easier to secure the search_path; or at least help warn user
On Mon, 2023-10-16 at 20:32 -0700, Jeff Davis wrote:
> On Wed, 2023-10-11 at 08:56 +0200, Peter Eisentraut wrote:
> > We need to be careful about precise terminology. "Valid" has a
> > defined meaning for Unicode. A byte sequence can be valid or not as UTF
On Fri, 2023-07-21 at 15:32 -0700, Jeff Davis wrote:
> Attached is a new version.
Do we still want to do this?
Right now, the MAINTAIN privilege is blocking on some way to prevent
malicious users from abusing the MAINTAIN privilege and search_path to
acquire the table owner's privileg
t I can add the patch
> to the commitfest.
>
> Tiny as the patch is, I don't want it to fall between the cracks.
Committed with adjusted wording. Thank you!
--
Jeff Davis
PostgreSQL Contributor Team - AWS
(sorry to backtrack yet again...)? It couldn't be used in an
arbitrary expression, but that also means that it couldn't end up in
the wrong kind of expression.
Regards,
Jeff Davis
rned about the call going through a function
pointer? If so, is it possible that setting a flag and then branching
would be better?
Also, if it's a concern, should we also consider making an inlineable
version of pg_comp_crc32c_sse42()?
Regards,
Jeff Davis
On Tue, 2023-10-31 at 12:45 +0100, Vik Fearing wrote:
> On 10/24/23 21:10, Jeff Davis wrote:
> > Can we revisit the idea of a per-WHEN RETURNING clause?
>
> For the record, I dislike this idea.
I agree that it makes things awkward, and if it creates grammatical
problems as well
parse analysis code, and a
> lot more if you grep more widely across the whole of the backend
> code.)
If you can point to a precedent, then I'm much more inclined to be OK
with the implementation.
Regards,
Jeff Davis
which would compromise the index structure and any
constraints using that index. But that problem is more bounded, at
least. ]
> After that, change search_path on function invocation as usual
> rather than having special rules for what happens when a function is
> invoked during a m
mple examples with gcc
at -O2, which seem to emit the loads/stores where expected.
What is the guidance here? Is the volatile pointer use in
AdvanceXLInsertBuffer() required, and if so, why not other places?
Regards,
Jeff Davis
't think that's true right now: AdvanceXLInsertBuffer() zeroes the
old page before updating xlblocks[nextidx]. I think it needs something
like:
pg_atomic_write_u64(&XLogCtl->xlblocks[nextidx], InvalidXLogRecPtr);
pg_write_barrier();
before the MemSet.
I didn't review your latest v14 patch yet.
Regards,
Jeff Davis
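
[Illustration of the ordering suggested above. This is a minimal
standalone sketch, not PostgreSQL code: it uses C11 atomics as a
stand-in for pg_atomic_write_u64() and pg_write_barrier(), and the
struct, constants, and function name are hypothetical. It only shows the
writer side: mark the slot invalid, issue a barrier, and only then zero
the page, so a reader that rechecks the slot after copying can notice
that the page was being recycled.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define WAL_PAGE_SIZE 8192            /* hypothetical page size */
#define NBUFFERS      4               /* hypothetical number of buffer slots */
#define INVALID_PTR   ((uint64_t) 0)  /* stand-in for InvalidXLogRecPtr */

typedef struct
{
    _Atomic uint64_t end_ptr[NBUFFERS];   /* stand-in for xlblocks[] */
    char             pages[NBUFFERS][WAL_PAGE_SIZE];
} WalBuffers;

/* Recycle slot idx so that it will hold the page ending at new_end_ptr. */
static void
recycle_page(WalBuffers *wb, int idx, uint64_t new_end_ptr)
{
    assert(idx >= 0 && idx < NBUFFERS);

    /* 1. Mark the slot invalid before touching the page contents. */
    atomic_store_explicit(&wb->end_ptr[idx], INVALID_PTR,
                          memory_order_relaxed);

    /* 2. Write barrier: the invalidation is ordered before the zeroing. */
    atomic_thread_fence(memory_order_release);

    /* 3. Now it is safe to clear the old page. */
    memset(wb->pages[idx], 0, WAL_PAGE_SIZE);

    /* 4. Second barrier, then publish the new end pointer. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&wb->end_ptr[idx], new_end_ptr,
                          memory_order_relaxed);
}
```

[A matching reader would load end_ptr, copy the page, issue a read
barrier, and load end_ptr again, discarding the copy if the two values
differ or either one is invalid.]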
unicode_category.c to @pgcommonallfiles in
Mkvcbuild.pm. I'll do a trial commit tomorrow and see if that fixes it
unless someone has a better suggestion.
Regards,
Jeff Davis
push that to fix the MSVC buildfarm
> members.
>
> Sorry for the duplicate effort and/or stepping on your toes.
Thank you, no apology necessary.
Regards,
Jeff Davis
On Fri, 2023-11-03 at 17:11 +0700, John Naylor wrote:
> On Sat, Oct 28, 2023 at 4:15 AM Jeff Davis wrote:
> >
> > I plan to commit something like v3 early next week unless someone else
> > has additional comments or I missed a concern.
>
> Hi Jeff, is the C
nceXLInsertBuffer(), 3)
> the following sanity check to see if the read page is valid in
> XLogReadFromBuffers(). If it sounds sensible, I'll work towards coding
> it up. Thoughts?
I like it. I think it will ultimately be a fairly simple loop. And by
moving to atomics, we won't need the delicate comment in
GetXLogBuffer().
Regards,
Jeff Davis
r to me exactly why that matters.
Intuitively, access through a local pointer seems much more likely to
be optimized and therefore more dangerous, but that doesn't imply that
access through global variables is not dangerous.
Regards,
Jeff Davis
because I'm suggesting that he
can avoid the WALBufMappingLock to reduce the risk of a regression. In
the process, we'll probably get rid of that unnecessary "volatile" in
AdvanceXLInsertBuffer().
Regards,
Jeff Davis
Assert(!XLogRecPtrIsInvalid(EndPtr));
Can that really happen? If the EndPtr is invalid, that means the page
is in the process of being cleared, so the contents of the page are
undefined at that time, right?
Regards,
Jeff Davis
her way search path can be changed, which adds to the complexity.
Also, by default it's "$user", public; and given that "public" was
world-writable until recently, that doesn't seem like a good idea for a
change intended to prevent search_path manipulation.
Regards,
Jeff Davis
Granted, there are reasons to want an index to have a particular
collation, in which case it makes sense to opt-in to #2. But in the
common case, the high performance costs and dependency versioning risks
aren't worth it.
Thoughts?
Regards,
Jeff Davis
> than the user's direct request (e.g. DISTINCT/GROUP BY, merge
> joins).
+1. Where "cheaper" comes from is an interesting question -- is it a
property of the provider or the specific collation? Or do we just call
"C" special?
Regards,
Jeff Davis
On Mon, 2023-11-13 at 13:43 +0100, Peter Eisentraut wrote:
> On 11.11.23 01:03, Jeff Davis wrote:
> > But the database collation is always deterministic,
>
> So far!
Yeah, if we did that, clearly the index collation would need to match
that of the database to be useful. Wh
>
> I'd think the specific collation. Even if we initially perhaps just
> get the
> default cost from the provider such, it structurally seems the sanest
> place to
> locate the cost.
Makes sense, though I'm thinking we'd still want to special case the
fastest collation as "C".
Regards,
Jeff Davis
course, if we feel entitled to create the primary key index with a
> collation of our choosing, that'd make this unpredictable.
I wouldn't describe it as "unpredictable". We'd have some defined way
of defaulting the collation of an index which might be affected by a
database option or something. In any case, it would be visible with \d.
Regards,
Jeff Davis
On Tue, 2023-11-14 at 17:15 +0100, Peter Eisentraut wrote:
> On 14.11.23 02:58, Jeff Davis wrote:
> > If the user just wants PK/FK constraints, and equality lookups, then an
> > index with the "C" collation makes a lot of sense to serve those
> > purposes.
e in this
thread, that's less useful than it may seem at first (text indexes are
often uncorrelated). It seems valid to offer this as a trade-off that
users can make.
Regards,
Jeff Davis
be easy to block where necessary.
Regards,
Jeff Davis
On Thu, 2023-10-19 at 19:01 -0700, Jeff Davis wrote:
> 0003: Cache for recomputeNamespacePath.
Committed with some further simplification around the OOM handling.
Instead of using MCXT_ALLOC_NO_OOM, it just temporarily sets the cache
invalid while copying the string, and sets it valid ag
ant under all possible values of
> search_path. If you care about your function behaving the same way all
> the time, you have to set the search_path.
After adding the search path cache (recent commit f26c2368dc) hopefully
that helps to make the above suggestion more reasonable performance-
wise. I think we can call that progress.
Regards,
Jeff Davis
k this answers my earlier question. Now that I think about
> this, the one confusing thing with this syntax is that it seems to
> assign the collation to the constraint, but in reality we want the
> constraint to be enforced with the column's collation and the
> alternative collation is for the index.
Yeah, let's be careful about that. It's still technically correct:
uniqueness in either collation makes sense. But it could be confusing
anyway.
Regards,
Jeff Davis
On Tue, 2023-11-14 at 20:13 -0800, Jeff Davis wrote:
> On Thu, 2023-10-19 at 19:01 -0700, Jeff Davis wrote:
> > 0003: Cache for recomputeNamespacePath.
>
> Committed with some further simplification around the OOM handling.
While I considered OOM during hash key initialization
Right now, if allocation fails while growing a hashtable, it's left in
an inconsistent state and can't be used again.
Patch attached.
--
Jeff Davis
PostgreSQL Contributor Team - AWS
From 82068d744f668039de7249854bc42eead4e77ebc Mon Sep 17 00:00:00 2001
From: Jeff Davis
Date: F
I had briefly experimented changing the hash table in guc.c to use
simplehash. It didn't offer any measurable speedup, but the API is
slightly nicer.
I thought I'd post the patch in case others thought this was a good
direction or nice cleanup.
--
Jeff Davis
PostgreSQL Contributor
ter to his
> argument that
> we could just use "C" for such indexes.
I am saying we shouldn't prematurely optimize for the case of ORDER BY
on a text PK by making an index with a non-"C" collation, given
the costs and risks of non-"C" indexes. Particularly because, even if
there is an ORDER BY, there are several common reasons such an index
would not help anyway.
Regards,
Jeff Davis
518f791168bc6fb653d1f95f4d.ca...@j-davis.com
Regards,
Jeff Davis
On Fri, 2023-11-17 at 12:13 -0800, Andres Freund wrote:
> On 2023-11-17 10:42:54 -0800, Jeff Davis wrote:
> > Right now, if allocation fails while growing a hashtable, it's left in
> > an inconsistent state and can't be used again.
>
> I'm not ag
n other potential improvements/mitigations and see if I can
make progress somewhere else.
Regards,
Jeff Davis
ng both hsearch.h and simplehash.h for
overlapping use cases indefinitely, then I'll drop this.
Regards,
Jeff Davis
e might not rewrite hsearch. But simplehash was never meant
> to be a universal solution.
OK, I will withdraw the patch until/unless it provides a concrete
benefit.
Regards,
Jeff Davis
l see if I can solve the
case-folding slowness first, and then maybe it will be measurable.
Regards,
Jeff Davis
fails, case-fold and try again. I'll hack up
a patch -- I believe that would be measurable for the proconfigs.
Regards,
Jeff Davis
002 and 0001):
master: 7899ms
0001: 7850
0002: 7958
0003: 7942
0004: 7549
0005: 7411
I'm inclined toward all of these patches. I'll also look at adding
SH_STORE_HASH for the search_path cache.
Looks like we're on track to bring the overhead of SET search_p
ting a few
PANIC sites at a time? Is it fine to leave plain PANICs in place for
the foreseeable future, or do you want all of them to eventually move?
Regards,
Jeff Davis
t as a best practice in multi-user environments".
Regards,
Jeff Davis
"could not locate a valid checkpoint record"),
errabort(false),errrestart(false)));
Regards,
Jeff Davis
On Thu, 2023-11-16 at 16:46 -0800, Jeff Davis wrote:
> While I considered OOM during hash key initialization, I missed some
> other potential out-of-memory hazards. Attached a fixup patch 0003,
> which re-introduces one list copy but it simplifies things
> substantially in addition to
,
Jeff Davis
From b878af835da794f3384f870db57b34e236b1efba Mon Sep 17 00:00:00 2001
From: Jeff Davis
Date: Mon, 20 Nov 2023 17:42:07 -0800
Subject: [PATCH] Add SH_OPTIMIZE_REPEAT option to simplehash.h.
Callers which expect to look up the same value repeatedly can specify
SH_OPTIMIZE_REPEAT
sh, and see where it tends to win.
The caller can also save the hash and pass it down, but that's not
always convenient to do.
Regards,
Jeff Davis
might benefit some other callers?
Regards,
Jeff Davis
re likely to benefit we can
reconsider.
Though it makes it easy to test a few other callers, just to see what
numbers appear.
Regards,
Jeff Davis
tor.
> If we want to have a GUC that
> allows warning behavior, I think that's OK but I think it should be
> superuser-only and documented as a "developer" setting similar to
> zero_damaged_pages.
A GUC seems sensible to express the availability-vs-safety trade-off. I
su
o it will be a long time before it's used widely
enough to consider the problem solved.
And even after all of that, ICU is not perfect, and our support for it
still has various rough edges.
Regards,
Jeff Davis
false positives and false negatives.
We'd need to document the setting so that users understand the
consequences and limitations.
I won't push strongly for such a setting to exist because I know that
it's far from a complete solution. But I believe it would be sensible
considering that this problem is going to take a while to resolve.
Regards,
Jeff Davis
nd packaging infrastructure, that is not very practical.
Regards,
Jeff Davis
se ICU is not allowed for that
encoding), but I'd like it if we could make this infrastructure
independent of ICU, because I have some follow-up proposals to simplify
character classification here and in ts_locale.c.
Thoughts?
Regards,
Jeff Davis
s.com
which optimizes exact hits (most GUC names are already folded) before
trying case folding?
Regards,
Jeff Davis
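
[Illustration of the "exact hit first, case-fold only on a miss" idea
mentioned above. This is a minimal sketch under stated assumptions, not
the guc.c implementation: the option table, the linear scan that stands
in for a hash lookup, and all function names are hypothetical. It
assumes stored names are already folded to lower case and that folding
is ASCII-only.]

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table of option names, stored already case-folded. */
static const char *const option_names[] = {
    "search_path", "work_mem", "client_encoding"
};
#define N_OPTIONS (sizeof(option_names) / sizeof(option_names[0]))

/* ASCII-only lower-casing of src into dst (dstlen must be at least 1). */
static void
fold_ascii(char *dst, const char *src, size_t dstlen)
{
    size_t i;

    for (i = 0; i + 1 < dstlen && src[i] != '\0'; i++)
    {
        char c = src[i];

        dst[i] = (c >= 'A' && c <= 'Z') ? (char) (c - 'A' + 'a') : c;
    }
    dst[i] = '\0';
}

/* Exact, case-sensitive lookup; a linear scan stands in for a hash probe. */
static int
find_exact(const char *name)
{
    for (size_t i = 0; i < N_OPTIONS; i++)
        if (strcmp(option_names[i], name) == 0)
            return (int) i;
    return -1;
}

/* Try the name as given first; fold and retry only if that misses. */
static int
find_option(const char *name)
{
    char folded[64];
    int  idx = find_exact(name);    /* fast path: most names are folded */

    if (idx >= 0)
        return idx;
    if (strlen(name) >= sizeof(folded))
        return -1;                  /* longer than any name in the table */
    fold_ascii(folded, name, sizeof(folded));
    return find_exact(folded);      /* slow path: one extra fold + probe */
}
```

[With this shape, a caller passing "search_path" pays for one probe,
while "Work_Mem" pays for a failed probe, a fold, and a second probe.]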
ial/clever with the hash functions. We would still want the
faster hash for C-strings, but that's general and helps all callers.
But you're right that it's more code, and that's not great.
Regards,
Jeff Davis
om
around ~7300ms to ~6800ms.
This doesn't seem very controversial or complex, so I'll probably
commit this soon unless someone else has a comment.
--
Jeff Davis
PostgreSQL Contributor Team - AWS
From 906cb1cdf42f92090d4a9acf296098ec3bfa53e0 Mon Sep 17 00:00:00 2001
From: Jeff Davis
Da
the cast.
Regards,
Jeff Davis
From 72b00b1b094945845e4ea4d427e426eafd5650c2 Mon Sep 17 00:00:00 2001
From: Jeff Davis
Date: Mon, 4 Dec 2023 16:20:05 -0800
Subject: [PATCH v2] Cache opaque handle for GUC option to avoid repeated
lookups.
When setting GUCs from proconfig, perfo
On Mon, 2023-11-20 at 17:13 -0800, Jeff Davis wrote:
> Will commit 0005 soon.
Committed.
> I also attached a trivial 0006 patch that uses SH_STORE_HASH. I wasn't
> able to show much benefit, though, even when there's a bucket
> collision. Perhaps there just aren
find it hard to
> follow.
OK. I am fine with (a).
Regards,
Jeff Davis
I'm not inclined to commit this in its current form but if someone
thinks that it's a worthwhile direction, I can clean it up a bit and
reconsider.
Regards,
Jeff Davis
From e48a54d9880ab65a1e5ad6d136b849bda2e4554e Mon Sep 17 00:00:00 2001
From: Jeff Davis
Date:
On Tue, 2023-12-05 at 11:58 -0800, Jeff Davis wrote:
> Also, I forward-declared config_generic in guc.h to eliminate the
> cast.
Looking more closely, I fixed an issue related to placeholder configs.
We can't return a handle to a placeholder, because it's not stable, so
in that ca
e there's a way to
use a static buffer to even avoid the palloc() in get_str_from_var()?
Not sure these are worth the effort; just brainstorming.
In any case, +1 to your simple change.
Regards,
Jeff Davis
to be ahead of
the
+ * page we're looking for. Don't PANIC on that, until we've verified
the
+ * value while holding the lock.
Is that still true even without a torn read?
The code for 0001 itself looks good. These are minor concerns and I am
inclined to commit something like it fa
earch path cache, and
there's a significant speedup for cases not benefiting from a86c61c9ee.
It's enough that we almost don't need a86c61c9ee. So a definite +1 to
the new APIs.
Regards,
Jeff Davis
t if you want to commit that piece now, but I hesitate to
> call it a performance improvement on its own.
>
> - The runtime measurements I saw reported were well within the noise
> level.
> - The memory usage starts out better, but with more entries is worse.
I suppose I'll wa
oing in the attached patch is using part of the key as
the seed. Is that a good idea or should the seed be zero or come from
somewhere else?
Regards,
Jeff Davis
From a30e5f0ea580fb5038eb90e862f697b557627f32 Mon Sep 17 00:00:00 2001
From: Jeff Davis
Date: Fri, 8 Dec 2023 12:14:27 -0800
Subj
nd ICU
select 'Σ' ~* 'ς'; -- true in both libc and ICU
Similarly for titlecase variants:
select 'Dž' ~* 'dž'; -- false in libc and ICU
select 'dž' ~* 'Dž'; -- true in libc and ICU
If we do the case mapping ourselves, we can make those work. We'd just
have to modify the APIs a bit so that allcases() can actually get all
of the case variants, rather than relying on just towupper/towlower.
Regards,
Jeff Davis
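
[Illustration of why a full set of case variants makes the examples
above symmetric. This is a minimal sketch, not the regex code: the
equivalence-class table and function names are hypothetical, and the
table covers only the two groups discussed here (Σ/σ/ς and DŽ/Dž/dž).
The idea is that the pattern character expands to every member of its
class instead of only its towupper/towlower results.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VARIANTS 4

typedef struct
{
    uint32_t members[MAX_VARIANTS];   /* code points that should match */
    int      n;                       /* number of members in use */
} CaseClass;

/* Hypothetical table; a real one would be generated from Unicode data. */
static const CaseClass case_classes[] = {
    {{0x03A3, 0x03C3, 0x03C2}, 3},    /* Σ, σ, ς (final sigma) */
    {{0x01C4, 0x01C5, 0x01C6}, 3},    /* DŽ, Dž (titlecase), dž */
};
#define N_CLASSES (sizeof(case_classes) / sizeof(case_classes[0]))

/* Fill out[] with all case variants of cp; returns how many were written. */
static int
all_case_variants(uint32_t cp, uint32_t *out, int outlen)
{
    if (outlen < 1)
        return 0;
    for (size_t i = 0; i < N_CLASSES; i++)
    {
        const CaseClass *cls = &case_classes[i];

        for (int j = 0; j < cls->n; j++)
        {
            if (cls->members[j] == cp)
            {
                int n = cls->n < outlen ? cls->n : outlen;

                for (int k = 0; k < n; k++)
                    out[k] = cls->members[k];
                return n;
            }
        }
    }
    out[0] = cp;                      /* not in the table: only itself */
    return 1;
}

/* Single-character case-insensitive match using the full variant set. */
static int
matches_ignoring_case(uint32_t pattern_cp, uint32_t subject_cp)
{
    uint32_t variants[MAX_VARIANTS];
    int      n = all_case_variants(pattern_cp, variants, MAX_VARIANTS);

    for (int i = 0; i < n; i++)
        if (variants[i] == subject_cp)
            return 1;
    return 0;
}
```

[Because both directions consult the same class, the match no longer
depends on which of the two characters appears in the pattern.]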
at
> we're not going to ever support newly added Unicode characters like
> Latin Glottals.
If, by "version it", you mean "update the data tables in new Postgres
versions", then I agree. If you mean that one PG version would need to
support many versions of Unicode, I don't agree.
Regards,
Jeff Davis
[5]
https://postgr.es/m/c5e9dac884332824e0797937518da0b8766c1238.ca...@j-davis.com
[6] https://www.unicode.org/policies/stability_policy.html#Case_Folding
On Wed, 2023-12-13 at 16:34 +0100, Daniel Verite wrote:
> But there are CLDR mappings on top of that.
I see, thank you.
Would it still be called "full" case mapping to only use the mappings
in SpecialCasing.txt? And would that be useful?
Regards,
Jeff Davis
ithout this additional
tailoring.
You are correct that ICU will still have some features that won't be
supported by the builtin provider. Better word boundary semantics in
INITCAP() are another advantage.
Regards,
Jeff Davis
should support case
folding.)
> And I have no idea if or when
> glibc might have picked up the new unicode characters.
That's a strong argument in favor of a builtin provider.
Regards,
Jeff Davis
n't consistent with
each other. ICU, libc, and the builtin provider will all be based on
different versions of Unicode. That's by design.
The built-in provider will be a bit better in the sense that it's
consistent with the normalization functions, and the other providers
aren't.
Regards,
Jeff Davis
rried about.
Regards,
Jeff Davis
boundary,
which I think is OK (though I think I'd need to fix the patch for when
maxalign < 8).
Regards,
Jeff Davis
From 055d5cc24404584fd98109fabdcf83348e5c49b4 Mon Sep 17 00:00:00 2001
From: Jeff Davis
Date: Mon, 18 Dec 2023 16:44:27 -0800
Subject: [PATCH v10jd] Optimize hash functi
t of place and possibly slow, and there's a
bitwise trick we can use instead.
My original test case is a bit too "macro" of a benchmark at this
point, so I'm not sure it's a good guide for these individual micro-
optimizations.
Regards,
Jeff Davis
ifferent locale at initdb time, they would be doing so
intentionally, rather than implicitly accepting index corruption risks
based on an environment variable.
Regards,
Jeff Davis
ibity,
> truly immutable and faster indexes for fields that
> don't require linguistic ordering, alignment between Unicode
> updates and Postgres updates.
Thank you, that summarizes exactly the compromise that I'm trying to
reach.
Regards,
Jeff Davis
e,
fast, stable, better semantics than "C" for many locales, and we can
document it. In any case, we don't need to decide that now. If the
builtin provider is useful, we should do it.
Regards,
Jeff Davis
e.linux.utf8.sql seems to be skipped on my
machine because of the "version() !~ 'linux-gnu'" check, even though
I'm running Ubuntu. Is that test getting run often enough?
And relatedly, is it worth thinking about extending pg_regress to
report skipped tests so it's easier to f
we'd have to consider whether it's
worth it or not. Ideally, new callers would either use the new APIs or
use the pg_ascii_* APIs.
Regards,
Jeff Davis
connect to a database with non-unicode
> encoding?
> 💥😜 ...at least it seems to be able to walk the index without decoding
> strings to find other users - but the way these global catalogs work
> scares me a little bit)
I didn't see that specific demo, but in general we seem to change
between pg_wchar and unicode code points too freely, so I'm not
surprised that something went wrong.
Regards,
Jeff Davis
On Wed, 2023-12-20 at 17:48 -0800, Jeff Davis wrote:
> Attached.
It appears to increase the coverage. I committed it and I'll see how
the buildfarm reacts.
Regards,
Jeff Davis
rministic = true);';
END
$$;
The above may need some adjustment, but perhaps you can try it out?
Another option might be to use \gset to assign it to a variable, which
might be more readable, but I think it's better to just follow what the
rest of the file is doing.
Regards,
Jeff Davis
d we look around for other unrelated
protocol changes to make at the same time? Do we want a more generic
form of negotiation?
Regards,
Jeff Davis
s://www.postgresql.org/message-id/804eb67b37f41d3afeb2b6469cbe8bfa79c562cc.ca...@j-davis.com
and the most recent patch is posted there. Having a built-in provider
is more useful if it also offers a "C.UTF-8" locale that is superior to
the libc locale of the same name.
Regards,
Jeff Davis
one idea that came up.
Regards,
Jeff Davis
ate that nondeterministic
> collations not supported.
Thank you, pushed this version. There are other similar commands in the
file, so I think it's fine. It exercises a specific locale that might
be different from datcollate.
Regards,
Jeff Davis
On Tue, 2023-09-05 at 12:08 -0700, Jeff Davis wrote:
> OK, so we could have a built-in FDW called pg_connection that would do
> the right kinds of validation; and then also allow other FDWs but the
> subscription would have to do its own validation.
Attached a rough rebased version
On Fri, 2023-12-29 at 15:22 -0800, Jeff Davis wrote:
> On Tue, 2023-09-05 at 12:08 -0700, Jeff Davis wrote:
> > OK, so we could have a built-in FDW called pg_connection that would do
> > the right kinds of validation; and then also allow other FDWs but the
> >
in
> check_search_path().
Looks good to me.
Regards,
Jeff Davis
's
not clear to me the check of the wal page headers is the right one
anyway.
It seems like all of this would be simpler if you checked first how far
you can safely read data, and then just loop and read that far. I'm not
sure that it's worth it to try to mix the validity checks