RE: Conflict detection and logging in logical replication
On Wednesday, August 14, 2024 7:02 PM Amit Kapila wrote:
> On Wed, Aug 14, 2024 at 8:05 AM Zhijie Hou (Fujitsu) wrote:
> >
> > Here is the V14 patch.
>
> Review comments:
> 1.
> ReportApplyConflict()
> {
> ...
> + ereport(elevel,
> +         errcode(ERRCODE_INTEGRITY_CONSTRAINT_VIOLATION),
> +         errmsg("conflict detected on relation \"%s.%s\": conflict=%s",
> +                get_namespace_name(RelationGetNamespace(localrel)),
> ...
>
> Is it a good idea to use ERRCODE_INTEGRITY_CONSTRAINT_VIOLATION for all
> conflicts? I think it is okay to use for insert_exists and update_exists.
> The other error codes to consider for conflicts other than insert_exists
> and update_exists are ERRCODE_T_R_SERIALIZATION_FAILURE,
> ERRCODE_CARDINALITY_VIOLATION, ERRCODE_NO_DATA, ERRCODE_NO_DATA_FOUND,
> ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION,
> ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE.
>
> BTW, even for insert/update_exists, won't it be better to use
> ERRCODE_UNIQUE_VIOLATION?

Agreed. I changed the patch to use ERRCODE_UNIQUE_VIOLATION for insert_exists
and update_exists, and ERRCODE_T_R_SERIALIZATION_FAILURE for the other
conflicts.

> Apart from the above, the attached contains some cosmetic changes.

Thanks. I have checked and merged the changes. Here is the V15 patch, which
addresses the above comments.

Best Regards,
Hou zj

Attachment: v15-0001-Detect-and-log-conflicts-in-logical-replication.patch
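For illustration, the errcode mapping agreed above could be as simple as the
following sketch. This is only an example: ConflictType and the CT_* names
are assumed here and may not match the identifiers actually used in the
patch.

    /* Sketch only: map a conflict type to the agreed SQLSTATE. */
    static int
    errcode_for_apply_conflict(ConflictType type)
    {
        switch (type)
        {
            case CT_INSERT_EXISTS:
            case CT_UPDATE_EXISTS:
                return ERRCODE_UNIQUE_VIOLATION;
            default:
                /* all other conflict types (e.g. missing-tuple cases) */
                return ERRCODE_T_R_SERIALIZATION_FAILURE;
        }
    }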
RE: Conflict detection and logging in logical replication
On Wednesday, August 14, 2024 10:15 PM Michail Nikolaev wrote: > > This is as expected, and we have documented this in the code comments. We > > don't > > need to report a conflict if the conflicting tuple has been removed or > > updated > > due to concurrent transaction. The same is true if the transaction that > > inserted the conflicting tuple is rolled back before > > CheckAndReportConflict(). > > We don't consider such cases as a conflict. > > That seems a little bit strange to me. > > From the perspective of a user, I expect that if a change from publisher is > not > applied - I need to know about it from the logs. I think this is exactly the current behavior in the patch. In the race condition we discussed, the insert will be applied if the conflicting tuple is removed concurrently before CheckAndReportConflict(). > But in that case, I will not see any information about conflict in the logs > in SOME cases. But in OTHER cases I will see it. However, in both cases the > change from publisher was not applied. And these cases are just random and > depend on the timing of race conditions. It is not something I am expecting > from the database. I think you might misunderstand the behavior of CheckAndReportConflict(), even if it found a conflict, it still inserts the tuple into the index which means the change is anyway applied. Best Regards, Hou zj
Re: ECPG cleanup and fix for clang compile-time problem
On 15.08.24 02:43, Tom Lane wrote: I wrote: Here's a rebased but otherwise identical patchset. I also added an 0007 that removes check_rules.pl as threatened. I've done some more work on this and hope to post an updated patchset tomorrow. Before that though, is there any objection to going ahead with pushing the 0001 patch (pgindent'ing ecpg's lexer and parser files)? It's pretty bulky yet of no intellectual interest, so I'd like to stop carrying it forward. The indentation patch looks good to me and it would be good to get it out of the way.
Re: Improve error message for ICU libraries if pkg-config is absent
Hi, On Wed, Aug 14, 2024 at 06:05:19PM -0700, Jeff Davis wrote: > That looks good to me. Does anyone have a different opinion? If not, > I'll go ahead and commit (but not backport) this change. What is the rationale not to backpatch this? The error message changes, but we do not translate configure output, so is this a problem/project policy to never change configure output in the back-branches? If the change is too invasive, would something like the initial patch I suggested (i.e., in the absense of pkg-config, add a more targetted error message) be acceptable for the back-branches? Michael
CREATE SUBSCRIPTION - add missing test case
Hi Hackers,

While reviewing another logical replication thread [1], I found an ERROR
scenario that seems to be untested.

TEST CASE: Attempt CREATE SUBSCRIPTION where the subscriber table is missing
some expected column(s).

Attached is a patch to add the missing test for this error message.

==
[1] https://www.postgresql.org/message-id/CAHut%2BPt5FqV7J9GnnWFRNW_Z1KOMMAZXNTRcRNdtFrfMBz_GLA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment: v1-0001-Add-missing-test-case.patch
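For readers following along, the scenario being exercised is roughly the
following sketch (table, publication and subscription names are made up; the
error itself is raised when the worker starts replicating the table, not by
the CREATE SUBSCRIPTION command as such):

    -- publisher
    CREATE TABLE tab1 (a int, b int);
    CREATE PUBLICATION pub1 FOR TABLE tab1;

    -- subscriber: deliberately missing the replicated column "b"
    CREATE TABLE tab1 (a int);
    CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres port=...' PUBLICATION pub1;
    -- replication of tab1 then fails because column "b" does not exist here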
Re: Opinion poll: Sending an automated email to a thread when it gets added to the commitfest
On Thu, 15 Aug 2024 at 01:19, Tom Lane wrote: > +1 for the basic idea. That's great to hear. > Not sure about your 3 & 4 points though, especially if that'd > mean some delay in sending the mail. For one thing, wouldn't those > links be stale as soon as the initial patch got replaced? I recently made those links predictable based on only the ID of the CF entry. So they won't get stale, and I would want to send them straight away (even if that means that they might return a 404 until cfbot actually pushes the patches to github). The links would be: https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf/$CFENTRYID https://github.com/postgresql-cfbot/postgresql/compare/cf/$CFENTRYID~1...cf/$CFENTRYID > Those links seem like they ought to live in the commitfest entry. Agreed. I'll start with that, since that should be very easy.
Re: Pgoutput not capturing the generated columns
On Tue, Jul 23, 2024 at 9:23 AM Peter Smith wrote: > > On Fri, Jul 19, 2024 at 4:01 PM Shlok Kyal wrote: > > > > On Thu, 18 Jul 2024 at 13:55, Peter Smith wrote: > > > > > > Hi, here are some review comments for v19-0002 > > > == > > > src/test/subscription/t/004_sync.pl > > > > > > 1. > > > This new test is not related to generated columns. IIRC, this is just > > > some test that we discovered missing during review of this thread. As > > > such, I think this change can be posted/patched separately from this > > > thread. > > > > > I have removed the test for this thread. > > > > I have also addressed the remaining comments for v19-0002 patch. > > Hi, I have no more review comments for patch v20-0002 at this time. > > I saw that the above test was removed from this thread as suggested, > but I could not find that any new thread was started to propose this > valuable missing test. > I still did not find any new thread for adding the missing test case, so I started one myself [1]. == [1] https://www.postgresql.org/message-id/CAHut+PtX8P0EGhsk9p=hqguhrzxecszanxsmkovyilx-ejd...@mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: Enable data checksums by default
On Wed, Aug 7, 2024 at 4:18 PM Greg Sabino Mullane wrote: > > On Wed, Aug 7, 2024 at 4:43 AM Michael Banck wrote: >> >> I think the last time we dicussed this the consensus was that >> computational overhead of computing the checksums is pretty small for >> most systems (so the above change seems warranted regardless of whether >> we switch the default), but turning on wal_compression also turns on >> wal_log_hints, which can increase WAL by quite a lot. Maybe this is [..] > > > Yeah, that seems something beyond this patch? Certainly we should mention > wal_compression in the release notes if the default changes. I mean, I feel > wal_log_hints should probably default to on as well, but I've honestly never > really given it much thought because my fingers are trained to type "initdb > -k". I've been using data checksums for roughly a decade now. I think the > only time I've NOT used checksums was when I was doing checksum overhead > measurements, or hacking on the pg_checksums program. Maybe I don't understand something, but just to be clear: wal_compression (mentioned above) is not turning wal_log_hints on, just the wal_log_hints needs to be on when using data checksums (implicitly, by the XLogHintBitIsNeeded() macro). I suppose Michael was thinking about the wal_log_hints earlier (?) -J.
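For reference, the macro mentioned above boils down to roughly the following
(paraphrased from the backend headers), which is why enabling data checksums
implies hint-bit WAL logging even when wal_log_hints itself is off:

    /* hint-bit changes must be WAL-logged if either of these is true */
    #define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)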
Re: Enable data checksums by default
Hi Greg and others On Tue, Aug 13, 2024 at 4:42 PM Greg Sabino Mullane wrote: > > On Thu, Aug 8, 2024 at 6:11 AM Peter Eisentraut wrote: > >> >> My understanding was that the reason for some hesitation about adopting data >> checksums was the performance impact. Not the checksumming itself, but the >> overhead from hint bit logging. The last time I looked into that, you could >> get performance impacts on the order of 5% tps. Maybe that's acceptable, >> and you of course can turn it off if you want the extra performance. But I >> think this should be discussed in this thread. > > > Fair enough. I think the performance impact is acceptable, as evidenced by > the large number of people that turn it on. And it is easy enough to turn it > off again, either via --no-data-checksums or pg_checksums --disable. I've > come across people who have regretted not throwing a -k into their initial > initdb, but have not yet come across someone who has the opposite regret. > When I did some measurements some time ago, I found numbers much less than > 5%, but of course it depends on a lot of factors. Same here, and +1 to data_checksums=on by default for new installations. The best public measurement of the impact was posted in [1] in 2019 by Tomas to the best of my knowledge, where he explicitly mentioned the problem with more WAL with hints/checksums: SATA disks (low IOPS). My take: now we have 2024, and most people are using at least SSDs or slow-SATA (but in cloud they could just change the class of I/O if required to get IOPS to avoid too much throttling), therefore the price of IOPS dropped significantly. >> About the claim that it's already the de-facto standard. Maybe that is >> approximately true for "serious" installations. But AFAICT, the popular >> packagings don't enable checksums by default, so there is likely a >> significant middle tier between "just trying it out" and serious >> production use that don't have it turned on. > > > I would push back on that "significant" a good bit. The number of Postgres > installations in the cloud is very likely to dwarf the total package > installations. Maybe not 10 years ago, but now? Maybe someone from Amazon can > share some numbers. Not that we have any way to compare against package > installs :) But anecdotally the number of people who mention RDS etc. on the > various fora has exploded. Same here. If it helps the case the: 43% of all PostgreSQL DBs involved in any support case or incident in EDB within last year had data_checksums=on (at least if they had collected the data using our ) . That's a surprisingly high number (for something that's off by default), and it makes me think this is because plenty of customers are either managed by DBAs who care, or assisted by consultants when deploying, or simply using TPAexec [2] which has this on by default. Another thing is plenty of people run with wal_log_hints=on (without data_checksums=off) just to have pg_rewind working. As this is a strictly standby related tool it means they don't have WAL/network bandwidth problems, so the WAL rate is not that high in the wild to cause problems. I found 1 or 2 cases within last year where we would mention that high WAL generation was attributed to wal_log_hints=on/XLOG_FPI and they still didn't disable it apparently (we have plenty of cases related to too much WAL, but it's mostly due to other basic reasons) -J. [1] - https://www.postgresql.org/message-id/20190330192543.GH4719%40development [2] - https://www.enterprisedb.com/docs/pgd/4/deployments/tpaexec/
Re: format_datum debugging function
On 14.08.24 17:46, Paul Jungwirth wrote: On 8/14/24 02:16, Peter Eisentraut wrote: On 12.08.24 23:15, Paul Jungwirth wrote: On 8/12/24 04:32, Aleksander Alekseev wrote: [...] This function takes a Datum and the appropriate out function, and returns a char *. So you can do this: (gdb) call format_datum(range_out, $1) $2 = 0x59162692d938 "[1,4)" I assume a patch like this doesn't need documentation. Does it need a test? Anything else? I think you forgot to attach the patch. Or is it just a proposal? Sorry, patch attached here. I don't think it's safe to call output functions at arbitrary points from a debugger. But if you're okay with that during development, say, then I think you could just call OidOutputFunctionCall(F_RANGE_OUT, $1)? I assumed it wasn't safe everywhere (e.g. there is potentially a TOAST lookup), but for debugging a patch that's okay with me. Are you doing something to get macro expansion? I've never gotten my gdb to see #defines, although in theory this configure line should do it, right?: Oh I see, you don't have the F_* constants available then. Maybe just put in the OID manually then?
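Expanding on the suggestion to put in the OID manually: if the F_* constants
are not visible to the debugger, one workaround is to look the OID up once in
SQL and pass it literally to OidOutputFunctionCall(). A sketch (the OID value
is a placeholder to be filled in from the query):

    -- in psql: find the OID of the type's output function
    SELECT oid FROM pg_proc WHERE proname = 'range_out';

    (gdb) call OidOutputFunctionCall(<oid from the query above>, $1)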
Re: Opinion poll: Sending an automated email to a thread when it gets added to the commitfest
On Thu, 15 Aug 2024 at 05:17, Euler Taveira wrote: > +1. Regarding the CI link, I would be good if the CF entry automatically adds > a > link to the CI run. It can be a separate field or even add it to "Links". I'm on it. I think this email should be a subset of the info on the CF entry webpage, so I'll first change the cf entry page to include all this info. > I'm not sure about 4, you can always check the latest patch in the CF entry > (it > is usually the "latest attachment") and that's what the cfbot uses to run. This is definitely a personal preference thing, but I like reading patches on GitHub much better than looking at raw patch files. It has syntax highlighting and has those little arrow buttons at the top of a diff, to show more context about the file. I realized a 5th thing that I would want in the email and cf entry page 5. A copy-pastable set of git command that checks out the patch by downloading it from the cfbot repo like this: git config branch.cf/5107.remote https://github.com/postgresql-cfbot/postgresql.git git config branch.cf/5107.merge refs/heads/cf/5107 git checkout -b cf/5107 git pull > If I understand your proposal correctly, there will be another email to the > thread if the previous CF was closed and someone opened a new CF entry. > Sometimes some CF entries are about the same thread. Yeah, that's correct. If a new CF entry is created for an existing thread a new email would be sent. But to be clear, if CF entry is pushed to the next commitfest, **no** new email would be sent.
Re: Enable data checksums by default
On Thu, Aug 15, 2024 at 09:49:04AM +0200, Jakub Wartak wrote: > On Wed, Aug 7, 2024 at 4:18 PM Greg Sabino Mullane wrote: > > > > On Wed, Aug 7, 2024 at 4:43 AM Michael Banck wrote: > >> > >> I think the last time we dicussed this the consensus was that > >> computational overhead of computing the checksums is pretty small for > >> most systems (so the above change seems warranted regardless of whether > >> we switch the default), but turning on wal_compression also turns on > >> wal_log_hints, which can increase WAL by quite a lot. Maybe this is > [..] > > > > > > Yeah, that seems something beyond this patch? Certainly we should > > mention wal_compression in the release notes if the default changes. > > I mean, I feel wal_log_hints should probably default to on as well, > > but I've honestly never really given it much thought because my > > fingers are trained to type "initdb -k". I've been using data > > checksums for roughly a decade now. I think the only time I've NOT > > used checksums was when I was doing checksum overhead measurements, > > or hacking on the pg_checksums program. > > Maybe I don't understand something, but just to be clear: > wal_compression (mentioned above) is not turning wal_log_hints on, > just the wal_log_hints needs to be on when using data checksums > (implicitly, by the XLogHintBitIsNeeded() macro). I suppose Michael > was thinking about the wal_log_hints earlier (?) Uh, I am pretty sure I meant to say "turning on data_checksums als turns on wal_log_hints", sorry about the confusion. I guess the connection is that if you turn on wal_lot_hints (either directly or via data_checksums) then the number FPIs goes up (possibly signficantly), and enabling wal_compression could (partly) remedy that. But I agree with Greg that such a discussion is probably out-of-scope for this default change. Michael
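For anyone wanting to experiment with the trade-off being discussed here, the
relevant knobs can be checked and toggled like this (pg_checksums requires
the cluster to be cleanly shut down):

    # does this cluster have data checksums?
    psql -Atc "SHOW data_checksums"

    # enable/disable them offline on an existing cluster
    pg_checksums --enable -D /path/to/datadir
    pg_checksums --disable -D /path/to/datadir

    # or decide at initdb time
    initdb -k -D /path/to/newdir        # -k / --data-checksums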
Re: Thread-safe nl_langinfo() and localeconv()
Here's a new patch to add to this pile, this time for check_locale(). I also improved the error handling and comments in the other patches. From 4831ff4373b9d713c78e303d9758de347aadfc2f Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 13 Aug 2024 14:15:54 +1200 Subject: [PATCH v3 1/3] Provide thread-safe pg_localeconv_r(). This involves four different implementation strategies: 1. For Windows, we now require _configthreadlocale() to be available and work, and the documentation says that the object returned by localeconv() is in thread-local memory. 2. For glibc, we translate to nl_langinfo_l() calls, because it offers the same information that way as an extension, and that API is thread-safe. 3. For macOS/*BSD, use localeconv_l(), which is thread-safe. 4. For everything else, use uselocale() to set the locale for the thread, and use a big ugly lock to defend against the returned object being concurrently clobbered. In practice this currently means only Solaris. The new call is used in pg_locale.c, replacing calls to setlocale() and localeconv(). This patch adds a hard requirement on Windows' _configthreadlocale(). In the past there were said to be MinGW systems that didn't have it, or had it but it didn't work. As far as I can tell, CI (optional MinGW task + mingw cross build warning task) and build farm (fairywren msys) should be happy with this. Fingers crossed. (There are places that use configure checks for that in ECPG; other proposed patches would remove those later.) Reviewed-by: Heikki Linnakangas Discussion: https://postgr.es/m/CA%2BhUKGJqVe0%2BPv9dvC9dSums_PXxGo9SWcxYAMBguWJUGbWz-A%40mail.gmail.com --- configure | 2 +- configure.ac | 1 + meson.build | 1 + src/backend/utils/adt/pg_locale.c | 128 ++- src/include/pg_config.h.in| 3 + src/include/port.h| 6 + src/port/Makefile | 1 + src/port/meson.build | 1 + src/port/pg_localeconv_r.c| 367 ++ 9 files changed, 402 insertions(+), 108 deletions(-) create mode 100644 src/port/pg_localeconv_r.c diff --git a/configure b/configure index 2abbeb27944..60dcf1e436e 100755 --- a/configure +++ b/configure @@ -15232,7 +15232,7 @@ fi LIBS_including_readline="$LIBS" LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'` -for ac_func in backtrace_symbols copyfile copy_file_range getifaddrs getpeerucred inet_pton kqueue mbstowcs_l memset_s posix_fallocate ppoll pthread_is_threaded_np setproctitle setproctitle_fast strchrnul strsignal syncfs sync_file_range uselocale wcstombs_l +for ac_func in backtrace_symbols copyfile copy_file_range getifaddrs getpeerucred inet_pton localeconv_l kqueue mbstowcs_l memset_s posix_fallocate ppoll pthread_is_threaded_np setproctitle setproctitle_fast strchrnul strsignal syncfs sync_file_range uselocale wcstombs_l do : as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh` ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var" diff --git a/configure.ac b/configure.ac index c46ed2c591a..59e51a74629 100644 --- a/configure.ac +++ b/configure.ac @@ -1735,6 +1735,7 @@ AC_CHECK_FUNCS(m4_normalize([ getifaddrs getpeerucred inet_pton + localeconv_l kqueue mbstowcs_l memset_s diff --git a/meson.build b/meson.build index cd711c6d018..028a14547aa 100644 --- a/meson.build +++ b/meson.build @@ -2675,6 +2675,7 @@ func_checks = [ ['inet_aton'], ['inet_pton'], ['kqueue'], + ['localeconv_l'], ['mbstowcs_l'], ['memset_s'], ['mkdtemp'], diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c index cd3661e7279..dd4ba9e0e89 100644 --- a/src/backend/utils/adt/pg_locale.c +++ 
b/src/backend/utils/adt/pg_locale.c @@ -543,12 +543,8 @@ PGLC_localeconv(void) static struct lconv CurrentLocaleConv; static bool CurrentLocaleConvAllocated = false; struct lconv *extlconv; - struct lconv worklconv; - char *save_lc_monetary; - char *save_lc_numeric; -#ifdef WIN32 - char *save_lc_ctype; -#endif + struct lconv tmp; + struct lconv worklconv = {0}; /* Did we do it already? */ if (CurrentLocaleConvValid) @@ -562,77 +558,21 @@ PGLC_localeconv(void) } /* - * This is tricky because we really don't want to risk throwing error - * while the locale is set to other than our usual settings. Therefore, - * the process is: collect the usual settings, set locale to special - * setting, copy relevant data into worklconv using strdup(), restore - * normal settings, convert data to desired encoding, and finally stash - * the collected data in CurrentLocaleConv. This makes it safe if we - * throw an error during encoding conversion or run out of memory anywhere - * in the process. All data pointed to by struct lconv members is - * allocated with strdup, to avoid premature elog(ERROR) and to allow - * using a single cleanup routine. + * Use thread-safe method of obtainin
RE: [BUG?] check_exclusion_or_unique_constraint false negative
On Monday, August 12, 2024 7:11 PM Michail Nikolaev wrote: > > In my test, if the tuple is updated and new tuple is in the same page, > > heapam_index_fetch_tuple should find the new tuple using HOT chain. So, > > it's a > > bit unclear to me how the updated tuple is missing. Maybe I missed some > > other > > conditions for this issue. > > Yeah, I think the pgbench-based reproducer may also cause page splits in > btree. > But we may add an index to the table to disable HOT. > > I have attached a reproducer for this case using a spec and injection points. > > I hope it helps, check the attached files. Thanks a lot for the steps! I successfully reproduced the issue you mentioned in the context of logical replication[1]. As you said, it could increase the possibility of tuple missing when applying updates or deletes in the logical apply worker. I think this is a long-standing issue and I will investigate the fix you proposed. In addition, I think the bug is not a blocker for the conflict detection feature. As the feature simply reports the current behavior of the logical apply worker (either unique violation or tuple missing) without introducing any new functionality. Furthermore, I think that the new ExecCheckIndexConstraints call after ExecInsertIndexTuples() is not affected by the dirty snapshot bug. This is because a tuple has already been inserted into the btree before the dirty snapshot scan, which means that a concurrent non-HOT update would not be possible (it would be blocked after finding the just inserted tuple and wait for the apply worker to commit the current transaction). It would be good if others could also share their opinion on this. [1] The steps to reproduce the tuple missing in logical replication. 1. setup pub/sub env, and publish a table with 1 row. pub: CREATE TABLE t(a int primary key, b int); INSERT INTO t VALUES(1,1); CREATE PUBLICATION pub FOR TABLE t; sub: CREATE TABLE t (a int primary key, b int check (b < 5)); CREATE INDEX t_b_idx ON t(b); CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres port=$port_publisher' PUBLICATION pub; 2. Execute an UPDATE(UPDATE t set b = b + 1) on the publisher and use gdb to stop the apply worker at the point after index_getnext_tid() and before index_fetch_heap(). 3. execute a concurrent update(UPDATE t set b = b + 100) on the subscriber to update a non-key column value and commit the update. 4. release the apply worker and it would report the update_missing conflict. Best Regards, Hou zj
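To make step 2 above concrete, the gdb part could look roughly like this (the
PID is that of the subscriber's logical replication apply worker;
index_fetch_heap() is the call that follows index_getnext_tid() in the scan
loop):

    $ gdb -p <apply worker PID>
    (gdb) break index_fetch_heap
    (gdb) continue
    # now run the UPDATE on the publisher; when the breakpoint fires, the
    # apply worker is stopped after index_getnext_tid() but before the heap
    # fetch, which is the window needed for step 3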
Re: Remaining dependency on setlocale()
On Thu, Aug 15, 2024 at 11:00 AM Jeff Davis wrote: > On Thu, 2024-08-15 at 10:43 +1200, Thomas Munro wrote: > > So I think the solution could perhaps be something like: in some > > early > > startup phase before there are any threads, we nail down all the > > locale categories to "C" (or whatever we decide on for the permanent > > global locale), and also query the "" categories and make a copy of > > them in case anyone wants them later, and then never call setlocale() > > again. > > +1. We currently nail down these categories: /* We keep these set to "C" always. See pg_locale.c for explanation. */ init_locale("LC_MONETARY", LC_MONETARY, "C"); init_locale("LC_NUMERIC", LC_NUMERIC, "C"); init_locale("LC_TIME", LC_TIME, "C"); CF #5170 has patches to make it so that we stop changing them even transiently, using locale_t interfaces to feed our caches of stuff needed to work with those categories, so they really stay truly nailed down. It sounds like someone needs to investigate doing the same thing for these two, from CheckMyDatabase(): if (pg_perm_setlocale(LC_COLLATE, collate) == NULL) ereport(FATAL, (errmsg("database locale is incompatible with operating system"), errdetail("The database was initialized with LC_COLLATE \"%s\", " " which is not recognized by setlocale().", collate), errhint("Recreate the database with another locale or install the missing locale."))); if (pg_perm_setlocale(LC_CTYPE, ctype) == NULL) ereport(FATAL, (errmsg("database locale is incompatible with operating system"), errdetail("The database was initialized with LC_CTYPE \"%s\", " " which is not recognized by setlocale().", ctype), errhint("Recreate the database with another locale or install the missing locale."))); How should that work? Maybe we could imagine something like MyDatabaseLocale, a locale_t with LC_COLLATE and LC_CTYPE categories set appropriately. Or should it be a pg_locale_t instead (if your database default provider is ICU, then you don't even need a locale_t, right?). Then I think there is one quite gnarly category, from assign_locale_messages() (a GUC assignment function): (void) pg_perm_setlocale(LC_MESSAGES, newval); I have never really studied gettext(), but I know it was just standardised in POSIX 2024, and the standardised interface has _l() variants of all functions. Current implementations don't have them yet. Clearly we absolutely couldn't call pg_perm_setlocale() after early startup -- but if gettext() is relying on the current locale to affect far away code, then maybe this is one place where we'd just have to use uselocale(). Perhaps we could plan some transitional strategy where NetBSD users lose the ability to change the GUC without restarting the server and it has to be the same for all sessions, or something like that, until they produce either gettext_l() or uselocale(), but I haven't thought hard about this part at all yet...
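To sketch the MyDatabaseLocale idea in code (purely illustrative; the
variable name, error handling and placement are assumptions, not existing
code), the database default could be pinned once with newlocale() instead of
pg_perm_setlocale():

    /* Illustrative sketch only -- not existing PostgreSQL code. */
    static locale_t MyDatabaseLocale;

    static void
    init_database_locale(const char *collate, const char *ctype)
    {
        locale_t    loc;

        loc = newlocale(LC_COLLATE_MASK, collate, (locale_t) 0);
        if (loc == (locale_t) 0)
            ereport(FATAL,
                    (errmsg("could not create locale \"%s\"", collate)));
        loc = newlocale(LC_CTYPE_MASK, ctype, loc);
        if (loc == (locale_t) 0)
            ereport(FATAL,
                    (errmsg("could not create locale \"%s\"", ctype)));
        MyDatabaseLocale = loc;
    }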
Re: On non-Windows, hard depend on uselocale(3)
On Wed, Aug 14, 2024 at 11:17 AM Tristan Partin wrote:
> Thanks for picking this up. I think your patch looks really good.

Thanks for looking!

> Are you familiar with gcc's function poisoning?
>
> #include <stdio.h>
> #pragma GCC poison puts
>
> int main(){
> #pragma GCC bless begin puts
>     puts("a");
> #pragma GCC bless end puts
> }
>
> I wonder if we could use function poisoning to our advantage. For
> instance in ecpg, it looks like you got all of the strtod() invocations
> and replaced them with strtod_l(). Here is a patch with an example of
> what I'm talking about.

Thanks, this looks very useful.
Re: Vacuum statistics
On 15.8.24 11:49, Alena Rybakina wrote: I have some suggestions: 1. pgstatfuncs.c in functions tuplestore_put_for_database() and tuplestore_put_for_relation you can remove 'nulls' array if you're sure that columns cannot be NULL. We need to use this for tuplestore_putvalues function. With this function, we fill the table with the values of the statistics. Ah, right! I'm sorry. 1. 2. These functions are almost the same and I would think of writing one function depending of type 'ExtVacReportType' I'm not sure that I fully understand what you mean. Can you explain it more clearly, please? Ah, I didn't notice that the size of all three tables is different. Therefore, it won't be possible to write one function instead of two to avoid code duplication. My mistake.
Re: Optimize mul_var() for var1ndigits >= 8
On Wed, 14 Aug 2024 at 07:31, Joel Jacobson wrote: > > I think this is acceptable, since it produces more correct results. > Thanks for checking. I did a bit more testing myself and didn't see any problems, so I have committed both these patches. > In addition, I've traced the rscale_adjustment -15 mul_var() calls to > originate > from numeric_exp() and numeric_power(), so I thought it would be good to > brute-force test those as well, to get an idea of the probability of different > results from those functions. > > Brute-force testing of course doesn't prove it's impossible to happen, > but millions of inputs didn't cause any observable differences in the > returned results, so I think it's at least very improbable to > happen in practice. > Indeed, there certainly will be cases where the result changes. I saw some with ln(), for which HEAD rounded the final digit the wrong way, and the result is now correct, but the opposite cannot be ruled out either, since these functions are inherently inexact. The aim is to have them generate the correctly rounded result in the vast majority of cases, while accepting an occasional off-by-one error in the final digit. Having them generate the correct result in all cases is certainly possible, but would require a fair bit of additional code that probably isn't worth the effort. In my testing, exp() rounded the final digit incorrectly with a probability of roughly 1 in 50-100 million when computing results with a handful of digits (consistent with the "+8" digits added to "sig_digits"), rising to roughly 1 in 5-10 million when computing around 1000 digits (presumably because we don't take into account the number of Taylor series terms when deciding on the local rscale). That wasn't affected significantly by the patch, and it's not surprising that you saw nothing with brute-force testing. In any case, I'm pretty content with those results. Regards, Dean
Re: [Patch] remove duplicated smgrclose
On Wed, Aug 14, 2024 at 2:35 PM Steven Niu wrote: > > Junwang, Kirill, > > The split work has been done. I created a new patch for removing redundant > smgrclose() function as attached. > Please help review it. Patch looks good, actually you can make the refactoring code as v3-0002-xxx by using: git format-patch -2 -v 3 Another kind reminder: Do not top-post when you reply ;) > > Thanks, > Steven > > Steven Niu 于2024年8月12日周一 18:11写道: >> >> Kirill, >> >> Good catch! >> I will split the patch into two to cover both cases. >> >> Thanks, >> Steven >> >> >> Junwang Zhao 于2024年8月9日周五 18:19写道: >>> >>> On Fri, Aug 9, 2024 at 5:20 PM Kirill Reshke wrote: >>> > >>> > On Thu, 1 Aug 2024 at 17:32, Junwang Zhao wrote: >>> > > >>> > > Hi Steven, >>> > > >>> > > On Wed, Jul 31, 2024 at 11:16 AM Steven Niu wrote: >>> > > > >>> > > > Hello, hackers, >>> > > > >>> > > > I think there may be some duplicated codes. >>> > > > Function smgrDoPendingDeletes() calls both smgrdounlinkall() and >>> > > > smgrclose(). >>> > > > But both functions would close SMgrRelation object, it's dupliacted >>> > > > behavior? >>> > > > >>> > > > So I make this patch. Could someone take a look at it? >>> > > > >>> > > > Thanks for your help, >>> > > > Steven >>> > > > >>> > > > From Highgo.com >>> > > > >>> > > > >>> > > You change LGTM, but the patch seems not to be applied to HEAD, >>> > > I generate the attached v2 using `git format` with some commit message. >>> > > >>> > > -- >>> > > Regards >>> > > Junwang Zhao >>> > >>> > Hi all! >>> > This change looks good to me. However, i have an objection to these >>> > lines from v2: >>> > >>> > > /* Close the forks at smgr level */ >>> > > - for (forknum = 0; forknum <= MAX_FORKNUM; forknum++) >>> > > - smgrsw[which].smgr_close(rels[i], forknum); >>> > > + smgrclose(rels[i]); >>> > >>> > Why do we do this? This seems to be an unrelated change given thread >>> > $subj. This is just a pure refactoring job, which deserves a separate >>> > patch. There is similar coding in >>> > smgrdestroy function: >>> > >>> > ``` >>> > for (forknum = 0; forknum <= MAX_FORKNUM; forknum++) >>> > smgrsw[reln->smgr_which].smgr_close(reln, forknum); >>> > ``` >>> > >>> > So, I feel like these two places should be either changed together or >>> > not be altered at all. And is it definitely a separate change. >>> >>> Yeah, I tend to agree with you, maybe we should split the patch >>> into two. >>> >>> Steven, could you do this? >>> >>> > >>> > -- >>> > Best regards, >>> > Kirill Reshke >>> >>> >>> >>> -- >>> Regards >>> Junwang Zhao -- Regards Junwang Zhao
Re: [PATCH] Add get_bytes() and set_bytes() functions
On Thu, 15 Aug 2024 at 05:20, Joel Jacobson wrote: > > On Wed, Aug 14, 2024, at 19:25, Joel Jacobson wrote: > > What do we want to happen if passing a numeric with decimal digits, > > to decimal_to_bytes()? It must be an error, right? > > > > Example: SELECT decimal_to_bytes(1.23); > > Hmm, an error feels quite ugly on second thought. > Would be nicer if all numerics could be represented, > > But then what about Inf,-Inf,NaN? > Perhaps we should also add casts between bytea and the integer/numeric types. That might be easier to use than functions in some circumstances. When casting a numeric to an integer, the result is rounded to the nearest integer, and NaN/Inf generate errors, so we should probably do the same here. Regards, Dean
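For reference, the existing numeric-to-integer cast behavior that the
proposed bytea casts would presumably mirror looks like this:

    SELECT 1.7::numeric::int;         -- 2 (rounded to nearest)
    SELECT 'NaN'::numeric::int;       -- ERROR: cannot convert NaN to integer
    SELECT 'Infinity'::numeric::int;  -- ERROR: cannot convert infinity to integer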
replace magic num in struct cachedesc with CATCACHE_MAXKEYS
Hi hackers,

I noticed that there is a magic number which can be replaced by
CATCACHE_MAXKEYS in struct cachedesc. I checked some other structs like
CatCache and CatCTup, and they all use CATCACHE_MAXKEYS.

I did some searching on pgsql-hackers and found an old thread [0] in which
Robert proposed to change the maximum number of keys for a syscache from 4 to
5. It seems to me that the *five-key syscaches* feature is not necessary,
since the idea is 14 years old and we still use 4 keys without any problems
(I might be wrong). However, that patch contains a change that seems
reasonable:

--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -95,7 +95,7 @@ struct cachedesc
 	Oid			reloid;			/* OID of the relation being cached */
 	Oid			indoid;			/* OID of index relation for this cache */
 	int			nkeys;			/* # of keys needed for cache lookup */
-	int			key[4];			/* attribute numbers of key attrs */
+	int			key[CATCACHE_MAXKEYS];	/* attribute numbers of key attrs */
 	int			nbuckets;		/* number of hash buckets for this cache */
 };

[0]: https://www.postgresql.org/message-id/flat/603c8f071003281532t5e6c68eex458825485d4fcd98%40mail.gmail.com

--
Regards
Junwang Zhao

Attachment: v1-0001-replace-magic-num-with-CATCACHE_MAXKEYS.patch
Re: [PATCH] Add get_bytes() and set_bytes() functions
Hi, > Perhaps we should also add casts between bytea and the integer/numeric > types. That might be easier to use than functions in some > circumstances. > > When casting a numeric to an integer, the result is rounded to the > nearest integer, and NaN/Inf generate errors, so we should probably do > the same here. Yes, I was also thinking about adding NUMERIC versions of get_bytes() / set_bytes(). This would allow converting more than 8 bytes to/from an integer. I dropped this idea because I thought there would be not much practical use for it. On the flip side you never know who uses Postgres and for what purpose. I will add corresponding casts unless the idea will get a push-back from the community. IMO the existence of these casts will at least not make things worse. -- Best regards, Aleksander Alekseev
Re: Thread-safe nl_langinfo() and localeconv()
On 15/08/2024 11:03, Thomas Munro wrote:
> Here's a new patch to add to this pile, this time for check_locale().
> I also improved the error handling and comments in the other patches.

There's a similar function in initdb, check_locale_name. I wonder if that
could reuse the same code.

I wonder if it would be more clear to split this into three functions:

/*
 * Get the name of the locale in "native environment",
 * like setlocale(category, NULL) does
 */
char *get_native_locale(int category);

/*
 * Return true if 'locale' is valid locale name for 'category'
 */
bool check_locale(int category, const char *locale);

/*
 * Return a canonical name of given locale
 */
char *canonicalize_locale(int category, const char *locale);

    result = malloc(strlen(canonical) + 1);
    if (!result)
        goto exit;
    strcpy(result, canonical);

Could use "result = strdup(canonical)" here. Or even better, could we use
palloc()/pstrdup() here, and save the caller the trouble to copy it?

--
Heikki Linnakangas
Neon (https://neon.tech)
Re: Skip adding row-marks for non target tables when result relation is foreign table.
On Thu, Aug 15, 2024 at 9:56 AM Jeff Davis wrote: > Is there any sample code that implements late locking for a FDW? I'm > not quite clear on how it's supposed to work. See the patch in [1]. It would not apply to HEAD anymore, though. Best regards, Etsuro Fujita [1] https://www.postgresql.org/message-id/16016.1431455...@sss.pgh.pa.us
Re: Enable data checksums by default
Hi all,

On Tue, Aug 13, 2024 at 10:08 PM Robert Haas wrote:
> And it's not like we have statistics anywhere that you can look at to
> see how much CPU time you spent computing checksums, so if a user DOES
> have a performance problem that would not have occurred if checksums
> had been disabled, they'll probably never know it.

In the worst case, per-second and per-pid CPU time consumption could be
quantified with eBPF, which is the standard on distros now (requires kernel
headers and bpfcc-tools installed). E.g. here 7918 was the PID doing a
pgbench-related -c 4 workload with checksums=on (sorry for the formatting,
but I don't want to use HTML here):

# funclatency-bpfcc --microseconds -i 1 -p 7918 /usr/lib/postgresql/16/bin/postgres:pg_checksum_page
Tracing 1 functions for "/usr/lib/postgresql/16/bin/postgres:pg_checksum_page"... Hit Ctrl-C to end.

     usecs       : count
     0 -> 1      : 0
     2 -> 3      : 238
     4 -> 7      : 714
     8 -> 15     : 2
     16 -> 31    : 5
     32 -> 63    : 0
     64 -> 127   : 1
     128 -> 255  : 0
     256 -> 511  : 1
     512 -> 1023 : 1

avg = 6 usecs, total: 6617 usecs, count: 962

     usecs       : count
     0 -> 1      : 0
     2 -> 3      : 241
     4 -> 7      : 706
     8 -> 15     : 11
     16 -> 31    : 10
     32 -> 63    : 1

avg = 5 usecs, total: 5639 usecs, count: 969

[..refreshes every 1s here..]

So the above can tell us e.g. that this pg_checksum_page() took 5639 us out
of a 1 s full sample time (and with a 100% CPU pegged core that gives again
~5% CPU util for this routine; I'm ignoring the WAL/log hint impact for
sure). One could also write a small script using bpftrace instead, too.
Disassembly on the Debian version and stock PGDG is telling me it's full SSE2
instruction set, so that's nice and optimal too.

> > > For those uses, this change would render pg_upgrade useless for upgrades
> > > from an old instance with default settings to a new instance with default
> > > settings. And then users would either need to re-initdb with checksums
> > > turned back off, or I suppose run pg_checksums on the old instance before
> > > upgrading? This is significant additional complication.
> >
> > Meh, re-running initdb with --no-data-checksums seems a fairly low hurdle.
>
> I tend to agree with that, but I would also like to see the sort of
> improvements that Peter mentions. [..]
> None of that is to say that I'm totally hostile to this change. [..]
> Whether that's worth the overhead for everyone, I'm not quite sure.

Without data checksums there's a risk that someone receives silent bit
corruption and no one will notice. Shouldn't integrity of data stand above
performance by default, in this case? (And performance could be opt-in, if
someone really wants it.)

-J.
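As a follow-up to the bpftrace remark above, an equivalent ad-hoc latency
histogram could be collected with something like the following one-liner (a
sketch only; binary path and PID as in the funclatency example above):

    bpftrace -p 7918 -e '
    uprobe:/usr/lib/postgresql/16/bin/postgres:pg_checksum_page { @start[tid] = nsecs; }
    uretprobe:/usr/lib/postgresql/16/bin/postgres:pg_checksum_page /@start[tid]/ {
        @usecs = hist((nsecs - @start[tid]) / 1000);
        delete(@start[tid]);
    }'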
Re: Improve error message for ICU libraries if pkg-config is absent
On 15.08.24 09:20, Michael Banck wrote: On Wed, Aug 14, 2024 at 06:05:19PM -0700, Jeff Davis wrote: That looks good to me. Does anyone have a different opinion? If not, I'll go ahead and commit (but not backport) this change. What is the rationale not to backpatch this? The error message changes, but we do not translate configure output, so is this a problem/project policy to never change configure output in the back-branches? If the change is too invasive, would something like the initial patch I suggested (i.e., in the absense of pkg-config, add a more targetted error message) be acceptable for the back-branches? But it's not just changing an error message, it's changing the logic that leads to the error message. Have we really thought through all the combinations here? I don't know. So committing for master and then seeing if there is any further feedback seems most prudent. (I'm not endorsing either patch version here, just commenting on the process.)
Re: Remaining dependency on setlocale()
On 15.08.24 00:43, Thomas Munro wrote: "" is a problem however... the special value for "native environment" is returned as a real locale name, which we probably still need in places. We could change that to newlocale("") + query instead, but Where do we need that in the server? It should just be initdb doing that and then initializing the server with concrete values based on that. I guess technically some of these GUC settings default to the environment? But I think we could consider getting rid of that.
Re: Opinion poll: Sending an automated email to a thread when it gets added to the commitfest
On 15.08.24 00:40, Jelte Fennema-Nio wrote: I'd like to send an automatic mail to a thread whenever it gets added to a commitfest. Since this would impact everyone that's subscribed to the mailinglist I'd like some feedback on this. This mail would include: 1. A very short blurb like: "This thread was added to the commitfest with ID 1234" How would you attach such an email to a thread? Where in the thread would you attach it? I'm not quite sure how well that would work. The main reason I'm proposing this is that currently it's not trivial to go from an email in my inbox to the commitfest entry. Maybe there could be a feature in the commitfest app to enter a message ID and get the commitfest entries that contain that message.
Re: Opinion poll: Sending an automated email to a thread when it gets added to the commitfest
On 15.08.24 09:59, Jelte Fennema-Nio wrote: I realized a 5th thing that I would want in the email and cf entry page 5. A copy-pastable set of git command that checks out the patch by downloading it from the cfbot repo like this: git config branch.cf/5107.remote https://github.com/postgresql-cfbot/postgresql.git git config branch.cf/5107.merge refs/heads/cf/5107 git checkout -b cf/5107 git pull Maybe this kind of thing should rather be on the linked-to web page, not in every email. But a more serious concern here is that the patches created by the cfbot are not canonical. There are various heuristics when they get applied. I would prefer that people work with the actual patches sent by email, at least unless they know exactly what they are doing. We don't want to create parallel worlds of patches that are like 90% similar but not really identical.
Re: Opinion poll: Sending an automated email to a thread when it gets added to the commitfest
On Thu, 15 Aug 2024 at 15:28, Peter Eisentraut wrote: > How would you attach such an email to a thread? Where in the thread > would you attach it? I'm not quite sure how well that would work. My idea would be to have the commitfest app send it in reply to the message id that was entered in the "New patch" page: https://commitfest.postgresql.org/open/new/ > Maybe there could be a feature in the commitfest app to enter a message > ID and get the commitfest entries that contain that message. That's a good idea.
Re: Parallel CREATE INDEX for BRIN indexes
On 13.04.24 23:04, Tomas Vondra wrote:
>> While preparing a differential code coverage report between 16 and HEAD,
>> one thing that stands out is the parallel brin build code. Neither on
>> coverage.postgresql.org nor locally is that code reached during our tests.
>
> Thanks for pointing this out, it's definitely something that I need to
> improve (admittedly, should have been part of the patch). I'll also look
> into eliminating the difference between BTREE and BRIN parallel builds,
> mentioned in my last message in this thread.
>
> Here's a couple patches adding a test for the parallel CREATE INDEX with
> BRIN. The actual test is 0003/0004 - I added the test to pageinspect,
> because that allows cross-checking the index to one built without
> parallelism, which I think is better than just doing CREATE INDEX without
> properly testing it produces correct results.

These pageinspect tests added a new use of the md5() function. We got rid of
those in the tests for PG17. You could write the test case with something
like

     SELECT (CASE WHEN (mod(i,231) = 0) OR (i BETWEEN 3500 AND 4000) THEN NULL ELSE i END),
-           (CASE WHEN (mod(i,233) = 0) OR (i BETWEEN 3750 AND 4250) THEN NULL ELSE md5(i::text) END),
+           (CASE WHEN (mod(i,233) = 0) OR (i BETWEEN 3750 AND 4250) THEN NULL ELSE encode(sha256(i::text::bytea), 'hex') END),
            (CASE WHEN (mod(i,233) = 0) OR (i BETWEEN 3850 AND 4500) THEN NULL ELSE (i/100) + mod(i,8) END)

But this changes the test output slightly and I'm not sure if this gives you
the data distribution that you need for your test. Could you check this
please?
Re: Opinion poll: Sending an automated email to a thread when it gets added to the commitfest
On Thu, 15 Aug 2024 at 15:33, Peter Eisentraut wrote: > Maybe this kind of thing should rather be on the linked-to web page, not > in every email. Yeah, I'll first put a code snippet on the page for the commitfest entry. > But a more serious concern here is that the patches created by the cfbot > are not canonical. There are various heuristics when they get applied. > I would prefer that people work with the actual patches sent by email, > at least unless they know exactly what they are doing. We don't want to > create parallel worlds of patches that are like 90% similar but not > really identical. I'm not really sure what kind of heuristics and resulting differences you're worried about here. The heuristics it uses are very simple and are good enough for our CI. Basically they are: 1. Unzip/untar based on file extension 2. Apply patches using "patch" in alphabetic order Also, when I apply patches myself, I use heuristics too. And my heuristics are probably different from yours. So I'd expect that many people using the exact same heuristic would only make the situation better. Especially because if people don't know exactly what they are doing, then their heuristics are probably not as good as the one of our cfbot. I know I've struggled a lot the first few times when I was manually applying patches.
Re: format_datum debugging function
Peter Eisentraut writes: > On 14.08.24 17:46, Paul Jungwirth wrote: >> Are you doing something to get macro expansion? I've never gotten my gdb >> to see #defines, although in theory this configure line should do it, >> right?: > Oh I see, you don't have the F_* constants available then. Maybe just > put in the OID manually then? That's pretty illegible and error-prone. I agree that writing the output function's C name is noticeably better. However, I would call the result DirectOutputFunctionCall and put it near OidOutputFunctionCall in fmgr.c. It's not like we don't already have a naming convention and many instances of this. (Also, now that I look at the code, I wonder why it looks so little like any of the existing DirectFunctionCall functions.) regards, tom lane
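A hedged sketch of what the suggested function might look like, following the
existing OidOutputFunctionCall()/DirectFunctionCall conventions in fmgr.c
(this is not committed code, just an illustration of the naming proposal):

    /* Sketch only: call a datatype output function by its C symbol. */
    char *
    DirectOutputFunctionCall(PGFunction func, Datum datum)
    {
        return DatumGetCString(DirectFunctionCall1(func, datum));
    }

    /* usage from gdb, e.g.:  call DirectOutputFunctionCall(range_out, $1) */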
Re: Alias of VALUES RTE in explain plan
On Mon, Jul 1, 2024 at 3:17 PM Ashutosh Bapat wrote:
> Hi All,
> While reviewing Richard's patch for grouping sets, I stumbled upon
> following explain output
>
> explain (costs off)
> select distinct on (a, b) a, b
> from (values (1, 1), (2, 2)) as t (a, b) where a = b
> group by grouping sets((a, b), (a))
> order by a, b;
>                         QUERY PLAN
> ----------------------------------------------------------
>  Unique
>    ->  Sort
>          Sort Key: "*VALUES*".column1, "*VALUES*".column2
>          ->  HashAggregate
>                Hash Key: "*VALUES*".column1, "*VALUES*".column2
>                Hash Key: "*VALUES*".column1
>                ->  Values Scan on "*VALUES*"
>                      Filter: (column1 = column2)
> (8 rows)
>
> There is no VALUES.column1 and VALUES.column2 in the query. The alias t.a
> and t.b do not appear anywhere in the explain output. I think explain
> output should look like
>
> explain (costs off)
> select distinct on (a, b) a, b
> from (values (1, 1), (2, 2)) as t (a, b) where a = b
> group by grouping sets((a, b), (a))
> order by a, b;
>                         QUERY PLAN
> ----------------------------------------------------------
>  Unique
>    ->  Sort
>          Sort Key: t.a, t.b
>          ->  HashAggregate
>                Hash Key: t.a, t.b
>                Hash Key: t.a
>                ->  Values Scan on "*VALUES*" t
>                      Filter: (a = b)
> (8 rows)
>
> I didn't get time to figure out the reason behind this, nor the history.
> But I thought I would report it nonetheless.

I have looked into the issue and found that when subqueries are pulled up, a
modifiable copy of the subquery is created for modification in the
pull_up_simple_subquery() function. During this process,
flatten_join_alias_vars() is called to flatten any join alias variables in
the subquery's target list. However, at this point we lose the subquery's
alias.

If you/hackers agree with my findings, I can provide a working patch soon.

> --
> Best Wishes,
> Ashutosh Bapat
Re: Opinion poll: Sending an automated email to a thread when it gets added to the commitfest
On Thu, Aug 15, 2024 at 12:40:29AM +0200, Jelte Fennema-Nio wrote: > I'd like to send an automatic mail to a thread whenever it gets added > to a commitfest. Since this would impact everyone that's subscribed to > the mailinglist I'd like some feedback on this. This mail would > include: I haven't thought too much about the details, but in general, +1 for sending links to cfest/cfbot entries to the thread. -- nathan
Re: Adding clarification to description of IPC wait events XactGroupUpdate and ProcArrayGroupUpdate
On Thu, Aug 15, 2024 at 11:25:25AM +0800, SAMEER KUMAR wrote: > I think it is important to indicate that the group leader is responsible > for clearing the transaction ID/transaction status of other backends > (including this one). Your proposal is Waiting for the group leader process to clear the transaction ID of this backend at the end of a transaction. Waiting for the group leader process to update the transaction status for this backend. Mine is Waiting for the group leader to clear the transaction ID at transaction end. Waiting for the group leader to update transaction status at transaction end. IMHO the latter doesn't convey substantially less information, and it fits a little better with the terse style of the other wait events nearby. But I'll yield to majority opinion here. -- nathan
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Wed, Aug 14, 2024 at 2:04 PM Jelte Fennema-Nio wrote: > Applied all 0002 feedback. Although I used Min(proto, > PG_PROTOCOL_LATEST) because Max results in the wrong value. Picky, picky. :-) Committed. > Makes sense. I'm not in too much of a hurry with this specific one. So > I'll leave it like this for now and hopefully someone else responds. > If this becomes close to being the final unmerged patch of this > patchset, I'll probably cut my losses and create a patch that adds a > function instead. Maybe reorder the series to put that one later then. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Remove dependence on integer wrapping
Hello Joe, 05.08.2024 02:55, Joseph Koshakow wrote: On Fri, Jun 14, 2024 at 8:00 AM Alexander Lakhin wrote: > > And the most interesting case to me: > SET temp_buffers TO 10; > > CREATE TEMP TABLE t(i int PRIMARY KEY); > INSERT INTO t VALUES(1); > ... Alex, are you able to get a full stack trace for this panic? I'm unable to reproduce this because I don't have enough memory in my system. I've tried reducing `BLCKSZ` to 1024, which is the lowest value allowed per my understanding, and I still don't have enough memory. Yes, please take a look at it (sorry for the late reply): (gdb) bt #0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140438687430464) at ./nptl/pthread_kill.c:44 #1 __pthread_kill_internal (signo=6, threadid=140438687430464) at ./nptl/pthread_kill.c:78 #2 __GI___pthread_kill (threadid=140438687430464, signo=signo@entry=6) at ./nptl/pthread_kill.c:89 #3 0x7fba70025476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 #4 0x7fba7000b7f3 in __GI_abort () at ./stdlib/abort.c:79 #5 0x563945aed511 in __addvsi3 () #6 0x563945a6c106 in init_htab (hashp=0x563947700980, nelem=10) at dynahash.c:720 #7 0x563945a6bd22 in hash_create (tabname=0x563945c591d9 "Local Buffer Lookup Table", nelem=10, info=0x7ffd4d394620, flags=40) at dynahash.c:567 #8 0x5639457f2760 in el () at localbuf.c:635 #9 0x5639457f19e3 in ExtendBufferedRelLocal (bmr=..., fork=MAIN_FORKNUM, flags=8, extend_by=1, extend_upto=4294967295, buffers=0x7ffd4d3948e0, extended_by=0x7ffd4d3947ac) at localbuf.c:326 #10 0x5639457e8851 in ExtendBufferedRelCommon (bmr=..., fork=MAIN_FORKNUM, strategy=0x0, flags=8, extend_by=1, extend_upto=4294967295, buffers=0x7ffd4d3948e0, extended_by=0x7ffd4d39488c) at bufmgr.c:2175 #11 0x5639457e6850 in ExtendBufferedRelBy (bmr=..., fork=MAIN_FORKNUM, strategy=0x0, flags=8, extend_by=1, buffers=0x7ffd4d3948e0, extended_by=0x7ffd4d39488c) at bufmgr.c:923 #12 0x5639452d8ae6 in RelationAddBlocks (relation=0x7fba650abd78, bistate=0x0, num_pages=1, use_fsm=true, did_unlock=0x7ffd4d394a3d) at hio.c:341 #13 0x5639452d944a in RelationGetBufferForTuple (relation=0x7fba650abd78, len=32, otherBuffer=0, options=0, bistate=0x0, vmbuffer=0x7ffd4d394ac4, vmbuffer_other=0x0, num_pages=1) at hio.c:767 #14 0x5639452be996 in heap_insert (relation=0x7fba650abd78, tup=0x5639476ecfc0, cid=0, options=0, bistate=0x0) at heapam.c:2019 #15 0x5639452cee84 in heapam_tuple_insert (relation=0x7fba650abd78, slot=0x5639476ecf30, cid=0, options=0, bistate=0x0) at heapam_handler.c:251 #16 0x5639455b3b07 in table_tuple_insert (rel=0x7fba650abd78, slot=0x5639476ecf30, cid=0, options=0, bistate=0x0) at ../../../src/include/access/tableam.h:1405 #17 0x5639455b5c60 in ExecInsert (context=0x7ffd4d394d20, resultRelInfo=0x5639476ec390, slot=0x5639476ecf30, canSetTag=true, inserted_tuple=0x0, insert_destrel=0x0) at nodeModifyTable.c:1139 #18 0x5639455ba942 in ExecModifyTable (pstate=0x5639476ec180) at nodeModifyTable.c:4077 #19 0x563945575425 in ExecProcNodeFirst (node=0x5639476ec180) at execProcnode.c:469 #20 0x563945568095 in ExecProcNode (node=0x5639476ec180) at ../../../src/include/executor/executor.h:274 #21 0x56394556af65 in ExecutePlan (estate=0x5639476ebf00, planstate=0x5639476ec180, use_parallel_mode=false, operation=CMD_INSERT, sendTuples=false, numberTuples=0, direction=ForwardScanDirection, dest=0x5639476f5470, execute_once=true) at execMain.c:1646 #22 0x5639455687e3 in standard_ExecutorRun (queryDesc=0x5639476f3e70, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:363 #23 
0x5639455685b9 in ExecutorRun (queryDesc=0x5639476f3e70, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:304 #24 0x56394584986e in ProcessQuery (plan=0x5639476f5310, sourceText=0x56394760d610 "INSERT INTO t VALUES(1);", params=0x0, queryEnv=0x0, dest=0x5639476f5470, qc=0x7ffd4d395180) at pquery.c:160 #25 0x56394584b445 in PortalRunMulti (portal=0x56394768ab20, isTopLevel=true, setHoldSnapshot=false, dest=0x5639476f5470, altdest=0x5639476f5470, qc=0x7ffd4d395180) at pquery.c:1278 #26 0x56394584a93c in PortalRun (portal=0x56394768ab20, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x5639476f5470, altdest=0x5639476f5470, qc=0x7ffd4d395180) at pquery.c:791 #27 0x563945842fd9 in exec_simple_query (query_string=0x56394760d610 "INSERT INTO t VALUES(1);") at postgres.c:1284 #28 0x563945848536 in PostgresMain (dbname=0x563947644900 "regression", username=0x5639476448e8 "law") at postgres.c:4766 #29 0x56394583eb67 in BackendMain (startup_data=0x7ffd4d395404 "", startup_data_len=4) at backend_startup.c:107 #30 0x56394574e00e in postmaster_child_launch (child_type=B_BACKEND, startup_data=0x7ffd4d395404 "", startup_data_len=4, client_sock=0x7ffd4d395450) at launch_backend.c:274 #31 0x563945753f7
Re: generic plans and "initial" pruning
On Thu, Aug 15, 2024 at 8:57 AM Amit Langote wrote: > TBH, it's more of a hunch that people who are not involved in this > development might find the new reality, whereby the execution is not > racefree until ExecutorRun(), hard to reason about. I'm confused by what you mean here by "racefree". A race means multiple sessions are doing stuff at the same time and the result depends on who does what first, but the executor stuff is all backend-private. Heavyweight locks are not backend-private, but those would be taken in ExectorStart(), not ExecutorRun(), IIUC. > With the patch, CreateQueryDesc() and ExecutorStart() are moved to > PortalStart() so that QueryDescs including the PlanState trees for all > queries are built before any is run. Why? So that if ExecutorStart() > fails for any query in the list, we can simply throw out the QueryDesc > and the PlanState trees of the previous queries (NOT run them) and ask > plancache for a new CachedPlan for the list of queries. We don't have > a way to ask plancache.c to replan only a given query in the list. I agree that moving this from PortalRun() to PortalStart() seems like a bad idea, especially in view of what you write below. > * There's no longer CCI() between queries in PortalRunMulti() because > the snapshots in each query's QueryDesc must have been adjusted to > reflect the correct command counter. I've checked but can't really be > sure if the value in the snapshot is all anyone ever uses if they want > to know the current value of the command counter. I don't think anything stops somebody wanting to look at the current value of the command counter. I also don't think you can remove the CommandCounterIncrement() calls between successive queries, because then they won't see the effects of earlier calls. So this sounds broken to me. Also keep in mind that one of the queries could call a function which does something that bumps the command counter again. I'm not sure if that creates its own hazzard separate from the lack of CCIs, or whether it's just another part of that same issue. But you can't assume that each query's snapshot should have a command counter value one more than the previous query. While this all seems bad for the partially-initialized-execution-tree approach, I wonder if you don't have problems here with the other design, too. Let's say you've the multi-query case and there are 2 queries. The first one (Q1) is SELECT mysterious_function() and the second one (Q2) is SELECT * FROM range_partitioned_table WHERE key_column = 42. What if mysterious_function() performs DDL on range_partitioned_table? I haven't tested this so maybe there are things going on here that prevent trouble, but it seems like executing Q1 can easily invalidate the plan for Q2. And then it seems like you're basically back to the same problem. > > > 3. The need to add *back* the fields to store the RT indexes of > > > relations that are not looked at by ExecInitNode() traversal such as > > > root partitioned tables and non-leaf partitions. > > > > I don't remember exactly why we removed those or what the benefit was, > > so I'm not sure how big of a problem it is if we have to put them > > back. > > We removed those in commit 52ed730d511b after commit f2343653f5b2 > removed redundant execution-time locking of non-leaf relations. So we > removed them because we realized that execution time locking is > unnecessary given that AcquireExecutorLocks() exists and now we want > to add them back because we'd like to get rid of > AcquireExecutorLocks(). 
:-) My bias is to believe that getting rid of AcquireExecutorLocks() is probably the right thing to do, but that's not a strongly-held position and I could be totally wrong about it. The thing is, though, that AcquireExecutorLocks() is fundamentally stupid, and it's hard to see how it can ever be any smarter. If we want to make smarter decisions about what to lock, it seems reasonable to me to think that the locking code needs to be closer to code that can evaluate expressions and prune partitions and stuff like that. -- Robert Haas EDB: http://www.enterprisedb.com
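To make the Q1/Q2 scenario discussed above concrete, here is a minimal SQL sketch (object names are hypothetical, borrowed from the example above; the plpgsql body is invented for illustration) of a setup where executing the first query of a multi-query portal would invalidate the plan of the second:

CREATE TABLE range_partitioned_table (key_column int) PARTITION BY RANGE (key_column);
CREATE TABLE rpt_p1 PARTITION OF range_partitioned_table FOR VALUES FROM (0) TO (100);

CREATE FUNCTION mysterious_function() RETURNS int LANGUAGE plpgsql AS $$
BEGIN
    -- DDL on the table used by the next query in the portal list; this sends
    -- invalidations that would apply to any plan already built for Q2
    EXECUTE 'ALTER TABLE range_partitioned_table DETACH PARTITION rpt_p1';
    RETURN 1;
END;
$$;

-- Sent as a single simple-query message (two statements, one portal list):
--   SELECT mysterious_function(); SELECT * FROM range_partitioned_table WHERE key_column = 42;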
Re: long-standing data loss bug in initial sync of logical replication
On Thu, 8 Aug 2024 at 16:24, Shlok Kyal wrote: > > On Wed, 31 Jul 2024 at 11:17, Shlok Kyal wrote: > > > > On Wed, 31 Jul 2024 at 09:36, Amit Kapila wrote: > > > > > > On Wed, Jul 31, 2024 at 3:27 AM Masahiko Sawada > > > wrote: > > > > > > > > On Wed, Jul 24, 2024 at 9:53 PM Amit Kapila > > > > wrote: > > > > > > > > > > On Wed, Jul 17, 2024 at 5:25 PM vignesh C wrote: > > > > > > > > > > > > On Wed, 17 Jul 2024 at 11:54, Amit Kapila > > > > > > wrote: > > > > > > > > > > > > > > On Tue, Jul 16, 2024 at 6:54 PM vignesh C > > > > > > > wrote: > > > > > > > > > > > > > > BTW, I noticed that we don't take any table-level locks for Create > > > > > > > Publication .. For ALL TABLES (and Drop Publication). Can that > > > > > > > create > > > > > > > a similar problem? I haven't tested so not sure but even if there > > > > > > > is a > > > > > > > problem for the Create case, it should lead to some ERROR like > > > > > > > missing > > > > > > > publication. > > > > > > > > > > > > I tested these scenarios, and as you expected, it throws an error > > > > > > for > > > > > > the create publication case: > > > > > > 2024-07-17 14:50:01.145 IST [481526] 481526 ERROR: could not > > > > > > receive > > > > > > data from WAL stream: ERROR: publication "pub1" does not exist > > > > > > CONTEXT: slot "sub1", output plugin "pgoutput", in the > > > > > > change > > > > > > callback, associated LSN 0/1510CD8 > > > > > > 2024-07-17 14:50:01.147 IST [481450] 481450 LOG: background worker > > > > > > "logical replication apply worker" (PID 481526) exited with exit > > > > > > code > > > > > > 1 > > > > > > > > > > > > The steps for this process are as follows: > > > > > > 1) Create tables in both the publisher and subscriber. > > > > > > 2) On the publisher: Create a replication slot. > > > > > > 3) On the subscriber: Create a subscription using the slot created > > > > > > by > > > > > > the publisher. > > > > > > 4) On the publisher: > > > > > > 4.a) Session 1: BEGIN; INSERT INTO T1; > > > > > > 4.b) Session 2: CREATE PUBLICATION FOR ALL TABLES > > > > > > 4.c) Session 1: COMMIT; > > > > > > > > > > > > Since we are throwing out a "publication does not exist" error, > > > > > > there > > > > > > is no inconsistency issue here. > > > > > > > > > > > > However, an issue persists with DROP ALL TABLES publication, where > > > > > > data continues to replicate even after the publication is dropped. > > > > > > This happens because the open transaction consumes the invalidation, > > > > > > causing the publications to be revalidated using old snapshot. As a > > > > > > result, both the open transactions and the subsequent transactions > > > > > > are > > > > > > getting replicated. > > > > > > > > > > > > We can reproduce this issue by following these steps in a logical > > > > > > replication setup with an "ALL TABLES" publication: > > > > > > On the publisher: > > > > > > Session 1: BEGIN; INSERT INTO T1 VALUES (val1); > > > > > > In another session on the publisher: > > > > > > Session 2: DROP PUBLICATION > > > > > > Back in Session 1 on the publisher: > > > > > > COMMIT; > > > > > > Finally, in Session 1 on the publisher: > > > > > > INSERT INTO T1 VALUES (val2); > > > > > > > > > > > > Even after dropping the publication, both val1 and val2 are still > > > > > > being replicated to the subscriber. This means that both the > > > > > > in-progress concurrent transaction and the subsequent transactions > > > > > > are > > > > > > being replicated. 
> > > > > > > > > > > > I don't think locking all tables is a viable solution in this case, > > > > > > as > > > > > > it would require asking the user to refrain from performing any > > > > > > operations on any of the tables in the database while creating a > > > > > > publication. > > > > > > > > > > > > > > > > Indeed, locking all tables in the database to prevent concurrent DMLs > > > > > for this scenario also looks odd to me. The other alternative > > > > > previously suggested by Andres is to distribute catalog modifying > > > > > transactions to all concurrent in-progress transactions [1] but as > > > > > mentioned this could add an overhead. One possibility to reduce > > > > > overhead is that we selectively distribute invalidations for > > > > > catalogs-related publications but I haven't analyzed the feasibility. > > > > > > > > > > We need more opinions to decide here, so let me summarize the problem > > > > > and solutions discussed. As explained with an example in an email [1], > > > > > the problem related to logical decoding is that it doesn't process > > > > > invalidations corresponding to DDLs for the already in-progress > > > > > transactions. We discussed preventing DMLs in the first place when > > > > > concurrent DDLs like ALTER PUBLICATION ... ADD TABLE ... are in > > > > > progress. The solution discussed was to acquire > > > > > ShareUpdateExclusiveLock for all the tables being added via such > > > > > commands. Further
Re: optimizing pg_upgrade's once-in-each-database steps
On Sat, Aug 10, 2024 at 10:35:46AM -0500, Nathan Bossart wrote: > Another option might be to combine all the queries for a task into a single > string and then send that in one PQsendQuery() call. That may be a simpler > way to eliminate the time between queries. I tried this out and didn't see any difference in my tests. However, the idea seems sound, and I could remove ~40 lines of code by doing this and by making the search_path query an implicit first step (instead of its own state). So, here's a v9 of the patch set with those changes. -- nathan >From 8d8577b3dd1bbdbe584816dfb9045a5988862412 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Fri, 28 Jun 2024 11:02:44 -0500 Subject: [PATCH v9 01/11] introduce framework for parallelizing pg_upgrade tasks --- src/bin/pg_upgrade/Makefile | 1 + src/bin/pg_upgrade/async.c | 425 +++ src/bin/pg_upgrade/meson.build | 1 + src/bin/pg_upgrade/pg_upgrade.h | 14 + src/tools/pgindent/typedefs.list | 4 + 5 files changed, 445 insertions(+) create mode 100644 src/bin/pg_upgrade/async.c diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile index bde91e2beb..3bc4f5d740 100644 --- a/src/bin/pg_upgrade/Makefile +++ b/src/bin/pg_upgrade/Makefile @@ -12,6 +12,7 @@ include $(top_builddir)/src/Makefile.global OBJS = \ $(WIN32RES) \ + async.o \ check.o \ controldata.o \ dump.o \ diff --git a/src/bin/pg_upgrade/async.c b/src/bin/pg_upgrade/async.c new file mode 100644 index 00..18a0b01f79 --- /dev/null +++ b/src/bin/pg_upgrade/async.c @@ -0,0 +1,425 @@ +/* + * async.c + * parallelization via libpq's async APIs + * + * This framework provides an efficient way of running the various + * once-in-each-database tasks required by pg_upgrade. Specifically, it + * parallelizes these tasks by managing a simple state machine of + * user_opts.jobs slots and using libpq's asynchronous APIs to establish the + * connections and run the queries. Callers simply need to create a callback + * function and build/execute an AsyncTask. A simple example follows: + * + * static void + * my_process_cb(DbInfo *dbinfo, PGresult *res, void *arg) + * { + * for (int i = 0; i < PQntuples(res); i++) + * { + * ... process results ... + * } + * } + * + * void + * my_task(ClusterInfo *cluster) + * { + * AsyncTask *task = async_task_create(); + * + * async_task_add_step(task, + * "... query text ...", + * my_process_cb, + * true, // let the task free the PGresult + * NULL); // "arg" pointer for the callbacks + * async_task_run(task, cluster); + * async_task_free(task); + * } + * + * Note that multiple steps can be added to a given task. When there are + * multiple steps, the task will run all of the steps consecutively in the same + * database connection before freeing the connection and moving on. In other + * words, it only ever initiates one connection to each database in the + * cluster for a given run. + * + * Copyright (c) 2024, PostgreSQL Global Development Group + * src/bin/pg_upgrade/async.c + */ + +#include "postgres_fe.h" + +#include "common/connect.h" +#include "fe_utils/string_utils.h" +#include "pg_upgrade.h" + +/* + * dbs_complete stores the number of databases that we have completed + * processing. When this value equals the number of databases in the cluster, + * the task is finished. + */ +static int dbs_complete; + +/* + * dbs_processing stores the index of the next database in the cluster's array + * of databases that will be picked up for processing. It will always be + * greater than or equal to dbs_complete. 
+ */ +static int dbs_processing; + +/* + * This struct stores all the information for a single step of a task. All + * steps in a task are run in a single connection before moving on to the next + * database (which requires a new connection). + */ +typedef struct AsyncTaskCallbacks +{ + AsyncTaskProcessCB process_cb; /* processes the results of the query */ + const char *query; /* query text */ + boolfree_result;/* should we free the result? */ + void *arg;/* pointer passed to each callback */ +} AsyncTaskCallbacks; + +/* + * This struct is a thin wrapper around an array of steps, i.e., + * AsyncTaskCallbacks. + */ +typedef struct AsyncTask +{ + AsyncTaskCallbacks *cbs; + int num_cb_sets; +} AsyncTask; + +
Re: Make query cancellation keys longer
I'm back to working on the main patch here, to make cancellation keys longer. New rebased version attached, with all the FIXMEs and TODOs from the earlier version fixed. There was a lot of bitrot, too. The first patch now introduces timingsafe_bcmp(), a function borrowed from OpenBSD to perform a constant-time comparison. There's a configure check to use the function from the OS if it's available, and the patch includes a copy of OpenBSD's implementation otherwise. Similar functions exist with different names in OpenSSL (CRYPTO_memcmp) and NetBSD (consttime_memequal), but it's a pretty simple function so I don't think we need to work too hard to pick up those other native implementations. I used it for checking if the cancellation key matches, now that it's not a single word anymore. It feels paranoid to worry about timing attacks here; a few instructions are unlikely to give enough signal to an attacker, and query cancellation is not a very interesting target anyway. But better safe than sorry. You can still get information about whether a backend with the given PID exists at all; the constant-time comparison only applies to comparing the key. We probably should be using this in some other places in the backend, but I haven't gone around looking for them. Hmm, looking at the current code, I do agree that not introducing a new message would simplify both client and server implementation. Now clients need to do different things depending on if the server supports 3.1 or 3.0 (sending CancelRequestExtended instead of CancelRequest and having to parse BackendKeyData differently). And I also agree that the extra length field doesn't add much in regards to sanity checking (for the CancelRequest and BackendKeyData message types at least). So, sorry for the back and forth on this, but I now agree with you that we should not add the length field. I think one reason I didn't see the benefit before was because the initial patch 0002 was still introducing a CancelRequestExtended message type. If we can get rid of this message type by not adding a length, then I think that's worth losing the sanity checking. Ok, I went back to the original scheme that just redefines the secret key in the CancelRequest message to be variable length, with the length deduced from the message length. We could teach libpq to disconnect and reconnect with minor protocol version 3.0, if connecting with 3.1 fails, but IMHO it's better to not complicate this and accept the break in backwards-compatibility. Yeah, implementing automatic reconnection seems a bit overkill to me too. But it might be nice to add a connection option that causes libpq to use protocol 3.0. Having to install an old libpq to connect to an old server seems quite annoying. Added a "protocol_version" libpq option for that. It defaults to "auto", but you can set it to "3.1" or "3.0" to force the version. It makes it easier to test that the backwards-compatibility works, too. Especially since I think that many other types of servers that implement the postgres protocol have not implemented the minor version negotiation. I at least know PgBouncer[1] and pgcat[2] have not, but probably other server implementations like CockroachDB and Google Spanner have this problem too. Good point.
-- Heikki Linnakangas Neon (https://neon.tech) From 5fb2a442efca75dbc384bafee486b090f91806ba Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Thu, 15 Aug 2024 15:49:46 +0300 Subject: [PATCH v5 1/2] Add timingsafe_bcmp(), for constant-time memory comparison timingsafe_bcmp() should be used instead of memcmp() or a naive for-loop, when comparing passwords or secret tokens, to avoid leaking information about the secret token by timing. This commit just introduces the function but does not change any existing code to use it yet. --- configure | 23 +++ configure.ac | 3 ++- meson.build| 2 ++ src/include/port.h | 4 src/port/meson.build | 1 + src/port/timingsafe_bcmp.c | 35 +++ 6 files changed, 67 insertions(+), 1 deletion(-) create mode 100644 src/port/timingsafe_bcmp.c diff --git a/configure b/configure index 2abbeb2794..1fd832bd5b 100755 --- a/configure +++ b/configure @@ -15758,6 +15758,16 @@ fi cat >>confdefs.h <<_ACEOF #define HAVE_DECL_STRSEP $ac_have_decl _ACEOF +ac_fn_c_check_decl "$LINENO" "timingsafe_bcmp" "ac_cv_have_decl_timingsafe_bcmp" "$ac_includes_default" +if test "x$ac_cv_have_decl_timingsafe_bcmp" = xyes; then : + ac_have_decl=1 +else + ac_have_decl=0 +fi + +cat >>confdefs.h <<_ACEOF +#define HAVE_DECL_TIMINGSAFE_BCMP $ac_have_decl +_ACEOF # We can't use AC_CHECK_FUNCS to detect these functions, because it @@ -15918,6 +15928,19 @@ esac fi +ac_fn_c_check_func "$LINENO" "timingsafe_bcmp" "ac_cv_func_timingsafe_bcmp" +if test "x$ac_cv_func_timingsafe_bcmp" = xyes; then : + $as_echo "#define HAVE_TIMINGSAFE_
Re: Opinion poll: Sending an automated email to a thread when it gets added to the commitfest
(sorry for the formatting, my mobile phone doesn't have the capabilities I usually get when using my laptop) On Thu, 15 Aug 2024, 16:02 Jelte Fennema-Nio, wrote: > On Thu, 15 Aug 2024 at 15:33, Peter Eisentraut > wrote: > > Maybe this kind of thing should rather be on the linked-to web page, not > > in every email. > > Yeah, I'll first put a code snippet on the page for the commitfest entry. > > > But a more serious concern here is that the patches created by the cfbot > > are not canonical. There are various heuristics when they get applied. > > I would prefer that people work with the actual patches sent by email, > > at least unless they know exactly what they are doing. We don't want to > > create parallel worlds of patches that are like 90% similar but not > > really identical. > > I'm not really sure what kind of heuristics and resulting differences > you're worried about here. The heuristics it uses are very simple and > are good enough for our CI. Basically they are: > 1. Unzip/untar based on file extension > 2. Apply patches using "patch" in alphabetic order > > Also, when I apply patches myself, I use heuristics too. And my > heuristics are probably different from yours. So I'd expect that many > people using the exact same heuristic would only make the situation > better. Especially because if people don't know exactly what they are > doing, then their heuristics are probably not as good as the one of > our cfbot. I know I've struggled a lot the first few times when I was > manually applying patches. One serious issue with this is that in cases of apply failures, CFBot delays, or other issues, the CFBot repo won't contain the latest version of the series' patchsets. E.g. a hacker can accidentally send an incremental patch, or an unrelated patch to fix an issue mentioned in the thread without splitting into a different thread, etc. This can easily cause users (and CFBot) to test and review the wrong patch, esp. when the mail thread proper is not looked at by the reviewer, which would be somewhat promoted by a CFA+github-centric workflow. Apart from the above issue, I'm -0.5 on what to me equates with automated spam to -hackers: the volume of mails would put this around the 16th most common sender on -hackers, with about 400 mails/year (based on 80 new patches for next CF, and 5 CFs/year, combined with Robert's 2023 statistics at [0]). I also don't quite like the suggested contents of such a mail: (1) and (2) are essentially duplicative information, and because CF entries' IDs are not shown in the app the "with ID " part of (1) is practically useless (better use the CFE's title), (3) would best be stored and/or integrated in the CFA, as would (4). Additionally, (4) isn't canonical/guaranteed to be up-to-date, see above. As for the "copy-pastable git commands" suggestion, I'm not sure that's applicable, for the same reasons that (4) won't work reliably. CFBot's repo to me seems more like an internal implementation detail of CFBot than an authoritative source of patchset diffs. Maybe we could instead automate CF mail thread registration by allowing registration of threadless CF entries (as 'preliminary'), and detecting (and subsequently linking) new threads containing references to those CF entries, with e.g. a "CF: https://commitfest.postgresql.org/49/4980/" directive in the new thread's initial mail's text. This would give the benefits of requiring no second mail for CF referencing purposes, be it automated or manual.
Alternatively, we could allow threads for new entries to be started through the CF app (which would automatically insert the right form data into the mail), providing an alternative avenue to registering patches that doesn't have the chicken-and-egg problem you're trying to address here. Kind regards, Matthias van de Meent Neon (https://neon.tech) [0] https://rhaas.blogspot.com/2024/01/who-contributed-to-postgresql.html
Re: Reducing the log spam
On Thu, 25 Jul 2024 at 18:03, Laurenz Albe wrote: > Thanks for the review! > > On Wed, 2024-07-24 at 15:27 +0200, Rafia Sabih wrote: > > I liked the idea for this patch. I will also go for the default being > > an empty string. > > I went through this patch and have some comments on the code, > > > > 1. In general, I don't like the idea of goto, maybe we can have a > > free_something function to call here. > > The PostgreSQL code base has over 3000 goto's... > > Sure, that can be factored out to a function (except the final "return"), > but I feel that a function for three "free" calls is code bloat. > > On a detailed look over this, you are right Laurenz about this. > Do you think that avoiding the goto and using a function would make the > code simpler and clearer? > > > 2. > > if (!SplitIdentifierString(new_copy, ',', &states)) > > { > > GUC_check_errdetail("List syntax is invalid."); > > goto failed; > > } > > Here, we don't need all that free-ing, we can just return false here. > > I am OK with changing that; I had thought it was more clearer and more > foolproof to use the same pattern everywhere. > > > 3. > > /* > > * Check the the values are alphanumeric and convert them to upper case > > * (SplitIdentifierString converted them to lower case). > > */ > > for (p = state; *p != '\0'; p++) > > if (*p >= 'a' && *p <= 'z') > > *p += 'A' - 'a'; > > else if (*p < '0' || *p > '9') > > { > > GUC_check_errdetail("error codes can only contain digits and ASCII > letters."); > > goto failed; > > } > > I was thinking, maybe we can use tolower() function here. > > That is a good idea, but these C library respect the current locale. > I would have to explicitly specify the C locale or switch the locale > temporarily. > Hmm. actually I don't have any good answers to this locale issue. > > Switching the locale seems clumsy, and I have no idea what I would have > to feed as second argument to toupper_l() or isalnum_l(). > Do you have an idea? > > > 4. > > list_free(states); > > pfree(new_copy); > > > > *extra = statelist; > > return true; > > > > failed: > > list_free(states); > > pfree(new_copy); > > guc_free(statelist); > > return false; > > > > This looks like duplication that can be easily avoided. > > You may have free_somthing function to do all free-ing stuff only and > > its callee can then have a return statement. > > e.g for here, > > free_states(states, new_copy, statelist); > > return true; > > That free_states() function would just contain two function calls. > I think that defining a special function for that is somewhat out of > proportion. > > > 5. Also, for alphanumeric check, maybe we can have something like, > > if (isalnum(*state) == 0) > > { > > GUC_check_errdetail("error codes can only contain digits and ASCII > letters."); > > goto failed; > > } > > and we can do this in the beginning after the len check. > > isalnum() operates on a single character and depends on the current locale. > See my comments to 3. above. > > > Please let me know what you think, particularly about the locale problem. > > Yours, > Laurenz Albe > -- Regards, Rafia Sabih
Re: Remove dependence on integer wrapping
On Thu, Aug 15, 2024 at 02:56:00PM +0800, jian he wrote: > i am confused with > " > +#elif defined(HAVE_INT128) > + uint128 res = -((int128) a); > " > I thought "unsigned" means non-negative, therefore uint128 means non-negative. > therefore "int128 res = -((int128) a);" makes sense to me. Ah, that's a typo, thanks for pointing it out. > also in HAVE_INT128 branch > do we need cast int128 to int64, like > > *result = (int64) res; I don't think we need an explicit cast here since *result is known to be an int64. But it certainly wouldn't hurt anything... -- nathan
Re: Partial aggregates pushdown
On Sun, Jul 7, 2024 at 09:52:27PM +, fujii.y...@df.mitsubishielectric.co.jp wrote: > Hi Bruce. > > > From: Bruce Momjian > > Is there a reason the documentation is no longer a part of this patch? > > Can I help you keep it current? > > Here are the reasons: > Reason1. The approach differs significantly from the previous patch that > included documentation, the below. > > https://www.postgresql.org/message-id/attachment/152086/0001-Partial-aggregates-push-down-v34.patch > Reason2. I have two options for transmitting the state value and I'm > evaluating which one is optimal. >One is what I presented you in PGConf.dev2024. The other is Jelte's one. >He listened to my talk at the conference and gave me some useful comments > on hackers. I'm very grateful that. > Reason3. The implementation and test have been not finished yet. > Regarding Reason 2, I provided my conclusion in the previous message. > > My plan for advancing the patch involves the following steps: > Step1. Decide the approach on transmitting state value. > Step2. Implement code (including comments) and tests to support a subset of > aggregate functions. > Specifically, I plan to support avg, sum, and other aggregate functions > like min and max which don't need export/import functions. > Step3. Add documentations. > > To clarify my approach, should I proceed with Step 3 before Step2? > I would appreciate your feedback on this. Thanks, I now understand why the docs were removed, and I agree. I will post about the options now in a new email. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: POC, WIP: OR-clause support for indexes
Hi! On 07.08.2024 04:11, Alexander Korotkov wrote: On Mon, Aug 5, 2024 at 11:24 PM Alena Rybakina wrote: Ok, thank you for your work) I think we can leave only the two added libraries in the first patch, the others are superfluous. Thank you. I also have fixed some grammar issues. While reviewing the patch, I couldn't understand one part of the code where we check the comparability of restrictinfos. /* RestrictInfo parameters must match parent */ if (subRinfo->is_pushed_down != rinfo->is_pushed_down || subRinfo->is_clone != rinfo->is_clone || subRinfo->security_level != rinfo->security_level || !bms_equal(subRinfo->required_relids, rinfo->required_relids) || !bms_equal(subRinfo->incompatible_relids, rinfo->incompatible_relids) || !bms_equal(subRinfo->outer_relids, rinfo->outer_relids)) return NULL; I didn't find a place in the optimizer where required_relids, incompatible_relids and outer_relids become different. Each make_restrictinfo function takes arguments from parent data. I disabled this check and the regression tests passed. Is this code needed for security verification? Could you clarify? In the last patch I corrected the libraries - one of them was not in alphabetical order. -- Regards, Alena Rybakina Postgres Professional: http://www.postgrespro.com The Russian Postgres Company From 0d4e4d89f32496c692c43ec6b2d5a33b906f6697 Mon Sep 17 00:00:00 2001 From: Alexander Korotkov Date: Mon, 5 Aug 2024 21:27:02 +0300 Subject: [PATCH 1/2] Transform OR-clauses to SAOP's during index matching Replace "(indexkey op C1) OR (indexkey op C2) ... (indexkey op CN)" with "indexkey op ANY(ARRAY[C1, C2, ...])" (ScalarArrayOpExpr node) during matching a clause to index. Here Ci is an i-th constant or parameters expression, 'expr' is non-constant expression, 'op' is an operator which returns boolean result and has a commuter (for the case of reverse order of constant and non-constant parts of the expression, like 'Cn op expr'). This transformation allows handling long OR-clauses with single IndexScan avoiding slower bitmap scans. 
Discussion: https://postgr.es/m/567ED6CA.2040504%40sigaev.ru Author: Alena Rybakina Author: Andrey Lepikhov Reviewed-by: Peter Geoghegan Reviewed-by: Ranier Vilela Reviewed-by: Alexander Korotkov Reviewed-by: Robert Haas Reviewed-by: Jian He Reviewed-by: Tom Lane Reviewed-by: Nikolay Shaplov --- src/backend/optimizer/path/indxpath.c | 253 + src/test/regress/expected/create_index.out | 183 +-- src/test/regress/expected/join.out | 57 - src/test/regress/sql/create_index.sql | 42 src/test/regress/sql/join.sql | 9 + 5 files changed, 522 insertions(+), 22 deletions(-) diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c index c0fcc7d78df..cbfb0fdb3c8 100644 --- a/src/backend/optimizer/path/indxpath.c +++ b/src/backend/optimizer/path/indxpath.c @@ -32,8 +32,10 @@ #include "optimizer/paths.h" #include "optimizer/prep.h" #include "optimizer/restrictinfo.h" +#include "utils/array.h" #include "utils/lsyscache.h" #include "utils/selfuncs.h" +#include "utils/syscache.h" /* XXX see PartCollMatchesExprColl */ @@ -177,6 +179,10 @@ static IndexClause *match_rowcompare_to_indexcol(PlannerInfo *root, RestrictInfo *rinfo, int indexcol, IndexOptInfo *index); +static IndexClause *match_orclause_to_indexcol(PlannerInfo *root, + RestrictInfo *rinfo, + int indexcol, + IndexOptInfo *index); static IndexClause *expand_indexqual_rowcompare(PlannerInfo *root, RestrictInfo *rinfo, int indexcol, @@ -2248,6 +2254,10 @@ match_clause_to_indexcol(PlannerInfo *root, { return match_rowcompare_to_indexcol(root, rinfo, indexcol, index); } + else if (restriction_is_or_clause(rinfo)) + { + return match_orclause_to_indexcol(root, rinfo, indexcol, index); + } else if (index->amsearchnulls && IsA(clause, NullTest)) { NullTest *nt = (NullTest *) clause; @@ -2771,6 +2781,249 @@ match_rowcompare_to_indexcol(PlannerInfo *root, return NULL; } +/* + * match_orclause_to_indexcol() + * Handles the OR-expr case for match_clause_to_indexcol() in the case + * when it could be transformed to ScalarArrayOpExpr. + */ +static IndexClause * +match_orclause_to_indexcol(PlannerInfo *root, + RestrictInfo *rinfo, + int indexcol, + IndexOptInfo *index) +{ + ListCell *lc; + BoolExpr *orclause = (BoolExpr *) rinfo->orclause; + Node *indexExpr = NULL; + List *consts = NIL; + Node *arrayNode = NULL; + ScalarArrayOpExpr *saopexpr = NULL; + HeapTuple opertup; + Form_pg_operator operform; + Oid matchOpno = InvalidOid; + IndexClause *iclause; + Oid consttype = InvalidOid; + Oid arraytype = InvalidOid; + Oid inputcollid = InvalidOid; + bool firstTime = t
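For readers skimming the thread, the transformation described in the commit message above amounts to something like the following (hypothetical table and index; whether the planner actually picks an index scan depends on table contents and statistics, and the expected clause shape is paraphrased from the commit message rather than copied from the patch's regression output):

CREATE TABLE tbl (a int);
CREATE INDEX tbl_a_idx ON tbl (a);

-- With the patch, index matching can treat the OR chain as
--   a = ANY ('{1,2,3}'::integer[])
-- allowing a single index scan instead of a BitmapOr of several scans.
EXPLAIN (COSTS OFF)
SELECT * FROM tbl WHERE a = 1 OR a = 2 OR a = 3;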
Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
Attached patch has EXPLAIN ANALYZE display the total number of primitive index scans for all 3 kinds of index scan node. This is useful for index scans that happen to use SAOP arrays. It also seems almost essential to offer this kind of instrumentation for the skip scan patch [1]. Skip scan works by reusing all of the Postgres 17 work (see commit 5bf748b8) to skip over irrelevant sections of a composite index with a low cardinality leading column, so it has all the same issues. One reason to have this patch is to differentiate between similar cases involving simple SAOP arrays. The user will have some reasonable way of determining how a query such as this: pg@regression:5432 [2070325]=# explain (analyze, buffers, costs off, summary off) select abalance from pgbench_accounts where aid in (1, 2, 3, 4, 5); ┌──┐ │ QUERY PLAN │ ├──┤ │ Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual time=0.007..0.008 rows=5 loops=1) │ │ Index Cond: (aid = ANY ('{1,2,3,4,5}'::integer[])) │ │ Primitive Index Scans: 1 │ │ Buffers: shared hit=4 │ └──┘ (4 rows) ...differs from a similar query, such as this: pg@regression:5432 [2070325]=# explain (analyze, buffers, costs off, summary off) select abalance from pgbench_accounts where aid in (1000, 2000, 3000, 4000, 5000); ┌──┐ │ QUERY PLAN │ ├──┤ │ Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual time=0.006..0.012 rows=5 loops=1) │ │ Index Cond: (aid = ANY ('{1000,2000,3000,4000,5000}'::integer[])) │ │ Primitive Index Scans: 5 │ │ Buffers: shared hit=20 │ └──┘ (4 rows) Another reason to have this patch is consistency. We're only showing the user the number of times we've incremented pg_stat_user_tables.idx_scan in each case. The fact that pg_stat_user_tables.idx_scan counts primitive index scans like this is nothing new. That issue was only documented quite recently, as part of the Postgres 17 work, and it seems quite misleading. It's consistent, but not necessarily nonsurprising. Making it readily apparent that there is more than one primitive index scan involved here makes the issue less surprising. Skip scan - Here's an example with this EXPLAIN ANALYZE patch applied on top of my skip scan patch [1], using the tenk1 table left behind when the standard regression tests are run: pg@regression:5432 [2070865]=# create index on tenk1 (four, stringu1); CREATE INDEX pg@regression:5432 [2070865]=# explain (analyze, buffers, costs off, summary off) select stringu1 from tenk1 where -- Omitted: the leading column on "four" stringu1 = 'YG'; ┌───┐ │QUERY PLAN │ ├───┤ │ Index Only Scan using tenk1_four_stringu1_idx on tenk1 (actual time=0.011..0.017 rows=15 loops=1) │ │ Index Cond: (stringu1 = 'YG'::name) │ │ Heap Fetches: 0 │ │ Primitive Index Scans: 5 │ │ Buffers: shared hit=11 │ └───┘ (5 rows) Notice that there are 5 primitive index scans here. That's what I'd expect, given that there are exactly 4 distinct "logical subindexes" implied by our use of a leading column on "four" as the scan's skip column. Under the hood, an initial primitive index scan locates the lowest "four" value. There are then 4 additional primitive index scans to locate the next "four" value (needed when the current "four" value gets past the value's "stringu1 = 'YG'" tuples). Obviously, the cardinality of the leading column predicts the number of primitive index scans at runtime. But it can be much more complicated of a relationship than what I've shown here may suggest. Skewness matters, too. Small clusters of index tuple
Re: Make query cancellation keys longer
On Thu, Aug 15, 2024 at 1:13 PM Heikki Linnakangas wrote: > Added a "protocol_version" libpq option for that. It defaults to "auto", > but you can set it to "3.1" or "3.0" to force the version. It makes it > easier to test that the backwards-compatibility works, too. Over on the "Add new protocol message to change GUCs for usage with future protocol-only GUCs" thread there is a lot of relevant discussion about how bumping the protocol version should work. This thread shouldn't ignore all that discussion. Just to take one example, Jelte wants to bump the protocol version to 3.2, not 3.1, for reasons that are explained in the commit message for the relevant patch over there. -- Robert Haas EDB: http://www.enterprisedb.com
Re: ECPG cleanup and fix for clang compile-time problem
I wrote: > Thanks, done. Here's a revised patchset. The cfbot points out that I should probably have marked progname as "static" in 0008. I'm not going to repost the patchset just for that, though. regards, tom lane
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
On Thu, 15 Aug 2024 at 21:23, Peter Geoghegan wrote: > > Attached patch has EXPLAIN ANALYZE display the total number of > primitive index scans for all 3 kinds of index scan node. This is > useful for index scans that happen to use SAOP arrays. It also seems > almost essential to offer this kind of instrumentation for the skip > scan patch [1]. Skip scan works by reusing all of the Postgres 17 work > (see commit 5bf748b8) to skip over irrelevant sections of a composite > index with a low cardinality leading column, so it has all the same > issues. Did you notice the patch over at [0], where additional diagnostic EXPLAIN output for btrees is being discussed, too? I'm asking, because I'm not very convinced that 'primitive scans' are a useful metric across all (or even: most) index AMs (e.g. BRIN probably never will have a 'primitive scans' metric that differs from the loop count), so maybe this would better be implemented in that framework? Kind regards, Matthias van de Meent Neon (https://neon.tech) [0] https://www.postgresql.org/message-id/flat/TYWPR01MB10982D24AFA7CDC273445BFF0B1DC2%40TYWPR01MB10982.jpnprd01.prod.outlook.com#9c64cf75179da8d657a5eab7c75be480
Re: [Patch] add new parameter to pg_replication_origin_session_setup
Hello again, On Tue, Aug 13, 2024 at 12:48 AM Euler Taveira wrote: > I'm curious about your use case. Is it just because the internal function has > a > different signature or your tool is capable of apply logical replication > changes > in parallel using the SQL API? The latter is correct, it applies logical replication changes in parallel. Since multiple connections may commit, we need all of them to be able to advance the replication origin. > * no documentation changes. Since the function you are changing has a new > signature, this change should be reflected in the documentation. > * no need for a new internal function. The second parameter (PID) can be > optional and defaults to 0 in this case. See how we changed the > pg_create_logical_replication_slot along the years add some IN parameters like > twophase and failover in the recent versions. I updated/rewrote the patch to reflect these suggestions. It now has the same DEFAULT 0 style used in pg_create_logical_replication_slot. I also updated the documentation. > * add a CF entry [1] for this patch so we don't forget it. Another advantage > is > that this patch is covered by CI [2][3]. Sadly I still can't log in to the Commitfest due to the cool-off period. I will create an entry as soon as this period ends. Thanks for all the feedback, Doruk Yılmaz From b9c54f3d217f67c24ce74ffa7c1f2812d784333e Mon Sep 17 00:00:00 2001 From: Doruk Date: Thu, 15 Aug 2024 23:34:26 +0300 Subject: [PATCH] add new parameter to pg_replication_origin_session_setup --- doc/src/sgml/func.sgml | 2 +- src/backend/catalog/system_functions.sql | 9 - src/backend/replication/logical/origin.c | 8 +--- src/include/catalog/pg_proc.dat | 2 +- 4 files changed, 15 insertions(+), 6 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 5dd95d73a1..7db5a8ed52 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -29486,7 +29486,7 @@ DETAIL: Make sure pg_wal_replay_wait() isn't called within a transaction with a pg_replication_origin_session_setup -pg_replication_origin_session_setup ( node_name text ) +pg_replication_origin_session_setup ( node_name text, acquired_by integer DEFAULT 0 ) void diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql index 623b9539b1..4aae06e06d 100644 --- a/src/backend/catalog/system_functions.sql +++ b/src/backend/catalog/system_functions.sql @@ -639,6 +639,13 @@ LANGUAGE INTERNAL CALLED ON NULL INPUT VOLATILE PARALLEL SAFE AS 'pg_stat_reset_slru'; +CREATE OR REPLACE FUNCTION + pg_replication_origin_session_setup(node_name text, acquired_by integer DEFAULT 0) +RETURNS void +LANGUAGE INTERNAL +STRICT VOLATILE +AS 'pg_replication_origin_session_setup'; + -- -- The default permissions for functions mean that anyone can execute them. 
-- A number of functions shouldn't be executable by just anyone, but rather @@ -736,7 +743,7 @@ REVOKE EXECUTE ON FUNCTION pg_replication_origin_session_progress(boolean) FROM REVOKE EXECUTE ON FUNCTION pg_replication_origin_session_reset() FROM public; -REVOKE EXECUTE ON FUNCTION pg_replication_origin_session_setup(text) FROM public; +REVOKE EXECUTE ON FUNCTION pg_replication_origin_session_setup(text,integer) FROM public; REVOKE EXECUTE ON FUNCTION pg_replication_origin_xact_reset() FROM public; diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c index 419e4814f0..e50bcc8466 100644 --- a/src/backend/replication/logical/origin.c +++ b/src/backend/replication/logical/origin.c @@ -1351,13 +1351,15 @@ pg_replication_origin_session_setup(PG_FUNCTION_ARGS) { char *name; RepOriginId origin; - + int pid; + replorigin_check_prerequisites(true, false); name = text_to_cstring((text *) DatumGetPointer(PG_GETARG_DATUM(0))); origin = replorigin_by_name(name, false); - replorigin_session_setup(origin, 0); - + pid = PG_GETARG_INT32(1); + replorigin_session_setup(origin, pid); + replorigin_session_origin = origin; pfree(name); diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 4abc6d9526..a490d4fc6e 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -11948,7 +11948,7 @@ { oid => '6006', descr => 'configure session to maintain replication progress tracking for the passed in origin', proname => 'pg_replication_origin_session_setup', provolatile => 'v', - proparallel => 'u', prorettype => 'void', proargtypes => 'text', + proparallel => 'u', prorettype => 'void', proargtypes => 'text int4', prosrc => 'pg_replication_origin_session_setup' }, { oid => '6007', descr => 'teardown configured replication progress tracking', -- 2.39.2
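To illustrate the signature the patch adds, here is a usage sketch (origin name and PID are made up; in the parallel-apply use case described above, acquired_by would be the PID of the session that already acquired the origin):

SELECT pg_replication_origin_create('my_origin');
-- the first session acquires the origin as before
SELECT pg_replication_origin_session_setup('my_origin');
-- a cooperating session passes that first session's PID as acquired_by
SELECT pg_replication_origin_session_setup('my_origin', 12345);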
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
Hi! Thank you for your work on this subject! On 15.08.2024 22:22, Peter Geoghegan wrote: Attached patch has EXPLAIN ANALYZE display the total number of primitive index scans for all 3 kinds of index scan node. This is useful for index scans that happen to use SAOP arrays. It also seems almost essential to offer this kind of instrumentation for the skip scan patch [1]. Skip scan works by reusing all of the Postgres 17 work (see commit 5bf748b8) to skip over irrelevant sections of a composite index with a low cardinality leading column, so it has all the same issues. I think that it is enough to pass the IndexScanDesc parameter to the function - this saves us from having to define the planstate type twice. For this reason, I suggest some changes that I think may improve your patch. One reason to have this patch is to differentiate between similar cases involving simple SAOP arrays. The user will have some reasonable way of determining how a query such as this: pg@regression:5432 [2070325]=# explain (analyze, buffers, costs off, summary off) select abalance from pgbench_accounts where aid in (1, 2, 3, 4, 5); ┌──┐ │ QUERY PLAN │ ├──┤ │ Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual time=0.007..0.008 rows=5 loops=1) │ │ Index Cond: (aid = ANY ('{1,2,3,4,5}'::integer[])) │ │ Primitive Index Scans: 1 │ │ Buffers: shared hit=4 │ └──┘ (4 rows) ...differs from a similar query, such as this: pg@regression:5432 [2070325]=# explain (analyze, buffers, costs off, summary off) select abalance from pgbench_accounts where aid in (1000, 2000, 3000, 4000, 5000); ┌──┐ │ QUERY PLAN │ ├──┤ │ Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual time=0.006..0.012 rows=5 loops=1) │ │ Index Cond: (aid = ANY ('{1000,2000,3000,4000,5000}'::integer[])) │ │ Primitive Index Scans: 5 │ │ Buffers: shared hit=20 │ └──┘ (4 rows) Another reason to have this patch is consistency. We're only showing the user the number of times we've incremented pg_stat_user_tables.idx_scan in each case. The fact that pg_stat_user_tables.idx_scan counts primitive index scans like this is nothing new. That issue was only documented quite recently, as part of the Postgres 17 work, and it seems quite misleading. It's consistent, but not necessarily nonsurprising. Making it readily apparent that there is more than one primitive index scan involved here makes the issue less surprising. Skip scan - Here's an example with this EXPLAIN ANALYZE patch applied on top of my skip scan patch [1], using the tenk1 table left behind when the standard regression tests are run: pg@regression:5432 [2070865]=# create index on tenk1 (four, stringu1); CREATE INDEX pg@regression:5432 [2070865]=# explain (analyze, buffers, costs off, summary off) select stringu1 from tenk1 where -- Omitted: the leading column on "four" stringu1 = 'YG'; ┌───┐ │QUERY PLAN │ ├───┤ │ Index Only Scan using tenk1_four_stringu1_idx on tenk1 (actual time=0.011..0.017 rows=15 loops=1) │ │ Index Cond: (stringu1 = 'YG'::name) │ │ Heap Fetches: 0 │ │ Primitive Index Scans: 5 │ │ Buffers: shared hit=11 │ └───┘ (5 rows) Notice that there are 5 primitive index scans here. That's what I'd expect, given that there are exactly 4 distinct "logical subindexes" implied by our use of a leading column on "four" as the scan's skip column. Under the hood, an initial primitive index scan locates the lowest "four" value. There are then 4 additional primitive index scans to locate the next "four" value (need
Re: pgsql: Introduce hash_search_with_hash_value() function
On Wed, Aug 07, 2024 at 02:01:30PM -0400, Robert Haas wrote: > Also, for the notes to be useful, we'd probably need some conventions > about how we, as a project, want to use them. If everyone does > something different, the result isn't likely to be all that great. What did you have in mind? Would it be sufficient to propose a template that, once ratified, would be available in the committing checklist [0]? [0] https://wiki.postgresql.org/wiki/Committing_checklist -- nathan
Re: [PoC] XMLCast (SQL/XML X025)
On 05.07.24 16:18, Jim Jones wrote: > On 02.07.24 18:02, Jim Jones wrote: >> It basically does the following: >> >> * When casting an XML value to a SQL data type, XML values containing >> XSD literals will be converted to their equivalent SQL data type. >> * When casting from a SQL data type to XML, the cast operand will be >> translated to its corresponding XSD data type. >> > v2 attached adds missing return for NO_XML_SUPPORT control path in > unescape_xml > v3 adds the missing XML passing mechanism BY VALUE and BY REF, as described in the XMLCast specification: XMLCAST ( AS [ ]) Tests and documentation were updated accordingly. -- Jim From 0679e82e9653183190a6af6c97de1887f567ef72 Mon Sep 17 00:00:00 2001 From: Jim Jones Date: Thu, 15 Aug 2024 20:27:36 +0200 Subject: [PATCH v3] Add XMLCast function (SQL/XML X025) This function implements the SQL/XML function xmlcast, which enables conversions between SQL data types and the XML data type. XMLCAST ( expression AS type [ BY REF | BY VALUE ] ) When casting an XML value to a SQL data type, XML values containing XSD literals will be converted to their equivalent SQL data type. When casting from a SQL data type to XML, the cast operand will be translated to its corresponding XSD data type. This patch also includes documentation and regression tests. --- doc/src/sgml/datatype.sgml| 78 - src/backend/catalog/sql_features.txt | 2 +- src/backend/executor/execExprInterp.c | 83 - src/backend/nodes/nodeFuncs.c | 13 + src/backend/parser/gram.y | 22 +- src/backend/parser/parse_expr.c | 81 + src/backend/parser/parse_target.c | 7 + src/backend/utils/adt/ruleutils.c | 4 + src/backend/utils/adt/xml.c | 29 ++ src/include/nodes/parsenodes.h| 8 + src/include/nodes/primnodes.h | 3 + src/include/parser/kwlist.h | 1 + src/include/utils/xml.h | 1 + src/test/regress/expected/xml.out | 429 ++ src/test/regress/expected/xml_1.out | 346 + src/test/regress/expected/xml_2.out | 429 ++ src/test/regress/sql/xml.sql | 210 + src/tools/pgindent/typedefs.list | 1 + 18 files changed, 1739 insertions(+), 8 deletions(-) diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml index e0d33f12e1..28c93460a5 100644 --- a/doc/src/sgml/datatype.sgml +++ b/doc/src/sgml/datatype.sgml @@ -4472,14 +4472,84 @@ XMLPARSE ( { DOCUMENT | CONTENT } value) XMLPARSE (DOCUMENT 'Manual...') XMLPARSE (CONTENT 'abcbarfoo') ]]> -While this is the only way to convert character strings into XML -values according to the SQL standard, the PostgreSQL-specific -syntaxes: + +Another option to convert character strings into xml is the function xmlcast, +which is designed to cast SQL data types into xml, and vice versa. + +XMLCAST ( expression AS type [ BY REF | BY VALUE ] ) + +Similar to the SQL function CAST, this function converts an expression +into the specified type. This can be useful for creating XML +documents using SQL or when parsing the contents of XML documents. The function xmlcast works with the +following criteria: + + + + + Either expression or type must be of type xml. + + + + + It supports casting between xml and character, numeric, date/time, and boolean data types. + + + + + Similar to the function xmltext, expressions containing XML predifined entities + will be escaped (see examples below). + + + + + The expressions of type date, time [with time zone], timestamp [with time zone], + and interval will be converted to their XSD equivalents, xs:date, xs:time, + xs:dateTime, and xs:duration, respectively. 
+ + + + + The BY REF and BY VALUE clauses + are accepted but ignored, as discussed in + . + + + + + Examples: + + +Alternatively, it is also possible to convert character strings into XML using PostgreSQL-specific cast syntaxes: -can also be used. + diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt index c002f37202..a4d8f7a2ac 100644 --- a/src/backend/catalog/sql_features.txt +++ b/src/backend/catalog/sql_features.txt @@ -624,7 +624,7 @@ X014 Attributes of XML type YES X015 Fields of XML type NO X016 Persistent XML values YES X020 XMLConcat YES -X025 XMLCast NO +X025 XMLCast YES X030 XMLDocument NO X031 XMLElement YES X032 XMLForest YES diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c index ea47c4d6f9..1eb2231aa1 100644 --- a/src/backend/executor/execExprInterp.c +++ b/src/backend/executor/execExprInterp.c @@ -69,6 +69,7 @@
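To make the documented behaviour above easier to picture, a few usage sketches derived from the criteria listed in the patch's documentation (illustrative values only, not taken from the patch's regression tests):

-- character strings cast to xml have predefined entities escaped, as with xmltext()
SELECT XMLCAST('foo & <bar>' AS xml);
-- date/time values come out as their XSD lexical forms (here xs:dateTime)
SELECT XMLCAST(now() AS xml);
-- an XSD literal held in xml converts back to the corresponding SQL type
SELECT XMLCAST(XMLCAST('42' AS xml) AS integer);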
Re: Remaining dependency on setlocale()
On Fri, Aug 16, 2024 at 1:25 AM Peter Eisentraut wrote: > On 15.08.24 00:43, Thomas Munro wrote: > > "" is a problem however... the special value for "native environment" > > is returned as a real locale name, which we probably still need in > > places. We could change that to newlocale("") + query instead, but > > Where do we need that in the server? Hmm. Yeah, right, the only way I've found so far to even reach that code and capture that result is: create database db2 locale = ''; That puts 'en_NZ.UTF-8' or whatever in pg_database. In contrast, create collation will accept '' but just store it verbatim, and the GUCs for changing time, monetary, numeric accept it too and keep it verbatim. We could simply ban '' in all user commands. I doubt they're documented as acceptable values, once you get past initdb and have a running system. Looking into that... > It should just be initdb doing that and then initializing the server > with concrete values based on that. Right. > I guess technically some of these GUC settings default to the > environment? But I think we could consider getting rid of that. Yeah.
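For reference, the places mentioned above that currently accept the empty string look like this (a sketch, not an exhaustive list):

CREATE DATABASE db2 LOCALE = '';    -- resolved to a concrete name such as 'en_NZ.UTF-8' and stored in pg_database
CREATE COLLATION c2 (locale = '');  -- accepted, but stored verbatim
SET lc_monetary = '';               -- accepted and kept verbatim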
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
On Thu, Aug 15, 2024 at 4:34 PM Matthias van de Meent wrote: > > Attached patch has EXPLAIN ANALYZE display the total number of > > primitive index scans for all 3 kinds of index scan node. This is > > useful for index scans that happen to use SAOP arrays. It also seems > > almost essential to offer this kind of instrumentation for the skip > > scan patch [1]. Skip scan works by reusing all of the Postgres 17 work > > (see commit 5bf748b8) to skip over irrelevant sections of a composite > > index with a low cardinality leading column, so it has all the same > > issues. > > Did you notice the patch over at [0], where additional diagnostic > EXPLAIN output for btrees is being discussed, too? To be clear, for those that haven't been paying attention to that other thread: that other EXPLAIN patch (the one authored by Masahiro Ikeda) surfaces information about a distinction that the skip scan patch renders obsolete. That is, the skip scan patch makes all "Non Key Filter" quals into quals that can relocate the scan to a later leaf page by starting a new primitive index scan. Technically, skip scan removes the concept that that patch calls "Non Key Filter" altogether. Note that this isn't the same thing as making that other patch obsolete. Skip scan renders the whole concept of "Non Key Filter" obsolete *in name only*. You might prefer to think of it as making that whole concept squishy. Just because we can theoretically use the leading column to skip doesn't mean we actually will. It isn't an either/or thing. We might skip during some parts of a scan, but not during other parts. It's just not clear how to handle those sorts of fuzzy distinctions right now. It does seem worth pursuing, but I see no conflict. > I'm asking, because > I'm not very convinced that 'primitive scans' are a useful metric > across all (or even: most) index AMs (e.g. BRIN probably never will > have a 'primitive scans' metric that differs from the loop count), so > maybe this would better be implemented in that framework? What do you mean by "within that framework"? They seem orthogonal? It's true that BRIN index scans will probably never show more than a single primitive index scan. I don't think that the same is true of any other index AM, though. Don't they all support SAOPs, albeit non-natively? The important question is: what do you want to do about cases like the BRIN case? Our choices are all fairly obvious choices. We can be selective, and *not* show this information when a set of heuristics indicate that it's not relevant. This is fairly straightforward to implement. Which do you prefer: overall consistency, or less verbosity? Personally I think that the consistency argument works in favor of displaying this information for every kind of index scan. That's a hopelessly subjective position, though. -- Peter Geoghegan
Re: Partial aggregates pushdown
On Thu, Aug 8, 2024 at 01:48:49PM +0200, Jelte Fennema-Nio wrote: > SUMMARY OF THREAD > > The design of patch 0001 is agreed upon by everyone on the thread (so > far). This adds the PARTIAL_AGGREGATE label for aggregates, which will > cause the finalfunc not to run. It also starts using PARTIAL_AGGREGATE > for pushdown of aggregates in postgres_fdw. In 0001 PARTIAL_AGGREGATE > is only supported for aggregates with a non-internal/pseudo type as > the stype. > > The design for patch 0002 is still under debate. This would expand on > the functionality added by adding support for PARTIAL_AGGREGATE for > aggregates with an internal stype. This is done by returning a byte > array containing the bytes that the serialfunc of the aggregate > returns. > > A competing proposal for 0002 is to instead change aggregates to not > use an internal stype anymore, and create dedicated types. The main > downside here is that infunc and outfunc would need to be added for > text serialization, in addition to the binary serialization. An open > question is: Can we change the requirements for CREATE TYPE, so that > types can be created without infunc and outfunc. > > WHAT IS NEEDED? > > The things needed for this patch are that docs need to be added, and > detailed codereview needs to be done. > > Feedback from more people on the two competing proposals for 0002 > would be very helpful in making a decision. First, I am sorry to be replying so late --- I have been traveling for the past four weeks. Second, I consider this feature a big part of sharding, and I think sharding is Postgres's biggest missing feature. I talk about this patch often when asked about what Postgres is working on. Third, I would like to show a more specific example to clarify what is being considered above. If we look at MAX(), we can have FDWs return the max for each FDW, and the coordinator can chose the highest value. This is the patch 1 listed above. These can return the pg_aggregate.aggtranstype data type using the pg_type.typoutput text output. The second case is for something like AVG(), which must return the SUM() and COUNT(), and we currently have no way to return multiple text values on the wire. For patch 0002, we have the option of creating functions that can do this and record them in new pg_attribute columns, or we can create a data type with these functions, and assign the data type to pg_aggregate.aggtranstype. Is that accurate? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
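To spell out the MAX()/AVG() difference with a sketch (hypothetical shard tables shard1_t and shard2_t; this is what the coordinator effectively has to compute, independent of how the partial values travel on the wire):

-- MAX(): each shard's partial result is a single, directly comparable value
SELECT max(shard_max)
FROM (SELECT max(x) AS shard_max FROM shard1_t
      UNION ALL
      SELECT max(x) FROM shard2_t) s;

-- AVG(): the partial state is (sum, count), so two values are needed per shard
SELECT sum(shard_sum)::numeric / sum(shard_count)
FROM (SELECT sum(x) AS shard_sum, count(x) AS shard_count FROM shard1_t
      UNION ALL
      SELECT sum(x), count(x) FROM shard2_t) s;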
Re: Restart pg_usleep when interrupted
On Wed, Aug 14, 2024 at 06:00:06AM +, Bertrand Drouvot wrote: > I gave it more thoughts and I don't think we have to choose between the two. > The 1 Hz approach reduces the number of interrupts and Sami's patch provides a > way to get "accurate" delay in case of interrupts. I think both have their own > benefit. Is it really that important to delay with that level of accuracy? In most cases, the chances of actually interrupting a given vacuum delay point are pretty small. Even in the extreme scenario you tested with ~350K interrupts in a 19 minute vacuum, you only saw a 10-15% difference in total time. I wouldn't say I'm diametrically opposed to this patch, but I do think we need to carefully consider whether it's worth the extra code. Separately, I've been wondering whether it's worth allowing the sleep to be interrupted in certain cases, such as SIGINT and SIGTERM. That should address one of Heikki's points. -- nathan
Re: Remove dependence on integer wrapping
I've committed 0001. Now to 0002... - if (-element > nelements) + if (element == PG_INT32_MIN || -element > nelements) This seems like a good opportunity to use our new pg_abs_s32() function, and godbolt.org [0] seems to indicate that it might produce better code, too (at least to my eye). I've attached an updated version of the patch with this change. Barring additional feedback, I plan to commit this one shortly. [0] https://godbolt.org/z/57P4vvGYf -- nathan >From 62677b351cf18dee37dc0d9253bd694fe6fbf26a Mon Sep 17 00:00:00 2001 From: Joseph Koshakow Date: Sat, 6 Jul 2024 15:41:09 -0400 Subject: [PATCH v23 1/1] Remove dependence on integer wrapping for jsonb This commit updates various jsonb operators and functions to no longer rely on integer wrapping for correctness. Not all compilers support -fwrapv, so it's best not to rely on it. --- src/backend/utils/adt/jsonfuncs.c | 7 --- src/test/regress/expected/jsonb.out | 12 src/test/regress/sql/jsonb.sql | 2 ++ 3 files changed, 18 insertions(+), 3 deletions(-) diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c index 5ecb9fffae..1f8ea51e6a 100644 --- a/src/backend/utils/adt/jsonfuncs.c +++ b/src/backend/utils/adt/jsonfuncs.c @@ -18,6 +18,7 @@ #include "access/htup_details.h" #include "catalog/pg_type.h" +#include "common/int.h" #include "common/jsonapi.h" #include "common/string.h" #include "fmgr.h" @@ -946,7 +947,7 @@ jsonb_array_element(PG_FUNCTION_ARGS) { uint32 nelements = JB_ROOT_COUNT(jb); - if (-element > nelements) + if (pg_abs_s32(element) > nelements) PG_RETURN_NULL(); else element += nelements; @@ -5426,7 +5427,7 @@ setPathArray(JsonbIterator **it, Datum *path_elems, bool *path_nulls, if (idx < 0) { - if (-idx > nelems) + if (pg_abs_s32(idx) > nelems) { /* * If asked to keep elements position consistent, it's not allowed @@ -5438,7 +5439,7 @@ setPathArray(JsonbIterator **it, Datum *path_elems, bool *path_nulls, errmsg("path element at position %d is out of range: %d", level + 1, idx))); else - idx = INT_MIN; + idx = PG_INT32_MIN; } else idx = nelems + idx; diff --git a/src/test/regress/expected/jsonb.out b/src/test/regress/expected/jsonb.out index e66d760189..a9d93052fc 100644 --- a/src/test/regress/expected/jsonb.out +++ b/src/test/regress/expected/jsonb.out @@ -680,6 +680,18 @@ select '"foo"'::jsonb -> 'z'; (1 row) +select '[]'::jsonb -> -2147483648; + ?column? +-- + +(1 row) + +select jsonb_delete_path('{"a":[]}', '{"a",-2147483648}'); + jsonb_delete_path +--- + {"a": []} +(1 row) + select '{"a": [{"b": "c"}, {"b": "cc"}]}'::jsonb ->> null::text; ?column? -- diff --git a/src/test/regress/sql/jsonb.sql b/src/test/regress/sql/jsonb.sql index 97bc2242a1..6a18577ead 100644 --- a/src/test/regress/sql/jsonb.sql +++ b/src/test/regress/sql/jsonb.sql @@ -204,6 +204,8 @@ select '[{"b": "c"}, {"b": "cc"}]'::jsonb -> 'z'; select '{"a": "c", "b": null}'::jsonb -> 'b'; select '"foo"'::jsonb -> 1; select '"foo"'::jsonb -> 'z'; +select '[]'::jsonb -> -2147483648; +select jsonb_delete_path('{"a":[]}', '{"a",-2147483648}'); select '{"a": [{"b": "c"}, {"b": "cc"}]}'::jsonb ->> null::text; select '{"a": [{"b": "c"}, {"b": "cc"}]}'::jsonb ->> null::int; -- 2.39.3 (Apple Git-146)
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
On Thu, Aug 15, 2024 at 4:58 PM Alena Rybakina wrote: > I think that it is enough to pass the IndexScanDesc parameter to the function > - this saves us from having to define the planstate type twice. > > For this reason, I suggest some changes that I think may improve your patch. Perhaps it's a little better that way. I'll consider it. > To be honest, I don't quite understand how information in explain analyze > about the number of used primitive indexes > will help me improve my database system as a user. Perhaps I'm missing > something. There is probably no typical case. The patch shows implementation details, which need to be interpreted in the context of a particular problem. Maybe the problem is that some of the heuristics added by one of my nbtree patches interact relatively badly with some real world query. It would be presumptuous of me to say that that will never happen. > Maybe it can tell me which columns are best to create an index on or > something like that? That's definitely going to be important in the case of skip scan. Simply showing the user that the index scan skips at all will make them aware that there are missing index columns. That could be a sign that they'd be better off not using skip scan at all, by creating a new index that suits the particular query (by not having the extra skipped column). It's almost always possible to beat skip scan by creating a new index -- whether or not it's worth the trouble/expense of maintaining a whole new index is the important question. Is this particular query the most important query *to the business*, for whatever reason? Or is having merely adequate performance acceptable? Your OR-to-SAOP-rewrite patch effectively makes two or more bitmap index scans into one single continuous index scan. Or...does it? The true number of (primitive) index scans might be "the same" as it was before (without your patch), or there might really only be one (primitive) index scan with your patch. Or it might be anywhere in between those two extremes. Users will benefit from knowing where on this continuum a particular index scan falls. It's just useful to know where time is spent. Knowing this information might even allow the user to create a new multicolumn index, with columns in an order better suited to an affected query. It's not so much the cost of descending the index multiple times that we need to worry about here, even though that's what we're talking about counting here. Varying index column order could make an index scan faster by increasing locality. Locality is usually very important. Few index scans is a good proxy for greater locality. It's easiest to understand what I mean about locality with an example. An index on (a, b) is good for queries with quals such as "where a = 42 and b in (1,2,3,4,5,6,7,8,9)" if it allows such a query to only access one or two leaf pages, where all of the "b" values of interest live side by side. Obviously that won't be true if it's the other way around -- if the typical qual looks more like "where b = 7 and a in (1,2,3,4,5,6,7,8,9)". This is the difference between 1 primitive index scan and 9 primitive index scans -- quite a big difference. Note that the main cost we need to worry about here *isn't* the cost of descending the index. It's mostly the cost of reading the leaf pages. -- Peter Geoghegan
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
On Thu, 15 Aug 2024 at 23:10, Peter Geoghegan wrote: > > On Thu, Aug 15, 2024 at 4:34 PM Matthias van de Meent > wrote: > > > Attached patch has EXPLAIN ANALYZE display the total number of > > > primitive index scans for all 3 kinds of index scan node. This is > > > useful for index scans that happen to use SAOP arrays. It also seems > > > almost essential to offer this kind of instrumentation for the skip > > > scan patch [1]. Skip scan works by reusing all of the Postgres 17 work > > > (see commit 5bf748b8) to skip over irrelevant sections of a composite > > > index with a low cardinality leading column, so it has all the same > > > issues. > > > > Did you notice the patch over at [0], where additional diagnostic > > EXPLAIN output for btrees is being discussed, too? > > To be clear, for those that haven't been paying attention to that > other thread: that other EXPLAIN patch (the one authored by Masahiro > Ikeda) surfaces information about a distinction that the skip scan > patch renders obsolete. That is, the skip scan patch makes all "Non > Key Filter" quals into quals that can relocate the scan to a later > leaf page by starting a new primitive index scan. Technically, skip > scan removes the concept that that patch calls "Non Key Filter" > altogether. > > Note that this isn't the same thing as making that other patch > obsolete. Skip scan renders the whole concept of "Non Key Filter" > obsolete *in name only*. You might prefer to think of it as making > that whole concept squishy. Just because we can theoretically use the > leading column to skip doesn't mean we actually will. It isn't an > either/or thing. We might skip during some parts of a scan, but not > during other parts. Yes. > It's just not clear how to handle those sorts of fuzzy distinctions > right now. It does seem worth pursuing, but I see no conflict. > > > I'm asking, because > > I'm not very convinced that 'primitive scans' are a useful metric > > across all (or even: most) index AMs (e.g. BRIN probably never will > > have a 'primitive scans' metric that differs from the loop count), so > > maybe this would better be implemented in that framework? > > What do you mean by "within that framework"? They seem orthogonal? What I meant was putting this 'primitive scans' info into the AM-specific explain callback as seen in the latest patch version. > It's true that BRIN index scans will probably never show more than a > single primitive index scan. I don't think that the same is true of > any other index AM, though. Don't they all support SAOPs, albeit > non-natively? Not always. For Bitmap Index Scan the node's functions can allow non-native SAOP support (it ORs the bitmaps), but normal indexes without SAOP support won't get SAOP-functionality from the IS/IOS node's infrastructure, it'll need to be added as Filter. > The important question is: what do you want to do about cases like the > BRIN case? Our choices are all fairly obvious choices. We can be > selective, and *not* show this information when a set of heuristics > indicate that it's not relevant. This is fairly straightforward to > implement. Which do you prefer: overall consistency, or less > verbosity? Consistency, I suppose. But adding explain attributes left and right in Index Scan's explain output when and where every index type needs them doesn't scale, so I'd put index-specific output into it's own system (see the linked thread for more rationale). And, in this case, the use case seems quite index-specific, at least for IS/IOS nodes. 
> Personally I think that the consistency argument works in favor of > displaying this information for every kind of index scan. Agreed, assuming "this information" is indeed shared (and useful) across all AMs. This made me notice that you add a new metric that should generally be exactly the same as pg_stat_all_indexes.idx_scan (you mention the same). Can't you pull that data, instead of inventing a new place every AM needs to touch for its metrics? Kind regards, Matthias van de Meent Neon (https://neon.tech)
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On 14/08/2024 21:04, Jelte Fennema-Nio wrote: On Wed, 7 Aug 2024 at 22:10, Robert Haas wrote: I respect that, but I don't want to get flamed for doing something that might be controversial without anybody else endorsing it. I'll commit this if it gets some support, but not otherwise. I'm willing to commit a patch adding a new function even if nobody else votes, but not this. Makes sense. I'm not in too much of a hurry with this specific one. So I'll leave it like this for now and hopefully someone else responds. If this becomes close to being the final unmerged patch of this patchset, I'll probably cut my losses and create a patch that adds a function instead. I think Jelte's proposal on PQprotocolVersion() is OK. As he pointed out, the function is pretty useless as it is, so I doubt anyone is doing anything interesting with it. Perhaps we should even change it to return 30 for protocol version 3.0, and just leave a note in the docs like "in older versions of libpq, this returned 3 for protocol version 3.0". On Wed, 7 Aug 2024 at 22:10, Robert Haas wrote: > > Patch 7: Bump the protocol version to 3.2 (see commit message for why > > not bumping to 3.1) > > Good commit message. The point is arguable, so putting forth your best > argument is important. Just to clarify: do you agree with the point now, after that argument? Well, here again, I would like to know what other people think. It doesn't intrinsically matter to me that much what we do here, but it matters to me a lot that extensive recriminations don't ensue afterwards. Makes sense to me. It's sad that pgbouncer had such a bug, but it makes sense to accommodate it. We're not going to run out of integers. This deserves some more commentary in the docs I think. If I understand the plan correctly, if the client requests version 3.1, the server accepts it, but behaves exactly the same as 3.0. Or should 3.1 be forbidden altogether? On the default for "max_protocol_version": I'm pretty disappointed if we cannot change the default to "latest". I realize that that won't work with poolers that don't implement NegotiateProtocolVersion. But I'm afraid if we make the new protocol version opt-in, no one will use it, and the poolers etc still won't bother to implement NegotiateProtocolVersion for years to come, if ever. We can revisit this decision later in the release cycle though. But I'd suggest changing the default to "latest" for now, so that more hackers working with connection poolers will notice, and we get more testing of the new protocol and the negotiation. -- Heikki Linnakangas Neon (https://neon.tech)
Re: Make query cancellation keys longer
On 15/08/2024 23:20, Robert Haas wrote: On Thu, Aug 15, 2024 at 1:13 PM Heikki Linnakangas wrote: Added a "protocol_version" libpq option for that. It defaults to "auto", but you can set it to "3.1" or "3.0" to force the version. It makes it easier to test that the backwards-compatibility works, too. Over on the "Add new protocol message to change GUCs for usage with future protocol-only GUCs" there is a lot of relevant discussion about how bumping the protocol version should work. This thread shouldn't ignore all that discussion. Just to take one example, Jelte wants to bump the protocol version to 3.2, not 3.1, for some reasons that are in the commit message for the relevant patch over there. Ok, I've read through that thread now, and opined there too. One difference is with libpq option name: My patch adds "protocol_version", while Jelte proposes "max_protocol_version". I don't have strong opinions on that. I hope the ecosystem catches up to support NegotiateProtocolVersion quickly, so that only few people will need to set this option. In particular, I hope that there will never be need to use "max_protocol_version=3.2", because by the time we introduce version 3.3, all the connection poolers that support 3.2 will also implement NegotiateProtocolVersion. -- Heikki Linnakangas Neon (https://neon.tech)
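For what it's worth, forcing the old protocol from a client would then look something like this. This is only a sketch: it assumes the libpq option keeps the name "protocol_version" used in the patch described above, which may still change given the naming discussion.

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    /* "protocol_version" here is the not-yet-committed option from the patch */
    PGconn *conn = PQconnectdb("host=localhost dbname=test protocol_version=3.0");

    if (PQstatus(conn) != CONNECTION_OK)
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));

    PQfinish(conn);
    return 0;
}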
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
On Thu, Aug 15, 2024 at 5:47 PM Matthias van de Meent wrote: > > > I'm asking, because > > > I'm not very convinced that 'primitive scans' are a useful metric > > > across all (or even: most) index AMs (e.g. BRIN probably never will > > > have a 'primitive scans' metric that differs from the loop count), so > > > maybe this would better be implemented in that framework? > > > > What do you mean by "within that framework"? They seem orthogonal? > > What I meant was putting this 'primitive scans' info into the > AM-specific explain callback as seen in the latest patch version. I don't see how that could work. This is fundamentally information that is only known when the query has fully finished execution. Again, this is already something that we track at the whole-table level, within pg_stat_user_tables.idx_scan. It's already considered index AM agnostic information, in that sense. > > It's true that BRIN index scans will probably never show more than a > > single primitive index scan. I don't think that the same is true of > > any other index AM, though. Don't they all support SAOPs, albeit > > non-natively? > > Not always. For Bitmap Index Scan the node's functions can allow > non-native SAOP support (it ORs the bitmaps), but normal indexes > without SAOP support won't get SAOP-functionality from the IS/IOS > node's infrastructure, it'll need to be added as Filter. Again, what do you want me to do about it? Almost anything is possible in principle, and can be implemented without great difficulty. But you have to clearly say what you want, and why you want it. Yeah, non-native SAOP index scans are always bitmap scans. In the case of GIN, there are only lossy/bitmap index scans, anyway -- can't see that ever changing. In the case of GiST, we could in the future add native SAOP support, so do we really want to be inconsistent in what we show now? (Tom said something about that recently, in fact.) I don't hate the idea of selectively not showing this information (for BRIN, say). Just as I don't hate the idea of totally omitting "loops=1" in the common case where we couldn't possibly be more than one loop in practice. It's just that I don't think that it's worth it, on balance. Not all redundancy is bad. > > The important question is: what do you want to do about cases like the > > BRIN case? Our choices are all fairly obvious choices. We can be > > selective, and *not* show this information when a set of heuristics > > indicate that it's not relevant. This is fairly straightforward to > > implement. Which do you prefer: overall consistency, or less > > verbosity? > > Consistency, I suppose. But adding explain attributes left and right > in Index Scan's explain output when and where every index type needs > them doesn't scale, so I'd put index-specific output into it's own > system (see the linked thread for more rationale). I can't argue with that. I just don't think it's directly relevant. > And, in this case, > the use case seems quite index-specific, at least for IS/IOS nodes. I disagree. It's an existing concept, exposed in system views, and now in EXPLAIN ANALYZE. It's precisely that -- nothing more, nothing less. The fact that it tends to be much more useful in the case of nbtree (at least for now) makes this no less true. > This made me notice that you add a new metric that should generally be > exactly the same as pg_stat_all_indexes.idx_scan (you mention the > same). I didn't imagine that that part was subtle. 
> Can't you pull that data, instead of inventing a new place > every AM needs to touch for its metrics? No. At least not in a way that's scoped to a particular index scan. -- Peter Geoghegan
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Thu, Aug 15, 2024 at 3:04 PM Heikki Linnakangas wrote: > Perhaps we should even change it to return > 30 for protocol version 3.0, and just leave a note in the docs like > "in older versions of libpq, this returned 3 for protocol version 3.0". I think that would absolutely break current code. It's not uncommon (IME) for hand-built clients wrapping libpq to make sure they're not talking v2 before turning on some feature, and they're allowed to do that with a PQprotocolVersion() == 3 check. A GitHub code search brings up examples. As for 30001: I don't see the value in modifying an exported API in this way. Especially since we can add a new entry point that will be guaranteed not to break anyone, as Robert suggested. I think it's a POLA violation at minimum; my understanding was that up until this point, the value was incremented during major (incompatible) version bumps. And I think other users will have had the same understanding. --Jacob
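For reference, this is the sort of existing check meant above -- a minimal sketch of client code wrapping libpq:

#include <libpq-fe.h>

void
enable_v3_features(PGconn *conn)
{
    /*
     * Hand-built clients commonly gate protocol-3-only behaviour like this
     * today; returning 30001 instead of 3 would break the equality test.
     */
    if (PQstatus(conn) == CONNECTION_OK && PQprotocolVersion(conn) == 3)
    {
        /* safe to rely on v3 features (extended query protocol, etc.) */
    }
}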
Re: libpq: Fix lots of discrepancies in PQtrace
Hello, On 2024-Aug-14, Jelte Fennema-Nio wrote: > The following removed comments seems useful to keep (I realize I > already removed them in a previous version of the patch, but I don't > think I did that on purpose) > [...] Ah, yeah, I agree. I put them back, and pushed 0005, 6 and 7 as a single commit. It didn't seem worth pushing each separately, really. I added two lines for the CopyData message as well, since otherwise the output shows the "mismatched length" error when getting COPY data. I'm leaving 0008 to whoever is doing the NegotiateProtocolVersion stuff; maybe post that one in that thread you mentioned. I'll mark this CF entry committed. Many thanks! -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/
Re: PATCH: Add hooks for pg_total_relation_size and pg_indexes_size
On Fri, 9 Aug 2024 at 18:10, Andreas Karlsson wrote: > On 8/8/24 2:18 PM, Abdoulaye Ba wrote: > > I am submitting a patch to add hooks for the functions > > pg_total_relation_size and pg_indexes_size. These hooks allow for custom > > behaviour to be injected into these functions, which can be useful for > > extensions and other custom PostgreSQL modifications. > > What kind of extensions do you imagine would use this hook? If it is for > custom index AMs I think adding this to the index AM interface would > make more sense than just adding a generic callback hook. > > Andreas > > The primary use case for this hook is to allow extensions to account for > additional storage mechanisms that are not covered by the default > PostgreSQL relation size calculations. For instance, in our project, we are > working with an external indexing system (Tantivy) that maintains > additional data structures outside the standard PostgreSQL storage. This > hook allows us to include the size of these additional structures in the > total relation size calculations. > > While I understand your suggestion about custom index AMs, the intent > behind this hook is broader. It is not limited to custom index types but > can also be used for other forms of external storage that need to be > accounted for in relation size calculations. This is why a generic callback > hook was chosen over extending the index AM interface. > > However, if there is a consensus that such a hook would be better suited > within the index AM interface for cases involving custom index storage, I'm > open to discussing this further and exploring how it could be integrated > more tightly with the existing PostgreSQL AM framework. >
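To illustrate the kind of extension usage being described, here is a purely hypothetical sketch following the usual PostgreSQL hook pattern. The hook name, its signature, and the external_index_bytes() helper are all invented for this example and are not taken from the actual patch:

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

/* Hypothetical hook shape; the real one is whatever the patch defines. */
typedef int64 (*total_relation_size_hook_type) (Oid relid, int64 size);
extern total_relation_size_hook_type total_relation_size_hook;     /* invented */

/* Invented helper: sums the on-disk files the external indexing system keeps. */
extern int64 external_index_bytes(Oid relid);

static total_relation_size_hook_type prev_total_relation_size_hook = NULL;

static int64
tantivy_total_relation_size(Oid relid, int64 size)
{
    /* chain to any previously installed hook first */
    if (prev_total_relation_size_hook)
        size = prev_total_relation_size_hook(relid, size);

    /* add the storage maintained outside PostgreSQL for this relation */
    return size + external_index_bytes(relid);
}

void
_PG_init(void)
{
    prev_total_relation_size_hook = total_relation_size_hook;
    total_relation_size_hook = tantivy_total_relation_size;
}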
Re: Statistics Import and Export
On Thu, 2024-08-15 at 01:57 -0700, Jeff Davis wrote: > On Sun, 2024-08-04 at 01:09 -0400, Corey Huinker wrote: > > > > > I believe 0001 and 0002 are in good shape API-wise, and I can > > > start > > > getting those committed. I will try to clean up the code in the > > > process. > > Attached v26j. I did a lot of refactoring, and it's starting to take the shape I had in mind. Some of it is surely just style preference, but I think it reads more nicely and I caught a couple bugs along the way. The function attribute_statsitics_update() is significantly shorter. (Thank you for a good set of tests, by the way, which sped up the refactoring process.) Attached v27j. Questions: * Remind me why the new stats completely replace the new row, rather than updating only the statistic kinds that are specified? * I'm not sure what the type_is_scalar() function was doing before, but I just removed it. If it can't find the element type, then it skips over the kinds that require it. * I introduced some hard errors. These happen when it can't find the table, or the attribute, or doesn't have permissions. I don't see any reason to demote those to a WARNING. Even for the restore case, analagous errors happen for COPY, etc. * I'm still sorting through some of the type info derivations. I think we need better explanations about why it's doing exactly the things it's doing, e.g. for tsvector and multiranges. Regards, Jeff Davis From 93bb8e2dfeab17d5291273e6343f61e40e501633 Mon Sep 17 00:00:00 2001 From: Corey Huinker Date: Wed, 24 Jul 2024 23:45:26 -0400 Subject: [PATCH v27j 1/2] Create function pg_set_relation_stats. This function is used to tweak statistics on any relation that the user owns. The first parameter, relation, is used to identify the the relation to be modified. The remaining parameters correspond to the statistics attributes in pg_class: relpages, reltuples, and relallisvible. This function allows the user to tweak pg_class statistics values, allowing the user to inflate rowcounts, table size, and visibility to see what effect those changes will have on the the query planner. The function has no return value. Author: Corey Huinker Discussion: https://postgr.es/m/CADkLM=e0jM7m0GFV44nNtvL22CtnFUu+pekppCVE=dobe58...@mail.gmail.com --- doc/src/sgml/func.sgml | 65 +++ src/backend/statistics/Makefile| 1 + src/backend/statistics/import_stats.c | 207 + src/backend/statistics/meson.build | 1 + src/include/catalog/pg_proc.dat| 9 + src/test/regress/expected/stats_import.out | 99 ++ src/test/regress/parallel_schedule | 2 +- src/test/regress/sql/stats_import.sql | 79 8 files changed, 462 insertions(+), 1 deletion(-) create mode 100644 src/backend/statistics/import_stats.c create mode 100644 src/test/regress/expected/stats_import.out create mode 100644 src/test/regress/sql/stats_import.sql diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 5dd95d73a1a..02d93f6c925 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -29938,6 +29938,71 @@ DETAIL: Make sure pg_wal_replay_wait() isn't called within a transaction with a in identifying the specific disk files associated with database objects. + +Database Object Statistics Import Functions + + + + +Function + + +Description + + + + + + + + + + pg_set_relation_stats + + pg_set_relation_stats ( relation regclass, relpages integer, reltuples real, relallvisible integer ) + void + + + Updates table-level statistics for the given relation to the + specified values. The parameters correspond to columns in pg_class. 
+ + + Ordinarily, these statistics are collected automatically or updated + as a part of or , so it's not necessary to call this + function. However, it may be useful when testing the effects of + statistics on the planner to understand or anticipate plan changes. + + + The caller must have the MAINTAIN privilege on + the table or be the owner of the database. + + + The value of relpages must be greater than + or equal to 0, + reltuples must be greater than or equal to + -1.0, and relallvisible + must be greater than or equal to 0. + + + + Changes made by this function are likely to be overwritten by autovacuum (or manual + VACUUM or ANALYZE) and should + be considered temporary. + + + The signature of this function may change in new major releases if + there are changes to the statistics being tracked. + + + + + + + + Databas
Re: define PG_REPLSLOT_DIR
On 2024-Aug-14, Bertrand Drouvot wrote: > Out of curiosity, checking the sizes of affected files (O2, no debug): > > with patch: > >textdata bss dec hex filename > 30304 0 8 303127668 > src/backend/replication/logical/reorderbuffer.o > without patch: > 30286 0 8 302947656 > src/backend/replication/logical/reorderbuffer.o Hmm, I suppose this delta can be explained because because the literal string is used inside larger snprintf() format strings or similar, so things like snprintf(path, sizeof(path), -"pg_replslot/%s/%s", slotname, +"%s/%s/%s", PG_REPLSLOT_DIR, slotname, spill_de->d_name); and ereport(ERROR, (errcode_for_file_access(), -errmsg("could not remove file \"%s\" during removal of pg_replslot/%s/xid*: %m", - path, slotname))); +errmsg("could not remove file \"%s\" during removal of %s/%s/xid*: %m", + PG_REPLSLOT_DIR, path, slotname))); I don't disagree with making this change, but changing some of the other directory names you suggest, such as > pg_notify > pg_serial > pg_subtrans > pg_multixact > pg_wal would probably make no difference, since there are no literal strings that contain that as a substring(*). It might some sense to handle pg_tblspc similarly. As for pg_logical, I think you'll want separate defines for pg_logical/mappings, pg_logical/snapshots and so on. (*) I hope you're not going to suggest this kind of change (word-diff): if (GetOldestSnapshot()) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("[-pg_wal-]{+%s+}_replay_wait() must be only called without an active or registered snapshot"{+, PG_WAL_DIR+}), errdetail("Make sure [-pg_wal-]{+%s+}_replay_wait() isn't called within a transaction with an isolation level higher than READ COMMITTED, another procedure, or a function."{+, PG_WAL_DIR+}))); -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Los trabajadores menos efectivos son sistematicamente llevados al lugar donde pueden hacer el menor daño posible: gerencia." (El principio Dilbert)
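Concretely, something along these lines for the pg_logical case (the names are only a sketch):

#define PG_LOGICAL_DIR            "pg_logical"
#define PG_LOGICAL_MAPPINGS_DIR   PG_LOGICAL_DIR "/mappings"
#define PG_LOGICAL_SNAPSHOTS_DIR  PG_LOGICAL_DIR "/snapshots"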
Re: Thread-safe nl_langinfo() and localeconv()
On Thu, Aug 15, 2024 at 11:11 PM Heikki Linnakangas wrote: > There's a similar function in initdb, check_locale_name. I wonder if > that could reuse the same code. Thanks, between this comment and some observations from Peter E and Tom, I think I have a better plan now. I think they should *not* match, and a comment saying so should be deleted. In the backend, we should do neither ""-expansion (ie getenv("LC_...") whether direct or indirect) nor canonicalisation (of Windows' deprecated pre-BCP 47 locale names), making this v4 patch extremely simple. 1. CREATE DATABASE doesn't really need to accept LOCALE = ''. What is the point? It's not documented or desirable behavior AFAIK. If you like defaults you can just not provide a locale at all and get the template database's (which often comes from initdb, which often uses the server environment). That behavior was already inconsistent with CREATE COLLATION. So I think we should just reject "" in the backend check_locale(). 2. A similar argument applies to Windows canonicalisation. CREATE COLLATION isn't doing it. CREATE DATABASE is, but again, what is the point? See previous. (I also think that initdb should use a different mechanism for finding the native locale on Windows, but I already have a CF #3772 for that, ie migration plan for BCP 47 and native UTF-8 on Windows, but I don't want *this* thread to get blocked by our absence of Windows reviewers/testers, so let's not tangle that up with this thread-safety expedition.) To show a concrete example of commands no longer accepted with this version, because they call check_locale(): postgres=# set lc_monetary = ''; ERROR: invalid value for parameter "lc_monetary": "" postgres=# create database db2 locale = ''; ERROR: invalid LC_COLLATE locale name: "" HINT: If the locale name is specific to ICU, use ICU_LOCALE. Does anyone see a problem with that? I do see a complication for CREATE COLLATION, though. It doesn't call check_locale(), is not changed in this patch, and still accepts ''. Reasoning: There may be systems with '' in their pg_collation catalog in the wild, since we never canonicalised with setlocale(), so it might create some kind of unforeseen dump/restore/upgrade hazard if we just ban '' outright, I just don't know what yet. There is no immediate problem, ie there is no setlocale() to excise, for *this* project. Longer term, we can't actually continue to allow '' in COLLATION objects, though: that tells newlocale() to call getenv(), which may be technically OK in a multi-threaded program (that never calls setenv()), but is hardly desirable. But it will also give the wrong result, if we pursue the plan that Jeff and I discussed: we'll stop doing setenv("LC_COLLATE", datacollate) and setenv("LC_CTYPE", datctype) in postinit.c (see pg_perm_setlocale() calls). So I think any pg_collation catalog entries holding '' need to be translated to datcollate/datctype... somewhere. I just don't know where yet and don't want to tackle that in the same patch. From 4831ff4373b9d713c78e303d9758de347aadfc2f Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 13 Aug 2024 14:15:54 +1200 Subject: [PATCH v4 1/3] Provide thread-safe pg_localeconv_r(). This involves four different implementation strategies: 1. For Windows, we now require _configthreadlocale() to be available and work, and the documentation says that the object returned by localeconv() is in thread-local memory. 2. 
For glibc, we translate to nl_langinfo_l() calls, because it offers the same information that way as an extension, and that API is thread-safe. 3. For macOS/*BSD, use localeconv_l(), which is thread-safe. 4. For everything else, use uselocale() to set the locale for the thread, and use a big ugly lock to defend against the returned object being concurrently clobbered. In practice this currently means only Solaris. The new call is used in pg_locale.c, replacing calls to setlocale() and localeconv(). This patch adds a hard requirement on Windows' _configthreadlocale(). In the past there were said to be MinGW systems that didn't have it, or had it but it didn't work. As far as I can tell, CI (optional MinGW task + mingw cross build warning task) and build farm (fairywren msys) should be happy with this. Fingers crossed. (There are places that use configure checks for that in ECPG; other proposed patches would remove those later.) Reviewed-by: Heikki Linnakangas Discussion: https://postgr.es/m/CA%2BhUKGJqVe0%2BPv9dvC9dSums_PXxGo9SWcxYAMBguWJUGbWz-A%40mail.gmail.com --- configure | 2 +- configure.ac | 1 + meson.build | 1 + src/backend/utils/adt/pg_locale.c | 128 ++- src/include/pg_config.h.in| 3 + src/include/port.h| 6 + src/port/Makefile | 1 + src/port/meson.build | 1 + src/port/pg_localeconv_r.c| 367 ++ 9 files chang
Re: Statistics Import and Export
> > > function attribute_statsitics_update() is significantly shorter. (Thank > you for a good set of tests, by the way, which sped up the refactoring > process.) > yw > * Remind me why the new stats completely replace the new row, rather > than updating only the statistic kinds that are specified? > because: - complexity - we would then need a mechanism to then tell it to *delete* a stakind - we'd have to figure out how to reorder the remaining stakinds, or spend effort finding a matching stakind in the existing row to know to replace it - "do what analyze does" was an initial goal and as a result many test cases directly compared pg_statistic rows from an original table to an empty clone table to see if the "copy" had fidelity. > * I'm not sure what the type_is_scalar() function was doing before, > but I just removed it. If it can't find the element type, then it skips > over the kinds that require it. > that may be sufficient, > * I introduced some hard errors. These happen when it can't find the > table, or the attribute, or doesn't have permissions. I don't see any > reason to demote those to a WARNING. Even for the restore case, > analagous errors happen for COPY, etc. > I can accept that reasoning. > * I'm still sorting through some of the type info derivations. I > think we need better explanations about why it's doing exactly the > things it's doing, e.g. for tsvector and multiranges. I don't have the specifics of each, but any such cases were derived from similar behaviors in the custom typanalyze functions, and the lack of a custom typanalyze function for a given type was taken as evidence that the type was adequately handled by the default rules. I can see that this is an argument for having a second stats-specific custom typanalyze function for datatypes that need them, but I wasn't ready to go that far myself.
Re: Remaining dependency on setlocale()
On Fri, Aug 16, 2024 at 9:09 AM Thomas Munro wrote: > On Fri, Aug 16, 2024 at 1:25 AM Peter Eisentraut wrote: > > It should just be initdb doing that and then initializing the server > > with concrete values based on that. > > Right. > > > I guess technically some of these GUC settings default to the > > environment? But I think we could consider getting rid of that. > > Yeah. Seems to make a lot of sense. I tried that out over in CF #5170. (In case it's not clear why I'm splitting discussion between threads: I was thinking of this thread as the one where we're discussing what needs to be done, with other threads being spun off to become CF entry with concrete patches. I realised re-reading some discussion that that might not be obvious...)
Re: SQL:2011 application time
On Thu, Aug 8, 2024 at 4:54 AM Paul Jungwirth wrote: > > Rebased to e56ccc8e42. I only applied to 0001-0003. in create_table.sgml, I saw the WITHOUT OVERLAPS change is mainly in table_constraint. but we didn't touch alter_table.sgml. Do we also need to change alter_table.sgml correspondingly? + if (constraint->without_overlaps) + { + /* + * This enforces that there is at least one equality column + * besides the WITHOUT OVERLAPS columns. This is per SQL + * standard. XXX Do we need this? + */ + if (list_length(constraint->keys) < 2) + ereport(ERROR, + errcode(ERRCODE_SYNTAX_ERROR), + errmsg("constraint using WITHOUT OVERLAPS needs at least two columns")); + + /* WITHOUT OVERLAPS requires a GiST index */ + index->accessMethod = "gist"; + } if Constraint->conname is not NULL, we can + errmsg("constraint \"%s\" using WITHOUT OVERLAPS needs at least two columns")); "XXX Do we need this?" I think currently we need this, otherwise the following create_table synopsis will not be correct. UNIQUE [ NULLS [ NOT ] DISTINCT ] ( column_name [, ... ] [, column_name WITHOUT OVERLAPS ] ) PRIMARY KEY ( column_name [, ... ] [, column_name WITHOUT OVERLAPS ] ) we add a column in catalog-pg-constraint. do we need change column conexclop, "If an exclusion constraint, list of the per-column exclusion operators " but currently, primary key, unique constraint both have valid conexclop. +static void +ExecWithoutOverlapsNotEmpty(Relation rel, NameData attname, Datum attval, char typtype, Oid atttypid) +{ + bool isempty; + RangeType *r; + MultirangeType *mr; + + switch (typtype) + { + case TYPTYPE_RANGE: + r = DatumGetRangeTypeP(attval); + isempty = RangeIsEmpty(r); + break; + case TYPTYPE_MULTIRANGE: + mr = DatumGetMultirangeTypeP(attval); + isempty = MultirangeIsEmpty(mr); + break; + default: + elog(ERROR, "WITHOUT OVERLAPS column \"%s\" is not a range or multirange", + NameStr(attname)); + } + + /* Report a CHECK_VIOLATION */ + if (isempty) + ereport(ERROR, + (errcode(ERRCODE_CHECK_VIOLATION), + errmsg("empty WITHOUT OVERLAPS value found in column \"%s\" in relation \"%s\"", + NameStr(attname), RelationGetRelationName(rel; +} I think in the default branch, you need at least set the isempty value, otherwise maybe there will be a compiler warning because later your use isempty, but via default branch is value undefined? + /* + * If this is a WITHOUT OVERLAPS constraint, + * we must also forbid empty ranges/multiranges. + * This must happen before we look for NULLs below, + * or a UNIQUE constraint could insert an empty + * range along with a NULL scalar part. + */ + if (indexInfo->ii_WithoutOverlaps) + { + ExecWithoutOverlapsNotEmpty(heap, att->attname, + } previously we found out that if this happens later, then it won't work. but this comment didn't explain why this must have happened earlier. I didn't dig deep enough to find out why. but explaining it would be very helpful. I think some tests are duplicated, so I did the refactoring. v39-0001-refactor-tests.no-cfbot Description: Binary data
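On the uninitialized-variable point, one way to keep the compiler quiet while still reporting the error -- just a sketch of the suggestion above, dropped into the quoted switch:

    default:
        /* unreachable, but silences "isempty may be used uninitialized" warnings */
        isempty = false;
        elog(ERROR, "WITHOUT OVERLAPS column \"%s\" is not a range or multirange",
             NameStr(attname));
        break;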
Re: Remove dependence on integer wrapping
On Thu, Aug 15, 2024 at 5:34 PM Nathan Bossart wrote: > Now to 0002... > > - if (-element > nelements) > + if (element == PG_INT32_MIN || -element > nelements) > > This seems like a good opportunity to use our new pg_abs_s32() function, > and godbolt.org [0] seems to indicate that it might produce better code, > too (at least to my eye). This updated version LGTM, I agree it's a good use of pg_abs_s32(). Thanks, Joseph Koshakow
Re: CREATE SUBSCRIPTION - add missing test case
On Thu, 15 Aug 2024 at 12:55, Peter Smith wrote: > > Hi Hackers, > > While reviewing another logical replication thread [1], I found an > ERROR scenario that seems to be untested. > > TEST CASE: Attempt CREATE SUBSCRIPTION where the subscriber table is > missing some expected column(s). > > Attached is a patch to add the missing test for this error message. I agree currently there is no test to hit this code. I'm not sure if this is the correct location for the test, should it be included in the 008_diff_schema.pl file? Additionally, the commenting style for this test appears quite different from the others. Could we use a commenting style consistent with the earlier tests? Regards, Vignesh
Re: Conflict detection and logging in logical replication
On Wed, Aug 14, 2024 at 7:45 PM Michail Nikolaev wrote: > > > This is as expected, and we have documented this in the code comments. We > > don't > > need to report a conflict if the conflicting tuple has been removed or > > updated > > due to concurrent transaction. The same is true if the transaction that > > inserted the conflicting tuple is rolled back before > > CheckAndReportConflict(). > > We don't consider such cases as a conflict. > > That seems a little bit strange to me. > > From the perspective of a user, I expect that if a change from publisher is > not applied - I need to know about it from the logs. > In the above conditions where a concurrent tuple insertion is removed or rolled back before CheckAndReportConflict, the tuple inserted by apply will remain. There is no need to report anything in such cases as apply was successful. > But in that case, I will not see any information about conflict in the logs > in SOME cases. But in OTHER cases I will see it. > However, in both cases the change from publisher was not applied. > And these cases are just random and depend on the timing of race conditions. > It is not something I am expecting from the database. > > Maybe it is better to report about the fact that event from publisher was not > applied because of conflict and then try to > provide additional information about the conflict itself? > > Or possibly in case we were unable to apply the event and not able to find > the conflict, we should retry the event processing? > Per my understanding, we will apply or the conflict will be logged and retried where required (unique key violation). -- With Regards, Amit Kapila.
Re: define PG_REPLSLOT_DIR
On Wed, 14 Aug 2024 11:32:14 + Bertrand Drouvot wrote: > Hi hackers, > > while working on a replication slot tool (idea is to put it in contrib, not > shared yet), I realized that "pg_replslot" is being used > 25 times in > .c files. > > I think it would make sense to replace those occurrences with a $SUBJECT, > attached > a patch doing so. > > There is 2 places where it is not done: > > src/bin/initdb/initdb.c > src/bin/pg_rewind/filemap.c > > for consistency with the existing PG_STAT_TMP_DIR define. > > Out of curiosity, checking the sizes of affected files (O2, no debug): > > with patch: > >textdata bss dec hex filename > 20315 224 17 20556504c src/backend/backup/basebackup.o > 30304 0 8 303127668 > src/backend/replication/logical/reorderbuffer.o > 23812 36 40 238885d50 src/backend/replication/slot.o >6367 0 0636718df src/backend/utils/adt/genfile.o > 4099725742528 46099b413 src/bin/initdb/initdb.o >6963 224 871951c1b src/bin/pg_rewind/filemap.o > > without patch: > >textdata bss dec hex filename > 20315 224 17 20556504c src/backend/backup/basebackup.o > 30286 0 8 302947656 > src/backend/replication/logical/reorderbuffer.o > 23766 36 40 238425d22 src/backend/replication/slot.o >6363 0 0636318db src/backend/utils/adt/genfile.o > 4099725742528 46099b413 src/bin/initdb/initdb.o >6963 224 871951c1b src/bin/pg_rewind/filemap.o > > Also, I think we could do the same for: > > pg_notify > pg_serial > pg_subtrans > pg_wal > pg_multixact > pg_tblspc > pg_logical > > And I volunteer to do so if you think that makes sense. > > Looking forward to your feedback, /* restore all slots by iterating over all on-disk entries */ - replication_dir = AllocateDir("pg_replslot"); - while ((replication_de = ReadDir(replication_dir, "pg_replslot")) != NULL) + replication_dir = AllocateDir(PG_REPLSLOT_DIR); + while ((replication_de = ReadDir(replication_dir, PG_REPLSLOT_DIR)) != NULL) { charpath[MAXPGPATH + 12]; PGFileType de_type; I think the size of path can be rewritten to "MAXPGPATH + sizeof(PG_REPLSLOT_DIR)" and it seems easier to understand the reason why this size is used. (That said, I wonder the path would never longer than MAXPGPATH...) Regards, Yugo Nagata > > Regards, > > -- > Bertrand Drouvot > PostgreSQL Contributors Team > RDS Open Source Databases > Amazon Web Services: https://aws.amazon.com -- Yugo Nagata
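In other words, roughly the following (slotname and spill_de come from the surrounding code quoted above; sizeof(PG_REPLSLOT_DIR) includes the terminating NUL, which conveniently covers the path separator, so it equals the old "+ 12"):

    char        path[MAXPGPATH + sizeof(PG_REPLSLOT_DIR)];

    snprintf(path, sizeof(path), "%s/%s/%s",
             PG_REPLSLOT_DIR, slotname, spill_de->d_name);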
Re: libpq minor TOCTOU violation
On 14.08.24 03:12, Andreas Karlsson wrote: On 8/10/24 9:10 AM, Peter Eisentraut wrote: Thoughts? I like it. Not because of the security issue but mainly because it is more correct to do it this way. Plus the old code running stat() on Windows also made little sense. I think this simple fix can be committed. committed
Re: Logical Replication of sequences
Hi Vignesh. I looked at the latest v20240815* patch set. I have only the following few comments for patch v20240815-0004, below. == Commit message. Please see the attachment for some suggested updates. == src/backend/commands/subscriptioncmds.c CreateSubscription: nit - fix wording in one of the XXX comments == .../replication/logical/sequencesync.c report_mismatched_sequences: nit - param name /warning_sequences/mismatched_seqs/ append_mismatched_sequences: nit - param name /warning_sequences/mismatched_seqs/ LogicalRepSyncSequences: nit - var name /warning_sequences/mismatched_seqs/ == Kind Regards, Peter Smith. Fujitsu Australia diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index 534d385..e799a41 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -800,9 +800,9 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt, * export it. * * XXX: If the subscription is for a sequence-only publication, -* creating this slot. It can be created later during the ALTER -* SUBSCRIPTION ... REFRESH command, if the publication is updated -* to include tables or tables in schema. +* creating this slot is unnecessary. It can be created later +* during the ALTER SUBSCRIPTION ... REFRESH command, if the +* publication is updated to include tables or tables in schema. */ if (opts.create_slot) { diff --git a/src/backend/replication/logical/sequencesync.c b/src/backend/replication/logical/sequencesync.c index 1aa5ab8..934646f 100644 --- a/src/backend/replication/logical/sequencesync.c +++ b/src/backend/replication/logical/sequencesync.c @@ -264,17 +264,17 @@ copy_sequence(WalReceiverConn *conn, Relation rel, * Report any sequence mismatches as a single warning log. */ static void -report_mismatched_sequences(StringInfo warning_sequences) +report_mismatched_sequences(StringInfo mismatched_seqs) { - if (warning_sequences->len) + if (mismatched_seqs->len) { ereport(WARNING, errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("parameters differ for the remote and local sequences (%s) for subscription \"%s\"", - warning_sequences->data, MySubscription->name), + mismatched_seqs->data, MySubscription->name), errhint("Alter/Re-create local sequences to have the same parameters as the remote sequences.")); - resetStringInfo(warning_sequences); + resetStringInfo(mismatched_seqs); } } @@ -282,15 +282,15 @@ report_mismatched_sequences(StringInfo warning_sequences) * append_mismatched_sequences * * Appends details of sequences that have discrepancies between the publisher - * and subscriber to the warning_sequences string. + * and subscriber to the mismatched_seqs string. 
*/ static void -append_mismatched_sequences(StringInfo warning_sequences, Relation seqrel) +append_mismatched_sequences(StringInfo mismatched_seqs, Relation seqrel) { - if (warning_sequences->len) - appendStringInfoString(warning_sequences, ", "); + if (mismatched_seqs->len) + appendStringInfoString(mismatched_seqs, ", "); - appendStringInfo(warning_sequences, "\"%s.%s\"", + appendStringInfo(mismatched_seqs, "\"%s.%s\"", get_namespace_name(RelationGetNamespace(seqrel)), RelationGetRelationName(seqrel)); } @@ -314,7 +314,7 @@ LogicalRepSyncSequences(void) boolstart_txn = true; Oid subid = MyLogicalRepWorker->subid; MemoryContext oldctx; - StringInfo warning_sequences = makeStringInfo(); + StringInfo mismatched_seqs = makeStringInfo(); /* * Synchronizing each sequence individually incurs overhead from starting @@ -425,15 +425,15 @@ LogicalRepSyncSequences(void) PG_CATCH(); { if (sequence_mismatch) - append_mismatched_sequences(warning_sequences, sequence_rel); + append_mismatched_sequences(mismatched_seqs, sequence_rel); - report_mismatched_sequences(warning_sequences); + report_mismatched_sequences(mismatched_seqs); PG_RE_THROW(); } PG_END_TRY(); if (sequence_mismatch) - append_mismatched_se
Re: [Patch] remove duplicated smgrclose
Junwang Zhao wrote on Thu, Aug 15, 2024 at 18:03: > On Wed, Aug 14, 2024 at 2:35 PM Steven Niu wrote: > > > > Junwang, Kirill, > > > > The split work has been done. I created a new patch for removing > redundant smgrclose() function as attached. > > Please help review it. > > Patch looks good, actually you can make the refactoring code as v3-0002-xxx > by using: > > git format-patch -2 -v 3 > > Another kind reminder: Do not top-post when you reply ;) > > > > > Thanks, > > Steven > > > > Steven Niu wrote on Mon, Aug 12, 2024 at 18:11: > >> > >> Kirill, > >> > >> Good catch! > >> I will split the patch into two to cover both cases. > >> > >> Thanks, > >> Steven > >> > >> > >> Junwang Zhao wrote on Fri, Aug 9, 2024 at 18:19: > >>> > >>> On Fri, Aug 9, 2024 at 5:20 PM Kirill Reshke > wrote: > >>> > > >>> > On Thu, 1 Aug 2024 at 17:32, Junwang Zhao wrote: > >>> > > > >>> > > Hi Steven, > >>> > > > >>> > > On Wed, Jul 31, 2024 at 11:16 AM Steven Niu > wrote: > >>> > > > > >>> > > > Hello, hackers, > >>> > > > > >>> > > > I think there may be some duplicated code. > >>> > > > Function smgrDoPendingDeletes() calls both smgrdounlinkall() and > smgrclose(). > >>> > > > But both functions would close the SMgrRelation object, so it's > duplicated behavior? > >>> > > > > >>> > > > So I make this patch. Could someone take a look at it? > >>> > > > > >>> > > > Thanks for your help, > >>> > > > Steven > >>> > > > > >>> > > > From Highgo.com > >>> > > > > >>> > > > > >>> > > Your change LGTM, but the patch seems not to be applied to HEAD, > >>> > > I generate the attached v2 using `git format` with some commit > message. > >>> > > > >>> > > -- > >>> > > Regards > >>> > > Junwang Zhao > >>> > > >>> > Hi all! > >>> > This change looks good to me. However, I have an objection to these > >>> > lines from v2: > >>> > > >>> > > /* Close the forks at smgr level */ > >>> > > - for (forknum = 0; forknum <= MAX_FORKNUM; forknum++) > >>> > > - smgrsw[which].smgr_close(rels[i], forknum); > >>> > > + smgrclose(rels[i]); > >>> > > >>> > Why do we do this? This seems to be an unrelated change given thread > >>> > $subj. This is just a pure refactoring job, which deserves a separate > >>> > patch. There is similar coding in > >>> > smgrdestroy function: > >>> > > >>> > ``` > >>> > for (forknum = 0; forknum <= MAX_FORKNUM; forknum++) > >>> > smgrsw[reln->smgr_which].smgr_close(reln, forknum); > >>> > ``` > >>> > > >>> > So, I feel like these two places should be either changed together or > >>> > not be altered at all. And it is definitely a separate change. > >>> > >>> Yeah, I tend to agree with you, maybe we should split the patch > >>> into two. > >>> > >>> Steven, could you do this? > >>> > > >>> > -- > >>> > Best regards, > >>> > Kirill Reshke > >>> > >>> > >>> > >>> -- > >>> Regards > >>> Junwang Zhao > > > > -- > Regards > Junwang Zhao > OK, thanks for your kind help. Steven
Re: Conflict detection and logging in logical replication
On Thu, Aug 15, 2024 at 12:47 PM Zhijie Hou (Fujitsu) wrote: > > Thanks. I have checked and merged the changes. Here is the V15 patch > which addressed above comments. Thanks for the patch. Please find a few comments and queries: 1) For various conflicts, we have these in the logs: Replica identity (val1)=(30). (for RI on 1 column) Replica identity (pk, val1)=(200, 20). (for RI on 2 columns) Replica identity (40, 40, 11). (for RI full) Shall we have the column list in the last case as well, or can we simply have the *full* keyword, i.e. Replica identity full (40, 40, 11)? 2) For a toast column, we dump null in the remote tuple. I know that the toast column is not sent in the new tuple from the publisher and thus the behaviour, but it could be misleading for users. Perhaps document this? See 'null' in all these examples in the remote tuple: update_differ with PK: LOG: conflict detected on relation "public.t1": conflict=update_differ DETAIL: Updating the row that was modified locally in transaction 831 at 2024-08-16 09:59:26.566012+05:30. Existing local tuple (30, 30, 30, ...); remote tuple (30, 30, 300, null); replica identity (pk, val1)=(30, 30). update_missing with PK: LOG: conflict detected on relation "public.t1": conflict=update_missing DETAIL: Could not find the row to be updated. Remote tuple (10, 10, 100, null); replica identity (pk, val1)=(10, 10). update_missing with RI full: LOG: conflict detected on relation "public.t1": conflict=update_missing DETAIL: Could not find the row to be updated. Remote tuple (20, 20, 2000, null); replica identity (20, 20, 10, ...). 3) For update_exists(), we dump: Key (a, b)=(2, 1) For delete_missing, update_missing, update_differ, we dump: Replica identity (a, b)=(2, 1). For update_exists as well, shouldn't we dump 'Replica identity'? Only for the insert case should it be referred to as 'Key'. 4) Why does delete_missing not show the remote tuple? Is it because we dump the new tuple as 'remote tuple', which is not relevant for delete_missing? 2024-08-16 09:13:33.174 IST [419839] LOG: conflict detected on relation "public.t1": conflict=delete_missing 2024-08-16 09:13:33.174 IST [419839] DETAIL: Could not find the row to be deleted. Replica identity (val1)=(30). thanks Shveta
Re: Conflict detection and logging in logical replication
On Fri, Aug 16, 2024 at 10:46 AM shveta malik wrote: > > 3) > For update_exists(), we dump: > Key (a, b)=(2, 1) > > For delete_missing, update_missing, update_differ, we dump: > Replica identity (a, b)=(2, 1). > > For update_exists as well, shouldn't we dump 'Replica identity'? Only > for insert case, it should be referred as 'Key'. > On rethinking, is it because for update_exists case 'Key' dumped is not the one used to search the row to be updated? Instead it is the one used to search the conflicting row. Unlike update_differ, the row to be updated and the row currently conflicting will be different for update_exists case. I earlier thought that 'KEY' and 'Existing local tuple' dumped always belong to the row currently being updated/deleted/inserted. But for 'update_eixsts', that is not the case. We are dumping 'Existing local tuple' and 'Key' for the row which is conflicting and not the one being updated. Example: ERROR: conflict detected on relation "public.tab_1": conflict=update_exists Key (a, b)=(2, 1); existing local tuple (2, 1); remote tuple (2, 1). Operations performed were: Pub: insert into tab values (1,1); Sub: insert into tab values (2,1); Pub: update tab set a=2 where a=1; Here Key and local tuple are both 2,1 instead of 1,1. While replica identity value (used to search original row) will be 1,1 only. It may be slightly confusing or say tricky to understand when compared to other conflicts' LOGs. But not sure what better we can do here. One more comment: 5) For insert/update_exists, the sequence is: Key .. ; existing local tuple .. ; remote tuple ... For rest of the conflicts, sequence is: Existing local tuple .. ; remote tuple .. ; replica identity .. Is it intentional? Shall the 'Key' or 'Replica Identity' be the first one to come in all conflicts? thanks Shveta
Re: Conflict detection and logging in logical replication
On Fri, Aug 16, 2024 at 10:46 AM shveta malik wrote: > > On Thu, Aug 15, 2024 at 12:47 PM Zhijie Hou (Fujitsu) > wrote: > > > > Thanks. I have checked and merged the changes. Here is the V15 patch > > which addressed above comments. > > Thanks for the patch. Please find few comments and queries: > > 1) > For various conflicts , we have these in Logs: > Replica identity (val1)=(30).(for RI on 1 column) > Replica identity (pk, val1)=(200, 20). (for RI on 2 columns) > Replica identity (40, 40, 11).(for RI full) > > Shall we have have column list in last case as well, or can simply > have *full* keyword i.e. Replica identity full (40, 40, 11) > I would prefer 'full' instead of the entire column list as the complete column list could be long and may not much sense. > > 2) > For toast column, we dump null in remote-tuple. I know that the toast > column is not sent in new-tuple from the publisher and thus the > behaviour, but it could be misleading for users. Perhaps document > this? > Agreed that we should document this. I suggest that we can have a doc patch that explains the conflict logging format and in that, we can mention this behavior as well. > 3) > For update_exists(), we dump: > Key (a, b)=(2, 1) > > For delete_missing, update_missing, update_differ, we dump: > Replica identity (a, b)=(2, 1). > > For update_exists as well, shouldn't we dump 'Replica identity'? Only > for insert case, it should be referred as 'Key'. > I think update_exists is quite similar to insert_exists and both happen due to unique key violation. So, it seems okay to display the Key for update_exists. > > 4) > Why delete_missing is not having remote_tuple. Is it because we dump > new tuple as 'remote tuple', which is not relevant for delete_missing? > Right. -- With Regards, Amit Kapila.
Re: Conflict detection and logging in logical replication
On Fri, Aug 16, 2024 at 11:48 AM shveta malik wrote: > > On Fri, Aug 16, 2024 at 10:46 AM shveta malik wrote: > > > > 3) > > For update_exists(), we dump: > > Key (a, b)=(2, 1) > > > > For delete_missing, update_missing, update_differ, we dump: > > Replica identity (a, b)=(2, 1). > > > > For update_exists as well, shouldn't we dump 'Replica identity'? Only > > for insert case, it should be referred as 'Key'. > > > > On rethinking, is it because for update_exists case 'Key' dumped is > not the one used to search the row to be updated? Instead it is the > one used to search the conflicting row. Unlike update_differ, the row > to be updated and the row currently conflicting will be different for > update_exists case. I earlier thought that 'KEY' and 'Existing local > tuple' dumped always belong to the row currently being > updated/deleted/inserted. But for 'update_eixsts', that is not the > case. We are dumping 'Existing local tuple' and 'Key' for the row > which is conflicting and not the one being updated. Example: > > ERROR: conflict detected on relation "public.tab_1": conflict=update_exists > Key (a, b)=(2, 1); existing local tuple (2, 1); remote tuple (2, 1). > > Operations performed were: > Pub: insert into tab values (1,1); > Sub: insert into tab values (2,1); > Pub: update tab set a=2 where a=1; > > Here Key and local tuple are both 2,1 instead of 1,1. While replica > identity value (used to search original row) will be 1,1 only. > > It may be slightly confusing or say tricky to understand when compared > to other conflicts' LOGs. But not sure what better we can do here. > The update_exists behaves more like insert_exists as we detect that only while inserting into index. It is also not clear to me if we can do better than to clarify this in docs. > > > One more comment: > > 5) > For insert/update_exists, the sequence is: > Key .. ; existing local tuple .. ; remote tuple ... > > For rest of the conflicts, sequence is: > Existing local tuple .. ; remote tuple .. ; replica identity .. > > Is it intentional? Shall the 'Key' or 'Replica Identity' be the first > one to come in all conflicts? > This is worth considering but Replica Identity signifies the old tuple values, that is why it is probably kept at the end. But let's see what Hou-San or others think about this. -- With Regards, Amit Kapila.