Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.

2025-03-11 Thread Peter Smith
On Tue, Mar 11, 2025 at 11:32 AM David G. Johnston
 wrote:
>
> On Mon, Mar 10, 2025 at 5:00 PM Peter Smith  wrote:
>>
>> Hi Shubham.
>>
>> Some review comments for patch v16-0001.
>>
>> ==
>> doc/src/sgml/ref/pg_createsubscriber.sgml
>>
>> 1.
>> + -c
>> + --drop-all-publications
>>
>> Is 'c' the best switch choice letter for this option? It doesn't seem
>> intuitive, but unfortunately, I don't have any better ideas since d/D
>> and p/P are already being used.
>
>
> Agreed.  Better to just not assign a short name to this.
>
> The description for the sgml docs needs to frame up this option's purpose.
>
> How exactly does one go about backing up a publication?  You discuss the 
> topic in the commit message but that definitely seems user-facing.
>
> If we aren't expecting lots of publications maybe name them individually 
> instead of logging "all publications"?  Possibly one info line each but even 
> just comma separated would work.
>
> The name of this is shock-inducing.  Admittedly, after pondering things, it
> is fairly obvious that only the target is going to be affected, but there is
> a source database involved here as well.  It is also unclear whether it
> would happen before or after, which is less problematic since it would only
> impact failure modes anyway - when all is said and done, with this option
> specified, the server will have no publications after the restart that
> follows pg_resetwal, right?
>
> Maybe: --drop-target-publications-first ?
>
> If we do want a letter, either "X" or "Z" probably works for an
> English-speaking audience.  X is how one denotes removing something,
> and Z is a mnemonic for "Zap", which is a synonym for "Drop".  Or "R" for "Remove".
>
> Can you briefly recap how this is different than the automatic behavior 
> described in the existing Step 6?
> "Drop publications on the target server that were replicated because they 
> were created before the replication start location. It has no use on the 
> subscriber."
>
> -R / --remove-target-publications; I'm fine with answering the timing question in
> the description.
>

Hi David.

This feature is all about removing some objects that were replicated
during the streaming replication before pg_createsubscriber was run,
but that are no longer needed/wanted afterwards.

Although the current thread/patch is just for removing existing
publications from the target server, in the future there might be some
enhancements to remove other kinds of unwanted objects that were
replicated to the target server by streaming replication -- e.g.
remove subscriptions, or remove replication slots, or whatever.

Unfortunately, we are spinning in circles a bit trying to come up with
a good way to represent the option needed for this, while at the same
time trying to be future-proof. I see 3 choices...

==

Choice 1.  Generic option

Implement a single boolean option to remove everything.

In this case, the option would need to have some generic name like
"--remove-target-existing-object". For now (current patch), it would
be implemented to remove only existing publications. In the future, it
could remove more things. But, there isn't much flexibility: IIUC this
option can only remove either all or none.

~

Choice 2. Different options for removing different things (the current
patch is like this)

"--remove-target-publications" is what this patch is doing.

In future, if we want to remove more things then we would need to add
more options like "--remove-target-subscriptions",
"--remove-target-replication-slots" etc.

~

Choice 3. Implement some option that has an argument saying what to delete

Implement an option that takes some argument saying what objects to remove.

Here, the current patch would be something like:
"--remove-target-objects=publications". In future, this option could
be enhanced to accept other values like
"--remove-target-objects=publications,subscriptions" etc.

~~~

Do you have any thoughts on what kind of option is best here?

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Send multiple statements to pg server at once

2025-03-11 Thread Andy Fan



Hi,

I want to simulate a case where the server receives "SELECT 1;SELECT 2;" at
once, so that when executing exec_simple_query, the query_string is
"SELECT 1;SELECT 2;".  However, I found it is not easy to simulate; does
anyone have a suggestion?

I tried 'psql -f' and checked 'psql/settings.h'; in the end I found the code
below in psql/mainloop.c, so I guess it is impossible in psql.


/*
 * Send command if semicolon found, or if end of line and we're in
 * single-line mode.
 */
if (scan_result == PSCAN_SEMICOLON ||
(scan_result == PSCAN_EOL && pset.singleline))
{


-- 
Best Regards
Andy Fan





Re: Allow CI to only run the compiler warnings task

2025-03-11 Thread Bertrand Drouvot
Hi,

On Thu, Sep 12, 2024 at 05:21:43AM +, Bertrand Drouvot wrote:
> Hi,
> 
> On Wed, Sep 11, 2024 at 04:39:57PM +0300, Nazir Bilal Yavuz wrote:
> > Hi,
> > 
> > On Wed, 11 Sept 2024 at 15:36, Bertrand Drouvot
> >  wrote:
> > >
> > > Hi hackers,
> > >
> > > While working on a new pg_logicalinspect module ([1]), I reached a point 
> > > where
> > > all the CI tests were green except the compiler warnings one. Then, to 
> > > save time
> > > addressing the issue, I modified the .cirrus.tasks.yml file to $SUBJECT.
> > >
> > > I think this could be useful for others too, so please find attached this 
> > > tiny
> > > patch.
> > 
> > +1 for this. I encountered the same issue before.
> 
> Thanks for the feedback!
> 
> > 
> > > Note that the patch does not add an extra "ci-task-only", but for
> > > simplicity it renames ci-os-only to ci-task-only.
> > 
> > I think this change makes sense. I gave a quick try to your patch with
> > ci-task-only: ["", "linux", "compilerwarnings"] and it worked as
> > expected.
> 
> And for the testing.

Mandatory rebase attached.
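As a concrete illustration of the intended workflow (assuming the v2 naming;
the compilerwarnings filtering itself sits in the part of the patch that is
truncated below), one would add a trailer line such as

    ci-task-only: compilerwarnings

to the commit message, limiting the CI run to the CompilerWarnings task in the
same way that "ci-task-only: linux" limits it to the Linux tasks.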

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From 504181905666fdada111c51ff2b682a86f88da5e Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Wed, 11 Sep 2024 11:01:41 +
Subject: [PATCH v2] Allow CI to only run the compiler warnings task

That could be useful to only run the CI compiler warnings task when addressing
compiler warnings issues.

Renaming ci-os-only to ci-task-only and adding filtering for the "compilerwarnings"
task.
---
 .cirrus.tasks.yml   | 20 ++--
 src/tools/ci/README |  7 ---
 2 files changed, 14 insertions(+), 13 deletions(-)
  19.6% src/tools/ci/

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index 5849cbb839a..5aac99e2666 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -72,7 +72,7 @@ task:
   # push-wait-for-ci cycle time a bit when debugging operating system specific
   # failures. Uses skip instead of only_if, as cirrus otherwise warns about
   # only_if conditions not matching.
-  skip: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:.*'
+  skip: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-task-only:.*'
 
   env:
 CPUS: 4
@@ -167,7 +167,7 @@ task:
   <<: *freebsd_task_template
 
   depends_on: SanityCheck
-  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*freebsd.*'
+  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-task-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-task-only:[^\n]*freebsd.*'
 
   sysinfo_script: |
 id
@@ -257,7 +257,7 @@ task:
 
   matrix:
 - name: NetBSD - Meson
-  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*netbsd.*'
+  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-task-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-task-only:[^\n]*netbsd.*'
   env:
 OS_NAME: netbsd
 IMAGE_FAMILY: pg-ci-netbsd-postgres
@@ -274,7 +274,7 @@ task:
   <<: *netbsd_task_template
 
 - name: OpenBSD - Meson
-  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*openbsd.*'
+  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-task-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-task-only:[^\n]*openbsd.*'
   env:
 OS_NAME: openbsd
 IMAGE_FAMILY: pg-ci-openbsd-postgres
@@ -414,7 +414,7 @@ task:
   <<: *linux_task_template
 
   depends_on: SanityCheck
-  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*linux.*'
+  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-task-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-task-only:[^\n]*linux.*'
 
   ccache_cache:
 folder: ${CCACHE_DIR}
@@ -598,7 +598,7 @@ task:
   <<: *macos_task_template
 
   depends_on: SanityCheck
-  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*(macos|darwin|osx).*'
+  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-task-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-task-only:[^\n]*(macos|darwin|osx).*'
 
   sysinfo_script: |
 id
@@ -704,7 +704,7 @@ task:
   <<: *windows_task_template
 
   depends_on: SanityCheck
-  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*windows.*'
+  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-task-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-task-only:[^\n]*windows.*'
 
   setup_additional_packages_script: |
 REM choco install -y --no-progress ...
@@ -742,8 +742,8 @@ task:
   # due to resource constraints we don't run this task by default for now
   trigger_type: manual
   # worth using only_if despite being manual, otherwise this task will show up
-  # when e.g. ci-os-only: linux is used.
-  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*mingw.*'
+  # whe

Re: bogus error message for ALTER TABLE ALTER CONSTRAINT

2025-03-11 Thread jian he
On Tue, Mar 11, 2025 at 1:58 AM Álvaro Herrera  wrote:
>
> Hello,
>
> I fleshed this out more fully and I think 0001 is good enough to commit.
>
> I then noticed that constraints on domains are giving bogus error
> messages as well, and the situation is easily improved -- see 0002.  I'm
> not so sure about this one, mainly because test coverage appears scant.
> I need to look into this one a bit more.
>

hi.

this looks a little strange:
if (cas_bits & (CAS_NOT_DEFERRABLE) && seen)
seen->seen_deferrability = true;

it should be
if ((cas_bits & CAS_NOT_DEFERRABLE) && seen)
seen->seen_deferrability = true;
?


typedef struct CAS_flags needs to be added to typedefs.list


It seems the patch didn't cover the "initially immediate" case for domains.
for example:
create domain d_int as int4;
--- the following two cases should fail.
alter domain d_int add constraint nn1 not null initially immediate;
alter domain d_int add constraint cc check(value > 1) initially immediate;

we can add the following into processCASbits to make it error out
if ((cas_bits & CAS_INITIALLY_IMMEDIATE) && seen)
seen->seen_deferrability = true;



create domain d_int as int4;
alter domain d_int add not null no inherit not valid;
ERROR:  not-null constraints on domains cannot be marked NOT VALID
LINE 1: alter domain d_int add not null no inherit not valid;
^
It would be great if we could report an error like
"ERROR:  NOT NULL constraints on domains cannot be marked INHERIT / NOT INHERIT",
i.e. just report the first constraint property that is not OK, but that seems not doable.




Re: Tidy recent code bloat in pg_createsubscriber::cleanup_objects_atexit

2025-03-11 Thread Michael Paquier
On Tue, Mar 11, 2025 at 12:29:42PM +1100, Peter Smith wrote:
> During a recent review of pg_createsubscriber I saw that commit
> e117cfb introduced a common 'dbinfos' struct to contain the array of
> individual 'dbinfo[i]' infos. But, this now means that getting to each
> dbinfo[i] requires another level of referencing.
> 
> In some places, e.g. function cleanup_objects_atexit, this caused
> unnecessary code bloat. IMO this function is crying out for a local
> variable to simplify the code again.

Right.  This improves the clarity of the code, so agreed about the use
of a local variable here.
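For illustration, the kind of simplification being suggested looks roughly
like this (a sketch only; the struct and field names are recalled from commit
e117cfb and may not match the code exactly):

    /* before: every access spells out the full path */
    if (dbinfos.dbinfo[i].made_subscription)
        drop_subscription(&dbinfos.dbinfo[i]);

    /* after: one local variable per loop iteration */
    struct LogicalRepInfo *dbinfo = &dbinfos.dbinfo[i];

    if (dbinfo->made_subscription)
        drop_subscription(dbinfo);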
--
Michael


signature.asc
Description: PGP signature


Re: Send multiple statements to pg server at once

2025-03-11 Thread Pavel Stehule
Hi

Tue, 11 Mar 2025 at 8:23, Andy Fan  wrote:

>
>
> Hi,
>
> I want to simulate a case where server receive "SELECT 1;SELECT 2;" at
> once, so when executing exec_simple_query, the query_string is
> "SELECT 1;SELECT 2;"  However I found it is not easy to simulate, does
> anyone has a suggestion?
>
> I tried 'psql -f', checking 'psql/settings.h', at last I see below code
> in psql/mainloop.c, I guess it is impossible in psql.
>
>
> /*
>  * Send command if semicolon found, or if end of line and we're in
>  * single-line mode.
>  */
> if (scan_result == PSCAN_SEMICOLON ||
> (scan_result == PSCAN_EOL && pset.singleline))
> {
>
>
If I remember well, you can use \; for this case

https://www.postgresql.org/docs/current/app-psql.html
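For example, in an interactive psql session or in a script run with -f,
something like

    SELECT 1 \; SELECT 2;

is accumulated and sent to the server as a single simple-query message, so
exec_simple_query() sees the combined query_string (the way psql displays the
results may vary by version).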

Regards

Pavel


>
> --
> Best Regards
> Andy Fan
>
>
>
>


Re: Send multiple statements to pg server at once

2025-03-11 Thread Andy Fan
Pavel Stehule  writes:

>
> If I remember well, you can use \; for this case
>
> https://www.postgresql.org/docs/current/app-psql.html
>
> Regards

Thank you Pavel, it works! This is so handy!


-- 
Best Regards
Andy Fan





Re: Elimination of the repetitive code at the SLRU bootstrap functions.

2025-03-11 Thread Evgeny


Hello Hackers!

On 11 March 2025, the CI job failed 
(https://cirrus-ci.com/task/5453963400577024).


The issue occurred in the test 'pg_amcheck/t/002_nonesuch.pl'.

Log:

https://api.cirrus-ci.com/v1/artifact/task/5453963400577024/testrun/build/testrun/pg_amcheck/002_nonesuch/log/regress_log_002_nonesuch

Log output:
--
[11:40:20.527](0.000s) not ok 18 - checking with a non-existent user 
stderr /(?^:role "no_such_user" does not exist)/
[11:40:20.527](0.000s) #   Failed test 'checking with a non-existent 
user stderr /(?^:role "no_such_user" does not exist)/'

#   at C:/cirrus/src/bin/pg_amcheck/t/002_nonesuch.pl line 86.
[11:40:20.527](0.000s) #   'pg_amcheck: error: 
connection to server on socket 
"C:/Windows/TEMP/AtJUArHHFu/.s.PGSQL.17536" failed: server closed the 
connection unexpectedly

#   This probably means the server terminated abnormally
#   before or while processing the request.
# '
# doesn't match '(?^:role "no_such_user" does not exist)'
# Running: pg_amcheck template1
--

It appears to be an intermittent ("flaky") failure that regularly occurs on the
Windows platform and is not related to the patch.


For the purpose of restarting the CI job, I have attached the 5th 
version of the patch. It is identical to v4, except for the version 
number increment, which helps prevent duplicate attachment names in this 
thread.


Best regards,
Evgeny Voropaev.

From 885c15ba379da4fd5f0dd71ee4dc9b5e31108d6f Mon Sep 17 00:00:00 2001
From: Evgeny Voropaev 
Date: Sun, 23 Feb 2025 23:19:19 +0800
Subject: [PATCH v5] Elimination of the repetitive code at the SLRU
 bootstrapping and nullifying functions.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The functions bootstrapping and nullifying SLRU pages used to contain a lot of
repetitive code. The new functions introduced by this commit (BootStrapSlruPage,
XLogSimpleInsert) make it possible to remove these repetitions.

Author: Evgeny Voropaev  
Reviewed by: Álvaro Herrera 
Reviewed by: Andrey Borodin 
Reviewed by: Aleksander Alekseev 
Discussion: https://www.postgresql.org/message-id/flat/97820ce8-a1cd-407f-a02b-47368fadb14b%40tantorlabs.com
---
 src/backend/access/transam/clog.c   |  72 ++-
 src/backend/access/transam/commit_ts.c  |  70 ++-
 src/backend/access/transam/multixact.c  | 155 ++--
 src/backend/access/transam/slru.c   |  45 ++-
 src/backend/access/transam/subtrans.c   |  48 ++--
 src/backend/access/transam/xloginsert.c |  14 +++
 src/include/access/slru.h   |   2 +
 src/include/access/xloginsert.h |   1 +
 8 files changed, 138 insertions(+), 269 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 48f10bec91e..c86129282fa 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -110,7 +110,6 @@ static SlruCtlData XactCtlData;
 #define XactCtl (&XactCtlData)
 
 
-static int	ZeroCLOGPage(int64 pageno, bool writeXlog);
 static bool CLOGPagePrecedes(int64 page1, int64 page2);
 static void WriteZeroPageXlogRec(int64 pageno);
 static void WriteTruncateXlogRec(int64 pageno, TransactionId oldestXact,
@@ -832,41 +831,11 @@ check_transaction_buffers(int *newval, void **extra, GucSource source)
 void
 BootStrapCLOG(void)
 {
-	int			slotno;
-	LWLock	   *lock = SimpleLruGetBankLock(XactCtl, 0);
-
-	LWLockAcquire(lock, LW_EXCLUSIVE);
-
-	/* Create and zero the first page of the commit log */
-	slotno = ZeroCLOGPage(0, false);
-
-	/* Make sure it's written out */
-	SimpleLruWritePage(XactCtl, slotno);
-	Assert(!XactCtl->shared->page_dirty[slotno]);
-
-	LWLockRelease(lock);
-}
-
-/*
- * Initialize (or reinitialize) a page of CLOG to zeroes.
- * If writeXlog is true, also emit an XLOG record saying we did this.
- *
- * The page is not actually written, just set up in shared memory.
- * The slot number of the new page is returned.
- *
- * Control lock must be held at entry, and will be held at exit.
- */
-static int
-ZeroCLOGPage(int64 pageno, bool writeXlog)
-{
-	int			slotno;
-
-	slotno = SimpleLruZeroPage(XactCtl, pageno);
-
-	if (writeXlog)
-		WriteZeroPageXlogRec(pageno);
-
-	return slotno;
+	/*
+	 * Nullify the page (pageno = 0), don't insert an XLog record, 
+	 * save the page on a disk
+	 */
+	BootStrapSlruPage(XactCtl, 0, 0, true);
 }
 
 /*
@@ -959,7 +928,6 @@ void
 ExtendCLOG(TransactionId newestXact)
 {
 	int64		pageno;
-	LWLock	   *lock;
 
 	/*
 	 * No work except at first XID of a page.  But beware: just after
@@ -970,14 +938,12 @@ ExtendCLOG(TransactionId newestXact)
 		return;
 
 	pageno = TransactionIdToPage(newestXact);
-	lock = SimpleLruGetBankLock(XactCtl, pageno);
 
-	LWLockAcquire(lock, LW_EXCLUSIVE);
-
-	/* Zero the page and make an XLOG entry about it */
-	ZeroCLOGPage(pageno, true);

Re: Memory context can be its own parent and child in replication command

2025-03-11 Thread Anthonin Bonnefoy
On Tue, Mar 11, 2025 at 1:09 AM Michael Paquier  wrote:
> It seems to me that you mean 1afe31f03cd2, no?

Yes, that was a bad copy/paste from me.

On Tue, Mar 11, 2025 at 2:13 AM Tom Lane  wrote:
>
> I dunno about this patch: it seems to me it's doing things exactly
> backwards, namely trying to restore the CurrentMemoryContext after
> a transaction abort to what it was just beforehand.  With only
> a slightly different set of suppositions, this is introducing
> use of a deleted context rather than removing it.

Yeah, I'm not super happy about the fix either. This leaves the issue
that CurrentMemoryContext is set to a deleted context, even if
temporarily.

> I'm inclined to suspect that the real problem is somebody installed
> the wrong context as current just before the transaction was started.
> I don't have the energy to look closer right now, though.

The context installed is the replication command context created at
the beginning of exec_replication_command. This context is deleted at
the end of the function while the transaction is still running. Maybe
a better fix would be to not delete the Replication command context
when a transaction was started to handle the export, and switch back
to this context at the start of the next exec_replication_command?
This way, the memory context referenced by priorContext would still be
valid when the transaction is aborted.

This would also require the Replication command context to not be
attached under the MessageContext to avoid being deleted at the end of
the query cycle.

> Independently of where the actual bug is there, it seems not
> nice that it's so easy to bollix the memory context data
> structures.  I wonder if we can make that more detectable
> at reasonable cost.

One thing that made it hard to trace the issue was that there's no way
to know whether a MemoryContext was deleted or not. Setting the
MemoryContext's node type to T_Invalid during reset could be a way to
tag the context as deleted. This would leverage the existing
MemoryContextIsValid check, and additional MemoryContextIsValid checks
could be added before restoring the priorContext.

--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -527,6 +527,7 @@ MemoryContextDeleteOnly(MemoryContext context)

context->methods->delete_context(context);

+   context->type = T_Invalid;
VALGRIND_DESTROY_MEMPOOL(context);
 }

However, when testing this on my mac, it seems to trigger a heap
corruption during initdb.

postgres(9057,0x200690840) malloc: Heap corruption detected, free list
is damaged at 0x6322bb10
*** Incorrect guard value: 276707563012096
postgres(9057,0x200690840) malloc: *** set a breakpoint in
malloc_error_break to debug

It ran fine on Linux, though; I didn't have time to check why yet.




Re: bogus error message for ALTER TABLE ALTER CONSTRAINT

2025-03-11 Thread Álvaro Herrera
On 2025-Mar-11, Amul Sul wrote:

> On Mon, Mar 10, 2025 at 11:29 PM Álvaro Herrera  
> wrote:
>
> > I fleshed this out more fully and I think 0001 is good enough to commit.
> 
> The approach looks good to me, but instead of adding a CAS_flags struct, could
> we use macros like SEEN_DEFERRABILITY(bits), SEEN_ENFORCEABILITY(bits),
> etc.? We can simply pass cas_bits to these macros, and to avoid the error
> from processCASbits(), we can pass NULL for constrType.

Ah yeah, I thought of this too at first, but didn't actually code it
because I thought it'd be messier.  Trying to do it naively doesn't
work, because it's not enough to test whether each bit is true or false
-- what you need to know is whether an option was specified for each
bit, in either direction.  So we'd need a separate bitmask, we can't
pass the existing 'bits' mask.  And at that point, it's not any better
to have a bitmask, and a stack-allocated struct of booleans is just
easier to write.
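For what it's worth, a minimal sketch of that struct-of-booleans idea (only
seen_deferrability is confirmed by the snippets quoted upthread; the other
member names are invented here for illustration):

    typedef struct CAS_flags
    {
        bool        seen_deferrability;    /* DEFERRABLE / NOT DEFERRABLE given */
        bool        seen_initially;        /* INITIALLY DEFERRED / IMMEDIATE given */
        bool        seen_enforceability;   /* ENFORCED / NOT ENFORCED given */
        bool        seen_not_valid;        /* NOT VALID given */
        bool        seen_no_inherit;       /* NO INHERIT given */
    } CAS_flags;

processCASbits() would set the flag for each clause it actually saw, so
callers can tell "option not specified" apart from "option specified with its
default value".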

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Los dioses no protegen a los insensatos.  Éstos reciben protección de
otros insensatos mejor dotados" (Luis Wu, Mundo Anillo)




Re: NOT ENFORCED constraint feature

2025-03-11 Thread Álvaro Herrera
On 2025-Feb-28, Amul Sul wrote:

> Yeah, that was intentional. I wanted to avoid recursion again by
> hitting ATExecAlterChildConstr() at the end of
> ATExecAlterConstraintInternal(). Also, I realized the value doesn’t
> matter since recurse = false is explicitly set inside the
> cmdcon->alterEnforceability condition. I wasn’t fully satisfied with
> how we handled the recursion decision (code design), so I’ll give it
> more thought. If I don’t find a better approach, I’ll add clearer
> comments to explain the reasoning.

So, did you have a chance to rethink the recursion model here?  TBH I do
not like what you have one bit.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Para tener más hay que desear menos"




Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

2025-03-11 Thread Kirill Reshke
On Tue, 11 Mar 2025 at 14:37, Naga Appani  wrote:
>
>
>
> On Mon, Mar 10, 2025 at 10:43 AM Naga Appani  wrote:
>>
>> Hi,
>>

Hi

> =
> Proposal
> =
> The internal ReadMultiXactCounts() function, implemented in multixact.c, 
> directly calculates the number of MultiXact members by reading live state 
> from shared memory. This approach avoids the performance issues of the 
> current filesystem-based estimation methods.

This proposal looks sane. It is indeed helpful to keep an eye out for
multixact usage in systems that are heavily loaded.

> By exposing ReadMultiXactCounts() for external use, we can provide PostgreSQL 
> users with an efficient way to monitor MultiXact member usage. This could be 
> particularly useful for integrating with tools like Amazon RDS Performance 
> Insights and Amazon CloudWatch to provide enhanced database insights and 
> proactive managed monitoring for users.
>
> Please let me know if this approach is acceptable, so I’ll go ahead and 
> submit a patch.

Let's give it a try!



-- 
Best regards,
Kirill Reshke




Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints

2025-03-11 Thread Rushabh Lathia
On Mon, Mar 10, 2025 at 5:20 PM Alvaro Herrera 
wrote:

> On 2025-Mar-10, Rushabh Lathia wrote:
>
> > I adjusted the set_attnotnull() API and removed the added
> > queue_validation parameter.  Rather, the function start using wqueue
> > input parameter as a check.
> > If wqueue is NULL, skip the queue_validation.  Attaching patch here,
> > but not sure how clear it is, in term of understanding the API.  Your
> > thoughts/inputs?
>
> Yeah, I think this makes sense, if it supports all the cases correctly.
> (If in your code you have any calls of set_attnotnull that pass a NULL
> wqueue during ALTER TABLE in order to avoid queueing a check, you had
> better have comments to explain this.)
>

Done.


>
> Kindly do not send patches standalone, because the CFbot doesn't
> understand that it needs to apply it on top of the three previous
> patches.  You need to send all 4 patches every time.  You can see the
> result here:
> https://commitfest.postgresql.org/patch/5554/
> If you click on the "need rebase!" label you'll see that it tried to
> apply patch 0004 to the top of the master branch, and that obviously
> failed.  (FWIW if you click on the "summary" label you'll also see that
> the patch has been failing the CI tests since Feb 27.)
>

Sure, attaching the rebased patch here.


>
> > Looking further for AdjustNotNullInheritance() I don't see need of
> > rename this API as it's just adding new error check now for an
> > existing NOT NULL constraint.
>
> OK.
>
> > JFYI, I can reproduce the failure Ashutosh Bapat reported, while
> > running the pg_upgrade test through meson commands.  I am
> > investigating that further.
>
> Ah good, thanks.
>

Somehow, I am now not able to reproduce after the clean build.  Yesterday
I was able to reproduce, so I was happy, but again trying to analyze the
issue
when I start with the

Thanks,
Rushabh Lathia
www.EnterpriseDB.com


0002-Support-NOT-VALID-and-VALIDATE-CONSTRAINT-for-named-.patch
Description: Binary data


0001-Convert-pg_attribut.attnotnull-to-char-type.patch
Description: Binary data


0003-Documentation-changes.patch
Description: Binary data


Re: Parallel heap vacuum

2025-03-11 Thread Amit Kapila
On Mon, Mar 10, 2025 at 11:57 PM Masahiko Sawada  wrote:
>
> On Sun, Mar 9, 2025 at 11:12 PM Amit Kapila  wrote:
> >
> >
> > > However, in the heap vacuum phase, the leader process needed
> > > to process all blocks, resulting in soft page faults while creating
> > > Page Table Entries (PTEs). Without the patch, the backend process had
> > > already created PTEs during the heap scan, thus preventing these
> > > faults from occurring during the heap vacuum phase.
> > >
> >
> > This part is again not clear to me because I am assuming all the data
> > exists in shared buffers before the vacuum, so why the page faults
> > will occur in the first place.
>
> IIUC PTEs are process-local data. So even if physical pages are loaded
> into PostgreSQL's shared buffers (and the OS page cache), soft page faults (or
> minor page faults)[1] can occur if these pages are not yet mapped in
> the process's page table.
>

Okay, I got your point. BTW, I noticed that even for the case where
all the data is in shared_buffers, the performance improvement for
workers greater than two does decrease marginally. Am I reading the
data correctly? If so, what is the theory, and do we have
recommendations for a parallel degree?

-- 
With Regards,
Amit Kapila.




Re: Memory context can be its own parent and child in replication command

2025-03-11 Thread Anthonin Bonnefoy
On Tue, Mar 11, 2025 at 10:26 AM Anthonin Bonnefoy
 wrote:
> --- a/src/backend/utils/mmgr/mcxt.c
> +++ b/src/backend/utils/mmgr/mcxt.c
> @@ -527,6 +527,7 @@ MemoryContextDeleteOnly(MemoryContext context)
>
> context->methods->delete_context(context);
>
> +   context->type = T_Invalid;
> VALGRIND_DESTROY_MEMPOOL(context);
>  }
>
> However, when testing this on my mac, it seems to trigger a heap
> corruption during initdb.

Which is normal since this would only work with AllocSet, as other
delete_context methods would just free the context... A better spot
would be just before putting the context in the freelist in
AllocSetDelete.
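A minimal sketch of that idea, assuming the freelist handling in
AllocSetDelete() keeps roughly its current shape (this is not the actual
patch, and the field access is from memory):

    /*
     * In AllocSetDelete(), just before the context is pushed onto the
     * freelist for reuse instead of being freed: invalidate the node tag so
     * that MemoryContextIsValid() fails on any later use of the recycled
     * context.
     */
    set->header.type = T_Invalid;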




Re: bogus error message for ALTER TABLE ALTER CONSTRAINT

2025-03-11 Thread jian he
On Tue, Mar 11, 2025 at 6:21 PM Álvaro Herrera  wrote:
>
>
> > seems didn't cover "initially immediate" case for domain.
> > for example:
> > create domain d_int as int4;
> > --- the following two cases should fail.
> > alter domain d_int add constraint nn1 not null initially immediate;
> > alter domain d_int add constraint cc check(value > 1) initially immediate;
>
> Yeah, I thought at first you were right, but on further thought, do we
> really want to do this?  I mean, INITIALLY IMMEDIATE is the default
> timing for a constraint, so why should we complain if a user wants to
> get a constraint that's declared that way?  I'm not sure that we should
> do it.  Same with NOT DEFERRABLE.
> [... looks at the standard doc ...]
> And that's indeed what the SQL standard says:
>
> <domain definition> ::=
>   CREATE DOMAIN <domain name> [ AS ] <predefined type>
>   [ <default clause> ]
>   [ <domain constraint>... ]
>   [ <collate clause> ]
>
> <domain constraint> ::=
>   [ <constraint name definition> ] <check constraint definition>
>   [ <constraint characteristics> ]
>
> 8) For every <domain constraint> specified:
>    [...]
> b) If <constraint characteristics> is not specified, then INITIALLY IMMEDIATE
>    NOT DEFERRABLE is implicit.
>
>
> So I think the fix here needs to distinguish those cases and avoid
> throwing errors for them.
>
> (Note also it does not say that INITIALLY DEFERRED or DEFERRABLE are
> disallowed, which means that we're failing to fully implement the
> standard-mandated behavior by prohibiting those.)
>

I don't have a strong opinion, though,
but it's better to align CREATE DOMAIN with ALTER DOMAIN.
For example, the following two statements should behave the same:

create domain d_int as int4 constraint nn1 not null initially immediate;
alter domain d_int add constraint nn1 not null initially immediate;

Also, if we do not error out, then the create_domain.sgml and alter_domain.sgml
sections should mention these "useless" keywords.




Re: Changing the state of data checksums in a running cluster

2025-03-11 Thread Dagfinn Ilmari Mannsåker
As the resident perl style pedant, I'd just like to complain about the
below:

Tomas Vondra  writes:

> diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm 
> b/src/test/perl/PostgreSQL/Test/Cluster.pm
> index 666bd2a2d4c..1c66360c16c 100644
> --- a/src/test/perl/PostgreSQL/Test/Cluster.pm
> +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
> @@ -3761,7 +3761,8 @@ sub checksum_enable_offline
>   my ($self) = @_;
>  
>   print "### Enabling checksums in \"$self->data_dir\"\n";
> - PostgreSQL::Test::Utils::system_or_bail('pg_checksums', '-D', 
> $self->data_dir, '-e');
> + PostgreSQL::Test::Utils::system_or_bail('pg_checksums', '-D',
> + $self->data_dir, '-e');
>   return;
>  }


This splitting of a command-line option from its argument is exactly why
we're switching to using fat commas. We're also using long options for
improved self-documentation, so this should be written as:

PostgreSQL::Test::Utils::system_or_bail('pg_checksums',
'--pgdata' => $self->data_dir,
'--enable');

And likewise below in the disable method.

- ilmari




Re: RFC: Additional Directory for Extensions

2025-03-11 Thread Peter Eisentraut

On 10.03.25 21:25, Matheus Alcantara wrote:

On Thu, Mar 6, 2025 at 10:46 AM Peter Eisentraut  wrote:

This looks very good to me.  I have one issue to point out:  The logic
in get_extension_control_directories() needs to be a little bit more
careful to align with the rules in find_in_path().  For example, it
should use first_path_var_separator() to get the platform-specific path
separator, and probably also substitute_path_macro() and
canonicalize_path() etc., to keep everything consistent.


I fixed this hardcoded path separator issue in the TAP test but forgot
to also fix it in the code, sorry; it is fixed in this new version.



(Maybe it would be ok to move the function to dfmgr.c to avoid having
to export too many things from there.)


I've exported substitute_path_macro because adding a new function in
dfmgr would require including nodes/pg_list.h, and I'm not sure which
approach would be better; please let me know what you think.


Yes, that structure looks ok.  But you can remove one level of block in 
get_extension_control_directories().


I found a bug that was already present in my earlier patch versions:

@@ -423,7 +424,7 @@ find_extension_control_filename(ExtensionControlFile *control)

ecp = Extension_control_path;
if (strlen(ecp) == 0)
ecp = "$system";
-   result = find_in_path(basename, Extension_control_path, "extension_control_path", "$system", system_dir);
+   result = find_in_path(basename, ecp, "extension_control_path", "$system", system_dir);


Without this, it won't work if you set extension_control_path empty. 
(Maybe add a test for that?)


I think this all works now, but I think the way 
pg_available_extensions() works is a bit strange and inefficient.  After 
it finds a candidate control file, it calls 
read_extension_control_file() with the extension name, that calls 
parse_extension_control_file(), that calls 
find_extension_control_filename(), and that calls find_in_path(), which 
searches that path again!


There should be a simpler way into this.  Maybe 
pg_available_extensions() should fill out the ExtensionControlFile 
structure itself, set ->control_dir with where it found it, then call 
directly to parse_extension_control_file(), and that should skip the 
finding if the directory is already set.  Or something similar.
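A rough sketch of that idea (the control_dir field and the call into
parse_extension_control_file() come from the paragraph above; the surrounding
names and the exact call shape are guesses, not the actual patch):

    /* inside pg_available_extensions(), once a control file has been spotted */
    ExtensionControlFile *control = palloc0(sizeof(ExtensionControlFile));

    control->name = pstrdup(extname);
    control->control_dir = pstrdup(found_dir);    /* directory we just scanned */

    /* parse directly; the path search is skipped because control_dir is set */
    parse_extension_control_file(control);

(The real code would also need to set up whatever defaults
read_extension_control_file() currently initializes.)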






Re: [PATCH] Add reverse(bytea)

2025-03-11 Thread Nathan Bossart
Here is what I have staged for commit.  The only differences from v1 are
some very light edits.

-- 
nathan
>From aa0cc332c6fbadfa6f49ab072dd97c5c8d78fdf8 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Tue, 11 Mar 2025 11:01:32 -0500
Subject: [PATCH v2 1/1] Add reverse() for bytea.

This commit introduces a function for reversing the order of the
bytes in binary strings.

NEEDS CATVERSION BUMP

Author: Aleksander Alekseev 
Discussion: 
https://postgr.es/m/CAJ7c6TMe0QVRuNssUArbMi0bJJK32%2BzNA3at5m3osrBQ25MHuw%40mail.gmail.com
---
 doc/src/sgml/func.sgml| 17 +
 src/backend/utils/adt/varlena.c   | 21 +
 src/include/catalog/pg_proc.dat   |  3 +++
 src/test/regress/expected/strings.out | 18 ++
 src/test/regress/sql/strings.sql  |  4 
 5 files changed, 63 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 51dd8ad6571..1c3810e1a04 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -4660,6 +4660,23 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 
'three');

   
 
+  
+   
+
+ reverse
+
+reverse ( bytea )
+bytea
+   
+   
+Reverses the order of the bytes in the binary string.
+   
+   
+reverse('\xabcd'::bytea)
+\xcdab
+   
+  
+
   

 
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index cdf185ea00b..95631eb2099 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -3398,6 +3398,27 @@ byteaSetBit(PG_FUNCTION_ARGS)
PG_RETURN_BYTEA_P(res);
 }
 
+/*
+ * Return reversed bytea
+ */
+Datum
+bytea_reverse(PG_FUNCTION_ARGS)
+{
+   bytea  *v = PG_GETARG_BYTEA_PP(0);
+   const char *p = VARDATA_ANY(v);
+   int len = VARSIZE_ANY_EXHDR(v);
+   const char *endp = p + len;
+   bytea  *result = palloc(len + VARHDRSZ);
+   char   *dst = (char *) VARDATA(result) + len;
+
+   SET_VARSIZE(result, len + VARHDRSZ);
+
+   while (p < endp)
+   *(--dst) = *p++;
+
+   PG_RETURN_BYTEA_P(result);
+}
+
 
 /* text_name()
  * Converts a text type to a Name type.
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 42e427f8fe8..890822eaf79 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1518,6 +1518,9 @@
 { oid => '6163', descr => 'number of set bits',
   proname => 'bit_count', prorettype => 'int8', proargtypes => 'bytea',
   prosrc => 'bytea_bit_count' },
+{ oid => '8694', descr => 'reverse bytea',
+  proname => 'reverse', prorettype => 'bytea', proargtypes => 'bytea',
+  prosrc => 'bytea_reverse' },
 
 { oid => '725',
   proname => 'dist_pl', prorettype => 'float8', proargtypes => 'point line',
diff --git a/src/test/regress/expected/strings.out 
b/src/test/regress/expected/strings.out
index f8cba9f5b24..fbe7d7be71f 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -236,6 +236,24 @@ SELECT E'De\\678dBeEf'::bytea;
 ERROR:  invalid input syntax for type bytea
 LINE 1: SELECT E'De\\678dBeEf'::bytea;
^
+SELECT reverse(''::bytea);
+ reverse 
+-
+ \x
+(1 row)
+
+SELECT reverse('\xaa'::bytea);
+ reverse 
+-
+ \xaa
+(1 row)
+
+SELECT reverse('\xabcd'::bytea);
+ reverse 
+-
+ \xcdab
+(1 row)
+
 SET bytea_output TO escape;
 SELECT E'\\xDeAdBeEf'::bytea;
   bytea   
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index 4deb0683d57..ed054e6e99c 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -77,6 +77,10 @@ SELECT E'De\123dBeEf'::bytea;
 SELECT E'De\\123dBeEf'::bytea;
 SELECT E'De\\678dBeEf'::bytea;
 
+SELECT reverse(''::bytea);
+SELECT reverse('\xaa'::bytea);
+SELECT reverse('\xabcd'::bytea);
+
 SET bytea_output TO escape;
 SELECT E'\\xDeAdBeEf'::bytea;
 SELECT E'\\x De Ad Be Ef '::bytea;
-- 
2.39.5 (Apple Git-154)



Re: Commitfest app release on Feb 17 with many improvements

2025-03-11 Thread vignesh C
On Sun, 9 Mar 2025 at 19:05, Jelte Fennema-Nio  wrote:
>
> On Sun, 9 Mar 2025 at 03:21, vignesh C  wrote:
> > Couple of suggestions: a) No need to show CI status as "Needs rebase,"
> > "Not processed," etc., for committed patches.
>
> Do you mean specifically for committed ones? Or just for any patch
> with a "closed" status.

It is for any closed status patches.

> > b) Can we add a filter
> > for "Needs rebase!"? This would help the CommitFest manager easily
> > list patches that need updating.
>
> That should be pretty easy to implement. But is that really what we
> want? In the next release, sorting by "failing since" is implemented.
> It sounds like that could be enough instead. i.e. do we really only
> want to call out patches that need a rebase? Or also ones that have
> been failing in CI for a long time?
>
> I'm even wondering if this whole flow still makes sense. Do we really
> want to send an email to  the mailing list about this? And if so, why
> is someone doing that manually? If people subscribe to updates for
> patches that they authored, then they get these "needs rebase"
> automatically. Should we maybe simply default that option to true? And
> for instance send a notification automatically to all people with a
> "needs rebase" CI status whenever we start a new commitfest.

It would be good if a notification could be sent automatically to the author
of a patch requiring a rebase when the CFBot first identifies this (not only
at the start of a commitfest), then resent once a day for 3 or so days, and
the status changed to "Waiting on Author" if a rebased version is not posted
in that time.

Regards,
Vignesh




Re: Coccinelle for PostgreSQL development [1/N]: coccicheck.py

2025-03-11 Thread Mats Kindahl
On Sun, Mar 2, 2025 at 2:26 PM Mats Kindahl  wrote:

> On Sat, Jan 18, 2025 at 8:44 PM Mats Kindahl  wrote:
>
>> On Tue, Jan 14, 2025 at 4:19 PM Aleksander Alekseev <
>> aleksan...@timescale.com> wrote:
>>
>>> IMO the best solution would be re-submitting all the patches to this
>>> thread. Also please make sure the patchset is registered on the
>>> nearest open CF [1] This will ensure that the patchset is built on our
>>> CI (aka cfbot [2]) and will not be lost.
>>>
>>> [1]: https://commitfest.postgresql.org/
>>> [2]: http://cfbot.cputube.org/
>>>
>>>
>>
> Hi all,
>
> Here is a new set of patches rebased on the latest version of the
> Postgres. I decided to just include the semantic patches in each patch
> since it is trivial to generate the patch and build using:
>
>
> ninja coccicheck-patch | patch -d .. -p1 && ninja
>
>
> I repeat the description from the previous patch set and add comments
> where things have changed, but I have also added two semantic patches,
> which are described last.
>
> For those of you that are not aware of it: Coccinelle is a tool for
>> pattern matching and text transformation for C code and can be used for
>> detection of problematic programming patterns and to make complex,
>> tree-wide patches easy. It is aware of the structure of C code and is
>> better suited to make complicated changes than what is possible using
>> normal text substitution tools like Sed and Perl. I've noticed it's been
>> used in a few cases way back to fix issues.[1]
>>
>> Coccinelle have been successfully been used in the Linux project since
>> 2008 and is now an established tool for Linux development and a large
>> number of semantic patches have been added to the source tree to capture
>> everything from generic issues (like eliminating the redundant A in
>> expressions like "!A || (A && B)") to more Linux-specific problems like
>> adding a missing call to kfree().
>>
>> Although PostgreSQL is nowhere the size of the Linux kernel, it is
>> nevertheless of a significant size and would benefit from incorporating
>> Coccinelle into the development. I noticed it's been used in a few cases
>> way back (like 10 years back) to fix issues in the PostgreSQL code, but I
>> thought it might be useful to make it part of normal development practice
>> to, among other things:
>>
>> - Identify and correct bugs in the source code both during development
>> and review.
>> - Make large-scale changes to the source tree to improve the code based
>> on new insights.
>> - Encode and enforce APIs by ensuring that function calls are used
>> correctly.
>> - Use improved coding patterns for more efficient code.
>> - Allow extensions to automatically update code for later PostgreSQL
>> versions.
>>
>> To that end, I created a series of patches to show how it could be used
>> in the PostgreSQL tree. It is a lot easier to discuss concrete code and I
>> split it up into separate messages since that makes it easier to discuss
>> each individual patch. The series contains code to make it easy to work
>> with Coccinelle during development and reviews, as well as examples of
>> semantic patches that capture problems, demonstrate how to make large-scale
>> changes, how to enforce APIs, and also improve some coding patterns.
>>
>> The first three patches contain the coccicheck.py script and the
>> integration with the build system (both Meson and Autoconf).
>>
>> # Coccicheck Script
>>
>> It is a re-implementation of the coccicheck script that the Linux kernel
>> uses. We cannot immediately use the coccicheck script since it is quite
>> closely tied to the Linux source code tree and we need to have something
>> that both supports Autoconf and Meson. Since Python seems to be used more
>> and more in the tree, it seems to be the most natural choice. (I have no
>> strong opinion on what language to use, but think it would be good to have
>> something that is as platform-independent as possible.)
>>
>> The intention is that we should be able to use the Linux semantic patches
>> directly, so it supports the "Requires" and "Options" keywords, which can
>> be used to require a specific version of spatch(1) and add options to the
>> execution of that semantic patch, respectively.
>>
>
> I have added support for using multiple jobs similar to how "make -jN"
> works. This is also supported by the autoconf and ninja builds
>
>
>> # Autoconf support
>>
>> The changes to Autoconf modifies the configure.ac and related files (in
>> particular Makefile.global.in). At this point, I have deliberately not
>> added support for pgxs so extensions cannot use coccicheck through the
>> PostgreSQL installation. This is something that we can add later though.
>>
>> The semantic patches are expected to live in cocci/ directory under the
>> root and the patch uses the pattern cocci/**/*.cocci to find all semantic
>> patches. Right now there are no subdirectories for the semantic patches,
>> but this might be something we want to add to create different categori

Re: Non-text mode for pg_dumpall

2025-03-11 Thread Mahendra Singh Thalor
On Tue, 11 Mar 2025 at 20:12, Álvaro Herrera 
wrote:
>
> On 2025-Mar-11, Mahendra Singh Thalor wrote:
>
> > On Wed, 5 Mar 2025 at 20:42, Álvaro Herrera 
wrote:
>
> > > Okay, we should probably fix that, but I think the new map.dat file your
> > > patch adds is going to make the problem worse, because it doesn't look
> > > like you handled that case in any particular way that would make it not
> > > fail.
> >
> > As Jian also pointed out, we should not allow \n\r in dbnames. I am
> > keeping dbnames as single-line names only.
>
> Ehm, did you get consensus on adding such a restriction?
>

Hi Alvaro,

In the map.dat file, I tried to fix this issue by adding the number of characters
in the dbname, but as per the code comments below, we are not currently supporting
\n\r in dbnames, so I removed that handling.
I will do some more study to fix this issue.

/*
>  * Append the given string to the shell command being built in the buffer,
>  * with shell-style quoting as needed to create exactly one argument.
>  *
>  * Forbid LF or CR characters, which have scant practical use beyond
> designing
>  * security breaches.  The Windows command shell is unusable as a conduit
> for
>  * arguments containing LF or CR characters.  A future major release should
>  * reject those characters in CREATE ROLE and CREATE DATABASE, because use
>  * there eventually leads to errors here.
>  *
>  * appendShellString() simply prints an error and dies if LF or CR appears.
>  * appendShellStringNoError() omits those characters from the result, and
>  * returns false if there were any.
>  */
> void
> appendShellString(PQExpBuffer buf, const char *str)


Sorry, in the v22 patches I forgot to "git add" the connectdb.c file.
(Thanks Andrew for reporting this offline.)

Here, I am attaching updated patches for review and testing.

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com


v23_0001_move-common-code-of-pg_dumpall-and-pg_restore-to-new_file.patch
Description: Binary data


v23_0002_pg_dumpall-with-non-text_format-11th_march.patch
Description: Binary data


Re: Use "protocol options" name instead of "protocol extensions" everywhere

2025-03-11 Thread Peter Eisentraut

On 31.01.25 10:32, Laurenz Albe wrote:

I'll set this patch to "ready for committer".
This is about the color of the bikeshed, and several people
have voiced their opinion.  I don't think much more review
is needed.  All that is needed is a committer who either
commits or rejects it.


I don't think there is consensus for change, so I'm rejecting it.





Re: Reducing the log spam

2025-03-11 Thread Laurenz Albe
On Fri, 2025-03-07 at 20:38 +0100, Jim Jones wrote:
> I've tested this patch and for the most part it works as intended.

Thanks for the thorough test!

> There are a few issues though ...
> 
> 1) Invalid codes aren't rejected. Is there any way to validate them?
> 
> postgres=# SET log_suppress_errcodes TO '0foo1'; SHOW log_suppress_errcodes;
> SET
>  log_suppress_errcodes
> ---
>  0foo1
> (1 row)

That is intentional.  I only test that the SQLSTATE is 5 characters long
and contains only ASCII letters and numbers.  I looked at the SQL standard,
and while it defines the meaning of certain SQLSTATEs, it allows for
custom defined states.

Nothing bad can happen from setting an unused SQLSTATE; it just won't
suppress anything.

> 2) No duplication check (not critical IMO)
> 
> postgres=# SET log_suppress_errcodes TO '3F000,3F000'; SHOW
> log_suppress_errcodes;
> SET
>  log_suppress_errcodes
> ---
>  3F000,3F000
> (1 row)
> 
> 
> 3) errcodes are not trimmed (also not critical..  just doesn't look nice)
> 
> postgres=# SET log_suppress_errcodes TO '   3F000, 42P01';
> SHOW log_suppress_errcodes;
> SET
>     log_suppress_errcodes    
> -
>     3F000, 42P01
> (1 row)

These two are mostly cosmetic issues.

I originally thought it best to leave the parameter the way the user
entered it, but in the attached version I revised that by reassembling
the parameter string from the parsed SQLSTATEs, so that spaces and
duplicates will vanish and everything is converted to upper case.

So these complaints should be addressed.
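In other words, with v5 a session should behave roughly like this (a
hypothetical illustration of the canonicalization just described, not actual
tested output):

    SET log_suppress_errcodes TO '   3f000, 42P01, 3F000';
    SHOW log_suppress_errcodes;
     log_suppress_errcodes
    ------------------------
     3F000,42P01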

> 4) SHOW log_suppress_errcodes displays an invalid value if we set it
> twice to an empty string
> 
> $ /usr/local/postgres-dev/bin/psql postgres
> psql (18devel)
> Type "help" for help.
> 
> postgres=# SET log_suppress_errcodes TO ''; SHOW log_suppress_errcodes;
> SET log_suppress_errcodes TO ''; SHOW log_suppress_errcodes;
> SET
>  log_suppress_errcodes
> ---
>  
> (1 row)
> 
> SET
>  log_suppress_errcodes
> ---
>  wV
> (1 row)
> 
> 
> 5) The system crashes if we set log_suppress_errcodesto an empty string
> and then set it back to comma separated values

These two points were actually caused by a memory management bug that I
had inadvertently introduced in the assign hook.  They should be fixed now.

Attached is the fifth version of the patch.

Yours,
Laurenz Albe
From 5885419838597293ac18b131017f8f778261f3d9 Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Tue, 11 Mar 2025 10:32:33 +0100
Subject: [PATCH v5] Add parameter log_suppress_errcodes

The parameter contains a list of SQLSTATEs for which
FATAL and ERROR messages are not logged.  This is to
suppress messages that are of little interest to the
database administrator, but tend to clutter the log.

Author: Laurenz Albe 
Reviewed-by: Jim Jones 
Reviewed-by: Jelte Fennema-Nio 
Reviewed-by: Justin Pryzby 
Reviewed-by: Rafia Sabih 
Discussion: https://postgr.es/m/408f399e7de1416c47bab7e260327ed5ad92838c.camel%40cybertec.at
---
 doc/src/sgml/config.sgml  |  35 
 src/backend/utils/error/elog.c| 149 ++
 src/backend/utils/misc/guc_tables.c   |  11 ++
 src/backend/utils/misc/postgresql.conf.sample |   3 +
 src/include/pg_config_manual.h|  10 ++
 src/include/utils/guc.h   |   1 +
 src/include/utils/guc_hooks.h |   2 +
 7 files changed, 211 insertions(+)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d2fa5f7d1a9..ba5f60977a9 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6916,6 +6916,41 @@ local0.*/var/log/postgresql
   
  
 
+ 
+  log_suppress_errcodes (string)
+  
+   log_suppress_errcodes configuration parameter
+  
+  
+  
+   
+Causes ERROR and FATAL messages
+from client backend processes with certain error codes to be excluded
+from the log.
+The value is a comma-separated list of five-character error codes as
+listed in .  An error code that
+represents a class of errors (ends with three zeros) suppresses logging
+of all error codes within that class.  For example, the entry
+08000 (connection_exception)
+would suppress an error with code 08P01
+(protocol_violation).  The default setting is
+empty, that is, all error codes get logged.
+Only superusers and users with the appropriate SET
+privilege can change this setting.
+   
+
+   
+This setting allows you to skip error logging for messages that are
+frequent but irrelevant.  Suppressing such messages makes it easier to
+spot relevant messages in the log and keeps log files from growing too
+big.  For example, if you have a monitoring system that regularly
+establishes a TCP connection to the server without s

Re: protocol support for labels

2025-03-11 Thread Jeremy Schneider


> On Mar 11, 2025, at 3:03 AM, Kirill Reshke  wrote:
> 
> On Tue, 11 Mar 2025 at 11:09, Jeremy Schneider  
> wrote:
> 
>> observability frameworks like OpenTelemetry support tracing through all
>> layers of a stack, and trace_ids can even be passed into sql with
>> extensions like sqlcommenter. however sqlcommenter puts the trace_id
>> into a comment which effectively breaks all the utility of
>> pg_stat_statements since each query has a unique trace_id.
>> 
> There are some other use-cases:
> 1) RO-RW routing. Users can pass target-session-attrs to the server
> within query labels to hint about its need. Useful when some kind of
> proxy (odyssey,pgbouncer,spqr,pgpool II, pgcat, pg_doorman) is used.
> 2) pg_comment_stats uses comments in the query to accumulate statistics. [0]


Thinking a bit more, my root issue is specifically around pg_stat_statements,
so maybe it's also solvable with some changes to how query jumbling is done.

But that topic seems like one where we'd never get consensus.

Should query jumbling for calculating query_id be customizable somehow? How
would we resolve multiple concurrent opinions about how queries should be
jumbled (e.g. if comment_stats needs different tweaks than sqlcommenter)? Was
there previous discussion about this already? I'll need to go search the
mailing list history a bit.

-Jeremy


Sent from my TI-83



Re: Non-text mode for pg_dumpall

2025-03-11 Thread Álvaro Herrera
On 2025-Mar-11, Mahendra Singh Thalor wrote:

> On Wed, 5 Mar 2025 at 20:42, Álvaro Herrera  wrote:

> > Okay, we should probably fix that, but I think the new map.dat file your
> > patch adds is going to make the problem worse, because it doesn't look
> > like you handled that case in any particular way that would make it not
> > fail.
> 
> As Jian also pointed out, we should not allow \n\r in dbnames. I am
> keeping dbanames as single line names only.

Ehm, did you get consensus on adding such a restriction?

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.

2025-03-11 Thread David G. Johnston
On Tue, Mar 11, 2025 at 12:20 AM Peter Smith  wrote:

>
> Unfortunately, we are spinning in circles a bit trying to come up with
> a good way to represent the option needed for this, while at the same
> time trying to be future-proof. I see 3 choices...
>
> ==
>
> Choice 1.  Generic option
>
> Implement a single boolean option to remove everything.
>
>
> Do you have any thoughts on what kind of option is best here?
>
>
Option 1 by far.  Though personally I'd rather do something like what
pg_upgrade does and output a script with drop commands that can be executed
via psql instead of having pg_createsubscriber execute said commands.  I'd
call it "pruning the target".

Any automated bash script I'd write would just do:

pg_createsubscriber ... --prune-target-script=prune.sql
psql --file prune.sql

And if I'm working interactively I'd want to spend the extra minute or so
reviewing the commands in prune.sql to make sure I know what is going
away.  I can also edit said file, or use something like grep, if I want to
limit what is dropped.

In particular any database that isn't turned into a logical replica would
be added to this list; in addition to all the publication/subscription
stuff that is likely broken at this point.

David J.


RE: Selectively invalidate caches in pgoutput module

2025-03-11 Thread Zhijie Hou (Fujitsu)
On Monday, March 10, 2025 9:12 PM Kuroda, Hayato  
wrote:
> 
> > Currently, only the leaf partition is invalidated when the published
> > table is partitioned. However, I think pgoutput could cache both the
> > partitioned table and the leaf partition table as relsync entries.
> >
> > For INSERT/UPDATE/DELETE on a partitioned table, only the leaf
> > partition's relsync entry is used in pgoutput, but the TRUNCATE
> > references the parent table's relsync entry.
> 
> I think your analysis is correct. PSA new version. Below part contains my
> analysis.
> 
> In ExecuteTruncate(), if the specified relation has children, all of them are
> checked via find_all_inheritors() and listed as targets. Also,
> ExecuteTruncateGuts() serializes both the parent and the children in the
> XLOG_HEAP_TRUNCATE WAL record.
> The decoding layer passes the relations as-is. These facts mean that output
> plugins can keep caches for both in memory.

Thanks for updating the patch.

I have reviewed patch 0001 and did not find issues, aside from a few issues for
code comments that were mentioned in Amit's email. Here are some analyses and
insights gathered during the review of 0001:

The functions and variables in the 0001 patch use 'Relsync' (e.g.,
RegisterRelsyncInvalidation) instead of the longer 'RelsyncCache'. After
internal discussions, we think it's OK, as using 'RelsyncCache' could
unnecessarily lengthen the names.

Furthermore, considering we're introducing a new invalidation message for the
RelationSyncCache within pgoutput, which is an output plugin, we discussed
whether a more general invalidation name should be adopted in case other output
plugins might use it. However, after reviewing all the plugins listed in
Wiki[1], we did not find any other plugins that reference the built-in
publication catalog. Therefore, this scenario appears to be uncommon.
Additionally, the current naming convention is sufficiently intuitive for
output plugin developers. Hence, we feel it is reasonable to retain the
existing names.


For patch 0002, I think the implementation could be improved. The
current patch introduces a new function, RenamePublication, to replace the
existing generic approach in ExecRenameStmt->AlterObjectRename_internal.
However, this creates inconsistency because the original code uses
AccessExclusiveLock for publication renaming, making concurrent renaming
impossible. The current patch seems to overlook this aspect.

Additionally, introducing a new function results in code duplication, which can
be avoided. After further consideration, handling the publication rename
separately seems unnecessary, given it requires only sending a few extra
invalidation messages. Therefore, I suggest reusing the generic handling and
simply extending AlterObjectRename_internal to include the additional
invalidation messages.

I've attached a diff with the proposed changes for 0002.

Best Regards,
Hou zj


0001-refactor.patch
Description: 0001-refactor.patch


Re: initdb.exe Fails on Windows When Password File Path Contains Non-ASCII Characters

2025-03-11 Thread Manika Singhal
On Tue, Mar 11, 2025 at 5:12 PM Manika Singhal <
manika.sing...@enterprisedb.com> wrote:

> Hi,
>
> The postgresql windows installer packaged by EDB, creates the postgresql
> instance on the fly.
> It creates a temporary file to store the password and then deletes it once
> the initdb is run successfully.
> But a user reported the initdb fails. On debugging it was found that, it
> occurs because the username in the
> %TEMP% contains the non-ascii characters as shown in the example below
>


> initdb.exe --pgdata="C:\Program Files\PostgreSQL\17\data"
> --username=postgres --encoding=UTF8
> --pwfile="C:\Users\玛妮卡\AppData\Local\Temp\passwdXXX" --auth=scram-sha-256
>


> initdb: error: could not open file
> "C:\Users\???\AppData\Local\Temp\psswrd" for reading: No such file or
> directory
>


> It seems that initdb.exe does not correctly handle paths containing
> Unicode characters, replacing them with ???
> in the error message. I want to ask if this is the expected behaviour of
> initdb.exe?
>


> When the user temp location is replaced with system temp (c:\windows\temp)
> the error goes away.
> Any suggestions if the installer should use this path to create the
> password file?
>
>
>
sending again with corrected spacing. Sorry for the inconvenience!



-
> Thanks & Regards
> Manika Singhal
> EDB India
>


Re: Add an option to skip loading missing publication to avoid logical replication failure

2025-03-11 Thread vignesh C
On Mon, 10 Mar 2025 at 09:33, Amit Kapila  wrote:
>
> On Tue, Mar 4, 2025 at 6:54 PM vignesh C  wrote:
> >
> > On further thinking, I felt the use of publications_updated variable
> > is not required we can use publications_valid itself which will be set
> > if the publication system table is invalidated. Here is a patch for
> > the same.
> >
>
> The patch relies on the fact that whenever a publication's data is
> invalidated, it will also invalidate all the RelSyncEntires as per
> publication_invalidation_cb. But note that we are discussing removing
> that inefficiency in the thread  [1]. So, we should try to rebuild the
> entry when we have skipped the required publication previously.
>
> Apart from this, please consider updating the docs, as mentioned in my
> response to Sawada-San's email.

The CREATE SUBSCRIPTION documentation at [1] already says "We allow
non-existent publications to be specified so that users can add those
later. This means pg_subscription can have non-existent publications.",
which should be enough. Let me know if we need to add more
documentation.

Apart from this, I have changed the level of the "skipped loading
publication" message to WARNING, since CREATE SUBSCRIPTION already logs
a similar warning ("WARNING: publications "pub2", "pub3" do not exist
on the publisher"). I can change it to a different log level if you
feel this is not the right one.

Also, I have added a test case for Dilip's comment from [2].
The attached v7 patch has the changes for the same.

[1] - https://www.postgresql.org/docs/devel/sql-createsubscription.html
[2] - 
https://www.postgresql.org/message-id/CAFiTN-tgUR6QLSs3UHK7gx4VP7cURGNkufA_xkrQLw9eCnbGQw%40mail.gmail.com

Regards,
Vignesh


v7-0001-Fix-logical-replication-breakage-after-ALTER-SUBS.patch
Description: Binary data


Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.

2025-03-11 Thread Nisha Moond
On Mon, Mar 10, 2025 at 4:31 PM Shubham Khanna
 wrote:
>
> > 2) This part of code has another bug:
> >  "dbinfos.dbinfo->made_publication = false;" incorrectly modifies
> > dbinfo[0], even if the failure occurs in other databases (dbinfo[1],
> > etc.). Since cleanup_objects_atexit() checks made_publication per
> > database, this could lead to incorrect behavior.
> > Solution: Pass the full 'dbinfo' structure to
> > drop_publication_by_name() , and use it accordingly.
> >
> > --
>
> Fixed.
>
> The attached patch contains the suggested changes.
>

Thanks for the patch. Here are a few comments on the new change -
1)
+static void check_and_drop_existing_publications(PGconn *conn, struct
LogicalRepInfo *dbinfo);
 - move the second parameter to the next line to stay within the 80-character line limit.

2)
  pg_log_error("could not drop publication \"%s\" in database \"%s\": %s",
- dbinfo->pubname, dbinfo->dbname, PQresultErrorMessage(res));
+ pubname, dbinfo->dbname,
+ PQresultErrorMessage(res));

You can keep "PQresultErrorMessage(res));" on the previous line; it
will still be < 80 chars.

3) The new IF check -
+   if (strcmp(pubname, dbinfos.dbinfo->pubname) == 0)
+   dbinfo->made_publication = false;   /* don't try again. */

always compares dbinfo[0]->pubname, but it should compare 'pubname'
with the respective database's publication. Please correct it as -
if (strcmp(pubname, dbinfo->pubname) == 0)


--
Thanks,
Nisha




Re: Optimizing FastPathTransferRelationLocks()

2025-03-11 Thread Ashutosh Bapat
Hi Fujii-san,

It seems that this was forgotten somehow.

The patch still applies.

Examining c4d5cb71d229095a39fda1121a75ee40e6069a2a, it seems that this patch
could have been part of that commit as well. But maybe it wasn't so apparent
at that time. I think it's a good improvement.

On Tue, Nov 19, 2024 at 10:14 PM Fujii Masao
 wrote:
>
> The latest version of the patch is attached.

@@ -2773,6 +2773,10 @@ FastPathTransferRelationLocks(LockMethod
lockMethodTable, const LOCKTAG *locktag
LWLock *partitionLock = LockHashPartitionLock(hashcode);
Oid relid = locktag->locktag_field2;
uint32 i;
+ uint32 group;
+
+ /* fast-path group the lock belongs to */
+ group = FAST_PATH_REL_GROUP(relid);

We could just combine variable declaration and initialization; similar
to partitionLock.

@@ -2802,16 +2805,16 @@ FastPathTransferRelationLocks(LockMethod
lockMethodTable, const LOCKTAG *locktag
* less clear that our backend is certain to have performed a memory
* fencing operation since the other backend set proc->databaseId. So
* for now, we test it after acquiring the LWLock just to be safe.
+ *
+ * Also skip groups without any registered fast-path locks.
*/
- if (proc->databaseId != locktag->locktag_field1)
+ if (proc->databaseId != locktag->locktag_field1 ||
+ proc->fpLockBits[group] == 0)

I like this. Moving the group computation out of the loop also allows
the computed group to be used for checking the existence of the lock. That's
a double benefit. This test is similar to the test in
GetLockStatusData(), so we already have a precedent.

>
>
> > And perhaps we should start clearing databaseid on backend exit.
>
> Maybe, but I'm not sure if we really need this..

There's a comment
* proc->databaseId is set at backend startup time and never changes
* thereafter, so it might be safe to perform this test before
* acquiring &proc->fpInfoLock.

that seems to assume that databaseid is never cleared, so maybe it's
not safe to do this.

The performance gain from this patch might be tiny and may not be
visible. Still, have you tried to measure the performance improvement?

-- 
Best Wishes,
Ashutosh Bapat




Re: Statistics import and export: difference in statistics of materialized view dumped

2025-03-11 Thread Tom Lane
Ashutosh Bapat  writes:
> After fixing the statistics difference in dumps of tables with
> indexes, I now see difference in statistics of materialized view dump
> in the test I am developing at [1] (see the latest patches there).

Are you doing the restore in parallel by any chance?  I had a todo
item to revisit the dependencies that pg_dump is creating for stats
items, because they looked wrong to me, ie inadequate to guarantee
correct restore order.

regards, tom lane




Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

2025-03-11 Thread Michael Paquier
On Tue, Mar 11, 2025 at 08:48:33PM +1300, David Rowley wrote:
> On Tue, 11 Mar 2025 at 19:50, Michael Paquier  wrote:
>> This issue exists since the query jumbling exists in pgss, so it goes
>> really down the road.  I've spent a bit more than a few minutes on
>> that.
> 
> I didn't mean to cause offence here. I just wanted to make it clear
> that I don't think fixing these types of issues by swapping the order
> of the fields is going to be a sustainable practice, and it would be
> good to come up with a solution that eliminates the chances of this
> class of bug from ever appearing again.  Even if we were to trawl the
> entire Query struct and everything that could ever be attached to it
> and we deem that today it's fine and no more such bugs exist, the
> person adding some new SQL feature in the future that adds new data to
> store in the Query probably isn't going to give query jumbling much
> thought, especially now that the code generation is automatic.

FWIW, I've mentioned the use of the offset in a Node upthread, in the
next-to-last paragraph of this email:
https://www.postgresql.org/message-id/z84mhjm5rtwtl...@paquier.xyz

One thing I did not notice yesterday was that it is possible to rely
on _jumbleNode() to pass an offset from the parent.  So it looks like
we had roughly the same idea independently, but I was not able to look
more closely at that yesterday.

>> But after sleeping on it I think that these two points are kind of
>> moot: having only the offset passed down to a single _jumbleNode()
>> with the offset compiled at its beginning should be sufficient.  Using
>> "seed" as a term is perhaps a bit confusing, because this is an offset
>> in the upper node, but perhaps that's OK as-is.  It's just more
>> entropy addition.  If somebody has a better idea for this term, feel
>> free.
> 
> Can you share your thoughts on Ivan's NULL jumble idea?

This is an incomplete solution, I am afraid.  The origin of the
problem is that a Node (here, Query) stores two equivalent Nodes in
succession that are used for separate data, with NULL being the
difference between them.  The problem is not limited to NULL: we
could, for example, also end up with a Node that has a custom jumbling
function and hit the same issue, depending on how it is used in a
parent Node.  Adding the offset from the parent to the jumbling is a
much stronger insurance policy than the proposed solution specific to
NULL, because each offset is unique within a single Node.
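
To illustrate the shape of that idea (a rough sketch only, not the
actual change; the extra parameter on _jumbleNode() is the assumption
being discussed here, AppendJumble() is the existing helper):

  static void
  _jumbleNode(JumbleState *jstate, Node *node, Size parent_offset)
  {
      /* mix the field's offset within its parent into the jumble */
      AppendJumble(jstate, (const unsigned char *) &parent_offset,
                   sizeof(parent_offset));

      if (node == NULL)
          return;

      /* ... dispatch to the per-node-type jumbling as today ... */
  }

That way two adjacent, same-typed fields of a node can never contribute
identical byte sequences, whatever their contents.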

>> _jumbleList() is an interesting one.  If we want an equivalent of the
>> offset, this could use the item number in the list which would be a
>> rather close equivalent to the offset used elsewhere.  For the int,
>> oid and xid lists, we should do an extra JUMBLE_FIELD_SINGLE() for
>> each member, apply the offset at the beginning of _jumbleList(), once,
>> I guess.
> 
> Is this affected by the same class of bug that we're aiming to fix
> here?  (This class being not always jumbling all fields because of
> NULLs or some other value, resulting in the next jumble field applying
> the same bytes to the buffer as the previous field would have, had it
> not been ignored)

Yep, that could be possible.  I am not sure if it can be reached
currently with the set of parse nodes, but in theory it could be
possible with a list of Nodes, at least.
--
Michael


signature.asc
Description: PGP signature


Re: Printing window function OVER clauses in EXPLAIN

2025-03-11 Thread Tom Lane
David Rowley  writes:
> The only minor points I noted down while reviewing were 1)
> name_active_windows()'s newname variable could be halved in size and,
> 2) explain.sql's new test could name the window "w1" instead of "w" to
> exercise the name selection code a bit better. Both are minor points,
> but I thought I'd mention them anyway.

Thanks, pushed with those points addressed.

regards, tom lane




Re: Statistics import and export: difference in statistics of materialized view dumped

2025-03-11 Thread Jeff Davis
On Tue, 2025-03-11 at 10:17 -0400, Tom Lane wrote:
> Ashutosh Bapat  writes:
> > After fixing the statistics difference in dumps of tables with
> > indexes, I now see difference in statistics of materialized view
> > dump
> > in the test I am developing at [1] (see the latest patches there).
> 
> Are you doing the restore in parallel by any chance?  I had a todo
> item to revisit the dependencies that pg_dump is creating for stats
> items, because they looked wrong to me, ie inadequate to guarantee
> correct restore order.

It's creating a dependency on the relation and a boundary dependency on
the postDataBound (unless it's an index, or an MV that got pushed to
SECTION_POST_DATA).

I suspect what we need here is a dependency on the MV *data*, because
that's doing a heap swap, which resets the stats. Looking into it.
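
If it comes to that, the fix might be as small as adding one dependency
edge when the stats object belongs to a materialized view.  A sketch
only (tbinfo, tdinfo, and statsinfo stand in for the table, matview-data,
and stats dump objects; they are not the actual identifiers):

  /* make the stats entry depend on the matview data entry, so parallel
   * restore runs REFRESH MATERIALIZED VIEW before restoring the stats */
  if (tbinfo->relkind == RELKIND_MATVIEW && tdinfo != NULL)
      addObjectDependency(&statsinfo->dobj, tdinfo->dobj.dumpId);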

Regards,
Jeff Davis





Re: Optimizing FastPathTransferRelationLocks()

2025-03-11 Thread Fujii Masao



On 2025/03/11 20:55, Ashutosh Bapat wrote:

Hi Fujii-san,

It seems that this was forgotten somehow.

The patch still applies.

Examining c4d5cb71d229095a39fda1121a75ee40e6069a2a, it seems that this patch
could have been part of that commit as well. But may be it wasn't so apparent
that time. I think it's a good improvement.


Thanks for reviewing the patch!



On Tue, Nov 19, 2024 at 10:14 PM Fujii Masao
 wrote:


The latest version of the patch is attached.


@@ -2773,6 +2773,10 @@ FastPathTransferRelationLocks(LockMethod
lockMethodTable, const LOCKTAG *locktag
LWLock *partitionLock = LockHashPartitionLock(hashcode);
Oid relid = locktag->locktag_field2;
uint32 i;
+ uint32 group;
+
+ /* fast-path group the lock belongs to */
+ group = FAST_PATH_REL_GROUP(relid);

We could just combine variable declaration and initialization; similar
to partitionLock.


I’ve updated the patch as suggested. Updated patch is attached.



The performance gain from this patch might be tiny and may not be
visible. Still, have you tried to measure the performance improvement?


I haven't measured the actual performance gain since the patch optimizes
the logic in a clear and logical way. While we might see some improvement
in artificial scenarios, such as with a large max_connections and
all backend slots having their databaseIds set, I'm not sure
how meaningful that would be.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
From e2d0cca4f9032aa5f5fbab024d23562633e2707f Mon Sep 17 00:00:00 2001
From: Fujii Masao 
Date: Tue, 11 Mar 2025 22:31:12 +0900
Subject: [PATCH v3] Optimize iteration over PGPROC for fast-path lock
 searches.

This commit improves efficiency in FastPathTransferRelationLocks()
and GetLockConflicts(), which iterate over PGPROCs to search for
fast-path locks.

Previously, these functions recalculated the fast-path group during
every loop iteration, even though it remained constant. This update
optimizes the process by calculating the group once and reusing it
throughout the loop.

The functions also now skip empty fast-path groups, avoiding
unnecessary scans of their slots. Additionally, groups belonging to
inactive backends (with pid=0) are always empty, so checking
the group is sufficient to bypass these backends, further enhancing
performance.

Author: Fujii Masao 
Reviewed-by: Heikki Linnakangas 
Reviewed-by: Ashutosh Bapat 
Discussion: 
https://postgr.es/m/07d5fd6a-71f1-4ce8-8602-4cc6883f4...@oss.nttdata.com
---
 src/backend/storage/lmgr/lock.c | 28 
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index ccfe6b69bf5..cb95fd74fcf 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -2774,6 +2774,9 @@ FastPathTransferRelationLocks(LockMethod lockMethodTable, 
const LOCKTAG *locktag
Oid relid = locktag->locktag_field2;
uint32  i;
 
+   /* fast-path group the lock belongs to */
+   uint32  group = FAST_PATH_REL_GROUP(relid);
+
/*
 * Every PGPROC that can potentially hold a fast-path lock is present in
 * ProcGlobal->allProcs.  Prepared transactions are not, but any
@@ -2783,8 +2786,7 @@ FastPathTransferRelationLocks(LockMethod lockMethodTable, 
const LOCKTAG *locktag
for (i = 0; i < ProcGlobal->allProcCount; i++)
{
PGPROC *proc = &ProcGlobal->allProcs[i];
-   uint32  j,
-   group;
+   uint32  j;
 
LWLockAcquire(&proc->fpInfoLock, LW_EXCLUSIVE);
 
@@ -2802,16 +2804,16 @@ FastPathTransferRelationLocks(LockMethod 
lockMethodTable, const LOCKTAG *locktag
 * less clear that our backend is certain to have performed a 
memory
 * fencing operation since the other backend set 
proc->databaseId.  So
 * for now, we test it after acquiring the LWLock just to be 
safe.
+*
+* Also skip groups without any registered fast-path locks.
 */
-   if (proc->databaseId != locktag->locktag_field1)
+   if (proc->databaseId != locktag->locktag_field1 ||
+   proc->fpLockBits[group] == 0)
{
LWLockRelease(&proc->fpInfoLock);
continue;
}
 
-   /* fast-path group the lock belongs to */
-   group = FAST_PATH_REL_GROUP(relid);
-
for (j = 0; j < FP_LOCK_SLOTS_PER_GROUP; j++)
{
uint32  lockmode;
@@ -3027,6 +3029,9 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE 
lockmode, int *countp)
Oid relid = locktag->locktag_field2;
VirtualTransactionId vxid;
 
+   

Re: change on_exit_nicely_list array to the dynamic array to increase slots at run time for pg_restore

2025-03-11 Thread Dilip Kumar
On Mon, Mar 10, 2025 at 2:24 PM Mahendra Singh Thalor
 wrote:
>
> Hi,
> (refer file src/bin/pg_dump/pg_backup_utils.c)
>
> While doing some code changes with pg_dumpall and pg_restore[1], we noticed 
> that on_exit_nicely_list array has only fixed slots (MAX_ON_EXIT_NICELY=20) 
> but in some cases, we need more slots for pg_restore.
> Ex: restore more than 20 databases by single pg_restore command.
>
> We are working on a patch[1] which dumps all the databases in non-text mode 
> by pg_dumpall and then we are restoring those dumps by pg_restore. So we need 
> more slots due to multiple databases.
>
> Apart from the attached patch solution, we thought of some more solutions.
> Solution 1: reset array index with each database restore, but this might 
> break some other things.
> Solution 2: for each database, we can keep the index of the 
> on_exit_nicely_list array and after restoring a particular database we can 
> reset the index to old value but this looks like a hack.
>
> Here, I am proposing a patch which will dynamically enlarge the 
> on_exit_nicely_list array by doubling the previous size.
>
> Thoughts?

This reply might belong in another thread, but since you raised the
issue here: instead of registering multiple functions with
exit_nicely, why not modify the argument of the exit function
(archive_close_connection()) so it can handle multiple archives? In short,
you could change ShutdownInformation[1] so that, instead of holding a
single "Archive," it holds a list, allowing you to append different
Archive handles to the list.

[1]
typedef struct ShutdownInformation
{
   ParallelState *pstate;
   Archive *AHX;  -> change this to List of Archive*
} ShutdownInformation;
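
A minimal sketch of that direction (illustrative only, not a patch; it
assumes the existing frontend SimplePtrList helpers, and
shutdown_one_archive() is a hypothetical per-archive cleanup):

  typedef struct ShutdownInformation
  {
      ParallelState *pstate;
      SimplePtrList archives;     /* entries are Archive *, one per database */
  } ShutdownInformation;

  static void
  archive_close_connection(int code, void *arg)
  {
      ShutdownInformation *si = (ShutdownInformation *) arg;

      /* close every registered archive instead of a single AHX */
      for (SimplePtrListCell *cell = si->archives.head; cell; cell = cell->next)
          shutdown_one_archive((Archive *) cell->ptr);
  }

pg_restore would then call simple_ptr_list_append() once per database
instead of registering a new exit callback each time.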


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

2025-03-11 Thread David Rowley
On Tue, 11 Mar 2025 at 19:50, Michael Paquier  wrote:
>
> On Tue, Mar 11, 2025 at 12:45:35AM +1300, David Rowley wrote:
> > It seems to me that if this fixes the issue, that the next similar one
> > is already lurking in the shadows waiting to jump out on us.
>
> For how many years will be have to wait for this threat hiddent in the
> shadows?  :)
>
> This issue exists since the query jumbling exists in pgss, so it goes
> really down the road.  I've spent a bit more than a few minutes on
> that.

I didn't mean to cause offence here. I just wanted to make it clear
that I don't think fixing these types of issues by swapping the order
of the fields is going to be a sustainable practice, and it would be
good to come up with a solution that eliminates the chances of this
class of bug from ever appearing again.  Even if we were to trawl the
entire Query struct and everything that could ever be attached to it
and we deem that today it's fine and no more such bugs exist, the
person adding some new SQL feature in the future that adds new data to
store in the Query probably isn't going to give query jumbling much
thought, especially now that the code generation is automatic.

> > Couldn't we adjust all this code so that we pass a seed to
> > AppendJumble() that's the offsetof(Struct, field) that we're jumbling
> > and incorporate that seed into the hash?  I don't think we could
> > possibly change the offset in a minor version, so I don't think
> > there's a risk that could cause jumble value instability. Possibly
> > different CPU architectures would come up with different offsets
> > through having different struct alignment requirements, but we don't
> > make any guarantees about the jumble values matching across different
> > CPU architectures anyway.
>
> Yeah, something like that has potential to get rid of such problems
> forever, particularly thanks to the fact that we automate the
> generation of this code now so it is mostly cost-free.  This has a
> cost for the custom jumbling functions where one needs some
> imagination, but with models being around while we keep the number of
> custom functions low, that should be neither complicated nor costly,
> at least in my experience.

I hadn't fully processed this thread when I responded yesterday; my
response was mostly aimed at the latest patch on the thread at the
time, which I had looked at quickly, and I wanted to put a stop to
changing field orders as a fix for this.  Since then, I see that Ivan
has already submitted a patch that accounts for NULL nodes and adds a
byte to the jumble buffer to account for NULLs. This seems quite clean
and simple. However, Sami seems to have concerns about the overhead of
doing this. Is that warranted at all? Potentially, there could be a
decent number of NULL fields. It'll probably be much cheaper than the
offsetof idea I came up with.
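
For context, the shape of that idea is roughly the following (a minimal
sketch, not Ivan's actual patch; AppendJumble() and _jumbleNode() are
the existing helpers in queryjumblefuncs.c, and the marker value is
arbitrary):

  static void
  _jumbleNode(JumbleState *jstate, Node *node)
  {
      if (node == NULL)
      {
          /* still advance the jumble so the next field can't line up
           * with the bytes this one would have produced */
          const unsigned char null_marker = 0xff;

          AppendJumble(jstate, &null_marker, sizeof(null_marker));
          return;
      }

      /* ... existing per-node-type jumbling ... */
  }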

> When I was thinking about the addition of the offset to the jumbling
> yesterday, I was wondering about two things:
> - if there are some node structures where it could not work.
> - if we'd need to pass down some information of the upper node when
> doing the jumbling of a node attached to it, which would make the code
> generation much more annoying.
>
> But after sleeping on it I think that these two points are kind of
> moot: having only the offset passed down to a single _jumbleNode()
> with the offset compiled at its beginning should be sufficient.  Using
> "seed" as a term is perhaps a bit confusing, because this is an offset
> in the upper node, but perhaps that's OK as-is.  It's just more
> entropy addition.  If somebody has a better idea for this term, feel
> free.

Can you share your thoughts on Ivan's NULL jumble idea?

> _jumbleList() is an interesting one.  If we want an equivalent of the
> offset, this could use the item number in the list which would be a
> rather close equivalent to the offset used elsewhere.  For the int,
> oid and xid lists, we should do an extra JUMBLE_FIELD_SINGLE() for
> each member, apply the offset at the beginning of _jumbleList(), once,
> I guess.

Is this affected by the same class of bug that we're aiming to fix
here?  (This class being not always jumbling all fields because of
NULLs or some other value, resulting in the next jumble field applying
the same bytes to the buffer as the previous field would have, had it
not been ignored)

> _jumbleVariableSetStmt() should be OK with the offset of "arg" in
> VariableSetStmt.  The addition of hash_combine64() to count for the
> offset should be OK.
>
> With that in mind, the cases with DISTINCT/ORDER BY and OFFSET/LIMIT
> seem to work fine, without causing noise for the other cases tracked
> in the regression tests of PGSS.
>
> What do you think?

I mostly now think this class of bug can be fixed by ensuring we never
ignore any jumble field for any reason and always put at least 1 byte
onto the buffer with some sort of entropy. I'm keen to understand if
I'm missing some case that you've thought about.

Re: Accessing an invalid pointer in BufferManagerRelation structure

2025-03-11 Thread Daniil Davydov
Hi,
I am posting the new v2 patch, which is now rebased on the `master` branch.
Waiting for your feedback :)

--
Best regards,
Daniil Davydov
From 2e43a1411ebcb37b2c0c1d6bac758e48799d2c4e Mon Sep 17 00:00:00 2001
From: Daniil Davidov 
Date: Tue, 11 Mar 2025 17:11:16 +0700
Subject: [PATCH v2] Add macros for safety access to smgr

---
 src/backend/storage/buffer/bufmgr.c   | 55 ---
 src/backend/storage/buffer/localbuf.c | 10 +++--
 src/include/storage/bufmgr.h  | 14 +++
 3 files changed, 45 insertions(+), 34 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 7915ed624c1..c55228f298a 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -887,11 +887,8 @@ ExtendBufferedRelBy(BufferManagerRelation bmr,
 	Assert(bmr.smgr == NULL || bmr.relpersistence != 0);
 	Assert(extend_by > 0);
 
-	if (bmr.smgr == NULL)
-	{
-		bmr.smgr = RelationGetSmgr(bmr.rel);
+	if (bmr.relpersistence == 0)
 		bmr.relpersistence = bmr.rel->rd_rel->relpersistence;
-	}
 
 	return ExtendBufferedRelCommon(bmr, fork, strategy, flags,
    extend_by, InvalidBlockNumber,
@@ -923,11 +920,8 @@ ExtendBufferedRelTo(BufferManagerRelation bmr,
 	Assert(bmr.smgr == NULL || bmr.relpersistence != 0);
 	Assert(extend_to != InvalidBlockNumber && extend_to > 0);
 
-	if (bmr.smgr == NULL)
-	{
-		bmr.smgr = RelationGetSmgr(bmr.rel);
+	if (bmr.relpersistence == 0)
 		bmr.relpersistence = bmr.rel->rd_rel->relpersistence;
-	}
 
 	/*
 	 * If desired, create the file if it doesn't exist.  If
@@ -935,15 +929,15 @@ ExtendBufferedRelTo(BufferManagerRelation bmr,
 	 * an smgrexists call.
 	 */
 	if ((flags & EB_CREATE_FORK_IF_NEEDED) &&
-		(bmr.smgr->smgr_cached_nblocks[fork] == 0 ||
-		 bmr.smgr->smgr_cached_nblocks[fork] == InvalidBlockNumber) &&
-		!smgrexists(bmr.smgr, fork))
+		(BMR_GET_SMGR(bmr)->smgr_cached_nblocks[fork] == 0 ||
+		 BMR_GET_SMGR(bmr)->smgr_cached_nblocks[fork] == InvalidBlockNumber) &&
+		!smgrexists(BMR_GET_SMGR(bmr), fork))
 	{
 		LockRelationForExtension(bmr.rel, ExclusiveLock);
 
 		/* recheck, fork might have been created concurrently */
-		if (!smgrexists(bmr.smgr, fork))
-			smgrcreate(bmr.smgr, fork, flags & EB_PERFORMING_RECOVERY);
+		if (!smgrexists(BMR_GET_SMGR(bmr), fork))
+			smgrcreate(BMR_GET_SMGR(bmr), fork, flags & EB_PERFORMING_RECOVERY);
 
 		UnlockRelationForExtension(bmr.rel, ExclusiveLock);
 	}
@@ -953,13 +947,13 @@ ExtendBufferedRelTo(BufferManagerRelation bmr,
 	 * kernel.
 	 */
 	if (flags & EB_CLEAR_SIZE_CACHE)
-		bmr.smgr->smgr_cached_nblocks[fork] = InvalidBlockNumber;
+		BMR_GET_SMGR(bmr)->smgr_cached_nblocks[fork] = InvalidBlockNumber;
 
 	/*
 	 * Estimate how many pages we'll need to extend by. This avoids acquiring
 	 * unnecessarily many victim buffers.
 	 */
-	current_size = smgrnblocks(bmr.smgr, fork);
+	current_size = smgrnblocks(BMR_GET_SMGR(bmr), fork);
 
 	/*
 	 * Since no-one else can be looking at the page contents yet, there is no
@@ -1003,7 +997,7 @@ ExtendBufferedRelTo(BufferManagerRelation bmr,
 	if (buffer == InvalidBuffer)
 	{
 		Assert(extended_by == 0);
-		buffer = ReadBuffer_common(bmr.rel, bmr.smgr, bmr.relpersistence,
+		buffer = ReadBuffer_common(bmr.rel, BMR_GET_SMGR(bmr), bmr.relpersistence,
    fork, extend_to - 1, mode, strategy);
 	}
 
@@ -2153,10 +2147,10 @@ ExtendBufferedRelCommon(BufferManagerRelation bmr,
 	BlockNumber first_block;
 
 	TRACE_POSTGRESQL_BUFFER_EXTEND_START(fork,
-		 bmr.smgr->smgr_rlocator.locator.spcOid,
-		 bmr.smgr->smgr_rlocator.locator.dbOid,
-		 bmr.smgr->smgr_rlocator.locator.relNumber,
-		 bmr.smgr->smgr_rlocator.backend,
+		 BMR_GET_SMGR(bmr)->smgr_rlocator.locator.spcOid,
+		 BMR_GET_SMGR(bmr)->smgr_rlocator.locator.dbOid,
+		 BMR_GET_SMGR(bmr)->smgr_rlocator.locator.relNumber,
+		 BMR_GET_SMGR(bmr)->smgr_rlocator.backend,
 		 extend_by);
 
 	if (bmr.relpersistence == RELPERSISTENCE_TEMP)
@@ -2170,10 +2164,10 @@ ExtendBufferedRelCommon(BufferManagerRelation bmr,
 	*extended_by = extend_by;
 
 	TRACE_POSTGRESQL_BUFFER_EXTEND_DONE(fork,
-		bmr.smgr->smgr_rlocator.locator.spcOid,
-		bmr.smgr->smgr_rlocator.locator.dbOid,
-		bmr.smgr->smgr_rlocator.locator.relNumber,
-		bmr.smgr->smgr_rlocator.backend,
+		BMR_GET_SMGR(bmr)->smgr_rlocator.locator.spcOid,
+		BMR_GET_SMGR(bmr)->smgr_rlocator.locator.dbOid,
+		BMR_GET_SMGR(bmr)->smgr_rlocator.locator.relNumber,
+		BMR_GET_SMGR(bmr)->smgr_rlocator.backend,
 		*extended_by,
 		first_block);
 
@@ -2239,9 +2233,9 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
 	 * kernel.
 	 */
 	if (flags & EB_CLEAR_SIZE_CACHE)
-		bmr.smgr->smgr_cached_nblocks[fork] = InvalidBlockNumber;
+		BMR_GET_SMGR(bmr)->smgr_cached_nblocks[fork] = InvalidBlockNumber;
 
-	first_block = smgrnblocks(bmr.smgr, fork);
+	first_block = smgrnblocks(BMR_G

Re: pg_upgrade: Support for upgrading to checksums enabled

2025-03-11 Thread Peter Eisentraut

On 21.02.25 00:41, Robert Treat wrote:

On Tue, Aug 27, 2024 at 5:57 PM Nathan Bossart  wrote:


On Mon, Aug 26, 2024 at 08:23:44AM +0200, Peter Eisentraut wrote:

The purpose of this patch is to allow using pg_upgrade between clusters that
have different checksum settings.  When upgrading between instances with
different checksum settings, the --copy (default) mode automatically sets
(or unsets) the checksum on the fly.

This would be particularly useful if we switched to enabling checksums by
default, as [0] proposes, but it's also useful without that.


Given enabling checksums can be rather expensive, I think it makes sense to
add a way to do it during pg_upgrade versus asking folks to run
pg_checksums separately.  I'd anticipate arguments against enabling
checksums automatically, but as you noted, we can move it to a separate
option (e.g., --copy --enable-checksums).  Disabling checksums with
pg_checksums is fast because it just updates pg_control, so I don't see any
need for --disable-checkums in pg_upgrade.



The repercussions of either enabling or disabling checksums on
accident is quite high (not for pg_upgrade, but for $futureDBA), so
ISTM an explicit flag for BOTH is the right way to go. In that
scenario, pg_upgrade would check to make sure the clusters match and
then make the appropriate suggestion. In the case someone did
something like --enable-checksums and --link, again, we'd toss an
error that --copy mode is required to --enable-checksums.


Here is an updated patch that works more along those lines.  It adds a 
pg_upgrade option --update-checksums, which activates the code to 
rewrite the checksums.  You must specify this option if the source and 
target clusters have different checksum settings.


Note that this also works to hypothetically upgrade between future,
different checksum versions (hence "--update-*", not "--enable-*").
Also, as the patch is currently written, this option is also required
to downgrade from checksums to no-checksums.  (It will then write a
zero into the checksum field, so the pages look as if you had never
used checksums.)  Finally, you can optionally specify this option even
if the checksum settings are the same, in which case it will
recalculate the checksums.  Probably not all of this is useful, but
perhaps a subset of it is.  Thoughts?
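
For illustration, the expected invocation would presumably look
something like this (the long option is from this patch version; the
other switches are the usual pg_upgrade ones, with made-up paths):

  pg_upgrade -b /usr/lib/postgresql/17/bin -B /usr/lib/postgresql/18/bin \
      -d /var/lib/postgresql/17/data -D /var/lib/postgresql/18/data \
      --copy --update-checksums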


Also, I still don't know what to do about the Windows code path in 
copyFile().  We could just not support this feature on Windows?  Or 
maybe the notionally correct thing to do would be to move that code into 
copyFileByRange().  But then we'd need a different default on Windows 
and it would require more documentation.  I don't know what to do here 
and I don't have enough context to make a suggestion.  But if we don't 
answer this, I don't think we can move ahead with this feature.
From f73db6e9bd69d0fd4fbd034a691429a22b33b472 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 10 Mar 2025 18:44:30 +0100
Subject: [PATCH v2] pg_upgrade: Support for upgrading to checksums enabled

When upgrading between instances with different checksum settings, the
--copy (default) mode automatically sets (or unsets) the checksum on
the fly.

TODO: What to do about the Windows code path?
---
 src/bin/pg_upgrade/controldata.c   | 18 ++---
 src/bin/pg_upgrade/file.c  | 31 +-
 src/bin/pg_upgrade/option.c|  9 +
 src/bin/pg_upgrade/pg_upgrade.h|  4 +++-
 src/bin/pg_upgrade/relfilenumber.c |  2 +-
 5 files changed, 49 insertions(+), 15 deletions(-)

diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index bd49ea867bf..7ac85ffb4ee 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -735,18 +735,12 @@ check_control_data(ControlData *oldctrl,
 * check_for_isn_and_int8_passing_mismatch().
 */
 
-   /*
-* We might eventually allow upgrades from checksum to no-checksum
-* clusters.
-*/
-   if (oldctrl->data_checksum_version == 0 &&
-   newctrl->data_checksum_version != 0)
-   pg_fatal("old cluster does not use data checksums but the new 
one does");
-   else if (oldctrl->data_checksum_version != 0 &&
-newctrl->data_checksum_version == 0)
-   pg_fatal("old cluster uses data checksums but the new one does 
not");
-   else if (oldctrl->data_checksum_version != 
newctrl->data_checksum_version)
-   pg_fatal("old and new cluster pg_controldata checksum versions 
do not match");
+   /* FIXME: what about upgrade from checksum to non-checksum? */
+   if (oldctrl->data_checksum_version != newctrl->data_checksum_version)
+   {
+   if (!user_opts.update_checksums)
+   pg_fatal("when upgrading between clusters with 
different data checksum settings, option %s must be used", 
"--update-checksums");
+   }
 }
 
 
diff --git a/src/bin/pg_upgrade/file

Re: Test to dump and restore objects left behind by regression

2025-03-11 Thread Ashutosh Bapat
On Tue, Feb 25, 2025 at 11:59 AM Ashutosh Bapat
 wrote:
>
> On Tue, Feb 11, 2025 at 5:53 PM Ashutosh Bapat
>  wrote:
> >
> > Hi Michael,
> >
> >
> > On Sun, Feb 9, 2025 at 1:25 PM Michael Paquier  wrote:
> > >
> > > On Fri, Feb 07, 2025 at 07:11:25AM +0900, Michael Paquier wrote:
> > > > Okay, thanks for the feedback.  We have been relying on diff -u for
> > > > the parts of the tests touched by 0001 for some time now, so if there
> > > > are no objections I would like to apply 0001 in a couple of days.
> > >
> > > This part has been applied as 169208092f5c.
> >
> > Thanks. PFA rebased patches.
>
> PFA rebased patches.
>
> After rebasing I found another bug and reported it at [1].

This bug has been fixed. But now that it's fixed, it's easy to see
another bug related to materialized view statistics. I have reported
it at [2]. That's the fourth bug identified by this test.

>
> For the time being I have added --no-statistics to the pg_dump command
> when taking a dump for comparison.
>

I have not taken out this option because of the materialized view bug.

> [1] 
> https://www.postgresql.org/message-id/CAExHW5vf9D+8-a5_BEX3y=2y_xY9hiCxV1=c+fnxdvfprwv...@mail.gmail.com

[2] 
https://www.postgresql.org/message-id/caexhw5s47kmubpbbrjzsm-zfe0tj2o3gbagb7yaye8rq-v2...@mail.gmail.com


--
Best Wishes,
Ashutosh Bapat
From a140d50245249894a49c39a908163e9bac2fe4bb Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Tue, 11 Feb 2025 16:31:10 +0530
Subject: [PATCH 2/3] Filter COPY statements with differing column order

---
 src/bin/pg_upgrade/t/002_pg_upgrade.pl  | 10 +---
 src/test/perl/PostgreSQL/Test/AdjustDump.pm | 59 +++--
 2 files changed, 45 insertions(+), 24 deletions(-)

diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index c6b99125d9e..e6d8ac9a757 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -581,9 +581,6 @@ sub test_regression_dump_restore
 		my $dump_file = "$tempdir/regression_dump.$format";
 		my $restored_db = 'regression_' . $format;
 
-		# Even though we compare only schema from the original and the restored
-		# database (See get_dump_for_comparison() for details.), we dump and
-		# restore data as well to catch any errors while doing so.
 		$src_node->command_ok(
 			[
 'pg_dump', "-F$format", '--no-sync',
@@ -645,13 +642,10 @@ sub get_dump_for_comparison
 	my $dump_adjusted = "${dumpfile}_adjusted";
 
 
-	# The order of columns in COPY statements dumped from the original database
-	# and that from the restored database differs. These differences are hard to
-	# adjust. Hence we compare only schema dumps for now.
 	$node->command_ok(
 		[
-			'pg_dump', '-s', '--no-sync', '-d',
-			$node->connstr($db), '-f', $dumpfile
+			'pg_dump', '--no-sync', '-d', $node->connstr($db), '-f',
+			$dumpfile
 		],
 		'dump for comparison succeeded');
 
diff --git a/src/test/perl/PostgreSQL/Test/AdjustDump.pm b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
index e3e152b88fa..e00a00d1b2c 100644
--- a/src/test/perl/PostgreSQL/Test/AdjustDump.pm
+++ b/src/test/perl/PostgreSQL/Test/AdjustDump.pm
@@ -44,22 +44,36 @@ our @EXPORT = qw(
 
 If we take dump of the regression database left behind after running regression
 tests, restore the dump, and take dump of the restored regression database, the
-outputs of both the dumps differ. Some regression tests purposefully create
-some child tables in such a way that their column orders differ from column
-orders of their respective parents. In the restored database, however, their
-column orders are same as that of their respective parents. Thus the column
+outputs of both the dumps differ in the following cases. This routine adjusts
+the given dump so that dump outputs from the original and restored database,
+respectively, match.
+
+Case 1: Some regression tests purposefully create child tables in such a way
+that the order of their inherited columns differ from column orders of their
+respective parents. In the restored database, however, the order of their
+inherited columns are same as that of their respective parents. Thus the column
 orders of these child tables in the original database and those in the restored
 database differ, causing difference in the dump outputs. See MergeAttributes()
-and dumpTableSchema() for details.
-
-This routine rearranges the column declarations in the relevant
-C statements in the dump file from original database
-to match those from the restored database. We could instead adjust the
-statements in the dump from the restored database to match those from original
-database or adjust both to a canonical order. But we have chosen to adjust the
-statements in the dump from original database for no particular reason.
-
-Additionally it adjusts blank and new lines to avoid noise.
+and dumpTableSchema() for details.  This routine rearranges the column
+declarations in the relevant C statements in the dump
+file from original databa

initdb.exe Fails on Windows When Password File Path Contains Non-ASCII Characters

2025-03-11 Thread Manika Singhal
Hi,

The postgresql windows installer packaged by EDB, creates the postgresql
instance on the fly.
It creates a temporary file to store the password and then deletes it once
the initdb is run successfully.
But a user reported the initdb fails. On debugging it was found that, it
occurs because the username in the
%TEMP% contains the non-ascii characters as shown in the example below
initdb.exe --pgdata="C:\Program Files\PostgreSQL\17\data"
--username=postgres --encoding=UTF8
--pwfile="C:\Users\玛妮卡\AppData\Local\Temp\passwdXXX"
--auth=scram-sha-256

initdb: error: could not open file
"C:\Users\???\AppData\Local\Temp\psswrd" for reading: No such file or
directory

It seems that initdb.exe does not correctly handle paths containing
Unicode characters, replacing them with ??? in the error message. I
want to ask if this is the expected behaviour of initdb.exe? When the
user temp location is replaced with system temp (c:\windows\temp) the
error goes away.
Any suggestions if the installer should use this path to create the
password file?


-
Thanks & Regards
Manika Singhal
EDB India


Re: [PATCH] Add reverse(bytea)

2025-03-11 Thread Aleksander Alekseev
Nathan, Daniel,

> > We already have array_reverse() and text_reverse(), so I see no strong
> > reason against also having a bytea_reverse().
>
> +1

I also considered adding reverse(bit); however, to my knowledge there
is no practical use for it.

-- 
Best regards,
Aleksander Alekseev




a pool for parallel worker

2025-03-11 Thread Andy Fan



Hi,

Currently, when a query needs some parallel workers, the postmaster
spawns backends for that query, and when the work is done, those
backends exit.  There is some wastage here, e.g. the syscache, relcache,
smgr cache, and vfd cache contents, plus the fork/exit syscalls
themselves.

I am thinking about whether we should preallocate (or create lazily)
some backends as a pool for parallel workers.  The benefits include:

(1) It lowers the real startup cost of a parallel worker.
(2) It makes the core more suitable for cases where the executor needs
a new worker to run a piece of a plan.  I think this is needed by some
data-redistribution-related executors in a distributed database.

I guess both cases can share some well-designed code, like costing or
transferring data between worker and leader.

The annoying thing about the pool is that it is [dbid + userId] based,
meaning that if the dbid or userId differs from that of the connection
in the pool, the pooled backend can't be reused.  To reduce the effect
of the userId, I think we could start the pool with a superuser and
then switch the user with 'SET ROLE xxx'; the pool could also be
created lazily.
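
A minimal sketch of the role switching, assuming the pool connects as a
superuser and the leader's session runs as the (illustrative) role
app_user:

  SET ROLE app_user;   -- adopt the requesting session's privileges
  -- ... execute the assigned piece of the plan ...
  RESET ROLE;          -- hand the worker back to the pool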

Any comments on this idea?

-- 
Best Regards
Andy Fan





Re: Test mail for pgsql-hackers

2025-03-11 Thread BharatDB
Hi,

I've been exploring logical replication and noticed that if the column
datatypes don't match between the publisher and subscriber, PostgreSQL
doesn't give a warning. This can cause unexpected behavior, and I
thought it might be helpful to alert users when this happens.

What This Patch Does:
- Adds a warning when a column's datatype in the subscriber doesn't
  match the publisher.
- Helps users catch issues early instead of running into silent errors
  later.

Why I Think It's Useful:
- Avoids confusion when replication doesn't work as expected.
- Makes debugging easier by pointing out potential problems.

I'd love to get feedback on whether this is a good idea and if I've
approached it correctly. Since I'm still learning, any suggestions for
improvement would be really helpful. I've attached the patch; please
let me know what you think!

Thanks, Blessy
From 418a9e40f2b98dfef99eb4c79e87490e192969c2 Mon Sep 17 00:00:00 2001
From: Blessy456b 
Date: Tue, 11 Mar 2025 17:20:43 +0530
Subject: [PATCH] Added logical replication warning

---
 src/backend/replication/logical/tablesync.c   | 28 +++
 .../aborted-keyrevoke.spec|  0
 .../alter-table-1.spec|  0
 .../alter-table-2.spec|  0
 .../alter-table-3.spec|  0
 .../alter-table-4.spec|  0
 .../{specs => specs_backup}/async-notify.spec |  0
 .../classroom-scheduling.spec |  0
 .../cluster-conflict-partition.spec   |  0
 .../cluster-conflict.spec |  0
 .../create-trigger.spec   |  0
 .../deadlock-hard.spec|  0
 .../deadlock-parallel.spec|  0
 .../deadlock-simple.spec  |  0
 .../deadlock-soft-2.spec  |  0
 .../deadlock-soft.spec|  0
 .../delete-abort-savept-2.spec|  0
 .../delete-abort-savept.spec  |  0
 .../detach-partition-concurrently-1.spec  |  0
 .../detach-partition-concurrently-2.spec  |  0
 .../detach-partition-concurrently-3.spec  |  0
 .../detach-partition-concurrently-4.spec  |  0
 .../drop-index-concurrently-1.spec|  0
 .../eval-plan-qual-trigger.spec   |  0
 .../eval-plan-qual.spec   |  0
 .../fk-contention.spec|  0
 .../{specs => specs_backup}/fk-deadlock.spec  |  0
 .../{specs => specs_backup}/fk-deadlock2.spec |  0
 .../fk-partitioned-1.spec |  0
 .../fk-partitioned-2.spec |  0
 .../{specs => specs_backup}/fk-snapshot.spec  |  0
 .../freeze-the-dead.spec  |  0
 .../{specs => specs_backup}/horizons.spec |  0
 .../index-only-scan.spec  |  0
 .../{specs => specs_backup}/inherit-temp.spec |  0
 .../inplace-inval.spec|  0
 .../insert-conflict-do-nothing-2.spec |  0
 .../insert-conflict-do-nothing.spec   |  0
 .../insert-conflict-do-update-2.spec  |  0
 .../insert-conflict-do-update-3.spec  |  0
 .../insert-conflict-do-update.spec|  0
 .../insert-conflict-specconflict.spec |  0
 .../intra-grant-inplace-db.spec   |  0
 .../intra-grant-inplace.spec  |  0
 .../lock-committed-keyupdate.spec |  0
 .../lock-committed-update.spec|  0
 .../{specs => specs_backup}/lock-nowait.spec  |  0
 .../lock-update-delete.spec   |  0
 .../lock-update-traversal.spec|  0
 .../matview-write-skew.spec   |  0
 .../{specs => specs_backup}/merge-delete.spec |  0
 .../merge-insert-update.spec  |  0
 .../{specs => specs_backup}/merge-join.spec   |  0
 .../merge-match-recheck.spec  |  0
 .../{specs => specs_backup}/merge-update.spec |  0
 .../{specs => specs_backup}/multiple-cic.spec |  0
 .../multiple-row-versions.spec|  0
 .../multixact-no-deadlock.spec|  0
 .../multixact-no-forget.spec  |  0
 .../{specs => specs_backup}/nowait-2.spec |  0
 .../{specs => specs_backup}/nowait-3.spec |  0
 .../{specs => specs_backup}/nowait-4.spec |  0
 .../{specs => specs_backup}/nowait-5.spec |  0
 .../{specs => specs_backup}/nowait.spec   |  0
 .../partial-index.spec|  0
 .../partition-concurrent-attach.spec  |  0
 .../partition-drop-index-locking.spec |  0
 .../partition-key-update-1.spec   |  0
 .../partition-key-update-2.spec   |  0
 .../partition-key-update-3.spec   |  0
 .../partition-key-update-4.spec   |  0
 .../plpgsql-toast.spec|  0
 .../predicate-gin.spec|  0
 .../predicate-gist.spec   |  0
 .../predicate-hash.spec 

Re: Parallel heap vacuum

2025-03-11 Thread Amit Kapila
On Tue, Mar 11, 2025 at 5:00 AM Masahiko Sawada  wrote:
>
> On Sun, Mar 9, 2025 at 11:28 PM Amit Kapila  wrote:
> >
> >
> > Does phase 3 also use parallelism? If so, can we try to divide the
> > ring buffers among workers or at least try vacuum with an increased
> > number of ring buffers. This would be good to do for both the phases,
> > if they both use parallelism.
>
> No, only phase 1 was parallelized in this test. In parallel vacuum,
> since it uses (ring_buffer_size * parallel_degree) memory, more pages
> are loaded during phase 1, increasing cache hits during phase 3.
>

Shouldn't we ideally try a vacuum without parallelism but with
ring_buffer_size * parallel_degree to make the comparison better?
Also, what could be the reason for the variation in the phase-I data?
Do you restart the system after each run to ensure there is nothing in
memory? If not, then shouldn't we try at least a few runs, restarting
the system before each run, to ensure there is nothing left over in
memory?

-- 
With Regards,
Amit Kapila.




Re: speedup COPY TO for partitioned table.

2025-03-11 Thread jian he
On Fri, Mar 7, 2025 at 6:41 PM jian he  wrote:
>
> hi.
>
> rebased and polished patch attached, test case added.
hi.

I realized I need to change the doc/src/sgml/ref/copy.sgml
Notes section.

current doc note section:
COPY TO can be used only with plain tables, not views, and does not
copy rows from child tables or child partitions.
For example, COPY table TO copies the same rows as SELECT * FROM ONLY table.
The syntax COPY (SELECT * FROM table) TO ... can be used to dump all
of the rows in an inheritance hierarchy, partitioned table, or view.

after my change:

COPY TO can be used only with plain tables, not views, and does not
copy rows from child tables; however, COPY TO can be used with
partitioned tables.
For example, in a table inheritance hierarchy, COPY table TO copies
the same rows as SELECT * FROM ONLY table.
The syntax COPY (SELECT * FROM table) TO ... can be used to dump all
of the rows in an inheritance hierarchy or a view.
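
To make the behavioral difference concrete, a small example (table
names are illustrative):

  -- inheritance parent: COPY still emits only the parent's own rows
  COPY inh_parent TO STDOUT;
  COPY (SELECT * FROM inh_parent) TO STDOUT;  -- includes child rows

  -- partitioned table: with this patch, COPY works and emits rows
  -- from all partitions
  COPY part_parent TO STDOUT;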

From f7376da47f51e385c5496b0cf7eb52e5340a39b9 Mon Sep 17 00:00:00 2001
From: jian he 
Date: Tue, 11 Mar 2025 20:51:30 +0800
Subject: [PATCH v3 1/1] support "COPY partitioned_table TO"

drop table if exists pp;
CREATE TABLE pp (id  INT, val int ) PARTITION BY RANGE (id);
create table pp_1 (val int, id int);
create table pp_2 (val int, id int);
ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5);
ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10);
insert into pp select g, 10 + g from generate_series(1,9) g;
copy pp to stdout(header);

the above case is much slower (around 25% some case) than
``COPY (select * from pp) to stdout(header);``
but this is still a new feature, since master does not
support ``COPY (partitioned_table)``.

discussion: https://postgr.es/m/CACJufxEZt+G19Ors3bQUq-42-61__C=y5k2wk=sHEFRusu7=i...@mail.gmail.com
---
 doc/src/sgml/ref/copy.sgml  |  8 +--
 src/backend/commands/copyto.c   | 80 ++---
 src/test/regress/expected/copy2.out | 16 ++
 src/test/regress/sql/copy2.sql  | 11 
 4 files changed, 105 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index df093da97c5..f86e0b7ec35 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -521,15 +521,15 @@ COPY count
 

 COPY TO can be used only with plain
-tables, not views, and does not copy rows from child tables
-or child partitions.  For example, COPY COPY TO can be used with partitioned tables.
+For example, in a table inheritance hierarchy, COPY table TO copies
 the same rows as SELECT * FROM ONLY table.
 The syntax COPY (SELECT * FROM table) TO ... can be used to
-dump all of the rows in an inheritance hierarchy, partitioned table,
-or view.
+dump all of the rows in an inheritance hierarchy, or view.

 

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 84a3f3879a8..966b6741530 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -19,6 +19,8 @@
 #include 
 
 #include "access/tableam.h"
+#include "access/table.h"
+#include "catalog/pg_inherits.h"
 #include "commands/copyapi.h"
 #include "commands/progress.h"
 #include "executor/execdesc.h"
@@ -82,6 +84,7 @@ typedef struct CopyToStateData
 	List	   *attnumlist;		/* integer list of attnums to copy */
 	char	   *filename;		/* filename, or NULL for STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
+	List	   *partitions;		/* oid list of partition oid for copy to */
 	copy_data_dest_cb data_dest_cb; /* function for writing data */
 
 	CopyFormatOptions opts;
@@ -643,6 +646,8 @@ BeginCopyTo(ParseState *pstate,
 		PROGRESS_COPY_COMMAND_TO,
 		0
 	};
+	List	   *children = NIL;
+	List	   *scan_oids = NIL;
 
 	if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION)
 	{
@@ -670,11 +675,19 @@ BeginCopyTo(ParseState *pstate,
 	 errmsg("cannot copy from sequence \"%s\"",
 			RelationGetRelationName(rel;
 		else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-			ereport(ERROR,
-	(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-	 errmsg("cannot copy from partitioned table \"%s\"",
-			RelationGetRelationName(rel)),
-	 errhint("Try the COPY (SELECT ...) TO variant.")));
+		{
+			children = find_all_inheritors(RelationGetRelid(rel),
+		   AccessShareLock,
+		   NULL);
+			foreach_oid(childreloid, children)
+			{
+char		relkind = get_rel_relkind(childreloid);
+if (RELKIND_HAS_PARTITIONS(relkind))
+	continue;
+
+scan_oids = lappend_oid(scan_oids, childreloid);
+			}
+		}
 		else
 			ereport(ERROR,
 	(errcode(ERRCODE_WRONG_OBJECT_TYPE),
@@ -710,6 +723,7 @@ BeginCopyTo(ParseState *pstate,
 		cstate->rel = rel;
 
 		tupDesc = RelationGetDescr(cstate->rel);
+		cstate->partitions = list_copy(scan_oids);
 	}
 	else
 	{
@@ -1066,7 +1080,61 @@ DoCopyTo(CopyToState cstate)
 
 	cstate->routine->CopyToStart(cstate, tupDesc);
 
-	

Re: Add “FOR UPDATE NOWAIT” lock details to the log.

2025-03-11 Thread Yuki Seino


Instead, wouldn't it be simpler to update LockAcquireExtended() to
take a new argument, like logLockFailure, to control whether
a lock failure should be logged directly? I’ve adjusted the patch
accordingly and attached it. Please let me know what you think!

Regards,

Thank you!

It's very simple and nice.
It seems like it can also handle other lock failure cases by extending 
logLockFailure.
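
For reference, the resulting signature would presumably look roughly
like this (a sketch based on the current LockAcquireExtended()
arguments, not the attached patch itself):

  extern LockAcquireResult LockAcquireExtended(const LOCKTAG *locktag,
                                               LOCKMODE lockmode,
                                               bool sessionLock,
                                               bool dontWait,
                                               bool reportMemoryError,
                                               LOCALLOCK **locallockp,
                                               bool logLockFailure);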


I agree with this proposal.


Regards,





Re: per backend WAL statistics

2025-03-11 Thread Bertrand Drouvot
Hi,

On Tue, Mar 11, 2025 at 09:06:27AM +0900, Michael Paquier wrote:
> On Mon, Mar 10, 2025 at 11:52:26AM +, Bertrand Drouvot wrote:
> > Hi,
> > 
> > On Mon, Mar 10, 2025 at 04:46:53PM +0900, Michael Paquier wrote:
> > > On Sat, Mar 08, 2025 at 07:53:04AM +, Bertrand Drouvot wrote:
> > > > That would not be an issue should we only access the struct
> > > > fields in the code, but that's not the case as we're making use of
> > > > pg_memory_is_all_zeros() on it.
> > > 
> > > It does not hurt to keep it as it is, honestly.
> > 
> > I believe that's worse than before actually. Before padding bytes would 
> > "probably"
> > be set to zeros while now it's certainly not always the case. I think that
> > we already removed this (see comments === 4 in [1]).
> 
> We still apply the memset(), and the initialization is actually the
> same.

Yeah, currently there are no issues: there is no padding in the struct
and memset() is done.

That said, memset() is done only if pgstat_tracks_backend_bktype()
returns true (i.e. if pgstat_create_backend() is called).

That means that if, in the future, the struct is modified in such a way
that padding is added, then we could end up with non-zero padding bytes
for the backends for which pgstat_tracks_backend_bktype() returns false.

I think that could lead to race conditions (even if, for the moment, I
think all is fine, as the other pgstat_tracks_backend_bktype() calls
should protect us).

> And I guess that we're OK here,

Yup.

> so applied.

Thanks!

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: SQL Property Graph Queries (SQL/PGQ)

2025-03-11 Thread Amit Langote
Hi Ashutosh, Junwang,

On Tue, Mar 11, 2025 at 4:22 PM Ashutosh Bapat
 wrote:
> On Wed, Feb 26, 2025 at 8:04 PM Junwang Zhao  wrote:
> > I added a trivial fix(v12-0014) that called table_open/table_close in
> > rewriteGraphTable, it now passed the regression test and cirrus ci test,
> > but I'm not sure it's the correct fix.
> >
> > I hope Ashutosh can chime in and take a look at this problem.
>
> 2. Following Assertion is failing, the assertion was added recently.
> TRAP: failed Assert("IsParallelWorker() ||
> CheckRelationOidLockedByMe(rte->relid, AccessShareLock, true)"), File:
> "../../coderoot/pg/src/backend/executor/execMain.c", Line: 695, PID:
> 303994
> postgres: ashutosh regression [local]
> SELECT(ExceptionalCondition+0xbe)[0x5c838c5d7114]
> postgres: ashutosh regression [local]
> SELECT(ExecCheckPermissions+0xf8)[0x5c838c11fb9c]
> postgres: ashutosh regression [local] SELECT(+0x38223f)[0x5c838c12023f]
> postgres: ashutosh regression [local]
> SELECT(standard_ExecutorStart+0x2f8)[0x5c838c11f223]
> postgres: ashutosh regression [local] 
> SELECT(ExecutorStart+0x69)[0x5c838c11ef22]
> postgres: ashutosh regression [local] 
> SELECT(PortalStart+0x368)[0x5c838c3d991a]
> postgres: ashutosh regression [local] SELECT(+0x63458e)[0x5c838c3d258e]
> postgres: ashutosh regression [local] 
> SELECT(PostgresMain+0x9eb)[0x5c838c3d7cf0]
> postgres: ashutosh regression [local] SELECT(+0x630178)[0x5c838c3ce178]
> postgres: ashutosh regression [local]
> SELECT(postmaster_child_launch+0x137)[0x5c838c2da677]
> postgres: ashutosh regression [local] SELECT(+0x5431b4)[0x5c838c2e11b4]
> postgres: ashutosh regression [local] SELECT(+0x54076a)[0x5c838c2de76a]
> postgres: ashutosh regression [local]
> SELECT(PostmasterMain+0x15f8)[0x5c838c2de04d]
> postgres: ashutosh regression [local] SELECT(main+0x3a1)[0x5c838c1b12be]
> /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7eda9ea29d90]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7eda9ea29e40]
> postgres: ashutosh regression [local] SELECT(_start+0x25)[0x5c838be7c025]
> 2025-03-11 11:40:01.696 IST postmaster[303081] LOG: client backend
> (PID 303994) was terminated by signal 6: Aborted
> 2025-03-11 11:40:01.696 IST postmaster[303081] DETAIL: Failed process
> was running: select * from atpgv1;
> I tried to investigate the Assertion, it's failing for property graph
> RTE which is turned into subquery RTE. It has the right lock mode set,
> but I haven't been able to figure out where the lock is supposed to be
> taken or where it's released. If we just prepare the failing query and
> execute the prepared statement, it does not trip the assertion. Will
> investigate it more.

I reproduced the crash using the example Junwang gave.

The problem seems to be that RTEs of rtekind RTE_GRAPH_TABLE are not
handled in AcquireRewriteLocks().  You'll need to add a case for
RTE_GRAPH_TABLE similar to RTE_RELATION in the following switch of
that function:

/*
 * First, process RTEs of the current query level.
 */
rt_index = 0;
foreach(l, parsetree->rtable)
{
RangeTblEntry *rte = (RangeTblEntry *) lfirst(l);
Relation	rel;
LOCKMODE	lockmode;
List   *newaliasvars;
Index   curinputvarno;
RangeTblEntry *curinputrte;
ListCell   *ll;

++rt_index;
switch (rte->rtekind)
{
case RTE_RELATION:

which could be as simple as the following (fixes the crash!) or
something that's specific to RTE_GRAPH_TABLE:

diff --git a/src/backend/rewrite/rewriteHandler.c
b/src/backend/rewrite/rewriteHandler.c
index c51dd3d2ee4..8fa6edb90eb 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -174,6 +174,7 @@ AcquireRewriteLocks(Query *parsetree,
 switch (rte->rtekind)
 {
 case RTE_RELATION:
+case RTE_GRAPH_TABLE:

-- 
Thanks, Amit Langote




Re: Selectively invalidate caches in pgoutput module

2025-03-11 Thread Amit Kapila
On Mon, Mar 10, 2025 at 6:42 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> > Currently, only the leaf partition is invalidated when the published table 
> > is
> > partitioned. However, I think pgoutput could cache both the partitioned 
> > table
> > and the leaf partition table as relsync entries.
> >
> > For INSERT/UPDATE/DELETE on a partitioned table, only the leaf partition's
> > relsync entry is used in pgoutput, but the TRUNCATE references the parent
> > table's relsync entry.
>
> I think your analysis is correct. PSA new version. Below part contains my 
> analysis.
>

I have made several cosmetic changes atop the 0001 patch in the attached.
Additionally, I fixed an issue in AddRelsyncInvalidationMessage() to
consider invalidating all the RelSyncCache entries. Kindly include
these in the next version if you find the changes okay.

-- 
With Regards,
Amit Kapila.
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index 35df9be5e54..a0c49519977 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -497,7 +497,9 @@ AddRelcacheInvalidationMessage(InvalidationMsgsGroup *group,
 /*
  * Add a relsync inval entry
  *
- * We put these into the relcache subgroup for simplicity.
+ * We put these into the relcache subgroup for simplicity. This message is the
+ * same as AddRelcacheInvalidationMessage() except that it is for
+ * RelationSyncCache maintained by decoding plugin pgoutput.
  */
 static void
 AddRelsyncInvalidationMessage(InvalidationMsgsGroup *group,
@@ -505,11 +507,11 @@ AddRelsyncInvalidationMessage(InvalidationMsgsGroup 
*group,
 {
SharedInvalidationMessage msg;
 
-   /* Don't add a duplicate item */
-   /* We assume dbId need not be checked because it will never change */
+   /* Don't add a duplicate item. */
ProcessMessageSubGroup(group, RelCacheMsgs,
   if (msg->rc.id == 
SHAREDINVALRELSYNC_ID &&
-  msg->rc.relId == 
relId)
+  (msg->rc.relId == 
relId ||
+   msg->rc.relId 
== InvalidOid))
   return);
 
/* OK, add the item */
@@ -650,9 +652,9 @@ RegisterRelcacheInvalidation(InvalidationInfo *info, Oid 
dbId, Oid relId)
 }
 
 /*
- * RegisterRelcacheInvalidation
+ * RegisterRelsyncInvalidation
  *
- * As above, but register a relsync invalidation event.
+ * As above, but register a relsynccache invalidation event.
  */
 static void
 RegisterRelsyncInvalidation(InvalidationInfo *info, Oid dbId, Oid relId)
@@ -660,7 +662,6 @@ RegisterRelsyncInvalidation(InvalidationInfo *info, Oid 
dbId, Oid relId)
AddRelsyncInvalidationMessage(&info->CurrentCmdInvalidMsgs, dbId, 
relId);
 }
 
-
 /*
  * RegisterSnapshotInvalidation
  *
@@ -1684,9 +1685,8 @@ CacheInvalidateRelcacheByRelid(Oid relid)
ReleaseSysCache(tup);
 }
 
-
 /*
- * RelationCacheInvalidate
+ * CacheInvalidateRelSync
  * Register invalidation of the cache in logical decoding output 
plugin
  * for a database.
  *
@@ -1701,7 +1701,6 @@ CacheInvalidateRelSync(Oid relid)
MyDatabaseId, 
relid);
 }
 
-
 /*
  * CacheInvalidateRelSyncAll
  * Register invalidation of the whole cache in logical decoding 
output
@@ -1713,7 +1712,6 @@ CacheInvalidateRelSyncAll(void)
CacheInvalidateRelSync(InvalidOid);
 }
 
-
 /*
  * CacheInvalidateSmgr
  * Register invalidation of smgr references to a physical relation.
@@ -1858,7 +1856,7 @@ CacheRegisterRelcacheCallback(RelcacheCallbackFunction 
func,
 /*
  * CacheRegisterRelSyncCallback
  * Register the specified function to be called for all future
- * decoding-cache invalidation events.
+ * decoding plugin's relation cache invalidation events.
  *
  * This function is intended to be call from the logical decoding output
  * plugins.
diff --git a/src/include/storage/sinval.h b/src/include/storage/sinval.h
index 90a5af4ed8f..f168b5fbf8c 100644
--- a/src/include/storage/sinval.h
+++ b/src/include/storage/sinval.h
@@ -27,7 +27,7 @@
  * * invalidate an smgr cache entry for a specific physical relation
  * * invalidate the mapped-relation mapping for a given database
  * * invalidate any saved snapshot that might be used to scan a given 
relation
- * * invalidate a specific entry for specific output plugin
+ * * invalidate a RelationSyncCache entry for a specific relation
  * More types could be added if needed.  The message type is identified by
  * the first "int8" field of the message struct.  Zero or positive means a
  * specific-catcache inval message (and also serves as the catcache ID field).
@@ -47,12 +47,12 @@
  * catcache inval messages must be generated for each of its caches, sinc

Re: Improve CRC32C performance on SSE4.2

2025-03-11 Thread John Naylor
On Tue, Mar 11, 2025 at 4:47 AM Nathan Bossart  wrote:
>
> On Mon, Mar 10, 2025 at 03:48:31PM +0700, John Naylor wrote:
> > On Tue, Mar 4, 2025 at 2:11 AM Nathan Bossart  
> > wrote:
> >> Overall, I wish we could avoid splitting things into separate files and
> >> adding more header file gymnastics, but maybe there isn't much better we
> >> can do without overhauling the CPU feature detection code.
> >
> > I wanted to make an attempt to make this aspect nicer. v13-0002
> > incorporates deliberately compact and simple loops for inlined
> > constant input into the dispatch function, and leaves the existing
> > code alone. This avoids code churn and saves vertical space in the
> > copied code. It needs a bit more commentary, but I hope this is a more
> > digestible prerequisite to the CLMUL algorithm -- as a reminder, it'll
> > be simpler if we can always assume non-constant input can go through a
> > function pointer.
>
> That is certainly more readable.  FWIW I think it would be entirely
> reasonable to replace the pg_crc32c_sse42.c implementation with a call to
> this new pg_comp_crc32c_dispatch() function.  Of course, you'd have to
> split things up like:
> [snip]

That could work as well. I'm thinking if we do PMULL on Arm, it might
be advantageous to keep the inline path and function paths with
distinct coding -- because of the pickier alignment on that platform,
it might not be worth pre-aligning the pointer to 8 bytes for a
20-byte constant input.

I've gone ahead and added the generated AVX-512 algorithm in v14-0005,
and added the build support and some of the runtime support from Paul
and Raghuveer's earlier patches in 0006-7. It passes CI, but I'll have
to arrange access to other hardware to verify the runtime behavior. I
think the Meson support is most of the way there, but it looks like
configure.ac got whacked around cosmetically quite a bit. If we feel
it's time to refactor things there, we'll want to split that out. In
any case, for autoconf I've pretty much kept the earlier work for now.

--
John Naylor
Amazon Web Services
From adfe02a6d169be865937b567bc1b2b2ffde60631 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Tue, 11 Mar 2025 11:20:20 +0700
Subject: [PATCH v14 4/8] Always do runtime check for x86 to simplify PCLMUL

---
 configure |  2 +-
 configure.ac  |  2 +-
 src/include/port/pg_crc32c.h  | 20 ++--
 src/port/meson.build  |  1 +
 src/port/pg_crc32c_sse42.c|  2 +-
 src/port/pg_crc32c_sse42_choose.c |  2 ++
 6 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/configure b/configure
index 93fddd69981..91c0ffc8272 100755
--- a/configure
+++ b/configure
@@ -17684,7 +17684,7 @@ if test x"$USE_SSE42_CRC32C" = x"1"; then
 
 $as_echo "#define USE_SSE42_CRC32C 1" >>confdefs.h
 
-  PG_CRC32C_OBJS="pg_crc32c_sse42.o"
+  PG_CRC32C_OBJS="pg_crc32c_sse42.o pg_crc32c_sse42_choose.o"
   { $as_echo "$as_me:${as_lineno-$LINENO}: result: SSE 4.2" >&5
 $as_echo "SSE 4.2" >&6; }
 else
diff --git a/configure.ac b/configure.ac
index b6d02f5ecc7..a85bdbd4ff6 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2151,7 +2151,7 @@ fi
 AC_MSG_CHECKING([which CRC-32C implementation to use])
 if test x"$USE_SSE42_CRC32C" = x"1"; then
   AC_DEFINE(USE_SSE42_CRC32C, 1, [Define to 1 use Intel SSE 4.2 CRC instructions.])
-  PG_CRC32C_OBJS="pg_crc32c_sse42.o"
+  PG_CRC32C_OBJS="pg_crc32c_sse42.o pg_crc32c_sse42_choose.o"
   AC_MSG_RESULT(SSE 4.2)
 else
   if test x"$USE_SSE42_CRC32C_WITH_RUNTIME_CHECK" = x"1"; then
diff --git a/src/include/port/pg_crc32c.h b/src/include/port/pg_crc32c.h
index 229f4f6a65a..28253b48018 100644
--- a/src/include/port/pg_crc32c.h
+++ b/src/include/port/pg_crc32c.h
@@ -47,7 +47,10 @@ typedef uint32 pg_crc32c;
 #define EQ_CRC32C(c1, c2) ((c1) == (c2))
 
 #if defined(USE_SSE42_CRC32C)
-/* Use Intel SSE4.2 instructions. */
+/*
+ * Use either Intel SSE 4.2 or PCLMUL instructions. We don't need a runtime check
+ * for SSE 4.2, so we can inline those in some cases.
+ */
 
 #include 
 
@@ -55,7 +58,11 @@ typedef uint32 pg_crc32c;
 	((crc) = pg_comp_crc32c_dispatch((crc), (data), (len)))
 #define FIN_CRC32C(crc) ((crc) ^= 0x)
 
+extern pg_crc32c (*pg_comp_crc32c) (pg_crc32c crc, const void *data, size_t len);
 extern pg_crc32c pg_comp_crc32c_sse42(pg_crc32c crc, const void *data, size_t len);
+#ifdef USE_PCLMUL_WITH_RUNTIME_CHECK
+extern pg_crc32c pg_comp_crc32c_pclmul(pg_crc32c crc, const void *data, size_t len);
+#endif
 
 pg_attribute_no_sanitize_alignment()
 static inline
@@ -67,9 +74,9 @@ pg_comp_crc32c_dispatch(pg_crc32c crc, const void *data, size_t len)
 		const unsigned char *p = data;
 
 		/*
-		 * For constant inputs, inline the computation to avoid the
-		 * indirect function call. This also allows the compiler to unroll
-		 * loops for small inputs.
+		 * For constant inputs, inline the computation to avoid the indirect
+		 * function call. This also allows the compiler to unroll loops fo

Re: Conflict detection for multiple_unique_conflicts in logical replication

2025-03-11 Thread Nisha Moond
On Tue, Mar 11, 2025 at 11:10 AM Dilip Kumar  wrote:
>
> On Thu, Feb 20, 2025 at 5:01 PM Nisha Moond  wrote:
> >
> > Hi Hackers,
> > (CCing people involved in related discussions)
> >
> > I am starting this thread to propose a new conflict detection type
> > "multiple_unique_conflicts" that identifies when an incoming row
> > during logical replication violates more than one UNIQUE constraint.
> > If multiple constraints (such as the primary key/replica identity
> > along with additional unique indexes) are violated for the same
> > incoming tuple, the apply_worker will trigger a dedicated
> > multiple_unique_conflicts conflict.
> >
> > 
> > ISSUE:
> > 
> > Currently, when the apply_worker encounters an 'insert_exists'
> > conflict (where the incoming row conflicts with an existing row based
> > on the replica identity), it logs the conflict and errors out
> > immediately. If the user tries to resolve this manually by deleting
> > the conflicting row, the apply worker may fail again due to another
> > unique constraint violation on a different column. This forces users
> > to fix conflicts one by one, making resolution tedious and
> > inefficient.
> >
> > Example:
> > Schema:
> > CREATE TABLE tab1 (col1 integer PRIMARY KEY, col2 integer UNIQUE, col3
> > integer UNIQUE);
> >   - col1 is Replica Identity.
> >
> > Data:
> >  - on pub: (1, 11, 111)
> >  - on sub: 3 additional local Inserts: (2, 22, 222); (3, 33, 333); (4, 44, 
> > 444)
> >  - Concurrently on pub, new insert: (2, 33, 444)
> >
> > When the new incoming tuple (2, 33, 444) is applied on the subscriber:
> >  - The apply worker first detects an 'insert_exists' conflict on col1
> > (primary key) and errors out.
> >  - If the user deletes the conflicting row (key col1=2) : (2, 22,
> > 222), the apply worker fails again because col2=33 violates another
> > unique constraint for tuple (3, 33, 333);
> >  - If the user deletes col2=33, the apply worker fails yet again due
> > to tuple (4, 44, 444) because col3=444 is also unique.
> > Conflicts on both col2 and col3 (which are independent of each other)
> > are an example of a 'Multiple Unique Constraints' violation.
> >
> > In such cases, users are forced to resolve conflicts one by one,
> > making the process slow and error-prone.
> >
> > ---
> > SOLUTION:
> > ---
> > During an INSERT or UPDATE conflict check, instead of stopping at the
> > first encountered conflict, the apply_worker will now check all unique
> > indexes before reporting a conflict. If multiple unique key violations
> > are found, it will report a 'multiple_unique_conflicts' conflict,
> > listing all conflicting tuples in the logs. If only a single key
> > conflict is detected, the existing 'insert_exists' conflict will be
> > raised as it is now.
>
> I think it makes sense to report all the unique key conflicts for a
> single row at once, rather than stopping after the first one. However,
> I don't understand the need to create a new conflict type. Can't we
> report multiple conflicts for the row using the existing conflict
> type, if different columns in the incoming row conflict with different
> existing rows?
>

The goal of introducing a new conflict type is to handle multiple-key
conflicts separately. It will not only allow users to apply different
resolution methods for single-key vs multi-key conflicts but will also
help them identify such cases directly from stats instead of filtering
through the LOG file.
For example, with future resolution options, 'last_update_wins' will
work well for single-key conflicts (insert_exists/update_exists) since
only one local tuple will be replaced if the remote wins. However, for
multi-key conflicts, this resolution may not be feasible. Even if
supported, resolving the conflict could result in deleting multiple
tuples, which users may want to control separately, like opting to
skip or error out in case of multi-key conflicts.

For reference, databases like EDB-PGD(BDR)[1] support a separate
conflict type for multi-key cases. OTOH, Oracle[2] and DB2[3] do not
have a separate conflict_type and always ERROR out during conflict
resolution if multiple unique constraint violations happen. However,
none of these databases use the single-key conflict_type and
resolution for the multi-key conflict cases.

We’d like to know your preference—should we always ERROR out like
Oracle/DB2, or keep it as a separate conflict_type for better control
during resolution?

[1] 
https://www.enterprisedb.com/docs/pgd/4/bdr/conflicts/#insert-operations-that-violate-multiple-unique-constraints
[2] 
https://docs.oracle.com/goldengate/c1230/gg-winux/GWUAD/configuring-conflict-detection-and-resolution.htm#GWUAD316
[3] 
https://www.ibm.com/docs/en/idr/11.4.0?topic=console-setting-conflict-detection-resolution

--
Thanks,
Nisha Moond




Re: Emitting JSON to file using COPY TO

2025-03-11 Thread jian he
On Sun, Mar 2, 2025 at 1:28 PM Junwang Zhao  wrote:
>
>
> I've refactored the patch to adapt the newly introduced CopyToRoutine struct,
> see 2e4127b6d2.
>
> v15-0001 is the merged one of v14-0001 and v14-0002
>
> There are some other ongoing *copy to/from* refactors[1] which we can benefit
> to make the code cleaner, especially the checks done in ProcessCopyOptions.
>
> [1]: 
> https://www.postgresql.org/message-id/20250301.115009.424844407736647598.kou%40clear-code.com
>
hi.

git apply --check $PATCHES/v15-0001-Introduce-json-format-for-COPY-TO.patch
error: patch failed: src/backend/commands/copyfrom.c:155
error: src/backend/commands/copyfrom.c: patch does not apply
error: patch failed: src/backend/commands/copyto.c:176
error: src/backend/commands/copyto.c: patch does not apply

seems to need rebase.
The attachment is the rebased patch, with minor comment tweaks and commit message tweaks.

Another issue: this patch's commitfest entry [1] has the status "Not processed",
which means no cfbot CI tests are run for it; that seems not great.
I'm not sure how to resolve this issue.

[1] https://commitfest.postgresql.org/patch/4716/
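
For reference, the intended usage of the 0002 option would be roughly the
following (table name illustrative; option names are taken from the patch below):

COPY mytab TO STDOUT WITH (FORMAT json, FORCE_ARRAY true);

Per the documentation hunk in the patch, this wraps the per-row JSON objects
in square brackets and separates them with commas.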
From 24e1858722dbb25c4842d0ec2dee5b1047edcc23 Mon Sep 17 00:00:00 2001
From: jian he 
Date: Tue, 11 Mar 2025 16:19:55 +0800
Subject: [PATCH v16 2/2] Add option force_array for COPY JSON FORMAT

The force_array option can only be used in COPY TO with JSON format.
It makes the output behave like a JSON array.

Refactored by Junwang Zhao to adapt to the newly introduced CopyToRoutine struct (2e4127b6d2).

Author: Joe Conway 
discussion: https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
discussion: https://postgr.es/m/6a04628d-0d53-41d9-9e35-5a8dc302c...@joeconway.com
---
 doc/src/sgml/ref/copy.sgml | 14 
 src/backend/commands/copy.c| 13 
 src/backend/commands/copyto.c  | 34 +-
 src/bin/psql/tab-complete.in.c |  2 +-
 src/include/commands/copy.h|  1 +
 src/test/regress/expected/copy.out | 23 
 src/test/regress/sql/copy.sql  |  8 +++
 7 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 9c519d8a9e2..7b3c913d4ee 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,6 +43,7 @@ COPY { table_name [ ( column_name [, ...] ) | * }
 FORCE_NOT_NULL { ( column_name [, ...] ) | * }
 FORCE_NULL { ( column_name [, ...] ) | * }
+FORCE_ARRAY [ boolean ]
 ON_ERROR error_action
 REJECT_LIMIT maxerror
 ENCODING 'encoding_name'
@@ -392,6 +393,19 @@ COPY { table_name [ ( 

 
+   
+FORCE_ARRAY
+
+ 
+  Force output of square brackets as array decorations at the beginning
+  and end of output, and commas between the rows. It is allowed only in
+  COPY TO, and only when using
+  JSON format. The default is
+  false.
+ 
+
+   
+

 ON_ERROR
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b6f74c798d0..7b4c64ea97e 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -504,6 +504,7 @@ ProcessCopyOptions(ParseState *pstate,
 	bool		on_error_specified = false;
 	bool		log_verbosity_specified = false;
 	bool		reject_limit_specified = false;
+	bool		force_array_specified = false;
 	ListCell   *option;
 
 	/* Support external use for option sanity checking */
@@ -658,6 +659,13 @@ ProcessCopyOptions(ParseState *pstate,
 defel->defname),
 		 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "force_array") == 0)
+		{
+			if (force_array_specified)
+errorConflictingDefElem(defel, pstate);
+			force_array_specified = true;
+			opts_out->force_array = defGetBoolean(defel);
+		}
 		else if (strcmp(defel->defname, "on_error") == 0)
 		{
 			if (on_error_specified)
@@ -906,6 +914,11 @@ ProcessCopyOptions(ParseState *pstate,
 errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 errmsg("COPY json mode cannot be used with %s", "COPY FROM"));
 
+	if (opts_out->format != COPY_FORMAT_JSON && opts_out->force_array)
+		ereport(ERROR,
+errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("COPY %s can only used with JSON mode", "FORCE_ARRAY"));
+
 	if (opts_out->default_print)
 	{
 		if (!is_from)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c1d4cbeedea..393c0440ad7 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -84,6 +84,7 @@ typedef struct CopyToStateData
 	List	   *attnumlist;		/* integer list of attnums to copy */
 	char	   *filename;		/* filename, or NULL for STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
+	bool		json_row_delim_needed;	/* need delimiter to start next json array element */
 	copy_data_dest_cb data_dest_cb; /* function for writing data */
 
 	CopyFormatOptions opts;
@@ -128,6 +129,7 @@ static void CopyToTextLikeOneRow(C

Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

2025-03-11 Thread Naga Appani
On Mon, Mar 10, 2025 at 10:43 AM Naga Appani  wrote:

> Hi,
>
> I would like to propose exposing an internal PostgreSQL function called
> ReadMultiXactCounts() to allow for efficient monitoring of MultiXact
> member usage. This function provides an accurate, real-time view of
> MultiXact activity by directly retrieving the actual member count, rather
> than relying on storage-based calculations.
>
> *Current Challenges: *The existing approach we are currently using to
> estimate MultiXact member usage has several drawbacks:
>
>- *Filesystem scanning overhead: *These functions recursively scan the
>pg_multixact directory, iterating over potentially thousands or
>millions of files, and retrieving file sizes using stat() calls, which
>introduces significant I/O overhead.
>- *Potential performance bottleneck:* On systems with high transaction
>throughput generating large numbers of MultiXact members, the
>filesystem-based approach scales poorly due to the latency of stat() calls,
>especially on network-based filesystems like RDS/Aurora.
>- *Not a real-time or memory-efficient solution:* The current approach
>does not provide a direct, in-memory view of MultiXact activity.
>
> *Proposed Solution*The internal ReadMultiXactCounts() function,
> implemented in multixact.c, directly calculates the number of MultiXact
> members by reading live state from shared memory. This approach avoids the
> performance issues of the current filesystem-based estimation methods.
> 
> ...
>


My apologies for re-posting. This is my first time writing to the hackers
list, and I accidentally used HTML formatting. Below is the original
request in plain text:

**
I would like to propose exposing an internal PostgreSQL function called
ReadMultiXactCounts()[1] to allow for efficient monitoring of MultiXact
member usage. This function provides an accurate, real-time view of
MultiXact activity by directly retrieving the actual member count, rather
than relying on storage-based calculations.


Current Challenges

The existing approach we are currently using to estimate MultiXact member
usage has several drawbacks:
- Filesystem scanning overhead: These functions recursively scan the
pg_multixact directory, iterating over potentially thousands or millions of
files, and retrieving file sizes using stat() calls, which introduces
significant I/O overhead.
- Potential performance bottleneck: On systems with high transaction
throughput generating large numbers of MultiXact members, the
filesystem-based approach scales poorly due to the latency of stat() calls,
especially on network-based filesystems like RDS/Aurora.
- Not a real-time or memory-efficient solution: The current approach does
not provide a direct, in-memory view of MultiXact activity.

=
Proposal
=
The internal ReadMultiXactCounts() function, implemented in multixact.c,
directly calculates the number of MultiXact members by reading live state
from shared memory. This approach avoids the performance issues of the
current filesystem-based estimation methods.

By exposing ReadMultiXactCounts() for external use, we can provide
PostgreSQL users with an efficient way to monitor MultiXact member usage.
This could be particularly useful for integrating with tools like Amazon
RDS Performance Insights and Amazon CloudWatch to provide enhanced database
insights and proactive managed monitoring for users.
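
As a sketch only (the SQL-facing name is hypothetical, and exporting the
currently-static ReadMultiXactCounts() is itself part of this proposal), the
wrapper could look like this; the pg_proc.dat wiring is omitted:

Datum
pg_multixact_member_count(PG_FUNCTION_ARGS)
{
	uint32		multixacts;
	MultiXactOffset members;

	/* read the live counters from shared memory; no filesystem access */
	if (!ReadMultiXactCounts(&multixacts, &members))
		PG_RETURN_NULL();

	PG_RETURN_INT64((int64) members);
}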

=
Performance comparison
=
The performance comparison between the current and proposed approaches
shows a significant improvement, with the proposed solution taking only a
fraction of a millisecond to retrieve the MultiXact member count, compared
to tens or hundreds of milliseconds for the current filesystem-based
approach.  And as more members are generated, the gap widens.

The following compares the performance of calculating the storage used by the
MultiXact members directory versus retrieving the member count directly.

Implementation                                      | Used size | MultiXact members
----------------------------------------------------+-----------+------------------
EC2 community (RDS version of pg_ls_multixactdir)   | 8642 MB   | 1.8 billion
Linux du command                                    | 8642 MB   | 1.8 billion
Proposal (ReadMultiXactCounts)                      | 8642 MB   | 1.8 billion


Sample runs

Using "du -h"

postgres=# \! time du -h /rdsdbdata/db/17.4/data/pg_multixact/members
13G /rdsdbdata/db/17.4/data/pg_multixact/members

Re: Statistics Import and Export: difference in statistics dumped

2025-03-11 Thread Ashutosh Bapat
On Tue, Mar 11, 2025 at 5:23 AM Jeff Davis  wrote:
>
> On Mon, 2025-03-10 at 17:53 -0400, Tom Lane wrote:
> > I wrote:
> > > I think what is happening is that the patch shut off CREATE
> > > INDEX's update of not only the table's stats but also the
> > > index's stats.  This seems unhelpful: the index's empty
> > > stats can never be what's wanted.
> >
> > I looked at this more closely and realized that it's a simple matter
> > of having made the tests in the wrong order.  The whole stanza
> > should only apply when dealing with the table, not the index.
> >
> > I verified that this change fixes the cross-version-upgrade
> > failure in local testing, and pushed it.
>
> Ah, thank you.
>
> Regards,
> Jeff Davis
>

Thanks. I verified that it has been fixed now. But there's something
wrong with materialized view statistics. I am starting a new thread
for the same.

-- 
Best Wishes,
Ashutosh Bapat




Re: bogus error message for ALTER TABLE ALTER CONSTRAINT

2025-03-11 Thread Álvaro Herrera
On 2025-Mar-11, jian he wrote:

> this look a little strange?
> if (cas_bits & (CAS_NOT_DEFERRABLE) && seen)
> seen->seen_deferrability = true;
> 
> it should be
> if ((cas_bits & CAS_NOT_DEFERRABLE) && seen)
> seen->seen_deferrability = true;

True.  And since you mentioned CAS_INITIALLY_IMMEDIATE, it should really
be

/* These are the default values; just report that we saw them */
if ((cas_bits & (CAS_NOT_DEFERRABLE | CAS_INITIALLY_IMMEDIATE)) && seen)
seen->seen_deferrability = true;


> typedef struct CAS_flags need add to typedefs.list

True.

> seems didn't cover "initially immediate" case for domain.
> for example:
> create domain d_int as int4;
> --- the following two cases should fail.
> alter domain d_int add constraint nn1 not null initially immediate;
> alter domain d_int add constraint cc check(value > 1) initially immediate;

Yeah, I thought at first you were right, but on further thought, do we
really want to do this?  I mean, INITIALLY IMMEDIATE is the default
timing for a constraint, so why should we complain if a user wants to
get a constraint that's declared that way?  I'm not sure that we should
do it.  Same with NOT DEFERRABLE.
[... looks at the standard doc ...]
And that's indeed what the SQL standard says:

<domain definition> ::=
  CREATE DOMAIN <domain name> [ AS ] <predefined type>
  [ <default clause> ]
  [ <domain constraint> ... ]
  [ <collate clause> ]

<domain constraint> ::=
  [ <constraint name definition> ] <check constraint definition> [ <constraint characteristics> ]

8) For every <domain constraint> specified:
   [...]
b) If <constraint characteristics> is not specified, then INITIALLY IMMEDIATE
   NOT DEFERRABLE is implicit.


So I think the fix here needs to distinguish those cases and avoid
throwing errors for them.
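
In other words, under that reading, something like the following should be
accepted rather than rejected, since it only spells out the defaults:

CREATE DOMAIN d_int AS int4;
ALTER DOMAIN d_int ADD CONSTRAINT cc CHECK (VALUE > 1)
    INITIALLY IMMEDIATE NOT DEFERRABLE;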

(Note also it does not say that INITIALLY DEFERRED or DEFERRABLE are
disallowed, which means that we're failing to fully implement the
standard-mandated behavior by prohibiting those.)


> create domain d_int as int4;
> alter domain d_int add not null no inherit not valid;
> ERROR:  not-null constraints on domains cannot be marked NOT VALID
> LINE 1: alter domain d_int add not null no inherit not valid;
> ^
> If we can report an error like
> "ERROR:  NOT NULL constraints on domains cannot be marked INHERIT / NOT 
> INHERIT"
> That would be great.
> just report the first constraint property that is not ok, but seems not 
> doable.

Yeah, I don't think this can be made to work.  Maybe we'd have to change
the way ConstraintAttributeSpec is parsed completely.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Once again, thank you and all of the developers for your hard work on
PostgreSQL.  This is by far the most pleasant management experience of
any database I've worked on." (Dan Harris)
http://archives.postgresql.org/pgsql-performance/2006-04/msg00247.php




Re: explain analyze rows=%.0f

2025-03-11 Thread Ilia Evdokimov

Hi,

In the stats_ext regression test, there is a function 
check_estimated_rows that returns actual rows as an integer. Currently, 
none of the test cases produce a non-zero fractional part in actual rows.
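
For context, the helper is roughly of this shape (paraphrased from memory;
see src/test/regress/sql/stats_ext.sql for the authoritative definition), so
both values come back as integers today:

CREATE FUNCTION check_estimated_rows(text)
RETURNS TABLE (estimated int, actual int)
LANGUAGE plpgsql AS $$
DECLARE
    ln  text;
    tmp text[];
BEGIN
    FOR ln IN EXECUTE format('EXPLAIN ANALYZE %s', $1)
    LOOP
        tmp := regexp_match(ln, 'rows=(\d*) .* rows=(\d*)');
        IF tmp IS NOT NULL THEN
            RETURN QUERY SELECT tmp[1]::int, tmp[2]::int;
            EXIT;
        END IF;
    END LOOP;
END;
$$;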


The question is: should this function be modified to return a fractional 
number instead of an integer?


Personally, I don’t see much value in doing so, because the purpose of 
the test is to compare the estimated row count with the actual number of 
rows. It is also unlikely that there will be test cases where loops > 1, 
and the presence of a fractional part does not change the essence of the 
test.


What do you think?

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.





Re: bogus error message for ALTER TABLE ALTER CONSTRAINT

2025-03-11 Thread Amul Sul
On Tue, Mar 11, 2025 at 1:56 PM Álvaro Herrera  wrote:
>
> On 2025-Mar-11, Amul Sul wrote:
>
> > On Mon, Mar 10, 2025 at 11:29 PM Álvaro Herrera  
> > wrote:
> >
> > > I fleshed this out more fully and I think 0001 is good enough to commit.
> >
> > The approach looks good to me, but instead of adding a CAS_flags struct, 
> > could
> > we use macros like SEEN_DEFERRABILITY(bits), SEEN_ENFORCEABILITY(bits),
> > etc.? We can simply pass cas_bits to these macros, and to avoid the error
> > from processCASbits(), we can pass NULL for constrType.
>
> Ah yeah, I thought of this too at first, but didn't actually code it
> because I thought it'd be messier.  Trying to do it naively doesn't
> work, because it's not enough to test whether each bit is true or false
> -- what you need to know is whether an option was specified for each
> bit, in either direction.  So we'd need a separate bitmask, we can't
> pass the existing 'bits' mask.  And at that point, it's not any better
> to have a bitmask, and a stack-allocated struct of booleans is just
> easier to write.
>

I was thinking of something like the attached, which includes your
test cases from 0001. Perhaps the macro name could be improved.

Regards,
Amul


v2-0001-trial.patch
Description: Binary data


Re: Reducing the log spam

2025-03-11 Thread Jim Jones


On 11.03.25 10:46, Laurenz Albe wrote:
> Thanks for the thorough test!
>
>> There are a few issues though ...
>>
>> 1) Invalid codes aren't rejected. Is there any way to validate them?
>>
>> postgres=# SET log_suppress_errcodes TO '0foo1'; SHOW log_suppress_errcodes;
>> SET
>>  log_suppress_errcodes
>> ---
>>  0foo1
>> (1 row)
> That is intentional.  I only test that the SQLSTATE is 5 characters long
> and contains only ASCII letters and numbers.  I looked at the SQL standard,
> and while it defines the meaning of certain SQLSTATEs, it allows for
> custom defined states.
>
> Nothing bad can happen from setting an unused SQLSTATE; it just won't
> suppress anything.


I suspected that. Just for the record, v5 now also shows "invalid"
SQLSTATEs in uppercase:

postgres=# SET log_suppress_errcodes TO 'x1y2z'; SHOW log_suppress_errcodes;
SET
 log_suppress_errcodes
---
 X1Y2Z
(1 row)


>> 2) No duplication check (not critical IMO)
>>
>> postgres=# SET log_suppress_errcodes TO '3F000,3F000'; SHOW
>> log_suppress_errcodes;
>> SET
>>  log_suppress_errcodes
>> ---
>>  3F000,3F000
>> (1 row)
>>
>>
>> 3) errcodes are not trimmed (also not critical..  just doesn't look nice)
>>
>> postgres=# SET log_suppress_errcodes TO '   3F000, 42P01';
>> SHOW log_suppress_errcodes;
>> SET
>>     log_suppress_errcodes    
>> -
>>     3F000, 42P01
>> (1 row)
> These two are mostly cosmetic issues.
>
> I originally thought it best to leave the parameter the way the user
> entered it, but in the attached version I revised that by reassembling
> the parameter string from the parsed SQLSTATEs, so that spaces and
> duplicates will vanish and everything is converted to upper case.
>
> So these complaints should be addressed.


Duplicate entries and whitespace are now removed:

postgres=# SET log_suppress_errcodes TO '   3F000, 3F000, 42P01';
SHOW log_suppress_errcodes;
SET
 log_suppress_errcodes
---
 3F000,42P01
(1 row)


>
>> 4) SHOW log_suppress_errcodes displays an invalid value if we set it
>> twice to an empty string
>>
>> $ /usr/local/postgres-dev/bin/psql postgres
>> psql (18devel)
>> Type "help" for help.
>>
>> postgres=# SET log_suppress_errcodes TO ''; SHOW log_suppress_errcodes;
>> SET log_suppress_errcodes TO ''; SHOW log_suppress_errcodes;
>> SET
>>  log_suppress_errcodes
>> ---
>>  
>> (1 row)
>>
>> SET
>>  log_suppress_errcodes
>> ---
>>  wV
>> (1 row)
>>
>>
>> 5) The system crashes if we set log_suppress_errcodesto an empty string
>> and then set it back to comma separated values
> These two points were actually caused by a memory management bug that I
> had inadvertently introduced in the assign hook.  They should be fixed now.
>

An empty log_suppress_errcodes now behaves as expected:

$ /usr/local/postgres-dev/bin/psql postgres
psql (18devel)
Type "help" for help.

postgres=# SET log_suppress_errcodes TO ''; SHOW log_suppress_errcodes;
SET log_suppress_errcodes TO ''; SHOW log_suppress_errcodes;
SET
 log_suppress_errcodes
---
 
(1 row)

SET
 log_suppress_errcodes
---
 
(1 row)


The system no longer crashes when setting the parameter to an empty
string and then setting it back to comma-separated values:

$ /usr/local/postgres-dev/bin/psql postgres
psql (18devel)
Type "help" for help.

postgres=# SET log_suppress_errcodes TO '3F000,42883'; SHOW
log_suppress_errcodes;
SET log_suppress_errcodes TO ''; SHOW log_suppress_errcodes;
SET log_suppress_errcodes TO '42P01,23505'; SHOW log_suppress_errcodes;
SET log_suppress_errcodes TO '3D000,42704'; SHOW log_suppress_errcodes;
SET
 log_suppress_errcodes
---
 3F000,42883
(1 row)

SET
 log_suppress_errcodes
---
 
(1 row)

SET
 log_suppress_errcodes
---
 42P01,23505
(1 row)

SET
 log_suppress_errcodes
---
 3D000,42704
(1 row)


The previous tests work as expected and all issues were addressed.

If the other reviewers have no objections, I'll mark this patch as
"Ready for Committer".

Best regards, Jim







RE: Selectively invalidate caches in pgoutput module

2025-03-11 Thread Hayato Kuroda (Fujitsu)
Dear Amit, Hou,

Thanks for giving comments!

> For patch 0002, I think the implementation could be improved. The
> current patch introduces a new function, RenamePublication, to replace the
> existing generic approach in ExecRenameStmt->AlterObjectRename_internal.
> However, this creates inconsistency because the original code uses
> AccessExclusiveLock for publication renaming, making concurrent renaming
> impossible. The current patch seems to overlook this aspect.

Oh, I missed the point, thanks.

> Additionally, introducing a new function results in code duplication, which 
> can
> be avoided. After further consideration, handling the publication rename
> separately seems unnecessary, given it requires only sending a few extra
> invalidation messages. Therefore, I suggest reusing the generic handling and
> simply extending AlterObjectRename_internal to include the additional
> invalidation messages.
> 
> I've attached a diff with the proposed changes for 0002.

Hmm, possible. Previously I introduced a new function to preserve the existing
code as much as possible, but this introduced some duplication. Your approach,
which extends the common function, looks nice, apart from two points:

1. Relsync cache invalidations should be done after the catalog update, but
   the proposed one did it before that. I would prefer to add a new
   if-statement for the post-catalog-update case.
2. Also, the common check for the name conflict can be reused.


The attached patch addresses comments from you and Amit [1]. What's new:

* Skip adding new relsync messages when a message for InvalidOid already
  exists, per Amit's comment [1].
* Update some code comments; some of them were pointed out in [1].
* Stop using RenamePublication() and extend the common function instead.

Best regards,
Hayato Kuroda
FUJITSU LIMITED



v10-0001-Introduce-a-new-invalidation-message-to-invalida.patch
Description:  v10-0001-Introduce-a-new-invalidation-message-to-invalida.patch


v10-0002-Invalidate-Relcaches-while-ALTER-PUBLICATION-REN.patch
Description:  v10-0002-Invalidate-Relcaches-while-ALTER-PUBLICATION-REN.patch


Re: Add “FOR UPDATE NOWAIT” lock details to the log.

2025-03-11 Thread Fujii Masao



On 2025/03/11 16:50, Yuki Seino wrote:


Instead, wouldn't it be simpler to update LockAcquireExtended() to
take a new argument, like logLockFailure, to control whether
a lock failure should be logged directly? I’ve adjusted the patch
accordingly and attached it. Please let me know what you think!

Regards,

Thank you!

It's very simple and nice.
It seems like it can also handle other lock failure cases by extending 
logLockFailure.
> I agree with this proposal.


Thanks for reviewing the patch!

I've made some minor cosmetic adjustments. The updated patch is attached.

Unless there are any objections, I'll proceed with committing it.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
From b6ad757e99456d379c1a1ec90cac73cd034d8926 Mon Sep 17 00:00:00 2001
From: Fujii Masao 
Date: Tue, 11 Mar 2025 19:59:09 +0900
Subject: [PATCH v10] Add GUC option to log lock acquisition failures.

This commit introduces a new GUC, log_lock_failure, which controls whether
a detailed log message is produced when a lock acquisition fails. Currently,
it only supports logging lock failures caused by SELECT ... NOWAIT.

The log message includes information about all processes holding or
waiting for the lock that couldn't be acquired, helping users analyze and
diagnose the causes of lock failures.

This mechanism can be extended in the future to support logging
lock failures from other commands, such as LOCK TABLE ... NOWAIT.

Author: Yuki Seino 
Co-authored-by: Fujii Masao 
Reviewed-by: Jelte Fennema-Nio 
Discussion: https://postgr.es/m/411280a186cc26ef7034e0f2dfe54...@oss.nttdata.com
---
 doc/src/sgml/config.sgml  |  20 +++
 src/backend/access/heap/heapam.c  |  33 +++--
 src/backend/access/heap/heapam_handler.c  |   4 +-
 src/backend/storage/lmgr/lmgr.c   |  33 +++--
 src/backend/storage/lmgr/lock.c   |  55 +++-
 src/backend/storage/lmgr/proc.c   | 129 +++---
 src/backend/utils/misc/guc_tables.c   |   9 ++
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/storage/lmgr.h|   5 +-
 src/include/storage/lock.h|   4 +-
 src/include/storage/proc.h|   4 +
 11 files changed, 211 insertions(+), 86 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d2fa5f7d1a9..3e85bf76dd5 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7678,6 +7678,26 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
   
  
 
+ 
+  log_lock_failure (boolean)
+  
+   log_lock_failure configuration 
parameter
+  
+  
+  
+   
+Controls whether a detailed log message is produced
+when a lock acquisition fails.  This is useful for analyzing
+the causes of lock failures.  Currently, only lock failures
+due to SELECT NOWAIT are supported.
+The default is off.  Only superusers and
+users with the appropriate SET privilege
+can change this setting.
+   
+  
+ 
+
+
  
   log_recovery_conflict_waits 
(boolean)
   
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fa7935a0ed3..21575a8ffef 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -98,7 +98,8 @@ static void MultiXactIdWait(MultiXactId multi, 
MultiXactStatus status, uint16 in
Relation rel, 
ItemPointer ctid, XLTW_Oper oper,
int *remaining);
 static bool ConditionalMultiXactIdWait(MultiXactId multi, MultiXactStatus 
status,
-  
uint16 infomask, Relation rel, int *remaining);
+  
uint16 infomask, Relation rel, int *remaining,
+  bool 
logLockFailure);
 static void index_delete_sort(TM_IndexDeleteOp *delstate);
 static int bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate);
 static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
@@ -162,8 +163,8 @@ static const struct
LockTuple((rel), (tup), tupleLockExtraInfo[mode].hwlock)
 #define UnlockTupleTuplock(rel, tup, mode) \
UnlockTuple((rel), (tup), tupleLockExtraInfo[mode].hwlock)
-#define ConditionalLockTupleTuplock(rel, tup, mode) \
-   ConditionalLockTuple((rel), (tup), tupleLockExtraInfo[mode].hwlock)
+#define ConditionalLockTupleTuplock(rel, tup, mode, log) \
+   ConditionalLockTuple((rel), (tup), tupleLockExtraInfo[mode].hwlock, 
(log))
 
 #ifdef USE_PREFETCH
 /*
@@ -4894,7 +4895,7 @@ l3:
case LockWaitSkip:
if 
(!ConditionalMultiXactIdWa

Re: Improve CRC32C performance on SSE4.2

2025-03-11 Thread John Naylor
On Tue, Mar 4, 2025 at 2:11 AM Nathan Bossart  wrote:
>
> Overall, I wish we could avoid splitting things into separate files and
> adding more header file gymnastics, but maybe there isn't much better we
> can do without overhauling the CPU feature detection code.

I wanted to make an attempt to make this aspect nicer. v13-0002
incorporates deliberately compact and simple loops for inlined
constant input into the dispatch function, and leaves the existing
code alone. This avoids code churn and saves vertical space in the
copied code. It needs a bit more commentary, but I hope this is a more
digestible prerequisite to the CLMUL algorithm -- as a reminder, it'll
be simpler if we can always assume non-constant input can go through a
function pointer.

I've re-attached the modified perf test from v12 just in case anyone
wants to play with it (v13-0003), but named so that the CF bot can't
find it, since it breaks the tests in the original perf test (It's not
for commit anyway).

Adding back AVX-512 should be fairly mechanical, since Raghuveer and
Nathan have already done the work needed for that.

-- 
John Naylor
Amazon Web Services
From 298cbb26558807fe9ce87f3709d6bfb785668d5d Mon Sep 17 00:00:00 2001
From: Paul Amonson 
Date: Mon, 6 May 2024 08:34:17 -0700
Subject: [PATCH v13 1/3] Add a Postgres SQL function for crc32c benchmarking

Add a drive_crc32c() function to use for benchmarking crc32c
computation. The function takes 2 arguments:

(1) count: num of times CRC32C is computed in a loop.
(2) num: #bytes in the buffer to calculate crc over.

XXX not for commit

Extracted from a patch by  Raghuveer Devulapalli
---
 contrib/meson.build  |  1 +
 contrib/test_crc32c/Makefile | 20 +++
 contrib/test_crc32c/expected/test_crc32c.out | 57 
 contrib/test_crc32c/meson.build  | 34 
 contrib/test_crc32c/sql/test_crc32c.sql  |  3 ++
 contrib/test_crc32c/test_crc32c--1.0.sql |  1 +
 contrib/test_crc32c/test_crc32c.c| 47 
 contrib/test_crc32c/test_crc32c.control  |  4 ++
 8 files changed, 167 insertions(+)
 create mode 100644 contrib/test_crc32c/Makefile
 create mode 100644 contrib/test_crc32c/expected/test_crc32c.out
 create mode 100644 contrib/test_crc32c/meson.build
 create mode 100644 contrib/test_crc32c/sql/test_crc32c.sql
 create mode 100644 contrib/test_crc32c/test_crc32c--1.0.sql
 create mode 100644 contrib/test_crc32c/test_crc32c.c
 create mode 100644 contrib/test_crc32c/test_crc32c.control

diff --git a/contrib/meson.build b/contrib/meson.build
index 1ba73ebd67a..06673db0625 100644
--- a/contrib/meson.build
+++ b/contrib/meson.build
@@ -12,6 +12,7 @@ contrib_doc_args = {
   'install_dir': contrib_doc_dir,
 }
 
+subdir('test_crc32c')
 subdir('amcheck')
 subdir('auth_delay')
 subdir('auto_explain')
diff --git a/contrib/test_crc32c/Makefile b/contrib/test_crc32c/Makefile
new file mode 100644
index 000..5b747c6184a
--- /dev/null
+++ b/contrib/test_crc32c/Makefile
@@ -0,0 +1,20 @@
+MODULE_big = test_crc32c
+OBJS = test_crc32c.o
+PGFILEDESC = "test"
+EXTENSION = test_crc32c
+DATA = test_crc32c--1.0.sql
+
+first: all
+
+# test_crc32c.o:	CFLAGS+=-g
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_crc32c
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/test_crc32c/expected/test_crc32c.out b/contrib/test_crc32c/expected/test_crc32c.out
new file mode 100644
index 000..dff6bb3133b
--- /dev/null
+++ b/contrib/test_crc32c/expected/test_crc32c.out
@@ -0,0 +1,57 @@
+CREATE EXTENSION test_crc32c;
+select drive_crc32c(1, i) from generate_series(100, 300, 4) i;
+ drive_crc32c 
+--
+532139994
+   2103623867
+785984197
+   2686825890
+   3213049059
+   3819630168
+   1389234603
+534072900
+   2930108140
+   2496889855
+   1475239611
+136366931
+   3067402116
+   2012717871
+   3682416023
+   2054270645
+   1817339875
+   4100939569
+   1192727539
+   3636976218
+369764421
+   3161609879
+   1067984880
+   1235066769
+   3138425899
+648132037
+   4203750233
+   1330187888
+   2683521348
+   1951644495
+   2574090107
+   3904902018
+   3772697795
+   1644686344
+   2868962106
+   3369218491
+   3902689890
+   3456411865
+141004025
+   1504497996
+   3782655204
+   3544797610
+   3429174879
+   2524728016
+   3935861181
+ 25498897
+692684159
+345705535
+   2761600287
+   2654632420
+   3945991399
+(51 rows)
+
diff --git a/contrib/test_crc32c/meson.build b/contrib/test_crc32c/meson.build
new file mode 100644
index 000..d7bec4ba1cb
--- /dev/null
+++ b/contrib/test_crc32c/meson.build
@@ -0,0 +1,34 @@
+# Copyright (c) 2022-2024, PostgreSQL Global Development Group
+
+test_crc32c_sources = files(
+  'test_crc32c.c',
+)
+
+if host_system == 'windows'
+  test_

Re: dblink: Add SCRAM pass-through authentication

2025-03-11 Thread Jacob Champion
On Fri, Mar 7, 2025 at 8:22 AM Peter Eisentraut  wrote:
> Right.  How about the attached?  It checks as an alternative to a
> password whether the SCRAM keys were provided.  That should get us back
> to the same level of checking?

Yes, I think so. Attached is a set of tests to illustrate, mirroring
the dblink tests added upthread; they fail without this patch.

I like that this solution addresses some of the concerns from my dblink review.

--

Not part of this patchset, but I think the errmsg in
pgfdw_security_check() is confusing:

ERROR: password or GSSAPI delegated credentials required
DETAIL: Non-superuser cannot connect if the server does not
request a password or...
HINT: Target server's authentication method must be changed or...

For the user to have gotten past check_conn_params, they *have*
provided a password/credentials. But the server didn't ask for it (or
at least, not the right one). The detail and hint messages are correct
here, but I'd argue the error message itself is not.

Thanks!
--Jacob
diff --git a/contrib/postgres_fdw/t/001_auth_scram.pl 
b/contrib/postgres_fdw/t/001_auth_scram.pl
index 047840cc914..60d46ebc665 100644
--- a/contrib/postgres_fdw/t/001_auth_scram.pl
+++ b/contrib/postgres_fdw/t/001_auth_scram.pl
@@ -68,6 +68,45 @@ test_fdw_auth($node1, $db0, "t2", $fdw_server2,
 test_auth($node2, $db2, "t2",
"SCRAM auth directly on foreign server should still succeed");
 
+# Ensure that trust connections fail without superuser opt-in.
+unlink($node1->data_dir . '/pg_hba.conf');
+unlink($node2->data_dir . '/pg_hba.conf');
+
+$node1->append_conf(
+   'pg_hba.conf', qq{
+local   db0 all scram-sha-256
+local   db1 all trust
+});
+$node2->append_conf(
+   'pg_hba.conf', qq{
+local   db2 all trust
+});
+
+$node1->restart;
+$node2->restart;
+
+my ($ret, $stdout, $stderr) = $node1->psql(
+   $db0,
+   "SELECT count(1) FROM t",
+   connstr => $node1->connstr($db0) . " user=$user");
+
+is($ret, 3, 'loopback trust fails on the same cluster');
+like(
+   $stderr,
+   qr/password or GSSAPI delegated credentials required/,
+   'expected error from loopback trust (same cluster)');
+
+($ret, $stdout, $stderr) = $node1->psql(
+   $db0,
+   "SELECT count(1) FROM t2",
+   connstr => $node1->connstr($db0) . " user=$user");
+
+is($ret, 3, 'loopback trust fails on a different cluster');
+like(
+   $stderr,
+   qr/password or GSSAPI delegated credentials required/,
+   'expected error from loopback trust (different cluster)');
+
 # Helper functions
 
 sub test_auth


Re: Improve CRC32C performance on SSE4.2

2025-03-11 Thread John Naylor
On Wed, Mar 5, 2025 at 10:52 PM Nathan Bossart  wrote:
>
> On Wed, Mar 05, 2025 at 08:51:21AM +0700, John Naylor wrote:
> > That was my hunch too, but I wanted to be more sure, so I modified the
> > benchmark so it doesn't know the address of the next calculation until
> > it finishes the last calculation so we can hopefully see the latency
> > caused by indirection. It also does an additional calculation on
> > constant 20 bytes, like the WAL header. I also tweaked the length each
> > iteration so the branch predictor maybe has a harder time predicting
> > the constant 20 input. And to make it more challenging, I removed the
> > part that inlined all small inputs, so it inlines only constant
> > inputs:
>
> Would you mind sharing this test?

The test script is the same as here, except I only ran small lengths:

https://www.postgresql.org/message-id/CANWCAZahvhE-%2BhtZiUyzPiS5e45ukx5877mD-dHr-KSX6LcdjQ%40mail.gmail.com

...but I must have forgotten to attach the slightly tweaked patch set,
which I've done now. 0002 modifies the 0001 test module and 0006
reverts inlining non-constant input from 0005, just to see if I could
find a regression from indirection, which I didn't. If we don't need
it, it'd be better to avoid inlining loops to keep from bloating the
binary.

> It sounds like you are running a
> workload with a mix of constant/inlined calls and function pointer calls to
> simulate typical usage for WAL, but I'm not 100% sure I'm understanding you
> correctly.

Exactly.

--
John Naylor
Amazon Web Services
From c5cd6e44028eaf11efc2cf4fc49c87101b49c97f Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Wed, 5 Mar 2025 08:21:54 +0700
Subject: [PATCH v12 6/6] Only inline for constant input (partial revert)

---
 src/include/port/pg_crc32c.h | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/src/include/port/pg_crc32c.h b/src/include/port/pg_crc32c.h
index 26b676dddc9..01192831ca3 100644
--- a/src/include/port/pg_crc32c.h
+++ b/src/include/port/pg_crc32c.h
@@ -66,12 +66,11 @@ static inline
 pg_crc32c
 pg_comp_crc32c_dispatch(pg_crc32c crc, const void *data, size_t len)
 {
-	if (len < 64)
+	if (__builtin_constant_p(len) && len < 64)
 	{
 		/*
-		 * For small inputs, inline the computation to avoid the runtime
-		 * check. This also allows the compiler to unroll loops for constant
-		 * input.
+		 * For small constant inputs, inline the computation. This allows the
+		 * compiler to unroll loops.
 		 */
 		return pg_comp_crc32c_sse42_inline(crc, data, len);
 	}
-- 
2.48.1

From eafea75fc761fd51fa67311af794cf0f7dec40aa Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Wed, 12 Feb 2025 15:27:16 +0700
Subject: [PATCH v12 4/6] Improve CRC32C performance on x86_64

The current SSE4.2 implementation of CRC32C relies on the native
CRC32 instruction, which operates on 8 bytes at a time. We can get a
substantial speedup on longer inputs by using carryless multiplication
on SIMD registers, processing 64 bytes per loop iteration.

The PCLMULQDQ instruction has been widely available since 2011 (almost
as old as SSE 4.2), so this commit now requires that, as well as SSE
4.2, to build pg_crc32c_sse42.c.

The MIT-licensed implementation was generated with the "generate"
program from

https://github.com/corsix/fast-crc32/

Based on: "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ
Instruction" V. Gopal, E. Ozturk, et al., 2009

Author: Raghuveer Devulapalli 
Author: John Naylor 
Discussion: https://postgr.es/m/ph8pr11mb82869ff741dfa4e9a029ff13fb...@ph8pr11mb8286.namprd11.prod.outlook.com
---
 src/include/port/pg_crc32c.h  | 30 ---
 src/port/pg_crc32c_sse42.c| 88 +++
 src/port/pg_crc32c_sse42_choose.c | 26 -
 3 files changed, 124 insertions(+), 20 deletions(-)

diff --git a/src/include/port/pg_crc32c.h b/src/include/port/pg_crc32c.h
index 5ccc79295c0..fe0e1b6b275 100644
--- a/src/include/port/pg_crc32c.h
+++ b/src/include/port/pg_crc32c.h
@@ -37,6 +37,11 @@
 
 typedef uint32 pg_crc32c;
 
+/* WIP: configure checks */
+#ifdef __x86_64__
+#define USE_PCLMUL_WITH_RUNTIME_CHECK
+#endif
+
 /* The INIT and EQ macros are the same for all implementations. */
 #define INIT_CRC32C(crc) ((crc) = 0x)
 #define EQ_CRC32C(c1, c2) ((c1) == (c2))
@@ -68,6 +73,23 @@ pg_comp_crc32c_dispatch(pg_crc32c crc, const void *data, size_t len)
 		return pg_comp_crc32c_sse42(crc, data, len);
 }
 
+#elif defined(USE_SSE42_CRC32C_WITH_RUNTIME_CHECK)
+
+/*
+ * Use Intel SSE 4.2 or PCLMUL instructions, but perform a runtime check first
+ * to check that they are available.
+ */
+#define COMP_CRC32C(crc, data, len) \
+	((crc) = pg_comp_crc32c((crc), (data), (len)))
+#define FIN_CRC32C(crc) ((crc) ^= 0x)
+
+extern pg_crc32c pg_comp_crc32c_sb8(pg_crc32c crc, const void *data, size_t len);
+extern pg_crc32c (*pg_comp_crc32c) (pg_crc32c crc, const void *data, size_t len);
+extern pg_crc32c pg_comp_crc32c_sse42(pg_crc32c crc, const void *data, size_t len);
+#i

RE: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

2025-03-11 Thread Bykov Ivan
Hello!

>> Variant B is not acceptable IMO as it adds a whole bunch of 
>> null-terminators unnecessarily. For example, in a simple "select 1", 
>> (expr == NULL) is true 19 times, so that is an extra 19 bytes.

> Variant B is not acceptable here.

Could we improve Variant B?

I was thinking about adding a count of NULL pointers encountered by the query
jumble walker since the last scalar or non-null field visited.
This way, we can traverse the query as follows:

QueryA->subNodeOne = &(Value X);
/* JumbleState->Count = 0;
   QueryA_ID = hash_any_extended(&(Value X), sizeof(Value X), QueryA_ID) */
QueryA->subNodeTwo = NULL;
/* JumbleState->Count = 1; */
QueryA->subNodeThee = NULL;
/* JumbleState->Count = 2; */
QueryA->subNodeFour = NULL;
/* JumbleState->Count = 3; */
QueryA->scalar = Value Z;
/* QueryA_ID = hash_any_extended(&(JumbleState->Count),
sizeof(JumbleState->Count), QueryA_ID)
   JumbleState->Count = 0;
   QueryA_ID = hash_any_extended(&(Value Z), sizeof(Value Z), QueryA_ID) */
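
A minimal C sketch of that idea (type and function names are illustrative,
not the actual queryjumblefuncs.c code):

typedef struct JumbleSketch
{
	uint64		queryid;		/* running query ID */
	uint32		pending_nulls;	/* NULL children since the last hashed field */
} JumbleSketch;

static void
flush_pending_nulls(JumbleSketch *js)
{
	if (js->pending_nulls > 0)
	{
		js->queryid = hash_bytes_extended((const unsigned char *) &js->pending_nulls,
										  sizeof(js->pending_nulls),
										  js->queryid);
		js->pending_nulls = 0;
	}
}

/*
 * Each NULL child only does js->pending_nulls++; every scalar or non-NULL
 * field calls flush_pending_nulls() before hashing its own bytes, exactly as
 * in the walk-through above.  This avoids hashing one byte per NULL child
 * while still folding the number of skipped NULL fields into the query ID.
 */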


Variant A improvement
-

I have attached the updated Variant A patch with the following changes:
- Implemented a pg_stat_statements regression test (see similar test results
  on REL_17_3 in Query-ID-collision-at_pg_stat_statements-1ea6e890b22.txt).
- Added extra description about the Query ID collision
  in src/include/nodes/parsenodes.h.



-- End of contrib/pg_stat_statements/expected/select.out for REL_17_3

CREATE TABLE pgss_c AS
  SELECT id FROM generate_series(1,2) AS id;
SELECT pg_stat_statements_reset() IS NOT NULL AS t;
 t 
---
 t
(1 row)

SELECT DISTINCT id FROM pgss_c;
 id 
----
  2
  1
(2 rows)

SELECT id FROM pgss_c ORDER BY id;
 id 
----
  1
  2
(2 rows)

SELECT id FROM pgss_c LIMIT 1;
 id 
----
  1
(1 row)

SELECT id FROM pgss_c OFFSET 1;
 id 
----
  2
(1 row)

SELECT query FROM pg_stat_statements
  WHERE query LIKE '%pgss_c%'
  ORDER BY query COLLATE "C";
              query              
---------------------------------
 SELECT DISTINCT id FROM pgss_c
 SELECT id FROM pgss_c LIMIT $1
(2 rows)

SELECT pg_stat_statements_reset() IS NOT NULL AS t;
 t 
---
 t
(1 row)

SELECT id FROM pgss_c ORDER BY id;
 id 
----
  1
  2
(2 rows)

SELECT DISTINCT id FROM pgss_c;
 id 
----
  2
  1
(2 rows)

SELECT id FROM pgss_c OFFSET 1;
 id 
----
  2
(1 row)

SELECT id FROM pgss_c LIMIT 1;
 id 
----
  1
(1 row)

SELECT query FROM pg_stat_statements
  WHERE query LIKE '%pgss_c%'
  ORDER BY query COLLATE "C";
               query               
-----------------------------------
 SELECT id FROM pgss_c OFFSET $1
 SELECT id FROM pgss_c ORDER BY id
(2 rows)

SELECT pg_stat_statements_reset() IS NOT NULL AS t;
 t 
---
 t
(1 row)

DROP TABLE pgss_c;


v2-0001-Query-ID-Calculation-Fix-Variant-A.patch
Description: v2-0001-Query-ID-Calculation-Fix-Variant-A.patch


support ALTER COLUMN SET EXPRESSION over virtual generated column with check constraint

2025-03-11 Thread jian he
hi.

I added the following instrumentation to RememberAllDependentForRebuilding:

while (HeapTupleIsValid(depTup = systable_getnext(scan)))
{
    if (subtype == AT_SetExpression)
        elog(INFO, "foundObject.classId: %d", foundObject.classId);
}

Then I ran the regression tests in generated_stored.sql and found that only
constraints and indexes get rebuilt while doing
ALTER TABLE ... ALTER COLUMN ... SET EXPRESSION.

We can also see that RememberAllDependentForRebuilding handles:
case RelationRelationId:
case AttrDefaultRelationId:
case ConstraintRelationId:

RememberAllDependentForRebuilding records
AlteredTableInfo->changedConstraintOids and AlteredTableInfo->changedIndexOids.
ATPostAlterTypeCleanup then constructs the steps to rebuild these
constraints over the generated column,
and if those constraints are successfully installed,
AlteredTableInfo->constraints is populated.
Then, in phase 3, ATRewriteTable does the scan or rewrite.


In summary: ATExecSetExpression and RememberAllDependentForRebuilding
do all the work needed to change the generation expression,
whether the column is virtual or stored.


We don't currently support virtual generated columns over domains with check
constraints, and we don't support indexes over virtual generated columns.
So supporting a change of the generation expression for virtual generated
columns that have check constraints does not look hard, code-wise.
From fa81ac3a82bfdb0dea52a985eb534a3866b4af7a Mon Sep 17 00:00:00 2001
From: jian he 
Date: Tue, 11 Mar 2025 12:17:00 +0800
Subject: [PATCH v1 1/1] virtual generated column set expression with check
 constraint

Currently, if a virtual generated column has check constraints over it
(indexes on virtual generated columns are not supported yet),
then we cannot change the generation expression.
For example:

CREATE TABLE gtest20 (a int, b int GENERATED ALWAYS AS (a * 2) VIRTUAL CHECK (b < 50));
INSERT INTO gtest20 (a) VALUES (10);
ALTER TABLE gtest20 ALTER COLUMN b SET EXPRESSION AS (a * 3); --error

This patch adds support for that.
The main point is that in ATExecSetExpression,
RememberAllDependentForRebuilding does all the work.

Also add a test showing that ALTER TABLE ... SET EXPRESSION on a virtual
generated column does not rewrite the table.

discussion: https://postgr.es/m/
---
 src/backend/commands/tablecmds.c  | 28 +++--
 .../regress/expected/generated_virtual.out| 39 ---
 src/test/regress/sql/generated_virtual.sql| 19 +++--
 3 files changed, 58 insertions(+), 28 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 1f870982559..079080700dc 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8464,6 +8464,7 @@ ATExecSetExpression(AlteredTableInfo *tab, Relation rel, const char *colName,
 	Expr	   *defval;
 	NewColumnValue *newval;
 	RawColumnDefault *rawEnt;
+	bool			rescan = false;
 
 	tuple = SearchSysCacheAttName(RelationGetRelid(rel), colName);
 	if (!HeapTupleIsValid(tuple))
@@ -8488,17 +8489,7 @@ ATExecSetExpression(AlteredTableInfo *tab, Relation rel, const char *colName,
  errmsg("column \"%s\" of relation \"%s\" is not a generated column",
 						colName, RelationGetRelationName(rel))));
 
-	/*
-	 * TODO: This could be done, just need to recheck any constraints
-	 * afterwards.
-	 */
-	if (attgenerated == ATTRIBUTE_GENERATED_VIRTUAL &&
-		rel->rd_att->constr && rel->rd_att->constr->num_check > 0)
-		ereport(ERROR,
-(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("ALTER TABLE / SET EXPRESSION is not supported for virtual generated columns on tables with check constraints"),
- errdetail("Column \"%s\" of relation \"%s\" is a virtual generated column.",
-						   colName, RelationGetRelationName(rel))));
+	rescan = (attgenerated == ATTRIBUTE_GENERATED_VIRTUAL);
 
 	/*
 	 * We need to prevent this because a change of expression could affect a
@@ -8538,6 +8529,21 @@ ATExecSetExpression(AlteredTableInfo *tab, Relation rel, const char *colName,
 		RememberAllDependentForRebuilding(tab, AT_SetExpression, rel, attnum, colName);
 	}
 
+	if (rescan)
+	{
+		Assert(!rewrite);
+
+		/* make sure we don't conflict with later attribute modifications */
+		CommandCounterIncrement();
+
+		/*
+		 * Find everything that depends on the column (constraints, indexes,
+		 * etc), and record enough information to let us recreate the objects
+		 * after rescan.
+		 */
+		RememberAllDependentForRebuilding(tab, AT_SetExpression, rel, attnum, colName);
+	}
+
 	/*
 	 * Drop the dependency records of the GENERATED expression, in particular
 	 * its INTERNAL dependency on the column, which would otherwise cause
diff --git a/src/test/regress/expected/generated_virtual.out b/src/test/regress/expected/generated_virtual.out
index 7ef05f45be7..8d609c7c66e 100644
--- a/src/test/regress/expected/generated_virtual.out
+++ b/src/test/regress/expected/generated_virtual.out
@@ -636,12 +636,22 @@ INSERT INTO gtest20 (a) VALUES (10);  -- ok
 INSERT INTO gtest20 (a) VALUES (30);  -- violates constraint
 ERROR:  new 

Re: Printing window function OVER clauses in EXPLAIN

2025-03-11 Thread Tom Lane
Álvaro Herrera  writes:
> On 2025-Mar-09, Tom Lane wrote:
>> David Rowley  writes:
>>> What are your thoughts on being a bit more brief with the naming and
>>> just prefix with "w" instead of "window"?

>> OK by me, any objections elsewhere?

> WFM.

Here's a hopefully-final v3 that makes the two changes discussed.
Now with a draft commit message, too.

I looked for documentation examples that needed updates, but there
don't seem to be any.  Our documentation of EXPLAIN output is
mighty thin anyway.  I don't want to try to improve that situation
as part of this patch.

regards, tom lane

From 4f30b4302c8398616d4d3d44f710ed437c59f6b8 Mon Sep 17 00:00:00 2001
From: Tom Lane 
Date: Mon, 10 Mar 2025 12:09:43 -0400
Subject: [PATCH v3] Improve EXPLAIN's display of window functions.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Up to now we just punted on showing the window definitions used
in a plan, with window function calls represented as "OVER (?)".
To improve that, show the window definition implemented by each
WindowAgg plan node, and reference their window names in OVER.
For nameless window clauses generated by "OVER (...)", assign
unique names w1, w2, etc.

In passing, re-order the properties shown for a WindowAgg node
so that the Run Condition (if any) appears after the Window
property and before the Filter (if any).  This seems more
sensible since the Run Condition is associated with the Window
and acts before the Filter.

Thanks to David G. Johnston and Álvaro Herrera for design
suggestions.

Author: Tom Lane 
Reviewed-by: David Rowley 
Discussion: https://postgr.es/m/144530.1741469...@sss.pgh.pa.us
---
 .../postgres_fdw/expected/postgres_fdw.out|  25 +-
 src/backend/commands/explain.c| 117 ++-
 src/backend/optimizer/plan/createplan.c   |  39 +--
 src/backend/optimizer/plan/planner.c  |  51 +++
 src/backend/utils/adt/ruleutils.c | 150 ++---
 src/include/nodes/plannodes.h |   3 +
 src/include/utils/ruleutils.h |   5 +
 src/test/regress/expected/box.out |  14 +-
 .../regress/expected/create_index_spgist.out  |  42 ++-
 src/test/regress/expected/explain.out |  22 +-
 .../regress/expected/generated_virtual.out|   3 +-
 src/test/regress/expected/groupingsets.out|  12 +-
 src/test/regress/expected/partition_prune.out |   4 +-
 src/test/regress/expected/polygon.out |   3 +-
 src/test/regress/expected/select_parallel.out |   7 +-
 src/test/regress/expected/sqljson.out |  18 +-
 src/test/regress/expected/window.out  | 308 ++
 src/test/regress/sql/explain.sql  |   4 +
 18 files changed, 574 insertions(+), 253 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index c68119030ab..bb4ed3059c4 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -3968,10 +3968,11 @@ select c2, sum(c2), count(c2) over (partition by c2%2) from ft2 where c2 < 10 gr
  QUERY PLAN 
 
  Sort
-   Output: c2, (sum(c2)), (count(c2) OVER (?)), ((c2 % 2))
+   Output: c2, (sum(c2)), (count(c2) OVER w1), ((c2 % 2))
Sort Key: ft2.c2
->  WindowAgg
- Output: c2, (sum(c2)), count(c2) OVER (?), ((c2 % 2))
+ Output: c2, (sum(c2)), count(c2) OVER w1, ((c2 % 2))
+ Window: w1 AS (PARTITION BY ((ft2.c2 % 2)))
  ->  Sort
Output: c2, ((c2 % 2)), (sum(c2))
Sort Key: ((ft2.c2 % 2))
@@ -3979,7 +3980,7 @@ select c2, sum(c2), count(c2) over (partition by c2%2) from ft2 where c2 < 10 gr
  Output: c2, ((c2 % 2)), (sum(c2))
  Relations: Aggregate on (public.ft2)
  Remote SQL: SELECT c2, (c2 % 2), sum(c2) FROM "S 1"."T 1" WHERE ((c2 < 10)) GROUP BY 1
-(12 rows)
+(13 rows)
 
 select c2, sum(c2), count(c2) over (partition by c2%2) from ft2 where c2 < 10 group by c2 order by 1;
  c2 | sum | count 
@@ -4001,10 +4002,11 @@ select c2, array_agg(c2) over (partition by c2%2 order by c2 desc) from ft1 wher
 QUERY PLAN 
 ---
  Sort
-   Output: c2, (array_agg(c2) OVER (?)), ((c2 % 2))
+   Output: c2, (array_agg(c2) OVER w1), ((c2 % 2))
Sort Key: ft1.c2
->  WindowAgg
- Output: c2, array_agg(c2) OVER (?), ((c2 % 2))
+ Output: c2, array_agg(c2) OVER w1, ((c2 % 2))
+ Window: w1 AS (PARTITION BY ((ft1.c2 % 2)) ORDER BY ft1.c2)
  ->  Sort
O

Re: table_tuple_lock's snapshot argument

2025-03-11 Thread Ranier Vilela
On Mon, Mar 10, 2025 at 12:22, Heikki Linnakangas
wrote:

> On 21/01/2025 12:05, Alexander Korotkov wrote:
> > On Tue, Jan 21, 2025 at 12:03 PM Alexander Korotkov
> >  wrote:
> >> On Sun, Dec 29, 2024 at 3:59 PM Heikki Linnakangas 
> wrote:
> >>> However, I think GetLatestSnapshot() is wrong here anyway. None of this
> >>> matters for heapam which just ignores the 'snapshot' argument, but
> let's
> >>> assume a different AM that would use the snapshot to identify the tuple
> >>> version. The TID was fetched from an index using an index scan with
> >>> SnapshotDirty. There's no guarantee that the LatestSnapshot would match
> >>> the same tuple version that the index scan found. If an MVCC snapshot
> is
> >>> needed, surely it should be acquired before the index scan, and used
> for
> >>> the index scan as well.
> >>>
> >>> I see three options:
> >>>
> >>> 1. Remove the 'snapshot' argument, since it's not used by heapam. If we
> >>> get a table AM where a single TID represents multiple row versions,
> this
> >>> will need to be revisited.
> >>>
> >>> 2. Rewrite the recheck code in execReplication.c so that it uses the
> >>> snapshot in a more consistent fashion. Call GetLatestSnapshot() first,
> >>> and use the same snapshot in the index scan and table_tuple_lock().
> >>> Acquiring a snapshot isn't free though, so it would be nice to avoid
> >>> doing that when the heapam is just going to ignore it anyway. If we go
> >>> with this option, I think we could reuse the snapshot that is already
> >>> active in most cases, and only take a new snapshot if the tuple was
> >>> concurrently updated.
> >>>
> >>> 3. Modify the tableam interface so that the index scan can return a
> more
> >>> unique identifier of the tuple version. In heapam, it could be the TID
> >>> like today, but a different AM could return some other token. Continue
> >>> to use SnapshotDirty in the index scan, but in the call to
> >>> table_tuple_lock(), instead of passing GetLatestSnapshot() and TID,
> pass
> >>> the token you got index_getnext_slot().
> >>
> >> Thank you for your finding and analysis.  I would vote for #3.
> >> Currently snapshot argument is not properly used, and the heapam code
> >> doesn't check it.  I think allowing flexible tuple identifier is
> >> essential for future of table AM API.  Therefore, any
> >> snapshot/visibility information could be put inside that tuple
> >> identifier if needed.
> >
> > To be more specific, I vote for #1 + #3. Remove the snapshot argument
> > for now, then work on flexible tuple identifier.
>
> I committed and backpatched the trivial fix for not, just replacing
> GetLatestSnapshot() with GetActiveSnapshot(). I got cold feet on just
> removing the argument without providing a replacement, because it's not
> 100% clear to me if the current situation is broken or not.
>
> To summarize the issue, RelationFindReplTupleByIndex() performs an index
> scan with DirtySnapshot, and then calls table_tuple_lock() with a fresh
> MVCC snapshot to lock the row. There's no guarantee that the index scan
> found the same row version that table_tuple_lock() will lock, if the TID
> alone doesn't uniquely identify it. But that's still OK as long as the
> key column hasn't changed, like with heapam's HOT updates. I couldn't
> convince myself that it's broken nor that it's guaranteed to be correct
> with other AMs.
>
Just out of curiosity:
doesn't *RelationFindReplTupleSeq* in the same source file suffer
from the same issue?

PushActiveSnapshot(GetLatestSnapshot());

res = table_tuple_lock(rel, &(outslot->tts_tid), GetLatestSnapshot(),

best regards,
Ranier Vilela


Re: making EXPLAIN extensible

2025-03-11 Thread Tom Lane
Robert Haas  writes:
> If there is, maybe we should discard the integer IDs and just use
> strings directly. I could make GetExplainExtensionState() and
> SetExplainExtensionState() take const char *extension_name arguments
> instead of int extension_id, and just get rid of GetExplainExtensionId
> entirely. I chose the current design because I thought that the IDs
> did not need to be shared and I thought it would be a bit extravagant
> to have an EXPLAIN extension need to do a hash table lookup for every
> plan node, but maybe it wouldn't really matter in practice.

I find a good deal of attraction in getting rid of the IDs and
just using names.  Nor do I believe we need a hash table.
(1) Surely there will not be so many extensions using this within a
single EXPLAIN that a simple loop with strcmp()'s isn't good enough.
(2) The IDs aren't free either; where will an extension keep the
ID it assigned?  We're trying to move away from global variables.
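
(A minimal sketch of such a strcmp() loop -- struct and function names here
are purely illustrative, not the proposed API:)

typedef struct ExplainExtensionState
{
	const char *extension_name;
	void	   *state;
} ExplainExtensionState;

static void *
find_extension_state(ExplainExtensionState *states, int nstates,
					 const char *extension_name)
{
	for (int i = 0; i < nstates; i++)
	{
		if (strcmp(states[i].extension_name, extension_name) == 0)
			return states[i].state;
	}
	return NULL;
}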

But, if you're convinced otherwise, the current design is OK.

Other than that point, I think 0001 and 0002 are ready.

> Regarding commit timeframe, you expressed a preference for 0001 and
> 0002 to be committed "soon," but I'm going to be leaving town for a
> week on Saturday and would prefer not to put myself in a situation
> where somebody is expecting me to fix the buildfarm while I am gone.
> So I'm happy to commit them today or tomorrow if you don't mind being
> responsible for anything that blows up while I'm gone, or I'm happy to
> have you commit them whenever you want, but barring one of those
> outcomes, I'm going to deal with it on or after March 17th.

Sure, I will backstop you if you want to push and run.  There's
at least one nearby thread where some interest has been expressed in
using this (the postgres_fdw include-the-remote-EXPLAIN business).
So I think there is value in having it in the tree sooner.

regards, tom lane




Re: [Patch] remove duplicated smgrclose

2025-03-11 Thread Steven Niu
Masahiko Sawada wrote on Sat, Mar 8, 2025 at 12:04:

> Hi,
>
> On Sun, Oct 27, 2024 at 12:05 PM Kirill Reshke 
> wrote:
> >
> > On Wed, 14 Aug 2024 at 11:35, Steven Niu  wrote:
> > >
> > > Junwang, Kirill,
> > >
> > > The split work has been done. I created a new patch for removing
> redundant smgrclose() function as attached.
> > > Please help review it.
> > >
> > > Thanks,
> > > Steven
> > >
> > > Steven Niu wrote on Mon, Aug 12, 2024 at 18:11:
> > >>
> > >> Kirill,
> > >>
> > >> Good catch!
> > >> I will split the patch into two to cover both cases.
> > >>
> > >> Thanks,
> > >> Steven
> > >>
> > >>
> > >> Junwang Zhao wrote on Fri, Aug 9, 2024 at 18:19:
> > >>>
> > >>> On Fri, Aug 9, 2024 at 5:20 PM Kirill Reshke 
> wrote:
> > >>> >
> > >>> > On Thu, 1 Aug 2024 at 17:32, Junwang Zhao 
> wrote:
> > >>> > >
> > >>> > > Hi Steven,
> > >>> > >
> > >>> > > On Wed, Jul 31, 2024 at 11:16 AM Steven Niu 
> wrote:
> > >>> > > >
> > >>> > > > Hello, hackers,
> > >>> > > >
> > >>> > > > I think there may be some duplicated codes.
> > >>> > > > Function smgrDoPendingDeletes() calls both smgrdounlinkall()
> and smgrclose().
> > >>> > > > But both functions would close SMgrRelation object, it's
> dupliacted behavior?
> > >>> > > >
> > >>> > > > So I make this patch. Could someone take a look at it?
> > >>> > > >
> > >>> > > > Thanks for your help,
> > >>> > > > Steven
> > >>> > > >
> > >>> > > > From Highgo.com
> > >>> > > >
> > >>> > > >
> > >>> > > You change LGTM, but the patch seems not to be applied to HEAD,
> > >>> > > I generate the attached v2 using `git format` with some commit
> message.
> > >>> > >
> > >>> > > --
> > >>> > > Regards
> > >>> > > Junwang Zhao
> > >>> >
> > >>> > Hi all!
> > >>> > This change looks good to me. However, i have an objection to these
> > >>> > lines from v2:
> > >>> >
> > >>> > >  /* Close the forks at smgr level */
> > >>> > > - for (forknum = 0; forknum <= MAX_FORKNUM; forknum++)
> > >>> > > - smgrsw[which].smgr_close(rels[i], forknum);
> > >>> > > + smgrclose(rels[i]);
> > >>> >
> > >>> > Why do we do this? This seems to be an unrelated change given
> thread
> > >>> > $subj. This is just a pure refactoring job, which deserves a
> separate
> > >>> > patch. There is similar coding in
> > >>> > smgrdestroy function:
> > >>> >
> > >>> > ```
> > >>> > for (forknum = 0; forknum <= MAX_FORKNUM; forknum++)
> > >>> > smgrsw[reln->smgr_which].smgr_close(reln, forknum);
> > >>> > ```
> > >>> >
> > >>> > So, I feel like these two places should be either changed together
> or
> > >>> > not be altered at all. And is it definitely a separate change.
> > >>>
> > >>> Yeah, I tend to agree with you, maybe we should split the patch
> > >>> into two.
> > >>>
> > >>> Steven, could you do this?
> > >>>
> > >>> >
> > >>> > --
> > >>> > Best regards,
> > >>> > Kirill Reshke
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Regards
> > >>> Junwang Zhao
> >
> > Hi!
> > Looks like discussion on the subject is completed, and no open items
> > left, so I will try to mark commitfest [1] entry as Ready For
> > Committer.
> >
>
> I've looked at the patch and have some comments:
>
> The patch removes smgrclose() calls following smgrdounlinkall(), for
> example:
>
> --- a/src/backend/catalog/storage.c
> +++ b/src/backend/catalog/storage.c
> @@ -686,9 +686,6 @@ smgrDoPendingDeletes(bool isCommit)
>   {
>   smgrdounlinkall(srels, nrels, false);
>
> - for (int i = 0; i < nrels; i++)
> - smgrclose(srels[i]);
> -
>   pfree(srels);
>   }
>  }
>
> While smgrdounlinkall() close the relation at smgr level as follow:
>
> /* Close the forks at smgr level */
> for (forknum = 0; forknum <= MAX_FORKNUM; forknum++)
> smgrsw[which].smgr_close(rels[i], forknum);
>
> smgrrelease(), called by smgrclose(), also does the same thing but
> does more things as follow:
>
> void
> smgrrelease(SMgrRelation reln)
> {
> for (ForkNumber forknum = 0; forknum <= MAX_FORKNUM; forknum++)
> {
> smgrsw[reln->smgr_which].smgr_close(reln, forknum);
> reln->smgr_cached_nblocks[forknum] = InvalidBlockNumber;
> }
> reln->smgr_targblock = InvalidBlockNumber;
> }
>
> Therefore, if we move such smgrclose() calls, we would end up missing
> some operations that are done in smgrrelease() but not in
> smgrdounlinkall(), no?
>
> Regards,
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com



 Hi, Masahiko

Thanks for your comments! I understand your concern.
However, my initial patch was split into two parts, as Kirill suggested.
This thread is about the first part; the other part is here:
https://commitfest.postgresql.org/patch/5149/
Or you can take a look at the v2-0001-remove-duplicated-smgrclose.patch
in this thread for the complete change.

I think we should either review the v2 patch, or review both the 5149 and
5196 CF entries, to cover my complete change.
There should be no missing operations then.

Please let me know if you have mo

Re: Back-patch of: avoid multiple hard links to same WAL file after a crash

2025-03-11 Thread Noah Misch
On Thu, Dec 19, 2024 at 02:44:53PM +0900, Michael Paquier wrote:
> On Wed, Dec 18, 2024 at 08:51:20PM -0500, Andres Freund wrote:
> > I don't think the issue is actually quite as unlikely to be hit as reasoned 
> > in
> > the commit message.  The crash has indeed to happen between the link() and
> > unlink() - but at the end of a checkpoint we do that operations hundreds of
> > times in a row on a busy server.  And that's just after potentially doing 
> > lots
> > of write IO during a checkpoint, filling up drive write caches / eating up
> > IOPS/bandwidth disk quots.
> 
> Looks so, yep.  Your timing and the report's timing are interesting.
> 
> I've been double-checking the code to refresh myself with the problem,
> and I don't see a reason to not apply something like the attached set
> down to v13 for all these remaining branches (minus an edit of the
> commit message).

This became v14 commit 1f95181b44c843729caaa688f74babe9403b5850.  I think it
is causing timing-dependent failures in archive recovery, at restartpoints.
v14 and earlier were relying on the "excl" part of durable_rename_excl() to
keep WAL preallocation and recycling from stomping on archive-restored WAL
files.  If that's right, we need to either back-patch v15's suppression of
those features during archive recovery (commit cc2c7d6 and five others), or we
need to revert and fix the multiple hard links differently.

v15 commit cc2c7d6 "Skip WAL recycling and preallocation during archive
recovery" wrote that it fixed "a spurious durable_rename_excl() LOG message".
Replacing durable_rename_excl() with durable_rename() increased consequences
beyond spurious messages.  I regret that I didn't make that connection during
post-commit review of commit 1f95181b44c843729caaa688f74babe9403b5850.
Trouble emerges during archive recovery, not during streaming or crash
recovery.  The symptom is "invalid magic number  in log segment X, offset
0" when PreallocXlogFiles() stomps.  The symptom is "unexpected pageaddr X in
log segment Y, offset 0" [X < Y] when RemoveOldXlogFiles() recycling stomps.
wal_recycle=off probably avoids the latter but not the former.

Options I see:

1. Make v14 and v13 skip WAL recycling and preallocation during archive
   recovery, like newer branches do.  I think that means back-patching the six
   commits cc2c7d6~4 cc2c7d6~3 cc2c7d6~2 cc2c7d6~1 cc2c7d6 e36cbef.

2. Revert 1f95181b44c843729caaa688f74babe9403b5850 and its v13 counterpart.
   Avoid multiple hard links by making durable_rename_excl() first rename
   oldfile to a temporary name.  (I haven't thought this through in detail.
   It may not suffice.)

I'm leaning toward (1) at least enough to see how messy the back-patch would
be, since I don't like risks of designing old-branch-specific solutions when
v15/v16/v17 have a proven solution.  What else should we be thinking about
before deciding?

Arun Thirupathi collected the symptoms, traced them to commit
1f95181b44c843729caaa688f74babe9403b5850, and reported the diagnosis to me.
These symptoms have not emerged in v15+.

Thanks,
nm




Re: bogus error message for ALTER TABLE ALTER CONSTRAINT

2025-03-11 Thread Amul Sul
On Mon, Mar 10, 2025 at 11:29 PM Álvaro Herrera  wrote:
>
> Hello,
>
> I fleshed this out more fully and I think 0001 is good enough to commit.
>

The approach looks good to me, but instead of adding a CAS_flags struct, could
we use macros like SEEN_DEFERRABILITY(bits), SEEN_ENFORCEABILITY(bits),
etc.? We can simply pass cas_bits to these macros, and to avoid the error
from processCASbits(), we can pass NULL for constrType.
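
A sketch of what such macros could look like (the exact CAS_* flag names are
an assumption here, shown only for illustration):

#define SEEN_DEFERRABILITY(bits) \
	(((bits) & (CAS_DEFERRABLE | CAS_NOT_DEFERRABLE)) != 0)
#define SEEN_INITIAL_DEFERRAL(bits) \
	(((bits) & (CAS_INITIALLY_DEFERRED | CAS_INITIALLY_IMMEDIATE)) != 0)
#define SEEN_ENFORCEABILITY(bits) \
	(((bits) & (CAS_ENFORCED | CAS_NOT_ENFORCED)) != 0)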


Regards,
Amul




Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.

2025-03-11 Thread Shubham Khanna
On Mon, Mar 10, 2025 at 5:02 PM Nisha Moond  wrote:
>
> On Mon, Mar 10, 2025 at 4:31 PM Shubham Khanna
>  wrote:
> >
> > > 2) This part of code has another bug:
> > >  "dbinfos.dbinfo->made_publication = false;" incorrectly modifies
> > > dbinfo[0], even if the failure occurs in other databases (dbinfo[1],
> > > etc.). Since cleanup_objects_atexit() checks made_publication per
> > > database, this could lead to incorrect behavior.
> > > Solution: Pass the full 'dbinfo' structure to
> > > drop_publication_by_name() , and use it accordingly.
> > >
> > > --
> >
> > Fixed.
> >
> > The attached patch contains the suggested changes.
> >
>
> Thanks for the patch. Here are few comments for the new change -
> 1)
> +static void check_and_drop_existing_publications(PGconn *conn, struct
> LogicalRepInfo *dbinfo);
>  - move the second parameter to the next line to maintain 80 char length.
>

Fixed.

> 2)
>   pg_log_error("could not drop publication \"%s\" in database \"%s\": %s",
> - dbinfo->pubname, dbinfo->dbname, PQresultErrorMessage(res));
> + pubname, dbinfo->dbname,
> + PQresultErrorMessage(res));
>
> You can keep "PQresultErrorMessage(res));" in the previous line, it
> will still be < 80 chars.
>

Fixed.

> 3) The new IF check -
> +   if (strcmp(pubname, dbinfos.dbinfo->pubname) == 0)
> +   dbinfo->made_publication = false;   /* don't try again. */
>
> always compares dbinfo[0]->pubname, but it should compare 'pubname'
> with the respective database's publication. Please correct it as -
> if (strcmp(pubname, dbinfo->pubname) == 0)
>
>
> --

Fixed.

The attached patch contains the suggested changes.

Thanks and regards,
Shubham Khanna.


v16-0001-Support-for-dropping-all-publications-in-pg_crea.patch
Description: Binary data


Re: Non-text mode for pg_dumpall

2025-03-11 Thread Álvaro Herrera
Hello,

On 2025-Mar-11, Mahendra Singh Thalor wrote:

> In the map.dat file, I tried to fix this issue by adding the number of
> characters in the dbname, but as per the code comments we currently don't
> support \n or \r in dbnames, so I removed that handling.
> I will study this some more to fix the issue.

Yeah, I think this is saying that you should not consider the contents
of map.dat as a shell string.  After all, you're not going to _execute_
that file via the shell.

Maybe for map.dat you need to escape such characters somehow, so that
they don't appear as literal newlines/carriage returns.
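
One possible shape for that, purely as a sketch (none of this is from the
patch):

/* Escape backslash, newline and carriage return when writing a database
 * name into map.dat, so the file stays one entry per line; the reader
 * would reverse the transformation. */
static void
write_escaped_dbname(FILE *fp, const char *dbname)
{
	for (const char *p = dbname; *p; p++)
	{
		if (*p == '\\')
			fputs("\\\\", fp);
		else if (*p == '\n')
			fputs("\\n", fp);
		else if (*p == '\r')
			fputs("\\r", fp);
		else
			fputc(*p, fp);
	}
}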

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/




Re: pg_attribute_noreturn(), MSVC, C11

2025-03-11 Thread Andres Freund
Hi,

On 2025-03-10 13:27:07 +0100, Peter Eisentraut wrote:
> From: Peter Eisentraut 
> Date: Thu, 13 Feb 2025 16:01:06 +0100
> Subject: [PATCH v3 1/4] pg_noreturn to replace pg_attribute_noreturn()
> 
> We want to support a "noreturn" decoration on more compilers besides
> just GCC-compatible ones, but for that we need to move the decoration
> in front of the function declaration instead of either behind it or
> wherever, which is the current style afforded by GCC-style attributes.
> Also rename the macro to "pg_noreturn" to be similar to the C11
> standard "noreturn".
> 
> pg_noreturn is now supported on all compilers that support C11 (using
> _Noreturn), as well as MSVC (using __declspec).  (When PostgreSQL
> requires C11, the latter variant can be dropped.)  (We don't need the
> previously used variant for GCC-compatible compilers using
> __attribute__, because all reasonable candidates already support C11,
> so that variant would be dead code in practice.)
> 
> Now, all supported compilers effectively support pg_noreturn, so the
> extra code for !HAVE_PG_ATTRIBUTE_NORETURN can be dropped.
> 
> This also fixes a possible problem if third-party code includes
> stdnoreturn.h, because then the current definition of
> 
> #define pg_attribute_noreturn() __attribute__((noreturn))
> 
> would cause an error.
> 
> Note that the C standard does not support a noreturn attribute on
> function pointer types.  So we have to drop these here.  There are
> only two instances at this time, so it's not a big loss.

That's still somewhat sad, but it's probably not worth having a separate
attribute just for those cases...

With the minor comment suggestion below, I think this looks good.


> diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
> index ec8eddad863..d32c0d318fb 100644
> --- a/src/backend/utils/mmgr/slab.c
> +++ b/src/backend/utils/mmgr/slab.c
> @@ -601,8 +601,8 @@ SlabAllocFromNewBlock(MemoryContext context, Size size, 
> int flags)
>   *   want to avoid that.
>   */
>  pg_noinline
> +pg_noreturn
>  static void
> -pg_attribute_noreturn()
>  SlabAllocInvalidSize(MemoryContext context, Size size)
>  {
>   SlabContext *slab = (SlabContext *) context;

Hm, is it good to put the pg_noreturn after the pg_noinline?


> +/*
> + * pg_noreturn corresponds to the C11 noreturn/_Noreturn function specifier.
> + * We can't use the standard name "noreturn" because some third-party code
> + * uses __attribute__((noreturn)) in headers, which would get confused if
> + * "noreturn" is defined to "_Noreturn", as is done by <stdnoreturn.h>.
> + */
> +#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
> +#define pg_noreturn _Noreturn
> +#elif defined(_MSC_VER)
> +#define pg_noreturn __declspec(noreturn)
> +#else
> +#define pg_noreturn
> +#endif

I think it'd be good to add a comment explaining the placement of pg_noreturn
in a declaration, it's getting pickier...
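
Something along these lines, perhaps (hypothetical function, only the
position of pg_noreturn matters):

/* The attribute macro leads the declaration, before extern/static. */
pg_noreturn extern void bail_out(int code);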


> Subject: [PATCH v3 2/4] Add another pg_noreturn

WFM


> Subject: [PATCH v3 3/4] Swap order of extern/static and pg_nodiscard
> 
> Clang in C23 mode requires all attributes to be before extern or
> static.  So just swap these.  This also keeps the order consistent
> with the previously introduced pg_noreturn.

WFM.


> From 0d9f2f63e2166592bd703320cf18dfb61a47ab28 Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut 
> Date: Sat, 28 Dec 2024 11:00:24 +0100
> Subject: [PATCH v3 4/4] Support pg_nodiscard on non-GNU compilers that support
>  C23
> 
> Support pg_nodiscard on compilers that support C23 attribute syntax.
> Previously, only GCC-compatible compilers were supported.
> 
> Also, define pg_noreturn similarly using C23 attribute syntax if the
> compiler supports C23.  This doesn't have much of an effect right now,
> since all compilers that support C23 surely also support the existing
> C11-compatible code.  But it keeps pg_nodiscard and pg_noreturn
> consistent.

This is a bit unexciting, but I guess worth it.

Greetings,

Andres Freund




Re: SQL:2011 application time

2025-03-11 Thread Peter Eisentraut

On 26.02.25 06:15, Paul Jungwirth wrote:
 > ON DELETE RESTRICT must be specified when PERIOD BUSINESS_TIME is
 > also specified.


Here are some patches removing support for RESTRICT


I have committed this.

I think this is about as much as we can hope to get done from this patch 
series for PG18.  I don't think the subsequent patches are ready enough. 
 As an example, the FOR PORTION OF still has the problem I mentioned at 
, 
and a few similar structural problems.  Also, I see that you have 
recently changed some things to make use of SPI, which seems 
problematic.  This needs much further analysis.


My suggestion is to close the commitfest entry as "committed" and start 
new threads and new entries for the subsequent features.






Re: Commitfest app release on Feb 17 with many improvements

2025-03-11 Thread Tom Lane
Jelte Fennema-Nio  writes:
> Okay, the cause of this seems to be that the CFbot currently uses "git
> apply --3way", not "git am --3way". For some reason "git apply" fails
> to apply the patch while "git am" succeeds. I'll check this weekend if
> I can change the logic to use "git am" instead.

Please see if you can make it use patch(1).  IME git is too
stiff-necked about slightly stale patches no matter which
subcommand you use.

regards, tom lane




Re: INSERT ... ON CONFLICT DO SELECT [FOR ...] take 2

2025-03-11 Thread Kirill Reshke
On Mon, 10 Mar 2025 at 18:05, Kirill Reshke  wrote:
>
> On Wed, 5 Mar 2025 at 07:33, Andreas Karlsson  wrote:
> >
> > On 3/4/25 10:24 AM, Andreas Karlsson wrote:
> > > Rebased the patch to add support for OLD.* and NEW.*.
> >
> > Apparently the CI did not like that version.
> >
> > Andreas
>
> Hi!
> Applied v6.
>
> 1) Should we answer INSERT 0 10 here?

Sorry, I meant: INSERT 0 0

-- 
Best regards,
Kirill Reshke




Re: encode/decode support for base64url

2025-03-11 Thread Florents Tselai
On Tue, Mar 11, 2025 at 12:51 AM Cary Huang  wrote:

> > Oh well - you're probably right.
> > I guess I was blinded by my convenience.
> > Adding a 'base64url' option there is more appropriate.
>
> I agree with it too. It is neater to add "base64url" as a new option for
> encode() and decode() SQL functions in encode.c.
>

Attaching a v2 with that.

>
> In addition, you may also want to add the C versions of base64rul encode
> and decode functions to "src/common/base64.c" as new API calls so that
> the frontend, backend applications and extensions can also have access
> to these base64url conversions.
>
>
We could expose this in base64.c - it'll need some more checking.
A few more test cases, especially around padding, are necessary.
I'll come back to this.
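
For reference, a minimal sketch of the transformation involved (base64url is
plain base64 with '+' mapped to '-', '/' mapped to '_', and '=' padding
optionally dropped); this is not the patch's code:

/* Convert a standard base64 string to base64url in place. */
static void
b64_to_b64url(char *s)
{
	char	   *out = s;

	for (; *s; s++)
	{
		if (*s == '+')
			*out++ = '-';
		else if (*s == '/')
			*out++ = '_';
		else if (*s == '=')
			continue;			/* drop padding */
		else
			*out++ = *s;
	}
	*out = '\0';
}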


v2-0001-Add-base64url-in-encode-decode-functions.patch
Description: Binary data


v2-0002-Fix-declaration-after-statement.patch
Description: Binary data


Re: table_tuple_lock's snapshot argument

2025-03-11 Thread Heikki Linnakangas

On 10/03/2025 18:20, Ranier Vilela wrote:

Just for curiosity.
*RelationFindReplTupleSeq* on the same source, does not suffer
from the same issue?

PushActiveSnapshot(GetLatestSnapshot());

res = table_tuple_lock(rel, &(outslot->tts_tid), GetLatestSnapshot(),


:facepalm: yep, it sure does, and FindConflictTuple() too. Thanks, I'll 
go fix those too.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Modern SHA2- based password hashes for pgcrypto

2025-03-11 Thread Bernd Helmle
Hi,

> I definitely like that passlib have documented their thought process
> thoroughly.
> 

Please find attached v4 of this patch. I added the following changes:

- Check for non-supported characters in the salt like passlib does.
- Check for reserved tokens when parsing the salt string (i find this
very strict, but it covers the cases Japin Li pointed out).

I didn't implement being more strict when it comes to the salt length:
the salt might be provided from external sources and the password hash
is just used for validating within the database.

When hashing the password within Postgres, users should always use
gen_salt(), which generates a reasonable salt string.

Maybe, in case of empty salts, we should issue a WARNING instead of
erroring out and put additional documentation on how to use it right.

> I think using their strict mode is good on principle, but if we're
> going
> to do that, then the salt string should not be used verbatim, but
> instead base64-decoded first to get the actual salt bytes, like they
> do.
> Does this break compabitibility with other systems?  Are
> passlib-generated password hashes incompatible with, say, "openssl
> passwd" which you (Bernd) mentioned at the beginning of the thread?
> Maybe if the password hashes are no longer compatible, then we should
> ditch the idea of restricting salts to base64 chars and accept the
> whole
> range of bytes, like Drepper.
> 
> But in any case ISTM we should reject, as they suggest, the use of
> less
> than 4 bytes of salt (and should perhaps settle for a default of 16,
> as
> passlib suggests).  I suppose this is why passlib returns NULL with
> empty salt.  What we should do in that case IMO is ereport(ERROR). 
> 


Hmm, I didn't realize that passlib decodes the salt first; I thought
it used the salt as-is... At least, in its current form we're pretty much
compatible with Drepper, passlib and OpenSSL, as far as I tested:

Empty salt
--

# Drepper reference code

shacrypt password "" sha512crypt
$6$$bLTg4cpho8PIUrjfsE7qlU08Qx2UEfw..xOc6I1wpGVtyVYToGrr7BzRdAAnEr5lYFr
1Z9WcCf1xNZ1HG9qFW1

# Patch

SELECT crypt('password', '$6$$');
   crypt  
───
─
$6$$bLTg4cpho8PIUrjfsE7qlU08Qx2UEfw..xOc6I1wpGVtyVYToGrr7BzRdAAnEr5lYFr1
Z9WcCf1xNZ1HG9qFW1
(1 row)

# Passlib

from passlib.hash import sha256_crypt,sha512_crypt
hash = sha512_crypt.using(salt='').using(rounds=5000).hash("password")
print(hash)
$6$$bLTg4cpho8PIUrjfsE7qlU08Qx2UEfw..xOc6I1wpGVtyVYToGrr7BzRdAAnEr5lYFr
1Z9WcCf1xNZ1HG9qFW1

# OpenSSL

echo password | openssl passwd -6 -stdin -salt '' 


Short salt
--

# Drepper reference code

./shacrypt password abcdef sha512crypt
$6$abcdef$2yYFwtar3.rlLovG0bXbxXMjZe7Z/1EhQ0gEs0nXalsCW04p2iy6unkvbrBFH
1SXexabbdNXXOO11F7sdyurk/

# Patch

SELECT crypt('password', '$6$abcdef$');
  crypt   
───
───
$6$abcdef$2yYFwtar3.rlLovG0bXbxXMjZe7Z/1EhQ0gEs0nXalsCW04p2iy6unkvbrBFH1
SXexabbdNXXOO11F7sdyurk/
(1 row)

# Passlib

from passlib.hash import sha256_crypt,sha512_crypt
hash =
sha512_crypt.using(salt='abcdef').using(rounds=5000).hash("password")
print(hash)
$6$abcdef$2yYFwtar3.rlLovG0bXbxXMjZe7Z/1EhQ0gEs0nXalsCW04p2iy6unkvbrBFH
1SXexabbdNXXOO11F7sdyurk/

# OpenSSL

echo password | openssl passwd -6 -stdin -salt 'abcdef'
$6$abcdef$2yYFwtar3.rlLovG0bXbxXMjZe7Z/1EhQ0gEs0nXalsCW04p2iy6unkvbrBFH
1SXexabbdNXXOO11F7sdyurk/

Salt > MAXLEN(16 Bytes)
---

# Drepper reference code
(truncates salt)

shacrypt password abcdefghijklmnopqrstuvwxyz sha512crypt
$6$abcdefghijklmnop$0aenUFHf897F9u0tURIHOeACWajSuVGa7jgJGyq.DKZm/WXl/IZ
FvPbneFydBjomEOgM.Sh1m0L3KsS1.H5b//

# Patch
(truncates salt)

SELECT crypt('password', '$6$abcdefghijklmnopqrstuvwxyz$');
   crypt  
───
─
$6$abcdefghijklmnop$0aenUFHf897F9u0tURIHOeACWajSuVGa7jgJGyq.DKZm/WXl/IZF
vPbneFydBjomEOgM.Sh1m0L3KsS1.H5b//
(1 row)

# Passlib

Errors out with

ValueError: salt too large (sha512_crypt requires <= 16 chars)

# OpenSSL
(truncates salt)

echo password | openssl passwd -6 -stdin -salt
'abcdefghijklmnopqrstuvwxyz'
$6$abcdefghijklmnop$0aenUFHf897F9u0tURIHOeACWajSuVGa7jgJGyq.DKZm/WXl/IZ
FvPbneFydBjomEOgM.Sh1m0L3KsS1.H5b//

Salt with "invalid" character
-

# Drepper reference code

./shacrypt password abcdefghi%jklmno sha512crypt
$6$abcdefghi%jklmno$LMN/V1pW97IoK0rWSDVqCo9EYd6zpqP0TdTX9.cxFAFqsdSMWQM
jehkmMtDzL36VBKeG6dg.kFAQKoFvZpK0G.

# Patch
current version V4 (attached)

Errors out:

SELECT crypt('password', '$6$abcdefghi%jklmno$');
FEHLER:  invalid character in salt string: "%"
T

Re: Printing window function OVER clauses in EXPLAIN

2025-03-11 Thread Tom Lane
David Rowley  writes:
> What are your thoughts on being a bit more brief with the naming and
> just prefix with "w" instead of "window"? Looking at window.out, I see
> that the EXPLAIN output does become quite a bit wider than before. I
> favour the idea of saving a bit of space.  There is an example in
> src/sgml/advanced.sgml that has "OVER w", so it does not seem overly
> strange to me to name them "w1", "w2", etc.

OK by me, any objections elsewhere?

>> * In passing, I editorialized on the order in which the Run Condition
>> annotation comes out:

> I did it that way because "Rows Removed by Filter" is a property of
> "Filter", so it makes sense to me for those to be together. It doesn't
> make sense to me to put something unrelated between them.

Hmm, OK.  Do you think it could be sensible to put Run Condition
before Filter, then?  On the same grounds of "keeping related
things together", it could be argued that Run Condition is
related to the Window property.  Also, the Run Condition acts
before the Filter does, if I've got my head screwed on straight.

regards, tom lane




Re: DOCS: Make the Server Application docs synopses more consistent

2025-03-11 Thread David G. Johnston
On Thu, Dec 12, 2024 at 8:25 PM Peter Smith  wrote:

> [1] initdb [option...] [ --pgdata | -D ] directory
> [2] pg_archivecleanup [option...] archivelocation oldestkeptwalfile
> [3] pg_checksums [option...] [[ -D | --pgdata ]datadir]
> [4] pg_controldata [option] [[ -D | --pgdata ]datadir]
> [5] pg_createsubscriber [option...] { -d | --database }dbname { -D |
> --pgdata }datadir { -P | --publisher-server }connstr
> [6] pg_ctl (the many synopses here don't give all the switch
> alternatives, it would be too much...)
> [7] pg_resetwal [ -f | --force ] [ -n | --dry-run ] [option...] [ -D |
> --pgdata ]datadir
> [8] pg_rewind [option...] { -D | --target-pgdata } directory {
> --source-pgdata=directory | --source-server=connstr }
> [9] pg_test_fsync [option...]
> [10] pg_test_timing [option...]
> [11] pg_upgrade -b oldbindir [-B newbindir] -d oldconfigdir -D
> newconfigdir [option...]
> [12] pg_waldump [option...] [startseg [endseg]]
> [13] pg_walsummary [option...] [file...]
>
>
Here are some guidelines we should add to [1].

I don't feel listing both short and long form args is a good use of space -
and the { | } impairs readability.  The short form combined with, usually,
a decent replaceable name, shows the reader the common interactive usage
and they can find the long forms in the Options section.  Maybe use long
forms for value-less options since those don't get the argname hint.

The non-space between option and its value reduces readability.  IIUC a
space in between doesn't impact correctness so I say go for readability.
This becomes more pronounced with the first items since it is only
marginally readable now because there is always a } or ] before the
replaceable.  Though this may involve fighting the markup a bit...I haven't
experimented yet (pg_rewind managed it...).

The first listed command should probably only include mandatory options.
When there are multiple combinations of mandatory options show multiple
commands, one for each variant.  Use examples to showcase idiomatic usage
patterns with descriptions.  There is room to argue exceptions to be listed
also in Synopsis.

Establish the convention of datadir as the replaceable name.  Possibly do
the same for other common items.

We should have a reference page (near [1]) listing DocBook elements and our
guidelines for how/where to use them.

In any case, details aside, modifying [1] with whatever is agreed to (and
making changes to conform) is something I agree should happen.

David J.

[1]
https://www.postgresql.org/docs/current/docguide-style.html#DOCGUIDE-STYLE-REF-PAGES


Re: Statistics import and export: difference in statistics of materialized view dumped

2025-03-11 Thread Tom Lane
Jeff Davis  writes:
> On Tue, 2025-03-11 at 10:17 -0400, Tom Lane wrote:
>> Are you doing the restore in parallel by any chance?  I had a todo
>> item to revisit the dependencies that pg_dump is creating for stats
>> items, because they looked wrong to me, ie inadequate to guarantee
>> correct restore order.

> It's creating a dependency on the relation and a boundary dependency on
> the postDataBound (unless it's an index, or an MV that got pushed to
> SECTION_POST_DATA).
> I suspect what we need here is a dependency on the MV *data*, because
> that's doing a heap swap, which resets the stats. Looking into it.

Right, that was what I was thinking, but hadn't had time to look in
detail.  The postDataBound dependency isn't real helpful here, we
could lose that if we had the data dependency.

regards, tom lane




Re: bogus error message for ALTER TABLE ALTER CONSTRAINT

2025-03-11 Thread Álvaro Herrera
On 2025-Mar-11, jian he wrote:

> but it's better to align CREATE DOMAIN with ALTER DOMAIN.
> For example, the following two logic should behave the same.
> 
> create domain d_int as int4 constraint nn1 not null initially immediate;
> alter domain d_int add constraint nn1 not null initially immediate;

Sure, they should.

> Also, if we do not error out, then in the create_domain.sgml and
> alter_domain.sgml sections we should include these "useless" keywords.

Yeah, I guess we should do that.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: Non-text mode for pg_dumpall

2025-03-11 Thread Mahendra Singh Thalor
Thanks Alvaro and Jian for the review and feedback.

On Wed, 5 Mar 2025 at 20:42, Álvaro Herrera  wrote:
>
> Disclaimer: I didn't review these patches fully.
>
> On 2025-Mar-05, Mahendra Singh Thalor wrote:
>
> > On Wed, 5 Mar 2025 at 01:02, Álvaro Herrera  wrote:
> >
> > > A database name containing a newline breaks things for this patch:
> > >
> > > CREATE DATABASE "foo
> > > bar";
>
> > I also reported this issue on 29-01-2025. This breaks even without this
> > patch also.
>
> Okay, we should probably fix that, but I think the new map.dat file your
> patch adds is going to make the problem worse, because it doesn't look
> like you handled that case in any particular way that would make it not
> fail.  I think it would be good to avoid digging us up even deeper in
> that hole.  More generally, the pg_upgrade tests contain some code to
> try database names with almost all possible ascii characters (see
> generate_db in pg_upgrade/t/002_pg_upgrade.pl); it would be good to
> ensure that this new functionality also works correctly for that --
> perhaps add an equivalent test to the pg_dumpall test suite.

As Jian also pointed out, we should not allow \n or \r in dbnames. I am
keeping dbnames as single-line names only.

I am testing with the pg_upgrade/t/002_pg_upgrade.pl file to
check various dbnames.

>
> Looking at 0001:
>
> I'm not sure that the whole common_dumpall_restore.c thing is properly
> structured.  First, the file name shouldn't presume which programs
> exactly are going to use the funcionality there.  Second, it looks like
> there's another PQconnectdbParams() in pg_backup_db.c and I don't
> understand what the reason is for that one to be separate.  In my mind,
> there should be a file maybe called connection.c or connectdb.c or
> whatever that's in charge of establishing connection for all the
> src/bin/pg_dump programs, for cleanliness sake.  (This is probably also
> the place where to put an on_exit callback that cleans up any leftover
> connections.)

I did some more refactoring and made a connectdb.c file.

> Looking at 0002 I see it mentions looking at the EXIT_NICELY macro for
> documentation.  No such macro exists.  But also I think the addition
> (and use) of reset_exit_nicely_list() is not a good idea.  It seems to
> assume that the only entries in that list are ones that can be cleared
> and reinstated whenever.  This makes too much of an assumption about how
> the program works.  It may work today, but it'll get in the way of any
> other patch that wants to set up some different on-exit clean up.  In
> other words, we shouldn't reset the on_exit list at all.  Also, this is
> just a weird addition:

Based on some discussions, I added handling for cleanup. For the first
database, I save the array index, and then I reuse that same index for
the rest of the databases: since we close the archive file in
CloseArchive, the same index can be used for the next database.

>
> #define exit_nicely(code) exit(code)

Fixed.

>
> You added "A" as an option to the getopt_long() call in pg_restore, but
> no handling for it is added.
Fixed.

>
> I think the --globals-only option to pg_restore should be a separate
> commit.
I will do this in the next version.

Here, I am attaching updated patches for review and testing.


-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com


v22_0001_move-common-code-of-pg_dumpall-and-pg_restore-to-new_file.patch
Description: Binary data


v22_0002_pg_dumpall-with-non-text_format-11th_march.patch
Description: Binary data


Re: AIO v2.5

2025-03-11 Thread Melanie Plageman
On Mon, Mar 10, 2025 at 2:23 PM Andres Freund  wrote:
>
> - 0016 to 0020 - cleanups for temp buffers code - I just wrote these to clean
>   up the code before making larger changes, needs review

This is a review of 0016-0020

Commit messages for 0017-0020 are thin. I assume you will beef them up
a bit before committing. Really, though, those matter much less than
0016 which is an actual bug (or pre-bug) fix. I called out the ones
where I think you should really consider adding more detail to the
commit message.

0016:

  * the case, write it out before reusing it!
  */
-if (buf_state & BM_DIRTY)
+if (pg_atomic_read_u32(&bufHdr->state) & BM_DIRTY)
 {
+uint32buf_state = pg_atomic_read_u32(&bufHdr->state);

I don't love that you fetch in the if statement and inside the if
statement. You wouldn't normally do this, so it sticks out. I get that
you want to avoid having the problem this commit fixes again, but
maybe it is worth just fetching the buf_state above the if statement
and adding a comment that it could have changed so you must do that.
Anyway, I think your future patches make the local buf_state variable
in this function obsolete, so perhaps it doesn't matter.

Not related to this patch, but while reading this code, I noticed that
this line of code is really weird:
LocalBufHdrGetBlock(bufHdr) = GetLocalBufferStorage();
I actually don't understand what it is doing ... setting the result of
the macro to the result of GetLocalBufferStorage()? I haven't seen
anything like that before.
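
(For what it's worth, a macro whose expansion is an lvalue can appear on the
left of an assignment; the store then goes into whatever object the expansion
names. A tiny sketch with made-up names:)

static char storage[8192];
static char *local_block_ptrs[16];

#define LocalBlockFor(id)	(local_block_ptrs[(id)])

static void
example(void)
{
	/* same as local_block_ptrs[3] = storage */
	LocalBlockFor(3) = storage;
}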

Otherwise, this patch LGTM.

0017:

+++ b/src/backend/storage/buffer/localbuf.c
@@ -56,6 +56,7 @@ static intNLocalPinnedBuffers = 0;
 static Buffer GetLocalVictimBuffer(void);
+static void InvalidateLocalBuffer(BufferDesc *bufHdr, bool check_unreferenced);

Technically this line is too long

+ * InvalidateLocalBuffer -- mark a local buffer invalid.
+ *
+ * If check_unreferenced is true, error out if the buffer is still
+ * used. Passing false is appropriate when redesignating the buffer instead
+ * dropping it.
+ *
+ * See also InvalidateBuffer().
+ */
+static void
+InvalidateLocalBuffer(BufferDesc *bufHdr, bool check_unreferenced)
+{

I was on the fence about the language "buffer is still used", since
this is about the ref count and not the usage count. If this is the
language used elsewhere perhaps it is fine.

I also was not sure what redesignate means here. If you mean to use
this function in the future in other contexts than eviction and
dropping buffers, fine. But otherwise, maybe just use a more obvious
word (like eviction).

0018:

Compiler now warns that buf_state is unused in GetLocalVictimBuffer().

@@ -4564,8 +4548,7 @@ FlushRelationBuffers(Relation rel)
 IOCONTEXT_NORMAL, IOOP_WRITE,
 io_start, 1, BLCKSZ);

-buf_state &= ~(BM_DIRTY | BM_JUST_DIRTIED);
-pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+TerminateLocalBufferIO(bufHdr, true, 0);

FlushRelationBuffers() used to clear BM_JUST_DIRTIED, which seems like it
wouldn't have been applicable to local buffers before, but with async IO
it could perhaps happen in the future? Anyway,
TerminateLocalBufferIO() doesn't clear that flag, so you should call
that out if it was intentional.

@@ -5652,8 +5635,11 @@ TerminateBufferIO(BufferDesc *buf, bool
clear_dirty, uint32 set_flag_bits,
+buf_state &= ~BM_IO_IN_PROGRESS;
+buf_state &= ~BM_IO_ERROR;
-buf_state &= ~(BM_IO_IN_PROGRESS | BM_IO_ERROR);

Is it worth mentioning in the commit message that you made a cosmetic
change to TerminateBufferIO()?

0019:
LGTM

0020:
This commit message is probably too thin. I think you need to at
least say something about this being used by AIO in the future. Out of
context of this patch set, it will be confusing.

+/*
+ * Like StartBufferIO, but for local buffers
+ */
+bool
+StartLocalBufferIO(BufferDesc *bufHdr, bool forInput, bool nowait)
+{

I think you could use a comment about why nowait might be useful for
local buffers in the future. It wouldn't make sense with synchronous
I/O, so it feels a bit weird without any comment.

+if (forInput ? (buf_state & BM_VALID) : !(buf_state & BM_DIRTY))
+{
+/* someone else already did the I/O */
+UnlockBufHdr(bufHdr, buf_state);
+return false;
+}

UnlockBufHdr() explicitly says it should not be called for local
buffers. I know that code is unreachable right now, but it doesn't
feel quite right. I'm not sure what the architecture of AIO local
buffers will be like, but if other processes can't access these
buffers, I don't know why you would need BM_LOCKED. And if you will, I
think you need to edit the UnlockBufHdr() comment.

@@ -1450,13 +1450,11 @@ static inline bool
 WaitReadBuffersCanStartIO(Buffer buffer, bool nowait)
 {
 if (BufferIsLocal(buffer))
 else
-return StartBufferIO(GetBufferDescrip

Re: protocol support for labels

2025-03-11 Thread Frits Hoogland
The use case that I think might be useful is to have a database client send
metadata along with a query.
This is partially possible today by setting application_name, but that is a
separate request; it would be great if the metadata could be sent along with
the query in one go.
Another option for passing metadata is to add a comment (/* .. */), but a
comment cannot be set for a prepared statement, because the statement is
prepared first and only invoked later at runtime, which executes a fixed
query text.

Frits Hoogland




> On 11 Mar 2025, at 15:49, Jeremy Schneider  wrote:
> 
> 
>> On Mar 11, 2025, at 3:03 AM, Kirill Reshke  wrote:
>> 
>> On Tue, 11 Mar 2025 at 11:09, Jeremy Schneider  
>> wrote:
>> 
>>> observability frameworks like OpenTelemetry support tracing through all
>>> layers of a stack, and trace_ids can even be passed into sql with
>>> extensions like sqlcommenter. however sqlcommenter puts the trace_id
>>> into a comment which effectively breaks all the utility of
>>> pg_stat_statements since each query has a unique trace_id.
>>> 
>> There are some other use-cases:
>> 1) RO-RW routing. Users can pass target-session-attrs to the server
>> within query labels to hint about its need. Useful when some kind of
>> proxy (odyssey,pgbouncer,spqr,pgpool II, pgcat, pg_doorman) is used.
>> 2) pg_comment_stats uses comments in the query to accumulate statistics. [0]
> 
> 
> Thinking a bit more, my root issue is specifically around pg_stat_statements 
> so maybe it’s also solvable with some changes to how query jumbling is done
> 
> But that topic seems like one where we’d never get consensus
> 
> Should query jumbling for calculating query_id be customizable somehow? How 
> would we resolve multiple concurrent opinions about how queries should be 
> jumbled (eg if comment_stats needs different tweaks than sqlcommenter)? Was 
> there previous discussion about this already? I’ll need to go search mailing 
> list history a bit
> 
> -Jeremy
> 
> 
> Sent from my TI-83
> 



Re: Wrong results with subquery pullup and grouping sets

2025-03-11 Thread Dean Rasheed
On Mon, 10 Mar 2025 at 13:05, Richard Guo  wrote:
>
> Attached are the patches.
>

This looks good to me. I did some more testing, and I wasn't able to break it.

Some minor nitpicks:

These 2 comment changes from 0002 could be made part of 0001:

1). In pull_up_simple_subquery(), removing the word "Again" from the
comment following the deleted block, since this is now the only place
that sets wrap_non_vars there.

2). In pull_up_constant_function(), moving "(See comments in
pull_up_simple_subquery().)" to the next comment.

> Another thing is that when deciding whether to wrap the newnode in
> pullup_replace_vars_callback(), I initially thought that we didn't
> need to handle Vars/PHVs specifically, and could instead merge them
> into the branch for handling general expressions.  However, doing so
> causes a plan diff in the regression tests.  The reason is that a
> non-strict construct hidden within a lower-level PlaceHolderVar can
> lead the code for handling general expressions to mistakenly think
> that another PHV is needed, even when it isn't.  Therefore, the
> branches for handling Vars/PHVs are retained in 0002.

Right. The comment addition in 0002, relating to that, confused me at first:

 * This analysis could be tighter: in particular, a non-strict
 * construct hidden within a lower-level PlaceHolderVar is not
 * reason to add another PHV.  But for now it doesn't seem
-* worth the code to be more exact.
+* worth the code to be more exact.  This is also why we need
+* to handle Vars and PHVs in the above branches, rather than
+* merging them into this one.

AIUI, it's not really that it *needs* to handle Vars and PHVs
separately, it's just better if it does, since that avoids
unnecessarily wrapping a PHV again, if it contains non-strict
constructs. Also, AFAICS there's no technical reason why simple Vars
couldn't be handled here (all the tests pass if that branch is
removed), but as a comment higher up says, that would be more
expensive. So perhaps this new comment should say "This is why it's
preferable to handle simple PHVs in the branch above, rather than this
branch."

Finally, ReplaceWrapOption should be added to pgindent's typedefs.list.

Regards,
Dean




Re: [PATCH] Add sortsupport for range types and btree_gist

2025-03-11 Thread Bernd Helmle
Hi,

Am Montag, dem 09.12.2024 um 18:10 +0100 schrieb Bernd Helmle:
> > I think we have two options:
> > 1. Just do not commit tests. We ran it manually, saw that paths are
> > taken, we are happy.
> 
> So here's a version with the original, unchanged regression tests and
> injection point removed (I hope I didn't forget anything to revert).
> 
> This version also fixes cidr sortsupport (thanks again Andrey for
> spotting this), where I accidentally used inet instead of the correct
> cidr datatype to pass down to the sortsupport function.

Please find a new rebased version of this patch. No changes to the
patch itself, but it didn't apply without conflicts anymore.
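
For illustration only (not part of the patch; the table and index names are
invented), an index like the following would exercise both the btree_gist
opclasses and the new range-type sortsupport, and could therefore use the
sorted build strategy, assuming sortsupport is available for all indexed
columns:

CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE room_reservations (room int, during tsrange);

CREATE INDEX room_reservations_gist
    ON room_reservations USING gist (room, during);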

Thanks,
Bernd

From 5cd326a29d5f6bb7ffb0c6328870cfd113ed1b42 Mon Sep 17 00:00:00 2001
From: Bernd Helmle 
Date: Tue, 11 Mar 2025 19:15:13 +0100
Subject: [PATCH] Add support for sorted gist index builds to btree_gist

This enables sortsupport in the btree_gist extension for faster builds of
GiST indexes. Sortsupport is also added to the backend for range types, so
that GiST indexes on range types benefit from the speed-up as well.

The sorted GiST index build strategy is now the default. The regression tests
are unchanged; they simply exercise the sorted build strategy now that it is
the default.
---
 contrib/btree_gist/Makefile |   2 +-
 contrib/btree_gist/btree_bit.c  |  46 +
 contrib/btree_gist/btree_bool.c |  22 +++
 contrib/btree_gist/btree_bytea.c|  35 
 contrib/btree_gist/btree_cash.c |  27 +++
 contrib/btree_gist/btree_date.c |  25 +++
 contrib/btree_gist/btree_enum.c |  33 
 contrib/btree_gist/btree_float4.c   |  28 +++
 contrib/btree_gist/btree_float8.c   |  27 +++
 contrib/btree_gist/btree_gist--1.8--1.9.sql | 192 
 contrib/btree_gist/btree_gist.control   |   2 +-
 contrib/btree_gist/btree_inet.c |  27 +++
 contrib/btree_gist/btree_int2.c |  26 +++
 contrib/btree_gist/btree_int4.c |  33 +++-
 contrib/btree_gist/btree_int8.c |  26 +++
 contrib/btree_gist/btree_interval.c |  25 +++
 contrib/btree_gist/btree_macaddr.c  |  29 +++
 contrib/btree_gist/btree_macaddr8.c |  24 +++
 contrib/btree_gist/btree_numeric.c  |  33 
 contrib/btree_gist/btree_oid.c  |  28 +++
 contrib/btree_gist/btree_text.c |  34 
 contrib/btree_gist/btree_time.c |  26 +++
 contrib/btree_gist/btree_ts.c   |  25 +++
 contrib/btree_gist/btree_utils_var.h|  12 +-
 contrib/btree_gist/btree_uuid.c |  25 +++
 contrib/btree_gist/meson.build  |   1 +
 doc/src/sgml/btree-gist.sgml|   7 +
 src/backend/utils/adt/rangetypes_gist.c |  70 +++
 src/include/catalog/pg_amproc.dat   |   3 +
 src/include/catalog/pg_proc.dat |   3 +
 30 files changed, 888 insertions(+), 8 deletions(-)
 create mode 100644 contrib/btree_gist/btree_gist--1.8--1.9.sql

diff --git a/contrib/btree_gist/Makefile b/contrib/btree_gist/Makefile
index 7ac2df26c10..68190ac5e46 100644
--- a/contrib/btree_gist/Makefile
+++ b/contrib/btree_gist/Makefile
@@ -34,7 +34,7 @@ DATA = btree_gist--1.0--1.1.sql \
btree_gist--1.1--1.2.sql btree_gist--1.2.sql btree_gist--1.2--1.3.sql \
btree_gist--1.3--1.4.sql btree_gist--1.4--1.5.sql \
btree_gist--1.5--1.6.sql btree_gist--1.6--1.7.sql \
-   btree_gist--1.7--1.8.sql
+   btree_gist--1.7--1.8.sql btree_gist--1.8--1.9.sql
 PGFILEDESC = "btree_gist - B-tree equivalent GiST operator classes"
 
 REGRESS = init int2 int4 int8 float4 float8 cash oid timestamp timestamptz \
diff --git a/contrib/btree_gist/btree_bit.c b/contrib/btree_gist/btree_bit.c
index f346b956fa9..35aa5578f83 100644
--- a/contrib/btree_gist/btree_bit.c
+++ b/contrib/btree_gist/btree_bit.c
@@ -6,6 +6,7 @@
 #include "btree_gist.h"
 #include "btree_utils_var.h"
 #include "utils/fmgrprotos.h"
+#include "utils/sortsupport.h"
 #include "utils/varbit.h"
 
 
@@ -18,10 +19,33 @@ PG_FUNCTION_INFO_V1(gbt_bit_picksplit);
 PG_FUNCTION_INFO_V1(gbt_bit_consistent);
 PG_FUNCTION_INFO_V1(gbt_bit_penalty);
 PG_FUNCTION_INFO_V1(gbt_bit_same);
+PG_FUNCTION_INFO_V1(gbt_bit_sortsupport);
+PG_FUNCTION_INFO_V1(gbt_varbit_sortsupport);
 
 
 /* define for comparison */
 
+static int
+gbt_bit_ssup_cmp(Datum x, Datum y, SortSupport ssup)
+{
+	GBT_VARKEY *key1 = PG_DETOAST_DATUM(x);
+	GBT_VARKEY *key2 = PG_DETOAST_DATUM(y);
+
+	GBT_VARKEY_R arg1 = gbt_var_key_readable(key1);
+	GBT_VARKEY_R arg2 = gbt_var_key_readable(key2);
+	Datum result;
+
+	/* lower is always equal to upper, so just compare those fields */
+	result = DirectFunctionCall2(byteacmp,
+ PointerGetDatum(arg1.lower),
+ PointerGetDatum(arg2.lower));
+
+	GBT_FREE_IF_COPY(key1, x);
+	GBT_FREE_IF_COPY(key2, y);
+
+	return DatumGetInt32(result);
+}
+
 static bool
 gbt_bitgt(const void *a, const void *b, Oid collation, Fm

Re: Some read stream improvements

2025-03-11 Thread Thomas Munro
On Thu, Feb 27, 2025 at 11:20 PM Andres Freund  wrote:
> On 2025-02-27 11:19:55 +1300, Thomas Munro wrote:
> > On Wed, Feb 26, 2025 at 10:55 PM Andres Freund  wrote:
> > > I was working on expanding tests for AIO and as part of that wrote a test 
> > > for
> > > temp tables -- our coverage is fairly awful, there were many times during 
> > > AIO
> > > development where I knew I had trivially reachable temp table specific 
> > > bugs
> > > but all tests passed.
> > >
> > > The test for that does trigger the problem described above and is fixed 
> > > by the
> > > patches in this thread (which I included in the other thread):

Here is a subset of those patches again:

1.  Per-backend buffer limit, take III.  Now the check is in
read_stream_start_pending_read() so TOC == TOU.

Annoyingly, test cases like the one below still fail, despite
following the rules.  The other streams eat all the buffers and then
one gets an allowance of zero, but uses its right to take one pin
anyway to make progress, and there isn't one.  I wonder if we should
use temp_buffers - 100?  Then leave the minimum GUC value at 100
still, so you have an easy way to test with 0, 1, ... additional
buffers?
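
Purely as a sketch of the behaviour described above (the function and variable
names are invented; this is not the actual read_stream.c/bufmgr.c code): each
stream gets a proportional allowance of pins, but a stream that holds nothing
is still permitted one pin so it can make progress, which is exactly the case
that fails when no buffer is actually free:

static int
stream_pin_allowance(int max_proportional_pins, int pins_already_held)
{
	int		remaining = max_proportional_pins - pins_already_held;

	if (remaining > 0)
		return remaining;

	/* allowance exhausted: a stream holding no pins may still take one */
	return (pins_already_held == 0) ? 1 : 0;
}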

2.  It shouldn't give up issuing random advice immediately after a
jump, or it could stall on (say) the second 128kB of a 256kB
sequential chunk (ie the strace you showed on the BHS thread).  It
only makes sense to assume kernel readahead takes over once you've
actually *read* sequentially.  In practice this makes it a lot more
aggressive about advice (like the BHS code in master): it only gives
up if the whole look-ahead window is sequential.

3.  Change the distance algorithm to care only about hits and misses,
not sequential heuristics.  It made at least some sense before, but it
doesn't make sense for AIO, and even in synchronous mode it means that
you hit random jumps with insufficient look-ahead, so I don't think we
should keep it.

I also realised that the sequential heuristics are confused by that
hidden trailing block thing, so in contrived pattern testing a
hit-miss-hit-miss... sequence would be considered sequential.  Even if
you fix that (the forwarding patches above do), an exact
hit-miss-hit-miss pattern still gets stuck oscillating between
distances 1 and 2 (double, decrement, double, ...).  It might be worth
waiting a bit longer before decrementing, IDK.
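
A purely illustrative sketch of that oscillation (not the actual
read_stream.c logic; the names and constants are invented): if the distance
is decremented on a hit and doubled on a miss, a strictly alternating
hit/miss pattern just bounces between 1 and 2 instead of ramping up:

int		distance = 1;
int		max_distance = 16;

for (int i = 0; i < 8; i++)
{
	int		hit = (i % 2 == 0);		/* contrived hit-miss-hit-miss pattern */

	if (hit)
		distance = (distance > 1) ? distance - 1 : 1;
	else
		distance = (distance * 2 < max_distance) ? distance * 2 : max_distance;

	/* distance stays at 1 or 2 and never approaches max_distance */
}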

I'll rebase the others and post soon.


set io_combine_limit = 32;
set temp_buffers = 100;

create temp table t1 as select generate_series(1, 1);
create temp table t2 as select generate_series(1, 1);
create temp table t3 as select generate_series(1, 1);
create temp table t4 as select generate_series(1, 1);
create temp table t5 as select generate_series(1, 1);

do
$$
declare
  c1 cursor for select * from t1;
  c2 cursor for select * from t2;
  c3 cursor for select * from t3;
  c4 cursor for select * from t4;
  c5 cursor for select * from t5;
  x record;
begin
  open c1;
  open c2;
  open c3;
  open c4;
  open c5;
  loop
fetch next from c1 into x;
exit when not found;
fetch next from c2 into x;
exit when not found;
fetch next from c3 into x;
exit when not found;
fetch next from c4 into x;
exit when not found;
fetch next from c5 into x;
exit when not found;
  end loop;
end;
$$;
From dd6b334cb744e71fc042057624e4953e3c039629 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Thu, 27 Feb 2025 21:03:39 +1300
Subject: [PATCH v2 1/4] Improve buffer manager API for backend pin limits.

Previously the support functions assumed that the caller needed one pin
to make progress, and could optionally use some more.  Add a couple more
functions for callers that want to know:

* what the maximum possible number could be irrespective of currently
  held pins, for space planning purposes, called the "soft pin limit"

* how many additional pins they could acquire right now, without the
  special case allowing one pin, for users that already hold pins and
  could make progress even if zero extra pins are available

These APIs are better suited to read_stream.c, which will be improved in
a follow-up patch.  Also compute MaxProportionalPins up front, to avoid
performing division whenever we check the balance.

Discussion: https://postgr.es/m/CA%2BhUKGK_%3D4CVmMHvsHjOVrK6t4F%3DLBpFzsrr3R%2BaJYN8kcTfWg%40mail.gmail.com
---
 src/backend/storage/buffer/bufmgr.c   | 83 +++
 src/backend/storage/buffer/localbuf.c | 16 ++
 src/include/storage/bufmgr.h  |  4 ++
 3 files changed, 78 insertions(+), 25 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 7915ed624c1..3f44681b1fe 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -211,6 +211,8 @@ static int32 PrivateRefCountOverflowed = 0;
 static uint32 PrivateRefCountClock = 0;
 static PrivateRefCountEntry *ReservedRefCountEntry = NULL;
 
+static uint32 MaxProportionalPins;
+
 stati

Re: AIO v2.5

2025-03-11 Thread Melanie Plageman
On Tue, Mar 11, 2025 at 1:56 PM Andres Freund  wrote:
>
> Hi,
>
> On 2025-03-11 11:31:18 -0400, Melanie Plageman wrote:
> > Commit messages for 0017-0020 are thin. I assume you will beef them up
> > a bit before committing.
>
> Yea. I wanted to get some feedback on whether these refactorings are a good
> idea or not...

I'd say yes, they seem like a good idea.

> > Really, though, those matter much less than 0016 which is an actual bug (or
> > pre-bug) fix. I called out the ones where I think you should really consider
> > adding more detail to the commit message.
> >
> > 0016:
>
> Do you think we should backpatch that change? It's not really an active bug in
> 16+, but it's also not quite right. The other changes surely shouldn't be
> backpatched...

I don't feel strongly about it. PinLocalBuffer() is called with
adjust_usagecount = false, and we have loads of other places where
things would just not work if we changed a boolean flag passed to a
function called there (bgwriter and SyncOneBuffer() with
skip_recently_used come to mind).

On the other hand it's a straightforward fix that only needs to be
backpatched a couple versions, so it definitely doesn't hurt.

> > +++ b/src/backend/storage/buffer/localbuf.c
> > @@ -56,6 +56,7 @@ static int NLocalPinnedBuffers = 0;
> >  static Buffer GetLocalVictimBuffer(void);
> > +static void InvalidateLocalBuffer(BufferDesc *bufHdr, bool 
> > check_unreferenced);
> >
> > Technically this line is too long
>
> Oh, do I love our line length limits. But, um, is it actually too long? It's
> 78 chars, which is exactly our limit, I think?

Teccchnically it's 79, which is why it showed up for me with this
handy line from the committing wiki page:

git diff origin/master -- src/backend/storage/buffer/localbuf.c | grep -E '^(\+|diff)' | sed 's/^+//' | expand -t4 | awk "length > 78 || /^diff/"

But anyway, it doesn't really matter. I only mentioned it because I
noticed it visually looked long.

> > +if (forInput ? (buf_state & BM_VALID) : !(buf_state & BM_DIRTY))
> > +{
> > +/* someone else already did the I/O */
> > +UnlockBufHdr(bufHdr, buf_state);
> > +return false;
> > +}
> >
> > UnlockBufHdr() explicitly says it should not be called for local
> > buffers. I know that code is unreachable right now, but it doesn't
> > feel quite right. I'm not sure what the architecture of AIO local
> > buffers will be like, but if other processes can't access these
> > buffers, I don't know why you would need BM_LOCKED. And if you will, I
> > think you need to edit the UnlockBufHdr() comment.
>
> You are right, this is a bug in my change. I started with a copy of
> StartBufferIO() and whittled it down insufficiently. Thanks for catching that!
>
> Wonder if we should add an assert against this to UnlockBufHdr()...

Yea, I think that makes sense.
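
For illustration, a hedged sketch of what such a guard could look like
(simplified; the real UnlockBufHdr() lives in buf_internals.h and this is
not the actual patch):

static inline void
UnlockBufHdr(BufferDesc *desc, uint32 buf_state)
{
	/* header spinlocks are only used for shared buffers, never local ones */
	Assert(!BufferIsLocal(BufferDescriptorGetBuffer(desc)));
	Assert(buf_state & BM_LOCKED);

	pg_write_barrier();
	pg_atomic_write_u32(&desc->state, buf_state & ~BM_LOCKED);
}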

- Melanie




Re: maintenance_work_mem = 64kB doesn't work for vacuum

2025-03-11 Thread John Naylor
On Mon, Mar 10, 2025 at 1:46 AM Masahiko Sawada  wrote:
>
> Commit bbf668d66fbf6 (back-patched to v17) lowered the minimum
> maintenance_work_mem to 64kB, but it doesn't work for parallel vacuum

That was done in the first place to make a regression test for a bug
fix easier, but that test never got committed. In any case I found it
worked back in July:

https://www.postgresql.org/message-id/CANWCAZZb7wd403wHQQUJZjkF%2BRWKAAa%2BWARP0Rj0EyMcfcdN9Q%40mail.gmail.com

-- 
John Naylor
Amazon Web Services




Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

2025-03-11 Thread Sami Imseih
> It seems to me that if this fixes the issue, that the next similar one
> is already lurking in the shadows waiting to jump out on us.

Yes, it's true that there may be other cases, but I am not sure it's
worth carrying all the extra bytes in the jumble to deal with a few
cases like this. This is why I don't think Variant B or tracking the
offset is a thrilling idea. -1 for me.

--
Sami




Re: vacuumdb changes for stats import/export

2025-03-11 Thread Nathan Bossart
On Mon, Mar 10, 2025 at 10:08:49AM -0500, Nathan Bossart wrote:
> On Mon, Mar 10, 2025 at 12:35:22PM +0700, John Naylor wrote:
>> I have no further comments.
> 
> Thanks.  I'll give this a little more time for review before committing.

I realized that we could limit the catalog query reuse to only when
--missing-only is specified, so I've updated 0001 and 0002 accordingly.
This avoids changing any existing behavior.
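
For example, a hypothetical sketch of the reuse pattern the 0001 commit
message below describes (not code from the patch): the first stage lets
vacuum_one_database() fill the list from its catalog query, and the later
stages pass the same list back in so the query is not repeated:

SimpleStringList *found_objs = NULL;

for (int stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
	vacuum_one_database(&cparams, &vacopts, stage,
						&objects, &found_objs,
						concurrentCons, progname, echo, quiet);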

> We'll still need to update the recommendation in pg_upgrade's
> documentation.  I'm going to keep that separate because the stats
> import/export work is still settling.

0003 is a first attempt at this.  Unless I am missing something, there's
really not much to update.

-- 
nathan
>From 3840b2456e5485a591becc4cfe19dba4b1a54f09 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Tue, 11 Mar 2025 11:18:36 -0500
Subject: [PATCH v5 1/3] vacuumdb: Teach vacuum_one_database() to reuse query
 results.

Presently, each call to vacuum_one_database() performs a catalog
query to retrieve the list of tables to process.  A proposed
follow-up commit would add a "missing only" feature to
--analyze-in-stages, which requires us to save the results of the
catalog query (since tables without statistics would have them
after the first stage).  This commit adds this ability via a new
parameter for vacuum_one_database() that specifies either a
previously-retrieved list to process or a place to store the
results of the catalog query for later use.  Note that this commit
does not make use of this new parameter for anything.  That is left
as a future exercise.

The trade-offs of this approach are increased memory usage and less
responsiveness to concurrent catalog changes in later stages,
neither of which is expected to bother anyone.

Author: Corey Huinker 
Co-authored-by: Nathan Bossart 
Reviewed-by: John Naylor 
Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan
---
 src/bin/scripts/vacuumdb.c | 330 ++---
 1 file changed, 194 insertions(+), 136 deletions(-)

diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 982bf070be6..e28f82a0eba 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -62,10 +62,16 @@ typedef enum
 
 static VacObjFilter objfilter = OBJFILTER_NONE;
 
+static SimpleStringList *retrieve_objects(PGconn *conn,
+										  vacuumingOptions *vacopts,
+										  SimpleStringList *objects,
+										  bool echo);
+
 static void vacuum_one_database(ConnParams *cparams,
 								vacuumingOptions *vacopts,
 								int stage,
 								SimpleStringList *objects,
+								SimpleStringList **found_objs,
 								int concurrentCons,
 								const char *progname, bool echo, bool quiet);
 
@@ -405,7 +411,7 @@ main(int argc, char *argv[])
 			{
 				vacuum_one_database(&cparams, &vacopts,
 									stage,
-									&objects,
+									&objects, NULL,
 									concurrentCons,
 									progname, echo, quiet);
 			}
@@ -413,7 +419,7 @@ main(int argc, char *argv[])
 		else
 			vacuum_one_database(&cparams, &vacopts,
 								ANALYZE_NO_STAGE,
-								&objects,
+								&objects, NULL,
 								concurrentCons,
 								progname, echo, quiet);
 	}
@@ -461,8 +467,36 @@ escape_quotes(const char *src)
 /*
  * vacuum_one_database
  *
- * Process tables in the given database.  If the 'objects' list is empty,
- * process all tables in the database.
+ * Process tables in the given database.
+ *
+ * There are two ways to specify the list of objects to process:
+ *
+ * 1) The "found_objs" parameter is a double pointer to a fully qualified list
+ *of objects to process, as returned by a previous call to
+ *vacuum_one_database().
+ *
+ * a) If both "found_objs" (the double pointer) and "*found_objs" (the
+ 

Re: per backend WAL statistics

2025-03-11 Thread Bertrand Drouvot
Hi,

On Mon, Mar 10, 2025 at 03:08:49PM +0300, Nazir Bilal Yavuz wrote:
> Hi,
> 
> Thank you for working on this!
> 
> I just started reading the code and have a couple of questions.

Thanks for looking at it!

> I think that every time we flush IO or WAL stats, we want(?) to flush
> backend stats as well,

Yeah, I think that's happening anyway.

> so would it make sense to move
> pgstat_flush_backend() calls to inside of pgstat_flush_io() and
> pgstat_wal_flush_cb()?

I don't think so, because pgstat_flush_backend() still needs to be called by the
pgstat_backend_flush_cb() (i.e. flush_static_cb) callback (I mean, I think it
makes sense to keep this callback around and to have it really do something).

So for example, for the WAL case, that would mean the backend WAL stats would
be flushed twice: once from the pgstat_wal_flush_cb() (i.e. flush_static_cb)
callback and once from the pgstat_backend_flush_cb() (another flush_static_cb)
callback.

I think it's better to keep them separate and to reason about them as distinct
types of stats (which they really are). I think we had the same kind of
reasoning while working on [1].

> I see that backend statistics are not collected
> for some of the backend types but that is already checked in the
> pgstat_flush_backend() with pgstat_tracks_backend_bktype().

Sorry, I don't get it. Do you have a question around that?

> Also, is there a chance that wal_bytes gets incremented without
> wal_records getting incremented? I searched the code and did not find
> any example of that but I just wanted to be sure. If there is a case
> like that, then pgstat_backend_wal_have_pending() needs to check
> wal_bytes instead of wal_records.

I think that's fine. That's also how pgstat_wal_have_pending_cb() has been 
re-factored in 2421e9a51d2.

[1]: 
https://www.postgresql.org/message-id/flat/Z0QjeIkwC0HNI16K%40ip-10-97-1-34.eu-west-3.compute.internal#16762c1767ffb168318e7c3734fa5f64

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Changing the state of data checksums in a running cluster

2025-03-11 Thread Tomas Vondra
Hi,

I continued stress testing this, as I was rather unsure why the assert
failures reported in [1] disappeared. And I managed to reproduce that
again, and I think I actually understand why it happens.

I modified the test script (attached) to set up replication, not just a
single instance. And then it does a bit of work, flips the checksums,
restarts the instances (randomly, fast/immediate), verifies the checksums
and so on. And I can hit this assert in AbsorbChecksumsOnBarrier()
pretty easily:

  Assert(LocalDataChecksumVersion ==
 PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);

The reason is pretty simple - this happens on the standby:

1) standby receives XLOG_CHECKSUMS and applies it from 2 to 1 (i.e. it
sets ControlFile->data_checksum_version from "inprogress-on" to "on"),
and signals all other processes to refresh LocalDataChecksumVersion

2) the control file gets written to disk for whatever reason (redo does
this in a number of places)

3) standby gets restarted with "immediate" mode (I'm not sure if this
can happen with "fast" mode, I only recall seeing "immediate")

4) the standby receives the XLOG_CHECKSUMS record *again*, updates the
ControlFile->data_checksum_version (to the same value, no noop), and
then signals the other processes again

5) the other processes already have LocalDataChecksumVersion=1 (on), but
the assert says it should be 2 (inprogress-on) => kaboom

I believe this can happen for changes in either direction, although the
window while disabling checksums is more narrow.


I'm not sure what to do about this. Maybe we could relax the assert in
some way? But that seems a bit ... possibly risky. It's not necessarily
true we'll see the immediately preceding checksum state, we might see a
couple updates back (if the control file was not updated in between).

Could this affect checksum verification during recovery? Imagine we get
to the "on" state, the controlfile gets flushed, and then the standby
restarts and starts receiving older records again. The control file says
we should be verifying checksums, but couldn't some of the writes have
been lost (and so the pages may not have a valid checksum)?

The one idea I have is to create an "immediate" restartpoint in
xlog_redo() right after XLOG_CHECKSUMS updates the control file. AFAICS
a "spread" restartpoint would not be enough, because then we could get
into the same situation with a control file of sync (ahead of WAL) after
a restart. It'd not be cheap, but it should be a rare operation ...

I was wondering if the primary has the same issue, but AFAICS it does
not. It flushes the control file in only a couple places, I couldn't
think of a way to get it out of sync.


regards

[1]
https://www.postgresql.org/message-id/e4dbcb2c-e04a-4ba2-bff0-8d979f55960e%40vondra.me

-- 
Tomas Vondra


test.sh
Description: application/shellscript


Re: table_tuple_lock's snapshot argument

2025-03-11 Thread Ranier Vilela
Em seg., 10 de mar. de 2025 às 13:52, Heikki Linnakangas 
escreveu:

> On 10/03/2025 18:20, Ranier Vilela wrote:
> > Just for curiosity.
> > *RelationFindReplTupleSeq* on the same source, does not suffer
> > from the same issue?
> >
> > PushActiveSnapshot(GetLatestSnapshot());
> >
> > res = table_tuple_lock(rel, &(outslot->tts_tid), GetLatestSnapshot(),
>
> :facepalm: yep, it sure does, and FindConflictTuple() too. Thanks, I'll
> go fix those too.
>
Thanks.

best regards,
Ranier Vilela

