date:20250624

Re: Improve doc on parallel stream changes for Stream Abort message

2025-06-24 Thread Amit Kapila

On Tue, Jun 24, 2025 at 11:58 AM Anthonin Bonnefoy
 wrote:
>
> On Tue, Jun 24, 2025 at 7:26 AM Amit Kapila  wrote:
> > How about a slightly modified version like: (a) The LSN of the abort
> > operation, present only when the change stream can be applied in
> > parallel. This field is available since protocol version 4. (b) Abort
> > timestamp of the transaction, present only when the change stream can
> > be applied in parallel. The value is in number of microseconds since
> > PostgreSQL epoch (2000-01-01). This field is available since protocol
> > version 4.
>
> What about ", present only when streaming is set to parallel"? I think
> clarifying the relation between streaming=parallel and the presence of
> those fields is the important part.
>

Works for me. I'll wait till tomorrow morning to see if there are any
further comments, and then push it.

-- 
With Regards,
Amit Kapila.

Remove unneeded check for XLH_INSERT_ALL_FROZEN in heap_xlog_insert

2025-06-24 Thread Melanie Plageman

Hi,

I noticed that 8e03eb92e9a forgot one line when reverting 39b66a91bd.
There is an extraneous check for XLH_INSERT_ALL_FROZEN_SET in
heap_xlog_insert() -- even though heap_insert() never freezes tuples.
It doesn't hurt anything, but I found it confusing, so I think it is
worth removing. I don't think it's worth backpatching, so I don't know
if that means that this commit shouldn't go in master until after we
branch.

- Melanie
From 96e9896bd737c9e2a647f5cb04465bfeb6d0c041 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Tue, 24 Jun 2025 13:32:15 -0400
Subject: [PATCH v1] Remove unused check in heap_xlog_insert()

8e03eb92e9ad54e2 reverted the commit 39b66a91bd which allowed freezing
in the heap_insert() code path but forgot to remove the corresponding
check in heap_xlog_insert(). This code is extraneous but not harmful.
However, cleaning it up makes it very clear that, as of now, we do not
support any freezing of pages in the heap_insert() path.
---
 src/backend/access/heap/heapam_xlog.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 30f4c2d3c67..eb4bd3d6ae3 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -438,6 +438,9 @@ heap_xlog_insert(XLogReaderState *record)
 	ItemPointerSetBlockNumber(&target_tid, blkno);
 	ItemPointerSetOffsetNumber(&target_tid, xlrec->offnum);
 
+	/* No freezing in the heap_insert() code path */
+	Assert(!(xlrec->flags & XLH_INSERT_ALL_FROZEN_SET));
+
 	/*
 	 * The visibility map may need to be fixed even if the heap page is
 	 * already up-to-date.
@@ -508,10 +511,6 @@ heap_xlog_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
-		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
-			PageSetAllVisible(page);
-
 		MarkBufferDirty(buffer);
 	}
 	if (BufferIsValid(buffer))
-- 
2.34.1

Fix some inconsistencies with open-coded visibilitymap_set() callers

2025-06-24 Thread Melanie Plageman

Hi,

visibilitymap_set() arguably breaks a few of the coding rules for
modifying and WAL logging buffers set out in
src/backend/access/transam/README.

In 4 of visibilitymap_set()'s 5 non-recovery callers, we set
PD_ALL_VISIBLE and mark the heap buffer dirty outside of the critical
section in which we make the VM changes and emit WAL.

In at least two of its callers, before visibilitymap_set() is called,
MarkBufferDirty() is used when MarkBufferDirtyHint() would be
appropriate (when checksums/wal_log_hints are disabled) -- because we
are not making other changes to the heap page.

And in at least one of its callers (see lazy_scan_prune()), where
PD_ALL_VISIBLE is already set and we don't mark the buffer dirty and
checksums/wal_log_hints are enabled, visibilitymap_set() will still
set the heap page LSN -- even though we didn't set the buffer dirty.

It may be uncommon for the page to be set PD_ALL_VISIBLE and the VM
bit to be clear because of a crash or error after modifying the heap
page but before setting the VM. But it seems easy for us to be only
setting the all-frozen bit in the VM and thus not need to set
PD_ALL_VISIBLE or mark the heap buffer dirty. In that case, we'll
incorrectly set the page LSN without having marked the buffer dirty
(when wal_log_hints/checksums on).

Besides all of these issues, having these operations open-coded all
over the place is error-prone. It's easy to forget to always
PageSetAllVisible() before visibilitymap_set().

I've attached a patch to add a heap-specific wrapper for
visibilitymap_set() that attempts to handle all cases.

One thing I noticed is that log_heap_visible() will do
XLogRegisterBuffer() for the heap page -- even if we made no changes
to the heap page. It seems like we shouldn't do that. The most common
case would be when setting all-visible pages all-frozen when checksums
are not enabled. I don't actually know how we handle replay when we
are not sure which blocks might be registered, though.

Besides feeling like principled cleanup, this patch is a prerequisite
for a larger project I am working on [1] (targeting 19) to combine VM
updates in the same WAL record as the heap page changes and eliminate
xl_heap_visible.

- Melanie

[1] 
https://www.postgresql.org/message-id/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
From 4a654c720fcea84a320d5e9378976d9bafd76dd2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Tue, 24 Jun 2025 15:05:41 -0400
Subject: [PATCH v1] Introduce heap-specific wrapper for visibilitymap_set()

visibilitymap_set(), which sets bits in the visibility map corresponding
to the heap block of the table passed in, arguably breaks a few of the
coding rules for modifying and WAL logging buffers set out in
access/transam/README.

In several of the places where visibilitymap_set() is called, setting
the heap page PD_ALL_VISIBLE and marking the buffer dirty are done
outside of a critical section.

In some places before visibilitymap_set() is called, MarkBufferDirty()
is used when MarkBufferDirtyHint() would be appropriate.

And in some places where PD_ALL_VISIBLE may already be set and we don't
mark the buffer dirty, when checksums/wal_log_hints are enabled
visibilitymap_set() will still set the heap page LSN -- even though it
was correct not to set the buffer dirty.

Besides all of these issues, having these operations open-coded all over
the place is error-prone. This commit introduces a wrapper that does the
correct operations to the heap page itself and invokes
visibilitymap_set() to make changes to the VM page.
---
 src/backend/access/heap/heapam.c| 92 -
 src/backend/access/heap/heapam_xlog.c   |  2 +-
 src/backend/access/heap/vacuumlazy.c| 66 +-
 src/backend/access/heap/visibilitymap.c | 58 ++--
 src/include/access/heapam.h |  3 +
 src/include/access/visibilitymap.h  |  2 +-
 6 files changed, 117 insertions(+), 106 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0dcd6ee817e..112f946dab0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2505,8 +2505,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 BufferGetBlockNumber(buffer),
 vmbuffer, VISIBILITYMAP_VALID_BITS);
 		}
-		else if (all_frozen_set)
-			PageSetAllVisible(page);
 
 		/*
 		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2632,23 +2630,16 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 
 		/*
 		 * If we've frozen everything on the page, update the visibilitymap.
-		 * We're already holding pin on the vmbuffer.
+		 * We're already holding pin on the vmbuffer. It's fine to use
+		 * InvalidTransactionId as the cutoff_xid here - this is only used
+		 * when HEAP_INSERT_FROZEN is specified, which intentionally violates
+		 * visibility rules.
 		 */
 		if (all_frozen_set)
-		{
-

Re: regdatabase

2025-06-24 Thread Nathan Bossart

Here is what I have staged for commit.

-- 
nathan
>From 41168622a142ae40e43f9d71b8ed1e992fe4e4a2 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Tue, 24 Jun 2025 14:57:31 -0500
Subject: [PATCH v7 1/1] Add new OID alias type regdatabase.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Not much to say here: does what it says on the tin.  Like the
regrole type, regdatabase has cluster-wide scope, so we disallow
regdatabase constants from appearing in stored expressions.

XXX: NEEDS CATVERSION BUMP

Author: Ian Lawrence Barwick 
Reviewed-by: Greg Sabino Mullane 
Reviewed-by: Jian He 
Reviewed-by: Fabrízio de Royes Mello 
Reviewed-by: Tom Lane 
Discussion: https://postgr.es/m/aBpjJhyHpM2LYcG0%40nathan
---
 contrib/postgres_fdw/deparse.c|   6 +
 doc/src/sgml/datatype.sgml|  15 +-
 doc/src/sgml/func.sgml|  17 +++
 doc/src/sgml/ref/pgupgrade.sgml   |   3 +-
 src/backend/bootstrap/bootstrap.c |   2 +
 src/backend/catalog/dependency.c  |  11 ++
 src/backend/utils/adt/regproc.c   | 118 +++
 src/backend/utils/adt/selfuncs.c  |   2 +
 src/backend/utils/cache/catcache.c|   1 +
 src/bin/pg_upgrade/check.c|   1 +
 src/include/catalog/pg_cast.dat   |  14 ++
 src/include/catalog/pg_proc.dat   |  17 +++
 src/include/catalog/pg_type.dat   |   5 +
 src/test/regress/expected/regproc.out | 174 ++
 src/test/regress/expected/type_sanity.out |   1 +
 src/test/regress/sql/regproc.sql  |  38 +
 src/test/regress/sql/type_sanity.sql  |   1 +
 17 files changed, 423 insertions(+), 3 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index d9970dd6753..9351835b5e4 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -39,6 +39,7 @@
 #include "catalog/pg_aggregate.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_collation.h"
+#include "catalog/pg_database.h"
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_operator.h"
 #include "catalog/pg_opfamily.h"
@@ -455,6 +456,11 @@ foreign_expr_walker(Node *node,

  AuthIdRelationId, fpinfo))
return false;
break;
+   case REGDATABASEOID:
+   if 
(!is_shippable(DatumGetObjectId(c->constvalue),
+   
  DatabaseRelationId, fpinfo))
+   return false;
+   break;
}
}
 
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 09309ba0390..49a7c180a80 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4737,6 +4737,10 @@ INSERT INTO mytable VALUES(-1);  -- fails
 regconfig

 
+   
+regdatabase
+   
+

 regdictionary

@@ -4878,6 +4882,13 @@ SELECT * FROM pg_attribute
 english

 
+   
+regdatabase
+pg_database
+database name
+template1
+   
+

 regdictionary
 pg_ts_dict
@@ -5049,8 +5060,8 @@ WHERE ...
 be dropped without first removing the default expression.  The
 alternative of nextval('my_seq'::text) does not
 create a dependency.
-(regrole is an exception to this property. Constants of this
-type are not allowed in stored expressions.)
+(regdatabase and regrole are exceptions to this
+property.  Constants of these types are not allowed in stored expressions.)

 

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 224d4fe5a9f..2d693e907db 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26750,6 +26750,23 @@ SELECT currval(pg_get_serial_sequence('sometable', 
'id'));

   
 
+  
+   
+
+ to_regdatabase
+
+to_regdatabase ( text )
+regdatabase
+   
+   
+Translates a textual database name to its OID.  A similar result is
+obtained by casting the string to type regdatabase (see
+); however, this function will return
+NULL rather than throwing an error if the name is
+not found.
+   
+  
+
   

 
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index aeeed297437..5ddf3a8ae92 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -1110,7 +1110,8 @@ psql --username=postgres --file=script.sql postgres
 regproc
 regprocedur

Re: [PATCH] Fix OAuth feature detection on OpenBSD+Meson

2025-06-24 Thread Peter Eisentraut


On 24.06.25 01:36, Jacob Champion wrote:

I noticed that the OpenBSD build in CI wasn't running the libcurl
tests. Turns out the feature test I added in b0635bfda is subtly
broken, because it uses cc.check_header() rather than cc.has_header().
On OpenBSD, apparently, the  header can't be compiled
without including additional prerequisite headers.

The attached patch fixes that by just switching to has_header(). (I
could add the prerequisites to the test instead, but I'd prefer to
exactly match the logic we use to determine the value of the
HAVE_SYS_EVENT_H macro. I don't think I had a good reason not to in
the first place.)


Note that Autoconf uses a compilation test, not a preprocessor test, for 
its AC_CHECK_HEADERS, so it uses .check_header() semantics.  And this 
was the result of a long transition, because the compile test was 
ultimately deemed to be better.  So in general, I would be wary about 
moving away from .check_header() toward .has_header().  But it looks 
like meson.build mixes those without much of a pattern, so maybe it 
doesn't matter for now.


But I'm also suspicious, because by this explanation, the 
AC_CHECK_HEADERS calls on sys/event.h should fail on OpenBSD, but they 
do not on the existing buildfarm members.

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-24 Thread Tomas Vondra

On 6/24/25 17:30, Christoph Berg wrote:
> Re: Tomas Vondra
>> If it's a reliable fix, then I guess we can do it like this. But won't
>> that be a performance penalty on everyone? Or does the system split the
>> array into 16-element chunks anyway, so this makes no difference?
> 
> There's still the overhead of the syscall itself. But no idea how
> costly it is to have this 16-step loop in user or kernel space.
> 
> We could claim that on 32-bit systems, shared_buffers would be smaller
> anyway, so there the overhead isn't that big. And the step size should
> be larger (if at all) on 64-bit.
> 
>> Anyway, maybe we should start by reporting this to the kernel people. Do
>> you want me to do that, or shall one of you take care of that? I suppose
>> that'd be better, as you already wrote a fix / know the code better.
> 
> Submitted: https://marc.info/?l=linux-mm&m=175077821909222&w=2
> 

Thanks! Now we wait ...

Attached is a minor tweak of the valgrind suppresion rules, to add the
two places touching the memory. I was hoping I could add a single rule
for pg_numa_touch_mem_if_required, but that does not work - it's a
macro, not a function. So I had to add one rule for both functions,
querying the NUMA. That's a bit disappointing, because it means it'll
hide all other failues (of Memcheck:Addr8 type) in those functions.

Perhaps it'd be be better to turn pg_numa_touch_mem_if_required into a
proper (inlined) function, at least with USE_VALGRIND defined. Something
like the v2 patch - needs more testing to ensure the inlined function
doesn't break the touching or something silly like that.

regards

-- 
Tomas Vondra
diff --git a/src/tools/valgrind.supp b/src/tools/valgrind.supp
index 7ea464c8094..36bf3253f76 100644
--- a/src/tools/valgrind.supp
+++ b/src/tools/valgrind.supp
@@ -180,3 +180,22 @@
Memcheck:Cond
fun:PyObject_Realloc
 }
+
+# Querying NUMA node for shared memory requires touching the memory so
+# that it gets allocated in the process. But we'll touch memory backing
+# buffers, but that memory may be marked as noaccess for buffers that
+# are not pinned. So just ignore that, we're not really accessing the
+# buffers, for both places querying the NUMA status.
+{
+   pg_buffercache_numa_pages
+   Memcheck:Addr8
+   fun:pg_buffercache_numa_pages
+   fun:ExecMakeTableFunctionResult
+}
+
+{
+   pg_get_shmem_allocations_numa
+   Memcheck:Addr8
+   fun:pg_get_shmem_allocations_numa
+   fun:ExecMakeTableFunctionResult
+}
diff --git a/src/include/port/pg_numa.h b/src/include/port/pg_numa.h
index 40f1d324dcf..3b9a5b42898 100644
--- a/src/include/port/pg_numa.h
+++ b/src/include/port/pg_numa.h
@@ -24,9 +24,22 @@ extern PGDLLIMPORT int pg_numa_get_max_node(void);
  * This is required on Linux, before pg_numa_query_pages() as we
  * need to page-fault before move_pages(2) syscall returns valid results.
  */
+#ifdef USE_VALGRIND
+
+static inline void
+pg_numa_touch_mem_if_required(uint64 tmp, char *ptr)
+{
+	volatile uint64 ro_volatile_var pg_attribute_unused();
+	ro_volatile_var = *(volatile uint64 *) ptr;
+}
+
+#else
+
 #define pg_numa_touch_mem_if_required(ro_volatile_var, ptr) \
 	ro_volatile_var = *(volatile uint64 *) ptr
 
+#endif
+
 #else
 
 #define pg_numa_touch_mem_if_required(ro_volatile_var, ptr) \
diff --git a/src/tools/valgrind.supp b/src/tools/valgrind.supp
index 7ea464c8094..6b9a8998f82 100644
--- a/src/tools/valgrind.supp
+++ b/src/tools/valgrind.supp
@@ -180,3 +180,14 @@
Memcheck:Cond
fun:PyObject_Realloc
 }
+
+# Querying NUMA node for shared memory requires touching the memory so
+# that it gets allocated in the process. But we'll touch memory backing
+# buffers, but that memory may be marked as noaccess for buffers that
+# are not pinned. So just ignore that, we're not really accessing the
+# buffers, for all places querying the NUMA status.
+{
+   pg_numa_touch_mem_if_required
+   Memcheck:Addr8
+   fun:pg_numa_touch_mem_if_required
+}

Re: like pg_shmem_allocations, but fine-grained for DSM registry ?

2025-06-24 Thread Nathan Bossart

I think this will need some rework to deal with the addition of
GetNamedDSA() and GetNamedDSHash() [0].  We probably want to add a "type"
column to show whether the entry is for a DSM, DSA, or dshash table.  And
for DSAs and dshash tables, we probably want to use dsa_get_total_size()
for the "size" column.

[0] https://postgr.es/m/flat/aEC8HGy2tRQjZg_8@nathan

-- 
nathan

Re: [PATCH] Fix OAuth feature detection on OpenBSD+Meson

2025-06-24 Thread Jacob Champion

On Tue, Jun 24, 2025 at 1:29 PM Peter Eisentraut  wrote:
> Note that Autoconf uses a compilation test, not a preprocessor test, for
> its AC_CHECK_HEADERS, so it uses .check_header() semantics.  And this
> was the result of a long transition, because the compile test was
> ultimately deemed to be better.  So in general, I would be wary about
> moving away from .check_header() toward .has_header().  But it looks
> like meson.build mixes those without much of a pattern, so maybe it
> doesn't matter for now.

I don't mind moving in that direction, but I do want the two sides to
match. So if it was good enough up to this point to use has_header()
for our feature macros, I don't think I want to try to change that for
18.

> But I'm also suspicious, because by this explanation, the
> AC_CHECK_HEADERS calls on sys/event.h should fail on OpenBSD, but they
> do not on the existing buildfarm members.

I think Andres tracked that discrepancy down [1]:

> Gah, configure does pass - because AC_CHECK_HEADER(), if includes is not 
> passed
> in, first includes what's defined in AC_INCLUDES_DEFAULT.

Thanks!
--Jacob

[1] 
https://www.postgresql.org/message-id/637haqqyhg2wlz7q6wq25m2qupe67g7f2uupngzui64zypy4x2%40ysr2xnmynmu4

Re: [PATCH] Fix OAuth feature detection on OpenBSD+Meson

2025-06-24 Thread Tom Lane

Peter Eisentraut  writes:
> Note that Autoconf uses a compilation test, not a preprocessor test, for 
> its AC_CHECK_HEADERS, so it uses .check_header() semantics.  And this 
> was the result of a long transition, because the compile test was 
> ultimately deemed to be better.  So in general, I would be wary about 
> moving away from .check_header() toward .has_header().  But it looks 
> like meson.build mixes those without much of a pattern, so maybe it 
> doesn't matter for now.

> But I'm also suspicious, because by this explanation, the 
> AC_CHECK_HEADERS calls on sys/event.h should fail on OpenBSD, but they 
> do not on the existing buildfarm members.

I was curious about this, so I tried it on a handy OpenBSD 7.7
installation.  Indeed, sys/event.h does not compile on its own:

$ cat tst.c
#include 

int main() { return 0; }

$ cc tst.c  
In file included from tst.c:1:
/usr/include/sys/event.h:57:2: error: unknown type name '__uintptr_t'; did you 
mean '__uint128_t'?
__uintptr_t ident;  /* identifier for this event */
^
note: '__uint128_t' declared here
/usr/include/sys/event.h:61:2: error: unknown type name '__int64_t'; did you 
mean '__int128_t'?
__int64_t   data;   /* filter data value */
^
note: '__int128_t' declared here
2 errors generated.

Whether this is intentional is hard to say, because I can't find
either event.h or any of the functions it declares in the OpenBSD
man pages.  But anyway, AC_CHECK_HEADERS does think that 
is available, and that's because it does NOT just blindly include
the header-to-test.  It includes assorted standard headers
such as  first, thus dodging the problem.

I confirm Jacob's result that our meson.build fails to think
that  is available, so we do need to do something.
I'm not excited about dropping the compilability check though.
If meson can't do that more like autoconf does it, I foresee
needing to build ad-hoc reimplementations of autoconf's logic.

regards, tom lane

Re: add function for creating/attaching hash table in DSM registry

2025-06-24 Thread Nathan Bossart

Here is what I have staged for commit.

-- 
nathan
>From e344a2757838e6b99168d4aa1093fee4da18d8a8 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Tue, 24 Jun 2025 16:32:00 -0500
Subject: [PATCH v11 1/1] Add GetNamedDSA() and GetNamedDSHash().
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Presently, the dynamic shared memory (DSM) registry only provides
GetNamedDSMSegment(), which allocates a fixed-size segment.  To use
the DSM registry for more sophisticated things like dynamic shared
memory areas (DSAs) or a hash table backed by a DSA (dshash), users
need to create a DSM segment that stores various handles and LWLock
tranche IDs and to write fairly complicated initialization code.
Furthermore, there is likely little variation in this
initialization code between libraries.

This commit introduces functions that simplify allocating a DSA or
dshash within the DSM registry.  These functions are very similar
to GetNamedDSMSegment().  Notable differences include the lack of
an initialization callback parameter and the prohibition of calling
the functions more than once for a given entry in each backend
(which should be trivially avoidable in most circumstances).  While
at it, this commit bumps the maximum DSM registry entry name length
from 63 bytes to 127 bytes.

The test_dsm_registry test module contains tests for the new
functions and also serves as a complete usage example.

Reviewed-by: Dagfinn Ilmari Mannsåker 
Reviewed-by: Sami Imseih 
Reviewed-by: Florents Tselai 
Reviewed-by: Rahila Syed 
Discussion: https://postgres./m/aEC8HGy2tRQjZg_8%40nathan
---
 src/backend/storage/ipc/dsm_registry.c| 265 +-
 src/backend/utils/mmgr/dsa.c  |  15 +
 src/include/storage/dsm_registry.h|   7 +-
 src/include/utils/dsa.h   |   1 +
 .../expected/test_dsm_registry.out|  26 +-
 .../sql/test_dsm_registry.sql |   6 +-
 .../test_dsm_registry--1.0.sql|  10 +-
 .../test_dsm_registry/test_dsm_registry.c | 111 ++--
 src/tools/pgindent/typedefs.list  |   5 +
 9 files changed, 400 insertions(+), 46 deletions(-)

diff --git a/src/backend/storage/ipc/dsm_registry.c 
b/src/backend/storage/ipc/dsm_registry.c
index 1d4fd31ffed..828c2ff0c7f 100644
--- a/src/backend/storage/ipc/dsm_registry.c
+++ b/src/backend/storage/ipc/dsm_registry.c
@@ -15,6 +15,20 @@
  * current backend.  This function guarantees that only one backend
  * initializes the segment and that all other backends just attach it.
  *
+ * A DSA can be created in or retrieved from the registry by calling
+ * GetNamedDSA().  As with GetNamedDSMSegment(), if a DSA with the provided
+ * name does not yet exist, it is created.  Otherwise, GetNamedDSA()
+ * ensures the DSA is attached to the current backend.  This function
+ * guarantees that only one backend initializes the DSA and that all other
+ * backends just attach it.
+ *
+ * A dshash table can be created in or retrieved from the registry by
+ * calling GetNamedDSHash().  As with GetNamedDSMSegment(), if a hash
+ * table with the provided name does not yet exist, it is created.
+ * Otherwise, GetNamedDSHash() ensures the hash table is attached to the
+ * current backend.  This function guarantees that only one backend
+ * initializes the table and that all other backends just attach it.
+ *
  * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
@@ -32,6 +46,12 @@
 #include "storage/shmem.h"
 #include "utils/memutils.h"
 
+#define DSMR_NAME_LEN  128
+
+#define DSMR_DSA_TRANCHE_SUFFIX" DSA"
+#define DSMR_DSA_TRANCHE_SUFFIX_LEN (sizeof(DSMR_DSA_TRANCHE_SUFFIX) - 1)
+#define DSMR_DSA_TRANCHE_NAME_LEN  (DSMR_NAME_LEN + 
DSMR_DSA_TRANCHE_SUFFIX_LEN)
+
 typedef struct DSMRegistryCtxStruct
 {
dsa_handle  dsah;
@@ -40,15 +60,48 @@ typedef struct DSMRegistryCtxStruct
 
 static DSMRegistryCtxStruct *DSMRegistryCtx;
 
-typedef struct DSMRegistryEntry
+typedef struct NamedDSMState
 {
-   charname[64];
dsm_handle  handle;
size_t  size;
+} NamedDSMState;
+
+typedef struct NamedDSAState
+{
+   dsa_handle  handle;
+   int tranche;
+   chartranche_name[DSMR_DSA_TRANCHE_NAME_LEN];
+} NamedDSAState;
+
+typedef struct NamedDSHState
+{
+   NamedDSAState dsa;
+   dshash_table_handle handle;
+   int tranche;
+   chartranche_name[DSMR_NAME_LEN];
+} NamedDSHState;
+
+typedef enum DSMREntryType
+{
+   DSMR_ENTRY_TYPE_DSM,
+   DSMR_ENTRY_TYPE_DSA,
+   DSMR_ENTRY_TYPE_DSH,
+} DSMREntryType;
+
+typedef struct DSMRegistryEntry
+{
+   charname[DSMR_NAME_LEN];
+   DSMREntryType type;
+   union
+   {
+   NamedDSMState dsm;
+

Re: [PATCH] Fix OAuth feature detection on OpenBSD+Meson

2025-06-24 Thread Tom Lane

Jacob Champion  writes:
> On Tue, Jun 24, 2025 at 2:03 PM Tom Lane  wrote:
>> I confirm Jacob's result that our meson.build fails to think
>> that  is available, so we do need to do something.

> (To clarify for other readers: it's the OAuth feature test I added
> that fails. The existing test for HAVE_SYS_EVENT_H is working fine. I
> accidentally made mine stricter.)

Ah, you're correct: I saw "Check usable header "sys/event.h" : NO"
in the meson log, but that came from the check in the
oauth_flow_supported stanza.  We do end up with

build/src/include/pg_config.h:#define HAVE_SYS_EVENT_H 1

after a second test that tries to compile

#ifdef __has_include
 #if !__has_include("sys/event.h")
  #error "Header 'sys/event.h' could not be found"
 #endif
#else
 #include 
#endif

Can't say that I find this to be impressive software engineering:
rather than having only one probe failure mode to worry about,
we have two, depending on whether the compiler knows __has_include().
Pretty close to the worst of all possible worlds.

regards, tom lane

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-24 Thread Tomas Vondra




On 6/24/25 13:10, Bertrand Drouvot wrote:
> Hi,
> 
> On Tue, Jun 24, 2025 at 11:20:15AM +0200, Tomas Vondra wrote:
>> On 6/24/25 10:24, Bertrand Drouvot wrote:
>>> Yeah, same for me with pg_get_shmem_allocations_numa(). It works if
>>> pg_numa_query_pages() is done on chunks <= 16 pages but fails if done on 
>>> more
>>> than 16 pages.
>>>
>>> It's also confirmed by test_chunk_size.c attached:
>>>
>>> $ gcc-11 -m32 -o test_chunk_size test_chunk_size.c
>>> $ ./test_chunk_size
>>>  1 pages: SUCCESS (0 errors)
>>>  2 pages: SUCCESS (0 errors)
>>>  3 pages: SUCCESS (0 errors)
>>>  4 pages: SUCCESS (0 errors)
>>>  5 pages: SUCCESS (0 errors)
>>>  6 pages: SUCCESS (0 errors)
>>>  7 pages: SUCCESS (0 errors)
>>>  8 pages: SUCCESS (0 errors)
>>>  9 pages: SUCCESS (0 errors)
>>> 10 pages: SUCCESS (0 errors)
>>> 11 pages: SUCCESS (0 errors)
>>> 12 pages: SUCCESS (0 errors)
>>> 13 pages: SUCCESS (0 errors)
>>> 14 pages: SUCCESS (0 errors)
>>> 15 pages: SUCCESS (0 errors)
>>> 16 pages: SUCCESS (0 errors)
>>> 17 pages: 1 errors
>>> Threshold: 17 pages
>>>
>>> No error if -m32 is not used.
>>>
>>> We could work by chunks (16?) on 32 bits but would probably produce 
>>> performance
>>> degradation (we mention it in the doc though). Also would always 16 be a 
>>> correct
>>> chunk size? 
>>
>> I don't see how this would solve anything?
>>
>> AFAICS the problem is the two places are confused about how large the
>> array elements are, and get to interpret that differently.
> 
>> I don't see how using smaller array makes this correct. That it works is
>> more a matter of luck,
> 
> Not sure it's luck, maybe the wrong pointers arithmetic has no effect if batch
> size is <= 16.
> 
> So we have kernel_move_pages() -> kernel_move_pages() (because nodes is NULL 
> here
> for us as we call "numa_move_pages(pid, count, pages, NULL, status, 0);").
> 
> So, if we look at do_pages_stat() ([1]), we can see that it uses an hardcoded
> "#define DO_PAGES_STAT_CHUNK_NR 16UL" and that this pointers arithmetic:
> 
> "
> pages += chunk_nr;
> status += chunk_nr;
> "
> 
> is done but has no effect since nr_pages will exit the loop if we use a batch
> size <= 16.
> 
> So if this pointer arithmetic is not correct, (it seems that it should advance
> by 16 * sizeof(compat_uptr_t) instead) then it has no effect as long as the 
> batch
> size is <= 16.
> 
> Does test_chunk_size also fails at 17 for you?

Yes, it fails for me at 17 too. So you're saying the access within each
chunk of 16 elements is OK, but that maybe advancing to the next chunk
is not quite right? In which case limiting the access to 16 entries
might be a workaround.

In any case, this sounds like a kernel bug, right? I don't have much
experience with the kernel code, so don't want to rely too much on my
interpretation of it.


regards

-- 
Tomas Vondra

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-24 Thread Tomas Vondra




On 6/24/25 13:10, Andres Freund wrote:
> Hi,
> 
> On 2025-06-24 03:43:19 +0200, Tomas Vondra wrote:
>> FWIW while looking into this, I tried running this under valgrind (on a
>> regular 64-bit system, not in the chroot), and I get this report:
>>
>> ==65065== Invalid read of size 8
>> ==65065==at 0x113B0EBE: pg_buffercache_numa_pages
>> (pg_buffercache_pages.c:380)
>> ==65065==by 0x6B539D: ExecMakeTableFunctionResult (execSRF.c:234)
>> ==65065==by 0x6CEB7E: FunctionNext (nodeFunctionscan.c:94)
>> ==65065==by 0x6B6ACA: ExecScanFetch (execScan.h:126)
>> ==65065==by 0x6B6B31: ExecScanExtended (execScan.h:170)
>> ==65065==by 0x6B6C9D: ExecScan (execScan.c:59)
>> ==65065==by 0x6CEF0F: ExecFunctionScan (nodeFunctionscan.c:269)
>> ==65065==by 0x6B29FA: ExecProcNodeFirst (execProcnode.c:469)
>> ==65065==by 0x6A6F56: ExecProcNode (executor.h:313)
>> ==65065==by 0x6A9533: ExecutePlan (execMain.c:1679)
>> ==65065==by 0x6A7422: standard_ExecutorRun (execMain.c:367)
>> ==65065==by 0x6A7330: ExecutorRun (execMain.c:304)
>> ==65065==by 0x934EF0: PortalRunSelect (pquery.c:921)
>> ==65065==by 0x934BD8: PortalRun (pquery.c:765)
>> ==65065==by 0x92E4CD: exec_simple_query (postgres.c:1273)
>> ==65065==by 0x93301E: PostgresMain (postgres.c:4766)
>> ==65065==by 0x92A88B: BackendMain (backend_startup.c:124)
>> ==65065==by 0x85A7C7: postmaster_child_launch (launch_backend.c:290)
>> ==65065==by 0x860111: BackendStartup (postmaster.c:3580)
>> ==65065==by 0x85DE6F: ServerLoop (postmaster.c:1702)
>> ==65065==  Address 0x7b6c000 is in a rw- anonymous segment
>>
>>
>> This fails here (on the pg_numa_touch_mem_if_required call):
>>
>> for (char *ptr = startptr; ptr < endptr; ptr += os_page_size)
>> {
>> os_page_ptrs[idx++] = ptr;
>>
>> /* Only need to touch memory once per backend process */
>> if (firstNumaTouch)
>> pg_numa_touch_mem_if_required(touch, ptr);
>> }
> 
> That's because we mark unpinned pages as inaccessible / mark them as
> accessible when pinning. See logic related to that in PinBuffer():
> 
>   /*
>* Assume that we acquired a buffer pin for the 
> purposes of
>* Valgrind buffer client checks (even in 
> !result case) to
>* keep things simple.  Buffers that are unsafe 
> to access are
>* not generally guaranteed to be marked 
> undefined or
>* non-accessible in any case.
>*/
> 
> 
>> The 0x7b6c000 is the very first pointer, and it's the only pointer that
>> triggers this warning.
> 
> I suspect that that's because valgrind combines different reports or such.
> 

Thanks. It probably is something like that, although I made sure to not
use any such options when running valgrind (so --error-limit=no). But
maybe there's something else, hiding the reports.

I guess there are two ways to address this - make sure the buffers are
marked as accessible/defined, or add a valgrind suppression. I think the
suppression is the right approach here, otherwise we'd need to worry
about already pinned buffers etc. Which seems not great, the functions
don't even care about buffers right now, they mostly work with memory
pages (especially pg_shmem_allocations_numa).

Barring objections, I'll fix it this way.


regards

-- 
Tomas Vondra

Re: Make COPY format extendable: Extract COPY TO format implementations

2025-06-24 Thread Masahiko Sawada

On Tue, Jun 24, 2025 at 4:10 PM Sutou Kouhei  wrote:
>
> Hi,
>
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Tue, 24 Jun 2025 15:24:23 +0900,
>   Masahiko Sawada  wrote:
>
> >> It's natural to add more related APIs with this
> >> approach. The single registration API provides one feature
> >> by one operation. If we use the RegisterCopyRoutine() for
> >> FROM and TO formats API, it's not natural that we add more
> >> related APIs. In this case, some APIs may provide multiple
> >> features by one operation and other APIs may provide single
> >> feature by one operation. Developers may be confused with
> >> the API. For example, developers may think "what does mean
> >> NULL here?" or "can we use NULL here?" for
> >> "RegisterCopyRoutine("new-format", NewFormatFromRoutine,
> >> NULL)".
> >
> > We can document it in the comment for the registration function.
>
> I think that API that can be understandable without the
> additional note is better API than API that needs some
> notes.

I don't see much difference in this case.

>
> Why do you suggest the RegisterCopyRoutine("new-format",
> NewFormatFromRoutine, NewFormatToRoutine) API? You want to
> remove the duplicated codes in
> RegisterCopy{From,To}Routine(), right?

No. I think that if extensions are likely to support both
CopyToRoutine and CopyFromRoutine in most cases, it would be simpler
to register the custom format using a single API. Registering
CopyToRoutine and CopyFromRoutine separately seems redundant to me.

>
> BTW, what do you think about my answer (one feature by one
> operation API is more extendable API) for your question
> (extendability and API compatibility)?

Could you provide some examples? It seems to me that even if we
provide the single API for the registration we can provide other APIs
differently. For example, if we want to provide an API to register a
custom option, we can provide RegisterCopyToOption() and
RegisterCopyFromOption().

>
> > Suppose that extension-A implements only CopyToRoutine for the
> > custom-format-X with the format name 'myformat' and extension-B
> > implements only CopyFromRoutine for the custom-format-Y with the same
> > name, if users load both extension-A and extension-B, it seems to me
> > that extension-A registers the custom-format-X format as 'myformat'
> > only with CopyToRoutine, and extension-B overwrites the 'myformat'
> > registration by adding custom-format-Y's CopyFromRoutine. However, if
> > users register extension-C that implements both routines with the
> > format name 'myformat', they can register neither extension-A nor
> > extension-B, which seems to me that we don't allow overwriting the
> > registration in this case.
>
> Do you assume that users use extension-A, extension-B and
> extension-C without reading their documentation? If users
> read their documentation before users use them, users can
> know all of them use the same format name 'myformat' and
> which extension provides Copy{From,To}Routine.
>
> In this case, these users (who don't read documentation)
> will be confused with the RegisterCopyRoutine("new-format",
> NewFormatFromRoutine, NewFormatToRoutine) API too. Do we
> really need to care about this case?

My point is about the consistency of registration behavior. I think
that we should raise an error if the custom format name that an
extension tries to register already exists. Therefore I'm not sure why
installing extension-A+B is okay but installing extension-C+A or
extension-C+B is not okay? We can think that's an extension-A's choice
not to implement CopyFromRoutine for the 'myformat' format so
extension-B should not change it.

>
> > I think the core issue appears to be the internal management of custom
> > format entries but the current patch does enable registration
> > overwriting in the former case (extension-A and extension-B case).
>
> This is the same behavior as existing custom EXPLAIN option
> implementation. Should we use different behavior here?

I think that unlike custom EXPLAIN options, it's better to raise an
error or a warning if the custom format name (or combination of format
name and COPY direction) that an extension tries to register already
exists.

> >> >>FYI: RegisterCopy{From,To}Routine() uses the same logic
> >> >>as RegisterExtensionExplainOption().
> >> >
> >> > I'm concerned that the patch has duplicated logics for the
> >> > registration of COPY FROM and COPY TO.
> >>
> >> We can implement a convenient routine that can be used for
> >> RegisterExtensionExplainOption() and
> >> RegisterCopy{From,To}Routine() if it's needed.
> >
> > I meant there are duplicated codes in COPY FROM and COPY TO. For
> > instance, RegisterCopyFromRoutine() and RegisterCopyToRoutine() have
> > the same logic.
>
> Yes, I understand it. I wanted to say that we can remove the
> duplicated codes by introducing a RegisterSomething()
> function that can be used by
> RegisterExtensionExplainOption() and
> RegisterCopy

Re: [PATCH] Fix OAuth feature detection on OpenBSD+Meson

2025-06-24 Thread Jacob Champion

On Mon, Jun 23, 2025 at 5:19 PM Tom Lane  wrote:
> As far as I recall, we've always thought that autoconf's approach
> of "explicitly specify the features you expect to get" is the
> right way to do things.  I don't love meson's default you-get-
> whatever-seems-available approach at all, though maybe that's
> my history as a package builder talking.  I think not using that
> behavior would be a real good idea, at least for testing purposes.

For development I like "give me everything" by default, but for
testing and packaging I agree with you.

--Jacob

Re: Remove unneeded check for XLH_INSERT_ALL_FROZEN in heap_xlog_insert

2025-06-24 Thread Melanie Plageman

On Tue, Jun 24, 2025 at 1:53 PM Tomas Vondra  wrote:
>
> Thanks for noticing this. I'd probably be +0.5 to backpatch this, even
> if it's ultimately harmless, just to keep the code not confusing.

Okay, I can backpatch this (to 14, I believe). I'll wait until the
thread has been around for closer to 24 hours before doing anything,
though.

- Melanie

Re: [PATCH] Fix OAuth feature detection on OpenBSD+Meson

2025-06-24 Thread Jacob Champion

On Tue, Jun 24, 2025 at 2:50 PM Tom Lane  wrote:
> Can't say that I find this to be impressive software engineering:
> rather than having only one probe failure mode to worry about,
> we have two, depending on whether the compiler knows __has_include().
> Pretty close to the worst of all possible worlds.

I did a double-take on the code you posted, but! It looks like they're
running the preprocessor only. That doesn't seem so bad to me (though
they could probably do better than calling it a "compile" in the log).

--Jacob

Re: pg_dump --with-* options

2025-06-24 Thread Greg Sabino Mullane

On Wed, Jun 18, 2025 at 11:43 AM Nathan Bossart 
wrote:

> IIUC the current proposal is to:
>
> * Dump/restore stats by default.
> * Keep the --no-statistics, --no-schema, and --no-data options.
> * Keep the --statistics-only, --schema-only, and --data-only options.
> * Remove the --with-statistics, --with-schema, and --with-data options.
>

This is so close to ideal. It's just that the first bullet should be "off
by default" :)

I think pg_restore and pg_upgrade are solved problems at this point. I'm
still not convinced why stats should be on by default, as they are metadata
- neither schema nor data, but something special that should be explicitly
requested. Also, +1 to the idea of --statistics-only as a QA / debug tool
as someone mentioned upthread.

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support

Re: Add \pset options for boolean value display

2025-06-24 Thread David G. Johnston

On Thu, Mar 20, 2025 at 8:24 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> It's \pset null for boolean values
>

v1, Ready aside from bike-shedding the name.

David J.
From c897e53577d433c374e51619405a5c2b8bd4151c Mon Sep 17 00:00:00 2001
From: "David G. Johnston" 
Date: Wed, 18 Jun 2025 12:20:43 -0700
Subject: [PATCH] Add \pset options for boolean value display

The server's space-expedient choice to use 't' and 'f' to represent
boolean true and false respectively is technically understandable
but visually atrocious.  Teach psql to detect these two values
and print whatever it deems is appropriate.  For now, in the
interest of backward compatability, that defaults to 't' and 'f'.
However, now the user can impose their own standards by using the
newly introduced display_true and display_false pset settings.
---
 doc/src/sgml/ref/psql-ref.sgml | 24 
 src/bin/psql/command.c | 45 +-
 src/fe_utils/print.c   |  6 
 src/include/fe_utils/print.h   |  2 ++
 src/test/regress/expected/psql.out | 32 +
 src/test/regress/sql/psql.sql  | 16 +++
 6 files changed, 124 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 570ef21d1fc..03d4e429d8a 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -3287,6 +3287,30 @@ SELECT $1 \parse stmt1
   
   
 
+  
+  display_true
+  
+  
+  Sets the string to be printed in place of a true value.
+  The default is to print t as that is the value
+  transmitted by the server.  For readability,
+  \pset print_true 'true' is recommended.
+  
+  
+  
+
+  
+  display_false
+  
+  
+  Sets the string to be printed in place of a false value.
+  The default is to print f as that is the value
+  transmitted by the server.  For readability,
+  \pset print_false 'false' is recommended.
+  
+  
+  
+
   
   null
   
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 83e84a77841..c6afa982a59 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -2688,7 +2688,8 @@ exec_command_pset(PsqlScanState scan_state, bool active_branch)
 
 			int			i;
 			static const char *const my_list[] = {
-"border", "columns", "csv_fieldsep", "expanded", "fieldsep",
+"border", "columns", "csv_fieldsep",
+"display_true", "display_false", "expanded", "fieldsep",
 "fieldsep_zero", "footer", "format", "linestyle", "null",
 "numericlocale", "pager", "pager_min_lines",
 "recordsep", "recordsep_zero",
@@ -5198,6 +5199,26 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 		}
 	}
 
+	/* true display */
+	else if (strcmp(param, "display_true") == 0)
+	{
+		if (value)
+		{
+			free(popt->truePrint);
+			popt->truePrint = pg_strdup(value);
+		}
+	}
+
+	/* false display */
+	else if (strcmp(param, "display_false") == 0)
+	{
+		if (value)
+		{
+			free(popt->falsePrint);
+			popt->falsePrint = pg_strdup(value);
+		}
+	}
+
 	/* field separator for unaligned text */
 	else if (strcmp(param, "fieldsep") == 0)
 	{
@@ -5416,6 +5437,20 @@ printPsetInfo(const char *param, printQueryOpt *popt)
 			   popt->nullPrint ? popt->nullPrint : "");
 	}
 
+	/* show boolean true display */
+	else if (strcmp(param, "display_true") == 0)
+	{
+		printf(_("Boolean true display is \"%s\".\n"),
+popt->truePrint ? popt->truePrint : "t");
+	}
+
+	/* show boolean false display */
+	else if (strcmp(param, "display_false") == 0)
+	{
+		printf(_("Boolean false display is \"%s\".\n"),
+			   popt->falsePrint ? popt->falsePrint : "f");
+	}
+
 	/* show locale-aware numeric output */
 	else if (strcmp(param, "numericlocale") == 0)
 	{
@@ -5661,6 +5696,14 @@ pset_value_string(const char *param, printQueryOpt *popt)
 		return pset_quoted_string(popt->nullPrint
   ? popt->nullPrint
   : "");
+	else if (strcmp(param, "display_true") == 0)
+		return pset_quoted_string(popt->truePrint
+? popt->truePrint
+: "t");
+	else if (strcmp(param, "display_false") == 0)
+		return pset_quoted_string(popt->falsePrint
+  ? popt->falsePrint
+  : "f");
 	else if (strcmp(param, "numericlocale") == 0)
 		return pstrdup(pset_bool_string(popt->topt.numericLocale));
 	else if (strcmp(param, "pager") == 0)
diff --git a/src/fe_utils/print.c b/src/fe_utils/print.c
index 4af0f32f2fc..7829460f7d3 100644
--- a/src/fe_utils/print.c
+++ b/src/fe_utils/print.c
@@ -3582,6 +3582,12 @@ printQuery(const PGresult *result, const printQueryOpt *opt,
 
 			if (PQgetisnull(result, r, c))
 cell = opt->nullPrint ? opt->nullPrint : "";
+			else if (PQftype(result, c) == BOOLOID)
+			{
+cell = (PQgetvalue(result, r, c)[0] == 't')
+		? (opt-

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-24 Thread Bertrand Drouvot

Hi,

On Tue, Jun 24, 2025 at 11:20:15AM +0200, Tomas Vondra wrote:
> On 6/24/25 10:24, Bertrand Drouvot wrote:
> > Yeah, same for me with pg_get_shmem_allocations_numa(). It works if
> > pg_numa_query_pages() is done on chunks <= 16 pages but fails if done on 
> > more
> > than 16 pages.
> > 
> > It's also confirmed by test_chunk_size.c attached:
> > 
> > $ gcc-11 -m32 -o test_chunk_size test_chunk_size.c
> > $ ./test_chunk_size
> >  1 pages: SUCCESS (0 errors)
> >  2 pages: SUCCESS (0 errors)
> >  3 pages: SUCCESS (0 errors)
> >  4 pages: SUCCESS (0 errors)
> >  5 pages: SUCCESS (0 errors)
> >  6 pages: SUCCESS (0 errors)
> >  7 pages: SUCCESS (0 errors)
> >  8 pages: SUCCESS (0 errors)
> >  9 pages: SUCCESS (0 errors)
> > 10 pages: SUCCESS (0 errors)
> > 11 pages: SUCCESS (0 errors)
> > 12 pages: SUCCESS (0 errors)
> > 13 pages: SUCCESS (0 errors)
> > 14 pages: SUCCESS (0 errors)
> > 15 pages: SUCCESS (0 errors)
> > 16 pages: SUCCESS (0 errors)
> > 17 pages: 1 errors
> > Threshold: 17 pages
> > 
> > No error if -m32 is not used.
> > 
> > We could work by chunks (16?) on 32 bits but would probably produce 
> > performance
> > degradation (we mention it in the doc though). Also would always 16 be a 
> > correct
> > chunk size? 
> 
> I don't see how this would solve anything?
> 
> AFAICS the problem is the two places are confused about how large the
> array elements are, and get to interpret that differently.

> I don't see how using smaller array makes this correct. That it works is
> more a matter of luck,

Not sure it's luck, maybe the wrong pointers arithmetic has no effect if batch
size is <= 16.

So we have kernel_move_pages() -> kernel_move_pages() (because nodes is NULL 
here
for us as we call "numa_move_pages(pid, count, pages, NULL, status, 0);").

So, if we look at do_pages_stat() ([1]), we can see that it uses an hardcoded
"#define DO_PAGES_STAT_CHUNK_NR 16UL" and that this pointers arithmetic:

"
pages += chunk_nr;
status += chunk_nr;
"

is done but has no effect since nr_pages will exit the loop if we use a batch
size <= 16.

So if this pointer arithmetic is not correct, (it seems that it should advance
by 16 * sizeof(compat_uptr_t) instead) then it has no effect as long as the 
batch
size is <= 16.

Does test_chunk_size also fails at 17 for you?

[1]: https://github.com/torvalds/linux/blob/master/mm/migrate.c

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-24 Thread Bertrand Drouvot

Hi,

On Tue, Jun 24, 2025 at 02:33:59PM +0200, Tomas Vondra wrote:
> 
> 
> On 6/24/25 13:10, Bertrand Drouvot wrote:
> > So, if we look at do_pages_stat() ([1]), we can see that it uses an 
> > hardcoded
> > "#define DO_PAGES_STAT_CHUNK_NR 16UL" and that this pointers arithmetic:
> > 
> > "
> > pages += chunk_nr;
> > status += chunk_nr;
> > "
> > 
> > is done but has no effect since nr_pages will exit the loop if we use a 
> > batch
> > size <= 16.
> > 
> > So if this pointer arithmetic is not correct, (it seems that it should 
> > advance
> > by 16 * sizeof(compat_uptr_t) instead) then it has no effect as long as the 
> > batch
> > size is <= 16.
> > 
> > Does test_chunk_size also fails at 17 for you?
> 
> Yes, it fails for me at 17 too. So you're saying the access within each
> chunk of 16 elements is OK, but that maybe advancing to the next chunk
> is not quite right?

Yes, I think compat_uptr_t usage is missing in do_pages_stat() (while it's used
in do_pages_move()).

Having a chunk size <= DO_PAGES_STAT_CHUNK_NR ensures we are not affected
by the wrong pointer arithmetic.

> In which case limiting the access to 16 entries
> might be a workaround.

Yes, something like:

diff --git a/src/backend/storage/ipc/shmem.c b/src/backend/storage/ipc/shmem.c
index c9ae3b45b76..070ad2f13e7 100644
--- a/src/backend/storage/ipc/shmem.c
+++ b/src/backend/storage/ipc/shmem.c
@@ -689,8 +689,17 @@ pg_get_shmem_allocations_numa(PG_FUNCTION_ARGS)
CHECK_FOR_INTERRUPTS();
}

-   if (pg_numa_query_pages(0, shm_ent_page_count, page_ptrs, 
pages_status) == -1)
-   elog(ERROR, "failed NUMA pages inquiry status: %m");
+   #define NUMA_QUERY_CHUNK_SIZE 16  /* has to be <= 
DO_PAGES_STAT_CHUNK_NR (do_pages_stat())*/
+
+   for (uint64 chunk_start = 0; chunk_start < shm_ent_page_count; 
chunk_start += NUMA_QUERY_CHUNK_SIZE) {
+uint64 chunk_size = Min(NUMA_QUERY_CHUNK_SIZE, 
shm_ent_page_count - chunk_start);
+
+   if (pg_numa_query_pages(0, chunk_size, 
&page_ptrs[chunk_start],
+   
&pages_status[chunk_start]) == -1)
+   elog(ERROR, "failed NUMA pages inquiry status: 
%m");
+   }
+
+   #undef NUMA_QUERY_CHUNK_SIZE

> In any case, this sounds like a kernel bug, right?

yes it sounds like a kernel bug.

> I don't have much
> experience with the kernel code, so don't want to rely too much on my
> interpretation of it.

I don't have that much experience too but I think the issue is in 
do_pages_stat()
and that "pages += chunk_nr" should be advanced by sizeof(compat_uptr_t) 
instead.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: SQL:2023 JSON simplified accessor support

2025-06-24 Thread jian he

hi.

I have applied for 0001 to 0006.

static void
jsonb_subscript_transform(SubscriptingRef *sbsref,
  List **indirection,
  ParseState *pstate,
  bool isSlice,
  bool isAssignment)
{
List   *upperIndexpr = NIL;
ListCell   *idx;
sbsref->refrestype = JSONBOID;
sbsref->reftypmod = -1;
if (jsonb_check_jsonpath_needed(*indirection))
{
sbsref->refjsonbpath =
jsonb_subscript_make_jsonpath(pstate, indirection,
  &sbsref->refupperindexpr,
  &sbsref->reflowerindexpr);
return;
}
foreach(idx, *indirection)
{
Node   *i = lfirst(idx);
A_Indices  *ai;
Node   *subExpr;
Assert(IsA(i, A_Indices));
ai = castNode(A_Indices, i);
if (isSlice)
{
Node   *expr = ai->uidx ? ai->uidx : ai->lidx;
ereport(ERROR,
(errcode(ERRCODE_DATATYPE_MISMATCH),
 errmsg("jsonb subscript does not support slices"),
 parser_errposition(pstate, exprLocation(expr;
}

I am confused by the above error handling:
errmsg("jsonb subscript does not support slices").

we can do
select (jsonb '[1,2,3]')[0:1];
or
SELECT (jb).a[2:3].b FROM test_jsonb_dot_notation;

this is by definition, "slices"?
Anyway, I doubt this error handling will ever be reachable.

jsonb_check_jsonpath_needed checks whether the indirection contains is_slice,
but jsonb_subscript_transform already takes isSlice as an argument.
Maybe we can refactor it somehow.


T_String is a primitive node type with no subnodes.
typedef struct String
{
pg_node_attr(special_read_write)
NodeTagtype;
char   *sval;
} String;
then in src/backend/nodes/nodeFuncs.c:
if (expr && !IsA(expr, String) && WALK(expr))
return true;
we can change it to
if (WALK(expr))
return true;
but in function expression_tree_walker_impl
we have to change it as

case T_MergeSupportFunc:
case T_String:
/* primitive node types with no expression subnodes */
break;

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-24 Thread Tomas Vondra

On 6/24/25 16:41, Christoph Berg wrote:
> Re: Bertrand Drouvot
>> Yes, I think compat_uptr_t usage is missing in do_pages_stat() (while it's 
>> used
>> in do_pages_move()).
> 
> I was also reading the kernel source around that place but you spotted
> the problem before me. This patch resolves the issue here:
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 8cf0f9c9599..542c81ec3ed 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2444,7 +2444,13 @@ static int do_pages_stat(struct mm_struct *mm, 
> unsigned long nr_pages,
>   if (copy_to_user(status, chunk_status, chunk_nr * 
> sizeof(*status)))
>   break;
> 
> - pages += chunk_nr;
> + if (in_compat_syscall()) {
> + compat_uptr_t __user *pages32 = (compat_uptr_t __user 
> *)pages;
> +
> + pages32 += chunk_nr;
> + pages = (const void __user * __user *) pages32;
> + } else
> + pages += chunk_nr;
>   status += chunk_nr;
>   nr_pages -= chunk_nr;
>   }
> 
> 
>> Having a chunk size <= DO_PAGES_STAT_CHUNK_NR ensures we are not affected
>> by the wrong pointer arithmetic.
> 
> Good idea. Buggy kernels will be around for some time.
> 

If it's a reliable fix, then I guess we can do it like this. But won't
that be a performance penalty on everyone? Or does the system split the
array into 16-element chunks anyway, so this makes no difference?

Anyway, maybe we should start by reporting this to the kernel people. Do
you want me to do that, or shall one of you take care of that? I suppose
that'd be better, as you already wrote a fix / know the code better.

>> +   #define NUMA_QUERY_CHUNK_SIZE 16  /* has to be <= 
>> DO_PAGES_STAT_CHUNK_NR (do_pages_stat())*/
>> +
>> +   for (uint64 chunk_start = 0; chunk_start < 
>> shm_ent_page_count; chunk_start += NUMA_QUERY_CHUNK_SIZE) {
> 
> Perhaps optimize it to this:
> 
> #if sizeof(void *) == 4
> #define NUMA_QUERY_CHUNK_SIZE 16  /* has to be <= DO_PAGES_STAT_CHUNK_NR 
> (do_pages_stat())*/
> #else
> #define NUMA_QUERY_CHUNK_SIZE 1024
> #endif
> 
> ... or some other bigger number.
> 

Hmm, maybe. I guess that'd hurt only fully 32-bit systems, but that also
seems like a non-issue.

> The loop could also include CHECK_FOR_INTERRUPTS();
> 
>>> I don't have much
>>> experience with the kernel code, so don't want to rely too much on my
>>> interpretation of it.
>>
>> I don't have that much experience too but I think the issue is in 
>> do_pages_stat()
>> and that "pages += chunk_nr" should be advanced by sizeof(compat_uptr_t) 
>> instead.
> 
> Me neither, but I'll try submit this fix.
> 

+1

Thanks to both of you for the report and the investigation.


regards

-- 
Tomas Vondra

Allow ON CONFLICT DO UPDATE to return EXCLUDED values

2025-06-24 Thread Dean Rasheed

The attached patch allows EXCLUDED values to appear in the RETURNING
list of INSERT ... ON CONFLICT DO UPDATE. For example:

CREATE TABLE t (a int PRIMARY KEY, b text);
INSERT INTO t VALUES (1, 'old value');

INSERT INTO t VALUES (1, 'excluded value')
  ON CONFLICT (a) DO UPDATE SET b = 'new value'
  RETURNING a, old.b, new.b, excluded.b;

 a | b | b |   b
---+---+---+
 1 | old value | new value | excluded value
(1 row)

If there is no conflict, then OLD and EXCLUDED values are NULL.


For the most part, this is just an extension of the code to support
returning OLD and NEW. Originally, I had intended to not use
varreturningtype, since EXCLUDED is a different RTE than the result
relation, so the executor just uses the Var's varno (set to INNER_VAR
in setrefs.c). However, the rewriter code needed to support updatable
views and virtual generated columns turns out to be simpler if these
Vars have a separate varreturningtype.

I still have a lot more testing to do, and docs to update, but so far
the results look promising. I'll add this to the next CF.

Regards,
Dean
From fe79b310c287bf4bc6b37e25d53053b342d05767 Mon Sep 17 00:00:00 2001
From: Dean Rasheed 
Date: Mon, 23 Jun 2025 17:20:02 +0100
Subject: [PATCH v1] Allow EXCLUDED in RETURNING list of INSERT ON CONFLICT DO
 UPDATE.

TODO:
* More testing
* Docs
---
 src/backend/executor/execExpr.c   | 44 +--
 src/backend/executor/execExprInterp.c | 21 -
 src/backend/executor/execPartition.c  | 14 
 src/backend/executor/nodeModifyTable.c| 51 +---
 src/backend/optimizer/path/allpaths.c |  3 +-
 src/backend/optimizer/plan/setrefs.c  | 77 ---
 src/backend/optimizer/prep/prepjointree.c | 13 ++--
 src/backend/optimizer/prep/preptlist.c|  9 ++-
 src/backend/optimizer/util/clauses.c  |  2 +-
 src/backend/optimizer/util/var.c  | 11 ++-
 src/backend/parser/analyze.c  | 17 +++-
 src/backend/parser/parse_expr.c   |  2 +-
 src/backend/parser/parse_relation.c   |  4 +-
 src/backend/rewrite/rewriteHandler.c  | 42 ++
 src/backend/rewrite/rewriteManip.c| 65 +---
 src/include/executor/execExpr.h   |  7 +-
 src/include/nodes/execnodes.h |  2 +
 src/include/nodes/primnodes.h | 26 +--
 src/include/parser/parse_node.h   |  7 +-
 src/include/rewrite/rewriteManip.h|  4 +-
 src/test/regress/expected/arrays.out  | 26 +--
 src/test/regress/expected/inherit.out |  9 ++-
 src/test/regress/expected/insert_conflict.out | 26 ---
 src/test/regress/expected/updatable_views.out | 16 +++-
 src/test/regress/sql/arrays.sql   |  9 ++-
 src/test/regress/sql/inherit.sql  |  3 +-
 src/test/regress/sql/insert_conflict.sql  |  8 +-
 src/test/regress/sql/updatable_views.sql  |  6 +-
 src/tools/pgindent/typedefs.list  |  1 +
 29 files changed, 366 insertions(+), 159 deletions(-)

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index f1569879b52..d13c82bdd30 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -455,6 +455,7 @@ ExecBuildProjectionInfo(List *targetList,
 	/*
 	 * Get the tuple from the relation being scanned, or the
 	 * old/new tuple slot, if old/new values were requested.
+	 * Should not see EXCLUDED here (should be INNER_VAR).
 	 */
 	switch (variable->varreturningtype)
 	{
@@ -469,6 +470,10 @@ ExecBuildProjectionInfo(List *targetList,
 			scratch.opcode = EEOP_ASSIGN_NEW_VAR;
 			state->flags |= EEO_FLAG_HAS_NEW;
 			break;
+		case VAR_RETURNING_EXCLUDED:
+			elog(ERROR, "wrong varno %d (expected %d) for variable returning excluded",
+ variable->varno, INNER_VAR);
+			break;
 	}
 	break;
 			}
@@ -972,6 +977,10 @@ ExecInitExprRec(Expr *node, ExprState *state,
 	scratch.opcode = EEOP_NEW_SYSVAR;
 	state->flags |= EEO_FLAG_HAS_NEW;
 	break;
+case VAR_RETURNING_EXCLUDED:
+	elog(ERROR, "wrong varno %d (expected %d) for variable returning excluded",
+		 variable->varno, INNER_VAR);
+	break;
 			}
 			break;
 	}
@@ -1007,6 +1016,10 @@ ExecInitExprRec(Expr *node, ExprState *state,
 	scratch.opcode = EEOP_NEW_VAR;
 	state->flags |= EEO_FLAG_HAS_NEW;
 	break;
+case VAR_RETURNING_EXCLUDED:
+	elog(ERROR, "wrong varno %d (expected %d) for variable returning excluded",
+		 variable->varno, INNER_VAR);
+	break;
 			}
 			break;
 	}
@@ -2638,10 +2651,23 @@ ExecInitExprRec(Expr *node, ExprState *state,
 ReturningExpr *rexpr = (ReturningExpr *) node;
 int			retstep;
 
-/* Skip expression evaluation if OLD/NEW row doesn't exist */
+/*
+ * Ski

Re: BackendKeyData is mandatory?

2025-06-24 Thread Jacob Champion

On Tue, Jun 24, 2025 at 1:36 AM Jelte Fennema-Nio  wrote:
> Okay, that sounds widely used enough to continue that we should
> probably change the new PG18 behaviour of PQgetCancel and
> PQcancelCreate like I suggested. Failing all psycopg2 connection
> attempts against AWS its proxy service doesn't seem like something
> we'd want to do.

So that's
1) return an (empty) cancellation object even if the server has not
sent a key, and
2) error out when trying to cancel with an empty object?

That sounds reasonable to me.

Are there any reading along who want us to continue sending an
all-zeroes CancelRequest if the server has not sent a key? Personally,
I don't feel a need to push for that without evidence that it's
actually used, and both RDS Proxy and Cockroach [1] seem to fall in
the "don't support cancellation at all" bucket.

Thanks!
--Jacob

[1] https://github.com/cockroachdb/cockroach/issues/32973

Re: BackendKeyData is mandatory?

2025-06-24 Thread Jelte Fennema-Nio

On Tue, 24 Jun 2025 at 17:07, Jacob Champion
 wrote:
> So that's
> 1) return an (empty) cancellation object even if the server has not
> sent a key, and
> 2) error out when trying to cancel with an empty object?

Yes (and empty being non-NULL obviously)

> That sounds reasonable to me.

Alright, let's do that then. I could probably write a patch for that tomorrow.

Re: Simplify VM counters in vacuum code

2025-06-24 Thread Melanie Plageman

On Tue, Jun 24, 2025 at 9:17 AM Melanie Plageman
 wrote:
>
> On Tue, Jun 24, 2025 at 4:01 AM Nazir Bilal Yavuz  wrote:
>
> > I think we do not need to check visibility of the page here, as we
> > already know that page was not all-visible due to LP_DEAD items. We
> > can simply increment the vacrel->vm_new_visible_pages and check
> > whether the page is frozen.
>
> My idea with the assert was sort of to codify the expectation that the
> page couldn't have been all-visible because of the dead items. But
> perhaps that is obvious. But you are right that the if statement is
> not needed. Perhaps I ought to remove the asserts as they may be more
> confusing than helpful.

Thinking about this more, I think it is better without the asserts.
I've done this in attached v3.

- Melanie
From d4386febdb8eaae9c49d78d2d9815de6f653d1c6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Wed, 18 Jun 2025 16:12:15 -0400
Subject: [PATCH v3] Simplify vacuum VM update logging counters

We can simplify the VM counters added in dc6acfd910b8 to
lazy_vacuum_heap_page() and lazy_scan_new_or_empty().

We won't invoke lazy_vacuum_heap_page() unless there are dead line
pointers, so we know the page can't be all-visible.

In lazy_scan_new_or_empty(), we only update the VM if the page-level
hint PD_ALL_VISIBLE is clear, and the VM bit cannot be set if the page
level bit is clear because a subsequent page update would fail to clear
the visibility map bit.

Simplify the logic for determining which log counters to increment based
on this knowledge.

Author: Melanie Plageman 
Reviewed-by: Nazir Bilal Yavuz 
Reviewed-by: Masahiko Sawada 
Discussion: https://postgr.es/m/flat/CAAKRu_a9w_n2mwY%3DG4LjfWTvRTJtjbfvnYAKi4WjO8QXHHrA0g%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 53 +---
 1 file changed, 16 insertions(+), 37 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 09416450af9..9912ec52030 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1872,8 +1872,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
-			uint8		old_vmbits;
-
 			START_CRIT_SECTION();
 
 			/* mark buffer dirty before writing a WAL record */
@@ -1893,24 +1891,16 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 log_newpage_buffer(buf, true);
 
 			PageSetAllVisible(page);
-			old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-		   InvalidXLogRecPtr,
-		   vmbuffer, InvalidTransactionId,
-		   VISIBILITYMAP_ALL_VISIBLE |
-		   VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set(vacrel->rel, blkno, buf,
+			  InvalidXLogRecPtr,
+			  vmbuffer, InvalidTransactionId,
+			  VISIBILITYMAP_ALL_VISIBLE |
+			  VISIBILITYMAP_ALL_FROZEN);
 			END_CRIT_SECTION();
 
-			/*
-			 * If the page wasn't already set all-visible and/or all-frozen in
-			 * the VM, count it as newly set for logging.
-			 */
-			if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-			{
-vacrel->vm_new_visible_pages++;
-vacrel->vm_new_visible_frozen_pages++;
-			}
-			else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0)
-vacrel->vm_new_frozen_pages++;
+			/* Count the newly all-frozen pages for logging. */
+			vacrel->vm_new_visible_pages++;
+			vacrel->vm_new_visible_frozen_pages++;
 		}
 
 		freespace = PageGetHeapFreeSpace(page);
@@ -2915,7 +2905,6 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
  &all_frozen))
 	{
-		uint8		old_vmbits;
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
 		if (all_frozen)
@@ -2925,25 +2914,15 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		}
 
 		PageSetAllVisible(page);
-		old_vmbits = visibilitymap_set(vacrel->rel, blkno, buffer,
-	   InvalidXLogRecPtr,
-	   vmbuffer, visibility_cutoff_xid,
-	   flags);
+		visibilitymap_set(vacrel->rel, blkno, buffer,
+		  InvalidXLogRecPtr,
+		  vmbuffer, visibility_cutoff_xid,
+		  flags);
 
-		/*
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (all_frozen)
-vacrel->vm_new_visible_frozen_pages++;
-		}
-
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- all_frozen)
-			vacrel->vm_new_frozen_pages++;
+		/* Count the newly set VM page for logging */
+		vacrel->vm_new_visible_pages++;
+		if (all_frozen)
+			vacrel->vm_new_visible_frozen_pages++;
 	}
 
 	/* Revert to the previous phase information for error traceback */
-- 
2.34.1

Re: BackendKeyData is mandatory?

2025-06-24 Thread Tom Lane

Jacob Champion  writes:
> So that's
> 1) return an (empty) cancellation object even if the server has not
> sent a key, and
> 2) error out when trying to cancel with an empty object?

> That sounds reasonable to me.

+1.

> Are there any reading along who want us to continue sending an
> all-zeroes CancelRequest if the server has not sent a key?

We might have to consider doing so if evidence emerges of a server
that depends on us doing that, but right now we have no such evidence.

On the whole, reporting an error seems like a better user experience
than silently sending a cancel we know won't work.  But we have to
delay the error until a cancel is actually attempted.

regards, tom lane

Re: Improve the performance of Unicode Normalization Forms.

2025-06-24 Thread Alexander Borisov


20.06.2025 20:20, Jeff Davis wrote:

On Fri, 2025-06-20 at 17:51 +0300, Alexander Borisov wrote:

I don't quite see how this compares to the implementation on Rust. In
the link provided, they use perfect hash, which I get rid of and get
a x2 boost.
If you take ICU implementations in C++, I have always considered them
slow, at least when used in C code.
I may well run benchmarks and compare the performance of the approach
in Postgres and ICU. But this is beyond the scope of the patches
under
discussion.


Are you saying that, with these patches, Postgres will offer the
fastest open-source Unicode normalization? If so, that would be very
cool.


That's what we're aiming for - to implement the fastest approach.
By applying the proposed patches (two patches) we get the fastest
codepoints search by tables. This is evidenced by the measurements made
here and earlier in the patch for unicode case improvement.

After these patches are compiled, I will improve the C normalization
code directly, optimize it. That's when we can take benchmarks and say 
with confidence that we're the best at speed.



The reason I'm asking is because, if there are multiple open source
implementations, we should either have the best one, or just borrow
another one as long as it has a suitable license (perhaps translating
to C as necessary).


Before getting into this "fight" I studied different approaches to
searching for the necessary codepoints in tables (hash, binary search,
radix trees...) and came to the conclusion that the approach I proposed
(range index) is the fastest for this purpose.


--
Regards,
Alexander Borisov

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-24 Thread Christoph Berg

Re: Tomas Vondra
> If it's a reliable fix, then I guess we can do it like this. But won't
> that be a performance penalty on everyone? Or does the system split the
> array into 16-element chunks anyway, so this makes no difference?

There's still the overhead of the syscall itself. But no idea how
costly it is to have this 16-step loop in user or kernel space.

We could claim that on 32-bit systems, shared_buffers would be smaller
anyway, so there the overhead isn't that big. And the step size should
be larger (if at all) on 64-bit.

> Anyway, maybe we should start by reporting this to the kernel people. Do
> you want me to do that, or shall one of you take care of that? I suppose
> that'd be better, as you already wrote a fix / know the code better.

Submitted: https://marc.info/?l=linux-mm&m=175077821909222&w=2

Christoph

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-24 Thread Christoph Berg

Re: Bertrand Drouvot
> Yes, something like:
> 
> diff --git a/src/backend/storage/ipc/shmem.c b/src/backend/storage/ipc/shmem.c
> index c9ae3b45b76..070ad2f13e7 100644
> --- a/src/backend/storage/ipc/shmem.c
> +++ b/src/backend/storage/ipc/shmem.c
> @@ -689,8 +689,17 @@ pg_get_shmem_allocations_numa(PG_FUNCTION_ARGS)
> CHECK_FOR_INTERRUPTS();
> }
> 
> -   if (pg_numa_query_pages(0, shm_ent_page_count, page_ptrs, 
> pages_status) == -1)
> -   elog(ERROR, "failed NUMA pages inquiry status: %m");
> +   #define NUMA_QUERY_CHUNK_SIZE 16  /* has to be <= 
> DO_PAGES_STAT_CHUNK_NR (do_pages_stat())*/
> +
> +   for (uint64 chunk_start = 0; chunk_start < 
> shm_ent_page_count; chunk_start += NUMA_QUERY_CHUNK_SIZE) {
> +uint64 chunk_size = Min(NUMA_QUERY_CHUNK_SIZE, 
> shm_ent_page_count - chunk_start);
> +
> +   if (pg_numa_query_pages(0, chunk_size, 
> &page_ptrs[chunk_start],
> +   
> &pages_status[chunk_start]) == -1)
> +   elog(ERROR, "failed NUMA pages inquiry 
> status: %m");
> +   }
> +
> +   #undef NUMA_QUERY_CHUNK_SIZE

I uploaded a variant of this patch to Debian and it seems to have fixed the 
issue:

https://buildd.debian.org/status/package.php?p=postgresql-18&suite=experimental

(No reply from linux-mm yet.)

Christoph
Work around a Linux bug in move_pages

In 32-bit mode on 64-bit kernels, move_pages() does not correctly advance to
the next chunk. Work around by not asking for more than 16 pages at once so
move_pages() internal loop is not executed more than once.

https://www.postgresql.org/message-id/flat/a3a4fe3d-1a80-4e03-aa8e-150ee15f6c35%40vondra.me#6abe7eaa802b5b07bb70cc3229e63a9f
https://marc.info/?l=linux-mm&m=175077821909222&w=2

--- a/contrib/pg_buffercache/pg_buffercache_pages.c
+++ b/contrib/pg_buffercache/pg_buffercache_pages.c
@@ -390,8 +390,15 @@ pg_buffercache_numa_pages(PG_FUNCTION_AR
memset(os_page_status, 0xff, sizeof(int) * os_page_count);
 
/* Query NUMA status for all the pointers */
-   if (pg_numa_query_pages(0, os_page_count, os_page_ptrs, 
os_page_status) == -1)
-   elog(ERROR, "failed NUMA pages inquiry: %m");
+#define NUMA_QUERY_CHUNK_SIZE 16  /* has to be <= DO_PAGES_STAT_CHUNK_NR 
(do_pages_stat())*/
+   for (uint64 chunk_start = 0; chunk_start < os_page_count; 
chunk_start += NUMA_QUERY_CHUNK_SIZE) {
+   uint64 chunk_size = Min(NUMA_QUERY_CHUNK_SIZE, 
os_page_count - chunk_start);
+
+   if (pg_numa_query_pages(0, chunk_size, 
&os_page_ptrs[chunk_start],
+   &os_page_status[chunk_start]) 
== -1)
+   elog(ERROR, "failed NUMA pages inquiry status: 
%m");
+   }
+#undef NUMA_QUERY_CHUNK_SIZE
 
/* Initialize the multi-call context, load entries about 
buffers */
 
--- a/src/backend/storage/ipc/shmem.c
+++ b/src/backend/storage/ipc/shmem.c
@@ -689,8 +689,15 @@ pg_get_shmem_allocations_numa(PG_FUNCTIO
CHECK_FOR_INTERRUPTS();
}
 
-   if (pg_numa_query_pages(0, shm_ent_page_count, page_ptrs, 
pages_status) == -1)
-   elog(ERROR, "failed NUMA pages inquiry status: %m");
+#define NUMA_QUERY_CHUNK_SIZE 16  /* has to be <= DO_PAGES_STAT_CHUNK_NR 
(do_pages_stat())*/
+   for (uint64 chunk_start = 0; chunk_start < shm_ent_page_count; 
chunk_start += NUMA_QUERY_CHUNK_SIZE) {
+   uint64 chunk_size = Min(NUMA_QUERY_CHUNK_SIZE, 
shm_ent_page_count - chunk_start);
+
+   if (pg_numa_query_pages(0, chunk_size, 
&page_ptrs[chunk_start],
+   &pages_status[chunk_start]) == 
-1)
+   elog(ERROR, "failed NUMA pages inquiry status: 
%m");
+   }
+#undef NUMA_QUERY_CHUNK_SIZE
 
/* Count number of NUMA nodes used for this shared memory entry 
*/
memset(nodes, 0, sizeof(Size) * (max_nodes + 1));

Re: Fixes inconsistent behavior in vacuum when it processes multiple relations

2025-06-24 Thread Nathan Bossart

On Tue, Jun 24, 2025 at 10:18:18AM +0900, Michael Paquier wrote:
> Knowing that I'm indirectly responsible for this mess, I would like to
> take care of that myself.  Would that be OK for you?

I would be grateful if you took care of it.

> Another approach that we could use is an injection point with some
> LOGs that show some information based on what VacuumParams holds.
> That would be the cheapest method (no need for any data), and entirely
> stable as we would look at the stack.  Perhaps going down to that is
> not really necessary for the sake of this thread.

+1 for this.  I did something similar to verify the other bugs I reported,
and this seems far easier to maintain than potentially-unstable tests that
require lots of setup and that depend on secondary effects.

-- 
nathan

Re: Logrep launcher race conditions leading to slow tests

2025-06-24 Thread Tom Lane

Amit Kapila  writes:
> On Tue, Jun 24, 2025 at 5:26 AM Tom Lane  wrote:
>> 1. WaitForReplicationWorkerAttach sometimes has to clear a process
>> latch event so that it can keep waiting for the worker to launch.
>> It neglects to set the latch again, allowing ApplyLauncherMain
>> to miss events.

> There was a previous discussion to fix this behavior. Heikki has
> proposed a similar fix for this, but at the caller. See the patch
> attached in email [1].

Ah, thank you for that link.  I vaguely recalled that we'd discussed
this strange behavior before, but I did not realize that anyone had
diagnosed a cause.  I don't much like any of the patches proposed
in that thread though --- they seem overcomplicated or off-point.

I do not think we can switch to having two latches here.  The
only reason WaitForReplicationWorkerAttach needs to pay attention
to the process latch at all is that it needs to service
CHECK_FOR_INTERRUPTS() conditions in a timely fashion, which is
also true in the ApplyLauncherMain loop.  We can't realistically
expect other processes to signal different latches depending on
where the launcher is waiting, so those cases have to be triggered
by the same latch.

However, WaitForReplicationWorkerAttach can't service any
latch-setting conditions other than those managed by
CHECK_FOR_INTERRUPTS().  So if CHECK_FOR_INTERRUPTS() returns,
we have some other triggering condition, which is the business
of some outer code level to deal with.  Having it re-set the latch
to allow that to happen promptly after it returns seems like a
pretty straightforward answer to me.

I do note Heikki's concern about whether we could get rid of the
sleep-and-retry looping in WaitForReplicationWorkerAttach in
favor of getting signaled somehow, and I agree with that as a
possible future improvement.  But I don't especially see why that
demands another latch; in fact, unless we want to teach WaitLatch
to be able to wait on more than one latch, it *can't* be a
separate latch from the one that receives CHECK_FOR_INTERRUPTS()
conditions.

>> 4. In process_syncing_tables_for_apply (the other caller of
>> logicalrep_worker_launch), it seems okay to ignore the
>> result of logicalrep_worker_launch, but I think it should
>> fill hentry->last_start_time before not after the call.

> With this, won't we end up retrying to launch the worker sooner if the
> launch took time, but still failed to launch the worker?

That code already does update last_start_time unconditionally, and
I think that's the right behavior for the same reason that it's
right for ApplyLauncherMain to do ApplyLauncherSetWorkerStartTime
whether or not logicalrep_worker_launch succeeds.  If the worker
launch fails, we don't want to retry instantly, we want to wait
wal_retrieve_retry_interval before retrying.  My desire to change
this code is just based on the idea that it's not clear what else
if anything looks at this hashtable, and by the time that
logicalrep_worker_launch returns the system state could be a lot
different.  (For instance, the worker could have started and
failed already.)  So, just as in ApplyLauncherMain, I'd rather
store the start time before calling logicalrep_worker_launch.

BTW, it strikes me that if we're going to leave
process_syncing_tables_for_apply() ignoring the result of
logicalrep_worker_launch, it'd be smart to insert an explicit
(void) cast to show that that's intentional.  Otherwise Coverity
is likely to complain about how we're ignoring the result in
one place and not the other.

regards, tom lane

Re: pg_dump --with-* options

2025-06-24 Thread Robert Haas

On Tue, Jun 24, 2025 at 12:48 PM Nathan Bossart
 wrote:
> On Mon, Jun 23, 2025 at 01:38:10PM -0400, Robert Haas wrote:
> > I had thought we had a consensus that pg_upgrade should preserve stats
> > but regularly pg_dump shouldn't include them; perhaps I misunderstood
> > or that changed.
>
> I think it's a bit of both.  I skimmed through the past discussions and
> found that not only was there a rough consensus in 2024 that stats _should_
> be on by default [0], but also that an updated vote tally didn't show much
> of a consensus at all [1].  Like you, I thought we had pretty much closed
> that door, but the aforementioned analysis along with further discussion
> has me convinced that we might want to reconsider [2].

Well, I don't know: I still think that's the right answer, so I don't
really want to reconsider, but I understand that I'm not in charge
here.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Add \pset options for boolean value display

2025-06-24 Thread Tom Lane

"David G. Johnston"  writes:
> On Thu, Mar 20, 2025 at 8:24 PM David G. Johnston <
> david.g.johns...@gmail.com> wrote:
>> It's \pset null for boolean values

> v1, Ready aside from bike-shedding the name.

Do we really want this?  It's the sort of thing that has a strong
potential to break anything that reads psql output --- and I'd
urge you to think that human consumers of psql output may well
be the minority.  There's an awful lot of scripts out there.

I concede that \pset null hasn't had a huge amount of pushback,
but that doesn't mean that making boolean output unpredictable
will be cost-free.  And the costs won't be paid by you (or me),
but by people who didn't ask for it.

regards, tom lane

Re: pg_dump --with-* options

2025-06-24 Thread Nathan Bossart

On Mon, Jun 23, 2025 at 01:38:10PM -0400, Robert Haas wrote:
> I had thought we had a consensus that pg_upgrade should preserve stats
> but regularly pg_dump shouldn't include them; perhaps I misunderstood
> or that changed.

I think it's a bit of both.  I skimmed through the past discussions and
found that not only was there a rough consensus in 2024 that stats _should_
be on by default [0], but also that an updated vote tally didn't show much
of a consensus at all [1].  Like you, I thought we had pretty much closed
that door, but the aforementioned analysis along with further discussion
has me convinced that we might want to reconsider [2].

[0] 
https://postgr.es/m/e16cd9caf4f5229a152d318d70b4d323a03e3539.ca...@j-davis.com
[1] https://postgr.es/m/aFCIB1AwXuNzxHXX%40nathan
[2] https://postgr.es/m/aFC9rWSeFz7c07uI%40nathan

-- 
nathan

Re: Add \pset options for boolean value display

2025-06-24 Thread David G. Johnston

On Tue, Jun 24, 2025 at 3:30 PM Tom Lane  wrote:

> "David G. Johnston"  writes:
> > On Thu, Mar 20, 2025 at 8:24 PM David G. Johnston <
> > david.g.johns...@gmail.com> wrote:
> >> It's \pset null for boolean values
>
> > v1, Ready aside from bike-shedding the name.
>
> Do we really want this?  It's the sort of thing that has a strong
> potential to break anything that reads psql output --- and I'd
> urge you to think that human consumers of psql output may well
> be the minority.  There's an awful lot of scripts out there.
>
> I concede that \pset null hasn't had a huge amount of pushback,
> but that doesn't mean that making boolean output unpredictable
> will be cost-free.  And the costs won't be paid by you (or me),
> but by people who didn't ask for it.
>
>
If we didn't use psql to produce all of our examples I'd be a bit more
accepting of this position.  Yes, users of it need to do so responsibly.
But we have tons of pretty-presentation-oriented options in psql so, yes, I
do believe this is well within its charter.

David J.

Re: Safeguards against incorrect fd flags for fsync()

2025-06-24 Thread Tom Lane

Michael Banck  writes:
> Is removing the debug_parallel_query=on configuration for HEAD a valid
> mode of operation for a buildfarm animal? I ran the tests 10 times in a
> row without issues today.

Sure, especially on slower machines.  It's pretty much owner's
option whether to use that.

regards, tom lane

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-24 Thread Christoph Berg

Re: Bertrand Drouvot
> Yes, I think compat_uptr_t usage is missing in do_pages_stat() (while it's 
> used
> in do_pages_move()).

I was also reading the kernel source around that place but you spotted
the problem before me. This patch resolves the issue here:

diff --git a/mm/migrate.c b/mm/migrate.c
index 8cf0f9c9599..542c81ec3ed 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2444,7 +2444,13 @@ static int do_pages_stat(struct mm_struct *mm, unsigned 
long nr_pages,
if (copy_to_user(status, chunk_status, chunk_nr * 
sizeof(*status)))
break;

-   pages += chunk_nr;
+   if (in_compat_syscall()) {
+   compat_uptr_t __user *pages32 = (compat_uptr_t __user 
*)pages;
+
+   pages32 += chunk_nr;
+   pages = (const void __user * __user *) pages32;
+   } else
+   pages += chunk_nr;
status += chunk_nr;
nr_pages -= chunk_nr;
}


> Having a chunk size <= DO_PAGES_STAT_CHUNK_NR ensures we are not affected
> by the wrong pointer arithmetic.

Good idea. Buggy kernels will be around for some time.

> +   #define NUMA_QUERY_CHUNK_SIZE 16  /* has to be <= 
> DO_PAGES_STAT_CHUNK_NR (do_pages_stat())*/
> +
> +   for (uint64 chunk_start = 0; chunk_start < 
> shm_ent_page_count; chunk_start += NUMA_QUERY_CHUNK_SIZE) {

Perhaps optimize it to this:

#if sizeof(void *) == 4
#define NUMA_QUERY_CHUNK_SIZE 16  /* has to be <= DO_PAGES_STAT_CHUNK_NR 
(do_pages_stat())*/
#else
#define NUMA_QUERY_CHUNK_SIZE 1024
#endif

... or some other bigger number.

The loop could also include CHECK_FOR_INTERRUPTS();

> > I don't have much
> > experience with the kernel code, so don't want to rely too much on my
> > interpretation of it.
> 
> I don't have that much experience too but I think the issue is in 
> do_pages_stat()
> and that "pages += chunk_nr" should be advanced by sizeof(compat_uptr_t) 
> instead.

Me neither, but I'll try submit this fix.

Christoph

Re: Add Option To Check All Addresses For Matching target_session_attr

2025-06-24 Thread Navrotskiy Artem

Hi Andrey! I would like to add that from my point of view the current behavior directly contradicts the documentation (https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-MULTIPLE-HOSTS):When multiple hosts are specified, or when a single host name is translated to multiple addresses, all the hosts and addresses will be tried in order, until one succeeds. If none of the hosts can be reached, the connection fails. If a connection is established successfully, but authentication fails, the remaining hosts in the list are not tried.In my case, I want to make a single DNS name that resolves to all the hosts in the PostgreSQL cluster. This will allow you to add/remove cluster nodes without changing the service configuration. Now I have to change the connection string and restart the application service for each operation when adding/deleting a cluster node. Working with the PostgreSQL service and cluster is the responsibility of different teams, and I really want to separate them. The application service and PostgreSQL are the responsibility of different teams, and I really want to be able to work with them independently Кому: Andrew Jackson (andrewjackson...@gmail.com), Vladimir Sitnikov (sitnikov.vladi...@gmail.com), Dave Page (dp...@pgadmin.org);Копия: pgsql-hackers (pgsql-hack...@postgresql.org);Тема: Add Option To Check All Addresses For Matching target_session_attr;24.06.2025, 18:25, "Andrey Borodin" :Hi Andrew!cc Jelte, I suspect he might be interested.  On 20 Nov 2024, at 20:51, Andrew Jackson  wrote:  Would appreciate any feedback on the applicability/relevancy of the goal here or the implementation.Thank you for raising the issue. Following our discussion in Discord I'm putting my thoughts to list.ContextA DNS record might return several IPs. Consider we have a connection string with "host=A,B", A is resolved to 1.1.1.1,2.2.2.2, B to 3.3.3.3,4.4.4.4.If we connect with "target_session_attrs=read-write" IPs 1.1.1.1 and 3.3.3.3 will be probed, but 2.2.2.2 and 4.4.4.4 won't (if 1.1.1.1 and 3.3.3.3 responded).If we enable libpq load balancing some random 2 IPs will be probed.IMO it's a bug, at least when load balancing is enabled. Let's consider if we can change default behavior here. I suspect we can't do it for "load_balance_hosts=disable". And even for load balancing this might be too unexpected change for someone.Further I only consider proposal not as a bug fix, but as a feature.In Discord we have surveyed some other drivers.pgx treats all IPs as different servers [1]. npgsql goes through all IPs one-by-one always [2]. PGJDBC are somewhat in a decision process [3] (cc Dave and Vladimir, if they would like to provide some input).ReviewThe patch needs a rebase. It's trivial, so please fine attached. The patch needs real commit message, it's not trivial :)We definitely need to adjust tests [0]. We need to change 004_load_balance_dns.pl so that it tests target_session_attrs too.Some documentation would be nice.I do not like how this check is performed+ if (conn->check_all_addrs && conn->check_all_addrs[0] == '1')Let's make it like load balancing is done [4].Finally, let's think about naming alternatives for "check_all_addrs".I think that's enough for a first round of the review. If it's not a bug, but a feature - it's a very narrow window to get to 18. But we might be lucky...Thank you!Best regards, Andrey Borodin.[0] https://github.com/postgres/postgres/commit/7f5b19817eaf38e70ad1153db4e644ee9456853e#diff-b05b74d2a97d7f74d4311ba1702d732f0df1b101c6ac99c146b51215174cf3ffR94[1] https://github.com/jackc/pgx/blob/master/pgconn/pgconn.go#L177[2] https://github.com/npgsql/npgsql/blob/7f1a59fa8dc1ccc34a70154f49a768e1abf826ba/src/Npgsql/Internal/NpgsqlConnector.cs#L986[3] https://github.com/pgjdbc/pgjdbc/pull/3012#discussion_r1408069450[4] https://github.com/postgres/postgres/commit/7f5b19817eaf38e70ad1153db4e644ee9456853e#diff-8d819454e061b9d4cdae9c8922ded05753a629d70f2ac1de1d4f6d5a4aeb7f68R1660  -- Best regards,Artem Navrotskiy

Re: Logrep launcher race conditions leading to slow tests

2025-06-24 Thread Ashutosh Bapat

On Tue, Jun 24, 2025 at 5:26 AM Tom Lane  wrote:
>
> I've been annoyed for awhile because, while a parallel check-world
> run usually takes a bit over a minute on my machine, sometimes it
> takes between three and four minutes.  I was finally able to
> track down what is happening, and it's this: sometimes one or
> another of the src/test/subscription tests takes an extra three
> minutes because the logical replication launcher is sleeping
> instead of launching the next task.  It eventually reaches its
> hard-wired maximum wait of DEFAULT_NAPTIME_PER_CYCLE (3min),
> wakes up and notices it has something to do, and then we're
> on our merry way again.  I'm not sure how often this is a problem
> in the real world, but it happens often enough to be annoying
> during development.
>
> There are two distinct bugs involved:
>
> 1. WaitForReplicationWorkerAttach sometimes has to clear a process
> latch event so that it can keep waiting for the worker to launch.
> It neglects to set the latch again, allowing ApplyLauncherMain
> to miss events.

Agreed.

>
> 2. ApplyLauncherMain ignores a failure return from
> logicalrep_worker_launch, which is bad because (unless it has
> another worker launch pending) it will then sleep for
> DEFAULT_NAPTIME_PER_CYCLE before reconsidering.  What it ought to do
> is try again after wal_retrieve_retry_interval.  This situation can
> arise in resource-exhaustion cases (cf. the early exits in
> logicalrep_worker_launch), but what's more likely in the regression
> tests is that the worker stops due to some error condition before
> WaitForReplicationWorkerAttach sees it attached, which is then duly
> reported as a failure.

Agreed.

>
> It's possible to make the test slowness extremely reproducible
> with this change, which widens the race condition window for
> both problems:
>
> diff --git a/src/backend/replication/logical/launcher.c 
> b/src/backend/replication/logical/launcher.c
> index 1c3c051403d..724e82bcdc1 100644
> --- a/src/backend/replication/logical/launcher.c
> +++ b/src/backend/replication/logical/launcher.c
> @@ -214,7 +214,7 @@ WaitForReplicationWorkerAttach(LogicalRepWorker *worker,
>  */
> rc = WaitLatch(MyLatch,
>WL_LATCH_SET | WL_TIMEOUT | 
> WL_EXIT_ON_PM_DEATH,
> -  10L, WAIT_EVENT_BGWORKER_STARTUP);
> +  1000L, 
> WAIT_EVENT_BGWORKER_STARTUP);
>
> if (rc & WL_LATCH_SET)
> {
>
> I don't recommend that as a permanent change, but it's helpful
> for testing the attached patch.

Thanks. That makes reproduction easier.

>
> In the attached, I made two other non-cosmetic changes:
>
> 3. In WaitForReplicationWorkerAttach, capture worker->in_use
> before not after releasing LogicalRepWorkerLock.  Maybe there
> is a reason why that's not a dangerous race condition, but
> it sure is un-obvious to me.

Looking at the code this seems to be dangerous. We will return from
the condition if worker->in_use = false which means that the slot is
free. By the time we release LogicalRepWorkerLock and read the value
of worker->in_use, it may have been filled with some other worker and
thus return true instead of false, and the caller would falsely assume
that the worker was successfully launched. That itself might cause the
launcher to not start the worker again. That's possibly rare but not
impossible.

+1 for this change.

>
> 4. In process_syncing_tables_for_apply (the other caller of
> logicalrep_worker_launch), it seems okay to ignore the
> result of logicalrep_worker_launch, but I think it should
> fill hentry->last_start_time before not after the call.
> Otherwise we might be changing a hashtable entry that's
> no longer relevant to this worker.

A hash entry is associated with a table, not the worker. In case the
worker fails to launch it records the time when worker launch for that
table was attempted so that next attempt could be well-spaced in time.
I am not able your last statement, what is the entry's relevance to
the worker.

But your change makes this code similar to ApplyLauncherMain(), which
deals with subscriptions. +1 for the consistency.

--
Best Wishes,
Ashutosh Bapat

Re: Logical Replication of sequences

2025-06-24 Thread Shlok Kyal

On Sun, 22 Jun 2025 at 08:05, vignesh C  wrote:
>
> On Thu, 19 Jun 2025 at 11:26, Nisha Moond  wrote:
> >
> > Hi,
> >
> > Here are my review comments for v20250610 patches:
> >
> > Patch-0005:sequencesync.c
> >
> > 1) report_error_sequences()
> >
> > In case there are both missing and mismatched sequences, the ERROR
> > message logged is -
> >
> > ```
> > 2025-05-28 14:22:19.898 IST [392259] ERROR:  logical replication
> > sequence synchronization failed for subscription "subs": sequences
> > ("public"."n84") are missing on the publisher. Additionally,
> > parameters differ for the remote and local sequences ("public.n1")
> > ```
> >
> > I feel this error message is quite long. Would it be possible to split
> > it into ERROR and DETAIL? Also, if feasible, we could consider
> > including a HINT, as was done in previous versions.
> >
> > I explored a few possible ways to log this error with a hint. Attached
> > top-up patch has the suggestion implemented. Please see if it seems
> > okay to consider.
>
> This looks good, merged it.
> > ~~~
> >
> > 2) copy_sequences():
> > + /* Retrieve the sequence object fetched from the publisher */
> > + for (int i = 0; i < batch_size; i++)
> > + {
> > + LogicalRepSequenceInfo *sequence_info =
> > lfirst(list_nth_cell(remotesequences, current_index + i));
> > +
> > + if (!strcmp(sequence_info->nspname, nspname) &&
> > + !strcmp(sequence_info->seqname, seqname))
> > + seqinfo = sequence_info;
> > + }
> >
> > The current logic performs a search through the local sequence list
> > for each sequence fetched from the publisher, repeating the traverse
> > of 100(batch size) length of the list per sequence, which may impact
> > performance.
> >
> > To improve efficiency, we can optimize it by sorting the local list
> > and traverses it only once for matching. Kindly review the
> > implementation in the attached top-up patch and consider merging it if
> > it looks good to you.
>
> Looks good, merged it.
>
> > ~~~
> >
> > 3) copy_sequences():
> > + if (message_level_is_interesting(DEBUG1))
> > + ereport(DEBUG1,
> > + errmsg_internal("logical replication synchronization for
> > subscription \"%s\", sequence \"%s\" has finished",
> > + MySubscription->name,
> > + seqinfo->seqname));
> > +
> > + batch_succeeded_count++;
> > + }
> >
> > The current debug log might be a bit confusing when sequences with the
> > same name exist in different schemas. To improve clarity, we could
> > include the schema name in the message, e.g.,
> > " ... sequence "schema"."sequence" has finished".
>
> Modified
>
> > 
> >
> > Few minor comments in doc - Patch-0006 : logical-replication.sgml
> >
> > 4)
> > +  
> > +   To replicate sequences from a publisher to a subscriber, first publish 
> > them
> > +   using 
> > +   CREATE PUBLICATION ... FOR ALL SEQUENCES.
> > +  
> >
> > I think it would be better to use "To synchronize" instead of "To
> > replicate" here, to maintain consistency and avoid confusion between
> > replication and synchronization.
>
> Modified
>
> > 5)
> > +   
> > +Update the sequences at the publisher side few times.
> > +
> >
> > /side few /side a few /
> >
> > 6) Can avoid using multiple "or" in the sentences below:
> >
> > 6a)
> > -   a change set or replication set.  Each publication exists in only
> > one database.
> > +   generated from a table or a group of tables or the current state of all
> > +   sequences, and might also be described as a change set or replication 
> > set
> >
> > / table or a group of tables/ table, a group of tables/
>
> Modified
>
> > 6b)
> > +   Publications may currently only contain tables or sequences. Objects 
> > must be
> > +   added explicitly, except when a publication is created using
> > +   FOR TABLES IN SCHEMA, or FOR ALL
> > TABLES,
> > +   or FOR ALL SEQUENCES. Unlike tables, the current 
> > state of
> >
> > / IN SCHEMA, or FOR ALL TABLES/ IN
> > SCHEMA, FOR ALL TABLES
>
> Modified
>
> Thanks for the comment, the attached v20250622 version patch has the
> changes for the same.
>
Hi Vignesh,

I have reviewed the patches 0001 and 0002. I do not have any comments
for 0001 patch. Here are comments for 0002 patch.

1. Initially, I have created a publication on sequence s1.
postgres=# CREATE PUBLICATION pub1 FOR ALL SEQUENCES;
CREATE PUBLICATION
postgres=# ALTER PUBLICATION pub1 SET TABLE t1;
ALTER PUBLICATION
postgres=# \d s1
 Sequence "public.s1"
  Type  | Start | Minimum |   Maximum   | Increment | Cycles? | Cache
+---+-+-+---+-+---
 bigint | 1 |   1 | 9223372036854775807 | 1 | no  | 1
Publications:
"pub1"
postgres=# select * from pg_publication_rel;
  oid  | prpubid | prrelid | prqual | prattrs
---+-+-++-
 16415 |   16414 |   16388 ||
(1 row)

Here, we can set the publication to TABLE or TABLES FOR SCHEMA. Should
this be allowed?
If a publication is created on FOR ALL T

Re: Conflict detection for update_deleted in logical replication

2025-06-24 Thread shveta malik

>
> Here is the V41 patch set which includes the following changes:
>

Thanks for the patches. Few trivial things:

1)
In ReplicationSlotAcquire(), does it make more sense to move the error
after checking the slot's existence first? If a user is trying to use
a slot which does not exist, he should first get that error instead of
'slot is reserved' error.

2)
When max_replication_slots limit is reached and user is trying to
enable rci for the first time, launcher will give error in log file:

ERROR:  all replication slots are in use
HINT:  Free one or increase "max_replication_slots".
LOG:  background worker "logical replication launcher" (PID 13147)
exited with exit code 1

It is not clear from this message as to what launcher was actually
trying to create.  A log-msg  in CreateConflictDetectionSlot() saying
"Creating conflict-detection slot" may help here.

thanks
Shveta

Re: Simplify VM counters in vacuum code

2025-06-24 Thread Nazir Bilal Yavuz

Hi,

On Tue, 24 Jun 2025 at 18:12, Melanie Plageman
 wrote:
>
> On Tue, Jun 24, 2025 at 9:17 AM Melanie Plageman
>  wrote:
> >
> > On Tue, Jun 24, 2025 at 4:01 AM Nazir Bilal Yavuz  
> > wrote:
> >
> > > I think we do not need to check visibility of the page here, as we
> > > already know that page was not all-visible due to LP_DEAD items. We
> > > can simply increment the vacrel->vm_new_visible_pages and check
> > > whether the page is frozen.
> >
> > My idea with the assert was sort of to codify the expectation that the
> > page couldn't have been all-visible because of the dead items. But
> > perhaps that is obvious. But you are right that the if statement is
> > not needed. Perhaps I ought to remove the asserts as they may be more
> > confusing than helpful.
>
> Thinking about this more, I think it is better without the asserts.
> I've done this in attached v3.

I liked this version more. I agree that the asserts were causing some confusion.

nitpick:
+/* Count the newly all-frozen pages for logging. */

AFAIK, we do not use periods in the one line comments. Other than
that, the patch looks good to me.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-24 Thread Andres Freund

Hi,

On 2025-06-24 03:43:19 +0200, Tomas Vondra wrote:
> FWIW while looking into this, I tried running this under valgrind (on a
> regular 64-bit system, not in the chroot), and I get this report:
> 
> ==65065== Invalid read of size 8
> ==65065==at 0x113B0EBE: pg_buffercache_numa_pages
> (pg_buffercache_pages.c:380)
> ==65065==by 0x6B539D: ExecMakeTableFunctionResult (execSRF.c:234)
> ==65065==by 0x6CEB7E: FunctionNext (nodeFunctionscan.c:94)
> ==65065==by 0x6B6ACA: ExecScanFetch (execScan.h:126)
> ==65065==by 0x6B6B31: ExecScanExtended (execScan.h:170)
> ==65065==by 0x6B6C9D: ExecScan (execScan.c:59)
> ==65065==by 0x6CEF0F: ExecFunctionScan (nodeFunctionscan.c:269)
> ==65065==by 0x6B29FA: ExecProcNodeFirst (execProcnode.c:469)
> ==65065==by 0x6A6F56: ExecProcNode (executor.h:313)
> ==65065==by 0x6A9533: ExecutePlan (execMain.c:1679)
> ==65065==by 0x6A7422: standard_ExecutorRun (execMain.c:367)
> ==65065==by 0x6A7330: ExecutorRun (execMain.c:304)
> ==65065==by 0x934EF0: PortalRunSelect (pquery.c:921)
> ==65065==by 0x934BD8: PortalRun (pquery.c:765)
> ==65065==by 0x92E4CD: exec_simple_query (postgres.c:1273)
> ==65065==by 0x93301E: PostgresMain (postgres.c:4766)
> ==65065==by 0x92A88B: BackendMain (backend_startup.c:124)
> ==65065==by 0x85A7C7: postmaster_child_launch (launch_backend.c:290)
> ==65065==by 0x860111: BackendStartup (postmaster.c:3580)
> ==65065==by 0x85DE6F: ServerLoop (postmaster.c:1702)
> ==65065==  Address 0x7b6c000 is in a rw- anonymous segment
> 
> 
> This fails here (on the pg_numa_touch_mem_if_required call):
> 
> for (char *ptr = startptr; ptr < endptr; ptr += os_page_size)
> {
> os_page_ptrs[idx++] = ptr;
> 
> /* Only need to touch memory once per backend process */
> if (firstNumaTouch)
> pg_numa_touch_mem_if_required(touch, ptr);
> }

That's because we mark unpinned pages as inaccessible / mark them as
accessible when pinning. See logic related to that in PinBuffer():

/*
 * Assume that we acquired a buffer pin for the 
purposes of
 * Valgrind buffer client checks (even in 
!result case) to
 * keep things simple.  Buffers that are unsafe 
to access are
 * not generally guaranteed to be marked 
undefined or
 * non-accessible in any case.
 */


> The 0x7b6c000 is the very first pointer, and it's the only pointer that
> triggers this warning.

I suspect that that's because valgrind combines different reports or such.

Greetings,

Andres Freund

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-24 Thread Dilip Kumar

On Wed, Jun 25, 2025 at 10:57 AM Zhijie Hou (Fujitsu)
 wrote:
>
> Hi,
>
> After commit ca307d5, I noticed another crash when testing
> some other logical replication features.
>
> The server with max_replication_slots set to 0 would crash when executing 
> CHECKPOINT.
>
> TRAP: failed Assert("ReplicationSlotCtl != NULL"), File: "slot.c", Line: 
> 1162, PID: 577315
> postgres: checkpointer (ExceptionalCondition+0x9e)[0xc046cb]
> postgres: checkpointer (ReplicationSlotsComputeRequiredLSN+0x30)[0x99697f]
> postgres: checkpointer (CheckPointReplicationSlots+0x191)[0x997dc1]
> postgres: checkpointer [0x597b1b]
> postgres: checkpointer (CreateCheckPoint+0x6d1)[0x59729e]
> postgres: checkpointer (CheckpointerMain+0x559)[0x93ee79]
> postgres: checkpointer (postmaster_child_launch+0x15f)[0x940311]
> postgres: checkpointer [0x9468b0]
> postgres: checkpointer (PostmasterMain+0x1258)[0x9434f8]
> postgres: checkpointer (main+0x2fe)[0x7f5f9c]
> /lib64/libc.so.6(__libc_start_main+0xe5)[0x7f7585f81d85]
> postgres: checkpointer (_start+0x2e)[0x4958ee]
>
> I think it is trying to access the replication slots when the shared memory
> for them was not allocated.


I do not understand why CheckPointReplicationSlots() calls
ReplicationSlotsComputeRequiredLSN() unconditionally, shouldn't this
be called under the check[1], If not then instead of asserting
Assert("ReplicationSlotCtl != NULL"), this should just return if
ReplicationSlotCtl is NULL, isn't it, because ReplicationSlotCtl is
not allocated if max_replication_slots is 0.


[1]

--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c

@@ -2131,7 +2131,8 @@ CheckPointReplicationSlots(bool is_shutdown)
 * Recompute the required LSN as SaveSlotToPath() updated
 * last_saved_restart_lsn for slots.
 */
-   ReplicationSlotsComputeRequiredLSN();
+   if (max_replication_slots > 0)
+   ReplicationSlotsComputeRequiredLSN();
 }


-- 
Regards,
Dilip Kumar
Google

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-24 Thread Bertrand Drouvot

Hi,

On Tue, Jun 24, 2025 at 10:32:25PM +0200, Tomas Vondra wrote:
> 
> Attached is a minor tweak of the valgrind suppresion rules,

Thanks!

> to add the
> two places touching the memory. I was hoping I could add a single rule
> for pg_numa_touch_mem_if_required, but that does not work - it's a
> macro, not a function. So I had to add one rule for both functions,
> querying the NUMA. That's a bit disappointing, because it means it'll
> hide all other failues (of Memcheck:Addr8 type) in those functions.
>

Shouldn't we add 2 rules for Memcheck:Addr4 too?

> Perhaps it'd be be better to turn pg_numa_touch_mem_if_required into a
> proper (inlined) function, at least with USE_VALGRIND defined.

Yeah I think that's probably better to reduce the scope to what we really want 
to.

> Something
> like the v2 patch -

yeah, maybe:

- add a rule for Memcheck:Addr4?
- have the same parameters name for the macro and the function?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Periodic FSM vacuum doesn't happen in one-pass strategy vacuum.

2025-06-24 Thread Melanie Plageman

On Mon, Jun 9, 2025 at 6:22 PM Melanie Plageman
 wrote:
>
> Reviewing your patch, I think there might be an issue still. You
> replaced has_lpdead_items with ndeleted. While ndeleted will count
> those items we set LP_UNUSED (which is what we want), it also counts
> LP_NORMAL items that vacuum sets LP_REDIRECT (see
> heap_prune_record_redirect()).
>
> Looking at PageGetHeapFreeSpace(), it seems like it only considers
> LP_UNUSED items as freespace. So, if an item is set LP_REDIRECT, there
> would be no need to update the FSM, I think.

I was wrong. Setting an item to LP_REDIRECT does free up space --
because there is no associated storage. PageGetFreeSpace() is
concerned with pd_upper and pd_lower. So after setting an item
LP_REDIRECT, we'll compactify_tuples() in PageRepairFragmentation(),
altering pd_upper/lower. Then future calls to PageGetHeapFreeSpace()
will report this updated number and we'll put that in the freespace
map. So, we can expect setting items LP_REDIRECT will result in new
freespace to report.

(I got confused by some code in PageGetHeapFreeSpace() that was
checking to make sure that, if there were >= MaxHeapTuplesPerPage line
pointers, that at least one of them is LP_UNUSED).

So, I think we should commit the fix you proposed.

The only question I have left is implementation: should we have
ndeleted as an output parameter of lazy_scan_prune() or have
lazy_scan_prune() return it (instead of void)?

In <= 16, heap_page_prune() returned the number of tuples deleted, so
I thought of maybe having lazy_scan_prune() do this. Though, maybe it
is confusing to have one result returned as the return value and the
others returned in output parameters unless there is something more
special about ndeleted. With heap_page_prune(), I think it was the
return value because that was kind of what heap_page_prune()
"accomplished".

- Melanie

Re: Huge commitfest app update upcoming: Tags, Draft CF, Help page, and automated commitfest creat/open/close

2025-06-24 Thread David G. Johnston

On Mon, Jun 16, 2025 at 5:47 AM Jelte Fennema-Nio 
wrote:

> The new PG19 development cycle is starting soon. So that seemed like a
> good excuse to make some big improvements to the commitfest app. My
> plan is to deploy these changes on the 30th of June. So that we can
> start the new cycle fresh with these changes. As always feedback on
> these changes is very welcome. The below will contain some links to
> the staging environment both username and password for the http auth
> are "pgtest".
>
>
Been using this a bit today:

Due to using "Open" for Drafts the tag colors for Pg19-Drafts and PG19-1
are identical.  They need to be different.

When creating a new patch it should either be placed in Drafts first, and
then moved if appropriate, or the user should choose (and be given an
explanation of the decision factors behind that choice) during creation.

The "Help -" tag coloring probably should indicate some kind of blocker, so
a hot color like red/orange/yellow.  Presently it is green, which for many
is an "All good" color.

David J.

Re: [PATCH] Fix OAuth feature detection on OpenBSD+Meson

2025-06-24 Thread Tom Lane

Jacob Champion  writes:
> On Tue, Jun 24, 2025 at 2:50 PM Tom Lane  wrote:
>> Can't say that I find this to be impressive software engineering:
>> rather than having only one probe failure mode to worry about,
>> we have two, depending on whether the compiler knows __has_include().
>> Pretty close to the worst of all possible worlds.

> I did a double-take on the code you posted, but! It looks like they're
> running the preprocessor only. That doesn't seem so bad to me (though
> they could probably do better than calling it a "compile" in the log).

Oh!  Okay, then the two cases should be mostly semantically
equivalent, I think, ie it's just a "does the header exist" test.
There are probably some edge cases where the effects are different,
but nothing that would be likely to matter for us.

regards, tom lane

Re: [PATCH] Fix hostaddr crash during non-blocking cancellation

2025-06-24 Thread Greg Sabino Mullane

01 looks sensible to me.

I like 02 as well. Only quibble would be the name (tcp) as it doesn't
really describe a class of things to be tested like the other things in
PG_TEST_EXTRA. Something indicating a lack of socket? Just more verbose
somehow? "tcp_only" perhaps?

Cheers,
Greg


--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support

Re: Remove unneeded check for XLH_INSERT_ALL_FROZEN in heap_xlog_insert

2025-06-24 Thread Tomas Vondra

On 6/24/25 19:41, Melanie Plageman wrote:
> Hi,
> 
> I noticed that 8e03eb92e9a forgot one line when reverting 39b66a91bd.
> There is an extraneous check for XLH_INSERT_ALL_FROZEN_SET in
> heap_xlog_insert() -- even though heap_insert() never freezes tuples.
> It doesn't hurt anything, but I found it confusing, so I think it is
> worth removing. I don't think it's worth backpatching, so I don't know
> if that means that this commit shouldn't go in master until after we
> branch.

Thanks for noticing this. I'd probably be +0.5 to backpatch this, even
if it's ultimately harmless, just to keep the code not confusing.


regards

-- 
Tomas Vondra

Re: Logical Replication of sequences

2025-06-24 Thread shveta malik

On Tue, Jun 24, 2025 at 6:44 PM Shlok Kyal  wrote:
>
>
> 1. Initially, I have created a publication on sequence s1.
> postgres=# CREATE PUBLICATION pub1 FOR ALL SEQUENCES;
> CREATE PUBLICATION
> postgres=# ALTER PUBLICATION pub1 SET TABLE t1;
> ALTER PUBLICATION
> postgres=# \d s1
>  Sequence "public.s1"
>   Type  | Start | Minimum |   Maximum   | Increment | Cycles? | Cache
> +---+-+-+---+-+---
>  bigint | 1 |   1 | 9223372036854775807 | 1 | no  | 1
> Publications:
> "pub1"
> postgres=# select * from pg_publication_rel;
>   oid  | prpubid | prrelid | prqual | prattrs
> ---+-+-++-
>  16415 |   16414 |   16388 ||
> (1 row)
>
> Here, we can set the publication to TABLE or TABLES FOR SCHEMA. Should
> this be allowed?
> If a publication is created on FOR ALL TABLES, such an operation is not 
> allowed.
>

Good catch. IMO, this should not be allowed as currently we strictly
support either ALL SEQUENCES or ALL SEQUENCES with ALL TABLES alone.

thanks
Shveta

Re: Logrep launcher race conditions leading to slow tests

2025-06-24 Thread Amit Kapila

On Tue, Jun 24, 2025 at 9:53 PM Tom Lane  wrote:
>
> Amit Kapila  writes:
> > On Tue, Jun 24, 2025 at 5:26 AM Tom Lane  wrote:
> >> 1. WaitForReplicationWorkerAttach sometimes has to clear a process
> >> latch event so that it can keep waiting for the worker to launch.
> >> It neglects to set the latch again, allowing ApplyLauncherMain
> >> to miss events.
>
> > There was a previous discussion to fix this behavior. Heikki has
> > proposed a similar fix for this, but at the caller. See the patch
> > attached in email [1].
>
> Ah, thank you for that link.  I vaguely recalled that we'd discussed
> this strange behavior before, but I did not realize that anyone had
> diagnosed a cause.  I don't much like any of the patches proposed
> in that thread though --- they seem overcomplicated or off-point.
>
> I do not think we can switch to having two latches here.  The
> only reason WaitForReplicationWorkerAttach needs to pay attention
> to the process latch at all is that it needs to service
> CHECK_FOR_INTERRUPTS() conditions in a timely fashion, which is
> also true in the ApplyLauncherMain loop.  We can't realistically
> expect other processes to signal different latches depending on
> where the launcher is waiting, so those cases have to be triggered
> by the same latch.
>
> However, WaitForReplicationWorkerAttach can't service any
> latch-setting conditions other than those managed by
> CHECK_FOR_INTERRUPTS().  So if CHECK_FOR_INTERRUPTS() returns,
> we have some other triggering condition, which is the business
> of some outer code level to deal with.  Having it re-set the latch
> to allow that to happen promptly after it returns seems like a
> pretty straightforward answer to me.
>
> I do note Heikki's concern about whether we could get rid of the
> sleep-and-retry looping in WaitForReplicationWorkerAttach in
> favor of getting signaled somehow, and I agree with that as a
> possible future improvement.  But I don't especially see why that
> demands another latch; in fact, unless we want to teach WaitLatch
> to be able to wait on more than one latch, it *can't* be a
> separate latch from the one that receives CHECK_FOR_INTERRUPTS()
> conditions.
>

I agree that we don't need another latch, and I find your patch solves
it in the best possible way. The only minor point if you think makes
sense to change is the following comment:
+ /*
+ * If we had to clear a latch event in order to wait, be sure to restore
+ * it before exiting.  Otherwise caller may miss events.
+ */
+ if (dropped_latch)
..

The part of the comment "Otherwise caller may miss events." is clear
to me, but I'm not sure if it would be equally easy for everyone to
understand what the other events code is talking about here. Something
on the lines of what Heikki wrote, " We use the same latch to be
signalled about subscription changes and workers exiting, so we might
have missed some notifications, if those events happened
concurrently." is more specific.

> >> 4. In process_syncing_tables_for_apply (the other caller of
> >> logicalrep_worker_launch), it seems okay to ignore the
> >> result of logicalrep_worker_launch, but I think it should
> >> fill hentry->last_start_time before not after the call.
>
> > With this, won't we end up retrying to launch the worker sooner if the
> > launch took time, but still failed to launch the worker?
>
> That code already does update last_start_time unconditionally, and
> I think that's the right behavior for the same reason that it's
> right for ApplyLauncherMain to do ApplyLauncherSetWorkerStartTime
> whether or not logicalrep_worker_launch succeeds.  If the worker
> launch fails, we don't want to retry instantly, we want to wait
> wal_retrieve_retry_interval before retrying.  My desire to change
> this code is just based on the idea that it's not clear what else
> if anything looks at this hashtable, and by the time that
> logicalrep_worker_launch returns the system state could be a lot
> different.  (For instance, the worker could have started and
> failed already.)  So, just as in ApplyLauncherMain, I'd rather
> store the start time before calling logicalrep_worker_launch.
>
> BTW, it strikes me that if we're going to leave
> process_syncing_tables_for_apply() ignoring the result of
> logicalrep_worker_launch, it'd be smart to insert an explicit
> (void) cast to show that that's intentional.  Otherwise Coverity
> is likely to complain about how we're ignoring the result in
> one place and not the other.
>

Sounds reasonable.

-- 
With Regards,
Amit Kapila.

Re: queryId constant squashing does not support prepared statements

2025-06-24 Thread Michael Paquier

On Tue, Jun 24, 2025 at 07:45:15PM +0200, Alvaro Herrera wrote:
> +   /*
> +* If we have an external param at this location, but no lists are
> +* being squashed across the query, then we skip here; this will make
> +* us print print the characters found in the original query that
> +* represent the parameter in the next iteration (or after the loop is
> +* done), which is a bit odd but seems to work okay in most cases.
> +*/
> +   if (jstate->clocations[i].extern_param && !jstate->has_squashed_lists)
> +   continue;

+* us print print the characters found in the original query that

The final commit includes this comment, with a s/print print/print/
required.

> and if I understand this correctly, the reason is that the query is
> being executed from an SQL function,
> 
> CREATE FUNCTION PLUS_ONE(i INTEGER) RETURNS INTEGER AS
> $$ SELECT (i + 1.0)::INTEGER LIMIT 1 $$ LANGUAGE SQL;

Right, with two executions of PLUS_ONE() making it a single entry with
calls=2 .

> so the 'i' is actually a parameter, so it makes sense that it gets
> turned into a parameter in the normalized query.  With the 'if' test I
> mentioned above, we print it as 'i' literally only because we
> 'continued' and the next time through the loop we print the text from
> the original query.  This may be considered not entirely correct ...
> note that the constants in the query are shown as parameters, and that
> the numbers do not start from 1.  (Obviously, a few other queries also
> change.)

What you have committed is also consistent with the decision in v17
and older branches.  The current result looks OK to me for v18.

> I added one more commit, which iterates to peel as many layers of
> CoerceViaIO and RelabelType as there are around an expression.  I
> noticed this lack while reading Sami's patch for that function, which
> just simplified the coding therein.  For normal cases this would iterate
> just once (because repeated layers of casts are likely very unusual), so
> I think it's okay to do that.  There was some discussion about handling
> this via recursion but it died inconclusively; I added recursive
> handling of FuncExpr's arguments in 0f65f3eec478, which was different,
> but I think handling this case by iterating is better than recursing.

Agreed.  I was actually wondering about the logic of
IsSquashableConstant() when it came to RelabelType and CoerceViaIO.
The order of the scans was expected but just going back to the top of
IsSquashableConstant() for these two nodes makes the code easier to
follow.  So agreed that your change is an improvement.

> With these commits, IMO the open item can now be closed.  Even if we
> ultimately end up reverting any of this, we would probably punt support
> of Params to pg19, so the open item would be gone anyway.

Yes.

> Lastly, I decided not to do a catversion bump.  As far as I can tell,
> changes in the jumbling functions do not need them.  I tried an
> 'installcheck' run with a datadir initdb'd with the original code, and
> it works fine.

This reminds me of 4c7cd07aa62a and this thread:
https://www.postgresql.org/message-id/1364409.1727673...@sss.pgh.pa.us

Doesn't the change in the Param structure actually require one because
it can change the representation of some SQL functions?  I am not
completely sure.

> I also tried an 'installcheck' with pg_stat_statements
> installed, and was surprised to realize that the Query Id reports in
> EXPLAIN make a large number of tests fail.  If I take those lines from
> the original code into the expected output, and then run the tests with
> the new code, I notice that a few queries have changed queryId.  I
> suppose this was to be expected, and shouldn't harm anything.

I don't think we've put a lot of work in trying to make installcheck
work with PGSS (never tried it TBH), so I am not surprised to see some
failures when one tries this mode.
--
Michael


signature.asc
Description: PGP signature

Re: array_in sub function ReadArrayDimensions error message

2025-06-24 Thread David G. Johnston

On Tue, Jul 9, 2024 at 10:31 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

>
> That said, it isn’t making it back to us if our users are actually having
> this confusion and would benefit meaningfully from such a hint.
>
>
None of us ever put forth an actual patch for this so it seems to have been
forgotten about.  It seemed we agree some change is needed here, just not
exactly what.

I was thinking of adding a tracking entry in Drafts, even without the
patch, but then realized I didn't start this thread.  So I decided to jump
ping it instead.

David J.

Re: Safeguards against incorrect fd flags for fsync()

2025-06-24 Thread Michael Paquier

On Tue, Jun 24, 2025 at 07:51:08AM +0200, Michael Banck wrote:
> I got it working, I had to rebuild gnumach with --enable-apic in order
> to get HPET. With that, the regular build-farm checks (check/
> installcheck in contrib, src/test/regress and src/test/isolation) pass
> without patches to testsuite timings.

How many custom patches did you have to apply to the backend to make
these suites work on this platform?
--
Michael


signature.asc
Description: PGP signature

Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin

2025-06-24 Thread Melanie Plageman

On Mon, Jun 23, 2025 at 11:49 PM John Naylor  wrote:
>
> On Tue, Jun 24, 2025 at 5:30 AM Melanie Plageman
>  wrote:
> > Attached v3 has all of the above. I think the only thing that is
> > needed to be changed for the backpatch to 17 is removing
> > io_combine_limit.
>
> Looks good to me.

Okay, pushed to 17 (which does have io_combine_limit, I forgot) and
master. Now, let's see if the buildfarm is happy. Thanks for all your
help!

- Melanie

Re: Introduce New Command Processing Chapter to Documentation

2025-06-24 Thread David G. Johnston

On Tue, Mar 11, 2025 at 3:52 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> The attached patch implements the above.  The WindowAgg code is just my
> proposal from the other thread and wouldn't be part of the initial commit.
> Just the mechanical refactoring bits.
>
>
I've added this to Drafts and put on what I thought were relevant tags and
status given that:
It only touches documentation.
I need feedback on what it does before moving forward.
Hence, Draft, Docs Only, Help - Docs, Need Review

David J.

Re: [WIP]Vertical Clustered Index (columnar store extension) - take2

2025-06-24 Thread Peter Smith

On Fri, Jun 20, 2025 at 4:09 AM Timur Magomedov
 wrote:
>
> Hello Peter!
>
> Thank you for working on VCI updates.
> Here are some proposals for small improvements:

Hi Timur, Thanks for your feedback!

The newly posted v9-0002 addresses some of your comments. Details are below.

>
> Since hothash feature lives in separate patch now, vci_hothash.o should
> be removed from vci/executor/Makefile of VCI-module-main patch.

Fixed. Thanks. I also fixed a similar problem and removed vci_mp_rle.o
from storage/Makefile.

>
> 0001-Avoid-magic-numbers-in-vci_is_supported_type.patch
> I've looked at vci_supported_types.c and tried to rewrite it without
> using magic constants. Removed call to vci_check_supported_types() from
> SQL code since there is no need now to check that OID constants still
> match same types.
> Using defined constants for OIDs seems more robust than plain numbers.
> There were 23 type OIDs supported by VCI, all of them are there in a
> new version of code. I've replaced binary search by simple switch-case
> since compilers optimize it into nice binary decision tree. Previous
> version was binary searching among 87 structs. New version only
> searches among 23 integers.

Hmm. IIUC the “supported types” code was written this way for the
following reason. Out of an over-abundance (?) of caution, the
original patch authors wanted to call the function
'vci_check_supported_types' as a compatibility check, to discover
up-front (during CREATE EXTENSION) if any of the current PostgreSQL
OID definitions differ in any way since when the VCI extension was
built.
e.g.
- if vci build supports the type oid, but the oid value in current
PostgreSQL no longer exists then raise an error
- if vci build supports the type oid but the oid now has a different
meaning (typname mismatch) then raise an error

In other words, even though these type OIDs are in a range that is
supposed to be stable/unlikely to change, I think original VCI
developers were deliberately cautious about any chance of OID values
changing in later PostgreSQL versions.

So, for the compatibility checking we still need to keep that function
'vci_check_supported_types', and therefore we also still need to keep
the struct because that is where the known oid/name mapping (at time
of VCI extension build) is held which is used for the checking.

The current code is written so that when building a new VCI you only
need to execute:
"SELECT oid, typname FROM pg_type WHERE typnamespace = 11 AND typrelid
= 0 AND typelem = 0 ORDER BY oid;"
for the current PostgreSQL then substitute in the necessary true/false
for support.

I agree this means the resulting vci_is_supported_type() is not as
efficient as your implementation, but OTOH it is perhaps easier to
maintain and check against new PostgreSQL releases?

Also, you'll encounter all the same problems with the supported
functions logic of vci_supported_funcs.c -- those have similar logic,
so maybe it is better to keep them similar despite inefficiencies?

FYI, I have re-executed the SQL
SELECT oid, typname FROM pg_type WHERE typnamespace = 11 AND typrelid
= 0 AND typelem = 0 ORDER BY oid;
and this exposed a few changes since whenever this
vci_supported_type_table[] was last generated many years ago.

~~

AFAICT we could implement the vci_supported_type_table[] to *only*
include the types where supported is “true”. That would be more
efficient than now because then the entire array would only have 20ish
elements, but comes at a cost of perhaps being less easy to compare
with the executed SQL result.

Thoughts?

Also, while the vci_supported_type_table[]  lookup is needed for the
compatibility check logic, I guess we could implement the
vci_is_supported_type()  function exactly the same as your patch
switch code, to be called for the runtime checking (during CREATE
INDEX). That would be more efficient at runtime, but comes at a cost
of then having 2 functions to maintain.

Thoughts?

~~~

So, for now, I have only done the following:
- Updated vci_supported_type_table[] according to the current SQL result.
- Notice that “name” (NAMEOID) no longer qualifies as a VCI supported
type (because typelem is not 0 anymore) so the test code was modified
to remove the name column “c04”.
- In passing, I removed the costings from the EXPLAIN test output

>
> 0002-Updated-supported-funcs-SQL-for-recent-PostgreSQL.patch
> I wonder if it is possible to rewrite vci_supported_funcs.c in a
> similar way. There is a vci_supported_funcs.sql which I updated so it
> can be executed at master version of PostgreSQL.

OK. I included your changes in v9 and also added some IF EXISTS so the
.sql file can be run repeatedly without error. I also changed the
matching comment in the supported_funcs.c code where the SQL of this
file was described.

> I don't know if it is intentional, but safe_types list from
> vci_supported_funcs.sql have some differences to the types list from
> vci_supported_types.c. Specifically, it doesn't include varchar, v

Re: pg_dump --with-* options

2025-06-24 Thread Fujii Masao





On 2025/06/25 5:07, Robert Haas wrote:

On Tue, Jun 24, 2025 at 12:48 PM Nathan Bossart
 wrote:

On Mon, Jun 23, 2025 at 01:38:10PM -0400, Robert Haas wrote:

I had thought we had a consensus that pg_upgrade should preserve stats
but regularly pg_dump shouldn't include them; perhaps I misunderstood
or that changed.


I think it's a bit of both.  I skimmed through the past discussions and
found that not only was there a rough consensus in 2024 that stats _should_
be on by default [0], but also that an updated vote tally didn't show much
of a consensus at all [1].  Like you, I thought we had pretty much closed
that door, but the aforementioned analysis along with further discussion
has me convinced that we might want to reconsider [2].


Well, I don't know: I still think that's the right answer, so I don't
really want to reconsider, but I understand that I'm not in charge
here.


For the record, my vote is: default "off" for pg_dump and pg_dumpall,
and "on" for pg_restore.

For pg_dump and pg_dumpall, I agree with Jeff's idea in [1],
but if the statistics is skipped by default, I don't think
we need a --no-statistics option. So, here's how I think
the options should work:

* Keep: --schema-only, --data-only, --statistics-only, --no-schema, 
--no-data, -and -statistics
* Remove: --no-statistics, --with-schema, and --with-data
* Combinations:
Schema + Data + Stats : --statistics
Schema + Data : (default)
Schema + Stats: --no-data --statistics
Data + Stats  : --no-schema --statistics
Schema only   : --schema-only   (or --no-data)
Data only : --data-only (or --no-schema)
Stats only: --statistics-only (or --no-schema --no-data 
--statistics)

As I mentioned in [2], if we treat --statistics in the similar way to
--sequence-data, i.e., allow --statistics to be used with --schema-only
or --data-only, we could simplify further:

* Keep: --schema-only, --data-only, --statistics-only, and --statistics
* Remove: --no-schema, --no-data, --no-statistics, --with-schema, and 
--with-data
* Combinations:
Schema + Data + Stats : --statistics
Schema + Data : (default)
Schema + Stats: --schema-only --statistics
Data + Stats  : --data-only --statistics
Schema only   : --schema-only
Data only : --data-only
Stats only: --statistics-only

Some may find this confusing due to mixing --statistics with --schema-only
or --data-only, so I understand if there's hesitation.

For pg_restore, I believe there's agreement to restore statistics
by default if they exist in the archive. So:

* Keep: --schema-only, --data-only, --statistics-only, --no-schema, 
--no-data, and --no-statistics
* Remove: --with-schema, --with-data, and --statistics
* Combinations:
Schema + Data + Stats : (default)
Schema + Data : --no-statistics
Schema + Stats: --no-data
Data + Stats  : --no-schema
Schema only   : --schema-only   (or --no-data 
--no-statistics)
Data only : --data-only (or --no-schema 
--no-statistics)
Stats only: --statistics-only (or --no-schema --no-data)

Thought?

Regards,

[1] 
https://postgr.es/m/031558c60e84362898922caa6a90587e7fdf2a57.ca...@j-davis.com
[2] https://postgr.es/m/94f89b0a-5d83-4a67-9092-50ba39134...@oss.nttdata.com

--
Fujii Masao
NTT DATA Japan Corporation

Re: Simplify VM counters in vacuum code

2025-06-24 Thread Melanie Plageman

Thanks for the review!

On Tue, Jun 24, 2025 at 12:12 AM Masahiko Sawada  wrote:
>
> On Tue, Jun 24, 2025 at 4:21 AM Melanie Plageman
>  wrote:
>
> The flags is initialized as:
>
> uint8   flags = VISIBILITYMAP_ALL_VISIBLE;
>
> so the new if-condition is always true.

Yep, this was a mistake. I pulled this patch out of a larger set in
which I moved setting these counters outside of the
heap_page_is_all_visible() == true branch. Attached v2 fixes this.

> The patch removes if statements and adds some assertions, which seems
> to be a refactoring to me rather than a fix. I think we need to
> consider whether it's really okay to apply it to v18.

The reason I consider it a fix is that the if statement is confusing
-- it makes the reader think it is possible that the VM page was
already all-visible/frozen. In the other cases where we set the VM
counters, that is true. But in the case of lazy_vacuum_heap_page(), it
would not be correct for the page to have been all-visible because it
contained LP_DEAD items. And in the case of an empty page where
PD_ALL_VISIBLE was clear, the VM bits cannot have been set (because
the page bit must be set if the VM bit is set).

We could remove the asserts, as we rely on other code to enforce these
invariants. So, here the asserts would only really be protecting from
code changes that make it so the counters are no longer correctly
counting newly all-visible pages -- which isn't critical to get right.

- Melanie
From 93d73a4f9d499d58b27d5aed6f4c15fed3b79e45 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Wed, 18 Jun 2025 16:12:15 -0400
Subject: [PATCH v2] Simplify vacuum VM update logging counters

We can simplify the VM counters added in dc6acfd910b8 to
lazy_vacuum_heap_page() and lazy_scan_new_or_empty().

We won't invoke lazy_vacuum_heap_page() unless there are dead line
pointers, so we know the page can't be all-visible.

In lazy_scan_new_or_empty(), we only update the VM if the page-level
hint PD_ALL_VISIBLE is clear, and the VM bit cannot be set if the page
level bit is clear because a subsequent page update would fail to clear
the visibility map bit.

Simplify the logic for determining which log counters to increment based
on this knowledge.

Author: Melanie Plageman 
Reviewed-by: Nazir Bilal Yavuz 
Reviewed-by: Masahiko Sawada 
Discussion: https://postgr.es/m/flat/CAAKRu_a9w_n2mwY%3DG4LjfWTvRTJtjbfvnYAKi4WjO8QXHHrA0g%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 37 ++--
 1 file changed, 13 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 09416450af9..1a8326b7750 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1900,17 +1900,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		   VISIBILITYMAP_ALL_FROZEN);
 			END_CRIT_SECTION();
 
-			/*
-			 * If the page wasn't already set all-visible and/or all-frozen in
-			 * the VM, count it as newly set for logging.
-			 */
-			if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-			{
-vacrel->vm_new_visible_pages++;
-vacrel->vm_new_visible_frozen_pages++;
-			}
-			else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0)
-vacrel->vm_new_frozen_pages++;
+			/* VM bits cannot have been set if PD_ALL_VISIBLE was clear */
+			Assert((old_vmbits & VISIBILITYMAP_VALID_BITS) == 0);
+			(void) old_vmbits; /* Silence compiler */
+			/* Count the newly all-frozen pages for logging. */
+			vacrel->vm_new_visible_pages++;
+			vacrel->vm_new_visible_frozen_pages++;
 		}
 
 		freespace = PageGetHeapFreeSpace(page);
@@ -2930,20 +2925,14 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	   vmbuffer, visibility_cutoff_xid,
 	   flags);
 
-		/*
-		 * If the page wasn't already set all-visible and/or all-frozen in the
-		 * VM, count it as newly set for logging.
-		 */
-		if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-		{
-			vacrel->vm_new_visible_pages++;
-			if (all_frozen)
-vacrel->vm_new_visible_frozen_pages++;
-		}
+		/* We know the page should not have been all-visible */
+		Assert((old_vmbits & VISIBILITYMAP_VALID_BITS) == 0);
+		(void) old_vmbits; /* Silence compiler */
 
-		else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- all_frozen)
-			vacrel->vm_new_frozen_pages++;
+		/* Count the newly set VM page for logging */
+		vacrel->vm_new_visible_pages++;
+		if (all_frozen)
+			vacrel->vm_new_visible_frozen_pages++;
 	}
 
 	/* Revert to the previous phase information for error traceback */
-- 
2.34.1

Re: Fixes inconsistent behavior in vacuum when it processes multiple relations

2025-06-24 Thread Michael Paquier

On Tue, Jun 24, 2025 at 11:30:13AM -0500, Nathan Bossart wrote:
> On Tue, Jun 24, 2025 at 10:18:18AM +0900, Michael Paquier wrote:
> > Knowing that I'm indirectly responsible for this mess, I would like to
> > take care of that myself.  Would that be OK for you?
> 
> I would be grateful if you took care of it.

Okay, applied the main fix down to v13 then.

> +1 for this.  I did something similar to verify the other bugs I reported,
> and this seems far easier to maintain than potentially-unstable tests that
> require lots of setup and that depend on secondary effects.

After sleeping on this point, I have finished with this idea for tests
down to v17, adding more coverage for v18 with the GUC vacuum_truncate
and the "auto" mode of index_cleanup, while on it.  v16 and older
branches don't have tests, but I have checked the sanity of the two
code paths using the tests attached for vacuum.sql and
pgstattuple.sql, at least.  Two truncation tests for the "false" cases
were actually unstable.  Sometimes the truncation may not happen even
with an aggressive VACUUM.  Anyway, it is enough to check the true
case for the two code paths patched, even if it may not happen at
100%.  I did not notice a problem with the index_cleanup case as long
as we had enough pages in the TOAST btree index, requiring some
runtime to load the entries, so the cost was annoying.

Attached is the remaining patch for HEAD, planned once v19 opens, and
the tests I have used on the back-branches as a txt to not confuse the
CI, for reference.
--
Michael
From 32b561d88e7480cd0ab9f79443fbacd4f31dc53c Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 25 Jun 2025 10:22:08 +0900
Subject: [PATCH v5] Refactor handling of VacuumParams

This adds a couple of const markers, while removing pointer uses of
VacuumParams to force the following policy among all the VACUUM-related
calls: do not touch the values given by the caller.

This patch is for HEAD.

Author: Michael P.
---
 src/include/access/heapam.h  |   4 +-
 src/include/access/tableam.h |   6 +-
 src/include/commands/vacuum.h|   6 +-
 src/backend/access/heap/vacuumlazy.c |  44 +-
 src/backend/commands/analyze.c   |  26 +++---
 src/backend/commands/cluster.c   |   2 +-
 src/backend/commands/vacuum.c| 118 ---
 src/backend/postmaster/autovacuum.c  |   2 +-
 8 files changed, 98 insertions(+), 110 deletions(-)

diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 3a9424c19c9a..a2bd5a897f87 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -21,6 +21,7 @@
 #include "access/skey.h"
 #include "access/table.h"		/* for backward compatibility */
 #include "access/tableam.h"
+#include "commands/vacuum.h"
 #include "nodes/lockoptions.h"
 #include "nodes/primnodes.h"
 #include "storage/bufpage.h"
@@ -396,9 +397,8 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	  OffsetNumber *unused, int nunused);
 
 /* in heap/vacuumlazy.c */
-struct VacuumParams;
 extern void heap_vacuum_rel(Relation rel,
-			struct VacuumParams *params, BufferAccessStrategy bstrategy);
+			const VacuumParams params, BufferAccessStrategy bstrategy);
 
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 8713e12cbfb9..1c9e802a6b12 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -20,6 +20,7 @@
 #include "access/relscan.h"
 #include "access/sdir.h"
 #include "access/xact.h"
+#include "commands/vacuum.h"
 #include "executor/tuptable.h"
 #include "storage/read_stream.h"
 #include "utils/rel.h"
@@ -36,7 +37,6 @@ extern PGDLLIMPORT bool synchronize_seqscans;
 struct BulkInsertStateData;
 struct IndexInfo;
 struct SampleScanState;
-struct VacuumParams;
 struct ValidateIndexState;
 
 /*
@@ -645,7 +645,7 @@ typedef struct TableAmRoutine
 	 * integrate with autovacuum's scheduling.
 	 */
 	void		(*relation_vacuum) (Relation rel,
-	struct VacuumParams *params,
+	const VacuumParams params,
 	BufferAccessStrategy bstrategy);
 
 	/*
@@ -1664,7 +1664,7 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
  * routine, even if (for ANALYZE) it is part of the same VACUUM command.
  */
 static inline void
-table_relation_vacuum(Relation rel, struct VacuumParams *params,
+table_relation_vacuum(Relation rel, const VacuumParams params,
 	  BufferAccessStrategy bstrategy)
 {
 	rel->rd_tableam->relation_vacuum(rel, params, bstrategy);
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index bc37a80dc74f..14eeccbd7185 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -336,7 +336,7 @@ extern PGDLLIMPORT int64 parallel_vacuum_worker_delay_ns;
 
 /* in commands/vacuum.c */
 extern void ExecVacuum(ParseState *pstate, VacuumStmt *v

Re: Make COPY format extendable: Extract COPY TO format implementations

2025-06-24 Thread Masahiko Sawada

On Tue, Jun 24, 2025 at 2:11 PM Sutou Kouhei  wrote:
>
> Hi,
>
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Tue, 24 Jun 2025 11:59:17 +0900,
>   Masahiko Sawada  wrote:
>
> >> 1. This provides 2 registration APIs
> >>(RegisterCopy{From,To}Routine(name, routine)) instead of
> >>1 registration API (RegisterCopyFormat(name,
> >>from_routine, to_routine)).
> >>
> >>It's for simple implementation and easy to extend without
> >>breaking APIs in the future. (And some formats may
> >>provide only FROM routine or TO routine.)
> >>
> >>Is this design acceptable?
> >
> > With the single registration API idea, we can register the custom
> > format routine that supports only FROM routine using the API like:
> >
> > RegisterCopyRoutine("new-format", NewFormatFromRoutine, NULL);
> >
> > Compared to this approach, what points do you think having separate
> > two registration APIs is preferable in terms of extendability and API
> > compatibility?
>
> It's natural to add more related APIs with this
> approach. The single registration API provides one feature
> by one operation. If we use the RegisterCopyRoutine() for
> FROM and TO formats API, it's not natural that we add more
> related APIs. In this case, some APIs may provide multiple
> features by one operation and other APIs may provide single
> feature by one operation. Developers may be confused with
> the API. For example, developers may think "what does mean
> NULL here?" or "can we use NULL here?" for
> "RegisterCopyRoutine("new-format", NewFormatFromRoutine,
> NULL)".

We can document it in the comment for the registration function.

>
>
> >I think it would be rather confusing for example if
> > each COPY TO routine and COPY FROM routine is registered by different
> > extensions with the same format name.
>
> Hmm, I don't think so. Who is confused by the case? DBA?
> Users who use COPY? Why is it confused?
>
> Do you assume the case that the same name is used for
> different format? For example, "json" is used for JSON Lines
> for COPY FROM and and normal JSON for COPY TO by different
> extensions.

Suppose that extension-A implements only CopyToRoutine for the
custom-format-X with the format name 'myformat' and extension-B
implements only CopyFromRoutine for the custom-format-Y with the same
name, if users load both extension-A and extension-B, it seems to me
that extension-A registers the custom-format-X format as 'myformat'
only with CopyToRoutine, and extension-B overwrites the 'myformat'
registration by adding custom-format-Y's CopyFromRoutine. However, if
users register extension-C that implements both routines with the
format name 'myformat', they can register neither extension-A nor
extension-B, which seems to me that we don't allow overwriting the
registration in this case.

I think the core issue appears to be the internal management of custom
format entries but the current patch does enable registration
overwriting in the former case (extension-A and extension-B case).

>
> >>FYI: RegisterCopy{From,To}Routine() uses the same logic
> >>as RegisterExtensionExplainOption().
> >
> > I'm concerned that the patch has duplicated logics for the
> > registration of COPY FROM and COPY TO.
>
> We can implement a convenient routine that can be used for
> RegisterExtensionExplainOption() and
> RegisterCopy{From,To}Routine() if it's needed.

I meant there are duplicated codes in COPY FROM and COPY TO. For
instance, RegisterCopyFromRoutine() and RegisterCopyToRoutine() have
the same logic.

>
> >> 3. I want to register the built-in COPY {FROM,TO} routines
> >>in the PostgreSQL initialization phase. Where should we
> >>do it? In 0002, it's done in InitPostgres() but I'm not
> >>sure whether it's a suitable location or not.
> >
> > InitPostgres() is not a correct function as it's a process
> > initialization function. Probably we don't necessarily need to
> > register the built-in formats in the same way as custom formats. A
> > simple solution would be to have separate arrays for built-in formats
> > and custom formats and have the GetCopy[To|From]Routine() search both
> > arrays (built-in array first).
>
> We had a discussion that we should dog-food APIs:
>
> https://www.postgresql.org/message-id/flat/CAKFQuwaCHhrS%2BRE4p_OO6d7WEskd9b86-2cYcvChNkrP%2B7PJ7A%40mail.gmail.com#e6d1cdd04dac53eafe34b784ac47b68b
>
> > We should (and usually do) dog-food APIs when reasonable
> > and this situation seems quite reasonable.
>
> In this case, we don't need to dog-food APIs, right?

Yes, I think so.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: getting "shell command argument contains a newline or carriage return:" error with pg_dumpall when db name have new line in double quote

2025-06-24 Thread Mahendra Singh Thalor

On Fri, 11 Apr 2025 at 20:17, Nathan Bossart 
wrote:
>
> On Thu, Apr 10, 2025 at 11:58:41PM +0530, Mahendra Singh Thalor wrote:
> > As per above discussions, for v18, we will not do any change to server
> > side to fix the issue of \n\r in database names. But as a cleanup
> > patch, we can give an alert to the user by "pg_upgrade --check". As
> > per current code, pg_dump and pg_upgrade will fail with "shell
> > command" error but in the attached patch, we will give some extra info
> > to the user by "pg_upgrade --check" so that they can fix database
> > names before trying to upgrade.
> >
> > Here, I am attaching a patch which will give a list of invalid
> > database names in "pg_upgrade --check". We can consider this as a
> > cleanup patch.
>
> Are you proposing this for v18?  I think this is all v19 material at this
> point.  Perhaps we could argue this is a bug fix that should be
> back-patched, but IMHO that's a bit of a stretch.  I don't sense a
> tremendous amount of urgency, either.
>
> --
> nathan

Thanks Nathan for the review.

I want to re-start this thread for v19. I posted v06* patches in my
previous updates[1]. Please someone review it and let me know feedback.

v06 patches: v06_patches


[1] :
https://www.postgresql.org/message-id/CAKYtNAqC5pkjmh8UgvbNLtMyEVeKUtDF3_%2B9dvG9zb8YrhTJQQ%40mail.gmail.com

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com

Re: Logical Replication of sequences

2025-06-24 Thread Nisha Moond

On Sun, Jun 22, 2025 at 8:05 AM vignesh C  wrote:
>
> Thanks for the comment, the attached v20250622 version patch has the
> changes for the same.
>

Thanks for the patches, please find my review comments for patches 001 and 002:

1) patch-001 :pg_sequence_state()

+ /* open and lock sequence */
+ init_sequence(seq_relid, &elm, &seqrel);

/ open/ Open
~~~

patch-0002:

2)
- /* FOR ALL TABLES requires superuser */
- if (stmt->for_all_tables && !superuser())
- ereport(ERROR,
- (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
- errmsg("must be superuser to create FOR ALL TABLES publication")));
+ if (!superuser())

I think we can retain the original comment here with required modification :
  /* FOR ALL TABLES and FOR ALL SEQUENCES requires superuser */
~~~

3) ALL TABLES vs ALL SEQUENCES

For the command:
  CREATE PUBLICATION FOR ALL TABLES, [...]
 - only "SEQUENCES" is allowed after ',', and TABLE or TABLES IN
SCHEMA are not allowed. Which aligns with the fact that "all tables"
is inclusive of "FOR TABLE" and "FOR TABLES IN SCHEMA".
Therefore, adding and dropping of tables from a "ALL TABLES"
publication is also not allowed. e.g.,

```
postgres=# alter publication test add table t1;
ERROR:  publication "test" is defined as FOR ALL TABLES
DETAIL:  Tables cannot be added to or dropped from FOR ALL TABLES publications.

postgres=# alter publication test add tables in schema public;
ERROR:  publication "test" is defined as FOR ALL TABLES
DETAIL:  Schemas cannot be added to or dropped from FOR ALL TABLES publications.
```

However, for ALL SEQUENCES, the behavior seems inconsistent. CREATE
PUBLICATION doesn’t allow adding TABLE or TABLES IN SCHEMA along with
ALL SEQUENCES, but ALTER PUBLICATION does.
e.g., for a all sequence publication 'pubs', below succeeds :
postgres=# alter publication pubs add table t1;
ALTER PUBLICATION

Is this expected?
If adding tables is allowed using ALTER PUBLICATION, perhaps it should
also be permitted during CREATE PUBLICATION or disallowed in both
cases. Thoughts?
~~~

4) Consider a publication 'pubs' on all sequences and 'n1' is a sequence, now -

postgres=# alter publication pubs drop table n1;
ERROR:  relation "n1" is not part of the publication

This error message can be misleading, as the relation n1 is part of
the publication - it's just a sequence, not a table.
It might be more accurate to add a DETAIL line similar to ADD case:

postgres=# alter publication pubs add table n1;
ERROR:  cannot add relation "n1" to publication
DETAIL:  This operation is not supported for sequences.
~~~

--
Thanks.
Nisha

Improve pg_sync_replication_slots() to wait for primary to advance

2025-06-24 Thread Ajin Cherian

Hello,

Creating this thread for a POC based on discussions in thread [1].
Hou-san had created this patch, and I just cleaned up some documents,
did some testing and now sharing the patch here.

In this patch, the pg_sync_replication_slots() API now waits
indefinitely for the remote slot to catch up. We could later add a
timeout parameter to control maximum wait time if this approach seems
acceptable. If there are more ideas on improving this patch, let me
know.

regards,
Ajin Cherian
[1]  - 
https://www.postgresql.org/message-id/CAF1DzPWTcg%2Bm%2Bx%2BoVVB%3Dy4q9%3DPYYsL_mujVp7uJr-_oUtWNGbA%40mail.gmail.com


0001-Fix-stale-snapshot-issue.patch
Description: Binary data

Re: pg18: Virtual generated columns are not (yet) safe when superuser selects from them

2025-06-24 Thread Peter Eisentraut


On 23.06.25 18:11, jian he wrote:

seems we didn't check the ALTER TABLE case.

CREATE TYPE double_int as (a int, b int);
CREATE TABLE y (a int);
alter table y add column b double_int GENERATED ALWAYS AS ((a * 2, a *
3)) VIRTUAL;

in ATExecAddColumn, we can change it to:
 CheckAttributeType(NameStr(attribute->attname),
attribute->atttypid, attribute->attcollation,
list_make1_oid(rel->rd_rel->reltype),
(attribute->attgenerated ==
ATTRIBUTE_GENERATED_VIRTUAL ? CHKATYPE_IS_VIRTUAL : 0));


Yes, this is an existing fault separate from this patch.  I have pushed 
a fix for this along these lines.

Re: Safeguards against incorrect fd flags for fsync()

2025-06-24 Thread Michael Banck

Hi,

On Wed, Jun 25, 2025 at 08:36:01AM +0900, Michael Paquier wrote:
> On Tue, Jun 24, 2025 at 07:51:08AM +0200, Michael Banck wrote:
> > I got it working, I had to rebuild gnumach with --enable-apic in order
> > to get HPET. With that, the regular build-farm checks (check/
> > installcheck in contrib, src/test/regress and src/test/isolation) pass
> > without patches to testsuite timings.
> 
> How many custom patches did you have to apply to the backend to make
> these suites work on this platform?

Just those two (i.e. the one I posted in this thread and one adopted
from the current Debian package and discussed in [1]):

https://github.com/postgres/postgres/compare/master...mbanck:postgres:hurd-port

I am going to post them again for the next commitfest.


Michael

[1] 
https://www.postgresql.org/message-id/6846e0c3.df0a0220.39ef9b.c60e%40mx.google.com

Re: SQL:2023 JSON simplified accessor support

2025-06-24 Thread jian he

hi.

in src/backend/catalog/sql_features.txt
should we mark any T860, T861, T862, T863, T864
items as YES?


typedef struct SubscriptingRef
{
/* expressions that evaluate to upper container indexes */
List   *refupperindexpr;
}
SubscriptingRef.refupperindexpr meaning changed,
So the above comments also need to be changed?


v11-0004-Extract-coerce_jsonpath_subscript.patch
+ /*
+ * We known from can_coerce_type that coercion will succeed, so
+ * coerce_type could be used. Note the implicit coercion context, which is
+ * required to handle subscripts of different types, similar to overloaded
+ * functions.
+ */
+ subExpr = coerce_type(pstate,
+  subExpr, subExprType,
+  targetType, -1,
+  COERCION_IMPLICIT,
+  COERCE_IMPLICIT_CAST,
+  -1);
+ if (subExpr == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_DATATYPE_MISMATCH),
+ errmsg("jsonb subscript must have text type"),

the targetType can be "integer", then the error message
errmsg("jsonb subscript must have text type") would be wrong?
Also this error handling is not necessary.
since we can_coerce_type already tell us that coercion will succeed.
Also, do we really put v11-0004 as a separate patch?


in gram.y we have:
if (!IsA($5, A_Const) ||
castNode(A_Const, $5)->val.node.type != T_String)
ereport(ERROR,
errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("only string constants are
supported in JSON_TABLE path specification"),
so simply, in make_jsonpath_item_expr we can

expr = transformExpr(pstate, expr, pstate->p_expr_kind);
if (!IsA(expr, Const) || ((Const *) expr)->consttype != INT4OID)
ereport(ERROR,
errcode(ERRCODE_DATATYPE_MISMATCH),
errmsg("only integer constants are supported in jsonb
simplified accessor subscripting"),
parser_errposition(pstate, exprLocation(expr)));

because I think the current error message "got type: unknown" is not good.
select ('123'::jsonb).a['1'];
ERROR:  jsonb simplified accessor supports subscripting in type: INT4,
got type: unknown
then we don't need two "ereport(ERROR" within make_jsonpath_item_expr
we can also Assert (cnst->constisnull) is false.
see gram.y:16976


I saw you introduce the word "AST", for example
"Converts jsonpath AST into jsonpath value in binary."
I am not sure that is fine.


in jsonb_subscript_make_jsonpath we have:
+ foreach(lc, *indirection)
+ {
+
+if (IsA(accessor, String))
+ 
+else if (IsA(accessor, A_Star))
+ 
+else if (IsA(accessor, A_Indices))
+ 
+else
+  break;

Is the last else branch unreliable? since indirection only support for
String, A_Star, A_Indices, we already have Assert in jsonb_check_jsonpath_needed
to ensure that.
+ *indirection = list_delete_first_n(*indirection, pathlen);
also this is not necessary,
because pathlen will be the same length as list *indirection in current design.


Please check the attached minor refactoring based on v11-0001 to v11-0006


v11-0001-misc-refactoring-based-on-v11_01_to_06.no-cfbot
Description: Binary data

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-24 Thread Bertrand Drouvot

Hi,

On Tue, Jun 24, 2025 at 04:41:33PM +0200, Christoph Berg wrote:
> Re: Bertrand Drouvot
> > Yes, I think compat_uptr_t usage is missing in do_pages_stat() (while it's 
> > used
> > in do_pages_move()).
> 
> I was also reading the kernel source around that place but you spotted
> the problem before me. This patch resolves the issue here:
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 8cf0f9c9599..542c81ec3ed 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2444,7 +2444,13 @@ static int do_pages_stat(struct mm_struct *mm, 
> unsigned long nr_pages,
>   if (copy_to_user(status, chunk_status, chunk_nr * 
> sizeof(*status)))
>   break;
> 
> - pages += chunk_nr;
> + if (in_compat_syscall()) {
> + compat_uptr_t __user *pages32 = (compat_uptr_t __user 
> *)pages;
> +
> + pages32 += chunk_nr;
> + pages = (const void __user * __user *) pages32;
> + } else
> + pages += chunk_nr;
>   status += chunk_nr;
>   nr_pages -= chunk_nr;
>   }
> 

Thanks! Yeah, I had the same kind of patch idea in mind.

> > +   #define NUMA_QUERY_CHUNK_SIZE 16  /* has to be <= 
> > DO_PAGES_STAT_CHUNK_NR (do_pages_stat())*/
> > +
> > +   for (uint64 chunk_start = 0; chunk_start < 
> > shm_ent_page_count; chunk_start += NUMA_QUERY_CHUNK_SIZE) {
> 
> Perhaps optimize it to this:
> 
> #if sizeof(void *) == 4
> #define NUMA_QUERY_CHUNK_SIZE 16  /* has to be <= DO_PAGES_STAT_CHUNK_NR 
> (do_pages_stat())*/
> #else
> #define NUMA_QUERY_CHUNK_SIZE 1024
> #endif
> 
> ... or some other bigger number.

I had in mind to split the batch size on the PG side only for 32-bits, what 
about
the attached?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From 8f38a49f7e929fa385b3784725e8a1955db9fc57 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Wed, 25 Jun 2025 05:06:40 +
Subject: [PATCH v1] Work around Linux kernel bug in do_pages_stat()

do_pages_stat() is already handling the input arrays correctly in
32-bit mode, but at the end of the "while (nr_pages)" loop, it
incorrectly advances the pages pointer with the wrong word size.

Work around is to ensure that pg_numa_query_pages() does not pass more than
DO_PAGES_STAT_CHUNK_NR (see do_pages_stat()) pages to do_pages_stat() so that
the wrong pointer arithmetic has no effect (as the pages variable is not being
used).

Linux kernel bug reported here: https://marc.info/?l=linux-mm&m=175077821909222&w=2
---
 src/port/pg_numa.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/src/port/pg_numa.c b/src/port/pg_numa.c
index 4b487a2a4e8..86921a156b4 100644
--- a/src/port/pg_numa.c
+++ b/src/port/pg_numa.c
@@ -16,6 +16,7 @@
 #include "c.h"
 #include 
 
+#include "miscadmin.h"
 #include "port/pg_numa.h"
 
 /*
@@ -46,7 +47,31 @@ pg_numa_init(void)
 int
 pg_numa_query_pages(int pid, unsigned long count, void **pages, int *status)
 {
+/*
+ * Work around Linux kernel bug in 32-bit compat mode: do_pages_stat() has
+ * incorrect pointer arithmetic for more than DO_PAGES_STAT_CHUNK_NR pages.
+ */
+#if SIZEOF_SIZE_T == 4
+#define NUMA_QUERY_CHUNK_SIZE 16	/* has to be <= DO_PAGES_STAT_CHUNK_NR
+	 * (do_pages_stat()) */
+	for (size_t chunk_start = 0; chunk_start < count; chunk_start += NUMA_QUERY_CHUNK_SIZE)
+	{
+		int			result;
+		uint64		chunk_size = Min(NUMA_QUERY_CHUNK_SIZE, count - chunk_start);
+
+		CHECK_FOR_INTERRUPTS();
+
+		result = numa_move_pages(pid, chunk_size, &pages[chunk_start], NULL,
+ &status[chunk_start], 0);
+
+		if (result != 0)
+			return result;
+	}
+
+	return 0;
+#else
 	return numa_move_pages(pid, count, pages, NULL, status, 0);
+#endif
 }
 
 int
-- 
2.34.1

Re: Logrep launcher race conditions leading to slow tests

2025-06-24 Thread Dilip Kumar

On Tue, Jun 24, 2025 at 9:53 PM Tom Lane  wrote:
>
> Amit Kapila  writes:
> > On Tue, Jun 24, 2025 at 5:26 AM Tom Lane  wrote:
> >> 1. WaitForReplicationWorkerAttach sometimes has to clear a process
> >> latch event so that it can keep waiting for the worker to launch.
> >> It neglects to set the latch again, allowing ApplyLauncherMain
> >> to miss events.
>
> > There was a previous discussion to fix this behavior. Heikki has
> > proposed a similar fix for this, but at the caller. See the patch
> > attached in email [1].
>
> Ah, thank you for that link.  I vaguely recalled that we'd discussed
> this strange behavior before, but I did not realize that anyone had
> diagnosed a cause.  I don't much like any of the patches proposed
> in that thread though --- they seem overcomplicated or off-point.
>
> I do not think we can switch to having two latches here.  The
> only reason WaitForReplicationWorkerAttach needs to pay attention
> to the process latch at all is that it needs to service
> CHECK_FOR_INTERRUPTS() conditions in a timely fashion, which is
> also true in the ApplyLauncherMain loop.  We can't realistically
> expect other processes to signal different latches depending on
> where the launcher is waiting, so those cases have to be triggered
> by the same latch.
>
> However, WaitForReplicationWorkerAttach can't service any
> latch-setting conditions other than those managed by
> CHECK_FOR_INTERRUPTS().  So if CHECK_FOR_INTERRUPTS() returns,
> we have some other triggering condition, which is the business
> of some outer code level to deal with.  Having it re-set the latch
> to allow that to happen promptly after it returns seems like a
> pretty straightforward answer to me.

Yeah that makes sense, we can not simply wait on another latch which
is not managed by CHECK_FOR_INTERRUPTS(), and IMHO the proposed patch
looks simple for resolving this issue.

-- 
Regards,
Dilip Kumar
Google

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-24 Thread Bertrand Drouvot

Hi,

On Tue, Jun 24, 2025 at 05:30:02PM +0200, Christoph Berg wrote:
> Re: Tomas Vondra
> > If it's a reliable fix, then I guess we can do it like this. But won't
> > that be a performance penalty on everyone? Or does the system split the
> > array into 16-element chunks anyway, so this makes no difference?
> 
> There's still the overhead of the syscall itself. But no idea how
> costly it is to have this 16-step loop in user or kernel space.
> 
> We could claim that on 32-bit systems, shared_buffers would be smaller
> anyway, so there the overhead isn't that big. And the step size should
> be larger (if at all) on 64-bit.

Right, and we already mention in the doc that using those views is "very slow"
or "can take a noticeable amount of time".

> > Anyway, maybe we should start by reporting this to the kernel people. Do
> > you want me to do that, or shall one of you take care of that? I suppose
> > that'd be better, as you already wrote a fix / know the code better.
> 
> Submitted: https://marc.info/?l=linux-mm&m=175077821909222&w=2

Thanks! I had in mind to look at how to report such a bug and provide a patch
but you beat me to it.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

IPC/MultixactCreation on the Standby server

2025-06-24 Thread Dmitry


Hi, hackers

The problem is as follows.
A replication cluster includes a primary server and one hot-standby replica.
The workload on the primary server is represented by multiple requests 
generating multixact IDs, while the hot-standby replica performs reading 
requests.


After some time, all requests on the hot-standby are stuck and never get 
finished.


The `pg_stat_activity` view on the replica reports that processes are 
stuck waiting for IPC/MultixactCreation,
pg_cancel_backend and pg_terminate_backend cannot cancel the request, 
SIGQUIT is the only way to stop it.


We tried:
- changing the `autovacuum_multixact_freeze_max_age` parameters,
- increasing `multixact_member_buffers` and `multixact_offset_buffers`,
- disabling `hot_standby_feedback`,
- switching the replica to synchronous and asynchronous mode,
- and much more.
But nothing helped.

We ran the replica in recovery mode from WAL archive, i.e. as 
warm-standby, the result is the same.


We tried to build from the sources based on REL_17_5 branch with the 
default configure settings

    ./configure
    make
    make install
But got no luck.

Here is an example with a synthetic workload reproducing the problem.

Test system
===

- Architecture: x86_64
- OS: Ubuntu 24.04.2 LTS (Noble Numbat)
- Tested postgres version(s):
    - latest 17 (17.5)
    - latest 18 (18-beta1)

The problem is not reproducible on PostgreSQL 16.9

Steps to reproduce
==

    postgres=# create table tbl (
    id int primary key,
    val int
    );
    postgres=# insert into tbl select i, 0 from generate_series(1,5) i;


The first and second scripts execute queries on the master server
-

    pgbench --no-vacuum --report-per-command -M prepared -c 200 -j 200 
-T 300 -P 1 --file=/dev/stdin <<'EOF'

    \set id random(1, 5)
    begin;
    select * from tbl where id = :id for key share;
    commit;
    EOF

    pgbench --no-vacuum --report-per-command -M prepared -c 100 -j 100 
-T 300 -P 1 --file=/dev/stdin <<'EOF'

    \set id random(1, 5)
    begin;
    update tbl set val = val+1 where id = :id;
    \sleep 10 ms
    commit;
    EOF


The following script is executed on the replica
---

    pgbench --no-vacuum --report-per-command -M prepared -c 100 -j 100 
-T 300 -P 1 --file=/dev/stdin <<'EOF'

    begin;
    select sum(val) from tbl;
    \sleep 10 ms
    select sum(val) from tbl;
    \sleep 10 ms
    commit;
    EOF

    pgbench (17.5 (Ubuntu 17.5-1.pgdg24.04+1))
    progress: 1.0 s, 2606.8 tps, lat 33.588 ms stddev 13.316, 0 failed
    progress: 2.0 s, 3315.0 tps, lat 30.174 ms stddev 5.933, 0 failed
    progress: 3.0 s, 3357.0 tps, lat 29.699 ms stddev 5.541, 0 failed
    progress: 4.0 s, 3350.0 tps, lat 29.911 ms stddev 5.311, 0 failed
    progress: 5.0 s, 3206.0 tps, lat 30.999 ms stddev 6.343, 0 failed
    progress: 6.0 s, 3264.0 tps, lat 30.828 ms stddev 6.389, 0 failed
    progress: 7.0 s, 3224.0 tps, lat 31.099 ms stddev 6.197, 0 failed
    progress: 8.0 s, 3168.0 tps, lat 31.486 ms stddev 6.940, 0 failed
    progress: 9.0 s, 3118.0 tps, lat 32.004 ms stddev 6.546, 0 failed
    progress: 10.0 s, 3017.0 tps, lat 33.183 ms stddev 7.971, 0 failed
    progress: 11.0 s, 3157.0 tps, lat 31.697 ms stddev 6.624, 0 failed
    progress: 12.0 s, 3180.0 tps, lat 31.415 ms stddev 6.310, 0 failed
    progress: 13.0 s, 3150.9 tps, lat 31.591 ms stddev 6.280, 0 failed
    progress: 14.0 s, 3329.0 tps, lat 30.189 ms stddev 5.792, 0 failed
    progress: 15.0 s, 3233.6 tps, lat 30.852 ms stddev 5.723, 0 failed
    progress: 16.0 s, 3185.4 tps, lat 31.378 ms stddev 6.383, 0 failed
    progress: 17.0 s, 3035.0 tps, lat 32.920 ms stddev 7.390, 0 failed
    progress: 18.0 s, 3173.0 tps, lat 31.547 ms stddev 6.390, 0 failed
    progress: 19.0 s, 3077.0 tps, lat 32.427 ms stddev 6.634, 0 failed
    progress: 20.0 s, 3266.1 tps, lat 30.740 ms stddev 5.842, 0 failed
    progress: 21.0 s, 2990.9 tps, lat 33.353 ms stddev 7.019, 0 failed
    progress: 22.0 s, 3048.1 tps, lat 32.933 ms stddev 6.951, 0 failed
    progress: 23.0 s, 3148.0 tps, lat 31.769 ms stddev 6.077, 0 failed
    progress: 24.0 s, 1523.2 tps, lat 30.029 ms stddev 5.093, 0 failed
    progress: 25.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
    progress: 26.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
    progress: 27.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
    progress: 28.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
    progress: 29.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
    progress: 30.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
    progress: 31.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
    progress: 32.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
    progress: 33.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
    progress: 34.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
    progress: 35.0 s, 0.0 tps, lat 0.000 ms stdde

Re: problems with toast.* reloptions

2025-06-24 Thread Nathan Bossart

On Tue, Jun 24, 2025 at 02:10:55PM +0900, Michael Paquier wrote:
> On Mon, Jun 23, 2025 at 03:59:56PM -0500, Nathan Bossart wrote:
>> Here is a very rough proof-of-concept patch set for this.  AFAICT there are
>> a few options we cannot fix on the back-branches because there is no way to
>> tell whether it is set or has just picked up the default.  On v18 and
>> newer, we could use isset_offset, but that doesn't exist on older versions.
>> (I haven't looked closely, but I'm assuming that back-patching isset_offset
>> isn't an option.)
> 
> Hmm.  I am wondering if we need to be aggressive about this set of
> changes at all in the back branches.  It's been broken for a long time
> without anybody really complaining about the fact that reloptions
> being set or not influenced the outcome in the context of autovacuum,
> so perhaps there is a good argument for keeping all that in v19.  My
> conservative 2c.

Yeah, I'm tempted to even ask how folks feel about removing the toast.*
reloptions.  Maybe there's some simple cases that work well enough, but
AFAICT any moderately-complicated setup basically doesn't work at all.  In
any case, writing out this patch set has got me on the fix-on-HEAD-only
bandwagon.

>> I would like to explore the "option 2" from upthread [0] for v19.  I think
>> that is a better long-term solution, and it may allow us to remove the
>> table_toast_map in autovacuum.
> 
> It would be nice to have some tests here to check the state of the
> options used?  My best guess would be a DEBUG1 entry combined with a
> scan of the logs generated and an aggressive autovacuum worker
> spawn to check that the options generated are what we expect for the
> relations autovacuum picks up.

Eh...  I agree that's probably how we'd have to test it with the existing
tools, but it sure sounds like a recipe for a flaky test.

-- 
nathan

Re: [PATCH] Fix OAuth feature detection on OpenBSD+Meson

2025-06-24 Thread Jacob Champion

On Tue, Jun 24, 2025 at 1:27 AM Nazir Bilal Yavuz  wrote:
> I think this is a good idea. Another point is that CI images and their
> packages are updated automatically, so it would be easier to catch if
> something breaks when the VM is updated.

Yes, that's a great point too. Okay, sounds like there is some
interest, and I'll add it to my list of patchsets to try (but if
anyone wants to beat me to it, please go ahead!).

--Jacob

Re: Logrep launcher race conditions leading to slow tests

2025-06-24 Thread Tom Lane

Ashutosh Bapat  writes:
> On Tue, Jun 24, 2025 at 5:26 AM Tom Lane  wrote:
>> 4. In process_syncing_tables_for_apply (the other caller of
>> logicalrep_worker_launch), it seems okay to ignore the
>> result of logicalrep_worker_launch, but I think it should
>> fill hentry->last_start_time before not after the call.
>> Otherwise we might be changing a hashtable entry that's
>> no longer relevant to this worker.

> A hash entry is associated with a table, not the worker. In case the
> worker fails to launch it records the time when worker launch for that
> table was attempted so that next attempt could be well-spaced in time.
> I am not able your last statement, what is the entry's relevance to
> the worker.

> But your change makes this code similar to ApplyLauncherMain(), which
> deals with subscriptions. +1 for the consistency.

Yeah, mainly I want to make it look more like ApplyLauncherMain().
It's true right now that nothing outside this process will touch that
hash table, so it doesn't matter which way we do it.  But if we were
to switch that table to being shared state, this'd be an unsafe order
of operations for the same reasons it'd be wrong to do it like that in
ApplyLauncherMain().

regards, tom lane

Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

2025-06-24 Thread shveta malik

On Tue, Jun 24, 2025 at 2:12 PM Bertrand Drouvot
 wrote:
>
> Hi,
>
> On Tue, Jun 24, 2025 at 12:13:32AM +0900, Masahiko Sawada wrote:
> > On Mon, Jun 23, 2025 at 7:01 PM Bertrand Drouvot
> >  wrote:
> > >
> > > Hi,
> > >
> > > On Mon, Jun 23, 2025 at 05:10:37PM +0900, Masahiko Sawada wrote:
> > > > On Thu, Jun 19, 2025 at 6:00 PM Bertrand Drouvot
> > > >  wrote:
> > > > >
> > > > > - pg_activate_logical_decoding() is needed only if there is not 
> > > > > already a logical
> > > > > slot on the primary
> > > > > - the drop requires the user to think twice if this is the last 
> > > > > logical slot
> > > > > - we don't ask the user to create a logical slot if he does not want 
> > > > > to use it
> > > > > on the primary
> > > > >
> > > > > Thoughts?
> > > >
> > > > If there is no logical slot on the primary, how can the user disable
> > > > logical decoding that has been enabled via
> > > > pg_activate_logical_decoding()?
> > >
> > > I was thinking to keep the pg_deactivate_logical_decoding() API proposed
> > > in this thread.
> >
> > Okay. One approach that combines your idea and Shveta's idea is:
> >
> > - a special (empty) logical slot with the reserved slot name can be
> > created and deleted only by SQL functions,
> > pg_activate_logical_decoding() and pg_deactivate_logical_decoding().
> > - this special slot cannot be used by logical decoding.
> > - effective_wal_level is increased and decreased when creating and
> > dropping a slot (i.e., either a normal logical slots or the special
> > logical slot).
>
> Yeah, I think that sounds reasonable and that would avoid users to use
> the slot created with immediately_reserve set to false by mistake.
>

+1.
I think we do need to provide 'immediately_reserve' as a new argument
now for logical slots creation. If the slot is a special one with a
reserved name, it can internally be created with WALs not reserved for
our purpose.

> > > You mean a GUC to both automaticly set effective_wal_level to logical at 
> > > slot creation
> > > and also decrease effective_wal_level to replica if last replication slot 
> > > is dropped?
> >
> > What I imagined was to control only the decreasing behavior that could
> > be more problematic than the increase case. But it might be rather
> > confusing (e.g., what if we turn off that behavior and restart the
> > server?).
>
> Right...So not sure we need such a GUC. What about always behave with the
> automatic behavior?
>

Does it make sense to provide a GUC which will have the default set to
automatic but if the user is not interested or having some issues with
new behaviour, he can switch off the GUC, making the new functions
no-op as well?
In absence of such a GUC, users will have absolutely no way to switch
back to old behaviour. Will that be okay?

thanks
Shveta

Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

2025-06-24 Thread shveta malik

On Wed, Jun 25, 2025 at 9:12 AM shveta malik  wrote:
>
> On Tue, Jun 24, 2025 at 2:12 PM Bertrand Drouvot
>  wrote:
> >
> > Hi,
> >
> > On Tue, Jun 24, 2025 at 12:13:32AM +0900, Masahiko Sawada wrote:
> > > On Mon, Jun 23, 2025 at 7:01 PM Bertrand Drouvot
> > >  wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Mon, Jun 23, 2025 at 05:10:37PM +0900, Masahiko Sawada wrote:
> > > > > On Thu, Jun 19, 2025 at 6:00 PM Bertrand Drouvot
> > > > >  wrote:
> > > > > >
> > > > > > - pg_activate_logical_decoding() is needed only if there is not 
> > > > > > already a logical
> > > > > > slot on the primary
> > > > > > - the drop requires the user to think twice if this is the last 
> > > > > > logical slot
> > > > > > - we don't ask the user to create a logical slot if he does not 
> > > > > > want to use it
> > > > > > on the primary
> > > > > >
> > > > > > Thoughts?
> > > > >
> > > > > If there is no logical slot on the primary, how can the user disable
> > > > > logical decoding that has been enabled via
> > > > > pg_activate_logical_decoding()?
> > > >
> > > > I was thinking to keep the pg_deactivate_logical_decoding() API proposed
> > > > in this thread.
> > >
> > > Okay. One approach that combines your idea and Shveta's idea is:
> > >
> > > - a special (empty) logical slot with the reserved slot name can be
> > > created and deleted only by SQL functions,
> > > pg_activate_logical_decoding() and pg_deactivate_logical_decoding().
> > > - this special slot cannot be used by logical decoding.
> > > - effective_wal_level is increased and decreased when creating and
> > > dropping a slot (i.e., either a normal logical slots or the special
> > > logical slot).
> >
> > Yeah, I think that sounds reasonable and that would avoid users to use
> > the slot created with immediately_reserve set to false by mistake.
> >
>
> +1.
> I think we do need to provide 'immediately_reserve' as a new argument
> now for logical slots creation. If the slot is a special one with a
> reserved name, it can internally be created with WALs not reserved for
> our purpose.
>

One correction here.
I think we do NOT need to provide 'immediately_reserve' as a new
argument  now for logical slots creation. ...

> > > > You mean a GUC to both automaticly set effective_wal_level to logical 
> > > > at slot creation
> > > > and also decrease effective_wal_level to replica if last replication 
> > > > slot is dropped?
> > >
> > > What I imagined was to control only the decreasing behavior that could
> > > be more problematic than the increase case. But it might be rather
> > > confusing (e.g., what if we turn off that behavior and restart the
> > > server?).
> >
> > Right...So not sure we need such a GUC. What about always behave with the
> > automatic behavior?
> >
>
> Does it make sense to provide a GUC which will have the default set to
> automatic but if the user is not interested or having some issues with
> new behaviour, he can switch off the GUC, making the new functions
> no-op as well?
> In absence of such a GUC, users will have absolutely no way to switch
> back to old behaviour. Will that be okay?
>
> thanks
> Shveta

Re: Logical Replication of sequences

2025-06-24 Thread Nisha Moond

On Tue, Jun 24, 2025 at 3:07 PM Nisha Moond  wrote:
>
> On Sun, Jun 22, 2025 at 8:05 AM vignesh C  wrote:
> >
> > Thanks for the comment, the attached v20250622 version patch has the
> > changes for the same.
> >
>
> Thanks for the patches, please find my review comments for patches 001 and 
> 002:
>

Please find my further comments on patches 004 and 005:
(I've no comments for 006)

patch-004:

5) The new fetch_sequence_list() function should be guarded with version checks.
Without this, CREATE SUBSCRIPTION will always fail when a newer
subscriber (>=PG19) attempts to create a subscription to an older
publisher (conninfo, true, true,
+must_use_password,
+app_name.data, &err);
+ if (LogRepWorkerWalRcvConn == NULL)
+ ereport(ERROR,
+ errcode(ERRCODE_CONNECTION_FAILURE),
+ errmsg("could not connect to the publisher: %s", err));

The error message should mention the specific process or worker that
failed to connect, similar to how it's done for other workers like
slotsync or tablesync.

Suggestion:
 errmsg("sequencesync worker for subscription \"%s\" could not connect
to the publisher: %s", MySubscription->name, err));
~~~

8)
+ CommitTransactionCommand();
+
+ copy_sequences(LogRepWorkerWalRcvConn, remotesequences, subid);
+
+ list_free_deep(sequences_not_synced);

Should we also free the 'remotesequences' list here?
~~~

--
Thanks,
Nisha

RE: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-06-24 Thread Zhijie Hou (Fujitsu)

Hi,

After commit ca307d5, I noticed another crash when testing
some other logical replication features.

The server with max_replication_slots set to 0 would crash when executing 
CHECKPOINT.

TRAP: failed Assert("ReplicationSlotCtl != NULL"), File: "slot.c", Line: 1162, 
PID: 577315
postgres: checkpointer (ExceptionalCondition+0x9e)[0xc046cb]
postgres: checkpointer (ReplicationSlotsComputeRequiredLSN+0x30)[0x99697f]
postgres: checkpointer (CheckPointReplicationSlots+0x191)[0x997dc1]
postgres: checkpointer [0x597b1b]
postgres: checkpointer (CreateCheckPoint+0x6d1)[0x59729e]
postgres: checkpointer (CheckpointerMain+0x559)[0x93ee79]
postgres: checkpointer (postmaster_child_launch+0x15f)[0x940311]
postgres: checkpointer [0x9468b0]
postgres: checkpointer (PostmasterMain+0x1258)[0x9434f8]
postgres: checkpointer (main+0x2fe)[0x7f5f9c]
/lib64/libc.so.6(__libc_start_main+0xe5)[0x7f7585f81d85]
postgres: checkpointer (_start+0x2e)[0x4958ee]

I think it is trying to access the replication slots when the shared memory
for them was not allocated.

Best Regards,
Hou zj

Re: Safeguards against incorrect fd flags for fsync()

2025-06-24 Thread Michael Banck

Hi,

another update:

On Wed, Jun 11, 2025 at 09:24:24PM +0200, Michael Banck wrote:
> So it seems the low-resolution timer is the only functional issue right
> now. I upgraded my VM to current Debian unstable, but unfortunately that
> did not increase the timer resolution is hoped, maybe some more pieces
> in glibc are missing, I checked in on that[1]. In any case, they are
> actively working on this.

I got it working, I had to rebuild gnumach with --enable-apic in order
to get HPET. With that, the regular build-farm checks (check/
installcheck in contrib, src/test/regress and src/test/isolation) pass
without patches to testsuite timings.

> However, there is another caveat:
> 
> Running "make check" manually only sometimes hangs the VM (without any
> output anywhere), while running it via the buildfarm client reliably
> makes it hang each time, so I added --tests=test_setup as a work-around
> to test the other stages. I have not found a reason for this yet, I will
> try to get a serial console working for the VM next, maybe that will
> show some helpful info.

This was due to the build-farm running on HEAD with a config including
debug_parallel_query=on, which adds a lot of strain on the machine I
guess? As this is a single-node VM, I guess it overloaded it regularly.
I reported that and I think this is something that needs to be
addressed, but people are working on similar issues right now[1].

Is removing the debug_parallel_query=on configuration for HEAD a valid
mode of operation for a buildfarm animal? I ran the tests 10 times in a
row without issues today.

Michael

[1] https://lists.debian.org/debian-hurd/2025/05/msg00031.html
https://lists.debian.org/debian-hurd/2025/05/msg00031.html

Re: Make COPY format extendable: Extract COPY TO format implementations

2025-06-24 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Tue, 24 Jun 2025 15:24:23 +0900,
  Masahiko Sawada  wrote:

>> It's natural to add more related APIs with this
>> approach. The single registration API provides one feature
>> by one operation. If we use the RegisterCopyRoutine() for
>> FROM and TO formats API, it's not natural that we add more
>> related APIs. In this case, some APIs may provide multiple
>> features by one operation and other APIs may provide single
>> feature by one operation. Developers may be confused with
>> the API. For example, developers may think "what does mean
>> NULL here?" or "can we use NULL here?" for
>> "RegisterCopyRoutine("new-format", NewFormatFromRoutine,
>> NULL)".
> 
> We can document it in the comment for the registration function.

I think that API that can be understandable without the
additional note is better API than API that needs some
notes.

Why do you suggest the RegisterCopyRoutine("new-format",
NewFormatFromRoutine, NewFormatToRoutine) API? You want to
remove the duplicated codes in
RegisterCopy{From,To}Routine(), right? I think that we can
do it by creating a convenient function that has the
duplicated codes extracted from
RegisterCopy{From,To}Routine() and
RegisterExtensionExplainOption().

BTW, what do you think about my answer (one feature by one
operation API is more extendable API) for your question
(extendability and API compatibility)?

> Suppose that extension-A implements only CopyToRoutine for the
> custom-format-X with the format name 'myformat' and extension-B
> implements only CopyFromRoutine for the custom-format-Y with the same
> name, if users load both extension-A and extension-B, it seems to me
> that extension-A registers the custom-format-X format as 'myformat'
> only with CopyToRoutine, and extension-B overwrites the 'myformat'
> registration by adding custom-format-Y's CopyFromRoutine. However, if
> users register extension-C that implements both routines with the
> format name 'myformat', they can register neither extension-A nor
> extension-B, which seems to me that we don't allow overwriting the
> registration in this case.

Do you assume that users use extension-A, extension-B and
extension-C without reading their documentation? If users
read their documentation before users use them, users can
know all of them use the same format name 'myformat' and
which extension provides Copy{From,To}Routine.

In this case, these users (who don't read documentation)
will be confused with the RegisterCopyRoutine("new-format",
NewFormatFromRoutine, NewFormatToRoutine) API too. Do we
really need to care about this case?

> I think the core issue appears to be the internal management of custom
> format entries but the current patch does enable registration
> overwriting in the former case (extension-A and extension-B case).

This is the same behavior as existing custom EXPLAIN option
implementation. Should we use different behavior here?

>> >>FYI: RegisterCopy{From,To}Routine() uses the same logic
>> >>as RegisterExtensionExplainOption().
>> >
>> > I'm concerned that the patch has duplicated logics for the
>> > registration of COPY FROM and COPY TO.
>>
>> We can implement a convenient routine that can be used for
>> RegisterExtensionExplainOption() and
>> RegisterCopy{From,To}Routine() if it's needed.
> 
> I meant there are duplicated codes in COPY FROM and COPY TO. For
> instance, RegisterCopyFromRoutine() and RegisterCopyToRoutine() have
> the same logic.

Yes, I understand it. I wanted to say that we can remove the
duplicated codes by introducing a RegisterSomething()
function that can be used by
RegisterExtensionExplainOption() and
RegisterCopy{From,To}Routine():

void
RegisterSomething(...)
{
  /* Common codes in RegisterExtensionExplainOption() and
 RegisterCopy{From,To}Routine()
 ...
   */
}

void
RegisterExtensionExplainOption(...)
{
  RegisterSomething(...);
}

void
RegisterCopyFromRoutine(...)
{
  RegisterSomething(...);
}

void
RegisterCopyToRoutine(...)
{
  RegisterSomething(...);
}

You think that this approach can't remove the duplicated
codes, right?

>> > InitPostgres() is not a correct function as it's a process
>> > initialization function. Probably we don't necessarily need to
>> > register the built-in formats in the same way as custom formats. A
>> > simple solution would be to have separate arrays for built-in formats
>> > and custom formats and have the GetCopy[To|From]Routine() search both
>> > arrays (built-in array first).
>>
>> We had a discussion that we should dog-food APIs:
>>
>> https://www.postgresql.org/message-id/flat/CAKFQuwaCHhrS%2BRE4p_OO6d7WEskd9b86-2cYcvChNkrP%2B7PJ7A%40mail.gmail.com#e6d1cdd04dac53eafe34b784ac47b68b
>>
>> > We should (and usually do) dog-food APIs when reasonable
>> > and this situation seems quite reasonable.
>>
>> In this case, we don't need to dog-food APIs, right?
> 
> Yes, I think so.

OK. I don't have a strong opinion for it. If n

Re: Allow to collect statistics on virtual generated columns

2025-06-24 Thread Yugo Nagata

On Tue, 17 Jun 2025 10:43:41 -0400
Andres Freund  wrote:

> Hi,
> 
> On 2025-04-22 18:10:06 +0900, Yugo Nagata wrote:
> > With your feedback, I would like to progress or rework the patch.
> 
> Right now the tests seem to always fail:
> https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F571

Thank you for letting me know it.

I've attached an updated patch to fix the test failure.

However, I'm now reconsidering the current approach, where the expression
of a virtual generated column is expanded at the time of creating extended
statistics. This seems not be ideal, as the statistics would become useless
if the expression is later modified.

Instead, I'm thinking of an alternative approach: expanding the expression
at the time statistics are collected.

Best regards,
Yugo Nagata

> 
> Fails e.g. with:
> https://api.cirrus-ci.com/v1/artifact/task/5921189782093824/testrun/build/testrun/regress/regress/regression.diffs
> 
> diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/stats_ext.out 
> /tmp/cirrus-ci-build/build/testrun/regress/regress/results/stats_ext.out
> --- /tmp/cirrus-ci-build/src/test/regress/expected/stats_ext.out  
> 2025-05-26 00:59:01.813042000 +
> +++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/stats_ext.out  
> 2025-05-26 01:02:20.350387000 +
> @@ -56,7 +56,6 @@
>  ERROR:  unrecognized statistics kind "unrecognized"
>  -- incorrect expressions
>  CREATE STATISTICS tst ON (y) FROM ext_stats_test; -- single column reference
> -ERROR:  extended statistics require at least 2 columns
>  CREATE STATISTICS tst ON y + z FROM ext_stats_test; -- missing parentheses
>  ERROR:  syntax error at or near "+"
>  LINE 1: CREATE STATISTICS tst ON y + z FROM ext_stats_test;
> @@ -69,25 +68,24 @@
>  -- statistics on virtual generated column not allowed
>  CREATE TABLE ext_stats_test1 (x int, y int, z int GENERATED ALWAYS AS (x+y) 
> VIRTUAL, w xid);
>  CREATE STATISTICS tst on z from ext_stats_test1;
> -ERROR:  statistics creation on virtual generated columns is not supported
>  CREATE STATISTICS tst on (z) from ext_stats_test1;
> -ERROR:  statistics creation on virtual generated columns is not supported
> +ERROR:  statistics object "tst" already exists
>  CREATE STATISTICS tst on (z+1) from ext_stats_test1;
> -ERROR:  statistics creation on virtual generated columns is not supported
> +ERROR:  statistics object "tst" already exists
>  CREATE STATISTICS tst (ndistinct) ON z from ext_stats_test1;
> -ERROR:  statistics creation on virtual generated columns is not supported
> +ERROR:  statistics object "tst" already exists
>  -- statistics on system column not allowed
>  CREATE STATISTICS tst on tableoid from ext_stats_test1;
> -ERROR:  statistics creation on system columns is not supported
> +ERROR:  statistics object "tst" already exists
>  CREATE STATISTICS tst on (tableoid) from ext_stats_test1;
> -ERROR:  statistics creation on system columns is not supported
> +ERROR:  statistics object "tst" already exists
>  CREATE STATISTICS tst on (tableoid::int+1) from ext_stats_test1;
> -ERROR:  statistics creation on system columns is not supported
> +ERROR:  statistics object "tst" already exists
>  CREATE STATISTICS tst (ndistinct) ON xmin from ext_stats_test1;
> -ERROR:  statistics creation on system columns is not supported
> +ERROR:  statistics object "tst" already exists
>  -- statistics without a less-than operator not supported
>  CREATE STATISTICS tst (ndistinct) ON w from ext_stats_test1;
> -ERROR:  column "w" cannot be used in statistics because its type xid has no 
> default btree operator class
> +ERROR:  statistics object "tst" already exists
>  DROP TABLE ext_stats_test1;
>  -- Ensure stats are dropped sanely, and test IF NOT EXISTS while at it
>  CREATE TABLE ab1 (a INTEGER, b INTEGER, c INTEGER);
> 
> Greetings,
> 
> Andres
> 
> 


-- 
Yugo Nagata 
>From f6151679d4b222bd5c60107322486ea90fa951d2 Mon Sep 17 00:00:00 2001
From: Yugo Nagata 
Date: Tue, 22 Apr 2025 17:03:50 +0900
Subject: [PATCH v2] Allow to create extended statistics on virtual generated
 columns

---
 src/backend/commands/statscmds.c| 88 -
 src/test/regress/expected/stats_ext.out | 11 +---
 src/test/regress/sql/stats_ext.sql  |  7 +-
 3 files changed, 43 insertions(+), 63 deletions(-)

diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index e24d540cd45..1875ea26879 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -29,6 +29,7 @@
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/optimizer.h"
+#include "rewrite/rewriteHandler.h"
 #include "statistics/statistics.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
@@ -240,59 +241,56 @@ CreateStatistics(CreateStatsStmt *stmt)
 attname)));
 			attForm = (Form_pg_attribute) GETSTRUCT(atttuple);
 
-			/* Disallow use of system attributes in extended stats */
-			if (attForm->attnum <= 0)
-ereport(ERROR,
-		(

Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2025-06-24 Thread jian he

hi.

+ /* Sort array of lower bounds. */
+ qsort_arg(lower_bounds, nparts, sizeof(PartitionRangeBound *),
+  qsort_partition_rbound_cmp, (void *) key);
here, we don't need ``(void *)``


+ALTER TABLE [ IF EXISTS ] name
+MERGE PARTITIONS (partition_name1, partition_name2 [, ...])
+INTO partition_name
In the synopsis section, we can combine the last two lines into one
for better formatting.

after

we can add the following to briefly explain parameters: partition_name1,
partition_name2


 
  partition_name1
  partition_name2
  
   
The names of the tables being merged into the new partition.
   
  
 

What do you think about alternative syntax:
ALTER TABLE tab1 MERGE PARTITION part1 WITH (part2, part3) mentioned in [1].
I think we need to settle this issue before moving forward.
If the current MERGE PARTITION design is finalized, then v48-0001 looks solid.

[1] 
https://postgr.es/m/CA+TgmoY0=bT_xBP8csR=MFE=fxge2n2-me2-31jbogeclvw...@mail.gmail.com

Re: [PATCH] Fix OAuth feature detection on OpenBSD+Meson

2025-06-24 Thread Nazir Bilal Yavuz

Hi,

On Tue, 24 Jun 2025 at 02:37, Jacob Champion
 wrote:
> As a potential follow-up, is there any interest in switching the
> Cirrus Meson setup to explicitly enable the features that we expect to
> test? That would match what we do for Autoconf, I think; it would have
> helped me catch my mistake earlier.

I think this is a good idea. Another point is that CI images and their
packages are updated automatically, so it would be easier to catch if
something breaks when the VM is updated.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft

Re: BackendKeyData is mandatory?

2025-06-24 Thread Jelte Fennema-Nio

On Tue, 24 Jun 2025 at 01:44, Tatsuo Ishii  wrote:
> One example is Amazon RDS Proxy.

Okay, that sounds widely used enough to continue that we should
probably change the new PG18 behaviour of PQgetCancel and
PQcancelCreate like I suggested. Failing all psycopg2 connection
attempts against AWS its proxy service doesn't seem like something
we'd want to do.

Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

2025-06-24 Thread Bertrand Drouvot

Hi,

On Tue, Jun 24, 2025 at 12:13:32AM +0900, Masahiko Sawada wrote:
> On Mon, Jun 23, 2025 at 7:01 PM Bertrand Drouvot
>  wrote:
> >
> > Hi,
> >
> > On Mon, Jun 23, 2025 at 05:10:37PM +0900, Masahiko Sawada wrote:
> > > On Thu, Jun 19, 2025 at 6:00 PM Bertrand Drouvot
> > >  wrote:
> > > >
> > > > - pg_activate_logical_decoding() is needed only if there is not already 
> > > > a logical
> > > > slot on the primary
> > > > - the drop requires the user to think twice if this is the last logical 
> > > > slot
> > > > - we don't ask the user to create a logical slot if he does not want to 
> > > > use it
> > > > on the primary
> > > >
> > > > Thoughts?
> > >
> > > If there is no logical slot on the primary, how can the user disable
> > > logical decoding that has been enabled via
> > > pg_activate_logical_decoding()?
> >
> > I was thinking to keep the pg_deactivate_logical_decoding() API proposed
> > in this thread.
> 
> Okay. One approach that combines your idea and Shveta's idea is:
> 
> - a special (empty) logical slot with the reserved slot name can be
> created and deleted only by SQL functions,
> pg_activate_logical_decoding() and pg_deactivate_logical_decoding().
> - this special slot cannot be used by logical decoding.
> - effective_wal_level is increased and decreased when creating and
> dropping a slot (i.e., either a normal logical slots or the special
> logical slot).

Yeah, I think that sounds reasonable and that would avoid users to use
the slot created with immediately_reserve set to false by mistake. 

> > You mean a GUC to both automaticly set effective_wal_level to logical at 
> > slot creation
> > and also decrease effective_wal_level to replica if last replication slot 
> > is dropped?
> 
> What I imagined was to control only the decreasing behavior that could
> be more problematic than the increase case. But it might be rather
> confusing (e.g., what if we turn off that behavior and restart the
> server?).

Right...So not sure we need such a GUC. What about always behave with the 
automatic behavior?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Things I don't like about \du's "Attributes" column

2025-06-24 Thread Pavel Luzanov


On 11.04.2025 20:09, David G. Johnston wrote:
However, I do think we are at something committable, though I'd make 
one, maybe two, changes to v8.


Let's return to this topic.

Valid until -> Password valid until: the timestamp value already 
forces a wide column, adding the word Password to the header to 
clarify what is valid simply provides the same context that the 
create role page provides when it shows the valid until attribute 
immediately below the password attribute.  Leaving "valid until" 
alone retains the attribute name tieback.


Connection limit -> Con. limit: maybe this gets rejected on 
translation grounds but the abbreviation here seems obvious and 
shaves 7 characters off the mandatory width for a column that 
occupies 12 characters more than the values require.


Both changes implemented in v9.

Example output from regression test:

CREATE ROLE regress_du_role0 LOGIN PASSWORD '123' VALID UNTIL 
'2024-06-04' CONNECTION LIMIT 0;
CREATE ROLE regress_du_role1 CREATEROLE CONNECTION LIMIT -1 VALID UNTIL 
'infinity';
CREATE ROLE regress_du_role2 LOGIN REPLICATION BYPASSRLS CONNECTION 
LIMIT 42;
CREATE ROLE regress_du_admin LOGIN SUPERUSER CREATEROLE CREATEDB 
BYPASSRLS REPLICATION INHERIT;

COMMENT ON ROLE regress_du_admin IS 'some description';

\du+ regress_du*
 List of roles
    Role name | Login | Attributes  | Password valid until 
| Con. limit |   Description

--+---+-+--++--
 regress_du_admin | yes   | Superuser +|  
|    | some description
  |   | Create DB +|  
|    |
  |   | Create role+|  
|    |
  |   | Inherit +|  
|    |
  |   | Replication+|  
|    |
  |   | Bypass RLS |  
|    |
 regress_du_role0 | yes   | Inherit | Tue Jun 04 00:00:00 2024 PDT 
|  0 |
 regress_du_role1 | no    | Create role+| infinity 
|    |
  |   | Inherit |  
|    |
 regress_du_role2 | yes   | Inherit +|  
| 42 |
  |   | Replication+|  
|    |
  |   | Bypass RLS |  
|    |

(4 rows)

I marked commitfest entry[1] as Need Review.

1. https://commitfest.postgresql.org/patch/4738/

--
Pavel Luzanov
Postgres Professional:https://postgrespro.com
From 84b313f44fb50da97b8bff36694d549e2a32c55c Mon Sep 17 00:00:00 2001
From: Pavel Luzanov 
Date: Tue, 24 Jun 2025 12:31:02 +0300
Subject: [PATCH v9] psql: Rethinking of \du command

- rolcanlogin, rolconnlimit, rolvaliduntil attributes placed in
  separate columns.
- The "Attributes" column includes only the enabled logical attributes.
- The attribute names correspond to the keywords of the CREATE ROLE command.
- The attributes are listed in the same order as in the documentation.
- Value -1 for rolconnlimit replaced by NULL to make it easier
  to understand that there is no limit.
- General refactoring of describeRoles function in describe.c.
---
 src/bin/psql/describe.c| 146 -
 src/test/regress/expected/psql.out |  40 +---
 src/test/regress/sql/psql.sql  |  12 ++-
 3 files changed, 72 insertions(+), 126 deletions(-)

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 24e0100c9f0..8fd3f35a1ad 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -43,7 +43,6 @@ static bool describeOneTableDetails(const char *schemaname,
 	bool verbose);
 static void add_tablespace_footer(printTableContent *const cont, char relkind,
   Oid tablespace, const bool newline);
-static void add_role_attribute(PQExpBuffer buf, const char *const str);
 static bool listTSParsersVerbose(const char *pattern);
 static bool describeOneTSParser(const char *oid, const char *nspname,
 const char *prsname);
@@ -3712,34 +3711,47 @@ describeRoles(const char *pattern, bool verbose, bool showSystem)
 {
 	PQExpBufferData buf;
 	PGresult   *res;
-	printTableContent cont;
-	printTableOpt myopt = pset.popt.topt;
-	int			ncols = 2;
-	int			nrows = 0;
-	int			i;
-	int			conns;
-	const char	align = 'l';
-	char	  **attr;
-
-	myopt.default_footer = false;
+	printQueryOpt myopt = pset.popt;
 
 	initPQExpBuffer(&buf);
-
 	printfPQExpBuffer(&buf,
-	  "SELECT r.rolname, r.rolsuper, r.rolinherit,\n"
-	  "  r.rolcreaterole, r.rolcreatedb, r.rolcanlogin,\n"
-	  "  r.rolconnlimit, r.rolvaliduntil");
-
-	if (verbose)
-	{
-		appendPQExpBufferStr(&buf, "\n, pg_catalog.shobj_description(r.oid, 'pg_authid') AS description"

Re: Improve doc on parallel stream changes for Stream Abort message

2025-06-24 Thread Anthonin Bonnefoy

On Tue, Jun 24, 2025 at 7:26 AM Amit Kapila  wrote:
> How about a slightly modified version like: (a) The LSN of the abort
> operation, present only when the change stream can be applied in
> parallel. This field is available since protocol version 4. (b) Abort
> timestamp of the transaction, present only when the change stream can
> be applied in parallel. The value is in number of microseconds since
> PostgreSQL epoch (2000-01-01). This field is available since protocol
> version 4.

What about ", present only when streaming is set to parallel"? I think
clarifying the relation between streaming=parallel and the presence of
those fields is the important part.

Re: Logical Replication of sequences

2025-06-24 Thread shveta malik

>
> Thanks for the comment, the attached v20250622 version patch has the
> changes for the same.
>

Thanks for the patches, I am not done with review yet, but please find
the feedback so far:


1)
+ if (!OidIsValid(seq_relid))
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("sequence \"%s.%s\" does not exist",
+schema_name, sequence_name));

ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE might not be a correct error
code here. Shall we have ERRCODE_UNDEFINED_OBJECT? Thoughts?


2)
tab-complete here shows correctly:

postgres=# CREATE PUBLICATION pub6 FOR ALL
SEQUENCES  TABLES

But tab-complete for these 2 commands does not show anything:

postgres=# CREATE PUBLICATION pub6 FOR ALL TABLES,
postgres=# CREATE PUBLICATION pub6 FOR ALL SEQUENCES,

We shall specify SEQUENCES/TABLES in above commands. IIUC, we do not
support any other combination like (table , tables in schema
) once we get ALL clause in command. So it is safe to display
tab-complete as either TABLES or SEQUENECS in above.

3)
postgres=#  CREATE publication pub1 for sequences;
ERROR:  invalid publication object list
LINE 1: CREATE publication pub1 for sequences;
 ^
DETAIL:  One of TABLE, TABLES IN SCHEMA or ALL SEQUENCES must be
specified before a standalone table or schema name.


This message is not correct as we can not have ALL SEQUENCES *before*
a standalone table or schema name. The problem is that gram.y is
taking *sequences* as a table or schema name. I noticed that it does
same with *tables* as well:

postgres=#  CREATE publication pub1 for tables;
ERROR:  invalid publication object list
LINE 1: CREATE publication pub1 for tables;
  ^
DETAIL:  One of TABLE, TABLES IN SCHEMA or ALL SEQUENCES must be
specified before a standalone table or schema name.


But since gram.y here can not identify an in-between missing keyword
*all*, thus it is considering it (tables/sequenecs) as a literal/name
instead of keyword. We can revert back to old message in such a case.
I am unable to think of a good solution here.


4)
I think the error here is wrong as we are not trying to specify
multiple all-tables entries.

postgres=# CREATE PUBLICATION pub6 for all tables, tables in schema public;
ERROR:  invalid publication object list
LINE 1: CREATE PUBLICATION pub6 for all tables, tables in schema pub...

^
DETAIL:  ALL TABLES can be specified only once.


5)
The log messages still has some scope of improvement:

2025-06-24 08:52:17.988 IST [110359] LOG:  logical replication
sequence synchronization worker for subscription "sub1" has started
2025-06-24 08:52:18.029 IST [110359] LOG:  logical replication
sequence synchronization for subscription "sub1" - total
unsynchronized: 3
2025-06-24 08:52:18.090 IST [110359] LOG:  logical replication
sequence synchronization for subscription "sub1" - batch #1 = 3
attempted, 0 succeeded, 2 mismatched, 1 missing
2025-06-24 08:52:18.091 IST [110359] ERROR:  logical replication
sequence synchronization failed for subscription "sub1"
2025-06-24 08:52:18.091 IST [110359] DETAIL:  Sequences
("public.myseq100") are missing on the publisher. Additionally,
parameters differ for the remote and local sequences
("public.myseq101", "public.myseq102").
2025-06-24 08:52:18.091 IST [110359] HINT:  Use ALTER SUBSCRIPTION ...
REFRESH PUBLICATION or use ALTER SUBSCRIPTION ... REFRESH PUBLICATION
SEQUENCES. Alter or re-create local sequences to have the same
parameters as the remote sequences


a)
Sequences ("public.myseq100") are missing on the publisher.
Additionally, parameters differ for the remote and local sequences
("public.myseq101", "public.myseq102").

Shall we change this to:
missing sequence(s) on publisher: ("public.myseq100"); mismatched
sequences(s) on subscriber: ("public.myseq101", "public.myseq102")

It will then be similar to the previous log pattern ( 3 attempted, 0
succeeded etc) instead of being more verbal. Thoughts?


b)
Hints are a little odd. First line of hint is just saying to use
'ALTER SUBSCRIPTION' without giving any purpose. While the second line
of hint is giving the purpose of alter-sequences.

Shall we have?
For missing sequences, use ALTER SUBSCRIPTION with either REFRESH
PUBLICATION or REFRESH PUBLICATION SEQUENCES
For mismatched sequences, alter or re-create local sequences to have
matching parameters as publishers.

Thoughts?

6)
postgres=# create publication pub1 for table tab1, all sequences;
ERROR:  syntax error at or near "all"
LINE 1: create publication pub1 for table tab1, all sequences;

We can mention in commit msg that this combination is also not
supported or what all combinations are supported. Currently it is not
clear f

thanks
Shveta

Re: split func.sgml to separated individual sgml files

2025-06-24 Thread jian he

On Thu, Mar 20, 2025 at 10:16 AM David G. Johnston
 wrote:
>
> In short, ready to commit (see last paragraph below however), but the 
> committer will need to run the python script at the time of commit on the 
> then-current tree.
>

hi.
more explanation, since the python script seems quite large...

each  in doc/src/sgml/func.sgml
corresponds to each individual section in [1].

each  within func.sgml is unique.
if you try to rename it, having two 
will error out saying something like:
../../Desktop/pg_src/src6/postgres/doc/src/sgml/postgres.sgml:199:
element sect1: validity error : ID functions-logical already defined
see [2] also.

Based on this, we can use the literal string  to
perform pattern matching and identify the line numbers that mark the start and
end of each  section.

The polished v2 python script use the following  steps for splitting func.sgml
into several pieces:

0. For each 9.X section listed in [1], create an empty SGML file to hold the
corresponding content.

1. Use the pattern  to locate the starting and ending
line number of each section in func.sgml

2. Copy func.sgml all the content block ()

  ...main content

into the newly created SGML files.

3. Remove the copied content from func.sgml.
4. In func.sgml, insert general entity references [3] to include the newly
created SGML files.

because PG18, and PG17, Chapter 9. Functions and Operators
have the same amount of section (31),

so v1-0001-split_func_sgml.py will work just fine.
but I did some minor changes, therefore v2 attached.

I used the sed --in-place option [3] to modify and truncate the original large
func.sgml file directly.
I also used the -n and -p options with sed to extract lines from func.sgml
between line X and line Y, as shown in reference [4].

for the attach file:
first run ``python3 v2-0001-split_func_sgml.py``
then run ``git apply v2-0001-update-filelist.sgml-allfiles.sgml.no-cfbot``
(`git am` won't work, need to use `git apply`).

[1] https://www.postgresql.org/docs/current/functions.html
[2] https://en.wikipedia.org/wiki/Document_type_definition
[3] 
https://www.gnu.org/software/sed/manual/html_node/Command_002dLine-Options.html#index-_002di
[4] 
https://www.gnu.org/software/sed/manual/html_node/Common-Commands.html#index-n-_0028next_002dline_0029

v2-0001-update-filelist.sgml-allfiles.sgml.no-cfbot
Description: Binary data
import subprocess
import os
import re
#https://git-scm.com/docs/git-rev-parse#Documentation/git-rev-parse.txt---show-toplevel
top_dir = subprocess.check_output(["git", "rev-parse", "--show-toplevel"])
top_dir = top_dir.decode().rstrip('\n')
top_doc_dir = top_dir + "/doc/src/sgml"
#go to doc/src/sgml directory first, exit when the directory does not exist
try:
os.chdir(top_doc_dir)
except FileNotFoundError:
print(f'{top_doc_dir} not does not exist. exit now')
quit()

func_sgml="func.sgml"
#
func_logical="func-logical.sgml"
func_comparison="func-comparison.sgml"
func_math="func-math.sgml"
func_string="func-string.sgml"
func_binarystring="func-binarystring.sgml"
func_bitstring="func-bitstring.sgml"
func_matching="func-matching.sgml"
func_formatting="func-formatting.sgml"
func_datetime="func-datetime.sgml"
func_enum="func-enum.sgml"
func_geometry="func-geometry.sgml"
func_net="func-net.sgml"
func_textsearch="func-textsearch.sgml"
func_uuid="func-uuid.sgml"
func_xml="func-xml.sgml"
func_json="func-json.sgml"
func_sequence="func-sequence.sgml"
func_conditional="func-conditional.sgml"
func_array="func-array.sgml"
func_range="func-range.sgml"
func_aggregate="func-aggregate.sgml"
func_window="func-window.sgml"
func_merge_support ="func-merge-support.sgml"
func_subquery="func-subquery.sgml"
func_comparisons="func-comparisons.sgml"
func_srf="func-srf.sgml"
func_info="func-info.sgml"
func_admin="func-admin.sgml"
func_trigger="func-trigger.sgml"
func_event_triggers="func-event-triggers.sgml"
func_statistics="func-statistics.sgml"
#
func_filenames = list()
func_filenames.append(func_logical)
func_filenames.append(func_comparison)
func_filenames.append(func_math)
func_filenames.append(func_string)
func_filenames.append(func_binarystring)
func_filenames.append(func_bitstring)
func_filenames.append(func_matching)
func_filenames.append(func_formatting)
func_filenames.append(func_datetime)
func_filenames.append(func_enum)
func_filenames.append(func_geometry)
func_filenames.append(func_net)
func_filenames.append(func_textsearch)
func_filenames.append(func_uuid)
func_filenames.append(func_xml)
func_filenames.append(func_json)
func_filenames.append(func_sequence)
func_filenames.append(func_conditional)
func_filenames.append(func_array)
func_filenames.append(func_range)
func_filenames.append(func_aggregate)
func_filenames.append(func_window)
func_filenames.append(func_merge_support )
func_filenames.append(func_subquery)
func_filenames.append(func_comparisons)
func_filenames.append(func_srf)
func_filenames.append(func_info)
func_filenames.appen

Hooks or another better way to track session objects

2025-06-24 Thread Roman Khapov

Hi all! I am working on a new type of pooling mechanism for Postgresconnection managers. In this mode, the client connection isdetached from server between transactions only if no temporaryobjects were created in that session(temporary tables, prepared statements, and advisory locks). Currently, I am exploring ways to notify the connection managerwhen any of these session-level objects are created or existing.I got next ideas:- send a status info byte in RFQ package, but that requires a protocol changes, so this option should be discarded, I suppose- create an extension, that executes elog(NOTIFY, "some message") or sends dedicated package with smth like pq_putemptymessage('O') in new hook, that is called before RFQ sending (see POC patch in attachments)- create an extension, that executes the same things as in option above, but uses more specialized hooks. For temporary tables I can use ProcessUtility_hook that already exists. For prepared statements and advisory locks I wasn't able to find a suitable hooks, so I add a hook before sending ParseComplete message and in LockAcquire end (see another POC patch in attachments) I would appreciate any feedback on the general idea, advices on a clean wayto implement such notifications, and comments on the attached POC patches.If you have ideas, experience with similar tasks, or suggestions for abetter (or more canonical) Postgres way to approach this, please let me know. Thanks! Best regards,Roman Khapov. From 9c1a6aa28306b07ec72ae770e3ed5fa980fb1b71 Mon Sep 17 00:00:00 2001
From: rkhapov 
Date: Tue, 24 Jun 2025 08:56:35 +
Subject: [PATCH] dummy hooks for parse complete and lock acquired

Signed-off-by: rkhapov 
---
 src/backend/storage/lmgr/lock.c   | 4 
 src/backend/tcop/postgres.c   | 8 
 src/backend/utils/adt/lockfuncs.c | 2 ++
 src/include/postgres.h| 4 
 src/include/storage/lock.h| 4 
 5 files changed, 22 insertions(+)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 2776ceb295b..00178d98a07 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1265,6 +1265,10 @@ LockAcquireExtended(const LOCKTAG *locktag,
 			   locktag->locktag_field2);
 	}

+	if (LockAcquired_hook) {
+		LockAcquired_hook();
+	}
+
 	return LOCKACQUIRE_OK;
 }

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index e76b625ec31..f9bf13bfa8c 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -104,6 +104,7 @@ int			client_connection_check_interval = 0;
 /* flags for non-system relation kinds to restrict use */
 int			restrict_nonsystem_relation_kind;

+ParseComplete_hook_type ParseComplete_hook = NULL;
 ReadyForQuery_hook_type ReadyForQuery_hook = NULL;

 /* 
@@ -1585,6 +1586,13 @@ exec_parse_message(const char *query_string,	/* string to execute */
 	 */
 	CommandCounterIncrement();

+	/*
+	 * Notify extensions about parse complete
+	 */
+	if (ParseComplete_hook) {
+		ParseComplete_hook();
+	}
+
 	/*
 	 * Send ParseComplete.
 	 */
diff --git a/src/backend/utils/adt/lockfuncs.c b/src/backend/utils/adt/lockfuncs.c
index 00e67fb46d0..d2a2235a811 100644
--- a/src/backend/utils/adt/lockfuncs.c
+++ b/src/backend/utils/adt/lockfuncs.c
@@ -65,6 +65,8 @@ typedef struct
 /* Number of columns in pg_locks output */
 #define NUM_LOCK_STATUS_COLUMNS		16

+LockAcquired_hook_type LockAcquired_hook = NULL;
+
 /*
  * VXIDGetDatum - Construct a text representation of a VXID
  *
diff --git a/src/include/postgres.h b/src/include/postgres.h
index c844742d977..f62d8355a8a 100644
--- a/src/include/postgres.h
+++ b/src/include/postgres.h
@@ -581,6 +581,10 @@ extern Datum Float8GetDatum(float8 X);
 #define NON_EXEC_STATIC static
 #endif

+/* Hook for plugins to handle prepared statements creation */
+typedef void (*ParseComplete_hook_type) ();
+extern PGDLLIMPORT ParseComplete_hook_type ParseComplete_hook;
+
 /* Hook for plugins to catch RFQ sending */
 typedef void (*ReadyForQuery_hook_type) ();
 extern PGDLLIMPORT ReadyForQuery_hook_type ReadyForQuery_hook;
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index 4862b80eec3..a0cc3bcd698 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -627,4 +627,8 @@ extern void VirtualXactLockTableInsert(VirtualTransactionId vxid);
 extern void VirtualXactLockTableCleanup(void);
 extern bool VirtualXactLock(VirtualTransactionId vxid, bool wait);

+/* Hook for plugins to handle lock acquire */
+typedef void (*LockAcquired_hook_type) ();
+extern PGDLLIMPORT LockAcquired_hook_type LockAcquired_hook;
+
 #endif			/* LOCK_H_ */
-- 
2.43.0

From bd4f35bfa4c5709808169b847d64d350eae38804 Mon Sep 17 00:00:00 2001
From: rkhapov 
Date: Tue, 24 Jun 2025 06:17:29 +
Subject: [PATCH] dummy RFQ hook impl

Signed-off-by: rkhapov 
---
 src/backend/tcop/postgres.c | 6 ++
 src/include/postgres.h  | 4 
 2 files changed, 10 insertions(+)

diff

Re: Improve doc on parallel stream changes for Stream Abort message

2025-06-24 Thread Amit Kapila

On Mon, Jun 23, 2025 at 2:10 PM Anthonin Bonnefoy
 wrote:
>
> While adding parsing of logical replication to Wireshark[1], I've
> found the Stream Abort description possibly misleading. Stream Abort
> will have the Abort LSN and TS when parallel streaming is enabled.
> However, the documentation only mentions "This field is available
> since protocol version 4" which could be interpreted (at least, I've
> interpreted it this way) as "this field will always present with
> protocol version 4".
>
> The attached patch adds a "(only present for parallel stream)" mention
> to the Abort LSN and TS documentation, akin to what's done for
> parallel transactions.
>

How about a slightly modified version like: (a) The LSN of the abort
operation, present only when the change stream can be applied in
parallel. This field is available since protocol version 4. (b) Abort
timestamp of the transaction, present only when the change stream can
be applied in parallel. The value is in number of microseconds since
PostgreSQL epoch (2000-01-01). This field is available since protocol
version 4.

-- 
With Regards,
Amit Kapila.

94 matches

Mail list logo