Re: Conflict detection for update_deleted in logical replication

2025-05-27 Thread Amit Kapila
On Mon, May 26, 2025 at 12:46 PM Zhijie Hou (Fujitsu)
 wrote:
>
> On Sun, May 25, 2025 at 4:36 PM Dilip Kumar wrote:
>
> >
> > I am thinking: can't we make it more deterministic, such that when we
> > get the status the first time, if we find some transactions in the
> > commit phase, we just wait for those transactions to commit?  One idea
> > is to get the list of xids in the commit phase, and the next time we
> > get the list we can compare: if the next status contains no xids that
> > were in the commit phase during the previous status, then we are done.
> > But I am not sure this is worth the complexity.  Maybe not, but shall
> > we add a comment explaining the case and also explaining why this
> > corner case is not harmful?
>
> I also think it's not worth the complexity for this corner case, which
> is rare.

Yeah, complexity is one part, but I also feel that improving such rare
cases could add a performance burden to the common case, since we would
need either to maintain and invalidate a cache on the publisher or to
send the list of all such xids to the subscriber over the network.

> So, I have added some comments in wait_for_publisher_status() to
> mention the same.
>

I agree that at this stage it is good to note this down in comments; if
we face such cases often, we can improve it in the future.
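For illustration, the deterministic check Dilip sketches above could look roughly like the following standalone model (a hypothetical helper, not the patch's actual code): remember the xids seen in the commit phase at the first status, and consider the wait finished once none of them appears in a later status.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Model of the deterministic wait: "prev" is the set of xids that were in
 * the commit phase at the first status check, "cur" is the set from the
 * latest status.  The wait is done once no xid from "prev" is still
 * present in "cur", i.e. every transaction that was committing back then
 * has finished by now.
 */
static bool
commit_phase_wait_done(const TransactionId *prev, size_t nprev,
					   const TransactionId *cur, size_t ncur)
{
	for (size_t i = 0; i < nprev; i++)
		for (size_t j = 0; j < ncur; j++)
			if (prev[i] == cur[j])
				return false;	/* a previously-seen xid is still committing */
	return true;
}
```

As discussed above, this trades extra bookkeeping (and possibly shipping the xid list over the network) for determinism, which is why it was judged not worth it for a rare corner case.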

-- 
With Regards,
Amit Kapila.




Re: Add pg_get_injection_points() for information of injection points

2025-05-27 Thread Michael Paquier
On Sun, May 25, 2025 at 03:59:51AM +, Rustam ALLAKOV wrote:
> to match how `unicode_version` function is documented
> > Returns a string representing the version of Unicode used by ICU, if
> > the server was built with ICU support; otherwise returns

Makes sense, with the difference that the function throws an error if
Postgres is configured without --enable-injection-points.
--
Michael
From 3f85f040d1e00cb3c3802cab1b8baae0a0d59e95 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 22 Apr 2025 12:59:30 +0900
Subject: [PATCH v5 1/2] Add InjectionPointList() to retrieve list of injection
 points

This hides the internals of the shmem array lookup, allocating the
result in a palloc'd array usable by the caller.
---
 src/include/utils/injection_point.h  | 16 +
 src/backend/utils/misc/injection_point.c | 46 
 src/tools/pgindent/typedefs.list |  1 +
 3 files changed, 63 insertions(+)

diff --git a/src/include/utils/injection_point.h b/src/include/utils/injection_point.h
index a37958e1835f..fd5bc061b7bd 100644
--- a/src/include/utils/injection_point.h
+++ b/src/include/utils/injection_point.h
@@ -11,6 +11,19 @@
 #ifndef INJECTION_POINT_H
 #define INJECTION_POINT_H
 
+#include "nodes/pg_list.h"
+
+/*
+ * Injection point data, used when retrieving a list of all the attached
+ * injection points.
+ */
+typedef struct InjectionPointData
+{
+	const char *name;
+	const char *library;
+	const char *function;
+} InjectionPointData;
+
 /*
  * Injection points require --enable-injection-points.
  */
@@ -47,6 +60,9 @@ extern void InjectionPointCached(const char *name, void *arg);
 extern bool IsInjectionPointAttached(const char *name);
 extern bool InjectionPointDetach(const char *name);
 
+/* Get the current set of injection points attached */
+extern List *InjectionPointList(void);
+
 #ifdef EXEC_BACKEND
 extern PGDLLIMPORT struct InjectionPointsCtl *ActiveInjectionPoints;
 #endif
diff --git a/src/backend/utils/misc/injection_point.c b/src/backend/utils/misc/injection_point.c
index f58ebc8ee522..12570fba56e4 100644
--- a/src/backend/utils/misc/injection_point.c
+++ b/src/backend/utils/misc/injection_point.c
@@ -584,3 +584,49 @@ IsInjectionPointAttached(const char *name)
 	return false;/* silence compiler */
 #endif
 }
+
+/*
+ * Retrieve a list of all the injection points currently attached.
+ *
+ * This list is palloc'd in the current memory context.
+ */
+List *
+InjectionPointList(void)
+{
+#ifdef USE_INJECTION_POINTS
+	List	   *inj_points = NIL;
+	uint32		max_inuse;
+
+	LWLockAcquire(InjectionPointLock, LW_SHARED);
+
+	max_inuse = pg_atomic_read_u32(&ActiveInjectionPoints->max_inuse);
+
+	for (uint32 idx = 0; idx < max_inuse; idx++)
+	{
+		InjectionPointEntry *entry;
+		InjectionPointData *inj_point;
+		uint64		generation;
+
+		entry = &ActiveInjectionPoints->entries[idx];
+		generation = pg_atomic_read_u64(&entry->generation);
+
+		/* skip free slots */
+		if (generation % 2 == 0)
+			continue;
+
+		inj_point = (InjectionPointData *) palloc0(sizeof(InjectionPointData));
+		inj_point->name = pstrdup(entry->name);
+		inj_point->library = pstrdup(entry->library);
+		inj_point->function = pstrdup(entry->function);
+		inj_points = lappend(inj_points, inj_point);
+	}
+
+	LWLockRelease(InjectionPointLock);
+
+	return inj_points;
+
+#else
+	elog(ERROR, "Injection points are not supported by this build");
+	return NIL;	/* keep compiler quiet */
+#endif
+}
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a8346cda633a..4ad6fdb0d003 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1283,6 +1283,7 @@ InjectionPointCacheEntry
 InjectionPointCallback
 InjectionPointCondition
 InjectionPointConditionType
+InjectionPointData
 InjectionPointEntry
 InjectionPointsCtl
 InjectionPointSharedState
-- 
2.49.0

From 82aa223a25a7bbd19e6bc2c873d3b8d5e5398870 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 27 May 2025 16:07:08 +0900
Subject: [PATCH v5 2/2] Add pg_get_injection_points()

This is a system function that displays the information about the
injection points currently attached to the system, feeding from the
states of things in shared memory.
---
 src/include/catalog/pg_proc.dat   |  8 +++
 src/backend/utils/misc/Makefile   |  1 +
 .../utils/misc/injection_point_funcs.c| 60 +++
 src/backend/utils/misc/meson.build|  1 +
 .../expected/injection_points.out | 16 +
 .../injection_points/sql/injection_points.sql |  7 +++
 doc/src/sgml/func.sgml| 48 +++
 7 files changed, 141 insertions(+)
 create mode 100644 src/backend/utils/misc/injection_point_funcs.c

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 37a484147a8f..eb9c6cd16263 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12556,4 +12556,12 @@
   proargnames =

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-05-27 Thread Alexander Korotkov
On Tue, May 27, 2025 at 7:08 AM Amit Kapila  wrote:
> On Mon, May 26, 2025 at 10:36 PM Alexander Korotkov
>  wrote:
> >
> > On Mon, May 26, 2025 at 2:43 PM Amit Kapila  wrote:
> > >
> > > On Mon, May 26, 2025 at 3:52 PM Vitaly Davydov  
> > > wrote:
> >
> > > OTOH, if we don't want to adjust physical
> > > slot machinery, it seems saving the logical slots to disk immediately
> > > when its restart_lsn is updated is a waste of effort after your patch,
> > > no? If so, why are we okay with that?
> >
> > I don't think so.  I think the reason why logical slots are synced to
> > disk immediately after update is that logical changes are not
> > idempotent (you can't safely apply the same change twice) unlike
> > physical block-level changes.  This is why logical slots need to be
> > synced to prevent double replication of same changes, which could
> > lead, for example, to double insertion.
> >
>
> Hmm, if this has to be true, then even in the else branch of
> LogicalConfirmReceivedLocation [1], we should have saved the slot.
> AFAIU, whether the logical changes are sent to the client is decided
> based on two things: (a) the replication origins, which tracks
> replication progress and are maintained by clients (which for built-in
> replication are subscriber nodes), see [2]; and (b) confirmed_flush
> LSN maintained in the slot by the server. Now, for each ack by the
> client after applying/processing changes, we update the
> confirmed_flush LSN of the slot but don't immediately flush it. This
> shouldn't let us send the changes again because even if the system
> crashes and restarts, the client will send the server the location to
> start sending the changes from based on its origin tracking. There is
> more to it, like there are cases when confirm_flush LSN in the slot
> could be ahead the origin's LSN, and we handle all such cases, but I
> don't think those are directly related here, so I am skipping those
> details for now.
>
> Note that LogicalConfirmReceivedLocation won't save the slot to disk
> if it updates only confirmed_flush LSN, which is used to decide
> whether to send the changes.

You're right, I didn't study these aspects carefully enough.

> > LogicalConfirmReceivedLocation() implements immediate sync for
> > different reasons.
> >
>
> I may be missing something, but let's discuss some more before we conclude 
> this.

So, yes, probably LogicalConfirmReceivedLocation() tries to take care of
keeping all WAL segments after the synchronized value of restart_lsn.
But it just doesn't account for a concurrent
ReplicationSlotsComputeRequiredLSN().  In order to fix that logic, we
need an effective_restart_lsn field, by analogy to effective_catalog_xmin
(a similar approach was discussed in this thread before).  But that
would require an ABI compatibility break.

So, I'd like to propose the following: backpatch 0001 and 0002, but
implement the effective_restart_lsn field for pg19.  What do you think?
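For illustration, the effective_restart_lsn idea can be reduced to this standalone model (simplified, with field names chosen by analogy to effective_catalog_xmin; not the actual slot code): the in-memory restart_lsn may advance at any time, but WAL-removal decisions read only the value that has actually been persisted.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

typedef struct SlotModel
{
	XLogRecPtr	restart_lsn;			/* advanced in memory right away */
	XLogRecPtr	effective_restart_lsn;	/* advanced only after a disk sync */
} SlotModel;

static void
slot_advance(SlotModel *s, XLogRecPtr lsn)
{
	s->restart_lsn = lsn;	/* not yet crash-safe */
}

static void
slot_persist(SlotModel *s)
{
	/* only after the slot is fsynced may older WAL become removable */
	s->effective_restart_lsn = s->restart_lsn;
}

/*
 * ReplicationSlotsComputeRequiredLSN() analogue: WAL retention is driven
 * by the effective (persisted) value, never the in-memory one.
 */
static XLogRecPtr
required_lsn(const SlotModel *slots, int n)
{
	XLogRecPtr	min = UINT64_MAX;

	for (int i = 0; i < n; i++)
		if (slots[i].effective_restart_lsn < min)
			min = slots[i].effective_restart_lsn;
	return min;
}
```

With this split, a hard restart between slot_advance() and slot_persist() cannot leave the slot pointing to already-removed WAL, which is the failure mode this thread is about.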

--
Regards,
Alexander Korotkov
Supabase




Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-05-27 Thread Alexander Korotkov
On Tue, May 27, 2025 at 12:12 PM Alexander Korotkov
 wrote:
>
> On Tue, May 27, 2025 at 7:08 AM Amit Kapila  wrote:
> > On Mon, May 26, 2025 at 10:36 PM Alexander Korotkov
> >  wrote:
> > >
> > > On Mon, May 26, 2025 at 2:43 PM Amit Kapila  
> > > wrote:
> > > >
> > > > On Mon, May 26, 2025 at 3:52 PM Vitaly Davydov 
> > > >  wrote:
> > >
> > > > OTOH, if we don't want to adjust physical
> > > > slot machinery, it seems saving the logical slots to disk immediately
> > > > when its restart_lsn is updated is a waste of effort after your patch,
> > > > no? If so, why are we okay with that?
> > >
> > > I don't think so.  I think the reason why logical slots are synced to
> > > disk immediately after update is that logical changes are not
> > > idempotent (you can't safely apply the same change twice) unlike
> > > physical block-level changes.  This is why logical slots need to be
> > > synced to prevent double replication of same changes, which could
> > > lead, for example, to double insertion.
> > >
> >
> > Hmm, if this has to be true, then even in the else branch of
> > LogicalConfirmReceivedLocation [1], we should have saved the slot.
> > AFAIU, whether the logical changes are sent to the client is decided
> > based on two things: (a) the replication origins, which tracks
> > replication progress and are maintained by clients (which for built-in
> > replication are subscriber nodes), see [2]; and (b) confirmed_flush
> > LSN maintained in the slot by the server. Now, for each ack by the
> > client after applying/processing changes, we update the
> > confirmed_flush LSN of the slot but don't immediately flush it. This
> > shouldn't let us send the changes again because even if the system
> > crashes and restarts, the client will send the server the location to
> > start sending the changes from based on its origin tracking. There is
> > more to it, like there are cases when confirm_flush LSN in the slot
> > could be ahead the origin's LSN, and we handle all such cases, but I
> > don't think those are directly related here, so I am skipping those
> > details for now.
> >
> > Note that LogicalConfirmReceivedLocation won't save the slot to disk
> > if it updates only confirmed_flush LSN, which is used to decide
> > whether to send the changes.
>
> You're right, I didn't study these aspects carefully enough.
>
> > > LogicalConfirmReceivedLocation() implements immediate sync for
> > > different reasons.
> > >
> >
> > I may be missing something, but let's discuss some more before we conclude 
> > this.
>
> So, yes, probably LogicalConfirmReceivedLocation() tries to take care of
> keeping all WAL segments after the synchronized value of restart_lsn.
> But it just doesn't account for a concurrent
> ReplicationSlotsComputeRequiredLSN().  In order to fix that logic, we
> need an effective_restart_lsn field, by analogy to effective_catalog_xmin
> (a similar approach was discussed in this thread before).  But that
> would require an ABI compatibility break.
>
> So, I'd like to propose the following: backpatch 0001 and 0002, but
> implement the effective_restart_lsn field for pg19.  What do you think?

Possibly we could implement effective_restart_lsn even for pg18.  As far
as I know, keeping ABI compatibility is not required during beta.

--
Regards,
Alexander Korotkov
Supabase




Suggestion : support for environment variable in initdb to set the superuser password

2025-05-27 Thread Reda Agaoua
Hi,

The initdb command currently supports prompting for the superuser password
using --pwprompt or reading it from a file using --pwfile. I’m wondering if
it might also make sense to allow specifying the password via an
environment variable.

I do believe it can be useful in a variety of settings, but I'm not sure
whether this is secure. Specifically, the documentation advises against
using PGPASSWORD for connecting to Postgres:

"Use of this environment variable is not recommended for security reasons,
as some operating systems allow non-root users to see process environment
variables via ps; instead consider using a password file (see Section
32.16)." (32.15. Environment Variables)

In my opinion, the context for using PGPASSWORD (i.e. connecting to a
running instance) is very different from that of initdb, where the password
is only used once, during cluster initialization. So I think the security
concerns from Section 32.15 may not necessarily apply here.
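For what it's worth, the mechanism itself would be tiny. Here is a sketch of the option resolution, with PGINITDB_SUPERUSER_PASSWORD as a purely hypothetical variable name (initdb has no such interface today): an explicit --pwprompt/--pwfile value would win, with the environment as a fallback.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Resolve the superuser password.  "from_option" is the value obtained via
 * --pwprompt or --pwfile (NULL if neither was given); otherwise fall back
 * to a hypothetical environment variable.  Returns NULL if no password
 * source is available.
 */
static const char *
resolve_superuser_password(const char *from_option)
{
	if (from_option != NULL)
		return from_option;
	return getenv("PGINITDB_SUPERUSER_PASSWORD");
}
```

The security question then reduces to whether exposing this one-shot value via the process environment (visible through ps on some systems) is acceptable during the short lifetime of initdb.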

I'm looking to contribute to postgres and I'm pretty sure this would be a
good opportunity for getting into the code, but I'm not sure if this idea
is relevant and I’d appreciate your thoughts on whether it's worth pursuing.

Best regards,
Reda AGAOUA


Support specifying compression level in wal_compression

2025-05-27 Thread Nikolay Samokhvalov
I thought it makes sense to extend wal_compression to support compression
levels.

The patch replaces the simple enum-based setting with string-based
'method[:level]' syntax, similar to the --compress option in pg_dump.

What's inside:
- Unified GUC design: wal_compression = 'method[:level]'
- Supported formats: 'off', 'on', 'pglz', 'lz4[:level]', 'zstd[:level]'
- Algorithm-specific defaults: LZ4 defaults to level 1, ZSTD to level 3
when no level is specified.
- Parameter validation is performed at SET time.
- The string is parsed only once during GUC assignment.
- Backward compatibility for common settings: while the GUC type changes
from enum to string, common previous boolean-like string values (e.g.,
'off', 'true', '0') are handled, mapping to 'pglz' or 'off' accordingly.
- Includes a docs change proposal, as well as extensive new regression and
TAP tests.

Additionally, it adds LZ4HC support: for LZ4, if levels 10-12 are
specified, then, when available (checked at build time and runtime), it is
supposed to use LZ4HC for a higher compression ratio. This part needs
additional testing.
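For readers following along, the proposed 'method[:level]' parsing could be sketched like this (a simplified standalone model with hypothetical names; the real GUC check hook in the patch also handles 'on'/'off'/boolean spellings and validates levels against the library limits):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse "method[:level]" and apply the per-method defaults described
 * above (LZ4 -> 1, ZSTD -> 3).  Returns false on an empty or oversized
 * method name.  Simplified sketch: no pglz/on/off handling and no range
 * validation.
 */
static bool
parse_wal_compression(const char *value, char *method, size_t methodsz,
					  int *level)
{
	const char *colon = strchr(value, ':');
	size_t		len = colon ? (size_t) (colon - value) : strlen(value);

	if (len == 0 || len >= methodsz)
		return false;
	memcpy(method, value, len);
	method[len] = '\0';

	if (colon)
		*level = atoi(colon + 1);		/* explicit level */
	else if (strcmp(method, "lz4") == 0)
		*level = 1;						/* LZ4 default */
	else if (strcmp(method, "zstd") == 0)
		*level = 3;						/* ZSTD default */
	else
		*level = 0;						/* pglz etc.: no level concept */
	return true;
}
```

Parsing once at SET/assignment time, as the patch does, keeps this cost out of the WAL-insertion path.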


Originally, I considered adding a separate wal_compression_level GUC but
eventually decided to extend wal_compression, for two reasons:
1. with a separate parameter, the question of defaults needs to be solved,
and I didn't find an elegant solution;
2. the "method:level" syntax is already used elsewhere, in pg_dump.

An early version of this patch was reviewed by Andrey Borodin off-list, and
it was extremely helpful.

Looking forward to your feedback.

Nik


001_wal_compression_with_levels_and_lz4hc.patch
Description: Binary data


Re: Init connection time grows quadratically

2025-05-27 Thread Потапов Александр

Sorry, I forgot to add the table and graph for point #8
The graph is attached. 
This is the table:
----------------------------------------------
| Number of clients |  Average init time, ms |
----------------------------------------------
|              1024 |  ~435 +-20             |
|              2048 |  ~1062 +-20            |
|              4096 |  ~3284 +-40            |
|              8192 |  ~11617 +-120          |
|             16384 |  ~43391 +-230          |
----------------------------------------------
 
Best regards,
Alexander Potapov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: Conflict detection for update_deleted in logical replication

2025-05-27 Thread shveta malik
On Mon, May 26, 2025 at 12:46 PM Zhijie Hou (Fujitsu)
 wrote:
>
> Attaching the V32 patch set which addressed comments in [1]~[5].

Thanks for the patch. I am still reviewing the patches; please find a
few trivial comments on patch001:

1)

+ FullTransactionId last_phase_at; /* publisher transaction ID that must
+ * be awaited to complete before
+ * entering the final phase
+ * (RCI_WAIT_FOR_LOCAL_FLUSH) */

'last_phase_at' seems like we are talking about the phase in the past.
(similar to 'last' in last_recv_time).
Perhaps we should name it as 'final_phase_at'


2)
RetainConflictInfoData data = {0};

We can change this name as well to rci_data.

3)
+ /*
+ * Compute FullTransactionId for the oldest running transaction ID. This
+ * handles the case where transaction ID wraparound has occurred.
+ */
+ full_xid = FullTransactionIdFromAllowableAt(next_full_xid,
oldest_running_xid);

Shall we name it to full_oldest_xid for better clarity?


4)
+ /*
+ * Update and check the remote flush position if we are applying changes
+ * in a loop. This is done at most once per WalWriterDelay to avoid
+ * performing costy operations in get_flush_position() too frequently
+ * during change application.
+ */
+ if (last_flushpos < rci_data->remote_lsn && rci_data->last_recv_time &&
+ TimestampDifferenceExceeds(rci_data->flushpos_update_time,
+rci_data->last_recv_time, WalWriterDelay))

a) costy --> costly

thanks
Shveta




Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-05-27 Thread Nikhil Kumar Veldanda
On Wed, May 7, 2025 at 5:38 PM Michael Paquier  wrote:
> Yes, I was wondering if this is not the most natural approach in terms
> of structure once if we plug an extra byte into the varlena header if
> all the bits of va_extinfo for the compression information are used.
> Having all the bits may not mean that this necessarily means that the
> information would be cmp_data all the time, just that this a natural
> option when plugging in a new compression method in the new byte
> available.
>

Thanks for reviewing and providing feedback on the patch. Regarding the
questions about varatt_external, specifically storing the compression
method in one byte for extended compression methods in external on-disk
datums, here is the proposal for varatt_external. We check the compression
method of an external on-disk datum in three trivial places in core; my
previous proposal just marked 0b11 in the top bits of va_extinfo and
fetched the externally stored chunks to form a varattrib_4b in order to
find the compression method id for extended compression methods.
However, I understand why embedding the method byte directly is
clearer.
```
typedef struct varatt_external
{
int32   va_rawsize; /* Original data size (includes header) */
uint32  va_extinfo; /* External size (without header) and
 * compression method */
Oid va_valueid; /* Unique ID within TOAST table */
Oid va_toastrelid;  /* OID of TOAST table containing it */
/*  optional trailer  */
union
{
struct  /* compression-method trailer */
{
uint8   va_ecinfo; /* Extended-compression-method info */
} cmp;
} extended;   /* "extended" = optional byte */
} varatt_external;
```

I'm proposing not to store algorithm metadata exclusively at the
varatt_external level, because that is not always appropriate: when a
datum initially qualifies for out-of-line storage but becomes
sufficiently small after compression (specifically, under the 2KB
threshold of the extended storage type), it no longer meets the criteria
for external storage. Consequently, it cannot use a TOAST pointer and
must instead be stored in-line.
Given this behavior, it is more robust to store the metadata at the
varattrib_4b level. This ensures that the metadata remains accessible
regardless of whether the datum ends up stored in-line or externally.
Moreover, during detoasting we first fetch the external data,
reconstruct it into a varattrib_4b, then decompress, so keeping the
metadata in varattrib_4b matches that flow.

This is the layout for extra 1 byte in both varatt_external and varattrib_4b.
```
bit 7   6   5   4   3   2   1   0
+---+---+---+---+---+---+---+---+
|  cmid − 2|  F|
+---+---+---+---+---+---+---+---+

• Bits 7–1 (cmid − 2)
  – 7-bit field holding compression IDs: raw ∈ [0…127] ⇒ cmid = raw +
2 ([2…129])
• Bit 0 (F)
  – flag indicating whether the algorithm expects metadata
```
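Packing and unpacking that byte is straightforward; here is a standalone sketch of the layout described above (hypothetical helper names, not part of the patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Proposed one-byte layout: bits 7..1 hold (cmid - 2), bit 0 is the
 * "expects metadata" flag F.  Valid cmid range is therefore [2..129].
 */
static uint8_t
cmid_pack(int cmid, bool has_meta)
{
	return (uint8_t) (((cmid - 2) << 1) | (has_meta ? 1 : 0));
}

static int
cmid_unpack(uint8_t ecinfo, bool *has_meta)
{
	*has_meta = (ecinfo & 1) != 0;
	return (ecinfo >> 1) + 2;
}
```

For ZSTD this means one cmid covers both the dictionary and no-dictionary cases, with F deciding whether a dictid follows, which is exactly why dict/nodict need not appear as separate ToastCompressionId values.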

I introduced a metadata flag in the 1-byte layout to prevent zstd from
exposing separate dict/nodict variants of ToastCompressionId. This flag
indicates whether the algorithm expects any metadata. In the ZSTD
scenario, if the flag is set, a dictid is expected; otherwise, no dictid
is present.

```
typedef enum ToastCompressionId
{
TOAST_PGLZ_COMPRESSION_ID = 0,
TOAST_LZ4_COMPRESSION_ID  = 1,
TOAST_ZSTD_COMPRESSION_ID = 2,
TOAST_INVALID_COMPRESSION_ID  = 3,
} ToastCompressionId;

// varattrib_4b remains unchanged from the previous proposal
typedef union
{
struct /* Normal varlena (4-byte length) */
{
uint32 va_header;
char   va_data[FLEXIBLE_ARRAY_MEMBER];
} va_4byte;

struct /* Compressed in-line format */
{
uint32 va_header;
uint32 va_tcinfo; /* Original data size and method; see va_extinfo */
char   va_data[FLEXIBLE_ARRAY_MEMBER];
} va_compressed;

struct /* Extended compressed in-line format */
{
uint32 va_header;
uint32 va_tcinfo; /* Original data size and method; see va_extinfo */
uint8  va_ecinfo;
char   va_data[FLEXIBLE_ARRAY_MEMBER];
} va_compressed_ext;
} varattrib_4b;
```

During compression, the compression method (zstd_compress_datum) will
determine whether to use metadata (a dictionary) based on
CompressionInfo.meta.



Per-column ZSTD compression levels:
Since ZSTD supports compression levels (default = 3, up to
ZSTD_maxCLevel()—currently 22—and negative “fast” levels), I’m
proposing an option for users to choose their preferred level on a
per-column basis via pg_attribute.attoptions. If unset, we’ll use
ZSTD’s default:

```
typedef struct AttributeOpts
{
int32   vl_len_;   /* varlena header (do not touch!) */
float8  n_distinct;
float8  n_distinct_inherited;
int zstd_level;/* user-specified ZSTD level */
} AttributeOpts;

ALTER TAB

Re: Init connection time grows quadratically

2025-05-27 Thread Matthias van de Meent
On Tue, 27 May 2025 at 12:45, Потапов Александр
 wrote:
>
> Hello!
>
> I ran some experiments with pgbench to measure the initialization time and 
> found that the time increases quadratically with the number of clients. It 
> was surprising to me, and I would like to understand the reason for this behavior.
>
> Some details on how it was done:
>
[...]
>
> It turned out that the results correspond to a quadratic dependence, roughly y ~ 
> 0.0002x^2, where x is the number of clients and y is the init time (ms).
> So the question is: is this expected behavior or a bug? What do you 
> think? I appreciate any comments and opinions.

Note that the value of "initial connection time" is based on the time it
takes from about the start of the pgbench process until the moment all N
expected connections have been established, *not* the average time it
took pgbench to connect to PostgreSQL. Nor does it exclude other known
measurable delays (such as spawning threads, synchronization, etc.), so
the actual per-connection connection time is probably closer to O(n)
than O(n^2).

Q: Did you check that pgbench or the OS does not have
O(n_active_connections) or O(n_active_threads) overhead per worker
during thread creation or connection establishment, e.g. by varying
the number of threads used to manage these N clients? I wouldn't be
surprised if there are inefficiencies in e.g. the threading- or
synchronization model that cause O(N) per-thread overhead, or O(N^2)
overall when you have one thread per connection.
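To illustrate the last point: if each new connection pays some constant cost c plus a cost k for every already-established connection, the total initialization time is c*N + k*N*(N-1)/2, i.e. quadratic overall even though each individual connection is only O(N). A toy model (assumed constants, not measured ones):

```c
#include <assert.h>

/*
 * Model: connection i costs c + k*i, where k is the per-existing-connection
 * overhead.  Summing over N connections gives c*N + k*N*(N-1)/2, so any
 * O(N) per-connection overhead yields O(N^2) total init time.
 */
static double
total_init_cost(int n, double c, double k)
{
	double		total = 0.0;

	for (int i = 0; i < n; i++)
		total += c + k * i;
	return total;
}
```

Doubling N roughly quadruples the k-dominated part, which matches the y ~ 0.0002x^2 fit reported upthread.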

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




All supported PostgreSQL 17 extensions list

2025-05-27 Thread Zaid Shabbir
Hello Hackers,

I’m looking for a complete list of PostgreSQL 17 extensions — both
open-source and proprietary. I found a link
,
but it doesn’t seem to include all available extensions.

Is there an official or community-maintained source where I can find a
comprehensive list of supported extensions?


Thanks & Regards,

Zaid Shabbir


Re: Non-reproducible AIO failure

2025-05-27 Thread Thomas Munro
On Mon, May 26, 2025 at 12:05 PM Tom Lane  wrote:
> Thomas Munro  writes:
> > Could you guys please share your exact repro steps?
>
> I've just been running 027_stream_regress.pl over and over.
> It's not a recommendable answer though because the failure
> probability is tiny, under 1%.  It sounded like Alexander
> had a better way.

Could you please share your configure options?

While flailing around in the dark and contemplating sources of
nondeterminism that might come from a small system under a lot of load
(as hinted at by Alexander's mention of running the test in parallel)
with a 1MB buffer pool (as used by 027_stream_read.pl via Cluster.pm's
settings for replication tests), I thought about partial reads:

--- a/src/backend/storage/aio/aio_io.c
+++ b/src/backend/storage/aio/aio_io.c
@@ -128,6 +128,8 @@ pgaio_io_perform_synchronously(PgAioHandle *ioh)
result = pg_preadv(ioh->op_data.read.fd, iov,

ioh->op_data.read.iov_length,

ioh->op_data.read.offset);
+   if (result > BLCKSZ && rand() < RAND_MAX / 2)
+   result = BLCKSZ;

... and the fallback path for io_method=worker that runs IOs
synchronous when the submission queue overflows because the I/O
workers aren't keeping up:

--- a/src/backend/storage/aio/method_worker.c
+++ b/src/backend/storage/aio/method_worker.c
@@ -253,7 +253,7 @@ pgaio_worker_submit_internal(int nios, PgAioHandle *ios[])
for (int i = 0; i < nios; ++i)
{
Assert(!pgaio_worker_needs_synchronous_execution(ios[i]));
-   if (!pgaio_worker_submission_queue_insert(ios[i]))
+   if (rand() < RAND_MAX / 2 ||
!pgaio_worker_submission_queue_insert(ios[i]))
{
/*
 * We'll do it synchronously, but only after
we've sent as many as

... but still no dice here...




Init connection time grows quadratically

2025-05-27 Thread Потапов Александр

Hello!
 
I ran some experiments with pgbench to measure the initialization time and 
found that the time increases quadratically with the number of clients. It was 
surprising to me, and I would like to understand the reason for this behavior.
 
Some details on how it was done:
 
1) I used the branch REL_16_STABLE (commit 2caa85f4).
 
2) The default system configuration was modified (CPU speed control, memory 
control, network, ram disk). Briefly:
    sudo cpupower frequency-set -g performance
    sudo cpupower idle-set -D0 
    sudo swapoff -a 
    sudo sh -c 'echo 16384  >/proc/sys/net/core/somaxconn'
sudo sh -c 'echo 16384  >/proc/sys/net/core/netdev_max_backlog'
sudo sh -c 'echo 16384  >/proc/sys/net/ipv4/tcp_max_syn_backlog'
numactl --membind=0 bash
sudo mount -t tmpfs -o rw,size=512G tmpfs /mnt/ramdisk
    exit
Hyperthreading and cpu boost were disabled:
    echo 0 | sudo tee /sys/devices/system/cpu/cpufreq/boost
Please note: when testing on a fast multi-core server with a large number of 
clients, where the rate of new connection creation becomes very high, an error 
may occur even with such kernel parameters:
    pgbench:pgbench: error: connection to server on socket "/tmp/.s.PGSQL.5114" 
failed: Resource temporarily unavailable

In such case you need to apply the patch 
0001-Fix-fast-connection-rate-issue.patch (attached).
 
3) The server was configured as:
    ./configure --enable-debug --with-perl --with-icu --enable-depend 
--enable-tap-tests

4) Build and install on ramdrive:
    make -j$(nproc) -s && make install

5) DB initialization:
    /mnt/ramdisk/bin/initdb -k -D /mnt/ramdisk/data -U postgres

Add to the postgresql.conf:
    huge_pages = off #for the sake of test stability and reproducibility
    shared_buffers = 1GB
    max_connections = 16384

6) Start command:
    a) Start server (e.g on the first numa socket)
        /mnt/ramdisk/bin/pg_ctl -w -D /mnt/ramdisk/data start

    b) create test database and stop the server
        /mnt/ramdisk/bin/psql -U postgres -c 'create database bench'
        /mnt/ramdisk/bin/pg_ctl -w -D /mnt/ramdisk/data stop
7) pgbench commands:
Perform the single test sequence (I've got a dual socket server, so the server 
was running on the first socket while the clients were running on the second 
one):

    export PATH=/mnt/ramdisk/bin:$PATH
    export NUMACTL_CLIENT="--physcpubind=96-191 --membind=1"
    export NUMACTL_SERVER="--physcpubind=0-95 --membind=0"
    export CLIENTS=1024

    numactl $NUMACTL_SERVER pg_ctl -w -D /mnt/ramdisk/data start
    numactl $NUMACTL_CLIENT pgbench -U postgres -i -s100 bench
    numactl $NUMACTL_CLIENT psql -U postgres -d bench -c "checkpoint"
    numactl $NUMACTL_CLIENT pgbench -U postgres -c$CLIENTS -j$CLIENTS -t100 -S 
bench
    numactl $NUMACTL_SERVER pg_ctl -m smart -w -D /mnt/ramdisk/data stop
8) Measurements & Results
Before the measurements I rebooted the host machine and configured it as 
described above. After that I ran a script that did 30 measurements of init 
connection time for each given number of clients; the average time and 
standard deviation were also calculated.
The measurement results are presented as a graph and in table form:

    Number of clients | Average init time, ms
    ------------------+----------------------
                 1024 | ~435 +-20
                 2048 | ~1062 +-20
                 4096 | ~3284 +-40
                 8192 | ~11617 +-120
                16384 | ~43391 +-230

9) The Question
It turned out that the results correspond to a quadratic dependence, roughly 
y ~ 0.0002x^2, where x is the number of clients and y is the init time (ms).
So the question is: is this expected behavior or a bug? What do you think? 
I appreciate any comments and opinions.

--
Best regards,
Alexander Potapov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
 


0001-Fix-fast-connection-rate-issue-for-v16.patch
Description: Binary data


Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

2025-05-27 Thread Amit Kapila
On Tue, May 27, 2025 at 2:48 PM Alexander Korotkov  wrote:
>
> On Tue, May 27, 2025 at 12:12 PM Alexander Korotkov
>  wrote:
> >
> > On Tue, May 27, 2025 at 7:08 AM Amit Kapila  wrote:
> > > On Mon, May 26, 2025 at 10:36 PM Alexander Korotkov
> > >  wrote:
> > > >
> > > > On Mon, May 26, 2025 at 2:43 PM Amit Kapila  
> > > > wrote:
> > > > >
> > > > > On Mon, May 26, 2025 at 3:52 PM Vitaly Davydov 
> > > > >  wrote:
> > > >
> > > > > OTOH, if we don't want to adjust physical
> > > > > slot machinery, it seems saving the logical slots to disk immediately
> > > > > when its restart_lsn is updated is a waste of effort after your patch,
> > > > > no? If so, why are we okay with that?
> > > >
> > > > I don't think so.  I think the reason why logical slots are synced to
> > > > disk immediately after update is that logical changes are not
> > > > idempotent (you can't safely apply the same change twice) unlike
> > > > physical block-level changes.  This is why logical slots need to be
> > > > synced to prevent double replication of same changes, which could
> > > > lead, for example, to double insertion.
> > > >
> > >
> > > Hmm, if this has to be true, then even in the else branch of
> > > LogicalConfirmReceivedLocation [1], we should have saved the slot.
> > > AFAIU, whether the logical changes are sent to the client is decided
> > > based on two things: (a) the replication origins, which tracks
> > > replication progress and are maintained by clients (which for built-in
> > > replication are subscriber nodes), see [2]; and (b) confirmed_flush
> > > LSN maintained in the slot by the server. Now, for each ack by the
> > > client after applying/processing changes, we update the
> > > confirmed_flush LSN of the slot but don't immediately flush it. This
> > > shouldn't let us send the changes again because even if the system
> > > crashes and restarts, the client will send the server the location to
> > > start sending the changes from based on its origin tracking. There is
> > > more to it, like there are cases when confirm_flush LSN in the slot
> > > could be ahead the origin's LSN, and we handle all such cases, but I
> > > don't think those are directly related here, so I am skipping those
> > > details for now.
> > >
> > > Note that LogicalConfirmReceivedLocation won't save the slot to disk
> > > if it updates only confirmed_flush LSN, which is used to decide
> > > whether to send the changes.
> >
> > You're right, I didn't study these aspects careful enough.
> >
> > > > LogicalConfirmReceivedLocation() implements immediate sync for
> > > > different reasons.
> > > >
> > >
> > > I may be missing something, but let's discuss some more before we 
> > > conclude this.
> >
> > So, yes probably LogicalConfirmReceivedLocation() tries to care about
> > keeping all WAL segments after the synchronized value of restart_lsn.
> > But it just doesn't care about concurrent
> > ReplicationSlotsComputeRequiredLSN().  In order to fix that logic, we
> > need effective_restart_lsn field by analogy to effective_catalog_xmin
> > (similar approach was discussed in this thread before).  But that
> > would require ABI compatibility breakage.
> >
> > So, I'd like to propose following: backpatch 0001 and 0002, but
> > implement effective_restart_lsn field for pg19.  What do you think?
>
> Possibly we could implement effective_restart_lsn even for pg18.  As I
> know, keeping ABI compatibility is not required for beta.
>

Yeah, we should be able to change ABI during beta, but I can't comment
on the idea of effective_restart_lsn without seeing the patch or a
detailed explanation of this idea.

Now, you see my point related to restart_lsn computation for logical
slots, it is better to also do some analysis of the problem related to
xmin I have highlighted in one of my previous emails [1]. I see your
response to it, but I feel someone needs to give it a try by writing a
test and see the behavior. I am saying because logical slots took
precaution of flushing to disk before updating shared values of xmin
for a reason, whereas similar precautions are not taken for physical
slots, so there could be a problem with that computation as well.

[1]: 
https://www.postgresql.org/message-id/CAA4eK1KMaPA5jir_SFu%2Bqr3qu55OOdFWVZpuUkqTSGZ9fyPpHA%40mail.gmail.com
-- 
With Regards,
Amit Kapila.




Re: Non-reproducible AIO failure

2025-05-27 Thread Tomas Vondra



On 5/24/25 23:00, Alexander Lakhin wrote:
> ...
>
> I'm yet to see the Assert triggered on the buildfarm, but this one looks
> interesting too.
> 
> (I can share the complete patch + script for such testing, if it can be
> helpful.)
> 

I'm interested in how you run these tests in parallel. Can you share the
patch/script?

thanks

-- 
Tomas Vondra





Re: Conflict detection for update_deleted in logical replication

2025-05-27 Thread Amit Kapila
On Tue, May 27, 2025 at 3:59 PM shveta malik  wrote:
>
> On Mon, May 26, 2025 at 12:46 PM Zhijie Hou (Fujitsu)
>  wrote:
> >
> > Attaching the V32 patch set which addressed comments in [1]~[5].
>
> Thanks for the patch, I am still reviewing the patches, please find
> few trivial comments for patch001:
>
> 1)
>
> + FullTransactionId last_phase_at; /* publisher transaction ID that must
> + * be awaited to complete before
> + * entering the final phase
> + * (RCI_WAIT_FOR_LOCAL_FLUSH) */
>
> 'last_phase_at' seems like we are talking about the phase in the past.
> (similar to 'last' in last_recv_time).
> Perhaps we should name it as 'final_phase_at'
>

I am not sure the phase in this variable name matches what it is
used for. The other option could be remote_wait_for or something along
those lines.

Additionally, please find a few cosmetic changes atop the 0001 and 0002
patches. Please include them in the next set, if those look okay to you.

-- 
With Regards,
Amit Kapila.
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 740ec89e070..fe558f0a81c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4961,8 +4961,8 @@ ANY num_sync 
( 
   The name of the slot to create. Must be a valid replication slot
   name (see ).
-  The name cannot be pg_conflict_detection, as it
-  is reserved for logical replication conflict detection.
+  The name cannot be pg_conflict_detection as it
+  is reserved for the conflict detection.
  
 

diff --git a/doc/src/sgml/ref/create_subscription.sgml 
b/doc/src/sgml/ref/create_subscription.sgml
index 739161df715..dba0f541ac5 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -170,8 +170,8 @@ CREATE SUBSCRIPTION subscription_name
   Name of the publisher's replication slot to use.  The default is
   to use the name of the subscription for the slot name. The name 
cannot
-  be pg_conflict_detection, as it is reserved for
-  logical replication conflict detection.
+  be pg_conflict_detection as it is reserved for the
+  conflict detection.
  
 
  
diff --git a/src/backend/replication/logical/worker.c 
b/src/backend/replication/logical/worker.c
index 6a59db47583..17963486795 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -3862,8 +3862,8 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
wait_time = NAPTIME_PER_CYCLE;
 
/*
-* Ensure to wake up when it's possible to attempt to advance 
the
-* non-removable transaction ID.
+* Ensure to wake up when it's possible to advance the 
non-removable
+* transaction ID.
 */
if (data.phase == RCI_GET_CANDIDATE_XID && 
data.xid_advance_interval)
wait_time = Min(wait_time, data.xid_advance_interval);
@@ -4103,11 +4103,13 @@ send_feedback(XLogRecPtr recvpos, bool force, bool 
requestReply)
  * WALs that are being replicated from the primary and those WALs could have
  * earlier commit timestamp.
  *
- * XXX It might seem feasible to track the latest commit timestamp on the
- * publisher and send the WAL position once the timestamp exceeds that on the
- * subscriber. However, commit timestamps can regress since a commit with a
- * later LSN is not guaranteed to have a later timestamp than those with
- * earlier LSNs.
+ * XXX It seems feasible to get the latest commit's WAL location from the
+ * publisher and wait till that is applied. However, we can't do that
+ * because commit timestamps can regress as a commit with a later LSN is not
+ * guaranteed to have a later timestamp than those with earlier LSNs. Having
+ * said that, even if that is possible, it won't improve performance much as
+ * the apply always lags and moves slowly as compared with the transactions
+ * on the publisher.
  */
 static void
 maybe_advance_nonremovable_xid(RetainConflictInfoData *rci_data,
@@ -4211,6 +4213,10 @@ get_candidate_xid(RetainConflictInfoData *rci_data)
 */
full_xid = FullTransactionIdFromAllowableAt(next_full_xid, 
oldest_running_xid);
 
+   /*
+* Oldest active transaction ID (full_xid) can't be behind any of its
+* previously computed values.
+*/

Assert(FullTransactionIdPrecedesOrEquals(MyLogicalRepWorker->oldest_nonremovable_xid,

 full_xid));
 
@@ -4294,12 +4300,12 @@ wait_for_publisher_status(RetainConflictInfoData 
*rci_data,
 *
 * It's possible that transactions in the commit phase during the last
 * cycle have now finished committing, but remote_oldestxid remains 
older
-* than last_phase_at. This can happen if some old transaction was in 
the
+* than last_

Re: Non-reproducible AIO failure

2025-05-27 Thread Andres Freund
Hi,

On 2025-05-25 20:05:49 -0400, Tom Lane wrote:
> Thomas Munro  writes:
> > Could you guys please share your exact repro steps?
> 
> I've just been running 027_stream_regress.pl over and over.
> It's not a recommendable answer though because the failure
> probability is tiny, under 1%.  It sounded like Alexander
> had a better way.

Just FYI, I've been trying to reproduce this as well, without a single failure
so far. Despite running all tests for a few hundred times (~2 days) and
027_stream_regress.pl many hundreds of times (~1 day).

This is on a m4 mac mini.  I'm wondering if there's some hardware specific
memory ordering issue or disk speed based timing issue that I'm just not
hitting.

Greetings,

Andres Freund




RE: Review/Pull Request: Adding new CRC32C implementation for IBM S390X

2025-05-27 Thread Eduard Stefes
Hi,

So I worked on the algorithm to also work on buffers between 16-64
bytes. Then I ran the performance measurement on two
datasets[^raw_data_1] [^raw_data_2] and created two diagrams
[^attachment].

my findings so far:

- the optimized crc32cvx is faster
- the sb8 performance depends heavily on alignment (see the
ripples every 8 bytes)
- the 8-byte ripple is also visible in the vx implementation. As it can
only operate on 16- or 64-byte chunks, it still uses sb8 for the
remaining bytes.
- there is no obvious speed regression in the vx algorithm, except
raw_data_2 at 28 bytes, which I assume is a fluke; I am sharing the
system with a bunch of other devs.


I hope this is acceptable as a performance measurement. However, we
will set up a dedicated performance test and try to get precise numbers
without side effects, but it may take some time until we get to that.

I'll post the code update together with the other requested
updates.

cheers, Eddy



[^raw_data_1]
bytes   crc32c_sb8  crc32c_vx
4   6.54 ms 6.548 ms
8   4.476 ms4.47 ms
10  7.346 ms7.348 ms
12  10.955 ms   10.958 ms
14  14.548 ms   14.546 ms
16  6.837 ms6.193 ms
32  12.23 ms6.741 ms
64  22.826 ms   7.6 ms
80  28.536 ms   8.307 ms
96  34.426 ms   9.09 ms
112 40.295 ms   9.844 ms
128 46.053 ms   10.825 ms
144 51.868 ms   11.712 ms
160 65.91 ms12.122 ms
176 71.649 ms   13.055 ms
192 77.465 ms   11.716 ms
208 83.286 ms   13.532 ms
224 88.991 ms   13.165 ms
240 94.875 ms   13.881 ms
256 100.653 ms  13.147 ms
81922967.477 ms 182.911 ms

[^raw_data_2]
bytes   crc32c_sb8  crc32c_vx
4   6.543 ms6.536 ms
8   4.476 ms4.47 ms
10  7.35 ms 7.345 ms
12  10.96 ms10.954 ms
14  14.552 ms   14.588 ms
16  6.843 ms6.189 ms
18  10.253 ms   9.814 ms
24  9.645 ms9.924 ms
28  15.957 ms   17.211 ms
32  12.226 ms   6.726 ms
36  18.823 ms   14.484 ms
42  17.855 ms   14.271 ms
48  17.342 ms   7.344 ms
52  24.208 ms   15.306 ms
58  23.525 ms   14.695 ms
64  22.818 ms   7.593 ms
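To make the gains easier to eyeball, a small throwaway script (not part of
the patch or its test harness) can turn a few representative raw_data_1 rows
into sb8/vx speedup factors:

```python
# Speedup of the vectorized implementation over sb8, computed from
# representative rows of raw_data_1 above (times in ms; lower is better).
raw_data_1 = {
    16: (6.837, 6.193),
    64: (22.826, 7.600),
    256: (100.653, 13.147),
    8192: (2967.477, 182.911),
}
speedup = {n: sb8 / vx for n, (sb8, vx) in raw_data_1.items()}
for n, s in sorted(speedup.items()):
    print(f"{n:5d} bytes: {s:4.1f}x")
```

The vx path comes out roughly 16x faster at 8 KB while staying at parity for
the smallest inputs, which matches the "no obvious speed regression"
observation above.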



On Thu, 2025-05-08 at 05:32 +0700, John Naylor wrote:
> On Wed, May 7, 2025 at 8:15 PM Aleksander Alekseev
>  wrote:
> > 
> > I didn't review the patch but wanted to point out that when it
> > comes
> > to performance improvements it's typically useful to provide some
> > benchmarks.
> 
> +1 -- It's good to have concrete numbers for the commit message, and
> also to verify improvement on short inputs. There is a test harness
> in
> the  v7-0002 patch from here:
> 
> https://www.postgresql.org/message-id/canwcazad5niydbf6q3v_cjapnv05cw-lpxxftmbwdplsz-p...@mail.gmail.com
>  
>  
> 
> After building, run the "test-crc.sh" script here after executing
> "CREATE EXTENSION test_crc32c;":
> 
> https://www.postgresql.org/message-id/CANWCAZahvhE-%2BhtZiUyzPiS5e45ukx5877mD-dHr-KSX6LcdjQ%40mail.gmail.com
>  
>  
> 
> 
> 
> --
> John Naylor
> Amazon Web Services


Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION

2025-05-27 Thread Yugo Nagata
On Tue, 27 May 2025 08:33:42 +0200
Jim Jones  wrote:

> Hi Yugo
> 
> On 26.05.25 18:39, Yugo Nagata wrote:
> > I can see the error when two concurrent transactions issue
> > "alter function f() immutable".
> 
> 
> I might have missed something in my last tests... I could now reproduce
> the behaviour you mentioned.
> 
> I've tested v2 and it works as described. CREATE OR REPLACE FUNCTION and
> ALTER TABLE no longer raise an error after the lock by the concurrent
> transaction was freed.
> 
> One quick question in v2-002:
> 
>  tup = SearchSysCacheCopy1(PROCOID, ObjectIdGetDatum(funcOid));
> -    if (!HeapTupleIsValid(tup)) /* should not happen */
> -        elog(ERROR, "cache lookup failed for function %u", funcOid);
> +    if (!HeapTupleIsValid(tup))
> +        ereport(ERROR,
> +                errcode(ERRCODE_UNDEFINED_OBJECT),
> +                errmsg("function \"%s\" does not exist",
> +                       NameListToString(stmt->func->objname)));
> 
> 
> Is it really ok to change this error message here? Did the addition of
> LockDatabaseObject change the semantics of the previous message? 

Yes. AcceptInvalidationMessages() is called in LockDatabaseObject() after the
wait, and this enables the detection of object deletion during the wait.

Regards,
Yugo Nagata

-- 
Yugo Nagata 




Re: All supported PostgreSQL 17 extensions list

2025-05-27 Thread Chapman Flack
On 05/27/25 09:29, Laurenz Albe wrote:
> There is no "supported".  Each extension has to support itself.
> An exception are the "contrib" extensions shipped with PostgreSQL:
> they are supported by the PGDG.
> 
> There is also no complete list of extensions that I am aware of.

There is some info in the slides from David Wheeler's talk
at PGConf.dev earlier this month:


https://www.pgevents.ca/events/pgconfdev2025/sessions/session/331/slides/87/pgconf.dev-2025-adventures-extension-packaging.pdf

The fifth slide has this link:
https://gist.github.com/joelonsql/e5aa27f8cc9bd22b8999b7de8aee9d47

... which seems to include not quite 1200 extensions.

Regards,
-Chap




Re: Non-reproducible AIO failure

2025-05-27 Thread Tom Lane
Thomas Munro  writes:
> Could you please share your configure options?

The failures on indri and sifaka were during ordinary buildfarm
runs, you can check the animals' details on the website.
(Note those are same host machine, the difference is that
indri uses some MacPorts packages while sifaka is meant to be
a bare-bones-macOS build.)

On the other two, I believe I just used my normal Mac devel
configuration.  On the M4 laptop that's

$ ./configure CC="ccache gcc" CFLAGS="-O2 -fno-common" --enable-debug 
--enable-cassert --with-bonjour --without-icu --enable-tap-tests 
--with-system-tzdata=/usr/share/zoneinfo

but I see I haven't quite got the same thing on the M1 mini:

$ ./configure CC="ccache clang" --enable-debug --enable-cassert 
--enable-tap-tests --with-system-tzdata=/usr/share/zoneinfo

In both cases the compiler is really

$ gcc -v
Apple clang version 17.0.0 (clang-1700.0.13.3)
Target: arm64-apple-darwin24.5.0
Thread model: posix
InstalledDir: 
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

I did some experimentation with different -O levels to see if that
would make the failure any more or less common.  But I've yet to
reproduce it at anything except -O2.

regards, tom lane




Re: All supported PostgreSQL 17 extensions list

2025-05-27 Thread Florents Tselai


> On 27 May 2025, at 4:29 PM, Laurenz Albe  wrote:
> 
> On Tue, 2025-05-27 at 18:01 +0500, Zaid Shabbir wrote:
>> I’m looking for a complete list of PostgreSQL 17 extensions — both 
>> open-source
>> and proprietary. I found a link, but it doesn’t seem to include all 
>> available extensions.
>> 
>> Is there an official or community-maintained source where I can find a 
>> comprehensive
>> list of supported extensions?
> 
> There is no "supported".  Each extension has to support itself.
> An exception are the "contrib" extensions shipped with PostgreSQL:
> they are supported by the PGDG.
> 
> There is also no complete list of extensions that I am aware of.
> 
> In addition to the link you mention, you can search Github and
> pgxn.org; that should cover a lot of them.
> 
> Yours,
> Laurenz Albe


On top of what Laurenz said,
this list can be useful too:
https://yum.postgresql.org/extensions/



Re: Non-reproducible AIO failure

2025-05-27 Thread Tom Lane
Andres Freund  writes:
> This is on a m4 mac mini.  I'm wondering if there's some hardware specific
> memory ordering issue or disk speed based timing issue that I'm just not
> hitting.

I dunno, I've seen it on three different physical machines now
(one M1, two M4 Pros).  But it is darn hard to repro, for sure.

regards, tom lane




Re: Non-reproducible AIO failure

2025-05-27 Thread Alexander Lakhin

Hello Tomas,

27.05.2025 16:26, Tomas Vondra wrote:

> I'm interested in how you run these tests in parallel. Can you share the
> patch/script?


Yeah, sure. I'm running the test as follows:
rm -rf src/test/recovery_*; for i in `seq 40`; do cp -r src/test/recovery/ src/test/recovery_$i/; sed -i .bak "s|src/test/recovery|src/test/recovery_$i|" src/test/recovery_$i/Makefile; done

PROVE_TESTS="t/027*" make -s check -s -C src/test/recovery

for i in `seq 100`; do echo "iteration $i"; parallel -j40 --linebuffer --tag PROVE_TESTS="t/027*" NO_TEMP_INSTALL=1 make -s check -s -C src/test/recovery_{} ::: `seq 8` || break; done; echo -e "\007"


Alexander Lakhin
Neon (https://neon.tech)

Re: PG 18 release notes draft committed

2025-05-27 Thread Nathan Bossart
For the "Deprecate MD5 password authentication" item, IMHO we should
emphasize that support for MD5 passwords will be removed in a future major
release, as we did for the 18beta1 announcement.  Here's an attempt:

Deprecate MD5 password authentication (Nathan Bossart)

Support for MD5 passwords will be removed in a future major version
release.  CREATE ROLE and ALTER ROLE now emit deprecation warnings when
setting MD5 passwords.  These warnings can be disabled by setting the
md5_password_warnings parameter to "off".

-- 
nathan




Re: All supported PostgreSQL 17 extensions list

2025-05-27 Thread Laurenz Albe
On Tue, 2025-05-27 at 18:01 +0500, Zaid Shabbir wrote:
> I’m looking for a complete list of PostgreSQL 17 extensions — both open-source
> and proprietary. I found a link, but it doesn’t seem to include all available 
> extensions.
>
> Is there an official or community-maintained source where I can find a 
> comprehensive
> list of supported extensions?

There is no "supported".  Each extension has to support itself.
An exception are the "contrib" extensions shipped with PostgreSQL:
they are supported by the PGDG.

There is also no complete list of extensions that I am aware of.

In addition to the link you mention, you can search Github and
pgxn.org; that should cover a lot of them.

Yours,
Laurenz Albe




Re: Cygwin support

2025-05-27 Thread Andrew Dunstan



On 2025-04-28 Mo 4:53 PM, Mark Woodward wrote:
> What are the economics of this? I used PostgreSQL and Cygwin 25 years
> ago and am amazed it is still a thing.
>
> How much effort is it to support PostgreSQL on Cygwin?
> How many actual users are using PostgreSQL on cygwin in production? (I
> should hope none!)
>
> I would say it is something that should be announced as "deprecated"
> and see how many people complain, my bet no one will really care.
> Cygwin was a phenomenal hack in its day but I believe that those days
> have passed.




Please don't top-post on PostgreSQL lists.

I don't see it as our role to pass judgements like this on what people 
use. While Cygwin exists it's not our province to deprecate its use. If 
the maintenance effort were onerous I might be willing to relook at our 
support for it, but the simple answer to your first question is that the 
maintenance effort is close to zero. As I pointed out elsewhere in this 
thread, even if the server has limited use, the Cygwin psql client is 
nicer to use on Windows than the native build, reason enough to keep it 
going, at least until we improve the native build.



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Non-reproducible AIO failure

2025-05-27 Thread Andres Freund
Hi,

On 2025-05-27 14:43:14 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I just meant that it seems that I can't reproduce it for some as of yet
> > unknown reason. I've now been through 3k+ runs of 027_stream_regress, 
> > without
> > a single failure, so there has to be *something* different about my
> > environment than yours.
> 
> > Darwin m4-dev 24.1.0 Darwin Kernel Version 24.1.0: Thu Oct 10 21:06:23 PDT 
> > 2024; root:xnu-11215.41.3~3/RELEASE_ARM64_T8132 arm64
> 
> > cc -v
> > Apple clang version 16.0.0 (clang-1600.0.26.4)
> > Target: arm64-apple-darwin24.1.0
> > Thread model: posix
> 
> > I guess I'll try to update to a later version and see if it repros there?
> 
> Maybe.  All the machines I've seen it on are current-software:
> 
> $ uname -a
> Darwin minim4.sss.pgh.pa.us 24.5.0 Darwin Kernel Version 24.5.0: Tue Apr 22 
> 19:53:27 PDT 2025; root:xnu-11417.121.6~2/RELEASE_ARM64_T6041 arm64
> $ cc -v
> Apple clang version 17.0.0 (clang-1700.0.13.3)
> Target: arm64-apple-darwin24.5.0
> Thread model: posix
> InstalledDir: 
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
> 
> If it's OS-version-specific, that raises the odds IMO that this
> is Apple's fault more than ours.

Uh, huh.  After more than 3k successful runs, ~1 minute after I started to log
in graphically (to update the OS), I got my first reproduction.

2025-05-27 14:51:34.703 EDT [88755] pg_regress/sanity_check LOG:  statement: 
VACUUM;
TRAP: failed Assert("aio_ret->result.status != PGAIO_RS_UNKNOWN"), File: 
"../../src/postgres/src/backend/storage/buffer/bufmgr.c", Line: 1605, PID: 88755
0   postgres0x000102747514 ExceptionalCondition 
+ 108
1   postgres0x0001025cd618 WaitReadBuffers + 596
2   postgres0x0001025c9314 
read_stream_next_buffer + 428
3   postgres0x000102345a24 heap_vacuum_rel + 
1884
4   postgres0x000102452fec vacuum_rel + 724
5   postgres0x000102452b54 vacuum + 1656
6   postgres0x000102452400 ExecVacuum + 1504
7   postgres0x000102615990 
standard_ProcessUtility + 444
8   pg_stat_statements.dylib0x000102f2c39c pgss_ProcessUtility 
+ 668
9   postgres0x0001026153c4 PortalRunUtility + 
136
10  postgres0x000102614af4 PortalRunMulti + 232
11  postgres0x000102614530 PortalRun + 456
12  postgres0x0001026135ac exec_simple_query + 
1240
13  postgres0x00010261084c PostgresMain + 1400
14  postgres0x00010260c5d4 BackendInitialize + 0
15  postgres0x000102568f44 
postmaster_child_launch + 372
16  postgres0x00010256d218 ServerLoop + 4960
17  postgres0x00010256b55c InitProcessGlobals + 0
18  postgres0x0001024beabc help + 0
19  dyld0x000192b80274 start + 2840

I'll see whether being graphically logged in somehow indeed increases the repro
rate, or whether this was just an absurd coincidence; if the former, I'll
expand the debugging somewhat.

Greetings,

Andres Freund




Re: Non-reproducible AIO failure

2025-05-27 Thread Robert Haas
On Sun, May 25, 2025 at 8:25 PM Tom Lane  wrote:
> The fact that I can trace through this Assert failure but not the
> AIO one strongly suggests some system-level problem in the latter.
> There is something rotten in the state of Denmark.

I have been quite frustrated with lldb on macOS for a while now -- I
tend to find that when I get a can get stack trace from a
still-running process it works fine, but trying to get a stack trace
from a core dump often fails to produce anything useful (but sometimes
it does produce something useful). I haven't been able to find any
information on the Internet to explain why this sometimes happens and
sometimes doesn't, and various things I attempted as fixes didn't work
out. There could be something wrong specifically with this machine,
but I also wouldn't be too shocked if this is just randomly broken on
macOS.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Understanding when VM record needs snapshot conflict horizon

2025-05-27 Thread Melanie Plageman
On Sun, May 25, 2025 at 6:45 AM Dilip Kumar  wrote:
>
> IMHO, if we include snapshot conflict horizon in cases where it is not
> necessary, don't you think it will impact performance on standby?
> because now it has to loop through the procarray on standby to check
> whether there is any conflict before applying this WAL.

Yep, that's a good point. In my patch set to combine the prune/freeze
record and visible record, the only time we could omit the snapshot
conflict horizon in this combined record after phase I of vacuum is
when the heap page was unmodified by phase I, was already marked
all-visible in the VM, and is only being set all-frozen.
I will make sure that the snapshot conflict horizon is omitted in that
case to ensure we don't spend more time on the standby to check for
conflicts.
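If it helps to pin the condition down, the rule described above can be stated
as a tiny predicate (a hypothetical sketch, not the actual patch code):

```python
# When may the combined prune/freeze + set-visible WAL record omit the
# snapshot conflict horizon?  Only when the heap page was untouched by
# phase I, was already all-visible in the VM, and is merely being marked
# all-frozen; every other combination must carry a horizon so the standby
# can check for recovery conflicts.
def needs_conflict_horizon(page_modified: bool,
                           already_all_visible: bool,
                           only_setting_all_frozen: bool) -> bool:
    safe_to_omit = (not page_modified
                    and already_all_visible
                    and only_setting_all_frozen)
    return not safe_to_omit

# The single omission case:
assert needs_conflict_horizon(False, True, True) is False
# Any page modification or visibility change forces a horizon:
assert needs_conflict_horizon(True, True, True) is True
assert needs_conflict_horizon(False, False, True) is True
```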

- Melanie




Re: Tightening DecodeNumberField's parsing rules

2025-05-27 Thread Robert Haas
On Tue, May 27, 2025 at 2:38 PM Tom Lane  wrote:
> So what I propose we do about this is to apply the attached to HEAD
> and leave the back branches alone.

+1. In most cases, we pride ourselves on carefully validating the
input we receive and people on this list have been known to disparage
other products for failing to do the same. But our validation of
timestamps is notably less strict. I think that's somewhat unavoidable
given that there are multiple date-time formats that somebody might
use, but I'm in favor of not being more lax than we have to be. If
some input can't be interpreted as anything sensible, we should reject
it rather than making up a fake value. However, I agree that it's best
not to do such tightening in the back-branches.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Large expressions in indexes can't be stored (non-TOASTable)

2025-05-27 Thread Nathan Bossart
On Thu, May 08, 2025 at 08:55:08AM +0900, Michael Paquier wrote:
> On Wed, May 07, 2025 at 02:55:49PM -0500, Nathan Bossart wrote:
>> Committed with that change.  That takes care of a good chunk of these TOAST
>> snapshot problems.  I think there are about 4 others left, unless something
>> has changed since I last checked.  I hope to look into those again soon.
> 
> Thanks, for both things.

Here is what I have staged for commit for the others.  I'm hoping to push
these in the next couple of days.

-- 
nathan
>From ac77516715c7d05e94f585a08253e43c64a91382 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Tue, 27 May 2025 14:58:37 -0500
Subject: [PATCH v6 1/1] Ensure we have a snapshot when updating various system
 catalogs.

A few places that access system catalogs don't set up an active
snapshot before potentially accessing their TOAST tables.  To fix,
push an active snapshot just before each section of code that might
require accessing one of these TOAST tables, and pop it shortly
afterwards.  While at it, this commit adds some rather strict
assertions in an attempt to prevent such issues in the future.

Commit 16bf24e0e4 recently removed pg_replication_origin's TOAST
table in order to fix the same problem for that catalog.  On the
back-branches, those bugs are left in place.  We cannot easily
remove a catalog's TOAST table on released major versions, and only
replication origins with extremely long names are affected.  Given
the low severity of the issue, it didn't seem worth the trouble of
significantly modifying the patch on the back-branches.

Also, on v13 and v14, the aforementioned strict assertions have
been omitted because commit 2776922201, which added
HaveRegisteredOrActiveSnapshot(), was not backpatched.  While we
could probably back-patch it now, I've opted against it because it
seems unlikely that new TOAST snapshot issues will be introduced in
the oldest supported versions.

Reported-by: Alexander Lakhin 
Reviewed-by: Michael Paquier 
Discussion: https://postgr.es/m/18127-fe54b6a667f29658%40postgresql.org
Discussion: https://postgr.es/m/18309-c0bf914950c46692%40postgresql.org
Discussion: https://postgr.es/m/ZvMSUPOqUU-VNADN%40nathan
Backpatch-through: 13
---
 src/backend/access/heap/heapam.c | 29 
 src/backend/commands/indexcmds.c |  2 +-
 src/backend/commands/tablecmds.c |  8 +++
 src/backend/postmaster/autovacuum.c  |  7 ++
 src/backend/replication/logical/worker.c | 24 
 5 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 9ec8cda1c68..2be7f817c78 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -213,6 +213,27 @@ static const int MultiXactStatusLock[MaxMultiXactStatus + 
1] =
 #define TUPLOCK_from_mxstatus(status) \
(MultiXactStatusLock[(status)])
 
+/*
+ * Check that we have a valid snapshot if we might need TOAST access.
+ */
+static inline void
+AssertHasSnapshotForToast(Relation rel)
+{
+#ifdef USE_ASSERT_CHECKING
+
+   /* bootstrap mode in particular breaks this rule */
+   if (!IsNormalProcessingMode())
+   return;
+
+   /* if the relation doesn't have a TOAST table, we are good */
+   if (!OidIsValid(rel->rd_rel->reltoastrelid))
+   return;
+
+   Assert(HaveRegisteredOrActiveSnapshot());
+
+#endif							/* USE_ASSERT_CHECKING */
+}
+
 /* 
  *  heap support routines
  * 
@@ -2066,6 +2087,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId 
cid,
Assert(HeapTupleHeaderGetNatts(tup->t_data) <=
   RelationGetNumberOfAttributes(relation));
 
+   AssertHasSnapshotForToast(relation);
+
/*
 * Fill in tuple header fields and toast the tuple if necessary.
 *
@@ -2343,6 +2366,8 @@ heap_multi_insert(Relation relation, TupleTableSlot 
**slots, int ntuples,
/* currently not needed (thus unsupported) for heap_multi_insert() */
Assert(!(options & HEAP_INSERT_NO_LOGICAL));
 
+   AssertHasSnapshotForToast(relation);
+
needwal = RelationNeedsWAL(relation);
saveFreeSpace = RelationGetTargetPageFreeSpace(relation,

   HEAP_DEFAULT_FILLFACTOR);
@@ -2765,6 +2790,8 @@ heap_delete(Relation relation, ItemPointer tid,
 
Assert(ItemPointerIsValid(tid));
 
+   AssertHasSnapshotForToast(relation);
+
/*
 * Forbid this during a parallel operation, lest it allocate a combo 
CID.
 * Other workers might need that combo CID for visibility checks, and we
@@ -3260,6 +3287,8 @@ heap_update(Relation relation, ItemPo

Re: POC: make mxidoff 64 bits

2025-05-27 Thread Maxim Orlov
On Tue, 29 Apr 2025 at 15:01, Ashutosh Bapat 
wrote:

>
> I notice that transam/README does not mention multixact except one place
> in section "Transaction Emulation during Recovery". I expected it to
> document how pg_multixact/members and pg_multixact/offset are used and what
> is their layout. It's not the responsibility of this patchset to document
> it, but it will be good if we add a section about multixacts in
> transam/README. It will make reviews easier.
>

Yeah, I agree, this is a big oversight, I think. Anyone who tries to
understand how pg_multixact works has to deal with the code.
Certainly, we need to address this issue.

But for now, here is a new rebase @ 70a13c528b6e382a381f.
The only change is that following commits 15a79c7 and a0ed19e, we must also
switch to PRIu64 format.

-- 
Best regards,
Maxim Orlov.
From cd2af98bef93b18d3f64afed7239f2a18958a878 Mon Sep 17 00:00:00 2001
From: Maxim Orlov 
Date: Wed, 4 May 2022 15:53:36 +0300
Subject: [PATCH v15 5/7] TEST: initdb option to initialize cluster with
 non-standard xid/mxid/mxoff

To date, testing database cluster wraparound was not easy, as initdb has always
initialized it with the default xid/mxid/mxoff. The option to specify any valid
xid/mxid/mxoff at cluster startup will make these things easier.

Author: Maxim Orlov 
Author: Pavel Borisov 
Author: Svetlana Derevyanko 
Discussion: 
https://www.postgresql.org/message-id/flat/CACG%3Dezaa4vqYjJ16yoxgrpa-%3DgXnf0Vv3Ey9bjGrRRFN2YyWFQ%40mail.gmail.com
---
 src/backend/access/transam/clog.c  |  21 +
 src/backend/access/transam/multixact.c |  53 
 src/backend/access/transam/subtrans.c  |   8 +-
 src/backend/access/transam/xlog.c  |  15 ++--
 src/backend/bootstrap/bootstrap.c  |  50 +++-
 src/backend/main/main.c|   6 ++
 src/backend/postmaster/postmaster.c|  14 +++-
 src/backend/tcop/postgres.c|  53 +++-
 src/bin/initdb/initdb.c| 107 -
 src/bin/initdb/t/001_initdb.pl |  60 ++
 src/include/access/xlog.h  |   3 +
 src/include/c.h|   4 +
 src/include/catalog/pg_class.h |   2 +-
 13 files changed, 382 insertions(+), 14 deletions(-)

diff --git a/src/backend/access/transam/clog.c 
b/src/backend/access/transam/clog.c
index 48f10bec91..eb8a9791ab 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -834,6 +834,7 @@ BootStrapCLOG(void)
 {
int slotno;
LWLock *lock = SimpleLruGetBankLock(XactCtl, 0);
+   int64   pageno;
 
LWLockAcquire(lock, LW_EXCLUSIVE);
 
@@ -844,6 +845,26 @@ BootStrapCLOG(void)
SimpleLruWritePage(XactCtl, slotno);
Assert(!XactCtl->shared->page_dirty[slotno]);
 
+   pageno = 
TransactionIdToPage(XidFromFullTransactionId(TransamVariables->nextXid));
+   if (pageno != 0)
+   {
+   LWLock *nextlock = SimpleLruGetBankLock(XactCtl, pageno);
+
+   if (nextlock != lock)
+   {
+   LWLockRelease(lock);
+   LWLockAcquire(nextlock, LW_EXCLUSIVE);
+   lock = nextlock;
+   }
+
+   /* Create and zero the first page of the commit log */
+   slotno = ZeroCLOGPage(pageno, false);
+
+   /* Make sure it's written out */
+   SimpleLruWritePage(XactCtl, slotno);
+   Assert(!XactCtl->shared->page_dirty[slotno]);
+   }
+
LWLockRelease(lock);
 }
 
diff --git a/src/backend/access/transam/multixact.c 
b/src/backend/access/transam/multixact.c
index 059a72f106..9e24198a1e 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1955,6 +1955,7 @@ BootStrapMultiXact(void)
 {
int slotno;
LWLock *lock;
+   int64   pageno;
 
lock = SimpleLruGetBankLock(MultiXactOffsetCtl, 0);
LWLockAcquire(lock, LW_EXCLUSIVE);
@@ -1966,6 +1967,26 @@ BootStrapMultiXact(void)
SimpleLruWritePage(MultiXactOffsetCtl, slotno);
Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
+   pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact);
+   if (pageno != 0)
+   {
+   LWLock *nextlock = SimpleLruGetBankLock(MultiXactOffsetCtl, 
pageno);
+
+   if (nextlock != lock)
+   {
+   LWLockRelease(lock);
+   LWLockAcquire(nextlock, LW_EXCLUSIVE);
+   lock = nextlock;
+   }
+
+   /* Create and zero the first page of the offsets log */
+   slotno = ZeroMultiXactOffsetPage(pageno, false);
+
+   /* Make sure it's written out */
+   SimpleLruWritePage(MultiXactOffsetCtl, slotno);
+   Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
+   }
+
LWLockRelease(lock);

Re: Automatically sizing the IO worker pool

2025-05-27 Thread Dmitry Dolgov
On Mon, May 26, 2025, 8:01 AM Thomas Munro  wrote:

> But ... I'm not even sure if we can say that our
> I/O arrivals have a Poisson distribution, since they are not all
> independent.
>

Yeah, a good point, one has to be careful with assumptions about the
distribution -- from what I've read, many processes in computer systems are
better described by a Pareto. But the beauty of queuing theory is that many
results are independent of the distribution (not sure about dependencies
though).
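
As a small illustration of a distribution-free result: Little's law (L = λW)
holds on any sample path, bursty Pareto-like arrivals included. A minimal
sketch (Python used here purely for illustration; the workload constants are
made up):

```python
import math

# Bursty, heavy-tailed inter-arrival gaps -- deliberately not Poisson.
gaps    = [0.1, 0.1, 0.1, 5.0, 0.2, 0.1, 8.0, 0.3, 0.1, 0.1]
service = [1.0, 0.5, 2.0, 0.3, 1.5, 0.7, 0.4, 2.5, 0.6, 0.9]

arrivals, t = [], 0.0
for g in gaps:
    t += g
    arrivals.append(t)

# Single FIFO server: each job starts at max(arrival, previous finish).
finish, sojourns = 0.0, []
for a, s in zip(arrivals, service):
    start = max(a, finish)
    finish = start + s
    sojourns.append(finish - a)

T = finish                          # system is empty at time 0 and at T
lam = len(arrivals) / T             # observed arrival rate
W = sum(sojourns) / len(sojourns)   # mean time in system

# Time-average number of jobs in system, via an event sweep.
events = sorted([(a, +1) for a in arrivals] +
                [(a + w, -1) for a, w in zip(arrivals, sojourns)])
area, n, prev = 0.0, 0, 0.0
for when, delta in events:
    area += n * (when - prev)
    n, prev = n + delta, when
L = area / T

assert math.isclose(L, lam * W, rel_tol=1e-9)  # Little's law, sample-path exact
```

The assertion holds no matter what gaps/service values are plugged in, which
is the sense in which the result is independent of the distribution.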

In this version I went back to basics and built something that looks
> more like the controls of a classic process/thread pool (think Apache)
> or connection pool (think JDBC), with a couple of additions based on
> intuition: (1) a launch interval, which acts as a bit of damping
> against overshooting on brief bursts that are too far apart, and (2)
> the queue length > workers * k as a simple way to determine that
> latency is being introduced by not having enough workers.  Perhaps
> there is a good way to compute an adaptive value for k with some fancy
> theories, but k=1 seems to have *some* basis: that's the lowest number
> which the pool is too small and *certainly* introducing latency, but
> any lower constant is harder to defend because we don't know how many
> workers are already awake and about to consume tasks.  Something from
> queuing theory might provide an adaptive value, but in the end, I
> figured we really just want to know if the queue is growing ie in
> danger of overflowing (note: the queue is small!  64, and not
> currently changeable, more on that later, and the overflow behaviour
> is synchronous I/O as back-pressure).  You seem to be suggesting that
> k=1 sounds too low, not too high, but there is that separate
> time-based defence against overshoot in response to rare bursts.
>

I probably should have started by stating that I find the current approach
reasonable, and I'm only curious whether there is more to get out of it. I
haven't benchmarked the patch yet (I plan to get to it when I'm back), and I
can imagine practical considerations significantly impacting any potential
solution.
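
As a toy rendering of the two controls quoted above (the queue-depth trigger
with k, plus the launch-interval damping), something like this captures the
decision rule; the names and constants here are mine, not the patch's:

```python
from dataclasses import dataclass

@dataclass
class PoolController:
    """Toy model of the launch policy described above: launch a worker
    only if (1) the queue is deeper than workers * k, i.e. latency is
    certainly being introduced, and (2) enough time has passed since the
    last launch, damping reactions to brief bursts."""
    workers: int = 1
    max_workers: int = 8
    k: float = 1.0
    launch_interval: float = 0.5        # seconds; illustrative constant
    last_launch: float = float("-inf")

    def maybe_launch(self, queue_depth: int, now: float) -> bool:
        if (self.workers < self.max_workers
                and queue_depth > self.workers * self.k
                and now - self.last_launch >= self.launch_interval):
            self.workers += 1
            self.last_launch = now
            return True
        return False

pool = PoolController()
assert pool.maybe_launch(queue_depth=5, now=0.0)      # deep queue: grow
assert not pool.maybe_launch(queue_depth=5, now=0.1)  # damped: too soon
assert pool.maybe_launch(queue_depth=5, now=0.7)      # interval elapsed
assert not pool.maybe_launch(queue_depth=2, now=2.0)  # queue <= workers * k
```

With k=1 the trigger fires only when waiting is certain, and the interval is
the separate time-based defence against overshooting on rare bursts.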

About control theory... yeah.  That's an interesting bag of tricks.
> FWIW Melanie and (more recently) I have looked into textbook control
> algorithms at a higher level of the I/O stack (and Melanie gave a talk
> about other applications in eg VACUUM at pgconf.dev).  In
> read_stream.c, where I/O demand is created, we've been trying to set
> the desired I/O concurrency level and thus lookahead distance with
> adaptive feedback.  We've tried a lot of stuff.  I hope we can share
> some concept patches some time soon, well, maybe in this cycle.  Some
> interesting recent experiments produced graphs that look a lot like
> the ones in the book "Feedback Control for Computer Systems" (an easy
> software-person book I found for people without an engineering/control
> theory background where the problems match our world more closely, cf
> typical texts that are about controlling motors and other mechanical
> stuff...).  Experimental goals are: find the smallest concurrent
> I/O request level (and thus lookahead distance and thus speculative
> work done and buffers pinned) that keeps the I/O stall probability
> near zero (and keep adapting, since other queries and applications are
> sharing system I/O queues), and if that's not even possible, find the
> highest concurrent I/O request level that doesn't cause extra latency
> due to queuing in lower levels (I/O workers, kernel, ...,  disks).
> That second part is quite hard.  In other words, if higher levels own
> that problem and bring the adaptivity, then perhaps io_method=worker
> can get away with being quite stupid.  Just a thought...
>

Looking forward to it. And thanks for the reminder about the talk -- I've
wanted to watch it for a long time but somehow haven't managed to yet.


RE: Replication slot is not able to sync up

2025-05-27 Thread Zhijie Hou (Fujitsu)
On Wed, May 28, 2025 at 2:09 AM Masahiko Sawada wrote:
> 
> On Fri, May 23, 2025 at 10:07 PM Amit Kapila 
> wrote:
> >
> > In the case presented here, the logical slot is expected to keep
> > forwarding, and in the consecutive sync cycle, the sync should be
> > successful. Users using logical decoding APIs should also be aware
> > that if, for some reason, the logical slot is not moving forward,
> > the master/publisher node will start accumulating dead rows and WAL,
> > which can create bigger problems.
> 
> I've tried this case and am concerned that the slot synchronization using
> pg_sync_replication_slots() would never succeed while the primary keeps
> getting write transactions. Even if the user manually consumes changes on the
> primary, the primary server keeps advancing its XID in the meanwhile. On the
> standby, we ensure that the
> TransamVariables->nextXid is beyond the XID of WAL record that it's
> going to apply so the xmin horizon calculated by
> GetOldestSafeDecodingTransactionId() ends up always being higher than the
> slot's catalog_xmin on the primary. We get the log message "could not
> synchronize replication slot "s" because remote slot precedes local slot" and
> cleanup the slot on the standby at the end of pg_sync_replication_slots().

I think the issue occurs because unlike the slotsync worker, the SQL API
removes temporary slots when the function ends, so it cannot hold back the
standby's catalog_xmin. If transactions on the primary keep advancing xids, the
source slot's catalog_xmin on the primary fails to catch up with the standby's
nextXid, causing sync failure.
 
We chose this behavior because we could not predict when (or if) the SQL
function might be executed again, and the creating session might persist after
promotion. Without automatic cleanup, this could lead to temporary slots being
retained for a longer time.
 
This only affects the initial sync when creating a new slot on the standby.
Once the slot exists, the standby's catalog_xmin stabilizes, preventing the
issue in subsequent syncs.
 
I think the SQL API was mainly intended for testing and debugging purposes
where controlled sync operations are useful. For production use, the slotsync
worker (with sync_replication_slots=on) is recommended because it automatically
handles this problem and requires minimal manual intervention. But to avoid
confusion, I think we should clearly document this distinction.
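
For reference, the worker-based setup recommended here boils down to
something like the following on the standby (a sketch; host/user values are
illustrative, a physical slot must already exist on the primary, and
primary_conninfo must specify a dbname for slot synchronization to work):

```
# postgresql.conf on the standby (sketch)
sync_replication_slots = on
hot_standby_feedback = on
primary_slot_name = 'sb_physical_slot'
primary_conninfo = 'host=primary user=repluser dbname=postgres'
```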

Best Regards,
Hou zj


foreign key on virtual generated column

2025-05-27 Thread jian he
hi.

The attached patch implements a TODO (foreign key on a virtual generated
column) left in [1].
For foreign keys on virtual generated columns, we only support:
ON UPDATE NO ACTION
ON UPDATE RESTRICT
ON DELETE CASCADE
ON DELETE NO ACTION
ON DELETE RESTRICT
demo:
CREATE TABLE gtest23a (x int PRIMARY KEY, y int);
INSERT INTO gtest23a VALUES (1, 11), (2, 22), (3, 33), (131072, 44);
CREATE TABLE gtest23b (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
* 1) VIRTUAL REFERENCES gtest23a (x) ON DELETE CASCADE); --ok
INSERT INTO gtest23b VALUES (1);  -- ok
INSERT INTO gtest23b VALUES (5);  -- error
UPDATE gtest23b SET a = 5 WHERE a = 1; --error
DELETE FROM gtest23a WHERE x = 1; --ok


ALTER TABLE ALTER COLUMN SET EXPRESSION
ALTER TABLE ALTER COLUMN SET DATA TYPE

If a foreign key is on a virtual generated column, the above two will not
cause a table rewrite, but will perform foreign key constraint validation.

[1] 
https://git.postgresql.org/cgit/postgresql.git/commit/?id=83ea6c54025bea67bcd4949a6d58d3fc11c3e21b
From 6acff5606b6181442eac7a1128c879c378adcb05 Mon Sep 17 00:00:00 2001
From: jian he 
Date: Tue, 27 May 2025 21:24:18 +0800
Subject: [PATCH v1 2/2] foreign key on virtual generated column

For virtual generated columns, we currently support:
ON UPDATE NO ACTION
ON UPDATE RESTRICT
ON DELETE CASCADE
ON DELETE NO ACTION
ON DELETE RESTRICT

discussion: https://postgr.es/m/
---
 src/backend/commands/copyfrom.c   |   3 +-
 src/backend/commands/tablecmds.c  |  97 ---
 src/backend/executor/execReplication.c|   6 +-
 src/backend/executor/execUtils.c  |   2 +-
 src/backend/executor/nodeModifyTable.c|  49 +---
 src/backend/utils/adt/ri_triggers.c   | 114 ++
 src/include/executor/nodeModifyTable.h|   7 +-
 .../regress/expected/generated_virtual.out|  67 --
 src/test/regress/sql/generated_virtual.sql|  40 --
 9 files changed, 304 insertions(+), 81 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 906b6581e11..1a545d78593 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1347,7 +1347,8 @@ CopyFrom(CopyFromState cstate)
 if (resultRelInfo->ri_RelationDesc->rd_att->constr &&
 	resultRelInfo->ri_RelationDesc->rd_att->constr->has_generated_stored)
 	ExecComputeGenerated(resultRelInfo, estate, myslot,
-		 CMD_INSERT);
+		 CMD_INSERT,
+		 false);
 
 /*
  * If the target is a plain table, check the constraints of
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 54ad38247aa..252776c0005 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -659,7 +659,7 @@ static void RebuildConstraintComment(AlteredTableInfo *tab, AlterTablePass pass,
 	 Oid objid, Relation rel, List *domname,
 	 const char *conname);
 static void TryReuseIndex(Oid oldId, IndexStmt *stmt);
-static void TryReuseForeignKey(Oid oldId, Constraint *con);
+static void TryReuseForeignKey(Oid oldId, Constraint *con, Oid oldRelId);
 static ObjectAddress ATExecAlterColumnGenericOptions(Relation rel, const char *colName,
 	 List *options, LOCKMODE lockmode);
 static void change_owner_fix_column_acls(Oid relationOid,
@@ -8642,18 +8642,18 @@ ATExecSetExpression(AlteredTableInfo *tab, Relation rel, const char *colName,
 		 * this renders them pointless.
 		 */
 		RelationClearMissing(rel);
-
-		/* make sure we don't conflict with later attribute modifications */
-		CommandCounterIncrement();
-
-		/*
-		 * Find everything that depends on the column (constraints, indexes,
-		 * etc), and record enough information to let us recreate the objects
-		 * after rewrite.
-		 */
-		RememberAllDependentForRebuilding(tab, AT_SetExpression, rel, attnum, colName);
 	}
 
+	/* make sure we don't conflict with later attribute modifications */
+	CommandCounterIncrement();
+
+	/*
+	 * Find everything that depends on the column (constraints, indexes,
+	 * etc), and record enough information to let us recreate the objects
+	 * after rewrite.
+	*/
+	RememberAllDependentForRebuilding(tab, AT_SetExpression, rel, attnum, colName);
+
 	/*
 	 * Drop the dependency records of the GENERATED expression, in particular
 	 * its INTERNAL dependency on the column, which would otherwise cause
@@ -10222,19 +10222,6 @@ ATAddForeignKeyConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel,
 		 errmsg("invalid %s action for foreign key constraint containing generated column",
 "ON DELETE")));
 		}
-
-		/*
-		 * FKs on virtual columns are not supported.  This would require
-		 * various additional support in ri_triggers.c, including special
-		 * handling in ri_NullCheck(), ri_KeysEqual(),
-		 * RI_FKey_fk_upd_check_required() (since all virtual columns appear
-		 * as NULL there).  Also not really practical as long as you can't
-		 * index virtual column

Re: [PATCH] PGSERVICEFILE as part of a normal connection string

2025-05-27 Thread Michael Paquier
On Sun, Apr 13, 2025 at 07:06:06PM +0900, Ryo Kanbayashi wrote:
> I rebased our patch according to  2c7bd2ba507e.
> https://commitfest.postgresql.org/patch/5387/

Thanks for the new version.

-# for the connection options and their environment variables.
+# for the connection options, servicefile options and their environment 
variables.

It seems to me that this comment does not need to be changed.

+   {"servicefile", "PGSERVICEFILE", NULL, NULL,
+   "Database-Service-File", "", 64, -1},

Could it be better to have a new field in pg_conn?  This would also
require a free() in freePGconn() and new PQserviceFile() routine.

+if (strcmp(key, "servicefile") == 0)
+{
+libpq_append_error(errorMessage,
+   "nested servicefile specifications not 
supported in service file \"%s\", line %d",
+   serviceFile,
+   linenr);
+result = 3;
+goto exit;
+}

Perhaps we should add a test for that?  The same is true with
"service", as I am looking at these code paths now.  I'd suggest to
apply double quotes to the parameter name "servicefile" in this error
message, to make clear what this is.

+   # Additionaly encode a colon in servicefile path of Windows 

Typo: Additionally.

+# Backslashes escaped path string for getting collect result at concatenation
+# for Windows environment

Comment is unclear.  But what you mean here is that the conversion is
required to allow the test to work when giving the path to the
connection option, right?
--
Michael


signature.asc
Description: PGP signature


Re: Replication slot is not able to sync up

2025-05-27 Thread Masahiko Sawada
On Tue, May 27, 2025 at 9:15 PM Zhijie Hou (Fujitsu)
 wrote:
>
> On Wed, May 28, 2025 at 2:09 AM Masahiko Sawada wrote:
> >
> > On Fri, May 23, 2025 at 10:07 PM Amit Kapila 
> > wrote:
> > >
> > > In the case presented here, the logical slot is expected to keep
> > > forwarding, and in the consecutive sync cycle, the sync should be
> > > successful. Users using logical decoding APIs should also be aware
> > > that if, for some reason, the logical slot is not moving forward,
> > > the master/publisher node will start accumulating dead rows and WAL,
> > > which can create bigger problems.
> >
> > I've tried this case and am concerned that the slot synchronization using
> > pg_sync_replication_slots() would never succeed while the primary keeps
> > getting write transactions. Even if the user manually consumes changes on 
> > the
> > primary, the primary server keeps advancing its XID in the meanwhile. On the
> > standby, we ensure that the
> > TransamVariables->nextXid is beyond the XID of WAL record that it's
> > going to apply so the xmin horizon calculated by
> > GetOldestSafeDecodingTransactionId() ends up always being higher than the
> > slot's catalog_xmin on the primary. We get the log message "could not
> > synchronize replication slot "s" because remote slot precedes local slot" 
> > and
> > cleanup the slot on the standby at the end of pg_sync_replication_slots().
>
> I think the issue occurs because unlike the slotsync worker, the SQL API
> removes temporary slots when the function ends, so it cannot hold back the
> standby's catalog_xmin. If transactions on the primary keep advancing xids, 
> the
> source slot's catalog_xmin on the primary fails to catch up with the standby's
> nextXid, causing sync failure.

Agreed with this analysis.

> This only affects the initial sync when creating a new slot on the standby.
> Once the slot exists, the standby's catalog_xmin stabilizes, preventing the
> issue in subsequent syncs.

Right. I think this is an area where we can improve, if there is a
real use case.

> I think the SQL API was mainly intended for testing and debugging purposes
> where controlled sync operations are useful. For production use, the slotsync
> worker (with sync_replication_slots=on) is recommended because it 
> automatically
> handles this problem and requires minimal manual intervention. But to avoid
> confusion, I think we should clearly document this distinction.

I didn't know it was intended for testing and debugging purposes, so
clarifying that in the documentation would be a good idea. Also, I
agree that using the slotsync worker is the primary usage of this
feature. I'm interested in whether there is a use case where the SQL
API is preferable. If there is, we can improve the SQL API part,
especially the first synchronization part, for v19 or later.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Fixing memory leaks in postgres_fdw

2025-05-27 Thread Matheus Alcantara
Hi,

On 26/05/25 16:36, Tom Lane wrote:
> Here's a v4 that is actually more or less feature-complete:
> it removes no-longer-needed complexity such as PG_TRY blocks.
> I've checked that Valgrind shows no leaks in the postgres_fdw
> and dblink tests after applying this on top of my other
> patch series.
>
> 0001 is like the previous version except that I took out some
> inessential simplifications to get to the minimum possible
> patch.  Then 0002 does all the simplifications.  Removal of
> PG_TRY blocks implies reindenting a lot of code, but I made
> that a separate patch 0003 for ease of review.  (0003 would
> be a candidate for adding to .git-blame-ignore-revs, perhaps.)
> 0004 is the old 0002 (still unmodified) and then 0005 cleans
> up one remaining leakage observed by Valgrind.
>
> regards, tom lane
>

The v4-0001-Fix-memory-leakage-in-postgres_fdw-s-DirectModify.patch
looks good to me.

Just some thoughts on v4-0005-Avoid-leak-when-dblink_connstr_check-fails.patch:

I think that we can delay the allocation a bit more. The
dblink_security_check just uses the rconn to pfree it in case of a failure,
so I think that we can remove this parameter and move the rconn
allocation into the next if (connname) block. See attached as an example.

-- 
Matheus Alcantara


delay-rconn-allocation.diff
Description: Binary data


Re: Fix slot synchronization with two_phase decoding enabled

2025-05-27 Thread Masahiko Sawada
On Sun, May 25, 2025 at 11:34 PM Nisha Moond  wrote:
>
> On Fri, May 23, 2025 at 10:06 PM Masahiko Sawada  
> wrote:
> >
> >
> > Here are review comments for v14 patch:
> >
>
> Thank you for the review.
>
> > I think we need to include a basic test case where we simply create a
> > subscription with two_phase=true and then enable the failover via
> > ALTER SUBSCRIPTION after making sure that slot's restart_lsn passed
> > two_phase_at. Probably if we use this case in 003_logical_slots.pl, we
> > can avoid creating regress_sub2?
> >
>
> A test has been added in 040_standby_failover_slots_sync.pl to enable
> failover via ALTER SUBSCRIPTION.

Yes, but the slot is originally created via the SQL API, which seems
an uncommon usage in practice. I thought it would be good to have the
basic steps to enable both fields in the tests.

> Regarding regress_sub2 in 003_logical_slots.pl :
> If we use ALTER SUBSCRIPTION to set failover=true on regress_sub, it
> leads to pg_upgrade failure, as it attempts to create a slot on the
> new_node(upgraded version) with both two_phase and failover enabled,
> which is an unsupported combination.

I think that the pg_upgrade test should cover the case where it
restores logical slots with both fields enabled in the new cluster.
When I actually tried this case, I realized that pg_upgrade doesn't
handle this case; it tried to create the logical slot via SQL API but
it failed as we don't allow it to create a logical slot with enabling
both two_phase and failover. How can we handle this case?
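
For context, the rejected combination can be reproduced directly with the SQL
API; a sketch using the PG17-era signature
pg_create_logical_replication_slot(name, plugin, temporary, twophase,
failover), with an illustrative slot name:

```sql
-- Attempt to create a logical slot with both two_phase and failover
-- enabled at creation time; this is the unsupported combination that
-- pg_upgrade would effectively attempt when restoring such a slot.
SELECT pg_create_logical_replication_slot(
    'regress_slot',   -- slot name (illustrative)
    'pgoutput',       -- plugin
    false,            -- temporary
    true,             -- two_phase
    true);            -- failover
-- fails: two_phase and failover cannot both be enabled at slot creation
```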

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Clarification on warning when connecting to 'pgbouncer' database via Pgbouncer

2025-05-27 Thread Julien Rouhaud
On Wed, 28 May 2025, 07:21 Tom Lane,  wrote:

> Shaik Mohammad Mujeeb  writes:
> > So, I feel that if we can reliably detect when the backend is a
> non-PostgreSQL server, it might be better to adjust the warning
> accordingly, rather than relying on a client-server version comparison in
> such cases.
>
> The entire point of having a wire protocol is that you can't/don't
> need to detect that.  The short answer here is that this is
> pgbouncer's bug and you should be nagging them to change, not us.


Maybe I'm missing something, but it seems like it's behaving as expected.
pgbouncer advertises a version that is both understood by the protocol and
incompatible with any real Postgres version.  Since such a connection only
allows plain SHOW statements, the warning about some psql features seems
perfectly accurate, if anything a wild understatement, as literally no psql
feature will work.



Re: Clarification on warning when connecting to 'pgbouncer' database via Pgbouncer

2025-05-27 Thread David G. Johnston
On Tue, May 27, 2025 at 3:08 PM Tom Lane  wrote:

> "David G. Johnston"  writes:
> > psql has zero awareness of pgbouncer or any other non-PostgreSQL server
> you
> > might be able to point it to and establish a successful connection.  The
> > lack of a warning previously is probably more incorrect than its presence
> > now.
>
> Yeah.  I'd say the fundamental problem is that pgbouncer is abusing
> the server_version parameter, which will break plenty of things
> besides psql --- pg_dump for instance


They proxy over the real server information during actual bouncer work -
it's just their local administrative database connection - one that isn't
proxied - that is at issue here.


> Anyway, I'm minus quite a lot on silencing this warning, because
> it is telling you about real problems you are likely to hit.
How about the false-negative that was happening for years when this warning
didn't appear even though you were connected to pgbouncer and not
PostgreSQL?

David J.


Re: Clarification on warning when connecting to 'pgbouncer' database via Pgbouncer

2025-05-27 Thread Shaik Mohammad Mujeeb
> Anyway, I'm minus quite a lot on silencing this warning, because
> it is telling you about real problems you are likely to hit.
+1

I just wanted to clarify that the concern here isn’t about the warning itself 
being shown, but rather about the reason it conveys, which can be a bit 
misleading.

The current warning suggests that some psql features might not work due to the 
server version being 1.20. This can lead to confusion - as it did in my case - 
making it seem like there's an issue with PgBouncer v1.20.1. However, this same 
warning appears even when we use the latest version of PgBouncer, so the 
message doesn't accurately reflect the root cause.

Instead, a clearer message like:


"WARNING: non-PostgreSQL server. Some psql features might not work."
would be more appropriate. It immediately communicates that the server is not 
PostgreSQL (as in the case of PgBouncer), which would naturally explain why 
some psql features (like \d commands, as Euler mentioned) may not function as 
expected. This avoids unnecessary suspicion about the PgBouncer version being 
used.

The main goal here is not to suppress the warning altogether, but to make the 
message more meaningful and reflective of the actual scenario.

If we can reliably detect that it’s a non-PostgreSQL server, we could adjust
the warning accordingly and avoid doing a client-server version comparison
that doesn’t apply in this context.



Thanks & Regards,
Shaik Mohammad Mujeeb
Member Technical Staff
Zoho Corp

 On Wed, 28 May 2025 03:38:12 +0530 Tom Lane  wrote ---



"David G. Johnston" < mailto:david.g.johns...@gmail.com > writes: 
> psql has zero awareness of pgbouncer or any other non-PostgreSQL server you 
> might be able to point it to and establish a successful connection.  The 
> lack of a warning previously is probably more incorrect than its presence 
> now. 
 
Yeah.  I'd say the fundamental problem is that pgbouncer is abusing 
the server_version parameter, which will break plenty of things 
besides psql --- pg_dump for instance, and just about any other 
application that needs to adjust its queries to suit different 
Postgres server versions.  If they feel a need to advertise 
pgbouncer's version there, it wouldn't be that hard to make 
server_version read out as something like "17.5 (pgbouncer XX.YY)", 
which would not confuse libpq/psql (cf. parsing code in 
pqSaveParameterStatus).  But apparently they didn't do that. 
 
Anyway, I'm minus quite a lot on silencing this warning, because 
it is telling you about real problems you are likely to hit. 
 
regards, tom lane

Re: Clarification on warning when connecting to 'pgbouncer' database via Pgbouncer

2025-05-27 Thread Tom Lane
"David G. Johnston"  writes:
> On Tue, May 27, 2025 at 3:08 PM Tom Lane  wrote:
>> Yeah.  I'd say the fundamental problem is that pgbouncer is abusing
>> the server_version parameter, which will break plenty of things
>> besides psql --- pg_dump for instance

> They proxy over the real server information during actual bouncer work -
> it's just their local administrative database connection - one that isn't
> proxied - that is at issue here.

Okay, but they're still abusing the wire protocol: server_version is
supposed to be a Postgres version, not some random number.  My advice
to them would be to make up their mind which PG version they intend
their "local administrative database" to act like, and put that in
server_version.

regards, tom lane




Re: Clarification on warning when connecting to 'pgbouncer' database via Pgbouncer

2025-05-27 Thread Shaik Mohammad Mujeeb
Also, let’s say hypothetically that PgBouncer had a version like 17.2 - 
matching the psql (client) major version. In that case, the warning wouldn’t be 
shown at all, which might not be accurate either.



So, I feel that if we can reliably detect when the backend is a non-PostgreSQL 
server, it might be better to adjust the warning accordingly, rather than 
relying on a client-server version comparison in such cases.




Thanks & Regards,

Shaik Mohammad Mujeeb

Member Technical Staff

Zoho Corp

 On Wed, 28 May 2025 04:07:40 +0530 Shaik Mohammad Mujeeb 
 wrote ---

Re: Speed up JSON escape processing with SIMD plus other optimisations

2025-05-27 Thread John Naylor
On Thu, May 23, 2024 at 8:24 AM David Rowley  wrote:
> Other things I considered were if doing 16 bytes at a time is too much
> as it puts quite a bit of work into byte-at-a-time processing if just
> 1 special char exists in a 16-byte chunk. I considered doing SWAR [1]
> processing to do the job of vector8_has_le() and vector8_has(), but
> maybe with just uint32s.  It might be worth doing that. However, I've
> not done it yet as it raises the bar for this patch quite a bit.  SWAR
> vector processing is pretty much write-only code. Imagine trying to
> write comments for the code in [2] so that the average person could
> understand what's going on!?

Sorry to resurrect this thread, but I recently saw something that made
me think of this commit (as well as the similar one 0a8de93a48c):

https://lemire.me/blog/2025/04/13/detect-control-characters-quotes-and-backslashes-efficiently-using-swar/

I don't find this use of SWAR that bad for readability, and there's
only one obtuse clever part that merits a comment. Plus, it seems json
escapes are pretty much set in stone? I gave this a spin with

https://www.postgresql.org/message-id/attachment/163406/json_bench.sh.txt

master:

Test 1
tps = 321.522667 (without initial connection time)
tps = 315.070985 (without initial connection time)
tps = 331.070054 (without initial connection time)
Test 2
tps = 35.107257 (without initial connection time)
tps = 34.977670 (without initial connection time)
tps = 35.898471 (without initial connection time)
Test 3
tps = 33.575570 (without initial connection time)
tps = 32.383352 (without initial connection time)
tps = 31.876192 (without initial connection time)
Test 4
tps = 810.676116 (without initial connection time)
tps = 745.948518 (without initial connection time)
tps = 747.651923 (without initial connection time)

swar patch:

Test 1
tps = 291.919004 (without initial connection time)
tps = 294.446640 (without initial connection time)
tps = 307.670464 (without initial connection time)
Test 2
tps = 30.984440 (without initial connection time)
tps = 31.660630 (without initial connection time)
tps = 32.538174 (without initial connection time)
Test 3
tps = 29.828546 (without initial connection time)
tps = 30.332913 (without initial connection time)
tps = 28.873059 (without initial connection time)
Test 4
tps = 748.676688 (without initial connection time)
tps = 768.798734 (without initial connection time)
tps = 766.924632 (without initial connection time)

While noisy, this test seems a bit faster with SWAR, and it's more
portable to boot. I'm not sure where I'd put the new function so both
call sites can see it, but that's a small detail...


--
John Naylor
Amazon Web Services
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 51452755f58..8f832dacd40 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -19,7 +19,6 @@
 #include "funcapi.h"
 #include "libpq/pqformat.h"
 #include "miscadmin.h"
-#include "port/simd.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/date.h"
@@ -1621,6 +1620,22 @@ escape_json(StringInfo buf, const char *str)
  */
 #define ESCAPE_JSON_FLUSH_AFTER 512
 
+static inline bool
+has_json_escapable_byte(const char *str)
+{
+	uint64		x;
+
+	memcpy(&x, str, sizeof(uint64));
+
+	uint64		is_ascii = 0x8080808080808080ULL & ~x;
+	uint64		xor2 = x ^ 0x0202020202020202ULL;
+	uint64		lt32_or_eq34 = xor2 - 0x2121212121212121ULL;
+	uint64		sub92 = x ^ 0x5C5C5C5C5C5C5C5CULL;
+	uint64		eq92 = (sub92 - 0x0101010101010101ULL);
+
+	return ((lt32_or_eq34 | eq92) & is_ascii) != 0;
+}
+
 /*
  * escape_json_with_len
  *		Produce a JSON string literal, properly escaping the possibly not
@@ -1645,7 +1660,7 @@ escape_json_with_len(StringInfo buf, const char *str, int len)
 	 * Figure out how many bytes to process using SIMD.  Round 'len' down to
 	 * the previous multiple of sizeof(Vector8), assuming that's a power-of-2.
 	 */
-	vlen = len & (int) (~(sizeof(Vector8) - 1));
+	vlen = len & (int) (~(sizeof(uint64) - 1));
 
 	appendStringInfoCharMacro(buf, '"');
 
@@ -1661,19 +1676,13 @@ escape_json_with_len(StringInfo buf, const char *str, int len)
 		 * string byte-by-byte.  This optimization assumes that most chunks of
 		 * sizeof(Vector8) bytes won't contain any special characters.
 		 */
-		for (; i < vlen; i += sizeof(Vector8))
+		for (; i < vlen; i += sizeof(uint64))
 		{
-			Vector8		chunk;
-
-			vector8_load(&chunk, (const uint8 *) &str[i]);
-
 			/*
 			 * Break on anything less than ' ' or if we find a '"' or '\\'.
 			 * Those need special handling.  That's done in the per-byte loop.
 			 */
-			if (vector8_has_le(chunk, (unsigned char) 0x1F) ||
-vector8_has(chunk, (unsigned char) '"') ||
-vector8_has(chunk, (unsigned char) '\\'))
+			if (has_json_escapable_byte(&str[i]))
 break;
 
 #ifdef ESCAPE_JSON_FLUSH_AFTER
@@ -1706,7 +1715,7 @@ escape_json_with_len(StringInfo buf, const char *str, int len)
 		 * Per-byte loop for Vector8s containi

Re: PG 18 release notes draft committed

2025-05-27 Thread Bruce Momjian
On Mon, May 26, 2025 at 10:20:08AM -0400, Joe Conway wrote:
> On 5/23/25 09:47, Bruce Momjian wrote:
> > On Fri, May 23, 2025 at 09:54:54AM +0200, Álvaro Herrera wrote:
> > > I also think that showing an XML-ish format of a commit message is
> > > unhelpful, not to mention hard to read.  Showing a few actual examples
> > > would be better.
> > 
> > Agreed, but the XML came from Joe Conway so I am hesitant to remove it
> > myself without feedback from him.
> 
> 
> I am not married to it, but I would say that I find pure examples
> confusing/ambiguous. To me at least the XML-ish specification is easier to
> understand.
> 
> Perhaps both the specification as-is and one or two examples added?

Yeah, I think some people like syntax, others like examples.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Do not let urgent matters crowd out time for investment in the future.




Re: Non-reproducible AIO failure

2025-05-27 Thread Tom Lane
Andres Freund  writes:
> I'll see if being graphically logged in somehow indeed increased the repro
> rate, and if so I'll expand the debugging somewhat, or if this was just an
> absurd coincidence.

Hmm.  Now that you mention it, the one repro on the M1 came just as
I was about to give up and manually cancel the script that was running
the 027 test in a loop.  I wondered for a moment if I'd somehow
affected the state of the machine, but since I was logged in over ssh
I didn't quite see how that would be possible.  But your tale makes me
suspect that outside activity helps --- which would square with
Alexander's results that suggest concurrent runs help.

regards, tom lane




Re: PG 18 release notes draft committed

2025-05-27 Thread Bruce Momjian
On Tue, May 27, 2025 at 09:26:41AM -0500, Nathan Bossart wrote:
> For the "Deprecate MD5 password authentication" item, IMHO we should
> emphasize that support for MD5 passwords will be removed in a future major
> release, as we did for the 18beta1 announcement.  Here's an attempt:
> 
>   Deprecate MD5 password authentication (Nathan Bossart)
> 
>   Support for MD5 passwords will be removed in a future major version
>   release.  CREATE ROLE and ALTER ROLE now emit deprecation warnings when
>   setting MD5 passwords.  These warnings can be disabled by setting the
>   md5_password_warnings parameter to "off".

Agree, I have replaced the item text with your text.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Do not let urgent matters crowd out time for investment in the future.




Re: Clarification on warning when connecting to 'pgbouncer' database via Pgbouncer

2025-05-27 Thread Euler Taveira
On Tue, May 27, 2025, at 3:41 PM, Shaik Mohammad Mujeeb wrote:
> In my case, *pset.sversion* ends up being *12001* (due to PgBouncer v1.20.1), 
> and since that’s less than *90200*, the warning gets triggered, which feels 
> misleading. But I was wondering - does it really make sense to compare 
> PgBouncer’s version in this context using the same logic as PostgreSQL server 
> versions?

Yes. Because that comparison is for any server that psql connects to.

> Is this an expected behaviour, or would it make sense to handle Pgbouncer 
> differently in this check?

See similar issue [1]. The error is not totally misleading. As I stated in [1],
pgbouncer is a special database. IMO it makes sense to report that some
features don't work because that's true; backslash commands don't work.
Regarding the server version, PgBouncer is a server for the client. Nothing
wrong to report it too. If you don't want to see this message use --quiet
option. In my experience, this special database is frequently used by scripts
to obtain PgBouncer statistics. These scripts generally use --quiet option
anyway. I don't think we need an extra option just to suppress this warning.


[1] https://github.com/pgbouncer/pgbouncer/issues/1143


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: PG 18 release notes draft committed

2025-05-27 Thread Bruce Momjian
On Tue, May 27, 2025 at 08:13:24AM +1000, Peter Smith wrote:
> Hi,
> 
> There seems to be some unexpected ">" here:
> 
> "E.1.3.7.3. Logical Replication Applications>"

Yes, a mistake, fixed.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Do not let urgent matters crowd out time for investment in the future.




Re: Fixing memory leaks in postgres_fdw

2025-05-27 Thread Tom Lane
Matheus Alcantara  writes:
> I think that we can delay the allocation a bit more. The
> dblink_security_check just use the rconn to pfree in case of a failure,
> so I think that we can remove this parameter and move the rconn
> allocation to the next if (connname) block. See attached as an example.

Yeah, I thought of that too.  But I think the point of the current
setup is to ensure we have the rconn block before we create the PGconn
object, because if we were to hit OOM after creating the connection,
we'd leak the PGconn, which would be quite bad.

Having said that, the idea that this sequence is OOM-safe is pretty
silly anyway, considering that createNewConnection does a pstrdup,
and creates a new hashtable entry which might require enlarging the
hashtable, and for that matter might even create the hashtable.
So maybe rather than continuing to adhere to a half-baked coding
rule, we need to think of some other way to do that.  Maybe it'd be
reasonable to create a hashtable entry with NULL rconn, and then
open the connection?  This'd require rejiggering the lookup
code to treat a hashtable entry with NULL rconn as not really
being there.  But there's not too many routines touching that
hashtable, so maybe it's not hard.

regards, tom lane




Re: queryId constant squashing does not support prepared statements

2025-05-27 Thread Sami Imseih
> > therefore, a user supplied query like this:
> > ```
> > select where $5 in ($1, $2, $3) and $6 = $4 and 1 = 2
> > ```
> >
> > will be normalized to:
> > ```
> > select where $1 in ($2 /*...*/) and $3 = $4 and $5 = $6
> > ```
>
> Hmm, interesting.
>
> I think this renumbering should not be a problem in practice; users with
> unordered parameters have little room to complain if the param numbers
> change on query normalization.  At least that's how it seems to me.
>
> If renumbering everything in physical order makes the code simpler, then
> I don't disagree.
>

It does make it simpler, otherwise we have to introduce O(n) behavior
to find eligible parameter numbers.

I've spent a bit of time looking at this, and I want to
propose the following patchset.

* 0001:

This is a normalization issue discovered when adding new
tests for squashing. This is also an issue that exists in
v17 and likely earlier versions and should probably be
backpatched.

The crux of the problem is that if a constant location is
recorded multiple times, the values for $n don't take
the duplicate constant locations into account and end up
incorrectly incrementing the next value for $n.

So, a query like

SELECT WHERE '1' IN ('2'::int, '3'::int::text)

ends up normalizing to

SELECT WHERE $1 IN ($3::int, $4::int::text)

I also added a few test cases as part of
this patch.

This does also feel like it should be backpatched.

* 0002:

Added some more tests to the ones initially proposed
by Dmitri in v3-0001 [0] including the "edge cases" which
led to the findings for 0001.


* 0003:

This fixes the normalization anomalies introduced by
62d712ec ( squashing feature ) mentioned here [1]

This patch therefore implements the fixes to track
the boundaries of an IN-list or array expression.


* 0004: implements external parameter squashing.

While I think we should get all patches in for v18, I definitely
think we need to get the first 3 because they fix existing
bugs.

What do you think?


[0] 
https://www.postgresql.org/message-id/i635eozw2yjpzqxi5vgm4ceccqq3gv7ul4xj2xni2v6pfgtqlr%40vc5otquxmgjg
[1] 
https://www.postgresql.org/message-id/CAA5RZ0ts6zb-efiJ%2BK31Z_YDU%3DM7tHE43vv6ZBCqQxiABr3Yaw%40mail.gmail.com

--
Sami


v5-0002-Enhanced-query-jumbling-squashing-tests.patch
Description: Binary data


v5-0003-Fix-Normalization-for-squashed-query-texts.patch
Description: Binary data


v5-0001-Fix-off-by-one-error-in-query-normalization.patch
Description: Binary data


v5-0004-Support-Squashing-of-External-Parameters.patch
Description: Binary data


Re: Clarification on warning when connecting to 'pgbouncer' database via Pgbouncer

2025-05-27 Thread Tom Lane
"David G. Johnston"  writes:
> psql has zero awareness of pgbouncer or any other non-PostgreSQL server you
> might be able to point it to and establish a successful connection.  The
> lack of a warning previously is probably more incorrect than its presence
> now.

Yeah.  I'd say the fundamental problem is that pgbouncer is abusing
the server_version parameter, which will break plenty of things
besides psql --- pg_dump for instance, and just about any other
application that needs to adjust its queries to suit different
Postgres server versions.  If they feel a need to advertise
pgbouncer's version there, it wouldn't be that hard to make
server_version read out as something like "17.5 (pgbouncer XX.YY)",
which would not confuse libpq/psql (cf. parsing code in
pqSaveParameterStatus).  But apparently they didn't do that.

Anyway, I'm minus quite a lot on silencing this warning, because
it is telling you about real problems you are likely to hit.

regards, tom lane




Re: Review/Pull Request: Adding new CRC32C implementation for IBM S390X

2025-05-27 Thread John Naylor
On Tue, May 27, 2025 at 3:24 AM Eduard Stefes  wrote:
> So I worked on the algorithm to also work on buffers between 16-64
> bytes. Then I ran the performance measurement on two
> dataset[^raw_data_1] [^raw_data_2]. And created two diagrams
> [^attachment].
>
> my findings so far:
>
> - the optimized crc32cvx is faster
> - the sb8 performance is heavily depending on alignment (see the
> ripples every 8 bytes)

To be precise, these all seem 8-byte aligned at a glance, and the
ripple is due to input length.

> - the 8 byte ripple is also visible in the vx implementation. As it can
> only perform on 16 or 64 byte chunks, it will still use sb8 for the
> remaining bytes.
> - there is no obvious speed regression in the vx algorithm. Except
> raw_data_2-28 which I assume is a fluke. I am sharing the system with a
> bunch of other devs.
>
>
> I hope this is acceptable as a performance measurement. However, we
> will set up a dedicated performance test and try to get precise numbers
> without side effects. But it may take some time until we get to that.

This already looks like a solid improvement at 32 bytes and above -- I
don't think we need less noisy numbers. Also for future reference,
please reply in-line. Thanks!

--
John Naylor
Amazon Web Services




Re: Clarification on warning when connecting to 'pgbouncer' database via Pgbouncer

2025-05-27 Thread Tom Lane
Shaik Mohammad Mujeeb  writes:
> So, I feel that if we can reliably detect when the backend is a 
> non-PostgreSQL server, it might be better to adjust the warning accordingly, 
> rather than relying on a client-server version comparison in such cases.

The entire point of having a wire protocol is that you can't/don't
need to detect that.  The short answer here is that this is
pgbouncer's bug and you should be nagging them to change, not us.

regards, tom lane




Re: queryId constant squashing does not support prepared statements

2025-05-27 Thread Sami Imseih
> I've spent a bit of time looking at this, and I want to
> propose the following patchset.

Sorry about this, but I missed adding a comment in one of the
test cases for 0004 that describes the behavior of parameters
and constants that live outside of the squashed list.

The following 2 cases will result in different queryId's because
the 4th constant/parameter will be jumbled either as a type Const
or type Param.

select from tab where a in (1, 2, 3) and b = 4

select from tab where a in ($1, $2, $3) and b = $4


--
Sami


v6-0004-Support-Squashing-of-External-Parameters.patch
Description: Binary data


v6-0002-Enhanced-query-jumbling-squashing-tests.patch
Description: Binary data


v6-0003-Fix-Normalization-for-squashed-query-texts.patch
Description: Binary data


v6-0001-Fix-broken-normalization-due-to-duplicate-constan.patch
Description: Binary data


Re: Non-reproducible AIO failure

2025-05-27 Thread Alexander Lakhin

Hello hackers,

27.05.2025 16:35, Andres Freund wrote:

On 2025-05-25 20:05:49 -0400, Tom Lane wrote:

Thomas Munro writes:

Could you guys please share your exact repro steps?

I've just been running 027_stream_regress.pl over and over.
It's not a recommendable answer though because the failure
probability is tiny, under 1%.  It sounded like Alexander
had a better way.

Just FYI, I've been trying to reproduce this as well, without a single failure
so far. Despite running all tests for a few hundred times (~2 days) and
027_stream_regress.pl many hundreds of times (~1 day).

This is on a m4 mac mini.  I'm wondering if there's some hardware specific
memory ordering issue or disk speed based timing issue that I'm just not
hitting.


I'm sorry, but I need several days more to present a working reproducer.

I was lucky enough to catch the assert on my first attempt, without much
effort, but then something changed on that MacBook (it's not mine, I
connect to it remotely when it's available) and I can not reproduce it
anymore.
Just today, I discovered that 027_stream_regress is running very slow
there just because of shared_preload_libraries:
# after the 027_stream_regress test run
echo "shared_preload_libraries = 'pg_stat_statements'" >/tmp/extra.config
TEMP_CONFIG=/tmp/extra.config NO_TEMP_INSTALL=1 /usr/bin/time make -s check
 1061,29 real    56,09 user    27,69 sys
vs
NO_TEMP_INSTALL=1 /usr/bin/time make -s check
   36,42 real    27,11 user    13,98 sys

Probably it's an effect of antivirus (I see wdavdaemon_unprivileged eating
CPU time), and I uninstalled it before, but now it's installed again
(maybe by some policy). So I definitely need more time to figure out the
exact recipe for triggering the assert.

As to the configure options, when I tried to reproduce the issue on other
(non-macOS) machines, I used options from sifaka:
-DWRITE_READ_PARSE_PLAN_TREES -DSTRESS_SORT_INT_MIN 
-DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS,
but then I added -DREAD_STREAM_DISABLE_FAST_PATH to stress read_stream,
and then I just copied that command and ran it on MacBook...
So I think the complete compilation command was (and I'm seeing it in
the history):
CFLAGS="-DREAD_STREAM_DISABLE_FAST_PATH -DWRITE_READ_PARSE_PLAN_TREES -DSTRESS_SORT_INT_MIN 
-DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS" ./configure --enable-injection-points --enable-cassert --enable-debug 
--enable-tap-tests --prefix=/tmp/pg -q && make -s -j8 && make -s install && make -s check

... then running 5 027_stream_regress tests in parallel ...
I had also applied a patch to repeat "test: brin" line, but I'm not sure
it does matter.

Sorry for the lack of useful information again.

Best regards,
Alexander Lakhin
Neon (https://neon.tech)

Re: Make wal_receiver_timeout configurable per subscription

2025-05-27 Thread Fujii Masao



On 2025/05/22 21:21, Amit Kapila wrote:

On Wed, May 21, 2025 at 6:04 PM Fujii Masao  wrote:


On 2025/05/20 18:13, vignesh C wrote:

If we set the wal_receiver_timeout configuration using ALTER ROLE for
the subscription owner's role, the apply worker will start with that
value. However, any changes made via ALTER ROLE ... SET
wal_receiver_timeout will not take effect for an already running apply
worker unless the subscription is disabled and re-enabled. In
contrast, this is handled automatically during CREATE SUBSCRIPTION,
where parameter changes are detected.


Yes, this is one of the limitations of the user-settable wal_receiver_timeout
approach. If we want to change the timeout used by the apply worker without
restarting it, storing the value in pg_subscription (similar to how
synchronous_commit is handled) would be a better solution.

In that case, for example, we could set the default value of
pg_subscription.wal_receiver_timeout to -1, meaning the apply worker should
use the global wal_receiver_timeout from postgresql.conf. If the value is 0
or higher, the apply worker would use the value stored in pg_subscription.



Yeah, I had a similar idea in my mind.


OK, I've implemented two patches:

  - 0001 makes the wal_receiver_timeout GUC user-settable.
  - 0002 adds support for setting wal_receiver_timeout per subscription.
It depends on the changes in 0001.

With both patches applied, wal_receiver_timeout can be set per role or
per database using ALTER ROLE or ALTER DATABASE (from 0001), and also
per subscription using CREATE SUBSCRIPTION or ALTER SUBSCRIPTION (from 0002).
The per-subscription value is stored in pg_subscription.subwalrcvtimeout,
and it overrides the global setting of wal_receiver_timeout for that
subscription's apply worker. The default is -1, meaning the global setting
(from server config, command line, role, or database) is used.
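With both patches applied, usage would look like this (syntax as proposed in these patches, not yet committed):

```sql
-- Global default from postgresql.conf, overridable per role (patch 0001):
ALTER ROLE subscriber_role SET wal_receiver_timeout = '30s';

-- Per-subscription override stored in pg_subscription (patch 0002);
-- -1 (the default) means "use the global/role/database setting":
CREATE SUBSCRIPTION sub1
    CONNECTION 'host=pub1 dbname=postgres'
    PUBLICATION pub
    WITH (wal_receiver_timeout = '45s');

ALTER SUBSCRIPTION sub1 SET (wal_receiver_timeout = -1);
```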


I'll add this to the next CommitFest.



On further thought, another downside of the user-settable approach is that
it doesn't work for parameters like wal_retrieve_retry_interval, which is
used by the logical replication launcher not the apply worker. So if we
want to support per-subscription control for non-apply workers, storing
the settings in pg_subscription might be more appropriate.



Yeah, that could be an option, but one might not want to keep such
variables different for each subscription. Do you think one would like
to prefer specifying variables that only apply to the subscriber-node
in a way other than GUC? I always have this question whenever I see
GUCs like max_sync_workers_per_subscription, which are specific to
only subscriber nodes.


I still like using the xxx_per_subscription GUC as the default,
while also allowing it to be overridden per subscription. This seems
intuitive for some users and adds useful flexibility.

Regards,

--
Fujii Masao
NTT DATA Japan Corporation
From 2802e79a06d87ca59be5a440b266738c91442129 Mon Sep 17 00:00:00 2001
From: Fujii Masao 
Date: Mon, 26 May 2025 20:00:40 +0900
Subject: [PATCH v1 1/2] Make GUC wal_receiver_timeout user-settable.

When multiple subscribers connect to different publisher servers,
it can be useful to set different wal_receiver_timeout values for
each connection to better detect failures. However, previously
this wasn't possible, which limited flexibility in managing subscriptions.

This commit changes wal_receiver_timeout to be user-settable,
allowing different values to be assigned using ALTER ROLE SET for
each subscription owner. This effectively enables per-subscription
configuration.
---
 doc/src/sgml/config.sgml| 3 ---
 src/backend/utils/misc/guc_tables.c | 2 +-
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ca2a567b2b1..4f37fbfe114 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5147,9 +5147,6 @@ ANY num_sync 
( From 69ef2751f3e13e30a28f73c9defdbaaf2f62aa5e Mon Sep 17 00:00:00 2001
From: Fujii Masao 
Date: Mon, 26 May 2025 20:02:51 +0900
Subject: [PATCH v1 2/2] Add per-subscription wal_receiver_timeout setting.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit allows setting wal_receiver_timeout per subscription
using the CREATE SUBSCRIPTION and ALTER SUBSCRIPTION commands.
The value is stored in the subwalrcvtimeout column of the pg_subscription
catalog.

When set, this value overrides the global wal_receiver_timeout for
the subscription’s apply worker. The default is -1, which means the
global setting (from the server configuration, command line, role,
or database) remains in effect.

This feature is useful for configuring different timeout values for
each subscription, especially when connecting to multiple publisher
servers, to improve failure detection.

Bump catalog version.
---
 doc/src/sgml/catalogs.sgml |  10 ++
 doc/src/sgml/ref/alter_subscription.sgml   |   5 +-
 doc/src/sgml/ref/create_su

Doc: section "8.21. Pseudo-Types" needs a bit of clarification?

2025-05-27 Thread Aleksander Alekseev
Hi,

While reading our documentation about pseudo-types [1] I noticed that it says:

"""
anyelement - Indicates that a function accepts any data type
anyarray - Indicates that a function accepts any array data type
"""

This may give an impression that anyelement and anyarray can be used
as an argument ("accepted") but not as a return value, while in fact
they can. In particular we have [2]:

"""
array_fill(anyelement, integer[] [, integer[] ] ) -> anyarray
unnest(anyarray) -> setof anyelement
"""

What makes section 8.21 even more confusing is the fact that a little
below we say:

"""
cstring - Indicates that a function accepts or *returns* a
null-terminated C string
"""

Should we be slightly more precise here?

[1]: https://www.postgresql.org/docs/current/datatype-pseudo.html
[2]: https://www.postgresql.org/docs/current/functions-array.html

-- 
Best regards,
Aleksander Alekseev




Re: Non-reproducible AIO failure

2025-05-27 Thread Andres Freund
Hi,

On 2025-05-27 10:12:28 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > This is on a m4 mac mini.  I'm wondering if there's some hardware specific
> > memory ordering issue or disk speed based timing issue that I'm just not
> > hitting.
> 
> I dunno, I've seen it on three different physical machines now
> (one M1, two M4 Pros).  But it is darn hard to repro, for sure.

I just meant that it seems that I can't reproduce it for some as of yet
unknown reason. I've now been through 3k+ runs of 027_stream_regress, without
a single failure, so there has to be *something* different about my
environment than yours.

Darwin m4-dev 24.1.0 Darwin Kernel Version 24.1.0: Thu Oct 10 21:06:23 PDT 
2024; root:xnu-11215.41.3~3/RELEASE_ARM64_T8132 arm64

cc -v
Apple clang version 16.0.0 (clang-1600.0.26.4)
Target: arm64-apple-darwin24.1.0
Thread model: posix

I guess I'll try to update to a later version and see if it repros there?

Greetings,

Andres Freund




Re: Doc: section "8.21. Pseudo-Types" needs a bit of clarification?

2025-05-27 Thread David G. Johnston
On Tue, May 27, 2025 at 11:15 AM Aleksander Alekseev <
aleksan...@timescale.com> wrote:

> While reading our documentation about pseudo-types [1] I noticed that it
> says:
>
> """
> anyelement - Indicates that a function accepts any data type
> anyarray - Indicates that a function accepts any array data type
> """
>
> This may give an impression that anyelement and anyarray can be used
> as an argument ("accepted") but not as a return value, while in fact
> they can. In particular we have [2]:
>
> Should we be slightly more precise here?
>

I'm fine with the status quo for these.  The table entry points out the
pertinent part - that they must be used on at least one input argument.
The various texts both in 8.21 and 36.2 describing usage and overall
behavior make it clear they can appear also on the return type clause so
long as they also appear in the input arguments.  The functions themselves
morph strictly upon the types used in the input arguments.

David J.


Tightening DecodeNumberField's parsing rules

2025-05-27 Thread Tom Lane
Evgeniy Gorbanev reported to the security list that he'd found
a case where timestamp_in triggered an undefined-behavior
sanitizer warning, due to trying to store a float value larger
than INT_MAX into an integer variable.  We concluded that there's
no real security issue there, it's just that the C standard
doesn't say what the result should be.  So I'm moving this
discussion to the public list, because it's also not entirely
clear what the fix should be.

Evgeniy's example case was

SELECT 'j 05 t a6.424e5-'::timestamp;

which of course is garbage and draws an "invalid input syntax"
error as-expected, though not till after DecodeNumberField
has tried to stuff ".424e5 * 100" into an integer.  However,
I found that there are closely related cases that don't draw
any syntax error, such as

regression=# SELECT timestamp with time zone 'J2452271 T X03456-08';
  timestamptz   

 2001-12-27 00:34:56-08
(1 row)

regression=# SELECT timestamp with time zone 'J2452271 T X03456.001e6-08';
  timestamptz   

 2001-12-27 00:51:36-08
(1 row)

The fundamental problem here, IMO, is that DecodeNumberField assumes
without checking that its input contains only digits and perhaps a
decimal point.  In this example though, it's given whatever remains
after stripping the run-on timezone spec, that is "X03456.001e6".
That triggers both the overflow problem with the bogus ".001e6"
fraction, and a totally inappropriate reading of "X0" as "hour zero".

What I think we ought to do about this is reject strings containing
non-digits, as in the attached patch.  However, there's no doubt that
that'd make timestamp_in reject some inputs it used to accept, and
we've generally tried to avoid that --- commit f0d0394e8 for instance
is a recent effort in the direction of not breaking backward
compatibility, and it adopted a position that we should not break
cases that have a sensible interpretation, even if they're surprising.
The difference here is that I don't think there's a sensible
interpretation; or if there is, it's certainly not what the code
does today.

So what I propose we do about this is to apply the attached to HEAD
and leave the back branches alone.  Right now, inputs like this are
"garbage in, garbage out" but don't cause any real problem (upsetting
a sanitizer check isn't of concern for production use).  So I'm not
seeing that there's a good argument for back-patching.

We could alternatively apply some more-limited fix; Evgeniy's
original recommendation was just to reject if the result of
strtod() was outside the expected 0..1 range.  But I'm finding
it hard to believe that that's a great idea.

Comments?

regards, tom lane

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index 793d8a9adcc..ef841a1a454 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -702,9 +702,18 @@ ParseFraction(char *cp, double *frac)
 	}
 	else
 	{
+		/*
+		 * On the other hand, let's reject anything that's not digits after
+		 * the ".".  strtod is happy with input like ".123e9", but that'd
+		 * break callers' expectation that the result is in 0..1.  (It's quite
+		 * difficult to get here with such input, but not impossible.)
+		 */
+		if (strspn(cp + 1, "0123456789") != strlen(cp + 1))
+			return DTERR_BAD_FORMAT;
+
 		errno = 0;
 		*frac = strtod(cp, &cp);
-		/* check for parse failure */
+		/* check for parse failure (probably redundant given prior check) */
 		if (*cp != '\0' || errno != 0)
 			return DTERR_BAD_FORMAT;
 	}
@@ -2964,31 +2973,23 @@ DecodeNumberField(int len, char *str, int fmask,
 	 */
 	if ((cp = strchr(str, '.')) != NULL)
 	{
-		/*
-		 * Can we use ParseFractionalSecond here?  Not clear whether trailing
-		 * junk should be rejected ...
-		 */
-		if (cp[1] == '\0')
-		{
-			/* avoid assuming that strtod will accept "." */
-			*fsec = 0;
-		}
-		else
-		{
-			double		frac;
+		int			dterr;
 
-			errno = 0;
-			frac = strtod(cp, NULL);
-			if (errno != 0)
-return DTERR_BAD_FORMAT;
-			*fsec = rint(frac * 100);
-		}
+		/* Convert the fraction into *fsec */
+		dterr = ParseFractionalSecond(cp, fsec);
+		if (dterr)
+			return dterr;
 		/* Now truncate off the fraction for further processing */
 		*cp = '\0';
 		len = strlen(str);
 	}
+
+	/* What's left had better be all digits */
+	if (strspn(str, "0123456789") != len)
+		return DTERR_BAD_FORMAT;
+
 	/* No decimal point and no complete date yet? */
-	else if ((fmask & DTK_DATE_M) != DTK_DATE_M)
+	if (cp == NULL && (fmask & DTK_DATE_M) != DTK_DATE_M)
 	{
 		if (len >= 6)
 		{
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index b90bfcd794f..5ae93d8e8a5 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -467,6 +467,15 @@ SELECT timestamp with time zone 'Y2001M12D27H04MM05S06.789-08';
 ERROR:  invalid input syntax for type tim
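
The strspn() guard added above is easy to exercise outside the server. Below is a minimal, self-contained sketch of the hardened fraction parser (illustrative only; the real ParseFractionalSecond also converts the result to fsec_t):

```c
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/*
 * Accept only "." optionally followed by digits, mirroring the strspn()
 * guard in the patch.  Returns -1 on bad format, 0 on success with
 * *frac in [0,1).
 */
static int
parse_fraction(const char *cp, double *frac)
{
	if (cp[0] != '.')
		return -1;
	if (cp[1] == '\0')
	{
		/* avoid assuming that strtod will accept "." */
		*frac = 0;
		return 0;
	}
	/* reject anything that's not digits after the "." (e.g. ".123e9") */
	if (strspn(cp + 1, "0123456789") != strlen(cp + 1))
		return -1;
	*frac = strtod(cp, NULL);
	return 0;
}
```

Without the strspn() check, strtod() would happily accept ".123e9", breaking the callers' expectation that the fraction lies in 0..1.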

Clarification on warning when connecting to 'pgbouncer' database via Pgbouncer

2025-05-27 Thread Shaik Mohammad Mujeeb
Hi Hackers,

I was hoping to get some clarification regarding a behaviour I observed while 
connecting to the special 'pgbouncer' database used for administering or 
monitoring Pgbouncer.

After the commit cf0cab868a, introduced in PG15, I noticed that when connecting 
to the 'pgbouncer' database via Pgbouncer, the following warning is shown:

psql (17.2, server 1.20.1/bouncer)

WARNING: psql major version 17, server major version 1.20.

 Some psql features might not work.

From what I understand, this seems to be due to the lower version check in the
connection_warnings() function, where we check if pset.sversion < 90200.

if (pset.sversion / 100 > client_ver / 100 ||
    pset.sversion < 90200)
    printf(_("WARNING: %s major version %s, server major version %s.\n"
             "         Some psql features might not work.\n"),
           pset.progname,
           formatPGVersionNumber(client_ver, false,
                                 cverbuf, sizeof(cverbuf)),
           formatPGVersionNumber(pset.sversion, false,
                                 sverbuf, sizeof(sverbuf)));

In my case, pset.sversion ends up being 12001 (due to PgBouncer v1.20.1), and 
since that’s less than 90200, the warning gets triggered, which feels 
misleading. But I was wondering - does it really make sense to compare 
PgBouncer’s version in this context using the same logic as PostgreSQL server 
versions?
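
For illustration, here is how a three-part version string like PgBouncer's "1.20.1" collapses to an integer below the cutoff. (version_to_sversion is a hypothetical helper mirroring the pre-PostgreSQL-10 packing scheme, major*10000 + minor*100 + rev, which is what a three-part string falls into; it is not actual psql code.)

```c
#include <assert.h>

/*
 * Hypothetical sketch of the integer server-version packing:
 * major*10000 + minor*100 + rev (pre-PostgreSQL-10 three-part scheme).
 * PgBouncer 1.20.1 thus becomes 12001, which is below the 90200 cutoff
 * and trips the warning.
 */
static int
version_to_sversion(int major, int minor, int rev)
{
	return major * 10000 + minor * 100 + rev;
}
```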

Is this an expected behaviour, or would it make sense to handle Pgbouncer 
differently in this check?

Appreciate any insights!

Thanks & Regards,
Shaik Mohammad Mujeeb
Member Technical Staff
Zoho Corp

Re: Non-reproducible AIO failure

2025-05-27 Thread Tom Lane
Andres Freund  writes:
> I just meant that it seems that I can't reproduce it for some as of yet
> unknown reason. I've now been through 3k+ runs of 027_stream_regress, without
> a single failure, so there has to be *something* different about my
> environment than yours.

> Darwin m4-dev 24.1.0 Darwin Kernel Version 24.1.0: Thu Oct 10 21:06:23 PDT 
> 2024; root:xnu-11215.41.3~3/RELEASE_ARM64_T8132 arm64

> cc -v
> Apple clang version 16.0.0 (clang-1600.0.26.4)
> Target: arm64-apple-darwin24.1.0
> Thread model: posix

> I guess I'll try to update to a later version and see if it repros there?

Maybe.  All the machines I've seen it on are current-software:

$ uname -a
Darwin minim4.sss.pgh.pa.us 24.5.0 Darwin Kernel Version 24.5.0: Tue Apr 22 
19:53:27 PDT 2025; root:xnu-11417.121.6~2/RELEASE_ARM64_T6041 arm64
$ cc -v
Apple clang version 17.0.0 (clang-1700.0.13.3)
Target: arm64-apple-darwin24.5.0
Thread model: posix
InstalledDir: 
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

If it's OS-version-specific, that raises the odds IMO that this
is Apple's fault more than ours.

regards, tom lane




Re: Assert("vacrel->eager_scan_remaining_successes > 0")

2025-05-27 Thread Masahiko Sawada
On Fri, May 23, 2025 at 1:53 PM Melanie Plageman
 wrote:
>
> On Fri, May 23, 2025 at 12:41 PM Masahiko Sawada  
> wrote:
> >
> > I'll remove that part and push early next week, barring any objections.
>
> Great, thanks so much!

Pushed the fix and closed the open item. Thank you for reviewing the patch!

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Clarification on warning when connecting to 'pgbouncer' database via Pgbouncer

2025-05-27 Thread David G. Johnston
On Tue, May 27, 2025 at 11:41 AM Shaik Mohammad Mujeeb <
mujeeb...@zohocorp.com> wrote:

> After the commit cf0cab868a, introduced in PG15, I noticed that when
> connecting to the 'pgbouncer' database via Pgbouncer, the following warning
> is shown:
>
> psql (17.2, server 1.20.1/bouncer)
> WARNING: psql major version 17, server major version 1.20.
>  Some psql features might not work.
>
>
> Is this an expected behaviour, or would it make sense to handle Pgbouncer
> differently in this check?
>

psql has zero awareness of pgbouncer or any other non-PostgreSQL server you
might be able to point it to and establish a successful connection.  The
lack of a warning previously is probably more incorrect than its presence
now.

That said, I'd probably try and teach psql to check for whether its remote
is a PostgreSQL server and, if not, warn that basically everything but
direct query sending is unlikely to work.  Skipping the version processing
logic altogether since you are correct it makes no sense to compare the
client version to a server version of a non-PostgreSQL server.

David J.


Re: queryId constant squashing does not support prepared statements

2025-05-27 Thread Álvaro Herrera
On 2025-May-24, Sami Imseih wrote:

> therefore, a user supplied query like this:
> ```
> select where $5 in ($1, $2, $3) and $6 = $4 and 1 = 2
> ```
> 
> will be normalized to:
> ```
> select where $1 in ($2 /*...*/) and $3 = $4 and $5 = $6
> ```

Hmm, interesting.

I think this renumbering should not be a problem in practice; users with
unordered parameters have little room to complain if the param numbers
change on query normalization.  At least that's how it seems to me.

If renumbering everything in physical order makes the code simpler, then
I don't disagree.
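
A toy sketch of that physical-order renumbering (string-based purely for illustration; the real code in pg_stat_statements works from recorded token locations, not string scanning):

```c
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <assert.h>

/*
 * Rewrite every "$n" in the input as "$1", "$2", ... in the order it
 * appears, the way parameter numbers come out after normalization.
 * Squashing itself is not modeled here.
 */
static void
renumber_params(const char *in, char *out, size_t outsz)
{
	int			next = 1;
	size_t		o = 0;

	for (const char *p = in; *p && o + 12 < outsz;)
	{
		if (*p == '$' && isdigit((unsigned char) p[1]))
		{
			o += snprintf(out + o, outsz - o, "$%d", next++);
			p++;
			while (isdigit((unsigned char) *p))
				p++;
		}
		else
			out[o++] = *p++;
	}
	out[o] = '\0';
}
```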

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Puedes vivir sólo una vez, pero si lo haces bien, una vez es suficiente"




Re: Cygwin support

2025-05-27 Thread Ken Marshall
On Tue, May 27, 2025 at 10:53:55AM -0400, Andrew Dunstan wrote:
> 
> On 2025-04-28 Mo 4:53 PM, Mark Woodward wrote:
> > What are the economics of this? I used PostgreSQL and Cygwin 25 years
> > ago and am amazed it is still a thing.
> > How much effort is it to support PostgreSQL on Cygwin?
> > How many actual users are using PostgreSQL on cygwin in production? (I
> > should hope none!)
> > 
> > I would say it is something that should be announced as "deprecated" and
> > see how many people complain, my bet no one will really care. Cygwin was
> > a phenomenal hack in its day but I believe that those days have passed.
> 
> 
> 
> Please don't top-post on PostgreSQL lists.
> 
> I don't see it as our role to pass judgements like this on what people use.
> While Cygwin exists it's not our province to deprecate its use. If the
> maintenance effort were onerous I might be willing to relook at our support
> for it, but the simple answer to your first question is that the maintenance
> effort is close to zero. As I pointed out elsewhere in this thread, even if
> the server has limited use, the Cygwin psql client is nicer to use on
> Windows than the native build, reason enough to keep it going, at least
> until we improve the native build.
> 
> 
> cheers
> andrew
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com

+1

I also have known of environments where Cygwin was an allowed
application and native Windows applications were severely restricted. Go
figure.

Regards,
Ken




Re: Replication slot is not able to sync up

2025-05-27 Thread Masahiko Sawada
On Fri, May 23, 2025 at 10:07 PM Amit Kapila  wrote:
>
> In the case presented here, the logical slot is expected to keep
> forwarding, and in the consecutive sync cycle, the sync should be
> successful. Users using logical decoding APIs should also be aware
> that if due for some reason, the logical slot is not moving forward,
> the master/publisher node will start accumulating dead rows and WAL,
> which can create bigger problems.

I've tried this case and am concerned that the slot synchronization
using pg_sync_replication_slots() would never succeed while the
primary keeps getting write transactions. Even if the user manually
consumes changes on the primary, the primary server keeps advancing
its XID in the meanwhile. On the standby, we ensure that the
TransamVariables->nextXid is beyond the XID of WAL record that it's
going to apply so the xmin horizon calculated by
GetOldestSafeDecodingTransactionId() ends up always being higher than
the slot's catalog_xmin on the primary. We get the log message "could
not synchronize replication slot "s" because remote slot precedes
local slot" and cleanup the slot on the standby at the end of
pg_sync_replication_slots().

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Pathify RHS unique-ification for semijoin planning

2025-05-27 Thread Richard Guo
On Thu, May 22, 2025 at 4:05 PM Richard Guo  wrote:
> Therefore, I'm thinking that maybe we could create a new RelOptInfo
> for the RHS rel to represent its unique-ified version, and then
> generate all worthwhile paths for it, similar to how it's done in
> create_distinct_paths().  Since this is likely to be called repeatedly
> on the same rel, we can cache the new RelOptInfo in the rel struct,
> just like how we cache cheapest_unique_path currently.
>
> To be concrete, I'm envisioning something like the following:
>
> if (bms_equal(sjinfo->syn_righthand, rel2->relids) &&
> -   create_unique_path(root, rel2, rel2->cheapest_total_path,
> -  sjinfo) != NULL)
> +   (rel2_unique = create_unique_rel(root, rel2, sjinfo)) != NULL)
>
> ...
>
> -   add_paths_to_joinrel(root, joinrel, rel1, rel2,
> -JOIN_UNIQUE_INNER, sjinfo,
> +   add_paths_to_joinrel(root, joinrel, rel1, rel2_unique,
> +JOIN_INNER, sjinfo,
>  restrictlist);
> -   add_paths_to_joinrel(root, joinrel, rel2, rel1,
> -JOIN_UNIQUE_OUTER, sjinfo,
> +   add_paths_to_joinrel(root, joinrel, rel2_unique, rel1,
> +JOIN_INNER, sjinfo,
>  restrictlist);

Here is a WIP draft patch based on this idea.  It retains
JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER to help determine whether the
inner relation is provably unique, but otherwise removes most of the
code related to these two join types.

Additionally, the T_Unique path now has the same meaning for both
semijoins and DISTINCT clauses: it represents adjacent-duplicate
removal on presorted input.  This patch unifies their handling by
sharing the same data structures and functions.

There are a few plan diffs in the regression tests.  As far as I can
tell, the changes are improvements.  One of them is caused by the fact
that we now consider parameterized paths in unique-ified cases.  The
rest are mostly a result of now preserving pathkeys for unique paths.

This patch is still a work in progress.  Before investing too much
time into it, I'd like to get some feedback on whether it's heading in
the right direction.

Thanks
Richard


v1-0001-Pathify-RHS-unique-ification-for-semijoin-plannin.patch
Description: Binary data


Re: Standardize the definition of the subtype field of AlterDomainStmt

2025-05-27 Thread Quan Zongliang



On 2025/5/27 11:54, Michael Paquier wrote:

On Tue, May 27, 2025 at 11:06:46AM +0800, Quan Zongliang wrote:

I noticed that the subtype of AlterDomainStmt is directly using constants in
the code. It is not conducive to the maintenance and reading of the code.
Based on the definition of AlterTableType, use "AD_" as the prefix. Define
several macros to replace the original characters.
The subtype of AlterTableCmd is defined using an enumeration. The subtypes
of AlterDomainStmt are relatively few in number, and the original definition
uses characters. These definitions still use characters and maintain the
values unchanged. If some plugins or tools are also processing
AlterDomainStmt, there will be no errors.


Sounds like a good idea.  As far as I can see after a closer lookup at
the tree, you have updated all the code paths that matter for this
change, and you have added a CF entry:
https://commitfest.postgresql.org/patch/5780/

+#define AD_VaidateConstraint   'V' /* VALIDATE CONSTRAINT */


Updated
Thank you.


s/Vaidate/Validate
--
Michael
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 54ad38247aa..3f09f85a480 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -15696,7 +15696,7 @@ ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid 
refRelId, char *cmd,
{
AlterDomainStmt *stmt = (AlterDomainStmt *) stm;
 
-   if (stmt->subtype == 'C')   /* ADD CONSTRAINT */
+   if (stmt->subtype == AD_AddConstraint)
{
Constraint *con = castNode(Constraint, 
stmt->def);
AlterTableCmd *cmd = makeNode(AlterTableCmd);
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 0b5652071d1..103021c0947 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -11629,7 +11629,7 @@ AlterDomainStmt:
{
AlterDomainStmt *n = 
makeNode(AlterDomainStmt);
 
-   n->subtype = 'T';
+   n->subtype = AD_AlterDefault;
n->typeName = $3;
n->def = $4;
$$ = (Node *) n;
@@ -11639,7 +11639,7 @@ AlterDomainStmt:
{
AlterDomainStmt *n = 
makeNode(AlterDomainStmt);
 
-   n->subtype = 'N';
+   n->subtype = AD_DropNotNull;
n->typeName = $3;
$$ = (Node *) n;
}
@@ -11648,7 +11648,7 @@ AlterDomainStmt:
{
AlterDomainStmt *n = 
makeNode(AlterDomainStmt);
 
-   n->subtype = 'O';
+   n->subtype = AD_SetNotNull;
n->typeName = $3;
$$ = (Node *) n;
}
@@ -11657,7 +11657,7 @@ AlterDomainStmt:
{
AlterDomainStmt *n = 
makeNode(AlterDomainStmt);
 
-   n->subtype = 'C';
+   n->subtype = AD_AddConstraint;
n->typeName = $3;
n->def = $5;
$$ = (Node *) n;
@@ -11667,7 +11667,7 @@ AlterDomainStmt:
{
AlterDomainStmt *n = 
makeNode(AlterDomainStmt);
 
-   n->subtype = 'X';
+   n->subtype = AD_DropConstraint;
n->typeName = $3;
n->name = $6;
n->behavior = $7;
@@ -11679,7 +11679,7 @@ AlterDomainStmt:
{
AlterDomainStmt *n = 
makeNode(AlterDomainStmt);
 
-   n->subtype = 'X';
+   n->subtype = AD_DropConstraint;
n->typeName = $3;
n->name = $8;
n->behavior = $9;
@@ -11691,7 +11691,7 @@ AlterDomainStmt:
{
AlterDomainStmt *n = 
makeNode(AlterDomainStmt);
 
-   n->subtype = 'V';
+   n->subtype = AD_ValidateConstraint;
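
For reference, the macro values implied by the replacements above can be reconstructed as follows. (This is a sketch inferred from the diff hunks; the actual definitions would live next to AlterDomainStmt in parsenodes.h, and they deliberately keep the historical character codes unchanged.)

```c
#include <assert.h>

/* Reconstructed from the patch hunks above; values keep the original chars. */
#define AD_AlterDefault			'T'	/* SET|DROP DEFAULT */
#define AD_DropNotNull			'N'	/* DROP NOT NULL */
#define AD_SetNotNull			'O'	/* SET NOT NULL */
#define AD_AddConstraint		'C'	/* ADD CONSTRAINT */
#define AD_DropConstraint		'X'	/* DROP CONSTRAINT */
#define AD_ValidateConstraint	'V'	/* VALIDATE CONSTRAINT */
```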

Re: Standardize the definition of the subtype field of AlterDomainStmt

2025-05-27 Thread wenhui qiu
Hi,
Thank you for your update. I marked the patch as "Ready for Committer".



Thanks

On Wed, May 28, 2025 at 10:27 AM Quan Zongliang 
wrote:

>
>
> On 2025/5/27 11:54, Michael Paquier wrote:
> > On Tue, May 27, 2025 at 11:06:46AM +0800, Quan Zongliang wrote:
> >> I noticed that the subtype of AlterDomainStmt is directly using
> constants in
> >> the code. It is not conducive to the maintenance and reading of the
> code.
> >> Based on the definition of AlterTableType, use "AD_" as the prefix.
> Define
> >> several macros to replace the original characters.
> >> The subtype of AlterTableCmd is defined using an enumeration. The
> subtypes
> >> of AlterDomainStmt are relatively few in number, and the original
> definition
> >> uses characters. These definitions still use characters and maintain the
> >> values unchanged. If some plugins or tools are also processing
> >> AlterDomainStmt, there will be no errors.
> >
> > Sounds like a good idea.  As far as I can see after a closer lookup at
> > the tree, you have updated all the code paths that matter for this
> > change, and you have added a CF entry:
> > https://commitfest.postgresql.org/patch/5780/
> >
> > +#define AD_VaidateConstraint   'V' /* VALIDATE CONSTRAINT */
> >
> Updated
> Thank you.
>
> > s/Vaidate/Validate
> > --
> > Michael
>


Re: Assertion failure in smgr.c when using pg_prewarm with partitioned tables

2025-05-27 Thread Fujii Masao




On 2025/05/26 16:55, ikedamsh wrote:

2025/05/21 12:54 Fujii Masao :

On Mon, May 19, 2025 at 5:18 PM Masahiro Ikeda  
wrote:
 >
 > Thanks for your work and feedback!
 >
 > I've updated the patches and added regular regression tests for
 > both pg_prewarm and amcheck.

Thanks for updating the patches!

Regarding the 0001 patch:

+CREATE TABLE test_part1 PARTITION OF test FOR VALUES FROM (1) TO (1000);
+INSERT INTO test SELECT generate_series(1, 100);

These lines don't seem necessary for the test. How about removing them?
It would simplify the test and slightly reduce the execution time - though
the time savings are very minimal.

+-- To specify the relation which does not have storage should fail.

This comment could be clearer as:
"pg_prewarm() should fail if the target relation has no storage."

+ /* Check that the storage exists. */

Maybe rephrase to:
"Check that the relation has storage." ?


Thanks! I will fix them.


Thanks!



Regarding the 0002 patch:

- errdetail("Relation \"%s\" is a %s index.",
-           RelationGetRelationName(rel),
-           NameStr(((Form_pg_am) GETSTRUCT(amtuprel))->amname))));
+ errdetail("Relation \"%s\" is a %s %sindex.",
+           RelationGetRelationName(rel),
+           NameStr(((Form_pg_am) GETSTRUCT(amtuprel))->amname),
+           (rel->rd_rel->relkind == RELKIND_PARTITIONED_INDEX) ?
+           "partitioned " : "")));

Instead of manually building the message, how about using
errdetail_relkind_not_supported()?
It would be more consistent with similar error reporting elsewhere.

I was thinking of using errdetail_relkind_not_supported(),
but I’m reconsidering building the message manually
since the AM name seems to be important for the error.
What do you think?


Understood.
I was trying to better understand the error message, as I found
the following is still a bit confusing for users. However, I don't
have a better suggestion at the moment, so I'm okay with
the proposed change.

ERROR:  expected "btree" index as targets for verification
DETAIL:  Relation "pgbench_accounts_pkey" is a btree partitioned


This is not directly relation to your proposal, but while reading
the index_checkable() function, I noticed that ReleaseSysCache()
is not called after SearchSysCache1(). Shouldn't we call
ReleaseSysCache() here? Alternatively, we could use get_am_name()
instead of SearchSysCache1(), which might be simpler.

I also observed that the error code ERRCODE_FEATURE_NOT_SUPPORTED
is used when the relation is not the expected type in index_checkable().
However, based on similar cases (e.g., pgstattuple), it seems that
ERRCODE_WRONG_OBJECT_TYPE might be more appropriate in this situation.
Thought?

Regards,

--
Fujii Masao
NTT DATA Japan Corporation





Re: Large expressions in indexes can't be stored (non-TOASTable)

2025-05-27 Thread Michael Paquier
On Tue, May 27, 2025 at 03:02:52PM -0500, Nathan Bossart wrote:
> Here is what I have staged for commit for the others.  I'm hoping to push
> these in the next couple of days.

Thanks for the refreshed versions.  Looks sensible to me overall.

+static inline void
+AssertHasSnapshotForToast(Relation rel)
[...]
+/*
+ * Commit 16bf24e fixed accesses to pg_replication_origin without an
+ * active snapshot by removing its TOAST table.  On older branches,
+ * these bugs are left in place.  Its only varlena column is roname (the
+ * replication origin name), so this is only a problem if the name
+ * requires out-of-line storage, which seems unlikely.  In any case,
+ * fixing it doesn't seem worth extra code churn on the back-branches.
+ */
+if (RelationGetRelid(rel) == ReplicationOriginRelationId)
+return;

As of the back-branches but not HEAD, this shortcut makes sense.
--
Michael


signature.asc
Description: PGP signature


Re: Speed up JSON escape processing with SIMD plus other optimisations

2025-05-27 Thread David Rowley
On Wed, 28 May 2025 at 11:24, John Naylor  wrote:
> https://lemire.me/blog/2025/04/13/detect-control-characters-quotes-and-backslashes-efficiently-using-swar/
>
> I don't find this use of SWAR that bad for readability, and there's
> only one obtuse clever part that merits a comment. Plus, it seems json
> escapes are pretty much set in stone?

I think we'll end up needing some SWAR code. There are plenty of
places where 16 bytes is too much to do at once. e.g. looking for the
delimiter character in a COPY FROM, 16 is likely too many when you're
importing a bunch of smallish ints. A 4- or 8-byte SWAR search is
likely better for that. With 16 you're probably going to find a
delimiter every time you look and do byte-at-a-time processing to find
that delimiter.

> I gave this a spin with
> https://www.postgresql.org/message-id/attachment/163406/json_bench.sh.txt
>
> master:
>
> Test 1
> tps = 321.522667 (without initial connection time)
> tps = 315.070985 (without initial connection time)
> tps = 331.070054 (without initial connection time)
> Test 2
> tps = 35.107257 (without initial connection time)
> tps = 34.977670 (without initial connection time)
> tps = 35.898471 (without initial connection time)
> Test 3
> tps = 33.575570 (without initial connection time)
> tps = 32.383352 (without initial connection time)
> tps = 31.876192 (without initial connection time)
> Test 4
> tps = 810.676116 (without initial connection time)
> tps = 745.948518 (without initial connection time)
> tps = 747.651923 (without initial connection time)
>
> swar patch:
>
> Test 1
> tps = 291.919004 (without initial connection time)
> tps = 294.446640 (without initial connection time)
> tps = 307.670464 (without initial connection time)
> Test 2
> tps = 30.984440 (without initial connection time)
> tps = 31.660630 (without initial connection time)
> tps = 32.538174 (without initial connection time)
> Test 3
> tps = 29.828546 (without initial connection time)
> tps = 30.332913 (without initial connection time)
> tps = 28.873059 (without initial connection time)
> Test 4
> tps = 748.676688 (without initial connection time)
> tps = 768.798734 (without initial connection time)
> tps = 766.924632 (without initial connection time)
>
> While noisy, this test seems a bit faster with SWAR, and it's more
> portable to boot. I'm not sure where I'd put the new function so both
> call sites can see it, but that's a small detail...

Isn't that mostly a performance regression? How does it do with ANSI
chars where the high bit is set?

I had in mind we'd have a swar.h header and have a bunch of inline
functions for this in there. I've not yet studied how well compilers
would inline multiple such SWAR functions to de-duplicate the common
parts.
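
As a sketch of the kind of helper such a swar.h could hold, here is an 8-byte delimiter search using the classic "has-zero" trick. (This is not from any patch in this thread; it assumes a little-endian target and GCC/Clang's __builtin_ctzll.)

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

/*
 * Return the index (0..7) of the first byte equal to c in the 8 bytes
 * of v, or 8 if there is none.  After XORing with a splat of c, matching
 * bytes become zero, and (x - 0x01..) & ~x & 0x80.. is nonzero iff some
 * byte of x is zero.  Flags above the first zero byte can be spurious
 * due to borrow propagation, but the lowest flag always marks the first
 * match for a little-endian load.
 */
static inline int
swar_find_byte(uint64_t v, unsigned char c)
{
	uint64_t	ones = UINT64_C(0x0101010101010101);
	uint64_t	x = v ^ (ones * c);
	uint64_t	found = (x - ones) & ~x & (ones * 0x80);

	if (found == 0)
		return 8;
	return __builtin_ctzll(found) / 8;	/* GCC/Clang builtin */
}

/* Convenience wrapper: search the first 8 bytes of a buffer. */
static inline int
swar_find_in8(const char *p, unsigned char c)
{
	uint64_t	v;

	memcpy(&v, p, 8);			/* assumes little-endian byte order */
	return swar_find_byte(v, c);
}
```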

David




Re: queryId constant squashing does not support prepared statements

2025-05-27 Thread Michael Paquier
On Tue, May 27, 2025 at 05:05:39PM -0500, Sami Imseih wrote:
> * 0001:
> 
> This is a normalization issue discovered when adding new
> tests for squashing. This is also an issue that exists in
> v17 and likely earlier versions and should probably be
> backpatched.
> 
> The crux of the problem is if a constant location is
> recorded multiple times, the values for $n don't take
> into account the duplicate constant locations and end up
> incorrectly incrementing the next value from $n.
> 
> This does also feel like it should be backpatched.

Yes, this needs to be backpatched and it is actually a safe backpatch
because only the text representation is changed when adding a new
entry in the dshash; it only improves the reports without touching the
existing data.  I'm OK to take care of this one by myself, even in the
context of this thread.  It is an issue independent of what we're
discussing here for the list squashing.  As there is only one sprintf() in
generate_normalized_query() in ~17, the fix of the back-branches is
slightly simpler. 

You have mentioned the addition of tests, but v6-0001 includes nothing
of the kind.  Am I missing something?  How much coverage did you
intend to add here?  These seem to be included in squashing.sql in
patch v6-0002, but IMO this should be moved somewhere else to work
with the back-branches and make the whole backpatch story more
consistent.

> * 0002:
> 
> Added some more tests to the ones initially proposed
> by Dmitri in v3-0001 [0] including the "edge cases" which
> led to the findings for 0001.

Tests for CoerceViaIO with jsonb have been moved around.  Not a big
deal, but that makes the diffs of the patch confusing to read.

+-- if there is only one level of relabeltype, the list will be squashable

RelabelType perhaps?

A lot of the tests introduced in v6-0002 are copy-pastes of the
previous ones for IN clauses introduced for the ARRAY cases, with
comments explaining the reasons why lists are squashed or not also
copy-pasted.  Perhaps it would make sense to group the ARRAY and IN
clause cases together.  For example, group each of the two CoerceViaIO
cases together in a single query on pg_stat_statements, with a single
pg_stat_statements_reset().  That would make more difficult to miss
the fact that we need to care about IN clauses *and* arrays when
adding more test patterns, if we add some of course.

The cases where IN clauses are rewritten as ArrayExpr are OK kept at
the end.

> * 0003:
> 
> This fixes the normalization anomalies introduced by
> 62d712ec ( squashing feature ) mentioned here [1]
> 
> This patch therefore implements the fixes to track
the boundaries of an IN-list or array expression.

Nice simplifications in the PGSS part in terms of 

+   ListWithBoundary *n = $4;

I'd suggest to not use "n" for this one, but a different variable
name, leaving the internals for the SubLink cases minimally touched.

+typedef struct ListWithBoundary
+{
+   Node   *expr;
+   ParseLocstart;
+   ParseLocend;
+} ListWithBoundary;

Implementation-wise, I would choose a location with a query length
rather than start and end locations.  That's what we do for the nested
queries in the DMLs, so on consistency grounds..

> * 0004: implements external parameter squashing.

+static void
+_jumbleParam(JumbleState *jstate, Node *node)
+{
[...]
+   if (expr->paramkind == PARAM_EXTERN)
+   {
+   RecordExpressionLocation(jstate, expr->location, -1, true);
+
+   if (expr->paramid > jstate->highest_extern_param_id)
+   jstate->highest_extern_param_id = expr->paramid;
+   }

Using a custom implementation for Param nodes means that we are going
to apply a location record for all external parameters, not only the
ones in the lists.  Not sure if this is a good idea.  Something
smells a bit wrong with this approach.  Sorry, I cannot put my finger
on what exactly while typing this paragraph.

> While I think we should get all patches in for v18, I definitely
> think we need to get the first 3 because they fix existing
> bugs.
> 
> What do you think?

Patches 0002 and 0003 fix bugs in the squashing logic present only on
HEAD, nothing that impacts older branches already released, right?
--
Michael


signature.asc
Description: PGP signature


Re: Speed up JSON escape processing with SIMD plus other optimisations

2025-05-27 Thread John Naylor
On Wed, May 28, 2025 at 10:18 AM David Rowley  wrote:

> I think we'll end up needing some SWAR code. There are plenty of
> places where 16 bytes is too much to do at once. e.g. looking for the
> delimiter character in a COPY FROM, 16 is likely too many when you're
> important a bunch of smallish ints. A 4 or 8 byte SWAR search is
> likely better for that. With 16 you're probably going to find a
> delimiter every time you look and do byte-at-a-time processing to find
> that delimiter.

I believe I saw an implementation or paper where the delimiter
locations for an input segment are turned into a bitmap, and in that
case the stride wouldn't matter so much. I've seen a use of SWAR in a
hash table that intentionally allowed bytes to overflow into the next
higher byte, which doesn't happen in SIMD lanes, so it can be useful
on all platforms.

> Isn't that mostly a performance regression?

D'oh! My brain was imagining time, not TPS.

> How does it do with ANSI chars where the high bit is set?

It would always fail in that case.

> I had in mind we'd have a swar.h header and have a bunch of inline
> functions for this in there. I've not yet studied how well compilers
> would inline multiple such SWAR functions to de-duplicate the common
> parts.

That would be a good step when we have a use case, and with that we
might also be able to clean up some odd-looking code in simd.h.

-- 
John Naylor
Amazon Web Services