date:20250425

Bertrand Drouvot  writes:
> While working on wait events I faced some compilation issues due to circular
> header file dependency (introduced in fa88928470b5) between wait_event.h and
> wait_event_types.h.

Ugh.  I still carry the scars of cleaning up after a previous
circular-inclusion mess (cf 1609797c2), so I'm always worried about
introducing new cases.  I don't have an opinion about whether this
specific refactoring is the best way to deal with this case, but
I definitely feel that we mustn't allow the situation to persist.

> Out of curiosity, I ran clang-tidy with misc-header-include-cycle ([1]) and it
> also reports:
> ../src/pl/plpython/plpy_util.h:9:10: warning: circular header file dependency 
> detected while including 'plpython.h'
> This one worries me less because plpy_util.h only contains simple external 
> function declarations.

Whatever it contains, we need to kill it with fire before the problem
metastasizes like it did the last time.  (yeah, yeah, badly mixed
metaphors)  I can take a look at this one over the weekend if nobody
beats me to it.

I am very glad to hear that there's a readily available tool to
catch such cases.  We ought to run it every so often.

regards, tom lane

Re: Sanding down some edge cases for PL/pgSQL reserved words

"Joel Jacobson"  writes:
> For years, I've felt we could benefit from introducing convenience syntax to
> explicitly require that exactly one row is affected by a query, something 
> which
> currently requires using a somewhat cumbersome workaround:

> - Using `... INTO STRICT ...` for `SELECT`,
> - Using `RETURNING ... INTO STRICT ...` for `DELETE/UPDATE/INSERT`, or
> - Checking `ROW_COUNT` via `GET DIAGNOSTICS` and raising an error if not 
> exactly one row.

> I think it would be more convenient and intuitive if we could simply write:

> ```
> STRICT [SELECT | UPDATE | INSERT | DELETE] ...;
> ```

Meh.  I don't really have an opinion on whether this is worth bespoke
syntax, but if it is:

(1) I don't see why we'd restrict it to plpgsql as opposed to
implementing it in core SQL.

(2) Putting the keyword at the front seems fairly un-SQL-like.
For SELECT, "SELECT STRICT ..." would seem more natural, as it calls
back to SELECT DISTINCT; or you could imagine integrating it into the
LIMIT clause.  Not as sure what to do for the DML commands, but
somewhere near where we put RETURNING seems saner.

Also, even if we did do it in plpgsql exactly as you suggest, making
it unreserved doesn't move the needle on whether that's possible.
Most of plpgsql's statement-starting keywords are unreserved.

But please, don't hijack this thread for that discussion ...

regards, tom lane

Avoid circular header file dependency

2025-04-25 Thread Bertrand Drouvot

Hi hackers,

While working on wait events I faced some compilation issues due to circular
header file dependency (introduced in fa88928470b5) between wait_event.h and
wait_event_types.h. Those files have include guards but could still lead to
compilation issues in some cases due to the circular dependency. 

Currently, on HEAD, this doesn't cause any issues but I think it's better to
avoid circular header file dependencies (harder to maintain and understand).

Please find attached a patch to $SUBJECT between those 2 header files: it
extracts (in a new header file) from wait_event.h what is strictly needed in
wait_event_types.h.

Out of curiosity, I ran clang-tidy with misc-header-include-cycle ([1]) and it
also reports:

../src/pl/plpython/plpy_util.h:9:10: warning: circular header file dependency 
detected while including 'plpython.h'

This one worries me less because plpy_util.h only contains simple external 
function declarations.

I was surprised that clang-tidy does report only the plpy_util.h one (in 
addition
to the wait_event_types.h one) so I wondered if misc-header-include-cycle was
failing to discover circular dependencies in nested cases.

I did a quick test with: a.h → c.h → b.h → d.h → a.h and it found it:

"
../d.h:6:10: warning: circular header file dependency detected while including 
'a.h'
6 | #include "a.h"
  |  ^
../b.h:6:10: note: 'd.h' included from here
6 | #include "d.h"
  |  ^
../c.h:6:10: note: 'b.h' included from here
6 | #include "b.h"
  |  ^
../a.h:6:10: note: 'c.h' included from here
6 | #include "c.h"
  |  ^
../main.c:2:10: note: 'a.h' included from here
2 | #include "a.h"
"

So it looks like that our code base only contains those 2: plpy_util.h and
wait_event_types.h cases.

Thoughts?

[1]: 
https://clang.llvm.org/extra/clang-tidy/checks/misc/header-include-cycle.html

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From e001863ab8645b5b6c2021c857ac541e230398dd Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Thu, 24 Apr 2025 17:11:03 +
Subject: [PATCH v1] Avoid including wait_event.h in wait_event_types.h

Adding wait_class_constants.h to avoid including wait_event.h in
wait_event_types.h (that produced a circular include).
---
 .../activity/generate-wait_event_types.pl |  2 +-
 src/include/utils/wait_class_constants.h  | 29 +++
 src/include/utils/wait_event.h| 17 ++-
 3 files changed, 32 insertions(+), 16 deletions(-)
 create mode 100644 src/include/utils/wait_class_constants.h

diff --git a/src/backend/utils/activity/generate-wait_event_types.pl b/src/backend/utils/activity/generate-wait_event_types.pl
index 171bf2ae632..52b6b7b2ae6 100644
--- a/src/backend/utils/activity/generate-wait_event_types.pl
+++ b/src/backend/utils/activity/generate-wait_event_types.pl
@@ -168,7 +168,7 @@ if ($gen_code)
 	printf $h $header_comment, 'wait_event_types.h';
 	printf $h "#ifndef WAIT_EVENT_TYPES_H\n";
 	printf $h "#define WAIT_EVENT_TYPES_H\n\n";
-	printf $h "#include \"utils/wait_event.h\"\n\n";
+	printf $h "#include \"utils/wait_class_constants.h\"\n\n";
 
 	printf $c $header_comment, 'pgstat_wait_event.c';
 
diff --git a/src/include/utils/wait_class_constants.h b/src/include/utils/wait_class_constants.h
new file mode 100644
index 000..286f9c2daec
--- /dev/null
+++ b/src/include/utils/wait_class_constants.h
@@ -0,0 +1,29 @@
+/*-
+ * wait_class_constants.h
+ *	  Constants related to wait classes
+ *
+ * Copyright (c) 2001-2025, PostgreSQL Global Development Group
+ *
+ * src/include/utils/wait_class_constants.h
+ * --
+ */
+#ifndef WAIT_CLASS_CONSTANTS_H
+#define WAIT_CLASS_CONSTANTS_H
+
+
+/* --
+ * Wait Classes
+ * --
+ */
+#define PG_WAIT_LWLOCK0x0100U
+#define PG_WAIT_LOCK0x0300U
+#define PG_WAIT_BUFFERPIN			0x0400U
+#define PG_WAIT_ACTIVITY			0x0500U
+#define PG_WAIT_CLIENT0x0600U
+#define PG_WAIT_EXTENSION			0x0700U
+#define PG_WAIT_IPC	0x0800U
+#define PG_WAIT_TIMEOUT0x0900U
+#define PG_WAIT_IO	0x0A00U
+#define PG_WAIT_INJECTIONPOINT		0x0B00U
+
+#endif			/* WAIT_CLASS_CONSTANTS_H */
diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h
index b8cb3e5a430..c3c1ce6ef2e 100644
--- a/src/include/utils/wait_event.h
+++ b/src/include/utils/wait_event.h
@@ -10,21 +10,8 @@
 #ifndef WAIT_EVENT_H
 #define WAIT_EVENT_H
 
-
-/* --
- * Wait Classes
- * --
- */
-#define PG_WAIT_LWLOCK0x0100U
-#define PG_WAIT_LOCK0x0300U
-#define PG_WAIT_BUFFERPIN			0x0400U
-#define PG_WAIT_ACTIVITY			0x0500U
-#define PG_WAIT_CLIENT0x0600U
-#define PG_WAIT_EXTENSION			0x0700U
-#define PG_WAIT_IPC	0x0800U
-#define PG_WAIT_TIMEOUT0x0900U
-#define P

Re: Adding pg_dump flag for parallel export to pipes

2025-04-25 Thread Thomas Munro

On Tue, Apr 8, 2025 at 7:48 AM Hannu Krosing  wrote:
> Just to bring this out separately : Does anybody have any idea why pipe 
> commands close inside tests ?
>
> Re: 003-pg_dump_basic_tests has a few basic validation tests for
> correctmflag combinations. We need to write more automated tests in
> 002_pg_dump.pl but have been running into some issues with environment
> setup due to which certain pipe commands result in the shell process
> becoming defunct. These same commands are working fine in manual
> testing. We are still looking into this.

No comment on the wider project except that it looks generally useful,
and I can see that it's not possible to use the conventional POSIX
filename "-" to represent stdout, because you need to write to
multiple files so you need to come up with *something* along the lines
you're proposing here.  But I was interested in seeing if I could help
with that technical problem you mentioned above, and I don't see that
happening with the current patches.  Do I understand correctly that
the problem you encountered is in some other tests that you haven't
attached yet?  Could you post what you have so that others can see the
problem and perhaps have a chance of helping?  I also recommend using
git format-patch when you post patches so that you have a place to
write a commit message including a note about which bits are WIP and
known not to work correctly yet.

Re: Sanding down some edge cases for PL/pgSQL reserved words

2025-04-25 Thread Joel Jacobson

On Sat, Apr 26, 2025, at 06:44, Tom Lane wrote:
> This is a rather delayed response to the discussion of bug
> #18693 [1], in which I wrote:
...
> which is pretty bogus because the record *does* have a field
> named "strict".  The actual problem is that STRICT is a fully
> reserved PL/pgSQL keyword, which means you need to double-quote
> it if you want to use it this way.

I'd like to briefly raise an old nostalgic PL/pgSQL dream of mine that might be
affected by this change.

For years, I've felt we could benefit from introducing convenience syntax to
explicitly require that exactly one row is affected by a query, something which
currently requires using a somewhat cumbersome workaround:

- Using `... INTO STRICT ...` for `SELECT`,
- Using `RETURNING ... INTO STRICT ...` for `DELETE/UPDATE/INSERT`, or
- Checking `ROW_COUNT` via `GET DIAGNOSTICS` and raising an error if not 
exactly one row.

I think it would be more convenient and intuitive if we could simply write:

```
STRICT [SELECT | UPDATE | INSERT | DELETE] ...;
```

That is, allowing `STRICT` followed directly by any regular `SELECT`, `UPDATE`,
`INSERT`, or `DELETE` command, explicitly enforcing exactly one affected row.

Changing `STRICT` to become an unreserved keyword in PL/pgSQL would effectively
close the window of opportunity for this syntax, as it would introduce ambiguity
in command parsing.

I was actually not aware of STRICT already being a reserved PL/pgSQL keyword.
Had I known that, I would have proposed this convenience syntax already since
a long time ago.

I wonder how often developers truly need to use "strict" as a field name versus
the potential usage of a clean and explicit syntax for enforcing single-row
results without additional verbosity.

/Joel

Re: Making sslrootcert=system work on Windows psql

> On 25 Apr 2025, at 13:53, Daniel Gustafsson  wrote:
>> 
>>> (2) sslrootcert=system on Windows doesn’t do a thing that would be 
>>> extremely useful in some common situations. Namely: connecting securely to 
>>> servers that present a certificate signed by a public CA.
>> 
>> Just to be clear, does (2) happens when the OpenSSL installation has a bogus
>> OPENSSLDIR value, or does it happen regardless?
> 
> I would still like to get clarity on this, do you have any insights here?

I can tell you what happens on my Windows 11 system with Postgres 17 via the 
EDB installer, which has a non-bogus OPENSSLDIR.

OpenSSL appears to have been built with OPENSSLDIR="C:\Program Files\Common 
Files\SSL".

This is a valid path, the directory exists, and it contains a few *.cnf files. 
I’m pretty sure the EDB installer created and populated this directory.

However, the directory contains no certificates, and its location (or 
existence) are not advertised anywhere. You would have to know it must exist 
somewhere, and then hunt about for it.

(1) In this original state, attempting a connection produces this error:

c:\Program Files\PostgreSQL\17>.\bin\psql.exe 
"postgresql://.../neondb?sslrootcert=system"
psql: error: connection to server at "…" (...), port 5432 failed: SSL error: 
unregistered scheme

(2) If I create either an empty folder "C:\Program Files\Common 
Files\SSL\certs" or an empty file "C:\Program Files\Common Files\SSL\cert.pem", 
the error changes:

c:\Program Files\PostgreSQL\17>.\bin\psql.exe 
"postgresql://.../neondb?sslrootcert=system"
psql: error: connection to server at "…" (...), port 5432 failed: SSL error: 
certificate verify failed

(3) Or if I download the curl/Mozilla certificates list to "C:\Program 
Files\Common Files\SSL\cert.pem", the connection then works as expected:

c:\Program Files\PostgreSQL\17>.\bin\psql.exe 
"postgresql://.../neondb?sslrootcert=system"
psql (17.3, server 16.8)
WARNING: Console code page (850) differs from Windows code page (1252)
 8-bit characters might not work correctly. See psql reference
 page "Notes for Windows users" for details.
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: 
off, ALPN: postgresql)
Type "help" for help.

neondb=>

(But sslrootcert=system is obviously not very useful in this case: I might as 
well just point sslrootcert directly at that cert.pem file in whatever location 
I like).

Re: Allow io_combine_limit up to 1MB

Andres Freund  writes:
> void
> assign_io_max_combine_limit(int newval, void *extra)
> {
>   io_max_combine_limit = newval;
>   io_combine_limit = Min(io_max_combine_limit, io_combine_limit_guc);
> }
> void
> assign_io_combine_limit(int newval, void *extra)
> {
>   io_combine_limit_guc = newval;
>   io_combine_limit = Min(io_max_combine_limit, io_combine_limit_guc);
> }

> So we end up with a larger io_combine_limit than
> io_max_combine_limit. Hilarity ensues.

There's another thing that's rather tin-eared about these assign
hooks: the hook is not supposed to be setting the GUC variable by
itself.  guc.c will do that.  It's relatively harmless for these
int-valued GUCs to be assigned twice, but I think it might be
problematic if this coding pattern were followed for a string GUC.
I suggest instead

void
assign_io_max_combine_limit(int newval, void *extra)
{
io_combine_limit = Min(newval, io_combine_limit_guc);
}

void
assign_io_combine_limit(int newval, void *extra)
{
io_combine_limit = Min(io_max_combine_limit, newval);
}

> Besides the obvious fix, I think we should also add
>   Assert(len <= io_max_combine_limit);

+1

regards, tom lane

Re: MergeAppend could consider sorting cheapest child path

2025-04-25 Thread Andrei Lepikhov

On 4/25/25 11:16, Alexander Pyhalov wrote:

Andrei Lepikhov писал(а) 2025-04-24 16:01:

On 3/28/25 09:19, Alexander Pyhalov wrote:
In the attachment, see the patch written according to the idea. There
are I introduced two new routines:

get_cheapest_path_for_pathkeys_ext
get_cheapest_fractional_path_for_pathkeys_ext

Hi. I'm a bit confused that

Thanks for the participation!

get_cheapest_fractional_path_for_pathkeys_ext() looks only on sorting
cheapest fractional path, and get_cheapest_path_for_pathkeys_ext() in
STARTUP_COST case looks only on sorting cheapest_startup_path.
At first, at the moment, I don't understand why we calculate the
cheapest_startup path at all. In my opinion, after commit 6b94e7a [1,
2], the min-fractional path totally covers the case. I began this
discussion in [3] - maybe we need to scrutinise that issue beforehand.

Looking into the min-fractional-path + Sort, we propose a path for the
case when extracting minor part of tuples with sorting later is cheaper
than doing a massive job of non-selective index scan. You also may
imagine the case of a JOIN as a subpath: non-sorted, highly selective
parameterised NestLoop may be way more optimal than MergeJoin, which
fits the pathkeys.

Usually, sorted cheapest_total_path will be cheaper than sorted
fractional/startup path at least by startup cost (as after sorting it
includes total_cost of input path). But we ignore this case when
selecting cheapest_startup and cheapest_fractional subpaths. As result
selected cheapest_startup and cheapest_fractional can be not cheapest
for startup or selecting a fraction of rows.
I don't know what you mean by that. The cheapest_total_path is
considered when we chose optimal cheapest_total path. The same works for
the fractional path - get_cheapest_fractional_path gives us the most
optimal fractional path and probes cheapest_total_path too.
As above, not sure about min-startup case for now. I can imagine
MergeAppend over sophisticated subquery: non-sorted includes highly
parameterised JOINs and the alternative (with pathkeys) includes
HashJoin, drastically increasing startup cost. It is only a theory, of
course. So, lets discover how min-startup works.

At the end, when the sorted path already estimated, we each time compare
it with previously selected pathkeys-cheapest. So, if the sorted path is
worse, we end up with the original path and don't lose anything.

[1]
https://www.postgresql.org/message-id/e8f9ec90-546d-e948-acce-0525f3e92773%40enterprisedb.com
[2]
https://www.postgresql.org/message-id/1581042da8044e71ada2d6e3a51bf7bb%40index.de
[3]
https://www.postgresql.org/message-id/f0206ef2-6b5a-4d07-8770-cfa7cd30f...@gmail.com

--
regards, Andrei Lepikhov

Re: ZStandard (with dictionaries) compression support for TOAST compression

2025-04-25 Thread Nikhil Kumar Veldanda

Hi Michael,

Thanks for the suggestions. I agree that we should first solve the
“last–free-bit” problem in varattrib_4b compression bits before
layering on any features. Below is the approach I’ve prototyped to
keep the header compact yet fully extensible, followed by a sketch of
the plain-ZSTD(no dict) patch that sits cleanly on top of it.

1. Minimal but extensible header

/* varatt_cmp_extended follows va_tcinfo when the upper two bits of
 * va_tcinfo are 11.  Compressed data starts immediately after
 * ext_data.  ext_hdr encodes both the compression algorithm and the
 * byte-length of the algorithm-specific metadata.
 */
typedef struct varatt_cmp_extended
{
uint32 ext_hdr; /* [ meta_size:24 | cmpr_id:8 ] */
char   ext_data[FLEXIBLE_ARRAY_MEMBER];  /* optional metadata */
} varatt_cmp_extended;

a. 24 bits for length → per-datum compression algorithm metadata is
capped at 16 MB, which is far more than any realistic compression
header.
b. 8 bits for algorithm id → up to 256 algorithms.
c. Zero-overhead when unused if an algorithm needs no per-datum
metadata (e.g., ZSTD-nodict),

2. Algorithm registry
/*
 * TOAST compression methods enumeration.
 *
 * Each entry defines:
 *   - NAME : identifier for the compression algorithm
 *   - VALUE: numeric enum value
 *   - METADATA type: struct type holding extra info (void when none)
 *
 * The INVALID entry is a sentinel and must remain last.
 */
#define TOAST_COMPRESSION_LIST  \
X(PGLZ, 0, void) /* existing */ \
X(LZ4,  1, void) /* existing */ \
X(ZSTD_NODICT,  2, void) /* new, no metadata */ \
X(ZSTD_DICT,3, zstd_dict_meta)   /* new, needs dict_id */   \
X(INVALID,  4, void) /* sentinel */

typedef enum ToastCompressionId
{
#define X(name,val,meta) TOAST_##name##_COMPRESSION_ID = val,
TOAST_COMPRESSION_LIST
#undef X
} ToastCompressionId;

/* Example of an algorithm-specific metadata block */
typedef struct
{
uint32 dict_id; /* dictionary Oid */
} zstd_dict_meta;

3. Resulting on-disk layouts for zstd

ZSTD no dict: datum ondisk layout:
+--+
| va_header (uint32)   |
+--+
| va_tcinfo (uint32)  |  (11 in top two bits specify extended)
+--+
| ext_hdr (uint32)|  <-- [ meta size:24 bits |
compression id:8 bits ]
+--+
| Compressed bytes …   |  <-- zstd   (no dictionary)
+--+

ZSTD dict: datum ondisk layout
+--+
| va_header (uint32)   |
+--+
| va_tcinfo (uint32)  |
+--+
| ext_hdr (uint32)|  <-- [ meta size:24 bits |
compression id:8 bits ]
+--+
| dict_id (uint32)  |  <-- zstd_dict_meta
+--+
| Compressed bytes …   |  <-- zstd   (dictionary)
+--+

4. How does this fit?

Flexibility: Each new algorithm that needs extra metadata simply
defines its own struct and allocates varatt_cmp_extended in
setup_compression_info.
Storage: Everything in varatt_cmp_extended is copied to the datum,
immediately followed by the compressed payload.
Optional, pay-as-you-go metadata – only algorithms that need it pay for it.
Future-proof – new compression algorithms, requires any kind of
metadata like dictid or any other slot into the same ext_data
mechanism.

I’ve split the work into two patches for review:
v19-0001-varattrib_4b-design-proposal-to-make-it-extended.patch:
varattrib_4b extensibility – adds varatt_cmp_extended, enum plumbing,
and macros; behaviour unchanged.
v19-0002-zstd-nodict-support.patch: Plain ZSTD (non dict) support.

Please share your thoughts—and I’d love to hear feedback on the design. Thanks!

On Mon, Apr 21, 2025 at 12:02 AM Michael Paquier  wrote:
>
> On Fri, Apr 18, 2025 at 12:22:18PM -0400, Robert Haas wrote:
> > I think we could add plain-old zstd compression without really
> > tackling this issue, but if we are going to add dictionaries then I
> > think we might need to revisit the idea of preventing things from
> > leaking out of tables. What I can't quite remember at the moment is
> > how much of the problem was that it was going to be slow to force the
> > recompression, and how much of it was that we weren't sure we could
> > even find all the places in the code that might need such handling.
>
> FWIW, this point resonates here.  There is one thing that we have to
> do anyway: we just have one bit left in the varlena headers as lz4 is
> using the one before last.  So we have to make it extensible, even if
> it means that any compression method other than LZ4 and pglz would
> consume one mor

Re: Introduce some randomness to autovacuum

2025-04-25 Thread Nikita Malakhov

Hi!

I agree it is a good idea to shift the table list. Although vacuuming
larger tables first
is a questionable approach because smaller ones could wait a long time to
be vacuumed.
It looks like the most obvious and simple way is that the first table to be
vacuumed
should not be the first one from the previous iteration.

On Fri, Apr 25, 2025 at 6:04 PM wenhui qiu  wrote:

> Hi，I like your idea，It would be even better if the weights could be taken
> according to the larger tables
>
>
-- 
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/

Re: Making sslrootcert=system work on Windows psql

> On Thu, 24 Apr 2025 at 23:52, Jelte Fennema-Nio  wrote:
> 
>> How about we add a *compile time*
>> option that allows the person that compiles libpq to choose which cert
>> store it should use if sslrootcert=system is provided. Something like
>> --system-cert-store=openssl and --system-cert-store=winstore flags for
>> ./configure.
> 
> @George So basically my suggestion is to make the behaviour that your
> patch introduces configurable at compile time. FWIW my vote would
> probably be to default to --system-cert-store=winstore if it's
> available. And then --system-cert-store=openssl would be a way out for
> people that took the effort to configure openssl correctly on Windows.

👍 I think that’s a pretty nice idea.

On the other hand, what are the specific objections to doing it dynamically, 
the way my patch does? I think that has backwards-compatibility quite well 
covered.

Is the main concern that users may be surprised that the behaviour of psql 
changes if they later set one of the OpenSSL environment variables or put cert 
files in OPENSSLDIR? I feel like that would be quite rare and also a pretty 
safe failure mode.

doc: Some copy-editing around prefix operators

2025-04-25 Thread Christoph Berg

While reading the CREATE OPERATOR docs around prefix operators, I noticed
a few places that could be improved.

Christoph
>From 478aeea4fe44c9c3462aaf18dbf04a6ed6f74ebc Mon Sep 17 00:00:00 2001
From: Christoph Berg 
Date: Fri, 25 Apr 2025 12:46:36 +0200
Subject: [PATCH v1] doc: Some copy-editing around prefix operators

When postfix operators where dropped in 1ed6b8956, the CREATE OPERATOR
docs were not updated to make the RIGHTARG argument mandatory in the
grammar.

While at it, make the RIGHTARG docs more concise. Also, the operator
docs were mentioning "infix" in the introduction, while using "binary"
everywhere else.
---
 doc/src/sgml/ref/create_operator.sgml | 6 +++---
 doc/src/sgml/xoper.sgml   | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/create_operator.sgml b/doc/src/sgml/ref/create_operator.sgml
index 3553d364541..d2ffb1b2a50 100644
--- a/doc/src/sgml/ref/create_operator.sgml
+++ b/doc/src/sgml/ref/create_operator.sgml
@@ -23,7 +23,7 @@ PostgreSQL documentation
 
 CREATE OPERATOR name (
 {FUNCTION|PROCEDURE} = function_name
-[, LEFTARG = left_type ] [, RIGHTARG = right_type ]
+[, LEFTARG = left_type ] , RIGHTARG = right_type
 [, COMMUTATOR = com_op ] [, NEGATOR = neg_op ]
 [, RESTRICT = res_proc ] [, JOIN = join_proc ]
 [, HASHES ] [, MERGES ]
@@ -88,8 +88,8 @@ CREATE OPERATOR name (
 
   
For binary operators, both LEFTARG and
-   RIGHTARG must be defined.  For prefix operators only
-   RIGHTARG should be defined.
+   RIGHTARG must be defined.  For prefix operators, only
+   RIGHTARG must be defined.
The function_name
function must have been previously defined using CREATE
FUNCTION and must be defined to accept the correct number
diff --git a/doc/src/sgml/xoper.sgml b/doc/src/sgml/xoper.sgml
index 954a90d77d0..853b07a9f14 100644
--- a/doc/src/sgml/xoper.sgml
+++ b/doc/src/sgml/xoper.sgml
@@ -21,7 +21,7 @@
 
   
PostgreSQL supports prefix
-   and infix operators.  Operators can be
+   and binary (or infix) operators.  Operators can be
overloaded;overloadingoperators
that is, the same operator name can be used for different operators
that have different numbers and types of operands.  When a query is
-- 
2.47.2

Re: RFC: Additional Directory for Extensions

2025-04-25 Thread Christoph Berg

Re: David E. Wheeler
> +
> +make install prefix=/etc/postgresql

I'd use /usr/local/postgresql there. "/etc" is just wrong.

> +
> +This will install the control SQL files into
> +/etc/postgresql/share and shared modules into
> +/etc/postgresql/lib. If the prefix does not
> +include the strings postgresql or

Just "postgres", see src/Makefile.global.in:86.

> +pgsql, such as:
> +
> +make install prefix=/etc/extras

/usr/local/extras

> +
> +Then the postgresql directory will be appended io the
> +prefix, installing the control SQL files into

"the extension control and SQL files"

> +/etc/extras/postgresql/share and shared modules into

.../postgresql/share/extension because ...

> +/etc/extras/postgresql/lib. Either way, you'll need to
> +set  and  +linkend="guc-dynamic-library-path"/> to allow
> +PostgreSQL to find the files:
> +
> +extension_control_path = '/etc/extras/postgresql/share/extension:$system'

... it's used here.

Christoph

Re: Fix slot synchronization with two_phase decoding enabled

2025-04-25 Thread Nisha Moond

On Thu, Apr 24, 2025 at 3:57 PM shveta malik  wrote:
>
> On Thu, Apr 24, 2025 at 2:54 PM Nisha Moond  wrote:
> >
> > Please find the updated patch for Approach 3, which implements the
> > idea suggested by Swada-san above.
> >
>
> Thank You for the patch.
>
> 1)
>
> CreateDecodingContext:
>
>   if (ctx->twophase && !slot->data.two_phase)
>   {
> + /*
> + * Do not allow two-phase decoding for failover enabled slots.
> + *
> + * See comments atop the similar check in ReplicationSlotCreate() for
> + * a detailed reason.
> + */
> + if (slot->data.failover)
> + ereport(ERROR,
> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("cannot enable two-phase decoding for failover enabled slot \"%s\"",
> + NameStr(slot->data.name;
> +
>
> Can you please add tetscase to cover this scenario?
>

Added a test for the above scenario.

> 2)
> We shall update create-sub documents as well for these mutually
> exclusive options. Review other pages (alter-sub, create-slot) as well
> for any required change.
>

Updated docs where I felt this new change should be mentioned. Please
let me know if I missed any place.

> 3)
> +##
> +# Test that the failover option can be enabled for a two_phase enabled
> +# subscription.
> +##
>
> Suggestion: 'Test that the failover option can be enabled for a two_phase
> enabled subscription only through Alter Subscription (failover=true)'
>
>

Done.
~~~

Please find the attached v8 patch with above comments addressed.
This version includes the documentation updates suggested by
Sawada-san at [1], and incorporates his feedback from [2] as well.

[1] 
https://www.postgresql.org/message-id/CAD21AoD08Nb588fAJ%2BJd5xRxtAT8yWnOfZ3zq-K5sto9b3ntsA%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CAD21AoD08Nb588fAJ%2BJd5xRxtAT8yWnOfZ3zq-K5sto9b3ntsA%40mail.gmail.com

--
Thanks,
Nisha


v8-0001-PG17-Approach-3-Fix-slot-synchronization-for-two_.patch
Description: Binary data

Re: Making sslrootcert=system work on Windows psql

> On Fri, 25 Apr 2025 at 12:22, George MacKerron  wrote:
>> I know the documentation has now been changed to reflect that ‘system’ 
>> actually means OpenSSL.
> 
> I didn't realize that. I'm definitely not in favor of that doc change.
> It's describing behaviour that I believe is incorrect, as if it's
> actually intended.

The change was described in Daniel’s message on 3 April. It’s actually a bit 
subtler than I suggested. The diff is:

 The special value system may be specified instead, 
in
-which case the system's trusted CA roots will be loaded.
+which case the trusted CA roots from the SSL implementation will be 
loaded.

I agree with you here: the change makes the docs more correct, but the 
correctly-documented behaviour itself still seems incorrect to me.

I think a clue is that the word ‘system’ no longer appears in the updated 
version of text explaining what sslrootcert=system does!

Re: Make COPY format extendable: Extract COPY TO format implementations

2025-04-25 Thread Sutou Kouhei

Hi,

I've updated the patch set. See the attached v40 patch set.

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Wed, 23 Apr 2025 23:44:55 -0700,
  Masahiko Sawada  wrote:

>> Are the followings correct?
>>
>> 1. Move invalid input patterns in
>>src/test/modules/test_copy_format/sql/invalid.sql to
>>src/test/regress/sql/copy.sql as much as possible.
>> 2. Create
>>src/test/modules/test_copy_format/sql/test_copy_format.sql
>>and move all contents in existing *.sql to the file.
>> 3. Add comments what the tests expect to
>>src/test/modules/test_copy_format/sql/test_copy_format.sql.
>> 4. Remove CopyFormatOptions::{binary,csv_mode}.
> 
> Agreed with the above items.

Done except 1. because 1. is removed by 3. in the following
list:


>> There are 3 unconfirmed suggested changes for tests in:
>> https://www.postgresql.org/message-id/20250330.113126.433742864258096312.kou%40clear-code.com
>>
>> Here are my opinions for them:
>>
>> > 1.: There is no difference between single-quoting and
>> > double-quoting here. Because the information what quote
>> > was used for the given FORMAT value isn't remained
>> > here. Should we update gram.y?
>> >
>> > 2.: I don't have a strong opinion for it. If nobody objects
>> > it, I'll remove them.
>> >
>> > 3.: I don't have a strong opinion for it. If nobody objects
>> > it, I'll remove them.


0005 is added for 4. Could you squash 0004 ("Use copy
handler for bult-in formats") and 0005 ("Remove
CopyFormatOptions::{binary,csv_mode}") if needed when you
push?

>> 6. Use handler OID for detecting the default built-in format
>>instead of comparing the given format as string.

Done.

>> 7. Update documentation.

Could someone help this? 0007 is the draft commit for this.

>> There are 3 unconfirmed suggested changes for tests in:
>> https://www.postgresql.org/message-id/20250330.113126.433742864258096312.kou%40clear-code.com
>>
>> Here are my opinions for them:
>>
>> > 1.: There is no difference between single-quoting and
>> > double-quoting here. Because the information what quote
>> > was used for the given FORMAT value isn't remained
>> > here. Should we update gram.y?
>> >
>> > 2.: I don't have a strong opinion for it. If nobody objects
>> > it, I'll remove them.
>> >
>> > 3.: I don't have a strong opinion for it. If nobody objects
>> > it, I'll remove them.
>>
>> Is the 1. required for "ready for merge"? If so, is there
>> any suggestion? I don't have a strong opinion for it.
>>
>> If there are no more opinions for 2. and 3., I'll remove
>> them.
> 
> Agreed.

1.: I didn't do anything. Because there is no suggestion.

2., 3.: Done.

> I think we would still need some rounds of reviews but the patch is
> getting in good shape.

I hope that this is completed in this year...


Thanks,
-- 
kou
>From a81eb07a4c92b8b34ed6fbe6610c54bb9b3bb2e4 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei 
Date: Mon, 25 Nov 2024 13:58:33 +0900
Subject: [PATCH v40 1/6] Export CopyDest as private data

This is a preparation to export CopyToStateData as private data.

CopyToStateData depends on CopyDest. So we need to export CopyDest
too.

But CopyDest and CopySource has the same names. So we can't export
CopyDest as-is.

This uses the COPY_DEST_ prefix for CopyDest enum values. CopySource
uses the COPY_FROM_ prefix for consistency.
---
 src/backend/commands/copyfrom.c  |  4 ++--
 src/backend/commands/copyfromparse.c | 10 
 src/backend/commands/copyto.c| 30 
 src/include/commands/copyfrom_internal.h |  8 +++
 src/include/commands/copyto_internal.h   | 28 ++
 5 files changed, 49 insertions(+), 31 deletions(-)
 create mode 100644 src/include/commands/copyto_internal.h

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index fbbbc09a97b..b4dad744547 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1709,7 +1709,7 @@ BeginCopyFrom(ParseState *pstate,
 			pg_encoding_to_char(GetDatabaseEncoding();
 	}
 
-	cstate->copy_src = COPY_FILE;	/* default */
+	cstate->copy_src = COPY_SOURCE_FILE;	/* default */
 
 	cstate->whereClause = whereClause;
 
@@ -1837,7 +1837,7 @@ BeginCopyFrom(ParseState *pstate,
 	if (data_source_cb)
 	{
 		progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-		cstate->copy_src = COPY_CALLBACK;
+		cstate->copy_src = COPY_SOURCE_CALLBACK;
 		cstate->data_source_cb = data_source_cb;
 	}
 	else if (pipe)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index f5fc346e201..9f7171d1478 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -180,7 +180,7 @@ ReceiveCopyBegin(CopyFromState cstate)
 	for (i = 0; i < natts; i++)
 		pq_sendint16(&buf, format); /* per-column formats */
 	pq_endmessage(&buf);
-	cstate->copy_src = COPY_FRONTEND;
+	cstate->copy_src = COPY_SOURC

Re: why in appendShellStringNoError() loop still continues even after it found CR or LF?

2025-04-25 Thread Robert Haas

On Sat, Mar 29, 2025 at 8:37 AM Srinath Reddy  wrote:
> i have naive doubt $SUBJECT,AFAIK The reasoning would be this function is 
> designed to silently filter out \n and \r while still producing a usable 
> shell-safe argument. It informs the caller of the issue (false return value) 
> but does not abruptly stop execution,then the caller will decide what to do 
> but every place this function is called they are just throwing the error.

That is true, but one of the two call sites exits the program, and the
other does not, so the cases are different.

> if theres no proper reason to loop and continue we can use strpbrk and return 
> false when we found \n or \r.

I don't know what you mean by this. There are comments within
appendShellStringNoError and atop appendShellString which explain the
intended behavior, and the function seems to correctly implement that
behavior in a pretty straightforward way. While it's probably true
that someone could have chosen to use strpbrk, I don't see any real
advantage to that over the way it was actually done.

Returning as soon as an \r or \n is found would change the documented
behavior, so one couldn't make that change without carefully
considering the implications for the callers.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Making sslrootcert=system work on Windows psql

2025-04-25 Thread Daniel Gustafsson

> On 3 Apr 2025, at 16:26, Daniel Gustafsson  wrote:
> 
>> On 3 Apr 2025, at 14:41, George MacKerron  wrote:
> 
>> (2) sslrootcert=system on Windows doesn’t do a thing that would be extremely 
>> useful in some common situations. Namely: connecting securely to servers 
>> that present a certificate signed by a public CA.
> 
> Just to be clear, does (2) happens when the OpenSSL installation has a bogus
> OPENSSLDIR value, or does it happen regardless?

I would still like to get clarity on this, do you have any insights here?

--
Daniel Gustafsson

Re: extension_control_path and "directory"

2025-04-25 Thread Matheus Alcantara

On Thu, Apr 24, 2025 at 7:27 PM David E. Wheeler  wrote:
>
> On Apr 24, 2025, at 11:18, Matheus Alcantara  wrote:
>
> > In v2 I've moved the logic to remove the /extension to
> > parse_extension_control_file(), do you think that this Assert on this
> > function would still be wrong? IIUC we should always have /extension at
> > the end of "control_dir" at this place, because the
> > extension_control_path GUC will omit the /extension at the end and we
> > will force it to have the suffix on the path at
> > find_extension_control_filename() and
> > get_extension_control_directories() functions. I'm missing something
> > here?
>
> I took this patch for a spin and managed to make it core dump. How? Well I 
> installed semver with this command:
>
> ```sh
> make prefix=/Users/david/Downloads install
> ```
>
> Then set the search paths and restarted:
>
> ```ini
> extension_control_path = '/Users/david/Downloads/share/extension:$system'
> dynamic_library_path = '/Users/david/Downloads/lib:$libdir'
> ```
>
> Then I connected and ran `CREATE EXTENSION semver` and it segfaulted. I poked 
> around for a few minutes and realized that my prefix is not what I expected. 
> Because it doesn’t contain the string “postgres”, PGXS helpfully adds it. The 
> actual paths are:
>
> ```ini
> extension_control_path = 
> '/Users/david/Downloads/share/postgresql/extension:$system'
> dynamic_library_path = '/Users/david/Downloads/lib/postgresql:$libdir'
> ```
>
> With that fix it no longer segafulted.
>
> So I presume something crashes when a directory or file doesn’t exist.
>

Yes, you are right. The problem was where I was asserting
control->control_dir != NULL. I've moved the assert after the "if
(!filename)" check that returns an error if the extension was not found.

Attached v3 with this fix and also a TAP test for this scenario.

I'm just a bit confused how you get it working using /extension at the
end of extension_control_path since with this patch this suffix is not
necessary and since we hard coded append this it should return an error
when trying to search on something like
/Users/david/Downloads/share/postgresql/extension/extension

-- 
Matheus Alcantara


v3-0001-Make-directory-work-with-extension-control-path.patch
Description: Binary data

Re: on_error table, saving error info to a table

2025-04-25 Thread jian he

On Mon, Dec 16, 2024 at 7:50 PM Nishant Sharma
 wrote:
>
>
> 1) The new switch logic does not look correct to me. It will pass for
> a failing scenario. I think you can use v3's logic instead with below
> changes:-
>
> a)
> while (HeapTupleIsValid(atup = systable_getnext(ascan))) -->
> while (HeapTupleIsValid(atup = systable_getnext(ascan)) && on_error_tbl_ok)
>
> b)
> attcnt++; --> just before the "switch (attForm->attnum)".
>
> Thats it.
>
You are right about this.

On Tue, Dec 17, 2024 at 12:31 PM Kirill Reshke  wrote:
>
> On Mon, 16 Dec 2024 at 16:50, Nishant Sharma
>  wrote:
> > Also, I think Andrew's suggestion can resolve the concern me and Krill
> > had on forcing users to create tables with correct column names and
> > numbers. Also, will make error table checking simpler. No need for the
> > above kind of checks.
>
> +1 on that.
>

Syntax: COPY (on_error table, table error_saving_tbl);
seems not ideal.

but auto-create on_error table if this table does not exist, seems way
more harder.

Since we can not use SPI interface here, maybe we can use DefineRelation
also, to auto-create a table, what if table already exists, then
our operation would be stuck for not COPY related reason.
also auto-create means we need to come up with a magic table name for
all COPY (on_error table)
operations, which seems not ideal IMO.

i realized we should error out case like:
COPY err_tbl FROM STDIN WITH (DELIMITER ',', on_error table, table err_tbl);

also by changing copy_generic_opt_arg, now we can
COPY err_tbl FROM STDIN WITH (DELIMITER ',', on_error table, table  x);
previously, we can only do
COPY err_tbl FROM STDIN WITH (DELIMITER ',', on_error 'table', table  x);
From 4d5458c1c5de85b9f03c22727b92aec45f0cd73d Mon Sep 17 00:00:00 2001
From: jian he 
Date: Fri, 25 Apr 2025 21:43:39 +0800
Subject: [PATCH v5 1/1] COPY FROM (on_error table)

the syntax is {on_error table, table error_saving_tbl}.  we first check table
error_saving_tbl's existence and data definition.  if it does not meet our
criteria, then we quickly error out.

we also did preliminary check the lock of error saving table so the insert to
error saving table won't stuck.

once there is a error happened, we save the error metedata and insert it to the
error_saving_table. and continue to the next row.  That means for one row, we
can only catch the first field that have errors.

discussion: https://postgr.es/m/CACJufxH_OJpVra%3D0c4ow8fbxHj7heMcVaTNEPa5vAurSeNA-6Q%40mail.gmail.com
context:https://www.postgresql.org/message-id/752672.1699474336%40sss.pgh.pa.us
commitfest: https://commitfest.postgresql.org/51/4817/
---
 doc/src/sgml/ref/copy.sgml   | 119 -
 src/backend/commands/copy.c  |  29 
 src/backend/commands/copyfrom.c  | 202 ++-
 src/backend/commands/copyfromparse.c |  50 +-
 src/backend/parser/gram.y|   1 +
 src/include/commands/copy.h  |   8 +
 src/include/commands/copyfrom_internal.h |   1 +
 src/test/regress/expected/copy2.out  | 101 
 src/test/regress/sql/copy2.sql   |  84 ++
 9 files changed, 582 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8433344e5b6..b627d422b38 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -44,6 +44,7 @@ COPY { table_name [ ( column_name [, ...] ) | * }
 FORCE_NULL { ( column_name [, ...] ) | * }
 ON_ERROR error_action
+TABLE error_saving_table
 REJECT_LIMIT maxerror
 ENCODING 'encoding_name'
 LOG_VERBOSITY verbosity
@@ -395,15 +396,25 @@ COPY { table_name [ ( error_action value of
   stop means fail the command, while
-  ignore means discard the input row and continue with the next one.
+  ignore means discard the input row and continue with the next one,
+  table means save error details to error_saving_table
+  and continue with the next one.
   The default is stop.
  
  
-  The ignore option is applicable only for COPY FROM
+  The ignore and table option are applicable only for COPY FROM
   when the FORMAT is text or csv.
  
  
-  A NOTICE message containing the ignored row count is
+  If ON_ERROR=table,
+  a NOTICE message containing the row count that is saved to
+  error_saving_table is
+  emitted at the end of the COPY FROM if at least one
+  row was saved.
+ 
+
+ 
+  If ON_ERROR=ignore, a NOTICE message containing the ignored row count is
   emitted at the end of the COPY FROM if at least one
   row was discarded. When LOG_VERBOSITY option is set to
   verbose, a NOTICE message
@@ -463,6 +474,108 @@ COPY { table_name [ ( 

 
+   
+TABLE
+
+  
+Save error context details to the table error_saving_table.
+This option is allowed only in COPY FROM and
+ON_ERROR is specified with TABLE.
+It also require user have INSERT pri

Re: Improve verification of recovery_target_timeline GUC.

2025-04-25 Thread David Steele


On 4/24/25 20:12, Michael Paquier wrote:

On Thu, Apr 24, 2025 at 03:34:29PM +, David Steele wrote:


Having said that, I think these tests are awfully expensive for a single
GUC. Unit tests would definitely be preferable but that's not an option for
GUCs, so far as I know.


On my laptop, 003_recovery_targets.pl increases from 8.2s to 8.7s,
which seems OK here for the coverage we have.


Multiply half a second by similar tests on all the GUCs and it would add 
up to a lot. But no argument here on keeping the tests.



"invalid recovery startup fails" is used three times repeatedly for
the tests with bogus and out-of-bound values.  I'd suggest to make
these more verbose, with three extras "bogus value", "lower bound
check" and "upper bound check" added.


I have updated the labels to be more descriptive.


Side thing noted while reading the area: check_recovery_target_xid()
does not have any GUC_check_errdetail() giving details for the EINVAL
and ERANGE cases.  Your addition for recovery_target_timeline is a
good idea, so perhaps we could do the same there?  No need to do that,
that's just something I've noticed in passing..


I don't want to add that to this commit but I have added a note to my 
list of PG improvements to make when I have time.



And you are following the fat-comma convention for the command lines,
thanks for doing that.  


Just trying to follow the convention from existing tests, but you are 
welcome!



Note that I'm not planning to do anything here
for v18 now that we are in feature freeze, these would be for v19 once
the branch opens.


That was my expectation. I just had some time to get this patch updated 
so took the opportunity.


Regards,
-DavidFrom 5e8fa391e7d2171475dc8c86ef29fe3c9305d2e0 Mon Sep 17 00:00:00 2001
From: David Steele 
Date: Fri, 25 Apr 2025 13:37:41 +
Subject: Improve verification of recovery_target_timeline GUC.

Currently check_recovery_target_timeline() converts any value that is not
current, latest, or a valid integer to 0. So for example:

recovery_target_timeline = 'currrent'

results in the following error:

FATAL:  22023: recovery target timeline 0 does not exist

Since there is no range checking for uint32 (but there is a cast from unsigned 
long) this setting:

recovery_target_timeline = '99'

results in the following error:

FATAL:  22023: recovery target timeline 1410065407 does not exist

Improve by adding endptr checking to catch conversion errors and add range 
checking to
exclude values < 2 and greater than UINT_MAX.
---
 src/backend/access/transam/xlogrecovery.c   | 17 +--
 src/test/recovery/t/003_recovery_targets.pl | 50 +
 2 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/transam/xlogrecovery.c 
b/src/backend/access/transam/xlogrecovery.c
index 6ce979f2d8b..e82aa739d79 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -4994,13 +4994,24 @@ check_recovery_target_timeline(char **newval, void 
**extra, GucSource source)
rttg = RECOVERY_TARGET_TIMELINE_LATEST;
else
{
+   char *endp;
+   uint64  timeline;
rttg = RECOVERY_TARGET_TIMELINE_NUMERIC;
 
errno = 0;
-   strtoul(*newval, NULL, 0);
-   if (errno == EINVAL || errno == ERANGE)
+   timeline = strtou64(*newval, &endp, 0);
+
+   if (*endp != '\0' || errno == EINVAL || errno == ERANGE)
+   {
+   GUC_check_errdetail("\"%s\" is not a valid number.",
+   
"recovery_target_timeline");
+   return false;
+   }
+
+   if (timeline < 1 || timeline > UINT_MAX)
{
-   GUC_check_errdetail("\"recovery_target_timeline\" is 
not a valid number.");
+   GUC_check_errdetail("\"%s\" must be between %u and %u.",
+   
"recovery_target_timeline", 1, UINT_MAX);
return false;
}
}
diff --git a/src/test/recovery/t/003_recovery_targets.pl 
b/src/test/recovery/t/003_recovery_targets.pl
index 0ae2e982727..a963f46b7bb 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -187,4 +187,54 @@ ok( $logfile =~
  qr/FATAL: .* recovery ended before configured recovery target was 
reached/,
'recovery end before target reached is a fatal error');
 
+# Invalid timeline target
+$node_standby = PostgreSQL::Test::Cluster->new('standby_9');
+$node_standby->init_from_backup($node_primary, 'my_backup',
+   has_restoring => 1);
+$node_standby->append_conf(
+   'postgresql.conf', "recovery_target_timeline = 'bogus'");
+
+$res = run_log(
+   [
+   'pg_ctl',
+   '--pgd

Re: Allow io_combine_limit up to 1MB

Hi,

On 2025-03-18 16:18:17 +1300, Thomas Munro wrote:
> Here's a new version that also adjusts the code that just landed in
> da722699:

Something isn't quite right with this code.  If I just add -c
io_combine_limit=32 to the options and do a seqscan, I get odd
failures. Mostly assertion failures about buffers not being valid in
CheckReadBuffersOperation().

Sure glad I added CheckReadBuffersOperation(), that'd have been much nastier
to figure out without those assertion failures.

A bit of debugging later: Ie figured out that this is because io_combine_limit
is bigger than io_max_combine_limit, so the iovecs of one IO overlap with
another. With predictably hilarious results.

I think it might be a small thing:
Since our GUC system doesn't support dependencies or cross-checks
between GUCs, the user-settable one now assigns a "raw" value to
io_combine_limit_guc, and the lower of io_combine_limit_guc and
io_max_combine_limit is maintained in io_combine_limit.

However, the IO combine limit GUC still references io_combine_limit:

{
{"io_combine_limit",
PGC_USERSET,
RESOURCES_IO,
gettext_noop("Limit on the size of data reads and 
writes."),
NULL,
GUC_UNIT_BLOCKS
},
&io_combine_limit,
DEFAULT_IO_COMBINE_LIMIT,
1, MAX_IO_COMBINE_LIMIT,
NULL, assign_io_combine_limit, NULL
},

Therefore the GUC machinery undoes the work of io_combine_limit done in
assign_io_combine_limit:

/*
 * GUC assign hooks that recompute io_combine_limit whenever
 * io_combine_limit_guc and io_max_combine_limit are changed.  These are needed
 * because the GUC subsystem doesn't support dependencies between GUCs, and
 * they may be assigned in either order.
 */
void
assign_io_max_combine_limit(int newval, void *extra)
{
io_max_combine_limit = newval;
io_combine_limit = Min(io_max_combine_limit, io_combine_limit_guc);
}
void
assign_io_combine_limit(int newval, void *extra)
{
io_combine_limit_guc = newval;
io_combine_limit = Min(io_max_combine_limit, io_combine_limit_guc);
}

So we end up with a larger io_combine_limit than
io_max_combine_limit. Hilarity ensues.

Besides the obvious fix, I think we should also add
  Assert(len <= io_max_combine_limit);

to pgaio_io_set_handle_data_32/64, that'd have made the bug much more obvious.

Greetings,

Andres Freund

Introduce some randomness to autovacuum

2025-04-25 Thread Junwang Zhao

Hi hackers,

After watching Robert's talk[1] on autovacuum and participating in the related
workshop yesterday, it appears that people are inclined to use prioritization
to address the issues highlighted in Robert's presentation. Here I list two
of the failure modes that were discussed.

- Spinning. Running repeatedly on the same table but not accomplishing
anything useful.
- Starvation. autovacuum can't vacuum everything that needs vacuuming.
- ...

The prioritization way needs some basic stuff that postgres doesn't have now.

I had a random thought that introducing some randomness might help
mitigate some of the issues mentioned above. Before performing vacuum
on the collected tables, we could rotate the table_oids list by a random
number within the range [0, list_length(table_oids)]. This way, every table
would have an equal chance of being vacuumed first, thus no spinning and
starvation.

Even if there is a broken table that repeatedly gets stuck, this random
approach would still provide opportunities for other tables to be vacuumed.
Eventually, the system would converge.

The change is something like the following, I haven't tested the code,
just posted it here for discussion, let me know your thoughts.

diff --git a/src/backend/postmaster/autovacuum.c
b/src/backend/postmaster/autovacuum.c
index 16756152b71..6273d22 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -79,6 +79,7 @@
 #include "catalog/pg_namespace.h"
 #include "commands/dbcommands.h"
 #include "commands/vacuum.h"
+#include "common/pg_prng.h"
 #include "common/int.h"
 #include "lib/ilist.h"
 #include "libpq/pqsignal.h"
@@ -2267,6 +2268,25 @@ do_autovacuum(void)

   "Autovacuum Portal",

   ALLOCSET_DEFAULT_SIZES);

+   /*
+* Randomly rotate the list of tables to vacuum.  This is to avoid
+* always vacuuming the same table first, which could lead to spinning
+* on the same table or vacuuming starvation.
+*/
+   if (list_length(table_oids) > 2)
+   {
+   int rand = 0;
+   static pg_prng_state prng_state;
+   List   *tmp_oids = NIL;
+
+   pg_prng_seed(&prng_state, (uint64) (getpid() ^ time(NULL)));
+   rand = (int) pg_prng_uint64_range(&prng_state, 0,
list_length(table_oids) - 1);
+   if (rand != 0) {
+   tmp_oids = list_copy_tail(table_oids, rand);
+   table_oids = list_copy_head(table_oids,
list_length(table_oids) - rand);
+   table_oids = list_concat(table_oids, tmp_oids);
+   }
+   }
/*
 * Perform operations on collected tables.
 */


[1] How Autovacuum Goes Wrong: And Can We Please Make It Stop Doing
That? https://www.youtube.com/watch?v=RfTD-Twpvac


-- 
Regards
Junwang Zhao

Re: Making sslrootcert=system work on Windows psql

2025-04-25 Thread Daniel Gustafsson

> On 25 Apr 2025, at 15:40, George MacKerron  wrote:
> 
>> On 25 Apr 2025, at 13:53, Daniel Gustafsson  wrote:
>>> 
 (2) sslrootcert=system on Windows doesn’t do a thing that would be 
 extremely useful in some common situations. Namely: connecting securely to 
 servers that present a certificate signed by a public CA.
>>> 
>>> Just to be clear, does (2) happens when the OpenSSL installation has a bogus
>>> OPENSSLDIR value, or does it happen regardless?
>> 
>> I would still like to get clarity on this, do you have any insights here?
> 
> I can tell you what happens on my Windows 11 system with Postgres 17 via the 
> EDB installer, which has a non-bogus OPENSSLDIR.

Thanks for confirming.

> OpenSSL appears to have been built with OPENSSLDIR="C:\Program Files\Common 
> Files\SSL".
> 
> This is a valid path, the directory exists, and it contains a few *.cnf 
> files. I’m pretty sure the EDB installer created..

It did, CVE-2019-10211 has more details.

> ..and populated this directory.

The contents most likely come from building OpenSSL, by the sounds of it that's
the stock OPENSSLDIR setup.

--
Daniel Gustafsson

XACT_EVENT_COMMIT placement

2025-04-25 Thread Yurii Rashkovskii

Hi there,

There is an issue with the existing placement of `XACT_EVENT_COMMIT` in
`CommitTransaction()`. The place where it is called now (and has been
called for many versions before) is a bit too early – it occurs at a time
when other backends might not be able to see what this backend has
committed yet.

I discovered this when I used the `XACT_EVENT_COMMIT` to start a background
worker upon creation of an extension (as part of the installation script).
The symptom was that sometimes the background worker would see the [DDL]
proceeds of the transaction, and sometimes it would not. The case extends
beyond extensions, of course – any such notification to other backends or
external systems using those backends is subject to it.

If you look where CallXactCallbacks is called [1], you can scroll down and
see the following comment: "Make catalog changes visible to all backends."
[2] which was my clue as to what was happening.

I've solved this with a hack. I've observed that `TopTransactionContext` is
being reset even further down the road [3], so I used a memory context
callback to slingshot [4] my actual callback (in the original case, the one
starting a background worker) there. It has worked remarkably well for
quite some time. But it's a hack nevertheless.

Bottom line, I'd like to discuss further course of action and options:

(a) Do we move CallXactCallbacks [1] after the invalidation [2]? Could this
negatively impact existing implementations?
(b) Do we add a new event `XACT_EVENT_POST_COMMIT` and call it after [2] or
even after memory context reset [3]? Are there many implementations of this
callback that don't switch over the event type correctly, and would suffer
from the introduction of the new event?

Personally, out of caution, I currently favor option (b), but I'd greatly
appreciate your thoughts and insights.

[1]
https://github.com/postgres/postgres/blob/0787646e1dce966395f211fb9475dcab32daae70/src/backend/access/transam/xact.c#L2407-L2408
[2]
https://github.com/postgres/postgres/blob/0787646e1dce966395f211fb9475dcab32daae70/src/backend/access/transam/xact.c#L2426-L2433
[3]
https://github.com/postgres/postgres/blob/0787646e1dce966395f211fb9475dcab32daae70/src/backend/access/transam/xact.c#L2486
[4] https://github.com/pgedc/book/blob/master/memcxt_slingshot.md

-- 
Founder at Omnigres
https://omnigres.com

Re: Introduce some randomness to autovacuum

2025-04-25 Thread wenhui qiu

Hi，I like your idea，It would be even better if the weights could be taken
according to the larger tables

On Fri, 25 Apr 2025 at 22:03, Junwang Zhao  wrote:

> Hi hackers,
>
> After watching Robert's talk[1] on autovacuum and participating in the
> related
> workshop yesterday, it appears that people are inclined to use
> prioritization
> to address the issues highlighted in Robert's presentation. Here I list two
> of the failure modes that were discussed.
>
> - Spinning. Running repeatedly on the same table but not accomplishing
> anything useful.
> - Starvation. autovacuum can't vacuum everything that needs vacuuming.
> - ...
>
> The prioritization way needs some basic stuff that postgres doesn't have
> now.
>
> I had a random thought that introducing some randomness might help
> mitigate some of the issues mentioned above. Before performing vacuum
> on the collected tables, we could rotate the table_oids list by a random
> number within the range [0, list_length(table_oids)]. This way, every table
> would have an equal chance of being vacuumed first, thus no spinning and
> starvation.
>
> Even if there is a broken table that repeatedly gets stuck, this random
> approach would still provide opportunities for other tables to be vacuumed.
> Eventually, the system would converge.
>
> The change is something like the following, I haven't tested the code,
> just posted it here for discussion, let me know your thoughts.
>
> diff --git a/src/backend/postmaster/autovacuum.c
> b/src/backend/postmaster/autovacuum.c
> index 16756152b71..6273d22 100644
> --- a/src/backend/postmaster/autovacuum.c
> +++ b/src/backend/postmaster/autovacuum.c
> @@ -79,6 +79,7 @@
>  #include "catalog/pg_namespace.h"
>  #include "commands/dbcommands.h"
>  #include "commands/vacuum.h"
> +#include "common/pg_prng.h"
>  #include "common/int.h"
>  #include "lib/ilist.h"
>  #include "libpq/pqsignal.h"
> @@ -2267,6 +2268,25 @@ do_autovacuum(void)
>
>"Autovacuum Portal",
>
>ALLOCSET_DEFAULT_SIZES);
>
> +   /*
> +* Randomly rotate the list of tables to vacuum.  This is to avoid
> +* always vacuuming the same table first, which could lead to
> spinning
> +* on the same table or vacuuming starvation.
> +*/
> +   if (list_length(table_oids) > 2)
> +   {
> +   int rand = 0;
> +   static pg_prng_state prng_state;
> +   List   *tmp_oids = NIL;
> +
> +   pg_prng_seed(&prng_state, (uint64) (getpid() ^
> time(NULL)));
> +   rand = (int) pg_prng_uint64_range(&prng_state, 0,
> list_length(table_oids) - 1);
> +   if (rand != 0) {
> +   tmp_oids = list_copy_tail(table_oids, rand);
> +   table_oids = list_copy_head(table_oids,
> list_length(table_oids) - rand);
> +   table_oids = list_concat(table_oids, tmp_oids);
> +   }
> +   }
> /*
>  * Perform operations on collected tables.
>  */
>
>
> [1] How Autovacuum Goes Wrong: And Can We Please Make It Stop Doing
> That? https://www.youtube.com/watch?v=RfTD-Twpvac
>
>
> --
> Regards
> Junwang Zhao
>
>
>

gcc 15.1 warnings - jsonb_util.c

2025-04-25 Thread Erik Rijkers


Hi,


Compiling gcc 15.1 utters these protests (at least, with 
--enable-cassert) that I don't think I saw with gcc 14:


jsonb_util.c: In function ‘compareJsonbContainers’:
jsonb_util.c:301:34: warning: ‘va.type’ may be used uninitialized 
[-Wmaybe-uninitialized]

  301 | res = (va.type > vb.type) ? 1 : -1;
  |~~^
jsonb_util.c:202:33: note: ‘va’ declared here
  202 | JsonbValue  va,
  | ^~
jsonb_util.c:301:44: warning: ‘vb.type’ may be used uninitialized 
[-Wmaybe-uninitialized]

  301 | res = (va.type > vb.type) ? 1 : -1;
  |  ~~^
jsonb_util.c:203:41: note: ‘vb’ declared here
  203 | vb;
  | ^~


Thanks,

Erik

Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-04-25 Thread Christoph Berg

Re: Jacob Champion
> I think the system is overconstrained at that point. If you want to
> support clients that delay-load the ABI they're compiled against,
> _and_ have them continue to work seamlessly after the system has
> upgraded the ABI underneath them, without restarting the client... is
> there any option other than side-by-side installation?

My point is that we should be trying to change the ABI-as-coded-in-the-
filename as rarely as possible. Then side-by-side should not be required.

Christoph

Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints

2025-04-25 Thread Alvaro Herrera

On 2025-Apr-23, Nathan Bossart wrote:

> This one was briefly discussed in an RMT meeting.
> 
> On Wed, Apr 09, 2025 at 01:16:20PM +0800, jian he wrote:
> > attached patch is for address pg_dump inconsistency
> > when parent is "not null not valid" while child is "not null".
> 
> I see an open item [0] that points to this patch, but there's no owner
> listed, and there doesn't appear to have been any follow-up discussion.
> Álvaro, should I list you as the owner, and if so, are you planning to look
> at it soon?

Hello, yes, please list me as owner, and now at least there's a patch I
feel more comfortable with :-)  I'll leave it there for a couple of days
for Jian and Rushabh to comment.

Thanks,

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/

Re: long-standing data loss bug in initial sync of logical replication

2025-04-25 Thread Amit Kapila

On Fri, Apr 25, 2025 at 10:45 AM Shlok Kyal  wrote:
>
> In the commits, I saw that the filenames are misspelled for files
> invalidation_distrubution.out and invalidation_distrubution.spec.
> This is present in branches from REL_13 to HEAD. I have attached
> patches to fix the same.
>

Thanks for spotting the problem. I've pushed your patch.

-- 
With Regards,
Amit Kapila.

Re: MergeAppend could consider sorting cheapest child path

2025-04-25 Thread Alexander Pyhalov


Andrei Lepikhov писал(а) 2025-04-24 16:01:

On 3/28/25 09:19, Alexander Pyhalov wrote:

Andy Fan писал(а) 2024-10-17 03:33:
I've updated patch. One more interesting case which we found - when 
fractional path is selected, it still can be more expensive than 
sorted cheapest total path (as we look only on indexes whith necessary 
pathkeys, not on indexes which allow efficiently fetch data).
So far couldn't find artificial example, but we've seen inadequate 
index selection due to this issue - instead of using index suited for 
filters in where, index, suitable for sorting was selected as one 
having the cheapest fractional cost.

I think it is necessary to generalise the approach a little.

Each MergeAppend subpath candidate that fits pathkeys should be 
compared to the overall-optimal path + explicit Sort node. Let's label 
this two-node composition as base_path. There are three cases exist: 
startup-optimal, total-optimal and fractional-optimal.
In the startup case, base_path should use cheapest_startup_path, the 
total-optimal case - cheapest_total_path and for a fractional case, we 
may employ the get_cheapest_fractional_path routine to detect proper 
base_path.


It may provide a more effective plan either in full, fractional and 
partial scan cases:
1. The Sort node may be pushed down to subpaths under a parallel or 
async Append.
2. When a minor set of subpaths doesn't have a proper index, and it is 
profitable to sort them instead of switching to plain Append.


In general, analysing the regression tests changed by this code, I see 
that the cost model prefers a series of small sortings instead of a 
single massive one. May be it will show some profit for execution time.


I am not afraid of any palpable planning overhead here because we just 
do cheap cost estimation and comparison operations that don't need 
additional memory allocations. The caller is responsible for building a 
proper Sort node if this method is chosen as optimal.


In the attachment, see the patch written according to the idea. There 
are I introduced two new routines:

get_cheapest_path_for_pathkeys_ext
get_cheapest_fractional_path_for_pathkeys_ext


Hi. I'm a bit confused that 
get_cheapest_fractional_path_for_pathkeys_ext() looks only on sorting 
cheapest fractional path, and get_cheapest_path_for_pathkeys_ext() in 
STARTUP_COST case looks only on sorting cheapest_startup_path.
Usually, sorted cheapest_total_path will be cheaper than sorted 
fractional/startup path at least by startup cost (as after sorting it 
includes total_cost of input path). But we ignore this case when 
selecting cheapest_startup and cheapest_fractional subpaths. As result 
selected cheapest_startup and cheapest_fractional can be not cheapest 
for startup or selecting a fraction of rows.


Consider the partition with the following access paths:

1) cheapest_startup without required pathkeys:
  startup_cost: 0.42
  total_cost: 4004

2) some index_path  with required pathkeys:
  startup_cost: 6.6
  total_cost: 2000

3) cheapest_total_path:
  startup_cost: 0.42
  total_cost: 3.48

Here, when selecting cheapest startup subpath we'll compare costs of 
index path (2) and sorted cheapest_startup (1), but will ignore sorted 
cheapest_total_path (3), despite the fact that it really can be the 
cheapest startup path, providing required sorting order.


--
Best regards,
Alexander Pyhalov,
Postgres Professional

Re: AIO v2.5

Hi,

On 2025-04-15 21:00:00 +0300, Alexander Lakhin wrote:
> Please take a look also at the simple reproducer for the crash inside
> pg_get_aios() I mentioned upthread:
> for i in {1..100}; do
>   numjobs=12
>   echo "iteration $i"
>   date
>   for ((j=1;j<=numjobs;j++)); do
>     ( createdb db$j; for k in {1..300}; do
>     echo "CREATE TABLE t (a INT); CREATE INDEX ON t (a); VACUUM t;
>   SELECT COUNT(*) >= 0 AS ok FROM pg_aios; " \
>     | psql -d db$j >/dev/null 2>&1;
>   done; dropdb db$j; ) &
>   done
>   wait
>   psql -c 'SELECT 1' || break;
> done
> 
> it fails for me as follows:
> iteration 20
> Tue Apr 15 07:21:29 PM EEST 2025
> dropdb: error: connection to server on socket "/tmp/.s.PGSQL.55432" failed: 
> No such file or directory
>    Is the server running locally and accepting connections on that socket?
> ...
> 2025-04-15 19:21:30.675 EEST [3111699] LOG:  client backend (PID 3320979) was 
> terminated by signal 11: Segmentation fault
> 2025-04-15 19:21:30.675 EEST [3111699] DETAIL:  Failed process was running: 
> SELECT COUNT(*) >= 0 AS ok FROM pg_aios;
> 2025-04-15 19:21:30.675 EEST [3111699] LOG:  terminating any other active 
> server processes

Thanks for that.  The bug turns out to be pretty stupid - pgaio_io_reclaim()
resets the fields in PgAioHandle *before* updating the generation/state. That
opens up a window in which pg_get_aios() thinks the copied PgAioHandle is
valid, even though it was taken while the fields were being reset.

Once I had figured that out, it was easy to make it more reproducible - put a
pg_usleep() between the fields being reset in pgaio_io_reclaim() and the
generation increase / state update.

The fix is simple, increment generation and state before resetting fields.

Will push the fix for that soon.

Greetings,

Andres Freund

Re: Summarizing indexes allowing single-phase VACUUM?

On Wed, Apr 9, 2025 at 2:47 PM Matthias van de Meent
 wrote:
>
> Hi,
>
> With PG16, we got Index AM -level indications for summarizing indexes,
> improving HOT applicability.
>
> Given that these indexes explicitly never store TIDs and thus don't
> really need the cleanup part of vacuum's index cleanup, we could well
> decide to not count those indexes to the indexes we need to clean up
> in the index scan phase of vacuum, thus allowing tables with only
> summarizing indexes to use the single-scan option of vacuum.
>
> As a summarizing index may still want to do some housekeeping during
> vacuum, it'd still need to get its cleanup functions called, but it'd
> at least save the io and WAL of the cleanup scan.
>
> Any comments/suggestions? POC patch attached.

I like this idea.

Here are some comments on the PoC patch:

-   if (vacrel->nindexes == 0
+   if (vacrel->indallsummarizing

I'm not a fan of this change as it isn't obvious from the name
indallsummarizing that the table doesn't have any indexes. I think we
can keep nindexes==0 or rename indallsummarizing to something like
use_onepass_strategy if we want to use merge them.

---
@@ -2536,8 +2544,12 @@ lazy_vacuum(LVRelState *vacrel)
/*
 * We successfully completed a round of index vacuuming.  Do related
 * heap vacuuming now.
+*
+* If all valid indexes are summarizing, then the TIDs have already
+* been reclaimed, requiring us to skip that last phase.
 */
-   lazy_vacuum_heap_rel(vacrel);
+   if (!vacrel->indallsummarizing)
+   lazy_vacuum_heap_rel(vacrel);

IIUC if indallsummarizing is true we should never reach lazy_vacuum().

---
@@ -241,8 +242,9 @@ static void parallel_vacuum_error_callback(void *arg);
  */
 ParallelVacuumState *
 parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
-int nrequested_workers, int vac_work_mem,
-int elevel, BufferAccessStrategy bstrategy)
+bool indallsummarizing, int nrequested_workers,
+int vac_work_mem, int elevel,
+BufferAccessStrategy bstrategy)
 {
ParallelVacuumState *pvs;
ParallelContext *pcxt;
@@ -282,6 +284,7 @@ parallel_vacuum_init(Relation rel, Relation
*indrels, int nindexes,
pvs = (ParallelVacuumState *) palloc0(sizeof(ParallelVacuumState));
pvs->indrels = indrels;
pvs->nindexes = nindexes;
+   pvs->indallsummarizing = nindexes;
pvs->will_parallel_vacuum = will_parallel_vacuum;
pvs->bstrategy = bstrategy;
pvs->heaprel = rel;

We should set pvs->indallsummarizing to indallsummarizing, not
nindexes, but ISTM we don't use pvs->indallsummazing anywhere in
vacuumparallel.c. Do we need to pass indallsummarizing to
parallel_vacuum_init() in the first place?

Regards,


--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: Fix slot synchronization with two_phase decoding enabled

On Fri, Apr 25, 2025 at 3:43 AM Amit Kapila  wrote:
>
> On Fri, Apr 25, 2025 at 6:02 AM Masahiko Sawada  wrote:
> >
> > I realized that users who create a logical slot using
> > pg_create_logical_replication_slot() would not be able to enable both
> > options at slot creation, and there is no easy way to enable the
> > failover after two_phase-enabled-slot creation. Users would need to
> > use ALTER_REPLICATION_SLOT replication command, which seems
> > unrealistics for users to use. On the other hand, if we allow creating
> > a logical slot with enabling failover and two_phase using SQL API,
> > there is still a chance for this bug to occur. Would it be worth
> > considering that if a logical slot is created with enabling failover
> > and two_phase using SQL API, we create the slot with only
> > two_phase=true, then advance the slot until the slot satisfies
> > restart_lsn >= two_phase_at, and then enable the failover?
> >
>
> This means we either need to maintain somewhere that user has provided
> failover flag till restart_lsn >= two_phase_at or and then set
> failover flag in the slot

I was thinking of this idea.

> or initially mark it but enable the
> functionality of failover when we reach the condition restart_lsn >=
> two_phase_at.

IIUC the slot could be synchronized to the standby as soon as we
complete DecodingContextFindStartpoint() for a failover-enabled slot.
So we would need some mechanisms to make sure that the slot is not
synchronized while we're waiting to reach the condition restart_lsn >=
two_phase_at even if the failover is enabled.

> Both seem to have different kinds of problems. The first
> idea seems to have an issue with persistence, which means we can lose
> track of the flag after the restart.

I think we can do this series of operations while the slot is not
persistent, that is the slot is still RS_EPHEMERAL.

> The second can mislead the user
> for a long period in cases where prepare and commit have a large time
> gap. I feel this will introduce complexity either in the form of code
> or in giving the information to the user.

Agreed. Both ways introduce complexity so we need to consider the
user-unfriendliness (by not having a proper way to enable failover for
the two_phase-enabled-slot using SQL API) vs. risk (of introducing
complexity).

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: Questions about logicalrep_worker_launch()

Hi,

On Fri, Apr 25, 2025 at 9:10 AM Fujii Masao  wrote:
>
> Hi,
>
> While reading the code of logicalrep_worker_launch(), I had two questions:
>
> (1)
> When the sync worker limit per subscription is reached, 
> logicalrep_worker_launch()
> runs garbage collection to try to free up slots before checking the limit 
> again.
> That makes sense.
>
> But should we do the same when the parallel apply worker limit is reached?
> Currently, if we've hit the parallel apply worker limit but not the sync 
> worker limit
> and we find an unused worker slot, garbage collection doesn't run. Would it
> make sense to also run garbage collection in that case?

Good point. In that case, we would end up not being able to launch
parallel apply workers for the subscription until either we used up
all worker slots or reached the sync worker limit, which is bad.

>
> (2)
> If garbage collection removes at least one worker, logicalrep_worker_launch()
> scans all worker slots again to look for a free one. But since we know at 
> least one
> slot was freed, this retry might be unnecessary. We could just reuse the freed
> slot directly. Is that correct?

Agreed. Since these codes are protected by LogicalRepWorkerLock in an
exclusive mode, we should be able to use the worker slot that we just
cleaned up.

> The attached patch addresses both points. Since logicalrep_worker_launch()
> isn't a performance-critical path, this might not be a high-priority change.
> But if my understanding is correct, I'm a bit tempted to apply it as a 
> refactoring.

I agree with these changes.

I think that while the changes for (2) should be for v19, the changes
for (1) might be treated as a bug fix?

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: gcc 15 "array subscript 0" warning at level -O3

Hi,

On 2025-04-25 13:37:15 -0400, Tom Lane wrote:
> Whilst poking at Erik Rijkers' nearby report, I found that
> Fedora 42's gcc 15.0.1 will produce this complaint if you
> select optimization level -O3:
> 
> In file included from ../../../../src/include/access/htup_details.h:22,
>  from pl_exec.c:21:
> In function 'assign_simple_var',
> inlined from 'exec_set_found' at pl_exec.c:8609:2:
> ../../../../src/include/varatt.h:230:36: warning: array subscript 0 is 
> outside array bounds of 'char[0]' [-Warray-bounds=]
>   230 | (((varattrib_1b_e *) (PTR))->va_tag)
>   |^
> ../../../../src/include/varatt.h:94:12: note: in definition of macro 
> 'VARTAG_IS_EXPANDED'
>94 | (((tag) & ~1) == VARTAG_EXPANDED_RO)
>   |^~~
> ../../../../src/include/varatt.h:284:57: note: in expansion of macro 
> 'VARTAG_1B_E'
>   284 | #define VARTAG_EXTERNAL(PTR)
> VARTAG_1B_E(PTR)
>   | ^~~
> ../../../../src/include/varatt.h:301:57: note: in expansion of macro 
> 'VARTAG_EXTERNAL'
>   301 | (VARATT_IS_EXTERNAL(PTR) && 
> !VARTAG_IS_EXPANDED(VARTAG_EXTERNAL(PTR)))
>   | 
> ^~~
> pl_exec.c:8797:17: note: in expansion of macro 
> 'VARATT_IS_EXTERNAL_NON_EXPANDED'
>  8797 | 
> VARATT_IS_EXTERNAL_NON_EXPANDED(DatumGetPointer(newvalue)))
>   | ^~~
> In function 'exec_set_found':
> cc1: note: source object is likely at address zero

FWIW, I've seen this even before GCC 15.


> Buildfarm member serinus has been producing the identical warning for
> some time.  I'd been ignoring that because it runs "experimental gcc",
> but I guess the experiment has leaked out to production distros.
> 
> What seems to be happening here is that after inlining
> assign_simple_var into exec_set_found, the compiler decides that
> "newvalue" might be zero (since it's a BoolGetDatum result),
> and then it warns -- in a rather strange way -- about the
> potential null dereference.

I don't think it actually is complaining about a null dereference - it thinks
we're interpreting a boolean as a pointer (for which it obviously is not wide
enough)


> The dereference is not reachable
> because of the preceding "var->datatype->typlen == -1" check,
> but that's not stopping the optimizer from bitching.

> I experimented with modifying exec_set_found thus:
> 
>   var = (PLpgSQL_var *) (estate->datums[estate->found_varno]);
> + Assert(var->datatype->typlen == 1);
>   assign_simple_var(estate, var, BoolGetDatum(state), false, false);
> 
> which should be OK since we're expecting the "found" variable to
> be boolean.  That does silence the warning, but of course only
> in --enable-cassert builds.

One way to address this is outlined here:

https://postgr.es/m/20230316172818.x6375uvheom3ibt2%40awork3.anarazel.de
https://postgr.es/m/20240207203138.sknifhlppdtgtxnk%40awork3.anarazel.de

I've been wondering about adding wrapping something like that in a
pg_assume(expr) or such.

Greetings,

Andres Freund

Questions about logicalrep_worker_launch()

2025-04-25 Thread Fujii Masao


Hi,

While reading the code of logicalrep_worker_launch(), I had two questions:

(1)
When the sync worker limit per subscription is reached, 
logicalrep_worker_launch()
runs garbage collection to try to free up slots before checking the limit again.
That makes sense.

But should we do the same when the parallel apply worker limit is reached?
Currently, if we've hit the parallel apply worker limit but not the sync worker 
limit
and we find an unused worker slot, garbage collection doesn't run. Would it
make sense to also run garbage collection in that case?

(2)
If garbage collection removes at least one worker, logicalrep_worker_launch()
scans all worker slots again to look for a free one. But since we know at least 
one
slot was freed, this retry might be unnecessary. We could just reuse the freed
slot directly. Is that correct?


The attached patch addresses both points. Since logicalrep_worker_launch()
isn't a performance-critical path, this might not be a high-priority change.
But if my understanding is correct, I'm a bit tempted to apply it as a 
refactoring.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
From 3e541cbaacfde2c212dd95c7d53329399511a9ec Mon Sep 17 00:00:00 2001
From: Fujii Masao 
Date: Sat, 26 Apr 2025 00:06:40 +0900
Subject: [PATCH v1] Refactor logicalrep_worker_launch().

---
 src/backend/replication/logical/launcher.c | 64 +++---
 1 file changed, 31 insertions(+), 33 deletions(-)

diff --git a/src/backend/replication/logical/launcher.c 
b/src/backend/replication/logical/launcher.c
index 10677da56b2..e2af6ccbdeb 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -96,7 +96,7 @@ static void logicalrep_launcher_onexit(int code, Datum arg);
 static void logicalrep_worker_onexit(int code, Datum arg);
 static void logicalrep_worker_detach(void);
 static void logicalrep_worker_cleanup(LogicalRepWorker *worker);
-static int logicalrep_pa_worker_count(Oid subid);
+static void logicalrep_worker_count(Oid subid, int *nsync, int *npa);
 static void logicalrep_launcher_attach_dshmem(void);
 static void ApplyLauncherSetWorkerStartTime(Oid subid, TimestampTz start_time);
 static TimestampTz ApplyLauncherGetWorkerStartTime(Oid subid);
@@ -336,7 +336,6 @@ logicalrep_worker_launch(LogicalRepWorkerType wtype,
 */
LWLockAcquire(LogicalRepWorkerLock, LW_EXCLUSIVE);
 
-retry:
/* Find unused worker slot. */
for (i = 0; i < max_logical_replication_workers; i++)
{
@@ -350,7 +349,7 @@ retry:
}
}
 
-   nsyncworkers = logicalrep_sync_worker_count(subid);
+   logicalrep_worker_count(subid, &nsyncworkers, &nparallelapplyworkers);
 
now = GetCurrentTimestamp();
 
@@ -359,7 +358,8 @@ retry:
 * reason we do this is because if some worker failed to start up and 
its
 * parent has crashed while waiting, the in_use state was never cleared.
 */
-   if (worker == NULL || nsyncworkers >= max_sync_workers_per_subscription)
+   if (worker == NULL || nsyncworkers >= max_sync_workers_per_subscription 
||
+   nparallelapplyworkers >= 
max_parallel_apply_workers_per_subscription)
{
booldid_cleanup = false;
 
@@ -381,11 +381,17 @@ retry:
 
logicalrep_worker_cleanup(w);
did_cleanup = true;
+
+   if (worker == NULL)
+   {
+   worker = w;
+   slot = i;
+   }
}
}
 
if (did_cleanup)
-   goto retry;
+   logicalrep_worker_count(subid, &nsyncworkers, 
&nparallelapplyworkers);
}
 
/*
@@ -399,8 +405,6 @@ retry:
return false;
}
 
-   nparallelapplyworkers = logicalrep_pa_worker_count(subid);
-
/*
 * Return false if the number of parallel apply workers reached the 
limit
 * per subscription.
@@ -844,48 +848,42 @@ logicalrep_worker_onexit(int code, Datum arg)
 int
 logicalrep_sync_worker_count(Oid subid)
 {
-   int i;
int res = 0;
 
-   Assert(LWLockHeldByMe(LogicalRepWorkerLock));
-
-   /* Search for attached worker for a given subscription id. */
-   for (i = 0; i < max_logical_replication_workers; i++)
-   {
-   LogicalRepWorker *w = &LogicalRepCtx->workers[i];
-
-   if (isTablesyncWorker(w) && w->subid == subid)
-   res++;
-   }
-
+   logicalrep_worker_count(subid, &res, NULL);
return res;
 }
 
 /*
- * Count the number of registered (but not necessarily running) parallel apply
- * workers for a subscription.
+ * Count the number o

Re: Conflicting updates of command progress

2025-04-25 Thread Antonin Houska

Sami Imseih  wrote:

> > pgstat_progress_start_command() is called twice: First with
> > cmdtype=PROGRESS_COMMAND_CLUSTER, second with
> > PROGRESS_COMMAND_CREATE_INDEX. The first happens in cluster_rel() the second
> > in cluster_rel() -> rebuild_relation() -> finish_heap_swap() ->
> > reindex_relation() -> reindex_index().
> >
> > It does not matter though, the current design only expects one command.
> 
> When I set a breakpoint on pgstat_progress_start_command and ran a
> CLUSTER on a table with a few indexes, I only saw start_command being
> called once for PROGRESS_COMMAND_CLUSTER. Then I went back in the
> code and realized that reindex_index when called via the CLUSTER command
> has progress set to false. So as it stands now, only PROGRESS_COMMAND_CLUSTER
> is effectively started when CLUSTER is called.
> 
> ```
> if (progress)
> {
> const int progress_cols[] = {
> PROGRESS_CREATEIDX_COMMAND,
> PROGRESS_CREATEIDX_INDEX_OID
> };
> const int64 progress_vals[] = {
> PROGRESS_CREATEIDX_COMMAND_REINDEX,
> indexId
> };
> 
> pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX,
> heapId);
> pgstat_progress_update_multi_param(2, progress_cols, progress_vals);
> }

Ah, got it. So for now the REPACK CONCURRENTLY patch only needs to turn off
the progress reporting for PROGRESS_COMMAND_CREATE_INDEX properly. Thanks for
checking!

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

Re: Allow io_combine_limit up to 1MB

Hi,

On 2025-04-25 10:26:09 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > void
> > assign_io_max_combine_limit(int newval, void *extra)
> > {
> > io_max_combine_limit = newval;
> > io_combine_limit = Min(io_max_combine_limit, io_combine_limit_guc);
> > }
> > void
> > assign_io_combine_limit(int newval, void *extra)
> > {
> > io_combine_limit_guc = newval;
> > io_combine_limit = Min(io_max_combine_limit, io_combine_limit_guc);
> > }
> 
> > So we end up with a larger io_combine_limit than
> > io_max_combine_limit. Hilarity ensues.
> 
> There's another thing that's rather tin-eared about these assign
> hooks: the hook is not supposed to be setting the GUC variable by
> itself.

Agreed. Without that the bug would have been more apparent... Pushed the fix,
alongside the change you outlined.

It's kinda sad to not have any test that tests a larger
io_combine_limit/io_max_combine_limit - as evidenced by this bug that'd be
good. However, not all platforms have PG_IOV_MAX > 16, so it seems like it'd
be somewhat painful to test?

Greetings,

Andres Freund

gcc 15 "array subscript 0" warning at level -O3

Whilst poking at Erik Rijkers' nearby report, I found that
Fedora 42's gcc 15.0.1 will produce this complaint if you
select optimization level -O3:


In file included from ../../../../src/include/access/htup_details.h:22,
 from pl_exec.c:21:
In function 'assign_simple_var',
inlined from 'exec_set_found' at pl_exec.c:8609:2:
../../../../src/include/varatt.h:230:36: warning: array subscript 0 is outside 
array bounds of 'char[0]' [-Warray-bounds=]
  230 | (((varattrib_1b_e *) (PTR))->va_tag)
  |^
../../../../src/include/varatt.h:94:12: note: in definition of macro 
'VARTAG_IS_EXPANDED'
   94 | (((tag) & ~1) == VARTAG_EXPANDED_RO)
  |^~~
../../../../src/include/varatt.h:284:57: note: in expansion of macro 
'VARTAG_1B_E'
  284 | #define VARTAG_EXTERNAL(PTR)VARTAG_1B_E(PTR)
  | ^~~
../../../../src/include/varatt.h:301:57: note: in expansion of macro 
'VARTAG_EXTERNAL'
  301 | (VARATT_IS_EXTERNAL(PTR) && 
!VARTAG_IS_EXPANDED(VARTAG_EXTERNAL(PTR)))
  | ^~~
pl_exec.c:8797:17: note: in expansion of macro 'VARATT_IS_EXTERNAL_NON_EXPANDED'
 8797 | 
VARATT_IS_EXTERNAL_NON_EXPANDED(DatumGetPointer(newvalue)))
  | ^~~
In function 'exec_set_found':
cc1: note: source object is likely at address zero


Buildfarm member serinus has been producing the identical warning for
some time.  I'd been ignoring that because it runs "experimental gcc",
but I guess the experiment has leaked out to production distros.

What seems to be happening here is that after inlining
assign_simple_var into exec_set_found, the compiler decides that
"newvalue" might be zero (since it's a BoolGetDatum result),
and then it warns -- in a rather strange way -- about the
potential null dereference.  The dereference is not reachable
because of the preceding "var->datatype->typlen == -1" check,
but that's not stopping the optimizer from bitching.

I experimented with modifying exec_set_found thus:

var = (PLpgSQL_var *) (estate->datums[estate->found_varno]);
+   Assert(var->datatype->typlen == 1);
assign_simple_var(estate, var, BoolGetDatum(state), false, false);

which should be OK since we're expecting the "found" variable to
be boolean.  That does silence the warning, but of course only
in --enable-cassert builds.

Anybody have an idea about how to silence it more effectively?
There are going to be more people seeing this as gcc 15 propagates.

regards, tom lane

Re: Allow io_combine_limit up to 1MB

Andres Freund  writes:
> It's kinda sad to not have any test that tests a larger
> io_combine_limit/io_max_combine_limit - as evidenced by this bug that'd be
> good. However, not all platforms have PG_IOV_MAX > 16, so it seems like it'd
> be somewhat painful to test?

Maybe just skip the test if the maximum value of the GUC isn't
high enough?

regards, tom lane

Re: RFC: Additional Directory for Extensions

2025-04-25 Thread David E. Wheeler

On Apr 25, 2025, at 07:33, Christoph Berg  wrote:
> 
> Re: David E. Wheeler
>> +
>> +make install prefix=/etc/postgresql
> 
> I'd use /usr/local/postgresql there. "/etc" is just wrong.

Thank you for the review. Here’s v3*.

Best,

David

* Also reviewable as a GitHub PR[1].

[1]: https://github.com/theory/postgres/pull/10




v3-0001-Flesh-out-docs-for-the-prefix-make-variable.patch
Description: Binary data




signature.asc
Description: Message signed with OpenPGP

Re: sslmode=secure by default (Re: Making sslrootcert=system work on Windows psql)

2025-04-25 Thread Jacob Champion

On Thu, Apr 24, 2025 at 3:16 PM Jelte Fennema-Nio  wrote:
> Why is this dangerous? As long as we'd validate that the provided cert
> by the server is for example.com

I can't help but read this as "as long as everyone mitigates the
danger, what's the danger?" We won't be the only implementers of any
URL schemes we introduce.

> I don't see any security problem in
> having DNS resolution happen for evil.com, nor in having the IP
> addresses hardcoded using hostaddr.

I think if we introduce a new scheme with the idea that it's "HTTPS
mode", it needs to behave very similarly to HTTPS, so people reason
about it correctly in worst-case corner cases.

To attack an https:// connection, you need to both steal the server
key _and_ get the client to talk to you instead of the real server.
And for HTTPS, that second part generally requires hijacking DNS or
mounting a successful MITM, not modifying the query.

The idea of a query string overriding the //authority is... weird. It
breaks the conventions of generic parsers (and I will include "humans"
in that group). We're "allowed" to do it, I guess -- it's our scheme,
we do it with our existing schemes today, and the IETF isn't going to
send spec police to our doors -- but I don't think we should.

Thanks,
--Jacob

Re: Back-patch of: avoid multiple hard links to same WAL file after a crash

Hi,

On 2025-04-20 14:53:39 -0700, Noah Misch wrote:
> On Mon, Apr 14, 2025 at 09:19:35AM +0900, Michael Paquier wrote:
> > On Sun, Apr 13, 2025 at 11:51:57AM -0400, Tom Lane wrote:
> > > Noah Misch  writes:
> > > > Tom and Michael, do you still object to the test addition, or not?  If 
> > > > there
> > > > are no new or renewed objections by 2025-04-20, I'll proceed to add the 
> > > > test.
> 
> Pushed as commit 714bd9e.  The failure so far is
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-04-20%2015%3A36%3A35
> with these highlights:
> 
> pg_ctl: server does not shut down
> 
> 2025-04-20 17:27:35.735 UTC [1576688][postmaster][:0] LOG:  received 
> immediate shutdown request
> 2025-04-20 17:27:35.969 UTC [1577386][archiver][:0] FATAL:  archive command 
> was terminated by signal 3: Quit
> 2025-04-20 17:27:35.969 UTC [1577386][archiver][:0] DETAIL:  The failed 
> archive command was: cp "pg_wal/0001006D" 
> "/home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/recovery/045_archive_restartpoint/data/t_045_archive_restartpoint_primary_data/archives/0001006D"
> 
> The checkpoints and WAL creation took 30s, but archiving was only 20% done
> (based on file name 0001006D) at the 360s PGCTLTIMEOUT.

Huh.  That seems surprisingly slow, even for valgrind.  I guess it's one more
example for why the single-threaded archiving approach sucks so badly :)


> I can reproduce this if I test with valgrind --trace-children=yes.  With my
> normal valgrind settings, the whole test file takes only 18s.  I recommend
> one of these changes to skink:
> 
> - Add --trace-children-skip='/bin/*,/usr/bin/*' so valgrind doesn't instrument
>   "sh" and "cp" commands.
> - Remove --trace-children=yes

Hm. I think I used --trace-children=yes because I was thinking it was required
to track forks. But a newer version of valgrind's man page has an important
clarification:
   --trace-children= [default: no]
   When enabled, Valgrind will trace into sub-processes initiated via 
the exec system call. This is necessary for multi-process programs.
 
   Note that Valgrind does trace into the child of a fork (it would be 
difficult not to, since fork makes an identical copy of a process), so this
   option is arguably badly named. However, most children of fork calls 
immediately call exec anyway.

So there doesn't seem to be much point in using --trace-children=yes.


> Andres, what do you think about making one of those skink configuration
> changes?  Alternatively, I could make the test poll until archiving catches
> up.  However, that would take skink about 30min, and I expect little value
> from 30min of valgrind instrumenting the "cp" command.

I just changed the config to --trace-children=no. There already is a valgrind
run in progress, so it won't be in effect for the next run.

Greetings,

Andres Freund

Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-04-25 Thread Jacob Champion

On Fri, Apr 25, 2025 at 2:03 AM Christoph Berg  wrote:
> My point is that we should be trying to change the ABI-as-coded-in-the-
> filename as rarely as possible.

I agree, but I'm also trying to say I can't unilaterally declare
pieces of our internal structs to be covered by an ABI guarantee.
Maybe the rest of the ABI will never change because it'll be perfect,
but I point to the immediately preceding thread as evidence against
the likelihood of perfection on the first try. I'm trying to build in
air bags so we don't have to regret a mistake.

> Then side-by-side should not be required.

It's still required _during_ an ABI bump, though, if you don't want
things to break. Right?

--Jacob

Re: extension_control_path and "directory"

2025-04-25 Thread Matheus Alcantara

On Fri, Apr 25, 2025 at 4:13 PM David E. Wheeler  wrote:
>
> On Apr 25, 2025, at 09:25, Matheus Alcantara  wrote:
>
> > Yes, you are right. The problem was where I was asserting
> > control->control_dir != NULL. I've moved the assert after the "if
> > (!filename)" check that returns an error if the extension was not found.
> >
> > Attached v3 with this fix and also a TAP test for this scenario.
>
> That fixes the segfault, thank you.
>
Great, thanks for testing!

> > I'm just a bit confused how you get it working using /extension at the
> > end of extension_control_path since with this patch this suffix is not
> > necessary and since we hard coded append this it should return an error
> > when trying to search on something like
>
> It worked with
>
> extension_control_path = 
> '/Users/david/Downloads/share/postgresql/extension:$system’
>
> But not with
>
> extension_control_path = '/Users/david/Downloads/share/postgresql:$system’
>
> And here is where the control file actually is:
>
> ❯ ll ~/Downloads/share/postgresql/extension  total 8
> -rw-r--r--  1 david  staff   161B Apr 24 18:07 semver.control
>
> So I don’t know the answer to your question, but it’d be handy to have 
> functions that return a list of resolved paths from extension_control_path 
> and dynamic_library_path, since they get mangled.
>
Ok, I was testing using extension_control_path = '$system:/my/custom/path'
(starting with the macro) and it was working as expected, testing with
the macro at the end does not work.

The problem was on find_extension_control_filename() that was appending
the /extension at the end of the entire extension_control_path GUC value
instead of just the custom paths.

To append the /extension at each path on extension_control_path would
require some changes on find_in_path() that
find_extension_control_filename() calls, which I think that it would
make the function more complicated. I've them created a similar
find_in_paths() function that works in the same way but it receives a
List of paths instead of the string of paths separated by ":". We can
get this List of paths using get_extension_control_directories() that
also handle the macro substitution like find_in_path().

Attached v4 with these fixes. I hope that now you should be able to omit
the /extension from the GUC value.

-- 
Matheus Alcantara

v4-0001-Make-directory-work-with-extension-control-path.patch
Description: Binary data

[SP-]GiST IOS visibility bug (was: Why doens't GiST require super-exclusive lock)

2025-04-25 Thread Matthias van de Meent

Hi,

In the original thread [0] we established that GIST and SP-GiST
support index-only scans, but don't have sufficient interlocking to
prevent dead tuples from being revived for their scan by vacuum. As
that thread already contains a lot of detail, I won't go into the
specifics here, but the gist (heh) of it is as follows:

(SP-)GiST index scans don't interlock with their index vacuum in a way
that guarantees that their index-only scans only return TIDs that are
known to be valid TIDs when they're returned by amgettuple, thus
allowing index vacuum to clean TIDs up and let VACUUM mark their pages
ALL_VISIBLE before the tids were returned by the index-only scan.

The keen-eyed reader may have noticed that this issue is very similar
to the issue fixed with 459e7bf8, and that's correct. Both are cases
where the visibility map was accessed to determine the visibility of a
TID of which we didn't know for sure it was still a valid TID on that
heap page.


While the thread in [0] contains a very thorough fix, and (IMO) some
good additions to how index-only scans work with table AMs, the patch
currently proposed there changes things in non-backpatchable ways. So,
attached is a set of 4 patches designed for backpatching a fix.

The patches work similar to [0]'s, but with less invasive code changes
and no meaningful ABI breaks:
index vacuum's lock requirements are increased to super-exclusive
(cleanup) locks. Additionally, index-only scans will now do visibility
checks of each tuple they want to return, before the pin on the index
page that contains those tuples is released.
The visibility check is done on a tuple-by-tuple basis, through a new
function table_index_vischeck_tuple (so as to not totally break the
semantic barrier between indexes and tables regarding visibility
checks). In the patchset of [0] I added a table AM routine to allow
tableAMs to extend this, but that's not backportable, hence this
separate thread for discussing a backpatchable bugfix.
The one exposed ABI change is done by adding a single-byte
"visrecheck" field in an alignment area in IndexScanDescData - which
historically has been the place to put new fields required for
backported bugfixes. The size of the struct doesn't change, and there
should be no backward or forward compatibility issues for extensions
that might make use of this new field as long as they don't assume the
value of the field when they haven't set it themselves.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

[0] 
https://postgr.es/m/flat/CAH2-Wz=PqOziyRSrnN5jAtfXWXY7-BJcHz9S355LH8Dt=5q...@mail.gmail.com


v01-0002-GIST-Fix-visibility-issues-in-IOS.patch
Description: Binary data


v01-0001-Expose-visibility-checking-shim-for-index-usage.patch
Description: Binary data


v01-0003-SP-GIST-Fix-visibility-issues-in-IOS.patch
Description: Binary data


v01-0004-Test-for-IOS-Vacuum-race-conditions-in-index-AMs.patch
Description: Binary data

Sanding down some edge cases for PL/pgSQL reserved words

This is a rather delayed response to the discussion of bug
#18693 [1], in which I wrote:

> (It's kind of annoying that "strict" has to be double-quoted
> in the RAISE NOTICE, especially since you get a rather misleading
> error if it isn't.  But that seems like a different discussion.)

As an example of that, if you don't double-quote "strict"
in this usage you get

regression=# do $$ declare r record; begin
SELECT a, b AS STRICT INTO r FROM (SELECT 'A' AS a, 'B' AS b) AS q;
RAISE NOTICE 'STRICT r.strict = %', r.strict;
end $$;
ERROR:  record "r" has no field "strict"
LINE 1: r.strict
^
QUERY:  r.strict
CONTEXT:  PL/pgSQL function inline_code_block line 3 at RAISE

which is pretty bogus because the record *does* have a field
named "strict".  The actual problem is that STRICT is a fully
reserved PL/pgSQL keyword, which means you need to double-quote
it if you want to use it this way.

The attached patches provide two independent responses to that:

1. AFAICS, there is no real reason for STRICT to be a reserved
rather than unreserved PL/pgSQL keyword, and for that matter not
EXECUTE either.  Making them unreserved does allow some ambiguity,
but I don't think there's any surprises in how that ambiguity
would be resolved; and certainly we've preferred ambiguity over
introducing new reserved keywords in PL/pgSQL before.  I think
these two just escaped that treatment by dint of being ancient.

2. That "has no field" error message is flat-out wrong.  The now-known
way to trigger it has a different cause, and what's more, we simply do
not know at this point whether the malleable record type has such a
field.  So in 0002 below I just changed it to assume that the problem
is a reserved field name.  We might find another way to reach that
failure in future, but I doubt that "has no field" would be the right
thing to say in any case.

This is v19 material at this point, so I'll stick it on the CF queue.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/18693-65968418890877b4%40postgresql.org

From 2d19cbfbbd9aa4166cbeb46195d7a7b04e8fda52 Mon Sep 17 00:00:00 2001
From: Tom Lane 
Date: Fri, 25 Apr 2025 16:27:13 -0400
Subject: [PATCH v1 1/2] De-reserve keywords EXECUTE and STRICT in PL/pgSQL.

On close inspection, there does not seem to be a strong reason
why these should be fully-reserved keywords.  I guess they just
escaped consideration in previous attempts to minimize PL/pgSQL's
list of reserved words.
---
 src/pl/plpgsql/src/pl_gram.y  | 13 +
 src/pl/plpgsql/src/pl_reserved_kwlist.h   |  2 --
 src/pl/plpgsql/src/pl_scanner.c   |  2 +-
 src/pl/plpgsql/src/pl_unreserved_kwlist.h |  2 ++
 4 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index 5612e66d023..7b672ea5179 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -1368,7 +1368,8 @@ for_control		: for_variable K_IN
 		int			tok = yylex(&yylval, &yylloc, yyscanner);
 		int			tokloc = yylloc;
 
-		if (tok == K_EXECUTE)
+		if (tok_is_keyword(tok, &yylval,
+		   K_EXECUTE, "execute"))
 		{
 			/* EXECUTE means it's a dynamic FOR loop */
 			PLpgSQL_stmt_dynfors *new;
@@ -2135,7 +2136,8 @@ stmt_open		: K_OPEN cursor_variable
 yyerror(&yylloc, NULL, yyscanner, "syntax error, expected \"FOR\"");
 
 			tok = yylex(&yylval, &yylloc, yyscanner);
-			if (tok == K_EXECUTE)
+			if (tok_is_keyword(tok, &yylval,
+			   K_EXECUTE, "execute"))
 			{
 int			endtoken;
 
@@ -2536,6 +2538,7 @@ unreserved_keyword	:
 | K_ERRCODE
 | K_ERROR
 | K_EXCEPTION
+| K_EXECUTE
 | K_EXIT
 | K_FETCH
 | K_FIRST
@@ -2581,6 +2584,7 @@ unreserved_keyword	:
 | K_SLICE
 | K_SQLSTATE
 | K_STACKED
+| K_STRICT
 | K_TABLE
 | K_TABLE_NAME
 | K_TYPE
@@ -3514,7 +3518,8 @@ make_return_query_stmt(int location, YYSTYPE *yylvalp, YYLTYPE *yyllocp, yyscan_
 	new->stmtid = ++plpgsql_curr_compile->nstatements;
 
 	/* check for RETURN QUERY EXECUTE */
-	if ((tok = yylex(yylvalp, yyllocp, yyscanner)) != K_EXECUTE)
+	tok = yylex(yylvalp, yyllocp, yyscanner);
+	if (!tok_is_keyword(tok, yylvalp, K_EXECUTE, "execute"))
 	{
 		/* ordinary static query */
 		plpgsql_push_back_token(tok, yylvalp, yyllocp, yyscanner);
@@ -3597,7 +3602,7 @@ read_into_target(PLpgSQL_variable **target, bool *strict, YYSTYPE *yylvalp, YYLT
 		*strict = false;
 
 	tok = yylex(yylvalp, yyllocp, yyscanner);
-	if (strict && tok == K_STRICT)
+	if (strict && tok_is_keyword(tok, yylvalp, K_STRICT, "strict"))
 	{
 		*strict = true;
 		tok = yylex(yylvalp, yyllocp, yyscanner);
diff --git a/src/pl/plpgsql/src/pl_reserved_kwlist.h b/src/pl/plpgsql/src/pl_reserved_kwlist.h
index ce7b0c9d331..f3ef2cbd8d7 100644
--- a/src/pl/plpgsql/src/pl_reserved_kwlist.h
+++ b/src/pl/plpgsql/src/pl_reserved_kwlist.h
@@ -33,7 +33,6 @@ PG_KEYWORD("case", K_CASE)
 PG_

Re: Fix premature xmin advancement during fast forward decoding

2025-04-25 Thread Amit Kapila

On Fri, Apr 25, 2025 at 10:46 AM Masahiko Sawada  wrote:
>
> What I'm concerned about is the back branches. With this approach all
> back branches will have such degradations and it doesn't make sense to
> me to optimize SnapBuildCommitTxn() codes in back branches.
>

One possibility could be that instead of maintaining an entire
snapshot in fast_forward mode, we can maintain snapshot's xmin in each
ReorderBufferTXN. But then also, how would we get the minimum
txns_by_base_snapshot_lsn as we are getting now in
ReorderBufferGetOldestXmin? I think we need to traverse the entire
list of txns to get it in fast_forward mode but that may not show up
because it will not be done for each transaction. We can try such a
thing, but it won't be clean to have fast_forward specific code and
also it would be better to add such things only for HEAD. Can you
think of any better ideas?

-- 
With Regards,
Amit Kapila.

Re: Does RENAME TABLE rename associated identity sequence?

2025-04-25 Thread Ashutosh Bapat

On Thu, Apr 24, 2025 at 11:10 PM Tom Lane  wrote:

>
> I think it's up to the user to rename subsidiary objects if
> they wish to do so.
>

I absolutely agree.

-- 
Best Wishes,
Ashutosh Bapat

Re: Making sslrootcert=system work on Windows psql

> On 24 Apr 2025, at 18:45, Jacob Champion  
> wrote:
> 
> On Wed, Apr 23, 2025 at 8:47 AM George MacKerron  
> wrote:
>> I’d suggest two new special sslrootcert values:
>> 
>> (1) sslrootcert=openssl
>> 
>> This does exactly what sslrootcert=system does now, but is less confusingly 
>> named for Windows users. sslrootcert=system becomes a deprecated synonym for 
>> this option.
> 
> Stealing the word "system" from the existing sslrootcert domain had at
> least two hazards: a) existing users might have a file named "system"
> that would now be ignored, and b) users might accidentally use
> sslrootcert=system on older versions of libpq, picking up an
> unexpected file named "system" and doing the Wrong Thing. Problem (a)
> can be worked around by saying "./system" instead, so honestly I
> wasn't too concerned about that, and I considered (b) to be more of a
> theoretical problem that was outweighed by the benefit of getting
> OpenSSL to just Do The Thing people wanted it to do.
> 
> A couple years on, I think (b) is less theoretical than I had
> originally hoped. As evidence I point to Stack Overflow questions like
> [1], where both the asker and the answerer are a bit confused about
> how connection string versioning works. If we steal more words, I
> think that problem is going to get worse. So I'm leaning toward's
> Daniel's earlier position that sslrootcert has kind of run its course,
> and if you want to select OpenSSL stores, we need a more fully
> featured syntax and probably a completely new option to be able to
> pass that through safely.

If we stick to ‘system’ as the only special value, then (b) gets more 
theoretical with every passing day, as more people upgrade their Postgres 
installs.

But it’s true that adding a new special value makes it day 0 again. So I guess 
I’m persuaded that adding new special values is probably not a great idea. That 
makes me all the keener to get sslrootcert=system working for average Windows 
users!

> You should ideally tell us what you want, and either get it or fail.

The key thing I want (I am a stuck record on this point!) is a reliably 
cross-platform way to use the operating system’s trust store when evaluating 
the credentials of the Postgres server I’m connecting to.

This is what sslrootcert=system promised to be, and sounded like it would be, 
but turned out not to be on Windows, because for ordinary Windows users (i.e. 
those who don’t maintain an OpenSSL cert store on their machines) it always 
fails.

I know the documentation has now been changed to reflect that ‘system’ actually 
means OpenSSL. But I still think it would be better for it to really mean the 
operating system. On Windows, that’s the winstore. 

Which is why I still think my patch (or perhaps Jelte’s suggestion of a 
compile-time option, as an alternative) is an improvement on the status quo … ?

Re: Conflict detection for update_deleted in logical replication

2025-04-25 Thread shveta malik

>
> Here is V30 patch set includes the following changes:
>

Thank You for the patch, please find few concerns:

1)
By looking at code of ApplyLauncherMain(), it appears that we stopped
advancing shared-slot's xmin if any of the subscriptions with
retain_conflict_info is disabled. If a subscription is not being used
and is disabled forever (or for quite long), that means xmin will
never be advanced and we will keep accumulating dead-rows even if
other subscribers with retain_conflict_info are actively setting their
oldest_xmin. This could be problematic. Here too, there should be some
way to set stop-conflict-rettention for such a subscription like we do
for 'subscriber not able to catch-up case'.
But I understand it can be complex to implement as we do not know for
how long a subscription is disabled. If we do not find a simpler way
to implement it, then at least we can document  such cases and the
risks associated with disabled subscription which has
'retain_conflict_info' enabled. Thoughts?

2)
in wait_for_local_flush(), we have
should_stop_conflict_info_retention() before 'AllTablesyncsReady'
check. Should we give a discount for table-sync time and avoid doing
stop-conflict-retention when table-sync is going on? This is because
table-sync is one time operation (or done only on
subscription-refresh), so we shall not count time spent in table-sync
for 'max_conflict_retention_duration'. We can reset our timer if
table-sync is observed to be going on. Thoughts?

thanks
Shveta

Re: Making sslrootcert=system work on Windows psql

2025-04-25 Thread Jelte Fennema-Nio

On Fri, 25 Apr 2025 at 12:22, George MacKerron  wrote:
> I know the documentation has now been changed to reflect that ‘system’ 
> actually means OpenSSL.

I didn't realize that. I'm definitely not in favor of that doc change.
It's describing behaviour that I believe is incorrect, as if it's
actually intended.

Re: Fix slot synchronization with two_phase decoding enabled

2025-04-25 Thread Amit Kapila

On Fri, Apr 25, 2025 at 6:02 AM Masahiko Sawada  wrote:
>
> I realized that users who create a logical slot using
> pg_create_logical_replication_slot() would not be able to enable both
> options at slot creation, and there is no easy way to enable the
> failover after two_phase-enabled-slot creation. Users would need to
> use ALTER_REPLICATION_SLOT replication command, which seems
> unrealistics for users to use. On the other hand, if we allow creating
> a logical slot with enabling failover and two_phase using SQL API,
> there is still a chance for this bug to occur. Would it be worth
> considering that if a logical slot is created with enabling failover
> and two_phase using SQL API, we create the slot with only
> two_phase=true, then advance the slot until the slot satisfies
> restart_lsn >= two_phase_at, and then enable the failover?
>

This means we either need to maintain somewhere that user has provided
failover flag till restart_lsn >= two_phase_at or and then set
failover flag in the slot or initially mark it but enable the
functionality of failover when we reach the condition restart_lsn >=
two_phase_at. Both seem to have different kinds of problems. The first
idea seems to have an issue with persistence, which means we can lose
track of the flag after the restart. The second can mislead the user
for a long period in cases where prepare and commit have a large time
gap. I feel this will introduce complexity either in the form of code
or in giving the information to the user.

-- 
With Regards,
Amit Kapila.

Re: sslmode=secure by default (Re: Making sslrootcert=system work on Windows psql)

2025-04-25 Thread Daniel Gustafsson

> On 25 Apr 2025, at 00:16, Jelte Fennema-Nio  wrote:

> Let me derail some more, while we're at it I think it would be good to
> add tls-prefixed aliases for all our ssl options. Like tlscert/tlskey.
> Since such a new postgress:// scheme would be totally new, maybe we
> can even disallow the ssl prefixed ones there.

I think that would be a mistake, 'SSL' has long lost its original meaning and
is now interpreted to be an umbrella term for "secure connections with
certificates and things".  Sticking to ssl_* will most likely be the least
confusing for our users.

--
Daniel Gustafsson

Re: Add pg_buffercache_mark_dirty[_all] functions to the pg_buffercache

2025-04-25 Thread Xuneng Zhou

Hi,
I’ve been trying to review this patch, and it struck me that we’re
currently grabbing the content lock exclusively just to flip a header bit:

if (!(buf_state & BM_DIRTY))
{
LWLockAcquire(BufferDescriptorGetContentLock(desc), LW_EXCLUSIVE);
MarkBufferDirty(buf);
LWLockRelease(BufferDescriptorGetContentLock(desc));
}

Since our sole goal here is to simulate a buffer’s dirty state for
testing/debugging, I wonder—could we instead:

1. Acquire the shared content lock (which already blocks eviction),
2. Call MarkBufferDirtyHint() to flip the BM_DIRTY bit under the header
spinlock, and
3. Release the shared lock?

This seems to satisfy Assert(LWLockHeldByMe(...)) inside the hint routine
and would:

- Prevent exclusive‐lock contention when sweeping many buffers,
- Avoid full‐page WAL writes (we only need a hint record, if any)


Would love to hear if this makes sense or or am I overlooking something
here. Thanks for any feedback!

Cheers,
Xuneng

Nazir Bilal Yavuz  于2025年4月11日周五 17:14写道：

> Hi,
>
> There is another thread [1] to add both pg_buffercache_evict_[relation
> | all] and pg_buffercache_mark_dirty[_all] functions to the
> pg_buffercache. I decided to create another thread as
> pg_buffercache_evict_[relation | all] functions are committed but
> pg_buffercache_mark_dirty[_all] functions still need review.
>
> pg_buffercache_mark_dirty(): This function takes a buffer id as an
> argument and tries to mark this buffer as dirty. Returns true on
> success.
> pg_buffercache_mark_dirty_all(): This is very similar to the
> pg_buffercache_mark_dirty() function. The difference is
> pg_buffercache_mark_dirty_all() does not take an argument. Instead it
> just loops over the shared buffers and tries to mark all of them as
> dirty. It returns the number of buffers marked as dirty.
>
> Since that patch is targeted for the PG 19, pg_buffercache is bumped to
> v1.7.
>
> Latest version is attached and people who already reviewed the patches are
> CCed.
>
> [1]
> postgr.es/m/CAN55FZ0h_YoSqqutxV6DES1RW8ig6wcA8CR9rJk358YRMxZFmw%40mail.gmail.com
>
> --
> Regards,
> Nazir Bilal Yavuz
> Microsoft
>

Re: gcc 15.1 warnings - jsonb_util.c

Erik Rijkers  writes:
> Compiling gcc 15.1 utters these protests (at least, with 
> --enable-cassert) that I don't think I saw with gcc 14:

Hmm, odd.  I tried with gcc 15.0.1 from a fresh Fedora 42 install,
and I can reproduce your result:

jsonb_util.c: In function 'compareJsonbContainers':
jsonb_util.c:301:34: warning: 'va.type' may be used uninitialized 
[-Wmaybe-uninitialized]
  301 | res = (va.type > vb.type) ? 1 : -1;
  |~~^
jsonb_util.c:202:33: note: 'va' declared here
  202 | JsonbValue  va,
  | ^~
jsonb_util.c:301:44: warning: 'vb.type' may be used uninitialized 
[-Wmaybe-uninitialized]
  301 | res = (va.type > vb.type) ? 1 : -1;
  |  ~~^
jsonb_util.c:203:41: note: 'vb' declared here
  203 | vb;
  | ^~

but for me, it only happens with asserts *disabled*.
With --enable-cassert, I think that the preceding bit
about 

Assert(ra != WJB_END_ARRAY && ra != WJB_END_OBJECT);
Assert(rb != WJB_END_ARRAY && rb != WJB_END_OBJECT);

is sufficient to persuade the compiler that va.type and vb.type
must have been set by JsonbIteratorNext?  Although AFAICS it
should not have been, in view of the first "return" in
JsonbIteratorNext:

JsonbIteratorNext(JsonbIterator **it, JsonbValue *val, bool skipNested)
{
if (*it == NULL)
return WJB_DONE;

TBH I think compareJsonbContainers is too cute for its own good.
The comment at line 280ff explains why it's safe, but no compiler or
static analyzer in the world is going to duplicate that reasoning,
especially in a non-assert build.  And it just seems fragile.
(I notice we've had to fix up this logic once already.)

I'm inclined to add manual initializations of va.type and vb.type
just to silence such warnings.

regards, tom lane

Re: Fix premature xmin advancement during fast forward decoding

On Fri, Apr 25, 2025 at 4:42 AM Amit Kapila  wrote:
>
> On Fri, Apr 25, 2025 at 10:46 AM Masahiko Sawada  
> wrote:
> >
> > What I'm concerned about is the back branches. With this approach all
> > back branches will have such degradations and it doesn't make sense to
> > me to optimize SnapBuildCommitTxn() codes in back branches.
> >
>
> One possibility could be that instead of maintaining an entire
> snapshot in fast_forward mode, we can maintain snapshot's xmin in each
> ReorderBufferTXN. But then also, how would we get the minimum
> txns_by_base_snapshot_lsn as we are getting now in
> ReorderBufferGetOldestXmin? I think we need to traverse the entire
> list of txns to get it in fast_forward mode but that may not show up
> because it will not be done for each transaction. We can try such a
> thing, but it won't be clean to have fast_forward specific code and
> also it would be better to add such things only for HEAD.

Agreed.

> Can you think of any better ideas?

No idea. Hmm, there seems no reasonable way to fix this issue for back
branches. I consented to the view that these costs were something that
we should have paid from the beginning.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: gcc 15 "array subscript 0" warning at level -O3