Re: Statistics Import and Export: difference in statistics dumped

2025-03-05 Thread Jeff Davis
On Wed, 2025-03-05 at 15:22 +0530, Ashutosh Bapat wrote:
> Hmm. Updating the statistics without consuming more CPU is more
> valuable when autovacuum is off, since it improves query plans with no extra
> effort. But if adding a new mode is significant work, riding it
> on top of autovacuum=off might be ok. It's not documented either way, so
> we could change that behaviour later if we find it troublesome.

Sounds good. I will commit something like the v2 patch then soon, and
if we need a different condition we can change it later.

Regards,
Jeff Davis





Re: Add contrib/pg_logicalsnapinspect

2025-03-05 Thread Masahiko Sawada
On Wed, Mar 5, 2025 at 3:10 PM Tom Lane  wrote:
>
> Bertrand Drouvot  writes:
> > yeah makes sense. Done in the attached, and bonus point I realized that the
> > test could be simplified (so, removing useless steps in passing).
>
> Just a side note: tayra showed two instances of this failure today
> [1][2].  That's not using valgrind.  I wonder if we changed something
> else recently that would make this more probable?
>

I've observed a third failure. I read through recent commits but
have no idea which commit made this more probable. Comparing other
tests between the success case[1] and the failure case[2], it seems that tayra
was slow overall. For instance, the 'build' and 'check' steps took 00:00:19
vs. 00:02:42 and 00:02:42 vs. 00:19:17, respectively. I'm not sure
what caused tayra to be slower overall recently.

Regards,

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tayra&dt=2025-03-05%2001%3A22%3A07
[2] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tayra&dt=2025-03-05%2013%3A42%3A17

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: track generic and custom plans in pg_stat_statements

2025-03-05 Thread Ilia Evdokimov

Hi,

Thank you for your patch. It is really useful for tracking the history 
of generic and custom plan usage.


At first glance, I have the following suggestions for improvement:

1. Is there any reason for the double check of cplan != NULL? It seems 
unnecessary, and we could simplify it to:


-if (cplan && cplan->status == PLAN_CACHE_STATUS_CUSTOM_PLAN)
+if (cplan->status == PLAN_CACHE_STATUS_CUSTOM_PLAN)

2. Should we add Assert(kind == PGSS_EXEC) at this place  to ensure that 
generic_plan_calls and custom_plan_calls are only incremented when 
appropriate?


--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.





Re: Statistics Import and Export

2025-03-05 Thread Nathan Bossart
On Wed, Mar 05, 2025 at 08:17:53PM -0500, Andres Freund wrote:
> Right now --statistics more than doubles the number of queries that pg_dump
> issues. That's obviously noticeable locally, but it's going to be really
> noticeable when dumping across the network.
> 
> I think we need to do more to lessen the impact. Even leaving regression test
> performance aside, the time increase for the default pg_dump invocation will
> be painful for folks, particularly due to this being enabled by default.
> 
> One fairly easy win would be to stop issuing getAttributeStats() for
> non-expression indexes. In most cases that'll already drastically cut down on
> the extra queries.

Apologies if this has already been considered upthread, but would it be
possible to use one query to gather all the required information into a
sorted table?  At a glance, it looks to me like it might be feasible.  I
had a lot of luck with reducing the number of per-object queries with that
approach recently (e.g., commit 2329cad).

-- 
nathan




Re: optimize file transfer in pg_upgrade

2025-03-05 Thread Nathan Bossart
On Wed, Mar 05, 2025 at 03:40:52PM -0500, Greg Sabino Mullane wrote:
> I've seen various failures, but they always get caught quite early.
> Certainly early enough to easily abort, fix perms/mounts/etc., then retry.
> I think your instinct is correct that this reversion is more trouble than
> it's worth. I don't think the pg_upgrade docs mention taking a backup, but
> that's always step 0 in my playbook, and that's the rollback plan in the
> unlikely event of failure.

Thank you, Greg and Robert, for sharing your thoughts.  With that, here's
what I'm considering to be a reasonably complete patch set for this
feature.  This leaves about a month for rigorous testing and editing, so
I'm hopeful it'll be ready for v18.

-- 
nathan
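For illustration, the new switch from the first patch would be used roughly like this (the data directory path is a placeholder):

initdb --no-sync-data-files -D /path/to/new/cluster
# everything under base/ and other tablespace directories is left unsynced;
# pg_wal/, pg_xact/, etc. are still flushed unless --no-sync is also given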
>From 6716e7b16a795911f55432dfd6d3c246aa8fd9fe Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 19 Feb 2025 09:14:51 -0600
Subject: [PATCH v4 1/3] initdb: Add --no-sync-data-files.

This new option instructs initdb to skip synchronizing any files
in database directories and the database directories themselves,
i.e., everything in the base/ subdirectory and any other
tablespace directories.  Other files, such as those in pg_wal/ and
pg_xact/, will still be synchronized unless --no-sync is also
specified.  --no-sync-data-files is primarily intended for internal
use by tools that separately ensure the skipped files are
synchronized to disk.  A follow-up commit will use this to help
optimize pg_upgrade's file transfer step.

Discussion: https://postgr.es/m/Zyvop-LxLXBLrZil%40nathan
---
 doc/src/sgml/ref/initdb.sgml| 20 +
 src/bin/initdb/initdb.c | 10 ++-
 src/bin/initdb/t/001_initdb.pl  |  1 +
 src/bin/pg_basebackup/pg_basebackup.c   |  2 +-
 src/bin/pg_checksums/pg_checksums.c |  2 +-
 src/bin/pg_combinebackup/pg_combinebackup.c |  2 +-
 src/bin/pg_rewind/file_ops.c|  2 +-
 src/common/file_utils.c | 85 +
 src/include/common/file_utils.h |  2 +-
 9 files changed, 89 insertions(+), 37 deletions(-)

diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 0026318485a..14c401b9a99 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -527,6 +527,26 @@ PostgreSQL documentation
   
  
 
+ 
+  --no-sync-data-files
+  
+   
+By default, initdb safely writes all database files
+to disk.  This option instructs initdb to skip
+synchronizing all files in the individual database directories and the
+database directories themselves, i.e., everything in the
+base subdirectory and any other tablespace
+directories.  Other files, such as those in pg_wal
+and pg_xact, will still be synchronized unless the
+--no-sync option is also specified.
+   
+   
+This option is primarily intended for internal use by tools that
+separately ensure the skipped files are synchronized to disk.
+   
+  
+ 
+
  
   --no-instructions
   
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 21a0fe3ecd9..22b7d31b165 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -168,6 +168,7 @@ static bool data_checksums = true;
 static char *xlog_dir = NULL;
 static int wal_segment_size_mb = (DEFAULT_XLOG_SEG_SIZE) / (1024 * 1024);
 static DataDirSyncMethod sync_method = DATA_DIR_SYNC_METHOD_FSYNC;
+static bool sync_data_files = true;
 
 
 /* internal vars */
@@ -2566,6 +2567,7 @@ usage(const char *progname)
printf(_("  -L DIRECTORY  where to find the input 
files\n"));
printf(_("  -n, --no-cleando not clean up after errors\n"));
printf(_("  -N, --no-sync do not wait for changes to be 
written safely to disk\n"));
+   printf(_("  --no-sync-data-files  do not sync files within database 
directories\n"));
printf(_("  --no-instructions do not print instructions for 
next steps\n"));
printf(_("  -s, --showshow internal settings, then 
exit\n"));
printf(_("  --sync-method=METHOD  set method for syncing files to 
disk\n"));
@@ -3208,6 +3210,7 @@ main(int argc, char *argv[])
{"icu-rules", required_argument, NULL, 18},
{"sync-method", required_argument, NULL, 19},
{"no-data-checksums", no_argument, NULL, 20},
+   {"no-sync-data-files", no_argument, NULL, 21},
{NULL, 0, NULL, 0}
};
 
@@ -3402,6 +3405,9 @@ main(int argc, char *argv[])
case 20:
data_checksums = false;
break;
+   case 21:
+   sync_data_files = false;
+   break;
default:
/* getopt_long already emitted a complaint 

Re: Update Unicode data to Unicode 16.0.0

2025-03-05 Thread Nathan Bossart
On Wed, Mar 05, 2025 at 03:34:06PM -0800, Jeff Davis wrote:
> On Wed, 2025-03-05 at 14:33 -0600, Nathan Bossart wrote:
>> +   report_status(PG_WARNING, "warning");
>> +   pg_log(PG_WARNING, "Your installation contains
>> relations that may be affected by a new version of Unicode.\n"
>> +  "A list of potentially-affected relations
>> is in the file:\n"
>> +  "    %s", report.path);
>> 
>> This may have been discussed upthread, but is a warning enough?  That
>> seems
>> like something that could very easily be missed.
> 
> There can be false positives, because even if such an expression index
> exists, it's often not an actual problem. Do we want to stop an upgrade
> from happening in that case? I doubt it, but if so, we'd need some kind
> of option to bypass it.

I see.  Do we provide any suggested next steps for users to assess the
potentially-affected relations?

-- 
nathan




Re: log_min_messages per backend type

2025-03-05 Thread Fujii Masao




On 2025/03/05 9:33, Euler Taveira wrote:

> +    Valid BACKENDTYPE values are ARCHIVER,
> +    AUTOVACUUM, BACKEND,
> +    BGWORKER, BGWRITER,
> +    CHECKPOINTER, LOGGER,
> +    SLOTSYNCWORKER, WALRECEIVER,
> +    WALSENDER, WALSUMMARIZER, and
> +    WALWRITER.


What about postmaster?

For parallel workers launched for parallel queries, should they follow
the backend's log level or the background worker's? Since they operate
as part of a parallel query executed by a backend, it seems more logical
for them to follow the backend's setting.

+   [B_CHECKPOINTER] = "checkpointer",
+   [B_STARTUP] = "backend",  /* XXX same as backend? */

I like the idea of allowing log levels to be set per process.
There were times I wanted to use debug5 specifically for
the startup process when troubleshooting WAL replay. It would be
helpful to distinguish the startup process from a regular backend,
so we can set its log level independently.
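To make the idea concrete, here is a purely illustrative sketch of a per-backend-type setting; the exact syntax is defined by the patch under discussion and is not shown in this excerpt, so the list format below is an assumption:

    # default level, with per-backend-type overrides (syntax illustrative only)
    log_min_messages = 'warning, checkpointer:debug2, autovacuum:log'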

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION





Re: Statistics Import and Export

2025-03-05 Thread Nathan Bossart
On Wed, Mar 05, 2025 at 08:54:35PM -0500, Corey Huinker wrote:
> * The stats data is kinda heavy (most common value lists, most common
> elements lists, esp for high stattargets), which would be a considerable
> memory impact and some of those stats might not even be needed (example,
> index stats for a table that is filtered out)

Understood.  Looking closer, I can see why that's a concern in this case.
You'd need 128 bytes just for the schema and table name.

-- 
nathan




Re: jsonb_strip_nulls with arrays?

2025-03-05 Thread Florents Tselai



> On 6 Mar 2025, at 2:10 AM, Ian Lawrence Barwick  wrote:
> 
> Hi
> 
> 2025年3月1日(土) 2:58 Florents Tselai :
>> Please add this to the next Commitfest at 
>> https://commitfest.postgresql.org/52/
>> 
>> 
>> Added ; thanks
>> https://commitfest.postgresql.org/patch/5260/
> 
> I see this was committed, but there's a small formatting error in the docs
> (extra comma in the parameter list); patch attached.
> 
> Regards
> 
> Ian Barwick
> 

You’re correct. 





Re: Monitoring gaps in XLogWalRcvWrite() for the WAL receiver

2025-03-05 Thread Michael Paquier
On Wed, Mar 05, 2025 at 08:04:44AM +, Bertrand Drouvot wrote:
> On Wed, Mar 05, 2025 at 12:35:26PM +0900, Michael Paquier wrote:
>> Perhaps there's a point in backpatching a portion of what's in the
>> attached patch (the wait event?), but I am not planning to bother much
>> with the stable branches based on the lack of complaints.
> 
> We're not emitting some statistics, so I think that it's hard for users to
> complain about something they don't/can't see.

Hmm, not exactly, actually.  I had missed that ff99918c625a mentions
that the WAL receiver was discarded on purpose, but this was still when
pgstats was not in shared memory and still file-based.  We scale much
better now.  It is not that difficult to make XLogWrite() hot enough
that it would trigger a lot of calls of pgstat_count_io_op_time() per
ms, either, like the WAL receiver, so as long as the timings are
behind track_wal_io_timing we're fine.

Let's do this in the end, without a backpatch.  At least we'll be able
to get better IO metrics for replication setups.
--
Michael


signature.asc
Description: PGP signature


Re: track generic and custom plans in pg_stat_statements

2025-03-05 Thread Sami Imseih
>
> Please see v2
>

oops, had to fix a typo in meson.build. Please see v3.


--
Sami


v3-0001-add-plan_cache-counters-to-pg_stat_statements.patch
Description: Binary data


Re: JSON_VALUE() behavior when RETURNING bytea (expected base64 decoding)

2025-03-05 Thread David G. Johnston
On Wednesday, March 5, 2025, Shay Rojansky  wrote:
>
> SELECT JSON_VALUE(jsonb '"AQID"', '$' RETURNING bytea); -- Expected
> 0x010203, got AQID
>

I get \x41514944, which is precisely what I would expect, since it is what this
query results in as well:

select 'AQID'::bytea;

David J.


Re: JSON_VALUE() behavior when RETURNING bytea (expected base64 decoding)

2025-03-05 Thread Shay Rojansky
>
>
>> SELECT JSON_VALUE(jsonb '"AQID"', '$' RETURNING bytea); -- Expected
>> 0x010203, got AQID
>>
>
> I get \x41514944, which is precisely what I would expect, since it is what this
> query results in as well:
>
> select 'AQID'::bytea;
>

If the behavior of RETURNING is meant to be identical to that of simply
applying a cast, is there any actual advantage in using JSON_VALUE with
RETURNING? In other words, why not just do JSON_VALUE(json '"AQID"',
'$')::bytea instead of using RETURNING? I thought the point was precisely
for RETURNING to be able to perform JSON-specific conversions (e.g. take
into account that the base64 is being converted from a *JSON* string, and
therefore apply base64 decoding to it).
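For reference, both behaviours can be reproduced directly (with the default bytea_output = 'hex'):

SELECT decode('AQID', 'base64');  -- \x010203, the base64-decoded bytes expected above
SELECT 'AQID'::bytea;             -- \x41514944, the raw characters through bytea's input function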


Re: Orphaned users in PG16 and above can only be managed by Superusers

2025-03-05 Thread Ashutosh Sharma
Thanks, Nathan, for reviewing the patch. Below are my comments inline:

On Thu, Mar 6, 2025 at 1:43 AM Nathan Bossart  wrote:
>
>
> * The patch alleges to only block DROP ROLE commands when there exists
>   _both_ admins of the target role and roles for which the target role is
>   an admin.  However, it's not clear to me why both need to be true.  I
>   might be able to glean the reason if I read this thread carefully or
>   spend more time thinking about it, but IMHO that patch itself should make
>   it obvious.  I'd suggest expanding the comment atop
>   check_drop_role_dependency().
>

I'll update the comments above the check_drop_role_dependency()
function to clarify things.

> * Does this introduce any race conditions?  For example, is it possible for
>   the new check to pass and then for a dependency to be added before the
>   drop completes?
>

I believe it is possible; I may need to adjust the location from which I'm
calling check_drop_role_dependency() to take care of this. I'll
address this in the next patch version. Thanks for bringing up this
concern.

--
With Regards,
Ashutosh Sharma.




Re: Orphaned users in PG16 and above can only be managed by Superusers

2025-03-05 Thread Ashutosh Sharma
Hi Robert,

Thanks for the review comments.

On Thu, Mar 6, 2025 at 2:10 AM Robert Haas  wrote:
>
> On Wed, Mar 5, 2025 at 3:13 PM Nathan Bossart  
> wrote:
> > * The patch alleges to only block DROP ROLE commands when there exists
> >   _both_ admins of the target role and roles for which the target role is
> >   an admin.  However, it's not clear to me why both need to be true.  I
> >   might be able to glean the reason if I read this thread carefully or
> >   spend more time thinking about it, but IMHO that patch itself should make
> >   it obvious.  I'd suggest expanding the comment atop
> >   check_drop_role_dependency().
>
> The error message needs work, too. Nobody is ever going to guess what
> the rule is from that error message.
>

I'll handle this in the next patch version.

> > * Does this introduce any race conditions?  For example, is it possible for
> >   the new check to pass and then for a dependency to be added before the
> >   drop completes?
>
> This is a serious concern for me as well.
>

This too will be taken care of in the next patch.

--
With Regards,
Ashutosh Sharma.




Re: Hook for Selectivity Estimation in Query Planning

2025-03-05 Thread Andrei Lepikhov

On 5/3/2025 19:50, Aleksander Alekseev wrote:

Andrei, Matthias,


Could you explain why you think the Pluggable TOASTer proposal was similar?
[...]


I merely pointed out that adding hooks without any particular value
for the Postgres users was criticized before, see for instance:
Thank you for your feedback. Rational criticism is always welcome. Let’s 
aim to clarify the actual objectives:


https://www.postgresql.org/message-id/20230206104917.sipa7nzue5lw2e6z%40alvherre.pgsql

One could argue - but wait, isn't TAM for instance just a bunch of
hooks in a nutshell? How do we distinguish a well-documented and more
or less stable API for the extension authors from a random hook put in
a convenient place? That's a good question. I don't have an answer to
it. This being said, the proposed patch doesn't strike me as a good or
documented API, or the one that is going to be stable in the long run.
1. **Documentation** - Agreed. I think it's feasible to create 
documentation based on the examples. However, we should first decide on 
the main subject, don't you think?


2. **'Good API'** - I wouldn't say that makes sense. Could you clarify 
what you mean by "good API"? What qualifies as a good API, why do you 
feel that the current changes are bad, and how can we improve it?


3. **'Stable'** - Why do you believe it is unstable? As I mentioned, 
this is the first hook that allows us to influence the optimiser's 
behaviour. Current path hooks only allow us to provide the planner with 
alternative decisions and force us to think it knows better how to 
proceed. I suggest we enable developers to enhance prediction quality 
without having to create a new planner. The rationale behind this is 
quite clear — specific workloads may require more sophisticated 
estimation algorithms, which would be excessive for a general-purpose 
planner.


As you can imagine, I would like to hook into cardinality predictions or 
tweak cost functions (see Apache Calcite), but that approach is invasive 
and unstable since each node, whether existing or newly introduced, 
would require such a call. In contrast, the selectivity estimation 
function serves as a central point for estimations, necessitating only 
one call. I believe we could consider adding a reference to `RelOptInfo` 
in the future, as has been briefly mentioned in discussions among 
developers before. For now, though, this seems sufficient for the 
purpose of database statistics.
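To make that concrete, a rough sketch of what such a central hook could look like, modeled on the parameter list of the existing clause_selectivity() function (illustrative only; the patch's actual definition may differ):

typedef Selectivity (*clause_selectivity_hook_type) (PlannerInfo *root,
                                                     Node *clause,
                                                     int varRelid,
                                                     JoinType jointype,
                                                     SpecialJoinInfo *sjinfo);

extern PGDLLIMPORT clause_selectivity_hook_type clause_selectivity_hook;

/* In clausesel.c, the hook would be consulted first, with the built-in
 * estimate used as the fallback when the extension declines. */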



[...]

Overall, I see that new hooks allow new [sometimes] open-source projects
and startups to emerge - not sure about enterprises' benefits.
Therefore, I'm not convinced by your current justification. Are there
any technical objections?


There is no point in debating about good and evil or right and wrong.
The only important question is whether there will be a committer
willing to accept the proposed change considering its controversy.
It would be interesting to see what type of controversy you see here. I 
think it will be clearer after you answer the previous questions.


--
regards, Andrei Lepikhov




Re: Statistics Import and Export

2025-03-05 Thread Corey Huinker
>
> One fairly easy win would be to stop issuing getAttributeStats() for
> non-expression indexes. In most cases that'll already drastically cut down
> on
> the extra queries.


That does seem like an easy win, especially since we're already using
indexprs for some filters. I am concerned that, down the road, if we ever
start storing non-expression stats for, say, partial indexes, we would
overlook that a corresponding change needed to happen in pg_dump. If you
can think of any safeguards we can create for that, I'm listening.


Re: Statistics Import and Export

2025-03-05 Thread Andres Freund
Hi,

On 2025-03-05 22:00:42 -0500, Corey Huinker wrote:
> On Wed, Mar 5, 2025 at 9:18 PM Andres Freund  wrote:
> > On 2025-03-05 20:54:35 -0500, Corey Huinker wrote:
> > > It's been considered and not ruled out, with a "let's see how the simple
> > > thing works, first" approach. Considerations are:
> > >
> > > * pg_stats is keyed on schemaname + tablename (which can also be indexes)
> > > and we need to use that because of the security barrier
> >
> > I don't think that has to be a big issue, you can just make the query
> > query multiple tables at once using an = ANY(ARRAY[]) expression or such.
> >
>
> I'm uncertain how we'd do that with (schemaname,tablename) pairs. Are you
> suggesting we back the joins from pg_stats to pg_namespace and pg_class and
> then filter by oids?

I was thinking of one query per schema or something like that. But yea, a
query to pg_namespace and pg_class wouldn't be a problem if we did it far
fewer times than before.   Or you could put the list of catalogs / tables to
be queried into an unnest() with two arrays or such.

Not sure how good the query plan for that would be, but it may be worth
looking at.


> > > * The stats data is kinda heavy (most common value lists, most common
> > > elements lists, esp for high stattargets), which would be a considerable
> > > memory impact and some of those stats might not even be needed (example,
> > > index stats for a table that is filtered out)
> >
> > Doesn't the code currently have this problem already? Afaict the stats are
> > currently all stored in memory inside pg_dump.
> >
>
> Each call to getAttributeStats() fetches the pg_stats for one and only one
> relation and then writes the SQL call to fout, then discards the result set
> once all the attributes of the relation are done.

I don't think that's true. For one, my example demonstrated that it increases
the peak memory usage substantially. That'd not be the case if the data was
just written out to stdout or such.

Looking at the code confirms that. The ArchiveEntry() in dumpRelationStats()
is never freed, afaict. And ArchiveEntry() strdups ->createStmt, which
contains the "SELECT pg_restore_attribute_stats(...)".


> I don't think the query itself would be a problem, a query querying all the
> > required stats should probably use PQsetSingleRowMode() or
> > PQsetChunkedRowsMode().
>
>
> That makes sense if we get the attribute stats from the result set in the
> order that we need them, and I don't know how we could possibly do that.
> We'd still need a table to bsearch() and that would be huge.

I'm not following - what would be the problem with a bsearch()? Compared to
the stats data, an array to map from oid to an index in an array of stats
data would be very small.


But with the unnest() idea from above it wouldn't even be needed, you could
use

SELECT ...
FROM unnest(schema_array, table_array) WITH ORDINALITY AS src(schemaname, 
tablename)
...
ORDER BY ordinality

or something along those lines.
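Filled out a little for illustration (the parameter arrays and the join against pg_stats are assumptions, not taken from an actual patch):

SELECT s.*
FROM unnest($1::text[], $2::text[]) WITH ORDINALITY AS src(schemaname, tablename, ord)
JOIN pg_catalog.pg_stats s USING (schemaname, tablename)
ORDER BY src.ord;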

Greetings,

Andres Freund




Re: Statistics Import and Export

2025-03-05 Thread Andres Freund
Hi,

On 2025-03-05 20:54:35 -0500, Corey Huinker wrote:
> It's been considered and not ruled out, with a "let's see how the simple
> thing works, first" approach. Considerations are:
>
> * pg_stats is keyed on schemaname + tablename (which can also be indexes)
> and we need to use that because of the security barrier

I don't think that has to be a big issue, you can just make the query
query multiple tables at once using an = ANY(ARRAY[]) expression or such.


> * The stats data is kinda heavy (most common value lists, most common
> elements lists, esp for high stattargets), which would be a considerable
> memory impact and some of those stats might not even be needed (example,
> index stats for a table that is filtered out)

Doesn't the code currently have this problem already? Afaict the stats are
currently all stored in memory inside pg_dump.

$ for opt in '' --no-statistics; do echo "using option $opt"; for dbname in 
pgbench_part_100 pgbench_part_1000 pgbench_part_1; do echo $dbname; 
/usr/bin/time -f 'Max RSS kB: %M' ./src/bin/pg_dump/pg_dump --no-data 
--quote-all-identifiers --no-sync --no-data $opt $dbname -Fp > 
/dev/null;done;done

using option
pgbench_part_100
Max RSS kB: 12780
pgbench_part_1000
Max RSS kB: 22700
pgbench_part_1
Max RSS kB: 124224
using option --no-statistics
pgbench_part_100
Max RSS kB: 12648
pgbench_part_1000
Max RSS kB: 19124
pgbench_part_1
Max RSS kB: 85068


I don't think the query itself would be a problem, a query querying all the
required stats should probably use PQsetSingleRowMode() or
PQsetChunkedRowsMode().

Greetings,

Andres Freund




Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.

2025-03-05 Thread Ajin Cherian
On Fri, Feb 28, 2025 at 11:34 PM Shubham Khanna
 wrote:
>
>
> The attached Patch contains the suggested changes.
>
> Thanks and regards,
> Shubham Khanna.

Some comments:
1.
+
+ -a
+ --all
+ 
+  
+   For all source server non-template databases, create subscriptions
+   for databases with the same names on the target server.
+   Subscription names, publication names, and replication slot names are
+   automatically generated. Cannot be used together with
+   --database, --publication,
+   --replication-slot or --subscription.

Don't start the sentence with "Cannot". Change the sentence to "This
option cannot be used together with ...". The same applies to similar
sentences used in three other places below this; change all of them.

2.
+# Verify that only user databases got subscriptions (not template databases)

change to "Verify that only user databases have subscriptions"
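Relating to the --all documentation quoted in comment 1, a hypothetical invocation might look like this (connection details and paths are placeholders):

pg_createsubscriber --all \
    --pgdata /path/to/standby_data \
    --publisher-server 'host=primary port=5432'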

regards,
Ajin Cherian
Fujitsu Australia




Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.

2025-03-05 Thread Peter Smith
Hi Shubham.

Some review comments for patch v13-0001.

==
GENERAL

1.
--cleanup-existing-publications

I've never liked this proposed switch name much.

e.g. why say "cleanup" instead of "drop"? What is the difference?
Saying drop is very explicit about what the switch will do.
e.g. why say "existing"? It seems redundant; you can't cleanup/drop
something that does not exist.

My preference is one of:
--drop-publications
--drop-all-publications

either of which seem nicely aligned with the descriptions in the usage and docs.

Yeah, I know I have raised this same point before, but AFAICT the
reply was like "will revise it in the next patch version", but that
was many versions ago. I think it is important to settle the switch
name earlier than later because there will be many tentacles into the
code (vars, params, fields, comments) and docs if anything changes --
so it is not a decision you want to leave till the end because it
could destabilise everything at the last minute.

==
Commit message

2.
By default, publications are preserved to avoid unintended data loss.

~

Was there supposed to be a blank line before this text, or should this
text be wrapped into the preceding paragraph?

==
src/bin/pg_basebackup/pg_createsubscriber.c

setup_subscriber:

3.
 /*
  * Create the subscriptions, adjust the initial location for logical
  * replication and enable the subscriptions. That's the last step for logical
- * replication setup.
+ * replication setup. If 'drop_publications' options is true, existing
+ * publications on the subscriber will be dropped before creating new
+ * subscriptions.
  */

There are multiple things amiss with this comment.
- 'drop_publications' is not the parameter name
- 'drop_publications' options [sic plural??]. It is not an option
here; it is a parameter

~~~

check_and_drop_existing_publications:

4.
 /*
- * Remove publication if it couldn't finish all steps.
+ * Check and drop existing publications on the subscriber if requested.
  */

There is no need to say "if requested.". It is akin to saying this
function does XXX if this function is called.

~~~

drop_publication_by_name:

5.
+/* Drop the specified publication of the given database. */
+static void
+drop_publication_by_name(PGconn *conn, const char *pubname, const char *dbname)

5a.
I think it is better to define this function before
check_and_drop_existing_publications. YMMV.

~

5b.
IMO the parameters should be reordered (PGconn *conn, const char
*dbname, const char *pubname). It seems more natural and would be
consistent with check_and_drop_existing_publications. YMMV.

~~~

6.
- dbinfo->pubname, dbinfo->dbname, PQresultErrorMessage(res));
- dbinfo->made_publication = false; /* don't try again. */
+ pubname, dbname, PQresultErrorMessage(res));
+ dbinfos.dbinfo->made_publication = false; /* don't try again. */

Something about this flag assignment seems odd to me. IIUC
'made_publication' is there so that cleanup_objects_atexit can
remove the special publication implicitly made by
pg_createsubscriber. But, AFAIK, we can also get to this code via the
--cleanup-existing-publication switch, so it might actually be one of
the "existing" publication DROPs that has failed. If so, then the
"don't try again" comment is misleading, because it may not be that
same publication "again" at all.

==
.../t/040_pg_createsubscriber.pl

GENERAL.

7.
Most of the changes to this test code are just changing node name S ->
S1. It's not much to do with the patch other than tidying it in
preparation for a new node S2.  OTOH it makes the review far harder
because there are so many changes. IMO all this S->S1 stuff should be
done as a separate pre-requisite patch and then it will be far easier
to see what changes are added for the --clean-existing-publications
testing.

~~~

8.
 # Set up node S as standby linking to node P
 $node_p->backup('backup_1');
-my $node_s = PostgreSQL::Test::Cluster->new('node_s');
-$node_s->init_from_backup($node_p, 'backup_1', has_streaming => 1);
-$node_s->append_conf(
+my $node_s1 = PostgreSQL::Test::Cluster->new('node_s1');
+$node_s1->init_from_backup($node_p, 'backup_1', has_streaming => 1);
+$node_s1->append_conf(

The comment should refer to S1, not S.

~~~

9.
 # Set up node C as standby linking to node S
-$node_s->backup('backup_2');
+$node_s1->backup('backup_2');
 my $node_c = PostgreSQL::Test::Cluster->new('node_c');
-$node_c->init_from_backup($node_s, 'backup_2', has_streaming => 1);
+$node_c->init_from_backup($node_s1, 'backup_2', has_streaming => 1);

The comment should refer to S1, not S.

~~~

10.
 # Check some unmet conditions on node S
-$node_s->append_conf(
+$node_s1->append_conf(

The comment should refer to S1, not S.

(note... there are lots of these. You should search/fix them all;
these review comments might miss some)

~~~

11.
+ '--socketdir' => $node_s1->host,
+ '--subscriber-port' => $node_s1->port,
  '--database' => $db1,
  '--database' => $db2,
  ],
  'standby cont

Add arbitrary xid and mxid to pg_resetwal

2025-03-05 Thread Daniil Davydov
Hi,
I prepared a patch that will allow us to set arbitrary values for the -m and -x
options of pg_resetwal. For now, it is not possible to specify a value
that does not fit into the existing SLRU segments, and the
main idea of this patch (for REL_17_STABLE) is to create such segments
inside pg_resetwal's main() function.
In my opinion, this will be useful primarily to simplify testing, since at
the moment you have to create the segments manually (as in this article).

The patch also contains additional tests for pg_resetwal (regression is run
to make sure that all postgres components are working correctly, but maybe
it can be replaced with something more "compact").

What do you think about the idea of adding such functionality?
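For illustration, the kind of invocation this aims to support, with values chosen to lie beyond the existing SLRU segments (the numbers are arbitrary examples):

pg_resetwal -x 3000000000 /path/to/datadir
pg_resetwal -m 4000000000,3500000000 /path/to/datadir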

--
Best regards,
Daniil Davydov
From 2993b376674cc0b565abd42a7d85eae8c8856428 Mon Sep 17 00:00:00 2001
From: Daniil Davidov 
Date: Wed, 5 Mar 2025 14:07:06 +0700
Subject: [PATCH] Allow to set any value for -m and -x options

---
 src/bin/pg_resetwal/Makefile |   4 +
 src/bin/pg_resetwal/pg_resetwal.c| 145 +++
 src/bin/pg_resetwal/t/003_advance.pl | 135 +
 3 files changed, 284 insertions(+)
 create mode 100644 src/bin/pg_resetwal/t/003_advance.pl

diff --git a/src/bin/pg_resetwal/Makefile b/src/bin/pg_resetwal/Makefile
index 4228a5a772..c890b1c5c6 100644
--- a/src/bin/pg_resetwal/Makefile
+++ b/src/bin/pg_resetwal/Makefile
@@ -17,6 +17,10 @@ include $(top_builddir)/src/Makefile.global
 
 LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils
 
# required for 003_advance.pl
+REGRESS_SHLIB=$(top_builddir)/src/test/regress/regress$(DLSUFFIX)
+export REGRESS_SHLIB
+
 OBJS = \
 	$(WIN32RES) \
 	pg_resetwal.o
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index e9dcb5a6d8..50f6b9ca2f 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -76,6 +76,9 @@ static XLogSegNo minXlogSegNo = 0;
 static int	WalSegSz;
 static int	set_wal_segsize;
 
+static void AdvanceNextXid(FullTransactionId oldval, FullTransactionId newval);
+static void AdvanceNextMultiXid(MultiXactId oldval, MultiXactId newval);
+
 static void CheckDataVersion(void);
 static bool read_controlfile(void);
 static void GuessControlValues(void);
@@ -90,6 +93,29 @@ static void WriteEmptyXLOG(void);
 static void usage(void);
 
 
+typedef struct CommitTimestampEntry
+{
+	TimestampTz time;
+	RepOriginId nodeid;
+} CommitTimestampEntry;
+
+// typedef uint64 MultiXactOffset;
+
+/*
+ * Note: these macros are copied from clog.c, commit_ts.c and subtrans.c and
+ * should be kept in sync.
+ */
+#define CLOG_XACTS_PER_BYTE 4
+#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)
+#define SUBTRANS_XACTS_PER_PAGE (BLCKSZ / sizeof(TransactionId))
+#define SizeOfCommitTimestampEntry (offsetof(CommitTimestampEntry, nodeid) + \
+	sizeof(RepOriginId))
+#define COMMIT_TS_XACTS_PER_PAGE \
+	(BLCKSZ / SizeOfCommitTimestampEntry)
+#define MULTIXACT_OFFSETS_PER_PAGE (BLCKSZ / sizeof(MultiXactOffset))
+
+#define SLRU_PAGES_PER_SEGMENT	32
+
 int
 main(int argc, char *argv[])
 {
@@ -113,6 +139,7 @@ main(int argc, char *argv[])
 	bool		force = false;
 	bool		noupdate = false;
 	MultiXactId set_oldestmxid = 0;
+	FullTransactionId current_fxid = {0};
 	char	   *endptr;
 	char	   *endptr2;
 	char	   *DataDir = NULL;
@@ -406,6 +433,11 @@ main(int argc, char *argv[])
 	if ((guessed && !force) || noupdate)
 		PrintControlValues(guessed);
 
+	/*
+	 * Remember full id of next free transaction
+	 */
+	current_fxid = ControlFile.checkPointCopy.nextXid;
+
 	/*
 	 * Adjust fields if required by switches.  (Do this now so that printout,
 	 * if any, includes these values.)
@@ -422,10 +454,18 @@ main(int argc, char *argv[])
 	}
 
 	if (set_xid != 0)
+	{
 		ControlFile.checkPointCopy.nextXid =
 			FullTransactionIdFromEpochAndXid(EpochFromFullTransactionId(ControlFile.checkPointCopy.nextXid),
 			 set_xid);
 
+		if (FullTransactionIdPrecedes(current_fxid, ControlFile.checkPointCopy.nextXid) &&
+			!noupdate)
+		{
+			AdvanceNextXid(current_fxid, ControlFile.checkPointCopy.nextXid);
+		}
+	}
+
 	if (set_oldest_commit_ts_xid != 0)
 		ControlFile.checkPointCopy.oldestCommitTsXid = set_oldest_commit_ts_xid;
 	if (set_newest_commit_ts_xid != 0)
@@ -436,12 +476,19 @@ main(int argc, char *argv[])
 
 	if (set_mxid != 0)
 	{
+		MultiXactId current_mxid = ControlFile.checkPointCopy.nextMulti;
 		ControlFile.checkPointCopy.nextMulti = set_mxid;
 
 		ControlFile.checkPointCopy.oldestMulti = set_oldestmxid;
 		if (ControlFile.checkPointCopy.oldestMulti < FirstMultiXactId)
 			ControlFile.checkPointCopy.oldestMulti += FirstMultiXactId;
 		ControlFile.checkPointCopy.oldestMultiDB = InvalidOid;
+
+		/*
+		 * If current_mxid precedes set_mxid
+		 */
+		if (((int32) (current_mxid - set_mxid) < 0) && !noupdate)
+			AdvanceNextMultiXid(current_mxid, set_mxid)

Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible

2025-03-05 Thread Michael Paquier
On Wed, Mar 05, 2025 at 08:55:55PM -0500, Andres Freund wrote:
> Once we've slept for 10+ seconds without reaching the target, sleeping for
> 100us is way too short a sleep and just wastes CPU cycles. A decent portion
> of the CPU time when running under valgrind is wasted just due to way too
> tight retry loops.
> 
> That's harder to do if we have many places polling.
> 
> But anyway, I digress, that's really not related to your change.

Please let me agree with your previous argument, then.  While looking
at the test when reviewing the patch a couple of days ago, I was also
wondering why we could not have a poll_query_until() in BackgroundPsql
and gave up on the idea.

Honestly, I don't see a reason not to introduce that, like in the
attached.  BackgroundPsql->query() does all the work already, and it is
possible to rely on $PostgreSQL::Test::Utils::timeout_default in the
loops, so that's simple, and it makes the test a bit easier to parse.
--
Michael
From 8caea1d434aa1cbd6f1da777b89dd4895a88b44f Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 6 Mar 2025 13:06:05 +0900
Subject: [PATCH v2] Fix race condition in pre-auth test

---
 src/test/authentication/t/007_pre_auth.pl | 36 ++-
 .../perl/PostgreSQL/Test/BackgroundPsql.pm| 44 +++
 2 files changed, 57 insertions(+), 23 deletions(-)

diff --git a/src/test/authentication/t/007_pre_auth.pl b/src/test/authentication/t/007_pre_auth.pl
index a638226dbaf1..fb4241cac4ab 100644
--- a/src/test/authentication/t/007_pre_auth.pl
+++ b/src/test/authentication/t/007_pre_auth.pl
@@ -43,36 +43,26 @@ $psql->query_safe("SELECT injection_points_attach('init-pre-auth', 'wait')");
 # authentication. Use the $psql connection handle for server interaction.
 my $conn = $node->background_psql('postgres', wait => 0);
 
-# Wait for the connection to show up.
-my $pid;
-while (1)
-{
-	$pid = $psql->query(
-		"SELECT pid FROM pg_stat_activity WHERE state = 'starting';");
-	last if $pid ne "";
+# Wait for the connection to show up in pg_stat_activity, with the wait_event
+# of the injection point.
+my $res = $psql->poll_query_until(
+	qq{SELECT count(pid) > 0 FROM pg_stat_activity
+  WHERE state = 'starting' and wait_event = 'init-pre-auth';});
+ok($res, 'authenticating connections are recorded in pg_stat_activity');
 
-	usleep(100_000);
-}
-
-note "backend $pid is authenticating";
-ok(1, 'authenticating connections are recorded in pg_stat_activity');
+# Get the PID of the backend waiting, for the next checks.
+my $pid = $psql->query(
+	qq{SELECT pid FROM pg_stat_activity
+  WHERE state = 'starting' and wait_event = 'init-pre-auth';});
 
 # Detach the waitpoint and wait for the connection to complete.
 $psql->query_safe("SELECT injection_points_wakeup('init-pre-auth');");
 $conn->wait_connect();
 
 # Make sure the pgstat entry is updated eventually.
-while (1)
-{
-	my $state =
-	  $psql->query("SELECT state FROM pg_stat_activity WHERE pid = $pid;");
-	last if $state eq "idle";
-
-	note "state for backend $pid is '$state'; waiting for 'idle'...";
-	usleep(100_000);
-}
-
-ok(1, 'authenticated connections reach idle state in pg_stat_activity');
+$res = $psql->poll_query_until(
+	qq{SELECT state FROM pg_stat_activity WHERE pid = $pid;}, 'idle');
+ok($res, 'authenticated connections reach idle state in pg_stat_activity');
 
 $psql->query_safe("SELECT injection_points_detach('init-pre-auth');");
 $psql->quit();
diff --git a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
index c611a61cf4e6..83ecf8b3e720 100644
--- a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
+++ b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
@@ -61,6 +61,7 @@ use Config;
 use IPC::Run;
 use PostgreSQL::Test::Utils qw(pump_until);
 use Test::More;
+use Time::HiRes qw(usleep);
 
 =pod
 
@@ -172,6 +173,49 @@ sub wait_connect
 	die "psql startup timed out" if $self->{timeout}->is_expired;
 }
 
+
+=pod
+
+=item $session->poll_query_until($query [, $expected ])
+
+Run B<$query> repeatedly, until it returns the B<$expected> result
+('t', or SQL boolean true, by default).
+Continues polling if B returns an error result.
+
+Times out after $PostgreSQL::Test::Utils::timeout_default seconds.
+
+Returns 1 if successful, 0 if timed out.
+
+=cut
+
+sub poll_query_until
+{
+	my ($self, $query, $expected) = @_;
+
+	$expected = 't' unless defined($expected);# default value
+
+	my $max_attempts = 10 * $PostgreSQL::Test::Utils::timeout_default;
+	my $attempts = 0;
+
+	while ($attempts < $max_attempts)
+	{
+		my $ret = $self->query($query);
+
+		if ($ret eq $expected)
+		{
+			return 1;
+		}
+
+		# Wait 0.1 second before retrying.
+		usleep(100_000);
+		$attempts++;
+	}
+
+	# Give up.  The output of the last attempt is logged by query(),
+	# so no need to do anything here.
+	return 0;
+}
+
 =pod
 
 =item $session->quit
-- 
2.47.2



signature.asc
Description: PGP signature


Re: track generic and custom plans in pg_stat_statements

2025-03-05 Thread Sami Imseih
Thanks for the review!

> I could see EXPLAIN being somewhat useful (especially for non-interactive 
> things like auto_explain), so a weak +1 on that.

I'll make this a follow-up to this patch.

> Definitely not useful for pg_stat_database IMHO.

agree as well. I did not have strong feelings about this.

> FILE: contrib/pg_stat_statements/expected/plan_cache.out
>
> These tests seem very verbose. Why not just prepare a simple query:
>
> prepare foo as select $1 > 0;
> execute foo(1);
> ...then tweak things via plan_cache_mode to ensure we test the right things?

Yeah, I went overboard to see what others think.
I toned it down drastically for v2 following your advice.


> oldextversions should still have a trailing comma

done


> FILE: contrib/pg_stat_statements/pg_stat_statements.c
>
> + if (cplan && cplan->status > PLAN_CACHE_STATUS_CUSTOM_PLAN)
>
> Not really comfortable with using the enum like this. Better would be 
> explicitly listing the two states that lead to the increment. Not as good but 
> still better:
>   cplan->status >= PLAN_CACHE_STATUS_GENERIC_PLAN_BUILD

fair, I use explicit values for each plan type.

> + PlanCacheStatus status; /* is it a reused generic plan? */
>
> The comment should be updated

done

>
> FILE: contrib/pg_stat_statements/sql/plan_cache.sql
>
> Missing a newline at the end

done

> FILE: doc/src/sgml/pgstatstatements.sgml
>
> + Total number of non-utility statements executed using a generic plan
>
> I'm not sure we need to specify non-utility here.
>

fair, I did not have strong feelings about this either.

Please see v2

Thanks!

--
Sami Imseih
Amazon Web Services (AWS)


v2-0001-add-plan_cache-counters-to-pg_stat_statements.patch
Description: Binary data


Re: Statistics Import and Export

2025-03-05 Thread Corey Huinker
>
> Apologies if this has already been considered upthread, but would it be
> possible to use one query to gather all the required information into a
> sorted table?  At a glance, it looks to me like it might be feasible.  I
> had a lot of luck with reducing the number per-object queries with that
> approach recently (e.g., commit 2329cad).


It's been considered and not ruled out, with a "let's see how the simple
thing works, first" approach. Considerations are:

* pg_stats is keyed on schemaname + tablename (which can also be indexes)
and we need to use that because of the security barrier
* Joining pg_class and pg_namespace to pg_stats was specifically singled
out as a thing to remove.
* The stats data is kinda heavy (most common value lists, most common
elements lists, esp for high stattargets), which would be a considerable
memory impact and some of those stats might not even be needed (example,
index stats for a table that is filtered out)

So it's not impossible, but it's trickier than just, say, tables or indexes.


Re: Add Pipelining support in psql

2025-03-05 Thread Michael Paquier
On Wed, Mar 05, 2025 at 03:25:12PM +0100, Daniel Verite wrote:
>   Anthonin Bonnefoy wrote:
>> I do see the idea to make it easier to convert existing scripts into
>> using pipelining. The main focus of the initial implementation was
>> more on protocol regression tests with psql, so that's not necessarily
>> something I had in mind.
> 
> Understood. Yet pipelining can considerably accelerate certain scripts
> when client-server latency is an issue. We should expect end users to
> benefit from it too.

That was not a test case we had in mind originally here, but if it is
possible to keep the implementation simple while supporting your
demand, well, let's do it.  If it's not that straight-forward, let's
use the new meta-command, forbidding \g and \gx based on your
arguments from upthread.
--
Michael


signature.asc
Description: PGP signature


Re: JSON_VALUE() behavior when RETURNING bytea (expected base64 decoding)

2025-03-05 Thread Greg Sabino Mullane
It looks like your bytea_output is set to 'escape', which would explain
what you are seeing. Try adding this in first:

SET bytea_output = hex;
SELECT JSON_VALUE(jsonb '"AQID"', '$' RETURNING bytea);

That (hex) is the default value, so you must be setting it to escape
somewhere. You can see where by running:

select * from pg_settings where name = 'bytea_output';

Examine the source, sourcefile, and sourceline columns.

I personally prefer hex, but perhaps it's set to escape in your system for a
reason. If not, maybe change it globally?

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support


Re: Statistics Import and Export

2025-03-05 Thread Andres Freund
Hi,

On 2025-02-25 21:29:56 -0500, Corey Huinker wrote:
> On Tue, Feb 25, 2025 at 9:00 PM Jeff Davis  wrote:
>
> > On Mon, 2025-02-24 at 09:54 -0500, Andres Freund wrote:
> > > Have you compared performance of with/without stats after these
> > > optimizations?
> >
> > On unoptimized build with asserts enabled, dumping the regression
> > database:
> >
> >   --no-statistics: 1.0s
> >   master:  3.6s
> >   v3j-0001:3.0s
> >   v3j-0002:1.7s
> >
> > I plan to commit the patches soon.

I think these have all been committed, but I still see a larger performance
difference than what you observed. I just checked because I was noticing that
the tests are still considerably slower than they used to be.


An optimized pg_dump against an unoptimized assert-enabled server:

time ./src/bin/pg_dump/pg_dump --no-data --quote-all-identifiers 
--binary-upgrade --format=custom --no-sync regression > /dev/null
real0m2.778s
user0m0.167s
sys 0m0.057s

$ time ./src/bin/pg_dump/pg_dump --no-data --quote-all-identifiers 
--binary-upgrade --format=custom --no-sync --no-statistics regression > 
/dev/null

real0m1.290s
user0m0.097s
sys 0m0.026s


I thought it might be interesting to look at the set of queries arriving on
the server side, so I enabled pg_stat_statements and ran a dump:

regression[4041753][1]=# SELECT total_exec_time, total_plan_time, calls, plans, 
substring(query, 1, 30) FROM pg_stat_statements ORDER BY calls DESC LIMIT 15;
┌─────────────────┬─────────────────┬───────┬───────┬─────────────────────────────────┐
│ total_exec_time │ total_plan_time │ calls │ plans │            substring            │
├─────────────────┼─────────────────┼───────┼───────┼─────────────────────────────────┤
│   239.967218998 │         12.5725 │   981 │     6 │ PREPARE getAttributeStats(pg_c  │
│      15.3304054 │        1.836712 │   282 │     6 │ PREPARE dumpFunc(pg_catalog.oi  │
│      10.1291143 │     0.398348004 │   199 │     6 │ PREPARE dumpTableAttach(pg_cat  │
│       9.8874892 │      0.93326201 │    84 │    84 │ SELECT pg_get_partkeydef($1)    │
│      14.3507256 │        0.691071 │    60 │    60 │ SELECT pg_catalog.pg_get_viewd  │
│      5.11742195 │      1.46042199 │    47 │     6 │ PREPARE dumpAgg(pg_catalog.oid  │
│     0.240361996 │        0.545125 │    41 │    41 │ SELECT pg_catalog.format_type(  │
│       7.0996352 │     0.470318007 │    39 │    39 │ SELECT pg_catalog.pg_get_ruled  │
│        0.672752 │      1.90363202 │    21 │     6 │ PREPARE dumpDomain(pg_catalog.  │
│      1.65192997 │      3.14803806 │    21 │    22 │ PREPARE getDomainConstraints(p  │
│        1.085548 │      3.96476305 │    16 │     6 │ PREPARE dumpCompositeType(pg_c  │
│        0.196259 │        0.602291 │    11 │     6 │ PREPARE dumpOpr(pg_catalog.oid  │
│        0.265461 │        4.428914 │    10 │    10 │ SELECT amprocnum, amproc::pg_c  │
│     0.395913993 │        9.345471 │    10 │    10 │ SELECT amopstrategy, amopopr::  │
│     0.357521003 │        2.128437 │     9 │     9 │ SELECT nspname, tmplname FROM   │
└─────────────────┴─────────────────┴───────┴───────┴─────────────────────────────────┘


It looks a lot less bad with an optimized build:
regression[4042057][1]=# SELECT total_exec_time, total_plan_time, calls, plans, 
substring(query, 1, 30) FROM pg_stat_statements ORDER BY calls DESC LIMIT 15;
┌─────────────────┬─────────────────┬───────┬───────┬─────────────────────────────────┐
│ total_exec_time │ total_plan_time │ calls │ plans │            substring            │
├─────────────────┼─────────────────┼───────┼───────┼─────────────────────────────────┤
│       50.637642 │        2.503585 │   981 │     6 │ PREPARE getAttributeStats(pg_c  │
│      3.52419907 │        0.478541 │   282 │     6 │ PREPARE dumpFunc(pg_catalog.oi  │
│      2.31703585 │        0.126379 │   199 │     6 │ PREPARE dumpTableAttach(pg_cat  │
│        2.291331 │     0.253604005 │    84 │    84 │ SELECT pg_get_partkeydef($1)    │
│       4.6784333 │        0.202578 │    60 │    60 │ SELECT pg_catalog.pg_get_viewd  │
│      1.12884404 │     0.309762004 │    47 │     6 │ PREPARE dumpAgg(pg_catalog.oid  │
│         0.06619 │     0.168136004 │    41 │    41 │ SELECT pg_catalog.format_type(  │
│        2.102865 │        0.115169 │    39 │    39 │ SELECT pg_catalog.pg_get_ruled  │
│         0.16163 │        0.439991 │    21 │     6 │ PREPARE dumpDomain(pg_catalog.  │
│      0.53351201 │        0.727573 │    21 │    22 │ PREPARE getDomainConstraints(p  │
│         0.28177 │        0.894156 │    16 │     6 │ PREPARE dumpCompositeType(pg_c  │
│        0.038558 │        0.140807 │    11 │     6 │ PREPARE dumpOpr(pg_catalog.oid  │
│        0.082078 │      0.96542801

Re: JSON_VALUE() behavior when RETURNING bytea (expected base64 decoding)

2025-03-05 Thread David G. Johnston
On Wednesday, March 5, 2025, Shay Rojansky  wrote:

>
>>> SELECT JSON_VALUE(jsonb '"AQID"', '$' RETURNING bytea); -- Expected
>>> 0x010203, got AQID
>>>
>>
>> I get \x41514944, which is precisely what I would expect, since it is what
>> this query results in as well:
>>
>> select 'AQID'::bytea;
>>
>
> If the behavior of RETURNING is meant to be identical to that of simply
> applying a cast, is there any actual advantage in using JSON_VALUE with
> RETURNING? In other words, why not just do JSON_VALUE(json '"AQID"',
> '$')::bytea instead of using RETURNING? I thought the point was precisely
> for RETURNING to be able to perform JSON-specific conversions (e.g. take
> into account that the base64 is being converted from a *JSON* string, and
> therefore apply base64 decoding to it).
>

Not really… it does seem to just be syntactic sugar.  We aren't likely to
assume the contents of a JSON string are base64 encoded when that is, as you
say, just a de-facto standard.  Unless we have some standard
(namely the one defining JSON_VALUE) telling us that the contents are
indeed always base64-encoded data, we'll just assume it's plain text and act
accordingly - in this case passing it into bytea's input function.

David J.


Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible

2025-03-05 Thread Andres Freund
Hi,

On 2025-03-05 16:19:04 -0800, Jacob Champion wrote:
> On Wed, Mar 5, 2025 at 9:28 AM Andres Freund  wrote:
> > Unrelated to the change in this patch, but tests really shouldn't use 
> > while(1)
> > loops without a termination condition. If something is wrong, the test will
> > hang indefinitely, instead of timing out.  On the buildfarm that can take 
> > out
> > an animal if it hasn't configured a timeout (with autoconf at least, meson
> > terminates tests after a timeout).
>
> With the current patchset, if I pull the PG_TEST_TIMEOUT_DEFAULT down
> low, and modify the backend so that either one of the two conditions
> never completes, the tests still stop due to BackgroundPsql's session
> timeout. This is true for Meson and Autoconf. Is there a different
> situation where I can't rely on that?

Oh, I see. I missed that it's relying on the timeout and that the timeout
isn't reset in the loop.  Sorry for the noise.


FWIW, I still don't like open-coded poll loops, as I'd really like our tests
to use some smarter retry/backoff logic than a single hardcoded
usleep(100_000).

The first few iterations that's too long, commonly the state isn't reached
in the first iteration, but will be within a millisecond or two. Waiting
100ms is obviously way too long.

Once we've slept for 10+ seconds without reaching the target, sleeping for
100us is way too short a sleep and just wastes CPU cycles. A decent portion
of the CPU time when running under valgrind is wasted just due to way too
tight retry loops.
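Concretely, the shape being described might look something like this in the TAP tests (a sketch only; condition_reached() is a stand-in for the test's real check, and Time::HiRes::usleep is assumed, as used elsewhere in the tests):

my $delay_us = 1_000;                 # start with a 1ms sleep
until (condition_reached())
{
    usleep($delay_us);
    $delay_us = $delay_us >= 100_000 ? 100_000 : $delay_us * 2;   # back off, cap at 100ms
}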

That's harder to do if we have many places polling.

But anyway, I digress, that's really not related to your change.

Greetings,

Andres Freund




Re: support fast default for domain with constraints

2025-03-05 Thread jian he
hi.

I have rearranged the patch set.
v3-0001 and v3-0002 are preparatory patches.
v3-0001 adds the functions ExecPrepareExprSafe and ExecInitExprSafe.
v3-0002 adds the function DomainHaveVolatileConstraints.


v3-0003 adds tests and applies fast default for domains with constraints;
for tables with no existing rows the behavior is aligned with master.
There is also no table rewrite if the domain has volatile check constraints,
so the behavior is less surprising.
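To illustrate the case the series targets (the domain and table here are examples, not from the patch):

CREATE DOMAIN positive_int AS integer CHECK (VALUE > 0);
CREATE TABLE t (id integer);
INSERT INTO t SELECT generate_series(1, 1000);
-- With the patch, this should use a fast default instead of rewriting t,
-- since the constant default can be checked against the constraint once up front.
ALTER TABLE t ADD COLUMN n positive_int DEFAULT 1;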
From b2b42a27bcad670e3f5843b9db4ee369e3f30d9c Mon Sep 17 00:00:00 2001
From: jian he 
Date: Wed, 5 Mar 2025 20:37:31 +0800
Subject: [PATCH v3 1/3] soft error variant of ExecPrepareExpr, ExecInitExpr

ExecPrepareExprSafe and ExecInitExprSafe.
ExecPrepareExprSafe initialize for expression execution with soft error support.
not all expression node support it. some like CoerceToDomain do support it.

XXX more comments.

discussion: https://postgr.es/m/cacjufxe_+izbr1i49k_ahigpppwltji6km8nosc7fwvkdem...@mail.gmail.com
---
 src/backend/executor/execExpr.c | 63 +
 src/include/executor/executor.h |  2 ++
 2 files changed, 65 insertions(+)

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 03566c4d181..04f8b839d30 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -170,6 +170,46 @@ ExecInitExpr(Expr *node, PlanState *parent)
 	return state;
 }
 
+/*
+ * ExecInitExprSafe: soft error variant of ExecInitExpr.
+ * Use it only for expression nodes that support soft errors; not all
+ * expression nodes support it.
+ */
+ExprState *
+ExecInitExprSafe(Expr *node, PlanState *parent)
+{
+	ExprState  *state;
+	ExprEvalStep scratch = {0};
+
+	/* Special case: NULL expression produces a NULL ExprState pointer */
+	if (node == NULL)
+		return NULL;
+
+	/* Initialize ExprState with empty step list */
+	state = makeNode(ExprState);
+	state->expr = node;
+	state->parent = parent;
+	state->ext_params = NULL;
+	state->escontext = makeNode(ErrorSaveContext);
+	state->escontext->type = T_ErrorSaveContext;
+	state->escontext->error_occurred = false;
+	state->escontext->details_wanted = true;
+
+	/* Insert setup steps as needed */
+	ExecCreateExprSetupSteps(state, (Node *) node);
+
+	/* Compile the expression proper */
+	ExecInitExprRec(node, state, &state->resvalue, &state->resnull);
+
+	/* Finally, append a DONE step */
+	scratch.opcode = EEOP_DONE;
+	ExprEvalPushStep(state, &scratch);
+
+	ExecReadyExpr(state);
+
+	return state;
+}
+
 /*
  * ExecInitExprWithParams: prepare a standalone expression tree for execution
  *
@@ -778,6 +818,29 @@ ExecPrepareExpr(Expr *node, EState *estate)
 	return result;
 }
 
+/*
+ * ExecPrepareExprSafe: soft error variant of ExecPrepareExpr.
+ *
+ * Use it when the expression node *supports* soft error expression execution.
+ * The comments for ExecPrepareExpr apply here too.
+ */
+ExprState *
+ExecPrepareExprSafe(Expr *node, EState *estate)
+{
+	ExprState  *result;
+	MemoryContext oldcontext;
+
+	oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
+
+	node = expression_planner(node);
+
+	result = ExecInitExprSafe(node, NULL);
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return result;
+}
+
 /*
  * ExecPrepareQual --- initialize for qual execution outside a normal
  * Plan tree context.
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d12e3f451d2..b7ab95437fe 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -305,6 +305,7 @@ ExecProcNode(PlanState *node)
  * prototypes from functions in execExpr.c
  */
 extern ExprState *ExecInitExpr(Expr *node, PlanState *parent);
+extern ExprState *ExecInitExprSafe(Expr *node, PlanState *parent);
 extern ExprState *ExecInitExprWithParams(Expr *node, ParamListInfo ext_params);
 extern ExprState *ExecInitQual(List *qual, PlanState *parent);
 extern ExprState *ExecInitCheck(List *qual, PlanState *parent);
@@ -353,6 +354,7 @@ extern ProjectionInfo *ExecBuildUpdateProjection(List *targetList,
  TupleTableSlot *slot,
  PlanState *parent);
 extern ExprState *ExecPrepareExpr(Expr *node, EState *estate);
+extern ExprState *ExecPrepareExprSafe(Expr *node, EState *estate);
 extern ExprState *ExecPrepareQual(List *qual, EState *estate);
 extern ExprState *ExecPrepareCheck(List *qual, EState *estate);
 extern List *ExecPrepareExprList(List *nodes, EState *estate);
-- 
2.34.1

From 9885cda706513ed7d8e85597440609c411678dcc Mon Sep 17 00:00:00 2001
From: jian he 
Date: Thu, 6 Mar 2025 09:58:56 +0800
Subject: [PATCH v3 2/3] add function DomainHaveVolatileConstraints

bool DomainHaveVolatileConstraints(Oid type_id, bool *have_volatile);

Returns true if the domain has any constraints.
If you want to check whether the domain has any volatile check constraints,
pass a non-NULL have_volatile.

discussion: https://postgr.es/m/cacjufxe_+izbr1i49k_ahigpppwltji6km8nosc7fwvkdem...@mail.gmail.com
---
 src/backend/utils/cache/typcache.c | 37 ++
 src/include/utils/typcache.h   |  1 +
 2 file

Re: what's going on with lapwing?

2025-03-05 Thread Julien Rouhaud
On Tue, Mar 04, 2025 at 10:18:49AM -0500, Tom Lane wrote:
>
> But yeah, I thought we were overdue for a buildfarm release.
> I'm pleased to see that Andrew just pushed one.

FWIW I installed the client version 19.1 this morning and forced a run on HEAD
and lapwing is back to green.




Re: Statistics Import and Export

2025-03-05 Thread Corey Huinker
On Wed, Mar 5, 2025 at 9:18 PM Andres Freund  wrote:

> Hi,
>
> On 2025-03-05 20:54:35 -0500, Corey Huinker wrote:
> > It's been considered and not ruled out, with a "let's see how the simple
> > thing works, first" approach. Considerations are:
> >
> > * pg_stats is keyed on schemaname + tablename (which can also be indexes)
> > and we need to use that because of the security barrier
>
> I don't think that has to be a big issue, you can just make the the query
> query multiple tables at once using an = ANY(ARRAY[]) expression or such.
>

I'm uncertain how we'd do that with (schemaname,tablename) pairs. Are you
suggesting we back the joins from pg_stats to pg_namespace and pg_class and
then filter by oids?


> > * The stats data is kinda heavy (most common value lists, most common
> > elements lists, esp for high stattargets), which would be a considerable
> > memory impact and some of those stats might not even be needed (example,
> > index stats for a table that is filtered out)
>
> Doesn't the code currently have this problem already? Afaict the stats are
> currently all stored in memory inside pg_dump.
>

Each call to getAttributeStats() fetches the pg_stats for one and only one
relation and then writes the SQL call to fout, then discards the result set
once all the attributes of the relation are done.

I don't think the query itself would be a problem, a query querying all the
> required stats should probably use PQsetSingleRowMode() or
> PQsetChunkedRowsMode().


That makes sense if we get the attribute stats from the result set in the
order that we need them, and I don't know how we could possibly do that.
We'd still need a table to bsearch() and that would be huge.
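
For illustration, such a batched fetch is at least expressible against pg_stats
with a row-value filter and an ordering pg_dump could consume sequentially
(the relation names and column list below are placeholders, not from a real dump):

   SELECT s.schemaname, s.tablename, s.attname, s.inherited,
          s.null_frac, s.avg_width, s.n_distinct,
          s.most_common_vals, s.most_common_freqs, s.histogram_bounds
   FROM pg_stats s
   WHERE (s.schemaname, s.tablename) IN (('public', 't1'), ('public', 't2'))
   ORDER BY s.schemaname, s.tablename, s.attname;

With rows arriving in that order, the result could be walked with
PQsetSingleRowMode()/PQsetChunkedRowsMode() in the same order the statements
are emitted, instead of holding the whole set in memory.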


Re: Refactoring postmaster's code to cleanup after child exit

2025-03-05 Thread Noah Misch
On Tue, Mar 04, 2025 at 05:50:34PM -0500, Andres Freund wrote:
> On 2024-12-09 00:12:32 +0100, Tomas Vondra wrote:
> > [23:48:44.444](1.129s) ok 3 - reserved_connections limit
> > [23:48:44.445](0.001s) ok 4 - reserved_connections limit: matches
> > process ended prematurely at
> > /home/user/work/postgres/src/test/postmaster/../../../src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
> > line 154.
> > # Postmaster PID for node "primary" is 198592
> 
> 
> I just saw this failure on skink in the BF:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-03-04%2015%3A43%3A23
> 
> [17:05:56.438](0.247s) ok 3 - reserved_connections limit
> [17:05:56.438](0.000s) ok 4 - reserved_connections limit: matches
> process ended prematurely at 
> /home/bf/bf-build/skink-master/HEAD/pgsql/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
>  line 160.
> 
> 
> > That BackgroundPsql.pm line is this in wait_connect()
> > 
> >   $self->{run}->pump()
> > until $self->{stdout} =~ /$banner/ || $self->{timeout}->is_expired;
> 
> A big part of the problem here imo is the exception behaviour that
> IPC::Run::pump() has:
> 
>   If pump() is called after all harnessed activities have completed, a 
> "process
>   ended prematurely" exception to be thrown.  This allows for simple scripting
>   of external applications without having to add lots of error handling code 
> at
>   each step of the script:
> 
> Which is, uh, not very compatible with how we use IPC::Run (here and
> elsewhere).  Just ending the test because a connection failed is pretty awful.

Historically, I think we've avoided this sort of trouble by doing pipe I/O
only on processes where we feel able to predict when the process will exit.
Commit f44b9b6 is one example (simpler case, not involving pump()).  It would
be a nice improvement to do better, since there's always some risk of
unexpected exit.

> This behaviour makes it really hard to debug problems. It'd have been a lot
> easier to understand the problem if we'd seen psql's stderr before the test
> died.
> 
> I guess that mean at the very least we'd need to put an eval {} around the
> ->pump() call., print $self->{stdout}, ->{stderr} and reraise an error?

That sounds right.

Officially, you could call ->pumpable() before ->pump().  It's defined as
'Returns TRUE if calling pump() won't throw an immediate "process ended
prematurely" exception.'  I lack high confidence that it avoids the exception,
because the pump() still calls pumpable()->reap_nb()->waitpid(WNOHANG) and may
decide "process ended prematurely" based on the new finding.  In other words,
I bet there would be a TOCTOU defect in "$h->pump if $h->pumpable".

> Presumably not just in in wait_connect(), but also at least in pump_until()?

If the goal is to have it capture maximum data from processes that exit when
we don't expect it (seems good to me), yes.




Hook for Selectivity Estimation in Query Planning

2025-03-05 Thread Andrei Lepikhov

Hi,

I would like to discuss the introduction of a hook for evaluating the 
selectivity of an expression when searching for an optimal query plan. 
This topic has been brought up in various discussions, for example, in [1].


Currently, extensions that interact with the optimiser can only add 
their paths without the ability to influence the optimiser's decisions. 
As a result, when developing an extension that implements a new type of 
statistics (such as a histogram for composite types), utilises knowledge 
from previously executed queries, or implements some system of 
selectivity hints, we find ourselves writing a considerable amount of 
code. To ensure the reliable operation of the extension, this may end up 
in developing a separate optimiser or, at the very least, creating a 
custom join search (refer to core.c in the pg_hint_plan extension for an 
estimation of the amount of code required).


A hook for evaluating selectivity could streamline the development of 
methods to improve selectivity evaluation, making it easier to create 
new types of statistics and estimation methods (I would like to deal 
with join clauses estimation). Considering the limited amount of code 
involved and the upcoming code freeze, I propose adding such a hook to 
PostgreSQL 18 to assess how it simplifies extension development.


This proposed hook would complement the existing path hooks without 
overlapping in functionality. In my experience with implementing 
adaptive features in enterprise solutions, I believe that additional 
hooks could also be beneficial for estimating the number of groups and 
the amount of memory allocated, which is currently based solely on 
work_mem. However, these suggestions do not interfere with the current 
proposal and could be considered later.


Critique:
In general, a hook for evaluating the number of rows appears to be a 
more promising approach. It would allow the extension to access specific 
RelOptInfo data, thus providing insights into where the evaluation takes 
place within the plan. Consequently, this would enable a deeper 
influence on the query plan choice. However, implementing such a hook 
might be more invasive, requiring modifications to each cost function. 
Additionally, it addresses a slightly different issue and can be 
considered separately.


Attached is a patch containing the proposed hook code.

--
regards, Andrei Lepikhov
From 7113a4d0b45a4cf00b78076d55570ade60ff9841 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" 
Date: Wed, 5 Mar 2025 09:38:30 +0100
Subject: [PATCH] Introduce selectivity hook.

---
 src/backend/optimizer/path/clausesel.c |  5 +
 src/backend/utils/adt/selfuncs.c   |  1 +
 src/include/utils/selfuncs.h   | 11 +++
 3 files changed, 17 insertions(+)

diff --git a/src/backend/optimizer/path/clausesel.c 
b/src/backend/optimizer/path/clausesel.c
index 5d51f97f219..6b5d49d0786 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -128,6 +128,11 @@ clauselist_selectivity_ext(PlannerInfo *root,
ListCell   *l;
int listidx;
 
+   if (clauselist_selectivity_hook)
+       return clauselist_selectivity_hook(root, clauses, varRelid, jointype,
+                                          sjinfo, &estimatedclauses,
+                                          use_extended_stats);
+
/*
 * If there's exactly one clause, just go directly to
 * clause_selectivity_ext(). None of what we might do below is relevant.
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index c2918c9c831..e66346ef6b4 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -147,6 +147,7 @@
 /* Hooks for plugins to get control when we ask for stats */
 get_relation_stats_hook_type get_relation_stats_hook = NULL;
 get_index_stats_hook_type get_index_stats_hook = NULL;
+clauselist_selectivity_hook_type clauselist_selectivity_hook = NULL;
 
 static double eqsel_internal(PG_FUNCTION_ARGS, bool negate);
 static double eqjoinsel_inner(Oid opfuncoid, Oid collation,
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index d35939651f3..ce4ba27d183 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -149,6 +149,17 @@ typedef bool (*get_index_stats_hook_type) (PlannerInfo 
*root,

   VariableStatData *vardata);
 extern PGDLLIMPORT get_index_stats_hook_type get_index_stats_hook;
 
+/* Hooks for plugins to get control when we ask for selectivity estimation */
+typedef Selectivity (*clauselist_selectivity_hook_type) (PlannerInfo *root,

RE: Selectively invalidate caches in pgoutput module

2025-03-05 Thread Hayato Kuroda (Fujitsu)
Dear Hou,

Thanks for reading the thread!

> > Attached patchset implemented with the approach.
> > 0001 contains only part to avoid whole cache invalidation, for ADD/SET/DROP
> > command.
> 
> I think the changes proposed in 0001 are reasonable. Basically, any
> modifications to pg_publication_rel or pg_publication_namespace should trigger
> relcache invalidation messages. This is necessary for rebuilding the cache
> stored in RelationData::rd_pubdesc, given that these changes could impact
> whether a table is published. Following this logic, it should be safe and
> appropriate to depend on individual relcache invalidation messages to
> invalidate the relsynccache in pgoutput, rather than invalidating the entire
> relsynccache for a single catalog change.

Right. I should have clarified these points on my previous post.

> One nitpick for 0001: the comments atop of the rel_sync_cache_publication_cb()
> need updating, as they still reference pg_publication_rel and
> pg_publication_namespace. Aside from this, 0001 looks good to me.

Thanks for pointing out, fixed. PSA new version.

Additionally, I found another case where discarding the whole cache can be
avoided. Currently, all caches are unconditionally removed when pg_namespace is
updated. Invalidation is needed to update the qualified names of tables in the
replication messages after a schema rename, but IIUC it only needs to be done
for relations that belong to the renamed schema.

IIUC there are two approaches to implement it.
The first idea is to introduce a new type of invalidation message, similar
to what 0002 does. The invalidation message can contain the relids that belong
to the renamed schema, and pgoutput can register a callback for it.
The second idea is to use a syscache callback. IIUC, the callback is passed the
hash of the OID. Thus, RelSyncCache entries can store the hash values of their
schemas, and they can be compared in a callback function.

I noted the idea on the code as XXX code comment in 0001.
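
To make the schema-rename case concrete (the schema and table names below are
made up):

   ALTER SCHEMA sales RENAME TO sales_2024;
   -- Only relations inside the renamed schema need their RelSyncCache
   -- entries rebuilt so that later replication messages carry the new
   -- qualified name (e.g. sales_2024.orders); entries for tables in other
   -- schemas could stay untouched.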

Best regards,
Hayato Kuroda
FUJITSU LIMITED



v3-0001-Avoid-Invalidating-all-entries-when-ALTER-PUBLICA.patch
Description:  v3-0001-Avoid-Invalidating-all-entries-when-ALTER-PUBLICA.patch


v3-0002-Introduce-a-new-invalidation-message-to-invalidat.patch
Description:  v3-0002-Introduce-a-new-invalidation-message-to-invalidat.patch


v3-0003-Invalidate-Relcaches-while-ALTER-PUBLICATION-OWNE.patch
Description:  v3-0003-Invalidate-Relcaches-while-ALTER-PUBLICATION-OWNE.patch


Re: Draft for basic NUMA observability

2025-03-05 Thread Jakub Wartak
On Tue, Mar 4, 2025 at 5:02 PM Bertrand Drouvot
 wrote:

> > Cool! Attached is v7
> Thanks for the new version!

... and another one: 7b ;)

> > > === 2
[..]
> > Well, I've made query_numa a parameter there simply to avoid that code
> > duplication in the first place, look at those TupleDescInitEntry()...
>
> Yeah, that's why I was mentioning to use a "shared" 
> populate_buffercache_entry()
> or such function: to put the "duplicated" code in it and then use this
> shared function in pg_buffercache_pages() and in the new numa related one.

OK, so I hastily attempted that in 7b; I had to do a larger refactor
there to avoid code duplication between those two. I don't know which
attempt is better though (7 vs 7b).

> > IMHO rarely anybody uses pg_buffercache, but we could add unlikely()
>
> I think unlikely() should be used for optimization based on code path 
> likelihood,
> not based on how often users might use a feature.

In 7b I've removed the unlikely(). For a moment I was thinking that
you were concerned about making this loop over NBuffers as optimized as
it can be, and that that was the reason for splitting the routines.

> > > === 5
[..]
> I meant to say avoid code duplication between pg_get_shmem_allocations() and
> pg_get_shmem_numa_allocations(). It might be possible to create a shared
> function for them too. That said, it looks like that the savings (if any), 
> would
> not be that much, so maybe just forget about it.

 Yeah, OK, so let's leave it at that.

-J.


v7b-0003-Add-pg_shmem_numa_allocations-to-show-NUMA-zones.patch
Description: Binary data


v7b-0002-Extend-pg_buffercache-with-new-view-pg_buffercac.patch
Description: Binary data


v7b-0001-Add-optional-dependency-to-libnuma-Linux-only-fo.patch
Description: Binary data


Re: doc: expand note about pg_upgrade's --jobs option

2025-03-05 Thread Daniel Gustafsson
> On 4 Mar 2025, at 19:08, Nathan Bossart  wrote:

> The attached patch replaces the "et cetera" with those two general categories.

LGTM.

--
Daniel Gustafsson





Re: Add contrib/pg_logicalsnapinspect

2025-03-05 Thread Bertrand Drouvot
Hi,

On Wed, Mar 05, 2025 at 02:42:15PM +0530, Amit Kapila wrote:
> On Wed, Mar 5, 2025 at 12:47 PM Bertrand Drouvot
>  wrote:
> >
> > Agree, PFA a patch doing so.
> >
> 
> It would be better if you could add a few comments atop the
> permutation line to explain the working of the test.

yeah makes sense. Done in the attached, and bonus point I realized that the
test could be simplified (so, removing useless steps in passing).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From 6154e847e4e5c2c7eb816d4302a4704d5a690954 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Wed, 5 Mar 2025 07:06:58 +
Subject: [PATCH v2] Modify pg_logicalinspect isolation test

The previous version was relying on the fact that the test produces exactly
2 snapshots on disk, while in fact it can produce more. Changing the test knowing
that at least 2 snapshots are generated.

In passing, removing useless steps and adding some comments.

Per buildfarm member skink.
---
 .../expected/logical_inspect.out  | 36 ---
 .../specs/logical_inspect.spec| 11 +++---
 2 files changed, 22 insertions(+), 25 deletions(-)
  58.4% contrib/pg_logicalinspect/expected/
  41.5% contrib/pg_logicalinspect/specs/

diff --git a/contrib/pg_logicalinspect/expected/logical_inspect.out b/contrib/pg_logicalinspect/expected/logical_inspect.out
index d95efa4d1e5..b7d01dbb68b 100644
--- a/contrib/pg_logicalinspect/expected/logical_inspect.out
+++ b/contrib/pg_logicalinspect/expected/logical_inspect.out
@@ -1,6 +1,6 @@
 Parsed test spec with 2 sessions
 
-starting permutation: s0_init s0_begin s0_savepoint s0_truncate s1_checkpoint s1_get_changes s0_commit s0_begin s0_insert s1_checkpoint s1_get_changes s0_commit s1_get_changes s1_get_logical_snapshot_info s1_get_logical_snapshot_meta
+starting permutation: s0_init s0_begin s0_savepoint s0_truncate s1_checkpoint s1_get_changes s0_commit s1_checkpoint s1_get_changes s1_get_logical_snapshot_info_catchange s1_get_logical_snapshot_info_committed s1_get_logical_snapshot_meta
 step s0_init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
 ?column?
 
@@ -17,8 +17,6 @@ data
 (0 rows)
 
 step s0_commit: COMMIT;
-step s0_begin: BEGIN;
-step s0_insert: INSERT INTO tbl1 VALUES (1);
 step s1_checkpoint: CHECKPOINT;
 step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
 data   
@@ -28,25 +26,21 @@ table public.tbl1: TRUNCATE: (no-flags)
 COMMIT 
 (3 rows)
 
-step s0_commit: COMMIT;
-step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
-data 
--
-BEGIN
-table public.tbl1: INSERT: val1[integer]:1 val2[integer]:null
-COMMIT   
-(3 rows)
+step s1_get_logical_snapshot_info_catchange: SELECT count(*) > 0 as has_catchange FROM pg_ls_logicalsnapdir(), pg_get_logical_snapshot_info(name) AS info where info.catchange_count = 2 and array_length(info.catchange_xip,1) = 2 and info.committed_count = 0;
+has_catchange
+-
+t
+(1 row)
 
-step s1_get_logical_snapshot_info: SELECT info.state, info.catchange_count, array_length(info.catchange_xip,1) AS catchange_array_length, info.committed_count, array_length(info.committed_xip,1) AS committed_array_length FROM pg_ls_logicalsnapdir(), pg_get_logical_snapshot_info(name) AS info ORDER BY 2;
-state |catchange_count|catchange_array_length|committed_count|committed_array_length
---+---+--+---+--
-consistent|  0|  |  2| 2
-consistent|  2| 2|  0|  
-(2 rows)
+step s1_get_logical_snapshot_info_committed: SELECT count(*) > 0 as has_committed FROM pg_ls_logicalsnapdir(), pg_get_logical_snapshot_info(name) AS info where info.committed_count = 2 and array_length(info.committed_xip,1) = 2 and info.catchange_count = 0;
+has_committed
+-
+t
+(1 row)
 
-step s1_get_logical_snapshot_meta: SELECT COUNT(meta.*) from pg_ls_logicalsnapdir(), pg_get_logical_snapshot_meta(name) as meta;
-count
--
-2
+step s1_get_logical_snapshot_meta: SELECT COUNT(meta.*) > 1 AS has_meta from pg_ls_logicalsnapdir(), pg_get_logical_snapshot_meta(name) as meta;
+has_meta
+
+t   
 (1 row)
 
diff --git a/contrib/pg_logicalinspect/specs/logical_inspect.spec b/contrib/pg_logicalinspect/specs/logical_inspect.spec
index 9851a6c18e4..631da

Re: Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM?

2025-03-05 Thread Heikki Linnakangas

On 28/02/2025 03:53, Peter Geoghegan wrote:

On Sat, Feb 8, 2025 at 8:47 AM Michail Nikolaev
 wrote:

Just some commit messages + few cleanups.


I'm worried about this:

+These longer pin lifetimes can cause buffer exhaustion with messages like "no
+unpinned buffers available" when the index has many pages that have similar
+ordering; but future work can figure out how to best work that out.

I think that we should have some kind of upper bound on the number of
pins that can be acquired at any one time, in order to completely
avoid these problems. Solving that problem will probably require GiST
expertise that I don't have right now.


+1. With no limit, it seems pretty easy to hold thousands of buffer pins 
with this.


The index can set IndexScanDesc->xs_recheck to indicate that the quals 
must be rechecked. Perhaps we should have a similar flag to indicate 
that the visibility must be rechecked.


Matthias's earlier patch 
(https://www.postgresql.org/message-id/CAEze2Wg1kbpo_Q1%3D9X68JRsgfkyPCk4T0QN%2BqKz10%2BFVzCAoGA%40mail.gmail.com) 
had a more complicated mechanism to track the pinned buffers. Later 
patch got rid of that, which simplified things a lot. I wonder if we 
need something like that, after all.



Here's a completely different line of attack: Instead of holding buffer 
pins for longer, what if we checked the visibility map earlier? We could 
check the visibility map already when we construct the 
GISTSearchHeapItem, and set a flag in IndexScanDesc to tell 
IndexOnlyNext() that we have already done that. IndexOnlyNext() would 
have three cases:


1. The index AM has not checked the visibility map. Check it in 
IndexOnlyNext(), and fetch the tuple if it's not set. This is what it 
always does today.
2. The index AM has checked the visibility map, and the VM bit was set. 
IndexOnlyNext() can skip the VM check and use the tuple directly.
3. The index AM has checked the visibility map, and the VM bit was not 
set. IndexOnlyNext() will fetch the tuple to check its visibility.


--
Heikki Linnakangas
Neon (https://neon.tech)




RE: Selectively invalidate caches in pgoutput module

2025-03-05 Thread Hayato Kuroda (Fujitsu)
Dear hackers,

I did performance testing with HEAD and v2-0001, and confirmed that it can
improve performance by around 20% in the typical case.

Workload
==
I emulated a scenario in which there are many tables to be published and
publishing is stopped and resumed for only one table. In HEAD, the ALTER
PUBLICATION DROP/ADD commands invalidate all entries in the RelSyncCache, but
v2 invalidates only the needed one - this means the decoding time is reduced
after patching.

1. Initialized an instance
2. Created a root table and 5000 leaf tables.
3. Created another table which did not have parent-child relationship.
4. Created a publication which included a root table and another table
5. Created a replication slot with pgoutput plugin.
6. Executed a transaction which would be decoded. In the transaction:
a. Inserted tuples to all the leaf tables
b. Altered publication to drop another table
c. Altered publication again to add the dropped one
d. Inserted tuples to all the leaf tables again.
7. measured decoding time via SQL interfaces, five times.

Attached script automated above.
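
For readers without the script at hand, the essence of the workload is roughly
the following (object names are illustrative, not the ones used in the script):

   CREATE PUBLICATION pub FOR TABLE root_tbl, other_tbl;
   SELECT pg_create_logical_replication_slot('slot', 'pgoutput');

   BEGIN;
   INSERT INTO leaf_0001 VALUES (1);  -- ... repeated for all 5000 leaf tables
   ALTER PUBLICATION pub DROP TABLE other_tbl;
   ALTER PUBLICATION pub ADD TABLE other_tbl;
   INSERT INTO leaf_0001 VALUES (2);  -- ... repeated again
   COMMIT;

   -- decoding time is then measured via the SQL interface, e.g.
   SELECT count(*) FROM pg_logical_slot_get_binary_changes('slot', NULL, NULL,
          'proto_version', '4', 'publication_names', 'pub');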

Results
=
Each cell is the median value of 5 runs. Compared with HEAD, v2-0001 reduces
the decoding time by roughly 20%.

head [ms]   v2-0001 [ms]
252 198

I'm planning to do further tests to prove the benefit for 0002, 0003 patches.

Best regards,
Hayato Kuroda
FUJITSU LIMITED



test_p.sh
Description: test_p.sh


Allow database owners to CREATE EVENT TRIGGER

2025-03-05 Thread Steve Chavez
Hello hackers,

Currently PostgreSQL only allows superusers to create event triggers, which
prevents usage on PostgreSQL service providers, as they do not grant
superuser access.

This patch allows database owners to create event triggers, while
preventing privilege escalation.

Unlike superuser event triggers, which execute functions for every role,
database owner event triggers are only executed for non-superusers.
This is necessary to prevent privilege escalation, e.g. a superuser tripping
on an event trigger containing an `ALTER ROLE dbowner SUPERUSER`.

For skipping dbowner event triggers for superusers:

- A restriction is added for superuser event triggers, the event trigger
function must be owned by a superuser.
  + While this is a breaking change, I think it's minor as the usual flow
is to "login as superuser" -> "create an evtrig function" -> "create the
evtrig". This is also proved by the existing tests, which barely change.
- A restriction is added for dbowner event triggers, the event trigger
function must not be owned by a superuser.

This way we can filter dbowner event trigger functions inside
`EventTriggerInvoke`.
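
For concreteness, under this patch the intended usage by a (non-superuser)
database owner would look roughly like this (names are illustrative):

   -- connected as the database owner, not a superuser
   CREATE FUNCTION log_ddl() RETURNS event_trigger
   LANGUAGE plpgsql AS $$
   BEGIN
     RAISE NOTICE 'ddl_command_end: %', tg_tag;
   END;
   $$;

   CREATE EVENT TRIGGER dbowner_ddl_log ON ddl_command_end
     EXECUTE FUNCTION log_ddl();

Because log_ddl() is owned by a non-superuser, the trigger is executed for
non-superuser sessions only and skipped when a superuser runs DDL, per the
rules above.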

Tests are included in the patch, I've added a dedicated regression file for
easier review. Only a couple of error messages of the existing event
trigger regression tests are changed.

Any feedback is welcomed. I haven't added docs yet but I'll gladly add them
if the community thinks this patch makes sense.

(Previous thread that also discussed allowing event triggers for
non-superusers:
https://www.postgresql.org/message-id/flat/81C10FFB-5ADC-4956-9337-FA248A4CC20D%40enterprisedb.com#77738d12b82c9a403ea2c56ed09949a3
)

Best regards,
Steve Chavez
From 84965d3ad70c4794f93473b939f5437c0d154826 Mon Sep 17 00:00:00 2001
From: steve-chavez 
Date: Wed, 26 Feb 2025 20:17:36 -0500
Subject: [PATCH] Allow database owners to CREATE EVENT TRIGGER

---
 src/backend/commands/dbcommands.c |  22 
 src/backend/commands/event_trigger.c  |  38 --
 src/backend/utils/cache/lsyscache.c   |  23 
 src/include/commands/dbcommands.h |   1 +
 src/include/utils/lsyscache.h |   1 +
 src/test/regress/expected/event_trigger.out   |   2 +-
 src/test/regress/parallel_schedule|   2 +-
 .../regress/sql/event_trigger_dbowner.sql | 117 ++
 8 files changed, 197 insertions(+), 9 deletions(-)
 create mode 100644 src/test/regress/sql/event_trigger_dbowner.sql

diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 1753b289074..4cd31dc1d90 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -3201,6 +3201,28 @@ get_database_name(Oid dbid)
 	return result;
 }
 
+/*
+ * get_database_owner - given a database OID, look up the owner OID
+ *
+ * If the owner is not found, it will return InvalidOid
+ */
+Oid
+get_database_owner(Oid dbid)
+{
+	HeapTupledbtuple;
+	Oid  dbowner;
+
+	dbtuple = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(dbid));
+	if (HeapTupleIsValid(dbtuple))
+	{
+		dbowner = ((Form_pg_database) GETSTRUCT(dbtuple))->datdba;
+		ReleaseSysCache(dbtuple);
+	}
+	else
+		dbowner = InvalidOid;
+
+	return dbowner;
+}
 
 /*
  * While dropping a database the pg_database row is marked invalid, but the
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index edc2c988e29..62e7c7326f4 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -34,6 +34,7 @@
 #include "catalog/pg_trigger.h"
 #include "catalog/pg_ts_config.h"
 #include "catalog/pg_type.h"
+#include "commands/dbcommands.h"
 #include "commands/event_trigger.h"
 #include "commands/extension.h"
 #include "commands/trigger.h"
@@ -125,18 +126,17 @@ CreateEventTrigger(CreateEventTrigStmt *stmt)
 	Oid			evtowner = GetUserId();
 	ListCell   *lc;
 	List	   *tags = NULL;
+	bool 		is_database_owner = (evtowner == get_database_owner(MyDatabaseId));
+	bool 		owner_is_super = superuser_arg(evtowner);
+	bool 		function_is_owned_by_super;
 
-	/*
-	 * It would be nice to allow database owners or even regular users to do
-	 * this, but there are obvious privilege escalation risks which would have
-	 * to somehow be plugged first.
-	 */
-	if (!superuser())
+	if (!owner_is_super && !is_database_owner){
 		ereport(ERROR,
 (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  errmsg("permission denied to create event trigger \"%s\"",
 		stmt->trigname),
- errhint("Must be superuser to create an event trigger.")));
+ errhint("Must be the database owner or superuser to create an event trigger.")));
+	}
 
 	/* Validate event name. */
 	if (strcmp(stmt->eventname, "ddl_command_start") != 0 &&
@@ -200,6 +200,24 @@ CreateEventTrigger(CreateEventTrigStmt *stmt)
  errmsg("function %s must return type %s",
 		NameListToString(stmt->funcname), "event_trigger")));
 
+	function_is_owned_by_super  = superuser_arg(get_func_owner(funcoid))

Re: making EXPLAIN extensible

2025-03-05 Thread Andrei Lepikhov

On 4/3/2025 22:23, Robert Haas wrote:

On Tue, Mar 4, 2025 at 1:53 PM Jeff Davis  wrote:

I don't expect there to be zillions of extensions that only add new and
exciting explain options. Instead, it seems more likely that all
TableAM and CustomScan extensions will have 1-3 new explain options,
and that some of those might collide. For example, several may have a
EXPLAIN(PUSHDOWN) option that explains what work is being pushed down
into the TableAM/CustomScan.

In that case it's not even clear to me that a collision is a problem.
Would you really only want pushdown information from extension A, and
be upset that it also emits pushdown information from extension B?
Maybe we should just allow multiple extensions to use the same option
name?

...
Actually, I don't think custom scans or table AMs are the design
center for this feature. Keep in mind that, for a custom scan, we
already have ExplainCustomScan as part of CustomScanState->methods. We
don't currently have anything similar for table AMs, and you could
work around that with these hooks, by checking every node to see
whether it's a scan node and if so whether it scans a relation that
matches your table AM, but it would probably be better to have a
method for it, similar to what we have for CustomScan, if this is
something people want to do. It would be more efficient and require
less code.
+1. In my experience, ExplainCustomScan has always been enough for the 
CustomScan node.
As for extension collisions - for now, it would be better to treat 
extensions as independent actors, assuming that the developer who designs a 
software solution based on a constellation of extensions will arrange their 
behaviour during development.
For instance,  if your library exports a function or variable, adding a 
prefix is essential to prevent overlapping functions when another 
library is loaded.
I recall that Yurii Rashkovskii is a proponent of using multiple 
extensions within a single installation. Perhaps he has insights on this 
topic?


--
regards, Andrei Lepikhov




Re: Add regression test checking combinations of (object,backend_type,context) in pg_stat_io

2025-03-05 Thread Michael Paquier
On Wed, Mar 05, 2025 at 07:34:16AM +, Bertrand Drouvot wrote:
> That would mean changing the test each time pgstat_tracks_io_object() is
> modified in such a way that the output is changed. That's a good thing as
> the writer will need to double check if the new output makes sense according
> to his changes. So I don't see any reason not to add this test.

Thanks for the review.

> What about adding some extra paranoia like?
> 
> SELECT backend_type, object, context FROM pg_stat_io ORDER BY
> backend_type, object, context COLLATE "C";

Why not, to force the ordering.
--
Michael


signature.asc
Description: PGP signature


Re: Monitoring gaps in XLogWalRcvWrite() for the WAL receiver

2025-03-05 Thread Michael Paquier
On Wed, Mar 05, 2025 at 08:04:44AM +, Bertrand Drouvot wrote:
> On Wed, Mar 05, 2025 at 12:35:26PM +0900, Michael Paquier wrote:
> Also for sync? sync looks fine as issue_xlog_fsync() is being called in 
> XLogWalRcvFlush(), no?

Yes, we're OK for the sync data aggregated in the WAL receiver on
HEAD, as mentioned below, not in the back-branches.

> We're not emitting some statistics, so I think that it's hard for users to
> complain about something they don't/can't see.

One would see idle data in pg_stat_wal on a standby, so the lack of
data could be annoying, but I'm perhaps the only one who noticed
that..

> Same logic as in XLogWrite() and I don't think there is a need for a 
> dedicated wait event, so LGTM.

Thanks.
--
Michael


signature.asc
Description: PGP signature


Re: per backend WAL statistics

2025-03-05 Thread Xuneng Zhou
Subject: Clarification Needed on WAL Pending Checks in Patchset

Hi,

Thank you for the patchset. I’ve spent some time learning and reviewing it
and have 2 comments.  I'm new to PostgreSQL hacking, so please bear with me
if I make mistakes or say something that seems trivial.

I noticed that in patches prior to patch 6, the function
pgstat_backend_wal_have_pending() was implemented as follows:

/*
 * To determine whether any WAL activity has occurred since last time, not
 * only the number of generated WAL records but also the numbers of WAL
 * writes and syncs need to be checked. Because even transactions that
 * generate no WAL records can write or sync WAL data when flushing the
 * data pages.
 */
static bool
pgstat_backend_wal_have_pending(void)
{
PgStat_PendingWalStats pending_wal;

pending_wal = PendingBackendStats.pending_wal;

return pgWalUsage.wal_records != prevBackendWalUsage.wal_records ||
   pending_wal.wal_write != 0 || pending_wal.wal_sync != 0;
}

Starting with patch 7, it seems the implementation was simplified to:

/*
 * To determine whether WAL usage happened.
 */
static bool
pgstat_backend_wal_have_pending(void)
{
return pgWalUsage.wal_records != prevBackendWalUsage.wal_records;
}

Meanwhile, the cluster-level check in the function
pgstat_wal_have_pending_cb() still performs the additional checks:

bool
pgstat_wal_have_pending_cb(void)
{
return pgWalUsage.wal_records != prevWalUsage.wal_records ||
   PendingWalStats.wal_write != 0 ||
   PendingWalStats.wal_sync != 0;
}

The difference lies in the removal of the checks for wal_write and wal_sync
from the per-backend function. I assume that this may be due to factoring
out these counters—perhaps because they can now be extracted from
pg_stat_get_backend_io(). However, I’m not entirely sure I grasp the full
rationale behind this change.


Another one is on:
Bertrand Drouvot  于2025年3月3日周一 18:47写道:

> Hi,
>
> On Mon, Mar 03, 2025 at 09:17:30AM +, Bertrand Drouvot wrote:
> > hmm, that would work as long as PGSTAT_BACKEND_FLUSH_ALL represents
> things
> > that need to be called from pgstat_report_wal(). But I think that's open
> > door for issue should be add a new PGSTAT_BACKEND_FLUSH_XXX where XXX is
> not
> > related to pgstat_report_wal() at all. So, I'm tempted to keep it as it
> is.
>
> I just realized that pgstat_flush_backend() is not correct in 0001. Indeed
> we check:
>
> "
> if (pg_memory_is_all_zeros(&PendingBackendStats,
>sizeof(struct PgStat_BackendPending)))
> return false;
> "
>
> but the WAL stats are not part of PgStat_BackendPending... So we only check
> for IO pending stats here. I'm not sure WAL stats could be non empty if IO
> stats are but the attached now also takes care of pending WAL stats here.
>
> Regards,
>
> --
> Bertrand Drouvot
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com


I've noticed that there's a function for checking if there are any backend
stats waiting for flush:
/*
 * Check if there are any backend stats waiting for flush.
 */
bool
pgstat_backend_have_pending_cb(void)
{
return (!pg_memory_is_all_zeros(&PendingBackendStats,
sizeof(struct PgStat_BackendPending)));
}

[PGSTAT_KIND_BACKEND] = {
 
.have_static_pending_cb = pgstat_backend_have_pending_cb,
.flush_static_cb = pgstat_backend_flush_cb,
.reset_timestamp_cb = pgstat_backend_reset_timestamp_cb,
},

Should the following modification be applied to the above function as well?
Or should we modify the comment 'any backend stats' if we choose to check
I/O only?
@@ -199,7 +258,8 @@ pgstat_flush_backend(bool nowait, bits32 flags)
  return false;

  if (pg_memory_is_all_zeros(&PendingBackendStats,
-   sizeof(struct PgStat_BackendPending)))
+   sizeof(struct PgStat_BackendPending))
+ && !pgstat_backend_wal_have_pending())
  return false;

Best regards,

[Xuneng]


Re: Wrong results with subquery pullup and grouping sets

2025-03-05 Thread Dean Rasheed
On Wed, 5 Mar 2025 at 09:09, Richard Guo  wrote:
>
> I noticed the adjacent code that sets wrap_non_vars to true if we
> are considering an appendrel child subquery, and I doubt this is
> necessary.
>
>  /*
>   * If we are dealing with an appendrel member then anything that's not a
>   * simple Var has to be turned into a PlaceHolderVar.  We force this to
>   * ensure that what we pull up doesn't get merged into a surrounding
>   * expression during later processing and then fail to match the
>   * expression actually available from the appendrel.
>   */
>  if (containing_appendrel != NULL)
>  rvcontext.wrap_non_vars = true;
>
> As explained in e42e31243, the only part of the upper query that could
> reference the appendrel child yet is the translated_vars list of the
> associated AppendRelInfo that we just made for this child, and we do
> not want to force use of PHVs in the AppendRelInfo (see the comment in
> perform_pullup_replace_vars).  In fact, perform_pullup_replace_vars
> sets wrap_non_vars to false before performing pullup_replace_vars on
> the AppendRelInfo.  It seems to me that this makes the code shown
> above unnecessary, and we can simply remove it.
>

Yes, that makes sense, and it seems like a fairly straightforward
simplification, given that perform_pullup_replace_vars() sets it back
to false shortly afterwards.

The same thing happens in pull_up_constant_function().

Regards,
Dean




Re: Incorrect result of bitmap heap scan.

2025-03-05 Thread Matthias van de Meent
On Sun, 2 Mar 2025 at 01:35, Tom Lane  wrote:
>
> Peter Geoghegan  writes:
> > Is everybody in agreement about committing and back patching this fix,
> > which simply disables the optimization altogether?
> > I myself don't see a better way, but thought I'd ask before proceeding
> > with review and commit.
>
> If you don't see a clear path forward, then "disable" is the only
> reasonable choice for the back branches.  Maybe we'll find a fix
> in future, but it seems unlikely that it'd be back-patchable.

Agreed.

Here's patch v5 for the master branch (now up to f4694e0f), with no
interesting changes other than fixing apply conflicts caused by
bfe56cdf.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)


v5-0002-isolationtester-showing-broken-index-only-bitmap-.patch
Description: Binary data


v5-0001-Remove-HeapBitmapScan-s-skip_fetch-optimization.patch
Description: Binary data


Re: Considering fractional paths in Append node

2025-03-05 Thread Alexander Korotkov
On Wed, Mar 5, 2025 at 8:32 AM Andrei Lepikhov  wrote:
> On 5/3/2025 03:27, Alexander Korotkov wrote:
> > On Mon, Mar 3, 2025 at 1:04 PM Andrei Lepikhov  wrote:
> >>> 2. As usage of root->tuple_fraction RelOptInfo it has been criticized,
> >>> do you think we could limit this to some simple cases?  For instance,
> >>> check that RelOptInfo is the final result relation for given root.
> >> I believe that using tuple_fraction is not an issue. Instead, it serves
> >> as a flag that allows the upper-level optimisation to consider
> >> additional options. The upper-level optimiser has more variants to
> >> examine and will select the optimal path based on the knowledge
> >> available at that level. Therefore, we're not introducing a mistake
> >> here; we're simply adding extra work in the narrow case. However, having
> >> only the bottom-up planning process, I don't see how we could avoid this
> >> additional workload.
> >
> > Yes, but if we can assume root->tuple_fraction applies to result of
> > Append, it's strange we apply the same tuple fraction to all the child
> > rels.  Latter rels should less likely be used at all and perhaps
> > should have less tuple_fraction.
> Of course, it may happen. But I'm not sure it is a common rule.
> Using LIMIT, we usually select data according to specific clauses.
> Imagine, we need TOP-100 ranked goods. Appending partitions of goods, we
> will use the index on the 'rating' column. But who knows how top-rated
> goods are spread across partitions? Maybe a single partition contains
> all of them? So, we need to select 100 top-rated goods from each partition.
> Hence, applying the same limit to each partition seems reasonable, right?

Ok, I didn't notice add_paths_to_append_rel() is used for MergeAppend
as well.  I thought again about regular Append.  If we can get the required
number of rows from the first few child relations, the error in the tuple
fraction shouldn't influence plans much, and the other child relations
wouldn't be used at all.  But if we can't, we are unlikely to get a
selectivity prediction accurate enough to tell which exact child relations
are going to be used.  So, using the root tuple fraction for every child
relation would be safe, and this approach makes sense to me.
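
Taking the TOP-100 goods example literally, the query being planned is
something like (table and column names are made up):

   SELECT * FROM goods ORDER BY rating DESC LIMIT 100;

With goods partitioned and an index on rating, each child path only ever needs
to supply up to 100 rows, so handing the same tuple fraction to every child
looks safe even if a single partition ends up providing all of them.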

--
Regards,
Alexander Korotkov
Supabase




Re: Separate GUC for replication origins

2025-03-05 Thread Amit Kapila
On Wed, Mar 5, 2025 at 12:42 PM Masahiko Sawada  wrote:
>
> On Tue, Mar 4, 2025 at 10:42 PM Amit Kapila  wrote:
> >
> > On Wed, Mar 5, 2025 at 6:24 AM Euler Taveira  wrote:
> > >
> > > On Sat, Mar 1, 2025, at 10:08 AM, Amit Kapila wrote:
> > >
> > > On Thu, Feb 13, 2025 at 6:48 AM Masahiko Sawada  
> > > wrote:
> > > >
> > > > Thank you for updating the patch! The patch looks mostly good to me.
> > > >
> > >
> > > + /*
> > > + * Prior to PostgreSQL 18, max_replication_slots was used to set the
> > > + * number of replication origins. For backward compatibility, -1 
> > > indicates
> > > + * to use the fallback value (max_replication_slots).
> > > + */
> > > + if (max_replication_origin_sessions == -1)
> > >
> > > Shouldn't we let users use max_replication_origin_sessions for
> > > subscribers? Maintaining this mapping for a long time can create
> > > confusion such that some users will keep using max_replication_slots
> > > for origins, and others will start using
> > > max_replication_origin_sessions. Such a mapping would be useful if we
> > > were doing something like this in back-branches, but doing it in a
> > > major version appears to be more of a maintenance burden.
> > >
> > >
> > > It was my initial proposal. Of course, it breaks compatibility but it
> > > forces the users to adopt this new GUC. It will be a smooth adoption
> > > because if we use the same value as max_replication_slots it means
> > > most of the scenarios will be covered (10 is a good start point for the
> > > majority of replication scenarios in my experience).
> > >
> > > However, Peter E suggested that it would be a good idea to have a
> > > backward compatibility so I added it.
> > >
> > > We need to decide which option we want:
> > >
> > > 1. no backward compatibility and max_replication_origin_sessions = 10 as
> > >   default value. It might break scenarios that have the number of
> > >   subscriptions greater than the default value.
> > >
> >
> > To avoid breakage, can we add a check during the upgrade to ensure
> > that users have set max_replication_origin_sessions appropriately? See
> > the current check in function
> > check_new_cluster_subscription_configuration(). It uses
> > max_replication_slots, we can change it to use
> > max_replication_origin_sessions for versions greater than or equal to
> > 18. Will that address this concern?
>
> I think that the patch already replaced max_replication_slots with
> max_replication_origin_sessions in that function.
>

Then, that should be sufficient to avoid upgrade related risks.

-- 
With Regards,
Amit Kapila.




Re: Wrong results with subquery pullup and grouping sets

2025-03-05 Thread Richard Guo
On Wed, Mar 5, 2025 at 11:02 AM Richard Guo  wrote:
> It seems to me that simple Var expressions in a subquery's target list
> also need to retain their separate identity in order to match grouping
> set columns after subquery pullup, not just non-var expressions.

I noticed the adjacent code that sets wrap_non_vars to true if we
are considering an appendrel child subquery, and I doubt this is
necessary.

 /*
  * If we are dealing with an appendrel member then anything that's not a
  * simple Var has to be turned into a PlaceHolderVar.  We force this to
  * ensure that what we pull up doesn't get merged into a surrounding
  * expression during later processing and then fail to match the
  * expression actually available from the appendrel.
  */
 if (containing_appendrel != NULL)
 rvcontext.wrap_non_vars = true;

As explained in e42e31243, the only part of the upper query that could
reference the appendrel child yet is the translated_vars list of the
associated AppendRelInfo that we just made for this child, and we do
not want to force use of PHVs in the AppendRelInfo (see the comment in
perform_pullup_replace_vars).  In fact, perform_pullup_replace_vars
sets wrap_non_vars to false before performing pullup_replace_vars on
the AppendRelInfo.  It seems to me that this makes the code shown
above unnecessary, and we can simply remove it.

Any thoughts?

Thanks
Richard




Re: [RFC] Lock-free XLog Reservation from WAL

2025-03-05 Thread Yura Sokolov
05.03.2025 08:39, Zhou, Zhiguo wrote:
> 
> 
> On 2/23/2025 8:03 PM, Yura Sokolov wrote:
>> 14.02.2025 11:41, Zhou, Zhiguo wrote:
>>>
>>>
>>> On 2/11/2025 9:25 AM, Japin Li wrote:
 On Mon, 10 Feb 2025 at 22:12, "Zhou, Zhiguo"  wrote:
> On 2/5/2025 4:32 PM, Japin Li wrote:
>> On Mon, 27 Jan 2025 at 17:30, "Zhou, Zhiguo"  
>> wrote:
>>> On 1/26/2025 10:59 PM, Yura Sokolov wrote:
 24.01.2025 12:07, Japin Li wrote:
> On Thu, 23 Jan 2025 at 21:44, Japin Li  wrote:
>> On Thu, 23 Jan 2025 at 15:03, Yura Sokolov
>>  wrote:
>>> 23.01.2025 11:46, Japin Li wrote:
 On Wed, 22 Jan 2025 at 22:44, Japin Li  wrote:
> On Wed, 22 Jan 2025 at 17:02, Yura Sokolov
>  wrote:
>> I believe, I know why it happens: I was in hurry making v2 by
>> cherry-picking internal version. I reverted some changes in
>> CalcCuckooPositions manually and forgot to add modulo
>> PREV_LINKS_HASH_CAPA.
>>
>> Here's the fix:
>>
>>  pos->pos[0] = hash % PREV_LINKS_HASH_CAPA;
>> -   pos->pos[1] = pos->pos[0] + 1;
>> +   pos->pos[1] = (pos->pos[0] + 1) % PREV_LINKS_HASH_CAPA;
>>  pos->pos[2] = (hash >> 16) % PREV_LINKS_HASH_CAPA;
>> -   pos->pos[3] = pos->pos[2] + 2;
>> +   pos->pos[3] = (pos->pos[2] + 2) % PREV_LINKS_HASH_CAPA;
>>
>> Any way, here's v3:
>> - excess file "v0-0001-Increase..." removed. I believe it was 
>> source
>>    of white-space apply warnings.
>> - this mistake fixed
>> - more clear slots strategies + "8 positions in two
>> cache-lines" strategy.
>>
>> You may play with switching PREV_LINKS_HASH_STRATEGY to 2 or 3
>> and see
>> if it affects measurably.
>
> Thanks for your quick fixing.  I will retest it tomorrow.
 Hi, Yura Sokolov
 Here is my test result of the v3 patch:
 | case  | min    | avg    | max
 |
 |---++
 +|
 | master (44b61efb79)   | 865,743.55 | 871,237.40 |
 874,492.59 |
 | v3    | 857,020.58 | 860,180.11 |
 864,355.00 |
 | v3 PREV_LINKS_HASH_STRATEGY=2 | 853,187.41 | 855,796.36 |
 858,436.44 |
 | v3 PREV_LINKS_HASH_STRATEGY=3 | 863,131.97 | 864,272.91 |
 865,396.42 |
 It seems there are some performance decreases :( or something I
 missed?

>>>
>>> Hi, Japin.
>>> (Excuse me for duplicating message, I found I sent it only to you
>>> first time).
>>>
>> Never mind!
>>
>>> v3 (as well as v2) doesn't increase NUM_XLOGINSERT_LOCKS itself.
>>> With only 8 in-progress inserters spin-lock is certainly better
>>> than any
>>> more complex solution.
>>>
>>> You need to compare "master" vs "master + NUM_XLOGINSERT_LOCKS=64" 
>>> vs
>>> "master + NUM_XLOGINSERT_LOCKS=64 + v3".
>>>
>>> And even this way I don't claim "Lock-free reservation" gives any
>>> profit.
>>>
>>> That is why your benchmarking is very valuable! It could answer, 
>>> does
>>> we need such not-small patch, or there is no real problem at all?
>>>
>
> Hi, Yura Sokolov
>
> Here is the test result compared with NUM_XLOGINSERT_LOCKS and the
> v3 patch.
>
> | case  | min  | avg  |
> max  | rate% |
> |---+--+--+--
> +---|
> | master (4108440)  | 891,225.77   | 904,868.75   |
> 913,708.17   |    |
> | lock 64   | 1,007,716.95 | 1,012,013.22 |
> 1,018,674.00 | 11.84 |
> | lock 64 attempt 1 | 1,016,716.07 | 1,017,735.55 |
> 1,019,328.36 | 12.47 |
> | lock 64 attempt 2 | 1,015,328.31 | 1,018,147.74 |
> 1,021,513.14 | 12.52 |
> | lock 128  | 1,010,147.38 | 1,014,128.11 |
> 1,018,672.01 | 12.07 |
> | lock 128 attempt 1    | 1,018,154.79 | 1,023,348.35 |
> 1,031,365.42 | 13.09 |
> | lock 128 attempt 2    | 1,013,245.56 | 1,018,984.78 |
> 1,023,696.00 | 12.61 |
> | lock 64 v3    | 1,010,893.30 | 1,022,787.25 |
> 1,029,200.26 | 13.03 |
> | lock 64 attempt 1 v3  | 1,014,961.21 | 1,019,745.09 |
> 1,025,511.62 | 12.70 |
> | lock 64 attempt 2 v3 

Re: Statistics Import and Export: difference in statistics dumped

2025-03-05 Thread Ashutosh Bapat
On Tue, Mar 4, 2025 at 11:45 PM Jeff Davis  wrote:
>
> On Tue, 2025-03-04 at 10:28 +0530, Ashutosh Bapat wrote:
>
> > >
> > > What solution are you suggesting? The only one that comes to mind
> > > is
> > > moving everything to SECTION_POST_DATA, which is possible, but it
> > > seems
> > > like a big design change to satisfy a small detail.
> >
> > We don't have to do that. We can manage it by making statistics of
> > index dependent upon the indexes on the table.
>
> The index relstats are already dependent on the index definition. If
> you have a simple database like:
>
>CREATE TABLE t(i INT);
>INSERT INTO t SELECT generate_series(1,10);
>CREATE INDEX t_idx ON t (i);
>ANALYZE;
>
> and then you dump it, you get:
>
>
>--- SECTION_PRE_DATA ---
>
>CREATE TABLE public.t ...
>
>--- SECTION_DATA ---
>
>COPY public.t (i) FROM stdin;
>...
>SELECT * FROM pg_catalog.pg_restore_relation_stats(
> 'version', '18'::integer,
> 'relation', 'public.t'::regclass,
> 'relpages', '1'::integer,
> 'reltuples', '10'::real,
> 'relallvisible', '0'::integer
>);
>...
>
>--- SECTION_POST_DATA --
>
>CREATE INDEX t_idx ON public.t USING btree (i);
>SELECT * FROM pg_catalog.pg_restore_relation_stats(
> 'version', '18'::integer,
> 'relation', 'public.t_idx'::regclass,
> 'relpages', '2'::integer,
> 'reltuples', '10'::real,
> 'relallvisible', '0'::integer
>);
>
> (section annotations added for clarity)
>
> There is no problem with the index relstats, because they are already
> dependent on the index definition, and will be restored after the
> CREATE INDEX.
>
> The issue is when the table's restored relstats are different from what
> CREATE INDEX calculates, and then the CREATE INDEX overwrites the
> table's just-restored relation stats. The easiest way to see this is
> when restoring with --no-data, because CREATE INDEX will see an empty
> table and overwrite the table's restored relstats with zeros.
>
> If we view this issue as a dependency problem, then we'd have to make
> the *table relstats* depend on the *index definition*. If a table has
> any indexes, the relstats would need to go after the last index
> definition, effectively moving most relstats to SECTION_POST_DATA. The
> table's attribute stats would not be dependent on the index definition
> (because CREATE INDEX doesn't touch those), so they could stay in
> SECTION_DATA. And if the table doesn't have any indexes, then its
> relstats could also stay in SECTION_DATA. But then we have a mess, so
> we might as well just put all stats in SECTION_POST_DATA.
>
> But I don't see it as a dependency problem. When I look at the above
> SQL, it reads nicely to me and there's no obvious problem with it.

Thanks for explaining it. I

>
> If we want stats to be stable, we need some kind of mode to tell the
> server not to apply these kind of helpful optimizations, otherwise the
> issue will resurface in some form no matter what we do with pg_dump. We
> could invent a new mode, but autovacuum=off seems close enough to me.

Hmm. Updating the statistics without consuming more CPU is more
valuable when autovacuum is off it improves query plans with no extra
efforts. But if adding a new mode is some significant work, riding it
on top of autovacuum=off might ok. It's not documented either way, so
we could change that behaviour later if we find it troublesome.

-- 
Best Wishes,
Ashutosh Bapat




Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.

2025-03-05 Thread Nisha Moond
On Tue, Mar 4, 2025 at 8:05 PM Shubham Khanna
 wrote:
>
> The attached Patch contains the suggested changes.
>

Hi Shubham,

Here are few comments for 040_pg_createsubscriber.pl

1)
+# Run pg_createsubscriber on node S using '--cleanup-existing-publications'.
+# --verbose is used twice to show more information.

  1a) /node S/ node S1

  1b) Also, can we keep the comment in the same pattern as it was earlier -
# Run pg_createsubscriber on node S1. --verbose is used twice
# to show more information.
# In passing, also test the --enable-two-phase and
--cleanup-existing-publications
# options.

2)
+# Reuse P as primary
+# Set up node S2 as standby linking to node P
+$node_p->backup('backup_3');

/Reuse P as/ Reuse node P as/

3)
+$node_s2->append_conf(
+ 'postgresql.conf', qq[
+   primary_conninfo = '$pconnstr'
+   hot_standby_feedback = on
+   max_logical_replication_workers = 5
+   ]);

Do we need "hot_standby_feedback = on" on node_s2? I think we can remove it.

4)
+# Create user-defined publications
+$node_p->safe_psql($db3, "CREATE PUBLICATION test_pub3 FOR ALL TABLES;");
+$node_p->safe_psql($db3, "CREATE PUBLICATION test_pub4 FOR ALL TABLES;");

Can create both publications under one safe_psql as -

$node_p->safe_psql($db3, qq[CREATE PUBLICATION test_pub3 FOR ALL TABLES;
CREATE PUBLICATION test_pub4 FOR ALL TABLES;
]);

5)
+# Run pg_createsubscriber on node A without using
+# '--cleanup-existing-publications'.
+# --verbose is used twice to show more information.

  5a)  /node A/node S2/
  5b)  /without using '--cleanup-existing-publications' / without
'--cleanup-existing-publications' option

6)
+ ],
+ 'run pg_createsubscriber without --cleanup-existing-publications on node A'
+);

/node A/node S2/

7)
+# Drop the newly created publications
+$node_p->safe_psql($db3, "DROP PUBLICATION IF EXISTS test_pub3;");
+$node_p->safe_psql($db3, "DROP PUBLICATION IF EXISTS test_pub4;");

Similar to #4, use single safe_psql to drop both the publications.
OTOH, do we really need to drop the publications here? I think we can
remove this part since we're performing cleanup right after.


--
Thanks,
Nisha




Re: Log connection establishment timings

2025-03-05 Thread Bertrand Drouvot
Hi,

On Tue, Mar 04, 2025 at 06:25:42PM -0500, Melanie Plageman wrote:
> Attached v9 implements log_connections as an enum with a new third
> value "timings".
> 
> On Mon, Mar 3, 2025 at 11:14 AM Bertrand Drouvot
>  wrote:
> >
> >
> > On Fri, Feb 28, 2025 at 05:52:35PM -0500, Melanie Plageman wrote:
> >
> > > We have to be really careful about when log_connections is actually
> > > set before we use it -- I fixed some issues with that.
> >
> > Good catch! Yeah for the EXEC_BACKEND case we need to wait that 
> > read_nondefault_variables()
> > in SubPostmasterMain() is executed.
> 
> Due to this and the bug Fujii-san reported with passing options to
> psql (process_startup_options() is called too late for us to use the
> value of log_connections as a gate for getting any of the timings),
> I've decided to get the time regardless of the setting of
> log_connections. This means we can be sure to have valid values at the
> end when the message needs to be emitted.

I think that makes sense given what has been said in [1].

> I was wondering if we should remove the backend type condition
> (B_WAL_SENDER/B_BACKEND) too. It is only guarding capturing the fork
> start and end time. That's just two measurements. Maybe we should just
> always get fork start and end time. Spending extra time in forking due
> to excess CPU activity (or some other reason) would be relevant for
> all backend types, not just wal sender and client backends. We do only
> want to log it in those cases, though...

Yeah, I'm on the fence for this one. OTOH that sounds strange to "collect"
data that won't be used for some backend types. So I'm tempted to vote to
keep it as it is.

> > +bool
> > +check_log_connections(char **newval, void **extra, GucSource source)
> > +{
> >
> > This function is pretty close to check_debug_io_direct() for example and its
> > overall content, memory management, looks good to me. I just have a few
> > following comments about it.
> 
> Yep, one more reason I think PGC_SET is a good idea. Code deduplication.

+1

> > +typedef enum ConnectionLogOption
> > +{
> > +   LOG_CONNECTION_RECEIVED = (1 << 0),
> > +   LOG_CONNECTION_AUTHENTICATED = (1 << 1),
> > +   LOG_CONNECTION_AUTHORIZED = (1 << 2),
> > +   LOG_CONNECTION_DISCONNECTED = (1 << 3),
> > +} ConnectionLogOption;
> >
> > I wonder if it would make sense to add LOG_CONNECTION_ALL here
> > (LOG_CONNECTION_RECEIVED | LOG_CONNECTION_AUTHENTICATED ..)
> >
> > That sounds a better place (than defining "all" in check_log_connections()) 
> > to
> > me. It's also how it is done in MonotonicFunction.
> 
> The attached patch

Thanks for the new patch!

> doesn't work exactly the same because it is a
> regular enum and not a string list (so no "all")

Yeah, does not make sense anymore.

>, but I do still have
> a LogConnectionOption. The user interface is that timings includes all
> the same messages as "on", however for the enum, I've defined it like
> this:
> 
> typedef enum LogConnectionOption
> {
> LOG_CONNECTION_BASIC = (1 << 0),
> LOG_CONNECTION_TIMINGS = (1 << 1),
> } LogConnectionOption;
> 
> and then in the enum array, I have this
> 
> static const struct config_enum_entry log_connections_options[] = {
> {"timings", LOG_CONNECTION_TIMINGS | LOG_CONNECTION_BASIC, false},

works for me.

Looking at the enum array:

"
static const struct config_enum_entry log_connections_options[] = {
{"off", 0, false},
{"on", LOG_CONNECTION_BASIC, false},
{"true", LOG_CONNECTION_BASIC, true},
{"false", 0, true},
{"yes", LOG_CONNECTION_BASIC, true},
{"no", 0, true},
{"timings", LOG_CONNECTION_TIMINGS | LOG_CONNECTION_BASIC, false},
{"1", LOG_CONNECTION_BASIC, true},
{"0", 0, true},
{NULL, 0, false}
};
"
and at parse_bool_with_len(), it looks like it used to work with
log_connections set to abbreviations like "y, ye, yes, t, tr, tru, true, f,
fa, fal, fals, false, n and no", but now the full words are required.

I wonder if we should go that deep into backward compatibility though. Maybe
it's worth doing if it's simple enough? (not sure how complicated it would be).

Case-insensitivity is preserved; I mean "trUe" still works with the patch.

> I think the way I've done it makes sense because the "business logic"
> of how "timings" includes all the messages from "on" doesn't have to
> pollute the code and doesn't take away our ability to use the enum
> values independently.

Yeah, I like the way it's done.

> When I want to emit a regular logging message, I can just check
> if (log_connections & LOG_CONNECTION_BASIC)
> which will work for "timings" too, but without taking away the
> flexibility to use the enum values discretely.

yup.
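
To make that concrete, here is a hedged illustration of the gating (the
message wording and the remote_host/fork_ms/auth_ms variables are made up
for the example, they are not taken from the patch):

    /* "on" and "timings" both emit the basic messages */
    if (log_connections & LOG_CONNECTION_BASIC)
        ereport(LOG,
                (errmsg("connection received: host=%s port=%s",
                        remote_host, remote_port)));

    /* only "timings" emits the timing breakdown */
    if (log_connections & LOG_CONNECTION_TIMINGS)
        ereport(LOG,
                (errmsg("connection establishment times (ms): fork=%0.3f authentication=%0.3f",
                        fork_ms, auth_ms)));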

> 
> > Also not sure if it's worth adding a "MAX" (like it's done for relopt_kind)
> > because we'd probably not go over it anyay.
> >
> > "
> > /* some compilers treat enums as signed ints, so we can't use 1 << 31 */
> > RELOPT_KIND_MAX = (1 << 30)
> 
> That actually seems kind of broken -- what

Re: Separate GUC for replication origins

2025-03-05 Thread Peter Eisentraut

On 11.02.25 21:25, Euler Taveira wrote:

Here is another patch that only changes the GUC name to
max_replication_origin_sessions.


I think the naming and description of this is still confusing.

What does this name mean?  There is (I think) no such thing as a 
"replication origin session".  So why would there be a maximum of those?


The description in the documentation, after the patch, says "Specifies 
how many replication origins (see Chapter 48) can be tracked 
simultaneously".  But Chapter 48 does not say anything about what it 
means for a replication slot to be tracked.  The only use of the word 
tracked in that chapter is to say that replication slots do the tracking 
of replication progress.


Both of these are confusing independently, but moreover we are now using 
two different words ("sessions" and "tracked") for apparently the same 
thing, but neither of which is adequately documented in those terms 
anywhere else.


I agree that the originally proposed name max_replication_origins is not 
good, because you can "create" (using pg_replication_origin_create()) 
more than the configured maximum.  What is the term for what the setting 
actually controls?  How many are "active"?  "In use"?  Per session?  etc.






Re: support fast default for domain with constraints

2025-03-05 Thread jian he
On Wed, Mar 5, 2025 at 11:13 AM Tom Lane  wrote:
>
> This patch appears to summarily throw away a couple of
> backwards-compatibility concerns that the previous round
> took care to preserve:
>
> * not throwing an error if the default would fail the domain
> constraints, but the table is empty so there is no need to
> instantiate the default.
>
Hi. Thanks for pointing this out.
I noticed the empty-table scenario, but didn't check it thoroughly.
The attached patch preserves this backwards compatibility;
now it's aligned with master behavior, I think.

The main gotcha is:
ALTER TABLE ADD COLUMN...
If there is no explicit DEFAULT, the defval either comes from
pg_type.typdefaultbin or is constructed via the makeNullConst branch.
In that case, we need to use soft error evaluation, because we allow
these cases for an empty table.
In other cases, we can evaluate the explicit DEFAULT clause directly.


> * not assuming that the domain constraints are immutable.
>
> Now it's fair to question how important the second point is
> considering that we mostly treat domain constraints as immutable
> elsewhere.  But I think the first point has actual practical uses
> --- for example, if you want to set things up so that inserts must
> specify that column explicitly.  So I don't think it's okay to
> discard that behavior.
>

In v2-0003 I created a new function:
bool DomainHaveVolatileConstraints(Oid type_id, bool *have_volatile)
Within DomainHaveVolatileConstraints I use contain_volatile_functions to test
whether each check_expr is volatile or not; contain_volatile_functions won't
be expensive, I think.

If a check expression is volatile, *have_volatile is set to true.
If have_volatile is true, then we need a table rewrite.
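
For illustration, a rough sketch of such a function (this is not the attached
patch; it only assumes the existing GetDomainConstraints() and
contain_volatile_functions() helpers, and returns whether the domain has any
constraints at all while setting *have_volatile along the way):

static bool
DomainHaveVolatileConstraints(Oid type_id, bool *have_volatile)
{
    List       *constraints;
    ListCell   *lc;

    *have_volatile = false;

    constraints = GetDomainConstraints(type_id);

    foreach(lc, constraints)
    {
        DomainConstraintState *con = (DomainConstraintState *) lfirst(lc);

        /* a volatile CHECK expression means we cannot skip the rewrite */
        if (con->constrainttype == DOM_CONSTRAINT_CHECK &&
            contain_volatile_functions((Node *) con->check_expr))
        {
            *have_volatile = true;
            break;
        }
    }

    return (constraints != NIL);
}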
From 40364a9f3926a9ebc8cad4534ab2221e2a1d2574 Mon Sep 17 00:00:00 2001
From: jian he 
Date: Wed, 5 Mar 2025 20:38:27 +0800
Subject: [PATCH v2 2/3] fast default for domain with constraints

This is primarily done by evaluating CoerceToDomain with soft error support.

If the CoerceToDomain evaluation fails, it means that in ATExecAddColumn
the evaluated defval value cannot be cast to the domain.
However, we cannot fail at the Phase 2 stage in cases where the table is empty.

Therefore, if an error occurs during evaluation, we do not raise it;
we signal Phase 3 to do a table rewrite, and the error will be raised in Phase 3.

Thanks to commit aaaf9449ec6be62cb0d30ed3588dc384f56274bf[1],
ExprState.escontext (ErrorSaveContext) was added, and ExecEvalConstraintNotNull,
ExecEvalConstraintCheck were changed to use errsave instead of hard error.
Now we can evaluate CoerceToDomain in a soft error way.

NOTE: this patch does not consider domains with volatile check constraints.

discussion: https://postgr.es/m/cacjufxe_+izbr1i49k_ahigpppwltji6km8nosc7fwvkdem...@mail.gmail.com

[1]: https://git.postgresql.org/cgit/postgresql.git/commit/?id=aaaf9449ec6be62cb0d30ed3588dc384f56274bf
---
 src/backend/commands/tablecmds.c   | 43 --
 src/test/regress/expected/fast_default.out | 91 ++
 src/test/regress/sql/fast_default.sql  | 57 ++
 3 files changed, 186 insertions(+), 5 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8826ca5c32c..f7c8348c7ff 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -7331,6 +7331,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
 	 * NULL if so, so without any modification of the tuple data we will get
 	 * the effect of NULL values in the new column.
 	 *
+	 *  this para should be removed?
 	 * An exception occurs when the new column is of a domain type: the domain
 	 * might have a not-null constraint, or a check constraint that indirectly
 	 * rejects nulls.  If there are any domain constraints then we construct
@@ -7358,6 +7359,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
 	{
 		bool		has_domain_constraints;
 		bool		has_missing = false;
+		bool		no_explicit_defval = false;
 
 		/*
 		 * For an identity column, we can't use build_column_default(),
@@ -7375,6 +7377,21 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
 		else
 			defval = (Expr *) build_column_default(rel, attribute->attnum);
 
+		/*
+		 * if defval is there, atthasdef is false, that means the defval comes
+		 * from domain default value, no explicit DEFAULT expression specified.
+		 * In that case we need evaluate defval error safe way, so
+		 * domain specification such as ``check(value > 10) default 8 ``
+		 * can be added to empty table.
+		*/
+		if (defval)
+		{
+			TupleDesc	rd_att = rel->rd_att;
+			Form_pg_attribute att_tup = TupleDescAttr(rd_att, attribute->attnum - 1);
+			if (!att_tup->atthasdef)
+no_explicit_defval = true;
+		}
+
 		/* Build CoerceToDomain(NULL) expression if needed */
 		has_domain_constraints = DomainHasConstraints(attribute->atttypid);
 		if (!defval && has_domain_constraints)
@@ -7397,6 +7414,12 @@ ATExecAddColumn(List **wqueue,

Re: Patch: Cover POSITION(bytea,bytea) with tests

2025-03-05 Thread Ilia Evdokimov



On 27.02.2025 17:40, Aleksander Alekseev wrote:

Hi,

This function is currently not covered by any tests. The proposed
patch fixes this.

--
Best regards,
Aleksander Alekseev



Hi,

+1. The 'POSITION' tests include the 'text' and 'bits' types, but not 'bytea'.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.





Re: doc: expand note about pg_upgrade's --jobs option

2025-03-05 Thread Magnus Hagander
On Wed, Mar 5, 2025 at 11:00 AM Daniel Gustafsson  wrote:

> > On 4 Mar 2025, at 19:08, Nathan Bossart 
> wrote:
>
> > The attached patch replaces the "et cetera" with those two general
> categories.
>
> LGTM.
>

Another option that I think would also work is to just cut down the details
to just "The --jobs option allows multiple CPU cores to be
used".

I think this is also slightly confusing, but maybe that's a
non-native-english thing: "a good place to start is the maximum of the
number of  CPU cores and tablespaces.". Am I supposed to set it to
max(cpucores, ntablespaces) or to max(cpucores+ntablespaces)?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: per backend WAL statistics

2025-03-05 Thread Bertrand Drouvot
Hi,

On Wed, Mar 05, 2025 at 05:45:57PM +0800, Xuneng Zhou wrote:
> Subject: Clarification Needed on WAL Pending Checks in Patchset
> 
> Hi,
> 
> Thank you for the patchset. I’ve spent some time learning and reviewing it
> and have 2 comments.

Thanks for looking at it!

> I noticed that in patches prior to patch 6, the function
> pgstat_backend_wal_have_pending() was implemented as follows:
> 
> /*
>  * To determine whether any WAL activity has occurred since last time, not
>  * only the number of generated WAL records but also the numbers of WAL
>  * writes and syncs need to be checked. Because even transactions that
>  * generate no WAL records can write or sync WAL data when flushing the
>  * data pages.
>  */
> static bool
> pgstat_backend_wal_have_pending(void)
> {
> PgStat_PendingWalStats pending_wal;
> 
> pending_wal = PendingBackendStats.pending_wal;
> 
> return pgWalUsage.wal_records != prevBackendWalUsage.wal_records ||
>pending_wal.wal_write != 0 || pending_wal.wal_sync != 0;
> }
> 
> Starting with patch 7, it seems the implementation was simplified to:
> 
> /*
>  * To determine whether WAL usage happened.
>  */
> static bool
> pgstat_backend_wal_have_pending(void)
> {
> return pgWalUsage.wal_records != prevBackendWalUsage.wal_records;
> }

That's right. This is due to 2421e9a51d2, which removed PgStat_PendingWalStats.

> Meanwhile, the cluster-level check in the function
> pgstat_wal_have_pending_cb() still performs the additional checks:
> 
> bool
> pgstat_wal_have_pending_cb(void)
> {
> return pgWalUsage.wal_records != prevWalUsage.wal_records ||
>PendingWalStats.wal_write != 0 ||
>PendingWalStats.wal_sync != 0;
> }

Not since 2421e9a51d2. It looks like you are looking at code prior to
2421e9a51d2.

> Another one is on:
> Bertrand Drouvot  于2025年3月3日周一 18:47写道:
> 
> > Hi,
> >
> > On Mon, Mar 03, 2025 at 09:17:30AM +, Bertrand Drouvot wrote:
> > > hmm, that would work as long as PGSTAT_BACKEND_FLUSH_ALL represents
> > things
> > > that need to be called from pgstat_report_wal(). But I think that's open
> > > door for issue should be add a new PGSTAT_BACKEND_FLUSH_XXX where XXX is
> > not
> > > related to pgstat_report_wal() at all. So, I'm tempted to keep it as it
> > is.
> >
> > I just realized that pgstat_flush_backend() is not correct in 0001. Indeed
> > we check:
> >
> > "
> > if (pg_memory_is_all_zeros(&PendingBackendStats,
> >sizeof(struct PgStat_BackendPending)))
> > return false;
> > "
> >
> > but the WAL stats are not part of PgStat_BackendPending... So we only check
> > for IO pending stats here. I'm not sure WAL stats could be non empty if IO
> > stats are but the attached now also takes care of pending WAL stats here.
> 
> I've noticed that here's a function for checking if there are any backend
> stats waiting for flush:
> /*
>  * Check if there are any backend stats waiting for flush.
>  */
> bool
> pgstat_backend_have_pending_cb(void)
> {
> return (!pg_memory_is_all_zeros(&PendingBackendStats,
> sizeof(struct PgStat_BackendPending)));
> }

That's right. 

The reason I did not add the extra check there is that I have in mind to
replace the pg_memory_is_all_zeros() checks with a new global variable and
also to replace the checks in pgstat_flush_backend() with a call to
pgstat_backend_have_pending_cb() (see 0002 in [1]). All of that would be
perfectly clean if 0002 in [1] goes in.

But yeah, if 0002 in [1] does not go in, then your concern is valid, so I'm
adding the extra check in the attached.
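
To be concrete, the extra check is along these lines (a sketch, not
necessarily the exact hunk in the attached patch): pgstat_flush_backend()
bails out early only if both the pending IO stats and the per-backend WAL
counters are untouched:

    /* nothing to do if no pending IO stats and no WAL activity since last flush */
    if (pg_memory_is_all_zeros(&PendingBackendStats,
                               sizeof(struct PgStat_BackendPending)) &&
        pgWalUsage.wal_records == prevBackendWalUsage.wal_records)
        return false;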

Thanks for the review!

[1]: 
https://www.postgresql.org/message-id/Z8WYf1jyy4MwOveQ%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From c9500ba1ed93f97aee2f9f4a97bcbfaaff998cad Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Mon, 6 Jan 2025 10:00:00 +
Subject: [PATCH v14] per backend WAL statistics

Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
per backend IO statistics) we can more easily add per backend statistics.

This commit adds per backend WAL statistics using the same layer as pg_stat_wal,
except that it is now possible to know how much WAL activity is happening in each
backend rather than an overall aggregate of all the activity.  A function called
pg_stat_get_backend_wal() is added to access this data depending on the
PID of a backend.

The same limitation as in 9aea73fc61 persists, meaning that Auxiliary processes
are not included in this set of statistics.

XXX: bump catalog version
XXX: bump of stats file format not required, as backend stats do not
persist on disk.
---
 doc/src/sgml/monitoring.sgml| 19 ++
 src/backend/utils/activity/pgstat_backend.c | 70 -
 src/backend/utils/activity/pgstat_wal.c |  1 +
 src/backend/utils/adt/pgstatfuncs.c | 26 

Re: Should we add debug_parallel_query=regress to CI?

2025-03-05 Thread Tom Lane
Andres Freund  writes:
> On 2025-03-05 11:19:46 -0500, Tom Lane wrote:
>> However, we seem to be moving towards a situation where each type of CI run
>> is a special snowflake that differs in multiple dimensions from other types.
>> That might make it difficult to figure out which dimension is responsible
>> for a particular failure.

> True, but I don't really see an alternative. Having dedicated tasks for
> testing just debug_parallel_query=regress (and about half a dozen other
> things) on their own seems like a *lot* of resource usage for the gain.

Agreed.

> I guess we could be add a "standardized" section at the top of each task
> describing their oddities? Not sure it's worth it.

I think this does need to be documented somewhere/somehow, just so
that people don't waste time focusing on "it's failing on FreeBSD"
when the actual cause is some other thing we happened to load
onto that task.

regards, tom lane




Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible

2025-03-05 Thread Andres Freund
Hi,

On 2025-03-05 08:16:45 -0800, Jacob Champion wrote:
> From efc9fc3b3993601e9611131f229fbcaf4daa46f1 Mon Sep 17 00:00:00 2001
> From: Michael Paquier 
> Date: Wed, 5 Mar 2025 13:30:43 +0900
> Subject: [PATCH 1/2] Fix race condition in pre-auth test
> 
> ---
>  src/test/authentication/t/007_pre_auth.pl | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/src/test/authentication/t/007_pre_auth.pl 
> b/src/test/authentication/t/007_pre_auth.pl
> index a638226dbaf..90aaea4b5a6 100644
> --- a/src/test/authentication/t/007_pre_auth.pl
> +++ b/src/test/authentication/t/007_pre_auth.pl
> @@ -43,12 +43,14 @@ $psql->query_safe("SELECT 
> injection_points_attach('init-pre-auth', 'wait')");
>  # authentication. Use the $psql connection handle for server interaction.
>  my $conn = $node->background_psql('postgres', wait => 0);
>  
> -# Wait for the connection to show up.
> +# Wait for the connection to show up in pg_stat_activity, with the wait_event
> +# of the injection point.
>  my $pid;
>  while (1)
>  {
>   $pid = $psql->query(
> - "SELECT pid FROM pg_stat_activity WHERE state = 'starting';");
> + qq{SELECT pid FROM pg_stat_activity
> +  WHERE state = 'starting' and wait_event = 'init-pre-auth';});
>   last if $pid ne "";

Unrelated to the change in this patch, but tests really shouldn't use while(1)
loops without a termination condition. If something is wrong, the test will
hang indefinitely, instead of timing out.  On the buildfarm that can take out
an animal if it hasn't configured a timeout (with autoconf at least, meson
terminates tests after a timeout).

I guess you can't use poll_query_until() here, but in that case you should
copy some of the timeout logic. Or, perhaps better, add a poll_query_until()
to BackgroundPsql.pm.

Greetings,

Andres Freund




Re: making EXPLAIN extensible

2025-03-05 Thread Jeff Davis
It would be good to clarify whether this is for (a) experimenting with
explain options that might be useful in core some day; or (b) special
developer-only options that would never be useful in core; or (c)
production-grade explain options specific to an extension.

On Tue, 2025-03-04 at 16:23 -0500, Robert Haas wrote:
> Where I see this being more useful is for people who want to display
> additional information for plan nodes that they did not implement.
> For
> example, my EXPLAIN (RANGE_TABLE) option dumps every
> range-table-related fact it can find in the Plan tree.

That sounds like (b) or maybe (a).

But the first motivation you listed when introducing the patch was: "It
wouldn't make sense for core to have an EXPLAIN option whose whole
purpose is to cater to the needs of some extension, so that made me
think of providing some extensibility infrastructure."

which sounds like (c).

And if (c) is part of the intended use, but not CustomScan or TableAM,
then what kind of extensions would need extension-specific explain
options?

I am not trying to push the patch in any particular direction. On the
contrary, I'd just like to know the scope of the feature so that I can
stop accidentally pushing it in some direction by asking questions
about out-of-scope use cases.

Regards,
Jeff Davis





Re: making EXPLAIN extensible

2025-03-05 Thread Robert Haas
On Wed, Mar 5, 2025 at 12:52 PM Jeff Davis  wrote:
> It would be good to clarify whether this is for (a) experimenting with
> explain options that might be useful in core some day; or (b) special
> developer-only options that would never be useful in core; or (c)
> production-grade explain options specific to an extension.

I think it could be any of these.

> And if (c) is part of the intended use, but not CustomScan or TableAM,
> then what kind of extensions would need extension-specific explain
> options?
>
> I am not trying to push the patch in any particular direction. On the
> contrary, I'd just like to know the scope of the feature so that I can
> stop accidentally pushing it in some direction by asking questions
> about out-of-scope use cases.

Heh, no problem. I see it probably being most useful for extensions
that touch the planner in some way that cuts across multiple node
types. We don't have a lot of those in contrib yet, but that's partly
because we don't have much infrastructure to support them.

Alena Rybakina gives an example upthread: "while writing the AQO
extension [0] we were just adding a hook (we called it
ExplainOnePlan_hook) similar to this one to add a description for the
node used in the plan, in particular its identifier and the
cardinality that is used during query planning." Andrei Lepikhov's
comment upthread suggests similar things: "I write optimisation
helpers, and it is necessary to show which node was influenced and
how".

Another example - and the one that motivated this work - was my
proposal 
http://postgr.es/m/CA+TgmoZY+baV-T-5ifDn6P=L=av-vkvbrpmi0tqkceq-5fi...@mail.gmail.com
- this is an offshoot of that work, several steps removed.

I'm not sure if I'm being clear here, but the difference between these
examples and the ones you proposed is whether what you're adding to
the planner in your extension looks more like a new type of node or
more like a new overall feature. A Custom Scan node or a new table AM
is a new kind of thing that you can scan; those sorts of examples
probably want to solve their problems in some other way, though they
could use this infrastructure if they really wanted. On the other
hand, Alena's example of adaptive query optimization or my example of
letting extensions nudge the planner are a new type of optimizer
capability that happens to live in an extension. They don't show up in
a specific place in the plan; they affect things at some higher level.
That kind of thing is where I expect this infrastructure to be high
value.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM?

2025-03-05 Thread Matthias van de Meent
On Wed, 5 Mar 2025 at 10:04, Heikki Linnakangas  wrote:
>
> On 28/02/2025 03:53, Peter Geoghegan wrote:
> > On Sat, Feb 8, 2025 at 8:47 AM Michail Nikolaev
> >  wrote:
> >> Just some commit messages + few cleanups.
> >
> > I'm worried about this:
> >
> > +These longer pin lifetimes can cause buffer exhaustion with messages like 
> > "no
> > +unpinned buffers available" when the index has many pages that have similar
> > +ordering; but future work can figure out how to best work that out.
> >
> > I think that we should have some kind of upper bound on the number of
> > pins that can be acquired at any one time, in order to completely
> > avoid these problems. Solving that problem will probably require GiST
> > expertise that I don't have right now.
>
> +1. With no limit, it seems pretty easy to hold thousands of buffer pins
> with this.
>
> The index can set IndexScanDesc->xs_recheck to indicate that the quals
> must be rechecked. Perhaps we should have a similar flag to indicate
> that the visibility must be rechecked.
>
> Matthias's earlier patch
> (https://www.postgresql.org/message-id/CAEze2Wg1kbpo_Q1%3D9X68JRsgfkyPCk4T0QN%2BqKz10%2BFVzCAoGA%40mail.gmail.com)
> had a more complicated mechanism to track the pinned buffers. Later
> patch got rid of that, which simplified things a lot. I wonder if we
> need something like that, after all.

I dropped that because it effectively duplicates the current
per-backend pin tracking system. Adding it back in will probably
complicate matters by a lot again.

> Here's a completely different line of attack: Instead of holding buffer
> pins for longer, what if we checked the visibility map earlier? We could
> check the visibility map already when we construct the
> GISTSearchHeapItem, and set a flag in IndexScanDesc to tell
> IndexOnlyNext() that we have already done that. IndexOnlyNext() would
> have three cases:

I don't like integrating a heap-specific thing like VM_ALL_VISIBLE()
into indexes, but given that IOS code already uses that exact code my
dislike is not to the point of a -1. I'd like it better if we had a
TableAM API for higher-level visibility checks (e.g.
table_tids_could_be_invisible?()) which gives us those responses
instead; dropping the requirement to maintain VM in pg's preferred
format to support efficient IOS.

I am a bit worried about even more random IO happening before we've
returned even a single tuple, but that's probably much less of an
issue than "unlimited pins".

With VM-checking in the index, we would potentially have another
benefit: By checking all tids on the page at once, we can deduplicate
and reduce the VM lookups. The gains might not be all that impressive,
but could be significant in certain hot cases.
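
For illustration, the per-page deduplication could look roughly like this
(hypothetical sketch, not from any posted patch; heapRel, heaptids, nheaptids
and all_visible are made-up locals). The point is just to consult
VM_ALL_VISIBLE() once per run of TIDs on the same heap block rather than once
per TID:

    BlockNumber cached_blk = InvalidBlockNumber;
    bool        cached_vis = false;
    Buffer      vmbuffer = InvalidBuffer;

    for (int i = 0; i < nheaptids; i++)
    {
        BlockNumber blk = ItemPointerGetBlockNumber(&heaptids[i]);

        /* only hit the visibility map when the heap block changes */
        if (blk != cached_blk)
        {
            cached_vis = VM_ALL_VISIBLE(heapRel, blk, &vmbuffer);
            cached_blk = blk;
        }
        all_visible[i] = cached_vis;
    }

    if (BufferIsValid(vmbuffer))
        ReleaseBuffer(vmbuffer);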

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: Hook for Selectivity Estimation in Query Planning

2025-03-05 Thread Aleksander Alekseev
Andrei, Matthias,

> Could you explain why you think the Pluggable TOASTer proposal was similar?
> [...]

I merely pointed out that adding hooks without any particular value
for the Postgres users was criticized before, see for instance:

https://www.postgresql.org/message-id/20230206104917.sipa7nzue5lw2e6z%40alvherre.pgsql

One could argue - but wait, isn't TAM for instance just a bunch of
hooks in a nutshell? How do we distinguish a well-documented and more
or less stable API for the extension authors from a random hook put in
a convenient place? That's a good question. I don't have an answer to
it. This being said, the proposed patch doesn't strike me as a good or
documented API, or the one that is going to be stable in the long run.

> [...]
>
> Overall, I see that new hooks allow new [sometimes] open-source projects
> and startups to emerge - not sure about enterprises' benefits.
> Therefore, I'm not convinced by your current justification. Are there
> any technical objections?

There is no point in debating about good and evil or right and wrong.
The only important question is whether there will be a committer
willing to accept the proposed change considering its controversy.

-- 
Best regards,
Aleksander Alekseev




Re: Should work_mem be stable for a prepared statement?

2025-03-05 Thread James Hunter
On Fri, Feb 28, 2025 at 2:34 PM Tom Lane  wrote:
>
> Sami Imseih  writes:
> > However, I think any GUC that can influence the planner
> > should be considered for consistency in behavior.
> > It was mentioned above with the enable_* GUCs, but another
> > one I can think of is the max_parallel_workers_per_gather which
> > should then force a re-plan if changed. I have seen users need to turn
> > that off in a hurry when it impacts their oltp workload.
>
> The argument for treating work_mem specially is that it has effects at
> both plan time and run time, so that the planner's cost assumptions
> are invalidated if the executor uses a different value than the
> planner did.  I don't believe that argument applies to anything else;
> certainly it doesn't apply to the enable_* flags.

I take the view:
 * for the plan time effects, the point of a prepared statement is to
plan exactly once, at the time the statement is prepared. So I don't
think it makes sense to replan when a GUC changes. (So, no to option
(b).)
* for the run time effects, the current run time state of the world
should control. (So, no to option (a).)

But I would leave (a) open as an option for extensions.

>  I'm also not
> convinced that the argument requires us to solve the problem by
> re-planning.  It would work just as well to stamp each PlannedStmt
> with the value that the planner used and refer to that in the
> executor instead of looking directly at the GUC.

I propose something similar, in [1], except that instead of stamping
the PlannedStmt with a single (pair of) GUC value, I stamp it with a
separate value for every Plan or SubPlan that needs it. And then a
hook / extension can impose Jeff's option (a), which is what you
describe here.

> This is all kind of moot though, now that Jeff has revealed that
> what he's really interested in is some sort of refactoring.
> Maybe that refactoring is one that would conveniently apply to
> other GUCs, or maybe it isn't.  I'm content to await details
> before arguing about what we "should" do.

I think refactoring is good (as you'd expect, given my proposal!), but
-- expanding on what you wrote above -- I think it's also important to
consider "plan time effects" and "run time effects" separately. The
same GUC / field / whatever might sometimes be used at both plan and
run time, but conceptually these are different.

A plan-time setting is intended to bias the planner so that it chooses
certain classes of plans over others. A plan-time setting is "wrong"
only so far as it causes the planner to choose a sub-optimal plan.

A run-time setting reflects the current state of the PostgreSQL
instance, at the time the query is run.

PQ is a good example. At plan time, we choose PQ, and place a Gather
node on the Plan, based on an assumption
(max_parallel_workers_per_gather) about how much parallelism we'll
actually get at run-time. And then, at run-time, we assign actual
workers based on max_parallel_workers and other queries running at the
same time.

It may turn out that we can't actually get *any* actual workers, at
runtime; so then we have added a Gather node for nothing. This has a
non-zero cost. If we had known, at plan time, that we wouldn't get any
parallel workers, then we should have chosen a serial plan. But, so
what? Planning is always an estimate.

Yeah, "Gather" overhead is lower than the cost of choosing a
sub-optimal join method, but I think the same principle applies:
* at plan time, we make the best estimate and prediction we can, based
on info available at that time;
* at run time, we make the best decision we can, based on the actual
state of the database instance.

And in cases where we think, at runtime, that the planner's
assumptions were so wrong that they'll lead us to execute a
sub-optimal plan, then maybe we can re-plan. But I explicitly wouldn't
re-plan a prepared statement, since the customer's expectation is that
it has already been prepared.

James Hunter

[1] 
https://www.postgresql.org/message-id/flat/CAJVSvF5n3_uEGW5GZSRehDuTfz7XVDohbn7tVJ%2B2ZnweQEVFrQ%40mail.gmail.com#abc6e69a396bb9f6505bf33260670a1f




Re: making EXPLAIN extensible

2025-03-05 Thread Tom Lane
Robert Haas  writes:
> OK. It sounds to me like there is a good amount of support for going
> forward with something like what I have, even though some people might
> also like other things. What I feel is currently lacking is some
> review of the actual code. Would anyone like to do that?

Here's some comments on 0001 and 0002; I didn't have time for
0003 today.  But in any case, I think you should move forward
with committing 0001/0002 soon so other people can play around
in this space.  0003 can be left for later.

0001:

GetExplainExtensionState and SetExplainExtensionState should probably
have guards against extension_id < 0, even if it's just an Assert.
Or make the id unsigned?

SetExplainExtensionState is repalloc'ing ExplainExtensionNameArray;
why?  Wouldn't that invalidate any other pointers to the array?
It disturbs me that you have ExplainState carrying what could
easily become a stale pointer to that array; I'd try to get rid
of that.

RegisterExtensionExplainOption has "char *option_name", but I think
that should be "const char *", and the function should have the same
disclaimer as GetExplainExtensionId about that string needing to be
constant-or-never-freed.

This bit in explain.h seems non-idiomatic:

+struct ExplainState;
+typedef struct ExplainState ExplainState;

AFAIK it's sufficient to write the typedef line.  Also I'd
strongly recommend a comment saying that the struct is
defined in explain_state.h.

explain_state.h has the same pattern:

+struct ExplainState;
+typedef struct ExplainState ExplainState;

This is a little more problematic, because I'm pretty sure some
compilers will kvetch when they see the duplicate typedefs.
You need to either (a) write the typedef in only one file,
and use "struct ExplainState" in the other file, or (b)
use some dance like

#ifndef EXPLAINSTATE_TYPEDEF_DEFINED
typedef struct ExplainState ExplainState;
#define EXPLAINSTATE_TYPEDEF_DEFINED 1
#endif

in both files.  If memory serves we do have some headers using
the (b) pattern, but I don't love it particularly.

0002:

I'm fine with the order of additions being determined by module
load order.  Users who are feeling picky about that can arrange
the module load order as they wish.  If we put in a priority
mechanism then the order would be determined by module authors,
who probably don't want the responsibility, and not by the users
whose results are actually affected.

I'm quite confused by the #include additions in auto_explain.c and
file_fdw.c, and I strongly object to the ones in explain_state.h.
Surely those are unnecessary?

Anyway, these are all very minor concerns; overall I think
it's going in the right direction.

regards, tom lane




Re: explain plans for foreign servers

2025-03-05 Thread Jeff Davis
On Wed, 2025-02-26 at 13:13 -0600, Sami Imseih wrote:
> 1/ The use of NOTICE to propagate the explain plan.
> I see the message content is checked, but this does not look robust
> and could lead to
> some strange results if another ExecutorRun hook emits a similar
> notice message.

Fundamentally, EXPLAIN ANALYZE needs to return two result sets for this
patch to work: the ordinary result, and the EXPLAIN ANALYZE result. The
current patch hacks around that by returning the ordinary result set
from the foreign server, and then returning the EXPLAIN ANALYZE result
as a NOTICE.

Ideally, we'd have EXPLAIN ANALYZE return two result sets, kind of like
how a query with a semicolon returns two result sets. That changes the
expected message flow for EXPLAIN ANALYZE, though, so we'd need a new
option so we are sure the client is expecting it (is this a sane
idea?). I wonder if Robert's extensible EXPLAIN work[1] could be useful
here? We'd also need a DestReceiver capable of returning two result
sets. These problems sound solvable, but would require some more
discussion.

> What if we do something like a new EXPLAIN option which returns all
> the rows
> back to the client, and then writes out the plan to some local
> memory.

That's another idea, but I am starting to think returning two result
sets from EXPLAIN ANALYZE would be generally useful. 

Regards,
Jeff Davis

[1]
https://www.postgresql.org/message-id/CA%2BTgmoYSzg58hPuBmei46o8D3SKX%2BSZoO4K_aGQGwiRzvRApLg%40mail.gmail.com






Re: Hook for Selectivity Estimation in Query Planning

2025-03-05 Thread Aleksander Alekseev
Hi,

> I would like to discuss the introduction of a hook for evaluating the
> selectivity of an expression when searching for an optimal query plan.
> This topic has been brought up in various discussions, for example, in [1].
>
> [...]

As I vaguely recall from recent proposals like this ("Pluggable TOASTer" to
name one), this approach was criticised. Hooks per se don't add value
for the end user. They only put the burden of maintaining them on the
community while all the real features are implemented in proprietary
extensions. If you believe something is missing in Postgres,
contribute it upstream so that everyone will benefit from it.

Personally I agree with the sentiment, thus -1.

-- 
Best regards,
Aleksander Alekseev




Re: is git.postgresql.org working fine?

2025-03-05 Thread Aleksander Alekseev
Hi,

> > Is it only me?
>
> I've had trouble in the past with cloning on old/slow machines --- it
> seems to time out after awhile.  But AFAIR the symptom was an error
> message not "stalling", so maybe that's unrelated.
>
> Anyway, I tried it just now and didn't observe any problem:
>
> [...]

For me works fine too:

```
$ time git clone git://git.postgresql.org/git/postgresql.git
Cloning into 'postgresql'...
remote: Enumerating objects: 20275, done.
remote: Counting objects: 100% (20275/20275), done.
remote: Compressing objects: 100% (12231/12231), done.
remote: Total 1048219 (delta 14543), reused 10289 (delta 7971),
pack-reused 1027944
Receiving objects: 100% (1048219/1048219), 351.78 MiB | 1.30 MiB/s, done.
Resolving deltas: 100% (904606/904606), done.
Updating files: 100% (7096/7096), done.

real4m53.197s
user1m51.879s
sys0m26.096s

$ git --version
git version 2.48.1
```

-- 
Best regards,
Aleksander Alekseev




Re: Should we add debug_parallel_query=regress to CI?

2025-03-05 Thread Peter Geoghegan
On Wed, Mar 5, 2025 at 11:06 AM Andres Freund  wrote:
> Post-commit issues due to debug_parallel_query=regress seem rather common,
> surely not helped by CI/cfbot not flagging them. I wonder if we ought to make
> one of the CI tasks use debug_parallel_query=regress, to avoid that problem?

That would certainly be nice.

-- 
Peter Geoghegan




Re: Enhance file_fdw to report processed and skipped tuples in COPY progress

2025-03-05 Thread Fujii Masao




On 2024/11/30 15:23, Kirill Reshke wrote:

On Fri, 11 Oct 2024 at 06:53, Fujii Masao  wrote:

However, this issue already exists without the proposed patch.
Since file_fdw already reports progress partially, querying multiple
file_fdw tables can lead to inaccurate or confusing progress reports.
You can even observe this when analyzing a file_fdw table and also
when copying to the table with a trigger that executes progress-reporting
commands.

So, I don’t think this issue should block the proposed patch.
In fact, progress reporting is already flawed in these scenarios,
regardless of whether the patch is applied.


On second thought, supporting progress tracking for COPY used by file_fdw
could increase the chances of multiple commands being tracked simultaneously
by a single backend. This means the command progress view might show
incorrect results more often.

As I mentioned before, this issue already exists, but it currently
only happens in rare cases. I don’t think the fact that the issue
already exists is a good reason to introduce more, and likely more common,
scenarios where it could occur.

With that in mind, I'm thinking of withdrawing this patch for now.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION





Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible

2025-03-05 Thread Jacob Champion
On Wed, Mar 5, 2025 at 5:47 AM Jacob Champion
 wrote:
>
> So while we're at it, should we add a
> `backend_type = 'client backend'` filter to stop that from flaking in
> the future? That would further align this query with the
> wait_for_event() implementation.

More concretely: here's a squashable patchset with that suggestion,
for the CI to chew on.

--Jacob


0001-Fix-race-condition-in-pre-auth-test.patch
Description: Binary data


0002-squash-Fix-race-condition-in-pre-auth-test.patch
Description: Binary data


Should we add debug_parallel_query=regress to CI?

2025-03-05 Thread Andres Freund
Hi,

In 
https://postgr.es/m/CAH2-WzkYXSnM60ZNo-vQLxFoGzHLHFD0x%3DiPHF6VGxiZmWUuwQ%40mail.gmail.com
Peter wrote:

On 2025-03-05 09:37:05 -0500, Peter Geoghegan wrote:
> Committed just now. Thanks again.

But since had to revert, due to BF issues:

commit d00107cd63e780753aa25563fa37603369997d0c
Author: Peter Geoghegan 
Date:   2025-03-05 10:27:31 -0500

Revert "Show index search count in EXPLAIN ANALYZE."

This reverts commit 5ead85fbc81162ab1594f656b036a22e814f96b3.

This commit shows test failures with debug_parallel_query=regress.  The
underlying issue needs to be debugged, so revert for now.


Post-commit issues due to debug_parallel_query=regress seem rather common,
surely not helped by CI/cfbot not flagging them. I wonder if we ought to make
one of the CI tasks use debug_parallel_query=regress, to avoid that problem?

Greetings,

Andres




Re: Allow database owners to CREATE EVENT TRIGGER

2025-03-05 Thread Isaac Morland
On Wed, 5 Mar 2025 at 10:28, Tom Lane  wrote:

> I wrote:
> > Or in other words: not-superuser to superuser is far from the only
> > type of privilege escalation that we need to prevent.
>
> After reflecting on that for a moment: maybe say that an event trigger
> fires for queries that are run by a role that the trigger's owning
> role is a member of?  That changes nothing for superuser-owned
> triggers.
>

Can somebody remind me why triggers don't run as their owner in the first
place?

It would make triggers way more useful, and eliminate the whole issue of
trigger owners escalating to whomever tries to access the object on which
the trigger is defined.


Re: Should we add debug_parallel_query=regress to CI?

2025-03-05 Thread Tom Lane
Andres Freund  writes:
> Post-commit issues due to debug_parallel_query=regress seem rather common,
> surely not helped by CI/cfbot not flagging them. I wonder if we ought to make
> one of the CI tasks use debug_parallel_query=regress, to avoid that problem?

Yeah, it certainly seems like a test coverage gap.  However, we seem to
be moving towards a situation where each type of CI run is a special
snowflake that differs in multiple dimensions from other types.
That might make it difficult to figure out which dimension is
responsible for a particular failure.

(OTOH, the same can be said of the buildfarm, and we've survived
regardless.  So maybe I'm worried over nothing.)

regards, tom lane




Re: doc: expand note about pg_upgrade's --jobs option

2025-03-05 Thread Nathan Bossart
On Wed, Mar 05, 2025 at 09:35:27AM -0600, Nathan Bossart wrote:
> On Wed, Mar 05, 2025 at 01:52:40PM +0100, Magnus Hagander wrote:
>> Another option that I think would also work is to just cut down the details
>> to just "The --jobs option allows multiple CPU cores to be
>> used".
> 
> That's fine with me.  It's probably not particularly actionable
> information, anyway.  If anything, IMHO we should make it clear to users
> that the parallelization is per-database (except for file transfer, which
> is per-tablespace).  If you've just got one big database in the default
> tablespace, --jobs won't help.
> 
>> I think this is also slightly confusing, but maybe that's a
>> non-native-english thing: "a good place to start is the maximum of the
>> number of  CPU cores and tablespaces.". Am I supposed to set it to
>> max(cpucores, ntablespaces) or to max(cpucores+ntablespaces)?
> 
> I've always read it to mean the former.  But I'm not sure that's great
> advice.  If you have 8 cores and 100 tablespaces, does it make sense to use
> --jobs=100?  Ordinarily, I'd suggest the number of cores as the starting
> point.

Here's another attempt at the patch based on the latest discussion.

-- 
nathan
>From e7f8633672530fb06dac72271cda429ad873a640 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 5 Mar 2025 10:19:27 -0600
Subject: [PATCH v2 1/1] doc: Adjust note about pg_upgrade's --jobs option.

Reported-by: Magnus Hagander 
Reviewed-by: ???
Discussion: https://postgr.es/m/Z8dBn_5iGLNuYiPo%40nathan
---
 doc/src/sgml/ref/pgupgrade.sgml | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 6ca20f19ec2..10911d81174 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -565,12 +565,11 @@ NET STOP postgresql-&majorversion;
 
 
 
- The --jobs option allows multiple CPU cores to be used
- for copying/linking of files, dumping and restoring database schemas
- in parallel, etc.;  a good place to start is the maximum of the number of
- CPU cores and tablespaces.  This option can dramatically reduce the
- time to upgrade a multi-database server running on a multiprocessor
- machine.
+ Setting --jobs to 2 or higher allows pg_upgrade to
+ process multiple databases and tablespaces in parallel.  A good starting
+ point is the number of CPU cores on the machine.  This option can
+ substantially reduce the upgrade time for multi-database and
+ multi-tablespace servers.
 
 
 
-- 
2.39.5 (Apple Git-154)



Re: making EXPLAIN extensible

2025-03-05 Thread Robert Haas
On Wed, Mar 5, 2025 at 12:20 PM Jeff Davis  wrote:
> I didn't look into the technical details to see what might be required
> to allow that kind of collaboration, and I am not suggesting you
> redesign the entire feature around that idea.

OK. It sounds to me like there is a good amount of support for going
forward with something like what I have, even though some people might
also like other things. What I feel is currently lacking is some
review of the actual code. Would anyone like to do that?

Here is a v2 with some documentation and regression tests for
pg_overexplain added.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v2-0002-Add-some-new-hooks-so-extensions-can-add-details-.patch
Description: Binary data


v2-0003-pg_overexplain-Additional-EXPLAIN-options-for-deb.patch
Description: Binary data


v2-0001-Make-it-possible-for-loadable-modules-to-add-EXPL.patch
Description: Binary data


Re: Should we add debug_parallel_query=regress to CI?

2025-03-05 Thread Andres Freund
Hi,

On 2025-03-05 12:29:15 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > I guess we could be add a "standardized" section at the top of each task
> > describing their oddities? Not sure it's worth it.
> 
> I think this does need to be documented somewhere/somehow, just so
> that people don't waste time focusing on "it's failing on FreeBSD"
> when the actual cause is some other thing we happened to load
> onto that task.

0002 is a first draft for one way to do this.

Of course this still requires somebody analyzing a failure to look at
cirrus.tasks.yml, but I don't know if we can do a whole lot about that?

I wondered about making the SPECIAL thing an environment variable instead of a
comment, that way it'd probably be visible to cfbot. But I don't really see
what it could do with that information.

I guess we could make the SPECIAL: comments into echos in a
  special_script:

that way it would show up as an explicit "section" in the per-task CI
output. But I don't know how much that would help, the failures due to the
tasks specialness could be during build, testing etc, so the "special" section
won't necessarily be close to the failing step.

Greetings,

Andres Freund
>From 5e75ce6daeb293facbbcff0637d48b87ba40e592 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Wed, 5 Mar 2025 12:08:58 -0500
Subject: [PATCH v2 1/2] ci: freebsd: Specify debug_parallel_query=regress

A lot of buildfarm animals run with debug_parallel_query=regress, while CI
didn't test that. That led to the annoying situation of only noticing related
test instabilities after merging changes upstream.

FreeBSD was chosen because it's a relatively fast task. It also tests
debug_write_read_parse_plan_trees etc., which are probably exercised a bit
more heavily with debug_parallel_query=regress.

ci-os-only: freebsd
Discussion: https://postgr.es/m/zbuk4mlov22yfoktf5ub3lwjw2b7ezwphwolbplthepda42int@h6wpvq7orc44
---
 .cirrus.tasks.yml | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index e7482da1fdd..a4cd0c76e80 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -139,7 +139,14 @@ task:
 CPPFLAGS: -DRELCACHE_FORCE_RELEASE -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS
 CFLAGS: -Og -ggdb
 
-PG_TEST_INITDB_EXTRA_OPTS: -c debug_copy_parse_plan_trees=on -c debug_write_read_parse_plan_trees=on -c debug_raw_expression_coverage_test=on
+# Several buildfarm animals enable these options. Without testing them
+# during CI, it would be easy to cause breakage on the buildfarm with CI
+# passing.
+PG_TEST_INITDB_EXTRA_OPTS: >-
+  -c debug_copy_parse_plan_trees=on
+  -c debug_write_read_parse_plan_trees=on
+  -c debug_raw_expression_coverage_test=on
+  -c debug_parallel_query=regress
 PG_TEST_PG_UPGRADE_MODE: --link
 
   <<: *freebsd_task_template
-- 
2.48.1.76.g4e746b1a31.dirty

>From 923f7766bcc80f6bc094271b7073ec76a07e1e27 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Wed, 5 Mar 2025 12:50:05 -0500
Subject: [PATCH v2 2/2] ci: Document what makes certain tasks special

To increase coverage without drastically increasing CI resource usage, we have
different CI tasks test different things (e.g. the linux tasks use
sanitizers).  Unfortunately that can create confusing situations where CI
fails on some OS, but not others, even though the problem is not actually
platform dependent.

To, partially, address that, add a comment, prefixed with SPECIAL, to each
task that we use to test in some non-default way.

Discussion: https://postgr.es/m/321570.1741195...@sss.pgh.pa.us
---
 .cirrus.tasks.yml | 29 +
 1 file changed, 29 insertions(+)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index a4cd0c76e80..57469518d86 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -2,6 +2,12 @@
 #
 # For instructions on how to enable the CI integration in a repository and
 # further details, see src/tools/ci/README
+#
+#
+# NB: Different tasks intentionally test with different, non-default,
+# configurations, to increase the chance of catching problems. Each task with
+# non-obvious non-default documents their oddity at the top of the task,
+# prefixed by "SPECIAL:".
 
 
 env:
@@ -55,6 +61,10 @@ on_failure_meson: &on_failure_meson
 
 # To avoid unnecessarily spinning up a lot of VMs / containers for entirely
 # broken commits, have a minimal task that all others depend on.
+#
+# SPECIAL:
+# - Builds with --auto-features=disabled and thus almost no enabled
+#   dependencies.
 task:
   name: SanityCheck
 
@@ -125,6 +135,11 @@ task:
   src/tools/ci/cores_backtrace.sh linux /tmp/cores
 
 
+# SPECIAL:
+# - Uses postgres specific CPPFLAGS that increase test coverage
+# - Specifies configuration options that test reading/writing/copying of node trees
+# - Specifies debug_parallel_query=regress, to catch related issues during CI
+# - Also runs tests against a running postgres instance, see test_running_script
 task:
   name

Re: new commitfest transition guidance

2025-03-05 Thread Daniel Gustafsson
> On 4 Mar 2025, at 15:10, jian he  wrote:
> On Tue, Feb 4, 2025 at 2:39 PM Daniel Gustafsson  wrote:

>> They will be on the wiki shortly, I've taken notes of all discussions and am
>> busy tidying them up to be able to publish them once the participants have
>> proofread to ensure noone is misattributed.
> 
> hi.
> i can only found 2024 version.
> https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2024_Developer_Meeting#FOSDEM_2024_Developer_Meeting_schedule_by_Time_and_Room
> 
> google:
> site: https://wiki.postgresql.org/wiki  PGDay 2025 fosdem
> didn't have any interesting results.

I would avoid using Google for finding content on the wiki; the search function
on the wiki itself is generally more reliable.  Searching for FOSDEM 2025
returns the following as the top result:

https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2025_Developer_Meeting

--
Daniel Gustafsson





should num_custom_plans be reset after plan invalidation?

2025-03-05 Thread Sami Imseih
Hi,

While examining plan caches, I noticed that when a generic plan is invalidated,
the next execution of the prepared statement still results in a
generic plan. This
is of course with the default plan_cache_mode.

This behavior might go unnoticed since plan cache invalidations are
relatively uncommon,
but I’m unsure if this is the intended design. The existing decision
to switch to
a generic plan—based on the cost average of the first five custom
plans—may no longer
be optimal after a relation is modified (e.g., when a new index is added).
Given this, resetting num_custom_plans to 0 after a plan cache invalidation
might be a better approach.
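
A minimal sketch of what such a reset could look like (hypothetical, not a
tested patch; the field names are from CachedPlanSource/CachedPlan in
plancache.h): when the generic plan is found to be invalid, also forget the
custom-plan cost history so the next executions go back to building custom
plans before re-deciding:

    /* generic plan was invalidated: restart the custom-plan sampling phase */
    if (plansource->gplan && !plansource->gplan->is_valid)
    {
        plansource->num_custom_plans = 0;
        plansource->total_custom_cost = 0.0;
    }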

I've attached an example for reference. The fix seems straightforward,
but since generic
plans may already not handle skewed data optimally, I want to see if others have
thoughts on this being something to fix.

--
Sami Imseih
Amazon Web Services (AWS)
DEALLOCATE p1;
DROP TABLE IF EXISTS plan_cache_tab;
CREATE TABLE plan_cache_tab (x int, y int);
INSERT INTO plan_cache_tab SELECT 1, 0 FROM generate_series(1, 5);
INSERT INTO plan_cache_tab SELECT 2, 0 FROM generate_series(1, 2);
PREPARE P1(int) AS select  from plan_cache_tab where x = $1;
EXPLAIN ANALYZE EXECUTE p1(1);
EXPLAIN ANALYZE EXECUTE p1(1);
EXPLAIN ANALYZE EXECUTE p1(1);
EXPLAIN ANALYZE EXECUTE p1(1);
EXPLAIN ANALYZE EXECUTE p1(1);
EXPLAIN ANALYZE EXECUTE p1(1);
EXPLAIN ANALYZE EXECUTE p1(1);
CREATE INDEX ON plan_cache_tab(x);
ANALYZE plan_cache_tab ;
EXPLAIN ANALYZE EXECUTE p1(1);
EXPLAIN ANALYZE EXECUTE p1(1);
DEALLOCATE p1;
PREPARE P1(int) AS select  from plan_cache_tab where x = $1;
EXPLAIN ANALYZE EXECUTE p1(1);
DEALLOCATE p1;

test=# DEALLOCATE p1;
DEALLOCATE
test=# 
test=# DEALLOCATE p1;
ERROR:  prepared statement "p1" does not exist
test=# DROP TABLE IF EXISTS plan_cache_tab;
DROP TABLE
test=# CREATE TABLE plan_cache_tab (x int, y int);
CREATE TABLE
test=# INSERT INTO plan_cache_tab SELECT 1, 0 FROM generate_series(1, 5);

INSERT 0 5
test=# INSERT INTO plan_cache_tab SELECT 2, 0 FROM generate_series(1, 2);
INSERT 0 2
test=# PREPARE P1(int) AS select  from plan_cache_tab where x = $1;
PREPARE
test=# EXPLAIN ANALYZE EXECUTE p1(1);
 QUERY PLAN 


 Seq Scan on plan_cache_tab  (cost=0.00..849.15 rows=251 width=0) (actual 
time=0.037..19.157 rows=5.00 loops=1)
   Filter: (x = 1)
   Rows Removed by Filter: 2
   Buffers: shared hit=222
 Planning:
   Buffers: shared hit=6
 Planning Time: 0.093 ms
 Execution Time: 21.764 ms
(8 rows)

test=# EXPLAIN ANALYZE EXECUTE p1(1);
 QUERY PLAN 


 Seq Scan on plan_cache_tab  (cost=0.00..849.15 rows=251 width=0) (actual 
time=0.014..14.656 rows=5.00 loops=1)
   Filter: (x = 1)
   Rows Removed by Filter: 2
   Buffers: shared hit=222
 Planning Time: 0.041 ms
 Execution Time: 17.215 ms
(6 rows)

test=# EXPLAIN ANALYZE EXECUTE p1(1);
 QUERY PLAN 


 Seq Scan on plan_cache_tab  (cost=0.00..849.15 rows=251 width=0) (actual 
time=0.013..14.668 rows=5.00 loops=1)
   Filter: (x = 1)
   Rows Removed by Filter: 2
   Buffers: shared hit=222
 Planning Time: 0.041 ms
 Execution Time: 17.241 ms
(6 rows)

test=# EXPLAIN ANALYZE EXECUTE p1(1);
 QUERY PLAN 


 Seq Scan on plan_cache_tab  (cost=0.00..849.15 rows=251 width=0) (actual 
time=0.014..14.655 rows=5.00 loops=1)
   Filter: (x = 1)
   Rows Removed by Filter: 2
   Buffers: shared hit=222
 Planning Time: 0.039 ms
 Execution Time: 17.211 ms
(6 rows)

test=# EXPLAIN ANALYZE EXECUTE p1(1);
 QUERY PLAN 


 Seq Scan on plan_cache_tab  (cost=0.00..849.15 rows=251 width=0) (actual 
time=0.014..14.645 rows=5.00 loops=1)
   Filter: (x = 1)
   Rows Removed by Filter: 2
   Buffers: shared hit=222
 Planning Time: 0.038 ms
 Execution Time: 17.205 ms
(6 rows)

test=# EXPLAIN ANALYZE EXECUTE p1(1);
 QUERY PLAN 

-

Re: making EXPLAIN extensible

2025-03-05 Thread Jeff Davis
On Tue, 2025-03-04 at 16:23 -0500, Robert Haas wrote:

> But, I'm doubtful that
> letting unrelated extensions try to share the same option name is
> that
> thing.

This sub-discussion started because we were wondering whether to prefix
the options. I'm just pointing out that, even if there is a collision,
and it happened to work, it's as likely to be a feature as a bug.

I didn't look into the technical details to see what might be required
to allow that kind of collaboration, and I am not suggesting you
redesign the entire feature around that idea.

Regards,
Jeff Davis





Re: Expanding HOT updates for expression and partial indexes

2025-03-05 Thread Burd, Greg
Hello,

I've rebased and updated the patch a bit.  The biggest change is that the
performance penalty measured with v1 of this patch is essentially gone in v10.
The overhead was due to re-creating IndexInfo information unnecessarily, which
I found already existed in the estate.  I've added a few fields in IndexInfo
that are not populated by default but are necessary when checking expression
indexes; those fields are populated on demand and only once, limiting their
overhead.
  
Here's what you'll find if you look into execIndexing.c where the majority of 
changes happened. 

* assumes estate->es_result_relations[0] is the ResultRelInfo being updated
* uses ri_IndexRelationInfo[] from within estate rather than re-creating it
* augments IndexInfo only when needed for testing expressions and only once
* only creates a local old/new TupleTableSlot when not present in estate
* retains existing summarized index HOT update logic

One remaining concern stems from the assumption that
estate->es_result_relations[0] is always going to be the relation being
updated.  This is guarded by assert()'s in the patch.  It seems this is safe:
all tests are passing (including TAP), and my review of the code seems to line
up with that assumption.  That said... opinions?

Another lingering question is under what conditions the old/new TupleTableSlots 
are not created and available via the ResultRelInfo found in estate.  I've only 
seen this happen when there is an INSERT ... ON CONFLICT UPDATE ... with 
expression indexes.  I was hopeful that in all cases I could avoid re-creating 
those when checking expression indexes to avoid that repeated overhead.  I 
still avoid it when possible in this patch.

When you have time I'd appreciate any feedback.

-greg
Amazon Web Services: https://aws.amazon.com


v10-0001-Expand-HOT-update-path-to-include-expression-and.patch
Description: v10-0001-Expand-HOT-update-path-to-include-expression-and.patch


Re: Adding support for SSLKEYLOGFILE in the frontend

2025-03-05 Thread Daniel Gustafsson
> On 3 Mar 2025, at 16:23, Daniel Gustafsson  wrote:

> The attached 0002 also contains documentation touchups and comments etc.  0001
> is your patch from v6.

I managed to misunderstand skip blocks in TAP tests in the 0002, so the
attached version fixes that.  It has been failing on Debian in CI which I have
yet to look into.

--
Daniel Gustafsson



v8-0002-Review-fixups.patch
Description: Binary data


v8-0001-Add-support-for-dumping-SSL-keylog-to-a-file.patch
Description: Binary data


Re: Allow LISTEN on patterns

2025-03-05 Thread Greg Sabino Mullane
Does not seem like a bug to me. Just the normal auto-lowercase encountered
in every other SQL command. See:

greg=# select * from pg_listening_channels();
 pg_listening_channels
---
(0 rows)

greg=# listen foo;
LISTEN
greg=# select * from pg_listening_channels();
 pg_listening_channels
---
 foo
(1 row)

greg=# listen FOO;
LISTEN
greg=# select * from pg_listening_channels();
 pg_listening_channels
---
 foo
(1 row)

greg=# listen "FOO";
LISTEN
greg=# select * from pg_listening_channels();
 pg_listening_channels
---
 foo
 FOO
(2 rows)


Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support


Re: Allow LISTEN on patterns

2025-03-05 Thread Tom Lane
Trey Boudreau  writes:
> I didn’t see any past references to the pg_notify() ‘anomaly’:

> LISTEN FOO;
> NOTIFY FOO, ‘BAR’; -- notification delivered
> PERFORM pg_notify(‘FOO’, ‘BAR’); -- notification NOT delivered
> PERFORM pg_notify(‘foo’, ‘BAR’); -- notification delivered

> Can we come to some agreement on if we should consider this a bug?

I don't think it's a bug particularly.  The actual channel name
being listened to is lowercase "foo", per the usual SQL identifier
case-folding rules.  But pg_notify is taking a literal not an
identifier, so you have to match case.

We do have some functions that downcase the input string unless
double-quoted, so that the experience is closer to what you get
for a SQL identifier.  Perhaps pg_notify should have done that
for the channel name, but it didn't and I think it's much too late
to revisit that.

regards, tom lane




Re: Should we add debug_parallel_query=regress to CI?

2025-03-05 Thread Tom Lane
Andres Freund  writes:
> On 2025-03-05 12:29:15 -0500, Tom Lane wrote:
>> I think this does need to be documented somewhere/somehow, just so
>> that people don't waste time focusing on "it's failing on FreeBSD"
>> when the actual cause is some other thing we happened to load
>> onto that task.

> 0002 is a first draft for one way to do this.

Works for me.

> Of course this still requires somebody analyzing a failure to look at
> cirrus.tasks.yml, but I don't know if we can do a whole lot about that?

If it's far away from the actual task-spec code it probably won't
get maintained very well, so putting it here seems OK to me.

regards, tom lane




Re: Should we add debug_parallel_query=regress to CI?

2025-03-05 Thread Peter Geoghegan
On Wed, Mar 5, 2025 at 1:34 PM Andres Freund  wrote:
> Pushed both patches.

Thanks!
-- 
Peter Geoghegan




Re: Make tuple deformation faster

2025-03-05 Thread Jeff Davis
On Thu, 2025-03-06 at 01:07 +1300, David Rowley wrote:
> I've attached the results. The 3990x with clang looks good, but the
> rest are mostly slower.

I am still curious why.

If it's due to compiler misoptimization, is that kind of thing often
misoptimized, or is there something we're doing in particular?

Even if we don't have answers, it might be worth adding a brief comment
that we empirically determined that booleans are faster than bitfields
or flags. In the future, maybe compilers mostly get this right, and we
want to change to bitfields.
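
For illustration, the two layouts under discussion look roughly like this (a
minimal sketch, not code from the patch; the field names are made up):

#include <stdbool.h>

/* Plain booleans: each flag gets its own byte, so a test is a plain load. */
typedef struct PlainFlags
{
	bool		is_dropped;
	bool		not_null;
} PlainFlags;

/*
 * Bitfields: the flags share a byte, so reads need masking and stores become
 * read-modify-write sequences, which some compilers optimize poorly.
 */
typedef struct BitfieldFlags
{
	unsigned int is_dropped:1;
	unsigned int not_null:1;
} BitfieldFlags;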

Regards,
Jeff Davis





track generic and custom plans in pg_stat_statements

2025-03-05 Thread Sami Imseih
Hi,

Currently, there is little visibility for queries that are being executed
using generic or custom plans. There is pg_prepared_statements, which
shows generic_plans and custom_plans as of d05b172a760, but this
information is backend-local and not very useful to a DBA who wishes
to track it cumulatively and over time. An earlier patch [0] intended
to add these counters to pg_stat_statements, but it was withdrawn due
to timing with the commitfest at the time and never picked back up again.

I think it's useful to add these counters.

Therefore, the attached patch adds two new columns to pg_stat_statements:
"generic_plan_calls" and "custom_plan_calls". These columns track the
number of executions performed using a generic or custom plan.
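
As a sketch of how this could be used once the patch is applied (the table and
statement names here are made up; the new columns are the ones proposed above):

PREPARE q (int) AS SELECT count(*) FROM some_table WHERE id = $1;
EXECUTE q(1);
EXECUTE q(2);

SELECT query, calls, generic_plan_calls, custom_plan_calls
  FROM pg_stat_statements
 WHERE query LIKE '%FROM some_table WHERE id%';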

Only non-utility statements are counted, as utility statements wrapping an
optimizable parameterized query (e.g. CTAS) cannot be called with PREPARE.
Additionally, when such statements are repeatedly executed using an extended
protocol prepared statement, pg_stat_statements may not handle them properly,
since queryId is set to 0 during pgss_ProcessUtility.

To avoid introducing two additional counters in CachedPlan, the existing
boolean is_reusable—which determines whether a generic plan is reused to
manage lock requirements—has been repurposed as an enum. This enum now
tracks different plan states, including "generic reused", "generic first time"
and "custom". pg_stat_statements uses these states to differentiate between
generic and custom plans for tracking purposes. (I felt this was better than
carrying two extra booleans in CachedPlan for this purpose, but that would
be the alternative approach.)
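
Roughly, the idea is to turn the boolean into something like the following
(names are illustrative only, not necessarily what the patch uses):

typedef enum CachedPlanStatus
{
	CACHED_PLAN_GENERIC_REUSED,	/* generic plan reused from the cache */
	CACHED_PLAN_GENERIC_FIRST,	/* generic plan built for the first time */
	CACHED_PLAN_CUSTOM			/* custom plan */
} CachedPlanStatus;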

Not included in this patch, and perhaps material for follow-up work: this
information could also be added to EXPLAIN output and perhaps to
pg_stat_database. Maybe that's a good idea as well?

This patch bumps the version of pg_stat_statements.

--
Sami Imseih
Amazon Web Services (AWS)


[0] 
https://www.postgresql.org/message-id/add1e591fbe8874107e75d04328859ec%40oss.nttdata.com


v1-0001-add-plan_cache-counters-to-pg_stat_statements.patch
Description: Binary data


Re: per backend WAL statistics

2025-03-05 Thread Bertrand Drouvot
Hi,

On Wed, Mar 05, 2025 at 09:18:16AM -0500, Andres Freund wrote:
> Hi,
> 
> On 2025-03-05 13:03:07 +, Bertrand Drouvot wrote:
> > But yeah, if 0002 in [1] does not go in, then your concern is valid, so 
> > adding
> > the extra check in the attached.
> 
> This crashes in cfbot:
> 
> https://cirrus-ci.com/task/5111872610893824
> 

Thanks for the report! I usually run a make check-world locally and also
launch the CI tests on my GitHub repo before submitting patches. This time it
was a one-line change (as compared to v13), so I was confident enough not to
trigger those tests. Murphy's Law, I guess ;-)

So yeah, back to the issue: we have to pay more attention to the WAL stats,
because pgWalUsage is "incremented" without any pgstat_tracks_backend_bktype()
check (that's not the case for the IO stats, where the counters are incremented
taking pgstat_tracks_backend_bktype() into account).

So for the moment, in the attached I "just" add a pgstat_tracks_backend_bktype()
check in pgstat_backend_have_pending_cb() but I'm not sure I like it that 
much...

Will think more about it...

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From db6b4601108c4ec4a636d5085c30baad758b6f5e Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Mon, 6 Jan 2025 10:00:00 +
Subject: [PATCH v15] per backend WAL statistics

Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
per backend IO statistics) we can more easily add per backend statistics.

This commit adds per backend WAL statistics using the same layer as pg_stat_wal,
except that it is now possible to know how much WAL activity is happening in each
backend rather than an overall aggregate of all the activity.  A function called
pg_stat_get_backend_wal() is added to access this data depending on the
PID of a backend.

The same limitation as in 9aea73fc61 persists, meaning that Auxiliary processes
are not included in this set of statistics.

XXX: bump catalog version
XXX: bump of stats file format not required, as backend stats do not
persist on disk.
---
 doc/src/sgml/monitoring.sgml| 19 ++
 src/backend/utils/activity/pgstat_backend.c | 71 -
 src/backend/utils/activity/pgstat_wal.c |  1 +
 src/backend/utils/adt/pgstatfuncs.c | 26 +++-
 src/include/catalog/pg_proc.dat |  7 ++
 src/include/pgstat.h| 37 +--
 src/include/utils/pgstat_internal.h |  3 +-
 src/test/regress/expected/stats.out | 14 
 src/test/regress/sql/stats.sql  |  6 ++
 9 files changed, 161 insertions(+), 23 deletions(-)
  14.9% doc/src/sgml/
  43.2% src/backend/utils/activity/
  14.6% src/backend/utils/adt/
   8.2% src/include/catalog/
   4.2% src/include/utils/
   7.9% src/test/regress/expected/
   6.0% src/test/regress/sql/

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 16646f560e8..b1710680705 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4866,6 +4866,25 @@ description | Waiting for a newly initialized WAL file to reach durable storage

   
 
+  
+   
+
+ pg_stat_get_backend_wal
+
+pg_stat_get_backend_wal ( integer )
+record
+   
+   
+Returns WAL statistics about the backend with the specified
+process ID. The output fields are exactly the same as the ones in the
+pg_stat_wal view.
+   
+   
+The function does not return WAL statistics for the checkpointer,
+the background writer, the startup process and the autovacuum launcher.
+   
+  
+
   

 
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index a9343b7b59e..7be35e667dd 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -38,6 +38,14 @@
  */
 static PgStat_BackendPending PendingBackendStats;
 
+/*
+ * WAL usage counters saved from pgWalUsage at the previous call to
+ * pgstat_report_wal(). This is used to calculate how much WAL usage
+ * happens between pgstat_report_wal() calls, by subtracting
+ * the previous counters from the current ones.
+ */
+static WalUsage prevBackendWalUsage;
+
 /*
  * Utility routines to report I/O stats for backends, kept here to avoid
  * exposing PendingBackendStats to the outside world.
@@ -184,6 +192,57 @@ pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
 	MemSet(&PendingBackendStats.pending_io, 0, sizeof(PgStat_PendingIO));
 }
 
+/*
+ * To determine whether WAL usage happened.
+ */
+static bool
+pgstat_backend_wal_have_pending(void)
+{
+	return pgWalUsage.wal_records != prevBackendWalUsage.wal_records;
+}
+
+/*
+ * Flush out locally pending backend WAL statistics.  Locking is managed
+ * by the caller.
+ */
+static void
+pgstat_flush_backend_

Re: Allow LISTEN on patterns

2025-03-05 Thread Trey Boudreau


> On Mar 5, 2025, at 10:42 AM, Tom Lane  wrote:
> 
> Anyway, I encourage reading some of the past threads on this
> topic.
> 
I didn’t see any past references to the pg_notify() ‘anomaly’:

LISTEN FOO;
NOTIFY FOO, ‘BAR’; -- notification delivered
PERFORM pg_notify(‘FOO’, ‘BAR’); -- notification NOT delivered
PERFORM pg_notify(‘foo’, ‘BAR’); -- notification delivered

Can we come to some agreement on if we should consider this a bug?

— Trey



Re: Should we add debug_parallel_query=regress to CI?

2025-03-05 Thread Andres Freund
Hi,

On 2025-03-05 11:19:46 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > Post-commit issues due to debug_parallel_query=regress seem rather common,
> > surely not helped by CI/cfbot not flagging them. I wonder if we ought to 
> > make
> > one of the CI tasks use debug_parallel_query=regress, to avoid that problem?
> 
> Yeah, it certainly seems like a test coverage gap.

I decided to use freebsd, as it's a relatively fast task.  Additionally I
thought it might be interesting to do this testing on the test run that also
does debug_write_read_parse_plan_trees etc.

I tested it by intentionally not including the revert, and it indeed finds the
problem (not that that was really in doubt, but it seemed worth verifying).

https://cirrus-ci.com/task/5782413399293952?logs=test_world#L214



> However, we seem to be moving towards a situation where each type of CI run
> is a special snowflake that differs in multiple dimensions from other types.
> That might make it difficult to figure out which dimension is responsible
> for a particular failure.

True, but I don't really see an alternative. Having dedicated tasks for
testing just debug_parallel_query=regress (and about half a dozen other
things) on their own seems like a *lot* of resource usage for the gain.


> (OTOH, the same can be said of the buildfarm, and we've survived
> regardless.  So maybe I'm worried over nothing.)

The alternative seems to be to figure out the problem after commit, with
similar issues, as you point out. I'd much rather have to spend a minute
analyzing why one task triggers an issue than doing so under pressure after
commit.

I guess we could add a "standardized" section at the top of each task
describing their oddities? Not sure it's worth it.

Greetings,

Andres Freund
>From 5e75ce6daeb293facbbcff0637d48b87ba40e592 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Wed, 5 Mar 2025 12:08:58 -0500
Subject: [PATCH v1] ci: freebsd: Specify debug_parallel_query=regress

A lot of buildfarm animals run with debug_parallel_query=regress, while CI
didn't test that. That lead to the annoying situation of only noticing related
test instabilities after merging changes upstream.

FreeBSD was chosen because it's a relatively fast task. It also tests
debug_write_read_parse_plan_trees etc, which probably is exercised a bit more
heavily with debug_parallel_query=regress.

ci-os-only: freebsd
Discussion: https://postgr.es/m/zbuk4mlov22yfoktf5ub3lwjw2b7ezwphwolbplthepda42int@h6wpvq7orc44
---
 .cirrus.tasks.yml | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index e7482da1fdd..a4cd0c76e80 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -139,7 +139,14 @@ task:
 CPPFLAGS: -DRELCACHE_FORCE_RELEASE -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS
 CFLAGS: -Og -ggdb
 
-PG_TEST_INITDB_EXTRA_OPTS: -c debug_copy_parse_plan_trees=on -c debug_write_read_parse_plan_trees=on -c debug_raw_expression_coverage_test=on
+# Several buildfarm animals enable these options. Without testing them
+# during CI, it would be easy to cause breakage on the buildfarm with CI
+# passing.
+PG_TEST_INITDB_EXTRA_OPTS: >-
+  -c debug_copy_parse_plan_trees=on
+  -c debug_write_read_parse_plan_trees=on
+  -c debug_raw_expression_coverage_test=on
+  -c debug_parallel_query=regress
 PG_TEST_PG_UPGRADE_MODE: --link
 
   <<: *freebsd_task_template
-- 
2.48.1.76.g4e746b1a31.dirty



Re: Non-text mode for pg_dumpall

2025-03-05 Thread Mahendra Singh Thalor
Thanks Alvaro for feedback and review.

On Wed, 5 Mar 2025 at 20:42, Álvaro Herrera  wrote:
>
> Disclaimer: I didn't review these patches fully.
>
> On 2025-Mar-05, Mahendra Singh Thalor wrote:
>
> > On Wed, 5 Mar 2025 at 01:02, Álvaro Herrera  wrote:
> >
> > > A database name containing a newline breaks things for this patch:
> > >
> > > CREATE DATABASE "foo
> > > bar";
>
> > I also reported this issue on 29-01-2025. This breaks even without this
> > patch also.
>
> Okay, we should probably fix that, but I think the new map.dat file your
> patch adds is going to make the problem worse, because it doesn't look
> like you handled that case in any particular way that would make it not
> fail.  I think it would be good to avoid digging us up even deeper in
> that hole.  More generally, the pg_upgrade tests contain some code to
> try database names with almost all possible ascii characters (see
> generate_db in pg_upgrade/t/002_pg_upgrade.pl); it would be good to
> ensure that this new functionality also works correctly for that --
> perhaps add an equivalent test to the pg_dumpall test suite.

In the attached patch, I tried to solve the problem of the map.dat
file. I will do more analysis based on dbnames in 002_pg_upgrade.pl
file.

>
> Looking at 0001:
>
> I'm not sure that the whole common_dumpall_restore.c thing is properly
> structured.  First, the file name shouldn't presume which programs
> exactly are going to use the funcionality there.  Second, it looks like
> there's another PQconnectdbParams() in pg_backup_db.c and I don't
> understand what the reason is for that one to be separate.  In my mind,
> there should be a file maybe called connection.c or connectdb.c or
> whatever that's in charge of establishing connection for all the
> src/bin/pg_dump programs, for cleanliness sake.  (This is probably also
> the place where to put an on_exit callback that cleans up any leftover
> connections.)

Okay. I will do these changes.

>
> Looking at 0002 I see it mentions looking at the EXIT_NICELY macro for
> documentation.  No such macro exists.  But also I think the addition
> (and use) of reset_exit_nicely_list() is not a good idea.  It seems to
> assume that the only entries in that list are ones that can be cleared
> and reinstated whenever.  This makes too much of an assumption about how
> the program works.  It may work today, but it'll get in the way of any
> other patch that wants to set up some different on-exit clean up.  In
> other words, we shouldn't reset the on_exit list at all.  Also, this is
> just a weird addition:

I will do more study for this case and will update here.

>
> #define exit_nicely(code) exit(code)

Okay. I will fix this.

>
> You added "A" as an option to the getopt_long() call in pg_restore, but
> no handling for it is added.

Fixed.

>
> I think the --globals-only option to pg_restore should be a separate
> commit.

Okay.

>
> --
> Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/

Here, I am attaching updated patches for review and testing.

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com


v21_0002_pg_dumpall-with-non-text_format-5th_mar.patch
Description: Binary data


v21_0001_move-common-code-of-pg_dumpall-and-pg_restore-to-new_file.patch
Description: Binary data


Re: new commitfest transition guidance

2025-03-05 Thread Álvaro Herrera
On 2025-Mar-05, Daniel Gustafsson wrote:

> I would avoid using Google for finding content on the wiki, the search 
> function
> on the wiki itself is generally more reliable.  Searching for FOSDEM 2025
> returns the following as the top result:
> 
> https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2025_Developer_Meeting

The category system in the wiki is very useful for this kind of thing.

https://wiki.postgresql.org/wiki/Category:Developer_Meeting

I encourage everybody to tag the pages they create with one or more
category tags.  In this case, it means adding this at the bottom of the
page:

[[Category:Developer_Meeting]]

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/




Re: Should we add debug_parallel_query=regress to CI?

2025-03-05 Thread Andres Freund
Hi,

On 2025-03-05 13:10:00 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2025-03-05 12:29:15 -0500, Tom Lane wrote:
> >> I think this does need to be documented somewhere/somehow, just so
> >> that people don't waste time focusing on "it's failing on FreeBSD"
> >> when the actual cause is some other thing we happened to load
> >> onto that task.
> 
> > 0002 is a first draft for one way to do this.
> 
> Works for me.

Cool. Thanks for looking at it.

Pushed both patches.

Greetings,

Andres Freund




Re: explain plans for foreign servers

2025-03-05 Thread Tom Lane
Jeff Davis  writes:
> Ideally, we'd have EXPLAIN ANALYZE return two result sets, kind of like
> how a query with a semicolon returns two result sets. That changes the
> expected message flow for EXPLAIN ANALYZE, though, so we'd need a new
> option so we are sure the client is expecting it (is this a sane
> idea?).

I'm afraid not.  That pretty fundamentally breaks the wire protocol,
I think.  Also (1) there could be more than two, no, if the query
touches more than one foreign table?  How would the client know
how many to expect?  (2) there would be no particularly compelling
ordering for the multiple resultsets.

> I wonder if Robert's extensible EXPLAIN work[1] could be useful
> here?

I'm wondering the same.  You could certainly imagine cramming
all of the foreign EXPLAIN output into some new field attached
to the ForeignScan node.  Readability and indentation would
require some thought, but the other approaches don't have any
mechanism for addressing that at all.

regards, tom lane




Re: Interrupts vs signals

2025-03-05 Thread Andres Freund
Hi,

On 2025-02-28 22:24:56 +0200, Heikki Linnakangas wrote:
> I noticed that the ShutdownLatchSupport() function is unused. The first
> patch removes it.

Looks like that's the case since

commit 80a8f95b3bca6a80672d1766c928cda34e979112
Author: Andres Freund 
Date:   2021-08-13 05:49:26 -0700
 
Remove support for background workers without BGWORKER_SHMEM_ACCESS.

Oops.

LGTM.


> The second patch makes it possible to use ModifyWaitEvent() to switch
> between WL_POSTMASTER_DEATH and WL_EXIT_ON_PM_DEATH. WaitLatch() used to
> modify WaitEventSet->exit_on_postmaster_death directly, now it uses
> ModifyWaitEvent() for that. That's needed because with the final patch,
> WaitLatch() is in a different source file than WaitEventSet, so it cannot
> directly modify its field anymore.


> @@ -505,8 +513,14 @@ WaitLatch(Latch *latch, int wakeEvents, long timeout,
>   if (!(wakeEvents & WL_LATCH_SET))
>   latch = NULL;
>   ModifyWaitEvent(LatchWaitSet, LatchWaitSetLatchPos, WL_LATCH_SET, 
> latch);
> - LatchWaitSet->exit_on_postmaster_death =
> - ((wakeEvents & WL_EXIT_ON_PM_DEATH) != 0);
> +
> + /*
> +  * Update the event set for whether WL_EXIT_ON_PM_DEATH or
> +  * WL_POSTMASTER_DEATH was requested.  This is also cheap.
> +  */

"for whether" sounds odd to me.

I'd remove the "also" in the second sentence.


> @@ -1037,15 +1051,19 @@ ModifyWaitEvent(WaitEventSet *set, int pos, uint32 
> events, Latch *latch)
>   (!(event->events & WL_LATCH_SET) || set->latch == latch))
>   return;
>  
> - if (event->events & WL_LATCH_SET &&
> - events != event->events)
> + /* Allow switching between WL_POSTMASTER_DEATH and WL_EXIT_ON_PM_DEATH 
> */
> + if (event->events & WL_POSTMASTER_DEATH)

Hm, this only supports switching from WL_POSTMASTER_DEATH to
WL_EXIT_ON_PM_DEATH, not the other way round, right?


>   {
> - elog(ERROR, "cannot modify latch event");
> + if (events != WL_POSTMASTER_DEATH && events != 
> WL_EXIT_ON_PM_DEATH)
> + elog(ERROR, "cannot modify postmaster death event");
> + set->exit_on_postmaster_death = ((events & WL_EXIT_ON_PM_DEATH) 
> != 0);
> + return;
>   }


> - if (event->events & WL_POSTMASTER_DEATH)
> + if (event->events & WL_LATCH_SET &&
> + events != event->events)
>   {
> - elog(ERROR, "cannot modify postmaster death event");
> + elog(ERROR, "cannot modify latch event");
>   }

Why did you reorder this? I don't have a problem with it, I just can't quite
follow.


> The third patch is mechanical and moves existing code. The file header
> comments in the modified files are perhaps worth reviewing. They are also
> just existing text moved around, but there was some small decisions on what
> exactly should go where.
> 
> I'll continue working on the other parts, but these patches seems ready for
> commit already.

Interestingly, git diff's output was pretty annoying to read, even with
--color-moved, unless I used -M100 (setting rename detection to require 100%
similarity). With "-M100 --color-moved=dimmed-zebra" it looks a lot saner.


> From 7bd0d2f98a1b9d7ad40f3f72cd1c93430c1d7cee Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas 
> Date: Fri, 28 Feb 2025 16:42:15 +0200
> Subject: [PATCH 3/3] Split WaitEventSet functions to separate source file
> 
> latch.c now only contains the Latch related functions, which build on
> the WaitEventSet abstraction. Most of the platform-dependent stuff is
> now in waiteventset.c.


>  include $(top_srcdir)/src/backend/common.mk
> diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
> index ab601c748f8..aae0cf7577d 100644
> --- a/src/backend/storage/ipc/latch.c
> +++ b/src/backend/storage/ipc/latch.c

>  #include "miscadmin.h"

It's a bit weird that MyLatch is declared in miscadmin.h, but that's not this
patch's fault.



> diff --git a/src/backend/storage/ipc/waiteventset.c 
> b/src/backend/storage/ipc/waiteventset.c
> new file mode 100644
> index 000..35e836d3398

> +#include "storage/proc.h"

proc.h wasn't previously included, and it's not clear to me why it'd be needed
now? If I remove it, things still compile, and I verified it's not indirectly
included.

Perhaps you had WakeupMyProc() in proc.c earlier?


> +/*
> + * Change the event mask and, in the WL_LATCH_SET case, the latch associated
> + * with the WaitEvent.  The latch may be changed to NULL to disable the latch
> + * temporarily, and then set back to a latch later.
> + *
> + * 'pos' is the id returned by AddWaitEventToSet.
> + */
> +void
> +ModifyWaitEvent(WaitEventSet *set, int pos, uint32 events, Latch *latch)
> +{
> ...
> +
> + /* Allow switching between WL_POSTMASTER_DEATH and WL_EXIT_ON_PM_DEATH 
> */
> + if (event->events & WL_POSTMASTER_DEATH)
> + {
> + if (events != WL_POSTMASTER_DEATH && events != 
> WL_EXIT_ON_PM_DEA

Re: explain plans for foreign servers

2025-03-05 Thread Sami Imseih
>> What if we do something like a new EXPLAIN option which returns all
>> the rows
>> back to the client, and then writes out the plan to some local
>> memory.

> That's another idea, but I am starting to think returning two result
> sets from EXPLAIN ANALYZE would be generally useful.

I did not think that would be doable. Because a
ForeignScan node for postgres_fdw is a DECLARE CURSOR
followed by a series of FETCH statements and finally a CLOSE,
I suspect we can store the plan in memory when the cursor is closed,
and then it's up to the FDW to issue a remote SQL call that fetches the plan
back to the local side and appends it on top of the EXPLAIN output.
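
In other words, what postgres_fdw sends to the remote server for a foreign scan
is roughly the following (a sketch; cursor name, query, and fetch size are just
examples, and it all happens inside the remote transaction the FDW opens):

DECLARE c1 CURSOR FOR SELECT aid FROM remote_tab WHERE aid = 1;
FETCH 100 FROM c1;   -- repeated until the scan is exhausted
CLOSE c1;            -- the point above: capture the remote plan here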

I also thought about two options: 1) a new EXPLAIN option to do this, or
2) in-core GUCs to allow storing the last plan in memory at
ExecutorEnd.

>> I wonder if Robert's extensible EXPLAIN work[1] could be useful
>> here?

> I'm wondering the same.  You could certainly imagine cramming
> all of the foreign EXPLAIN output into some new field attached
> to the ForeignScan node.  Readability and indentation would
> require some thought, but the other approaches don't have any
> mechanism for addressing that at all.

FWIW, I had the same thought [0] and planned on doing the investigation.

[0] 
https://www.postgresql.org/message-id/CAA5RZ0tLrNOw-OgPkv49kbNmZS4nFn9vzpN5HXX_xvOaM9%3D5ww%40mail.gmail.com


--
Sami Imseih



