Re: Log connection establishment timings
On 12.03.25 16:43, Melanie Plageman wrote: On Tue, Mar 11, 2025 at 6:27 PM Melanie Plageman wrote: I did more manual testing of my patches, and I think they are mostly ready for commit except for the IsConnectionBackend() macro (if we have something to change it to). I've committed this and marked it as such in the CF app. Thanks to everyone for the review. log_connections has been changed from a Boolean parameter to a string one, but a number of places in the documentation and in various pieces of test code still use the old values. I think it would be better if they were adjusted to the new style. There are two places in doc/src/sgml/config.sgml where log_connections=yes is used as an example. This is a relatively prominent place, so it should not use deprecated values. In src/backend/tcop/postgres.c, there is a call SetConfigOption("log_connections", "true", context, source); that could be adjusted. Various uses in test code: src/interfaces/libpq/t/005_negotiate_encryption.pl src/test/authentication/t/001_password.pl src/test/authentication/t/003_peer.pl src/test/authentication/t/005_sspi.pl src/test/authentication/t/007_pre_auth.pl src/test/kerberos/t/001_auth.pl src/test/ldap/t/001_auth.pl src/test/ldap/t/002_bindpasswd.pl src/test/modules/ldap_password_func/t/001_mutated_bindpasswd.pl src/test/modules/oauth_validator/t/001_server.pl src/test/modules/oauth_validator/t/002_client.pl src/test/postmaster/t/002_connection_limits.pl src/test/postmaster/t/003_start_stop.pl src/test/recovery/t/013_crash_restart.pl src/test/recovery/t/022_crash_temp_files.pl src/test/recovery/t/032_relfilenode_reuse.pl src/test/recovery/t/037_invalid_database.pl src/test/ssl/t/SSL/Server.pm src/tools/ci/pg_ci_base.conf I suspect in some of these cases using one of the new more granular values would be appropriate? This could also serve as examples and testing of the parameter itself.
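For illustration only (a sketch; the parameter's accepted granular values are 'receipt', 'authentication', and 'authorization', with 'all' as a shorthand), a deprecated boolean spelling could be updated to the new style like this:

```
-- old, now-deprecated boolean spelling:
--   log_connections = 'yes'
-- new multi-value form, enabling the same connection logging explicitly:
ALTER SYSTEM SET log_connections = 'receipt,authentication,authorization';
SELECT pg_reload_conf();
```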
Re: Regression in statement locations
On Tue, May 20, 2025 at 09:58:04AM +0200, Anthonin Bonnefoy wrote: > Looking at the tests, there are two additional empty DO blocks: > +DO $$ > +DECLARE BEGIN > +END $$; > > What's the point of those? They won't be visible in the output since > we have 'toplevel IS FALSE' in the pg_stat_statements calls and they > don't fit the "DO block --- multiple inner queries with separators". That's a copy-pasto. Will remove. > Other than that, the patch looks good. Thanks for the review, Anthonin and Jian. -- Michael
Re: Make wal_receiver_timeout configurable per subscription
On Tue, 20 May 2025 at 03:16, Michael Paquier wrote: > > On Mon, May 19, 2025 at 11:19:48AM -0400, Robert Haas wrote: > > The advantage of Fujii-san's proposal is that it is very simple to > > implement. A subscription option would indeed be better, but it would > > also be considerably more complex. Why not start simple and if someone > > wants to do the work to add something more complicated, that is fine? > > Logically, adding that as an option of CREATE SUBSCRIPTION would just > be a duplication of what connection strings are already able to do > with "options='-c foo=fooval'", isn't it? Although the value is set in the session that creates the subscription, it will not be used by the apply worker because the launcher process, which starts the apply worker after subscription creation, is unaware of session-specific settings. > It seems to me that the issue of downgrading wal_receiver_timeout to > become user-settable is if we're OK to allow non-superusers to play with > it in the code path where it's used currently. Knowing that physical > WAL receivers are only spawned in a controlled manner by the startup > process, this does not sound like an issue. If we set the wal_receiver_timeout configuration using ALTER ROLE for the subscription owner's role, the apply worker will start with that value. However, any changes made via ALTER ROLE ... SET wal_receiver_timeout will not take effect for an already running apply worker unless the subscription is disabled and re-enabled. In contrast, this is handled automatically during CREATE SUBSCRIPTION, where parameter changes are detected. Regards, Vignesh
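For illustration only, and assuming wal_receiver_timeout were actually made user-settable as proposed, the two existing routes discussed above would look roughly like this (subscription, publication, and role names are made up):

```
-- passing the value through the subscription's connection string,
-- as Michael describes with options='-c foo=fooval':
CREATE SUBSCRIPTION mysub
    CONNECTION 'host=publisher dbname=postgres options=''-c wal_receiver_timeout=60s'''
    PUBLICATION mypub;

-- the per-role route; per Vignesh, a change here is only picked up by an
-- already-running apply worker after the subscription is disabled and
-- re-enabled:
ALTER ROLE sub_owner SET wal_receiver_timeout = '60s';
```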
Re: PG 18 release notes draft committed
On Thu, May 15, 2025 at 02:02:02PM -0400, Álvaro Herrera wrote: > As related anecdote, the rename of columns in pg_stat_activity a few > years back caused some pg_upgrade pain, which IMO was understandable; > it's reasonable to call that kind of thing out as a possible > incompatibility, at least IMO. > > > The "Migration" section also doesn't list changes to the exported > > PostgreSQL functions, which has bitten me as an extension developer several > > times. > > Hmm. I have thought about such issues, and I feel there are so many items that would be interesting to extension developers that adding just a few to the release notes would not be very helpful, and if we add all of them it would be distracting for the majority of users. It might be helpful for someone to write a wiki page specifically for that audience. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: Should we optimize the `ORDER BY random() LIMIT x` case?
Andrei, > > ``` > > -- imagine replacing inefficient array_sample(array_agg(t), 10) > > -- with more efficient array_sample_reservoir(t, 10) > > SELECT (unnest(agg)).* AS k FROM > > ( SELECT array_sample(array_agg(t), 10) AS agg FROM ( > > ... here goes the subquery ... > > ) AS t > > ); > > ``` > > > > ... if only we supported such a column expansion for not registered > > records. Currently such a query fails with: > > > > ``` > > ERROR: record type has not been registered > > ``` > I know about this issue. Having resolved it in a limited number of local > cases (like FDW push-down of row types), I still do not have a universal > solution worth proposing upstream. Do you have any public implementation > of the array_sample_reservoir to play with? array_sample_reservoir() is purely a figment of my imagination at the moment. Semantically it does the same as array_sample(array_agg(t), N) except the fact that array_sample(..., N) requires the array to have at least N items. You can experiment with array_sample(array_agg(...), N) as long as the subquery returns much more than N rows. -- Best regards, Aleksander Alekseev
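A minimal, self-contained way to experiment with the existing-function approach Aleksander mentions (a sketch only; generate_series simply stands in for "the subquery", and array_sample requires PostgreSQL 16 or later):

```
-- works today as long as the inner query returns at least 10 rows,
-- since array_sample() errors out if the array has fewer than N elements
SELECT unnest(array_sample(array_agg(x), 10)) AS k
FROM generate_series(1, 1000) AS x;
```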
Re: Please update the pgconf.dev Unconference notes
On 5/20/25 08:49, Hannu Krosing wrote: > Hi to all note-takers > > (I added two who I *think* I remember took notes) > > Please upload the notes to the Unconference section in > https://wiki.postgresql.org/wiki/PGConf.dev_2025 > > I have also some random notes on scraps of paper from the two sessions > I attended and did not present and would like to add these if I think > something relevant is missing once the main notes are there > I only took notes during the "better logging" session, and I already posted those to the wiki. The discussion moved too quickly in different directions for me to catch all the details, so the notes have gaps etc. If others can improve that / clarify, that'd be great. regards -- Tomas Vondra
Re: Conflict detection for update_deleted in logical replication
On Fri, May 16, 2025 at 5:01 PM Amit Kapila wrote: > Please find some more comments on the 0001 patch: 1. We need to know about such transactions + * for conflict detection and resolution in logical replication. See + * GetOldestTransactionIdInCommit and its use. Do we need to mention resolution in the above sentence? This patch is all about detecting conflict reliably. 2. In wait_for_publisher_status(), we use remote_epoch, remote_nextxid, and remote_oldestxid to compute full transaction id's. Why can't we send FullTransactionIds for remote_oldestxid and remote_nextxid from publisher? If these are required, maybe a comment somewhere for that would be good. 3. /* + * Note it is important to set committs value after marking ourselves as + * in the commit critical section (DELAY_CHKPT_IN_COMMIT). This is because + * we want to ensure all such transactions are finished before we allow + * the logical replication client to advance its xid which is used to hold + * back dead rows for conflict detection. See + * maybe_advance_nonremovable_xid. + */ + committs = GetCurrentTimestamp(); How does setting committs after setting DELAY_CHKPT_IN_COMMIT help in advancing client-side xid? IIUC, on client-side, we simply wait for such an xid to be finished based on the remote_oldestxid and remote_nextxid sent via the server. So, the above comment is not completely clear to me. I have updated this and related comments in the attached diff patch to make it clear. See if that makes sense to you. 4. In 0001's commit message, we have: "Furthermore, the preserved commit timestamps and origin data are essential for consistently detecting update_origin_differs conflicts." But it is not clarified how this patch helps in consistently detecting update_origin_differs conflict? 5. I have tried to add some more details in comments on why oldest_nonremovable_xid needs to be FullTransactionId. See attached. -- With Regards, Amit Kapila. diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c index 41fb4fc5025..ea508542745 100644 --- a/src/backend/access/transam/twophase.c +++ b/src/backend/access/transam/twophase.c @@ -2329,9 +2329,9 @@ RecordTransactionCommitPrepared(TransactionId xid, /* * Note it is important to set committs value after marking ourselves as * in the commit critical section (DELAY_CHKPT_IN_COMMIT). This is because -* we want to ensure all such transactions are finished before we allow -* the logical replication client to advance its xid which is used to hold -* back dead rows for conflict detection. See +* we want to ensure all transactions that have acquired commit timestamp +* are finished before we allow the logical replication client to advance +* its xid which is used to hold back dead rows for conflict detection. See * maybe_advance_nonremovable_xid. */ committs = GetCurrentTimestamp(); diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c index dfc5108da3c..6e1dc76744f 100644 --- a/src/backend/access/transam/xact.c +++ b/src/backend/access/transam/xact.c @@ -1431,6 +1431,10 @@ RecordTransactionCommit(void) * without holding the ProcArrayLock, since we're the only one * modifying it. This makes checkpoint's determination of which xacts * are delaying the checkpoint a bit fuzzy, but it doesn't matter. +* +* Note, it is important to get the commit timestamp after marking the +* transaction in the commit critical section. See +* RecordTransactionCommitPrepared. 
*/ Assert((MyProc->delayChkptFlags & DELAY_CHKPT_IN_COMMIT) == 0); START_CRIT_SECTION(); diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h index 0fdf49a1938..7d1e2096232 100644 --- a/src/include/replication/worker_internal.h +++ b/src/include/replication/worker_internal.h @@ -99,7 +99,9 @@ typedef struct LogicalRepWorker * conditions. Such scenarios might occur if the replication slot is not * yet created by the launcher while the apply worker has already * initialized this field. During this period, a transaction ID wraparound -* could falsely make this ID appear as if it originates from the future. +* could falsely make this ID appear as if it originates from the future +* w.r.t the transaction ID stored in the slot maintained by launcher. See +* advance_conflict_slot_xmin. */ FullTransactionId oldest_nonremovable_xid; diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h index 65ecf3280fb..c6f5ebceefd 100644 --- a/src/include/storage/proc.h +++ b/src/include/storage/proc.h @@ -133,10 +133,10 @@ extern PGDLLIMPORT int FastPathLockGroupsPerBackend; * * Setting DELAY_CHK
Re: Regression in statement locations
Tested the patch and it looks good to me. Not that I thought it would fail, but I also confirmed the pgaudit case works as expected. ``` LOG: AUDIT: SESSION,10,2,DDL,CREATE TABLE,,,"CREATE TABLE do_table (""weird name"" INT)", LOG: AUDIT: SESSION,10,3,DDL,DROP TABLE,,,DROP table do_table, DO ``` -- Sami
Re: queryId constant squashing does not support prepared statements
> On Tue, May 20, 2025 at 11:03:52AM GMT, Dmitry Dolgov wrote: > > By the way, the new test cases for ARRAY lists are sent in the last > > patch of the series posted on this thread: > > https://www.postgresql.org/message-id/7zbzwk4btnxoo4o3xbtzefoqvht54cszjj4bol22fmej5nmgkf@dbcn4wtakw4y > > > > These should be first in the list, IMO, so as it is possible to track > > what the behavior was before the new logic as of HEAD, and what the > > behavior would become after the new logic. > > Sure, I can reshuffle that. Here is it, but the results are pretty much expected. >From 8e22941aab88976631729ed43e3d4afaf616b691 Mon Sep 17 00:00:00 2001 From: Dmitrii Dolgov <9erthali...@gmail.com> Date: Tue, 20 May 2025 16:12:05 +0200 Subject: [PATCH v3 1/3] Extend ARRAY squashing tests Testing coverage for ARRAY expressions is not enough. Add more test cases, similar to already existing ones. --- .../pg_stat_statements/expected/squashing.out | 184 ++ contrib/pg_stat_statements/sql/squashing.sql | 60 ++ 2 files changed, 244 insertions(+) diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out index 7b138af098c..730a52d6917 100644 --- a/contrib/pg_stat_statements/expected/squashing.out +++ b/contrib/pg_stat_statements/expected/squashing.out @@ -429,3 +429,187 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C"; SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1 (2 rows) +-- Nested arrays are squashed only at constants level +SELECT pg_stat_statements_reset() IS NOT NULL AS t; + t +--- + t +(1 row) + +SELECT ARRAY[ +ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], +ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], +ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], +ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10] +]; + array +--- + {{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10}} +(1 row) + +SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C"; + query| calls ++--- + SELECT ARRAY[ +| 1 + ARRAY[$1 /*, ... */], +| + ARRAY[$2 /*, ... */], +| + ARRAY[$3 /*, ... */], +| + ARRAY[$4 /*, ... */] +| + ] | + SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1 +(2 rows) + +-- Relabel type +SELECT pg_stat_statements_reset() IS NOT NULL AS t; + t +--- + t +(1 row) + +SELECT ARRAY[1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid]; +array +- + {1,2,3,4,5,6,7,8,9} +(1 row) + +SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C"; + query| calls ++--- + SELECT ARRAY[$1 /*, ... */::oid] | 1 + SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1 +(2 rows) + +-- Some casting expression are simplified to Const +SELECT pg_stat_statements_reset() IS NOT NULL AS t; + t +--- + t +(1 row) + +SELECT ARRAY[ +('"1"')::jsonb, ('"2"')::jsonb, ('"3"')::jsonb, ('"4"')::jsonb, + ( '"5"')::jsonb, ( '"6"')::jsonb, ( '"7"')::jsonb, ( '"8"')::jsonb, + ( '"9"')::jsonb, ( '"10"')::jsonb +]; + array + + {"\"1\"","\"2\"","\"3\"","\"4\"","\"5\"","\"6\"","\"7\"","\"8\"","\"9\"","\"10\""} +(1 row) + +SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C"; + query| calls ++--- + SELECT ARRAY[ +| 1 + ($1 /*, ... 
*/)::jsonb+| + ] | + SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1 +(2 rows) + +-- CoerceViaIO +SELECT pg_stat_statements_reset() IS NOT NULL AS t; + t +--- + t +(1 row) + +SELECT ARRAY[ + 1::int4::casttesttype, 2::int4::casttesttype, 3::int4::casttesttype, + 4::int4::casttesttype, 5::int4::casttesttype, 6::int4::casttesttype, + 7::int4::casttesttype, 8::int4::casttesttype, 9::int4::casttesttype, + 10::int4::casttesttype, 11::int4::casttesttype +]; + array +--- + {1,2,3,4,5,6,7,8,9,10,11} +(1 row) + +SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C"; + query
Re: PG 18 release notes draft committed
On Tue, May 20, 2025 at 05:15:54PM +0300, Nazir Bilal Yavuz wrote: > > As of the date of the commit, "Co-authored-by:" is listed as: > > > > https://wiki.postgresql.org/wiki/Commit_Message_Guidance > > > > "Co-authored-by:" is used by committers when they want to give full > > credit > > to the named individuals, but also indicate that they made > > significant > > changes. > > > > > A minor fix; I co-authored this with Thomas Munro, he is the actual > > > author. > > > > Uh, does this mean I should add Thomas Munro before or after your name, > > or remove your name and list only Thomas Munro? > > Sorry for taking your time, I did not know that. Then, I am okay with > how it is right now. No problem. The "Co-authored-by:" guidance was only written down in January of this year. I assume Thomas Munro was following that guidance when he wrote the commit message. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: Proposal for enabling auto-vectorization for checksum calculations
Hi, On Tue, 20 May 2025 at 02:54, Matthew Sterrett wrote: > > Hello! Thanks for helping me with this. > I'm still trying to figure out what is going on with the Bookworm test > failures. I'm pretty sure this patchset should resolve all the issues > with the macOS build, but I don't think it will help the linux > failures unfortunately. You can see the failure at the artifacts -> 'log/tmp_install/log/install.log' file on the CI web page [1]. If you want to replicate that on your local: $ ./configure --with-llvm CLANG="ccache clang-16" $ make -s -j8 world-bin $ make -j8 check-world should be enough. I was able to replicate it with these commands. I hope these help. [1] https://cirrus-ci.com/task/4834162550505472 -- Regards, Nazir Bilal Yavuz Microsoft
Re: Log connection establishment timings
On Tue, May 20, 2025 at 4:52 AM Peter Eisentraut wrote: > log_connections has been changed from a Boolean parameter to a string > one, but a number of places in the documentation and in various pieces > of test code still use the old values. I think it would be better if > they were adjusted to the new style. > > There are two places in doc/src/sgml/config.sgml where > log_connections=yes is used as an example. This is a relatively > prominent place, so it should not use deprecated values. > > In earlier versions of my patch, I played around with replacing these references in the docs. I ended up not doing it because I wasn't sure we had consensus on deprecating the "on", "true", "yes" options and that we would continue to support them indefinitely. Thinking about it now, by no longer documenting "on" and "off", I was obviously deprecating them (not to mention removing support for log_connections = "y", "ye", etc). I'll write a patch to change these. > In src/backend/tcop/postgres.c, there is a call > > SetConfigOption("log_connections", "true", context, source); > > that could be adjusted. > Do you think the debug option should be 'all' or a list of the options covered by "true" (which is a subset of 'all')? > Various uses in test code: > > src/interfaces/libpq/t/005_negotiate_encryption.pl > src/test/authentication/t/001_password.pl > src/test/authentication/t/003_peer.pl > src/test/authentication/t/005_sspi.pl > src/test/authentication/t/007_pre_auth.pl > src/test/kerberos/t/001_auth.pl > src/test/ldap/t/001_auth.pl > src/test/ldap/t/002_bindpasswd.pl > src/test/modules/ldap_password_func/t/001_mutated_bindpasswd.pl > src/test/modules/oauth_validator/t/001_server.pl > src/test/modules/oauth_validator/t/002_client.pl > src/test/postmaster/t/002_connection_limits.pl > src/test/postmaster/t/003_start_stop.pl > src/test/recovery/t/013_crash_restart.pl > src/test/recovery/t/022_crash_temp_files.pl > src/test/recovery/t/032_relfilenode_reuse.pl > src/test/recovery/t/037_invalid_database.pl > src/test/ssl/t/SSL/Server.pm > src/tools/ci/pg_ci_base.conf > > I suspect in some of these cases using one of the new more granular > values would be appropriate? This could also serve as examples and > testing of the parameter itself. Yep, for these, some of them specifically parse and test output generated by one or more of the log_connections options. In those cases, obviously those options should be provided at a minimum. However, I assume, based on how I use the logs from TAP tests, that when a test fails, folks use the logging to understand more about what went wrong even when the log output is not specifically parsed and tested against. The reason I didn't change these was that I was unsure which of these tests would want which options. And I didn't want to risk increasing the volume of logging output by replacing any of them with "all". I can, of course, go through and make my best guess. Or I could just replace them with the equivalent subset of options. In the absence of clarity on this, I did add tests to 001_password.pl specifically exercising and testing combinations of log_connections options. - Melanie
Re: strange perf regression with data checksums
On Tue, May 20, 2025 at 8:43 AM Peter Geoghegan wrote: > I imagine bitmap index scans will be similar to plain index scans. To be clear, I meant that the magnitude of the problem will be similar. I do not mean that they aren't fixed by my v2. They should be fully fixed by v2. -- Peter Geoghegan
Re: generic plans and "initial" pruning
Hi Tom, On Tue, May 20, 2025 at 12:06 PM Tom Lane wrote: > My attention was drawn to commit 525392d57 after observing that > Valgrind complained about a memory leak in some code that commit added > to BuildCachedPlan(). I tried to make sense of said code so I could > remove the leak, and eventually arrived at the attached patch, which > is part of a series of leak-fixing things hence the high sequence > number. > > Unfortunately, the bad things I speculated about in the added comments > seem to be reality. The second attached file is a test case that > triggers > > TRAP: failed Assert("list_length(plan_list) == > list_length(plan->stmt_list)"), File: "plancache.c", Line: 1259, PID: 602087 > > because it adds a DO ALSO rule that causes the rewriter to generate > more PlannedStmts than it did before. > > This is quite awful, because it does more than simply break the klugy > (and undocumented) business about keeping the top-level List in a > different context. What it means is that any outside code that is > busy iterating that List is very fundamentally broken: it's not clear > what List index it ought to resume at, except that "the one it was at" > is demonstrably incorrect. > > I also don't really believe the (also undocumented) assumption that > such outside code is in between executions of PlannedStmts of the > List and hence can tolerate those being ripped out and replaced. > I have not attempted to build an example, because the one I have > seems sufficiently damning. But I bet that a recursive function > could be constructed in such a way that an outer execution is > still in progress when an inner call triggers UpdateCachedPlan. > > Another small problem (much more easily fixable than the above, > probably) is that summarily setting "plan->is_valid = true" > at the end is not okay. We could already have received an > invalidation that should result in marking the plan stale. > (Holding locks on the tables involved is not sufficient to > prevent that, as there are other sources of inval events.) Thanks for pointing out the hole in the current handling of CachedPlan->stmt_list. You're right that the approach of preserving the list structure while replacing its contents in-place doesn’t hold up when the rewriter adds or removes statements dynamically. There might be other cases that neither of us have tried. I don’t think that mechanism is salvageable. To address the issue without needing a full revert, I’m considering dropping UpdateCachedPlan() and removing the associated MemoryContext dance to preserve CachedPlan->stmt_list structure. Instead, the executor would replan the necessary query into a transient list of PlannedStmts, leaving the original CachedPlan untouched. That avoids mutating shared plan state during execution and still enables deferred locking in the vast majority of cases. There are two variants of this approach. In the simpler form, the transient PlannedStmt list exists only in executor-local memory and isn’t registered with the invalidation machinery. That might be acceptable in practice, since all referenced relations are locked at that point -- but it would mean any invalidation events delivered during execution are ignored. The more robust variant is to build a one-query standalone CachedPlan using something like GetTransientCachedPlanForQuery(), which I had proposed back in [1]. This gets added to a standalone_plan_list so that invalidation callbacks can still reach it. 
I dropped that design earlier [2] due to the cleanup overhead, but I’d be happy to bring it back in a simplified form if that seems preferable. One open question in either case is what to do if the number of PlannedStmts in the rewritten plan changes as with your example. Would it be reasonable to just go ahead and execute the additional statements from the transient plan, even though the original CachedPlan wouldn’t have known about them until the next use? That would avoid introducing any new failure behavior while still handling the invalidation correctly for the current execution. > It's possible that this code can be fixed, but I fear it's > going to involve some really fundamental redesign, which > probably shouldn't be happening after beta1. I think there > is no alternative but to revert for v18. ...Beyond that, I think I’ve run out of clean options for making deferred locking executor-local while keeping invalidation safe. I know you'd previously objected (with good reason) to making GetCachedPlan() itself run pruning logic to determine which partitions to lock -- and to the idea of carrying or sharing the result of that pruning back to the executor via interface changes in the path from plancache.c through its callers down to ExecutorStart(). So I’ve steered away from revisiting that direction. But if we’re not comfortable with either of the transient replanning options, then we may end up shelving the deferred locking idea entirely -- which would be unfortunate, given how much it helps
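For readers following along, here is a rough, hypothetical sketch of the kind of scenario Tom describes, where a DO ALSO rule changes how many PlannedStmts the rewriter produces for a cached plan. This is not the attached test case, and on an unpatched server the sequence simply replans without tripping any assertion; it only shows how the statement count can change under a cached plan:

```
CREATE TABLE t1 (a int);
CREATE TABLE t2 (a int);

PREPARE p AS INSERT INTO t1 VALUES (1);
EXECUTE p;                       -- plan is built and cached: one PlannedStmt

CREATE RULE r1 AS ON INSERT TO t1
    DO ALSO INSERT INTO t2 VALUES (NEW.a);

EXECUTE p;                       -- rewriting now yields two PlannedStmts
```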
Re: PG 18 release notes draft committed
On Tue, May 20, 2025 at 03:46:44PM +0300, Nazir Bilal Yavuz wrote: > Hi, > > Thanks for working on this! > > On Fri, 2 May 2025 at 05:44, Bruce Momjian wrote: > > > > I will continue improving it until beta 1, and until the final release. > > I will probably add markup in 1-3 weeks. Let the feedback begin. ;-) > > + > + > +Add server variable file_copy_method to control the file copying > method (Nazir Bilal Yavuz) > +§ > + Uh, the commit is: commit f78ca6f3ebb Author: Thomas Munro Date: Tue Apr 8 20:52:47 2025 +1200 Introduce file_copy_method setting. It can be set to either COPY (the default) or CLONE if the system supports it. CLONE causes callers of copydir(), currently CREATE DATABASE ... STRATEGY=FILE_COPY and ALTER DATABASE ... SET TABLESPACE = ..., to use copy_file_range (Linux, FreeBSD) or copyfile (macOS) to copy files instead of a read-write loop over the contents. CLONE gives the kernel the opportunity to share block ranges on copy-on-write file systems and push copying down to storage on others, depending on configuration. On some systems CLONE can be used to clone large databases quickly with CREATE DATABASE ... TEMPLATE=source STRATEGY=FILE_COPY. Other operating systems could be supported; patches welcome. Co-authored-by: Nazir Bilal Yavuz Reviewed-by: Robert Haas Reviewed-by: Ranier Vilela Discussion: https://postgr.es/m/CA%2BhUKGLM%2Bt%2BSwBU-cHeMUXJCOgBxSHLGZutV5zCwY4qrCcE02w%40mail.gmail.com As of the date of the commit, "Co-authored-by:" is listed as: https://wiki.postgresql.org/wiki/Commit_Message_Guidance "Co-authored-by:" is used by committers when they want to give full credit to the named individuals, but also indicate that they made significant changes. > A minor fix; I co-authored this with Thomas Munro, he is the actual author. Uh, does this mean I should add Thomas Munro before or after your name, or remove your name and list only Thomas Munro? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: Regression in statement locations
On 5/20/25 07:34, Sami Imseih wrote: Tested the patch and it looks good to me. Not that I thought it would fail, but I also confirmed the pgaudit case works as expected. I also tested and everything looks good with the patch. I did a careful examination of the remaining diffs (there are quite a few) and in every case I consider them to be beneficial, i.e. they make the output more targeted and readable. I did not do a real code review, but I did notice that the test table column is called weird_name as in our tests. I would argue that since it is missing the quotes and space it is not really all that weird and should maybe get a normal name so developers in the future don't wonder what is weird about it. Thank you for this improvement and the quick fix! Regards, -David
Re: PG 18 release notes draft committed
Hi, On Tue, 20 May 2025 at 16:52, Bruce Momjian wrote: > > On Tue, May 20, 2025 at 03:46:44PM +0300, Nazir Bilal Yavuz wrote: > > Hi, > > > > Thanks for working on this! > > > > On Fri, 2 May 2025 at 05:44, Bruce Momjian wrote: > > > > > > I will continue improving it until beta 1, and until the final release. > > > I will probably add markup in 1-3 weeks. Let the feedback begin. ;-) > > > > + > > + > > +Add server variable file_copy_method to control the file copying > > method (Nazir Bilal Yavuz) > > +§ > > + > > Uh, the commit is: > > commit f78ca6f3ebb > Author: Thomas Munro > Date: Tue Apr 8 20:52:47 2025 +1200 > > Introduce file_copy_method setting. > > It can be set to either COPY (the default) or CLONE if the system > supports it. CLONE causes callers of copydir(), currently CREATE > DATABASE ... STRATEGY=FILE_COPY and ALTER DATABASE ... SET > TABLESPACE = > ..., to use copy_file_range (Linux, FreeBSD) or copyfile (macOS) > to copy > files instead of a read-write loop over the contents. > > CLONE gives the kernel the opportunity to share block ranges on > copy-on-write file systems and push copying down to storage on > others, > depending on configuration. On some systems CLONE can be used to > clone > large databases quickly with CREATE DATABASE ... TEMPLATE=source > STRATEGY=FILE_COPY. > > Other operating systems could be supported; patches welcome. > > Co-authored-by: Nazir Bilal Yavuz > Reviewed-by: Robert Haas > Reviewed-by: Ranier Vilela > Discussion: > https://postgr.es/m/CA%2BhUKGLM%2Bt%2BSwBU-cHeMUXJCOgBxSHLGZutV5zCwY4qrCcE02w%40mail.gmail.com > > As of the date of the commit, "Co-authored-by:" is listed as: > > https://wiki.postgresql.org/wiki/Commit_Message_Guidance > > "Co-authored-by:" is used by committers when they want to give full > credit > to the named individuals, but also indicate that they made significant > changes. > > > A minor fix; I co-authored this with Thomas Munro, he is the actual author. > > Uh, does this mean I should add Thomas Munro before or after your name, > or remove your name and list only Thomas Munro? Sorry for taking your time, I did not know that. Then, I am okay with how it is right now. -- Regards, Nazir Bilal Yavuz Microsoft
Re: PG 18 release notes draft committed
Hi, Thanks for working on this! On Fri, 2 May 2025 at 05:44, Bruce Momjian wrote: > > I will continue improving it until beta 1, and until the final release. > I will probably add markup in 1-3 weeks. Let the feedback begin. ;-) + + +Add server variable file_copy_method to control the file copying method (Nazir Bilal Yavuz) +§ + A minor fix; I co-authored this with Thomas Munro, he is the actual author. -- Regards, Nazir Bilal Yavuz Microsoft
Re: RFC: Logging plan of the running query
On Sat, Apr 5, 2025 at 3:14 PM Atsushi Torikoshi wrote: On Thu, Apr 3, 2025 at 11:10 PM Robert Haas wrote: Do we really need ExecProcNodeOriginal? Can we find some way to reuse ExecProcNodeReal instead of making the structure bigger? I also wanted to implement this without adding elements to PlanState if possible, but I haven't found a good solution, so the patch uses ExecSetExecProcNode. I tackled this again and the attached patch removes ExecProcNodeOriginal from Planstate. Instead of adding a new field, this version builds the behavior into the existing wrapper function, ExecProcNodeFirst(). Since ExecProcNodeFirst() is already handling instrumentation-related logic, the patch has maybe become a bit more complex to accommodate both that and the new behavior. While it might make sense to introduce a more general mechanism that allows for stacking an arbitrary number of wrappers around ExecProcNode, I’m not sure it's possible or worth the added complexity—such layered wrapping doesn't seem like something we typically need. What do you think? -- Regards, -- Atsushi Torikoshi Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.From 41944eb943f8f6b2fb731125ed0d50ad29bbd338 Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Tue, 20 May 2025 22:01:40 +0900 Subject: [PATCH v45] Add function to log the plan of the currently running query Currently, we have to wait for the query execution to finish to know its plan either using EXPLAIN ANALYZE or auto_explain. This is not so convenient, for example when investigating long-running queries on production environments. To improve this situation, this patch adds pg_log_query_plan() function that requests to log the plan of the currently executing query. On receipt of the request, codes for logging plan is wrapped to every ExecProcNode under the current executing plan node, and when the executor runs one of ExecProcNode, the plan is actually logged. These wrappers are unwrapped when once the plan is logged. In this way, we can avoid adding the overhead which we'll face when adding CHECK_FOR_INTERRUPTS() like mechanisms in somewhere in executor codes safely. Our initial idea was to send a signal to the target backend process, which invokes EXPLAIN logic at the next CHECK_FOR_INTERRUPTS() call. However, we realized during prototyping that EXPLAIN is complex and may not be safely executed at arbitrary interrupt points. By default, only superusers are allowed to request to log the plans because allowing any users to issue this request at an unbounded rate would cause lots of log messages and which can lead to denial of service. 
--- contrib/auto_explain/auto_explain.c | 24 +- doc/src/sgml/func.sgml | 55 +++ src/backend/access/transam/xact.c| 13 + src/backend/catalog/system_functions.sql | 2 + src/backend/commands/Makefile| 1 + src/backend/commands/dynamic_explain.c | 387 +++ src/backend/commands/explain.c | 38 +- src/backend/commands/meson.build | 1 + src/backend/executor/execMain.c | 10 + src/backend/executor/execProcnode.c | 13 +- src/backend/storage/ipc/procsignal.c | 4 + src/backend/tcop/postgres.c | 4 + src/backend/utils/init/globals.c | 2 + src/include/catalog/pg_proc.dat | 6 + src/include/commands/dynamic_explain.h | 28 ++ src/include/commands/explain.h | 5 + src/include/commands/explain_state.h | 1 + src/include/executor/executor.h | 1 + src/include/miscadmin.h | 1 + src/include/storage/procsignal.h | 2 + src/include/tcop/pquery.h| 1 - src/test/regress/expected/misc_functions.out | 54 ++- src/test/regress/sql/misc_functions.sql | 41 +- 23 files changed, 648 insertions(+), 46 deletions(-) create mode 100644 src/backend/commands/dynamic_explain.c create mode 100644 src/include/commands/dynamic_explain.h diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c index cd6625020a..e720ddf39f 100644 --- a/contrib/auto_explain/auto_explain.c +++ b/contrib/auto_explain/auto_explain.c @@ -15,6 +15,7 @@ #include #include "access/parallel.h" +#include "commands/dynamic_explain.h" #include "commands/explain.h" #include "commands/explain_format.h" #include "commands/explain_state.h" @@ -412,26 +413,9 @@ explain_ExecutorEnd(QueryDesc *queryDesc) es->format = auto_explain_log_format; es->settings = auto_explain_log_settings; - ExplainBeginOutput(es); - ExplainQueryText(es, queryDesc); - ExplainQueryParameters(es, queryDesc->params, auto_explain_log_parameter_max_length); - ExplainPrintPlan(es, queryDesc); - if (es->analyze && auto_explain_log_triggers) -ExplainPrintTriggers(es, queryDesc); - if (es->costs) -ExplainPrintJITSummary(es, quer
Re: Violation of principle that plan trees are read-only
Hi, On 2025-05-18 19:31:33 -0400, Tom Lane wrote: > While chasing down Valgrind leakage reports, I was disturbed > to realize that some of them arise from a case where the > executor scribbles on the plan tree it's given, which it is > absolutely not supposed to do: > > /* > * Initialize result tuple slot and assign its rowtype using the first > * RETURNING list. We assume the rest will look the same. > */ > mtstate->ps.plan->targetlist = (List *) linitial(returningLists); > > A bit of git archaeology fingers Andres' commit 4717fdb14, which we > can't easily revert since he later got rid of ExecAssignResultType > altogether. But I think we need to do something about it --- it's > purest luck that this doesn't cause serious problems in some cases. I have no idea what I was smoking at that time, this clearly is wrong. I think the reason it doesn't cause problems is that we're just using the first child's targetlist every time and we're just assigning the same value back every time. What's even more absurd is: Why do we even need to assign the result type at all? Before & after 4717fdb14. The planner ought to have already figured this out, no? It seems that if not, we'd have a problem anyway, who says the "calling" nodes (say a wCTE) can cope with whatever output we come up with? Except of course, there is exactly one case in our tests where the tupledescs aren't equal :( I've not dug fully into this, but I thought I should send this email to confirm that I'm looking into the issue. Greetings, Andres Freund
Re: Re: proposal: schema variables
On Thu, May 15, 2025 at 08:48:37AM +0200, Pavel Stehule wrote: > Hi > > fresh rebase I will again ask why this patch set is being reposted when there is no plan to apply it to git master? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: Please update the pgconf.dev Unconference notes
On 20/5/2025 08:49, Hannu Krosing wrote: Hi to all note-takers (I added two who I *think* I remember took notes) Please upload the notes to the Unconference section in https://wiki.postgresql.org/wiki/PGConf.dev_2025 I have also some random notes on scraps of paper from the two sessions I attended and did not present and would like to add these if I think something relevant is missing once the main notes are there I am glad to see that, in discussing pg_stat_statements, the community in Canada arrived at exactly the same set of issues as we did recently at PGConf.DE.2025. I recall we also concluded that moving it into core is too much, and discussed the idea of letting another module filter the queries/columns/length of the query string stored in pg_stat_statements through hooks - 'extend the extension'. -- regards, Andrei Lepikhov
Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION
Hi! On 31.03.25 13:22, Yugo Nagata wrote: > On Mon, 31 Mar 2025 20:00:57 +0900 > Yugo Nagata wrote: > >> Hi, >> >> I found that multiple sessions concurrently execute CREATE OR REPLACE >> FUNCTION >> for a same function, the error "tuple concurrently updated" is raised. This >> is >> an internal error output by elog, also the message is not user-friendly. >> >> I've attached a patch to prevent this internal error by locking an exclusive >> lock before the command and get the read tuple after acquiring the lock. >> Also, if the function has been removed during the lock waiting, the new entry >> is created. > I also found the same error is raised when concurrent ALTER FUNCTION commands > are > executed. I've added a patch to fix this in the similar way. > > Regards, > Yugo Nagata I just briefly tested this patch and it seems to work as expected for CREATE OR REPLACE FUNCTION: -- Session 1 (t1): postgres=# BEGIN; BEGIN postgres=*# CREATE OR REPLACE FUNCTION f1() RETURNS INT LANGUAGE plpgsql AS $$ BEGIN RETURN 1; END;$$; CREATE FUNCTION -- Session 2 (t2) postgres=# CREATE OR REPLACE FUNCTION f1() RETURNS INT LANGUAGE plpgsql AS $$ BEGIN RETURN 2; END;$$; (wait) -- Session 3 (t3) postgres=# CREATE OR REPLACE FUNCTION f1() RETURNS INT LANGUAGE plpgsql AS $$ BEGIN RETURN 3; END;$$; (wait) -- Session 4 (t4) postgres=# CREATE OR REPLACE FUNCTION f1() RETURNS INT LANGUAGE plpgsql AS $$ BEGIN RETURN 4; END;$$; CREATE FUNCTION (wait) -- Session 1 (t5) postgres=*# END; COMMIT at this point Sessions 2, 3, and 4 were released with: CREATE FUNCTION -- Session 1 (t6) postgres=# \sf f1 CREATE OR REPLACE FUNCTION public.f1() RETURNS integer LANGUAGE plpgsql AS $function$ BEGIN RETURN 4; END;$function$ So... it no longer shows the error message: ERROR: tuple concurrently updated I did the same for ALTER FUNCTION but I was unable to reproduce the error you reported. Could you provide your script? Best regards, Jim
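In case it helps with the ALTER FUNCTION case, a minimal sketch of the kind of concurrent sequence the second patch seems to target might look like this (a guess on my part, not Yugo's actual script):

```
-- Session 1
BEGIN;
ALTER FUNCTION f1() COST 100;

-- Session 2 (before session 1 commits): this is where, per Yugo's report,
-- an unpatched server can raise "tuple concurrently updated"
ALTER FUNCTION f1() COST 200;

-- Session 1
COMMIT;
```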
Re: Add comment explaining why queryid is int64 in pg_stat_statements
Supporting unsigned INTs will be valuable for more than just this obviously, and if we do ever have that capability, we would likely want to go that route. I know I've been asked about why queryIds could be negative more than a few times. Until then, I think the patch being suggested is reasonable and cleaner. A few comments on the patches: 1/ this was missed in pg_stat_statements ./contrib/pg_stat_statements/pg_stat_statements.c:uint64 saved_queryId = pstmt->queryId; 2/ Should "DatumGetInt64(hash_any_extended(.." be turned into a macro since it's used in several places? Also, we can add a comment in the macro such as " Output the queryId as an int64 rather than a uint64, to match the BIGINT column used to emit queryId in system views " I think this comment is needed to address the confusion that is the original subject of this thread. Otherwise, the confusion will be moved from pg_stat_statements.c to queryjumblefuncs.c DoJumble(JumbleState *jstate, Node *node) { /* Jumble the given node */ @@ -208,9 +208,9 @@ DoJumble(JumbleState *jstate, Node *node) FlushPendingNulls(jstate); /* Process the jumble buffer and produce the hash value */ - return DatumGetUInt64(hash_any_extended(jstate->jumble, - jstate->jumble_len, - 0)); + return DatumGetInt64(hash_any_extended(jstate->jumble, + jstate->jumble_len, + 0)); } /* @@ -256,10 +256,10 @@ AppendJumbleInternal(JumbleState *jstate, const unsigned char *item, if (unlikely(jumble_len >= JUMBLE_SIZE)) { - uint64 start_hash; + int64 start_hash; - start_hash = DatumGetUInt64(hash_any_extended(jumble, - JUMBLE_SIZE, 0)); + start_hash = DatumGetInt64(hash_any_extended(jumble, + JUMBLE_SIZE, 0)); memcpy(jumble, &start_hash, sizeof(start_hash)); jumble_len = sizeof(start_hash); -- Sami Imseih Amazon Web Services (AWS)
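The user-facing side of the confusion Sami mentions is easy to see in the view itself (a sketch, assuming the pg_stat_statements extension is installed): because queryid is exposed as a signed bigint, roughly half of all hash values display as negative numbers.

```
SELECT queryid, calls, left(query, 40) AS query
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 3;
```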
Re: Re: proposal: schema variables
On Tue, May 20, 2025 at 11:56, Bruce Momjian wrote: > I will again ask why this patch set is being reposted when there is no > plan to apply it to git master Too bad. I would love to have this functionality; from the user's point of view there are problems it would solve wonderfully. I don't know technically what prevents it from being natively in core, but it would be great, it definitely would be. regards Marcos
Re: Capturing both IP address and hostname in the log
On Thu, Apr 10, 2025 at 12:00 PM Tom Lane wrote: > Melanie recently committed a patch (9219093ca) that purports to > generalize our log_connections logging ability: > > Convert the boolean log_connections GUC into a list GUC comprised of > the > connection aspects to log. > > This gives users more control over the volume and kind of connection > logging. > > The current log_connections options are 'receipt', 'authentication', > and > 'authorization'. The empty string disables all connection logging. > 'all' > enables all available connection logging. > > I wonder if it'd be reasonable to remove the separate log_hostname GUC > and fold it into this infrastructure, and while doing so make it > possible to log either or both of the client IP address and hostname. > (For that matter, I think there is interest in being able to capture > the server IP address too, cf 3516ea768. You might wish to log the > IP address only once, not in every log line.) Seems reasonable to me. I'd be willing to move such a thing forward but would want to see the feature requestors' specific desired behavior (e.g. an option for each of hostname, client ip address, and server address?). Also, if they write a WIP patch at least to config.sgml, it would also help me gauge how serious of a request it is or, rather, how satisfactory a solution a log_connections option is. Perhaps there are people who absolutely love log_hostname and don't want to see it deprecated as a separate GUC? I also think folding in log_disconnections as a "disconnection" option makes sense. I do have some anxiety that a very long list of options will anger users -- but I suppose that ship mostly sailed. - Melanie
Re: Violation of principle that plan trees are read-only
Hi, On 2025-05-20 13:04:46 -0400, Tom Lane wrote: > I wrote: > > I think we can just delete this assignment (and fix these comments). > > As attached. Largely makes sense, the only thing I see is that the !returningLists branch does: /* * We still must construct a dummy result tuple type, because InitPlan * expects one (maybe should change that?). */ mtstate->ps.plan->targetlist = NIL; which we presumably shouldn't do anymore either. It never changes anything afaict, but still. > I'm tempted to back-patch this: the plan tree damage seems harmless at > present, but maybe it'd become less harmless with future fixes. I am not sure, I can see arguments either way. There are *some* cases where this changes the explain output, but but the new output is more correct, I think: --- /tmp/a.out 2025-05-20 14:19:04.947945026 -0400 +++ /tmp/b.out 2025-05-20 14:19:12.878975702 -0400 @@ -18,7 +18,7 @@ QUERY PLAN - Update on public.part_abc - Output: part_abc_1.a, part_abc_1.b, part_abc_1.c + Output: a, b, c Update on public.part_abc_2 part_abc_1 -> Append Subplans Removed: 1 I suspect this is an argument for backpatching, not against - seems that deparsing could end up creating bogus output in cases where it could matter? Not sure if such cases are reachable via views (and thus pg_dump) or postgres_fdw, but it seems possible. Greetings, Andres Freund
Re: Violation of principle that plan trees are read-only
I wrote: > I think we can just delete this assignment (and fix these comments). As attached. I'm tempted to back-patch this: the plan tree damage seems harmless at present, but maybe it'd become less harmless with future fixes. regards, tom lane diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index 46d533b7288..71bc174cfc9 100644 --- a/src/backend/executor/nodeModifyTable.c +++ b/src/backend/executor/nodeModifyTable.c @@ -4830,12 +4830,11 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags) ExprContext *econtext; /* - * Initialize result tuple slot and assign its rowtype using the first - * RETURNING list. We assume the rest will look the same. + * Initialize result tuple slot and assign its rowtype using the plan + * node's declared targetlist, which the planner set up to be the same + * as the first RETURNING list. We assume the rest will produce + * compatible output. */ - mtstate->ps.plan->targetlist = (List *) linitial(returningLists); - - /* Set up a slot for the output of the RETURNING projection(s) */ ExecInitResultTupleSlotTL(&mtstate->ps, &TTSOpsVirtual); slot = mtstate->ps.ps_ResultTupleSlot; diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c index 150e9f060ee..8e1eb77dd49 100644 --- a/src/backend/optimizer/plan/setrefs.c +++ b/src/backend/optimizer/plan/setrefs.c @@ -1097,9 +1097,10 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset) /* * Set up the visible plan targetlist as being the same as - * the first RETURNING list. This is for the use of - * EXPLAIN; the executor won't pay any attention to the - * targetlist. We postpone this step until here so that + * the first RETURNING list. This is mostly for the use + * of EXPLAIN; the executor won't execute that targetlist, + * although it does use it to prepare the node's result + * tuple slot. We postpone this step until here so that * we don't have to do set_returning_clause_references() * twice on identical targetlists. */
Re: Adding skip scan (including MDAM style range skip scan) to nbtree
On Mon, May 12, 2025 at 8:58 AM Peter Geoghegan wrote: > I wonder if we can fix this problem by getting rid of the old support > routine #5, "options". It currently doesn't do anything, and I always > thought it was strange that it was added "for uniformity" with other > index AMs. Attached patch completely removes the nbtree "options" support function, while changing the support function number of skip support: it becomes support function #5 (the number previously used by "options"). This patch should fix the regression that Tomas complained about in an expedient way. It's likely that somebody else will run into the same problem in the future, the next time that a new support function is needed. But I think that it makes sense to do this much now -- we need a short term solution for Postgres 18. Usually I would never suggest breaking compatibility like this, but, remarkably, we have never actually done anything with our current support function 5. It's not possible to break compatibility with code that can never be called in the first place, so I see no compatibility to preserve. Questions for Alexander about the "options" support function: * Why did you invent the whole idea of an "options" support function, given that it doesn't actually do anything? I get that it might be a good idea to add these kinds of functions in the future, but why didn't you wait until nbtree *actually had a use* for them? * I've removed some of the tests that you added, that (for whatever reason) cover nbtree specifically. The test from alter_generic.sql. There might be some kind of loss of test coverage. What do you think? -- Peter Geoghegan v1-0001-Remove-OPTIONS-support-proc-from-nbtree.patch Description: Binary data
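For context, the support function numbers that existing built-in opclasses actually register can be inspected directly in the catalogs; a rough sketch for the integer btree opfamily (standard catalogs, nothing patch-specific):

```
SELECT opf.opfname, ap.amprocnum, ap.amproc::regproc
FROM pg_amproc ap
JOIN pg_opfamily opf ON opf.oid = ap.amprocfamily
JOIN pg_am am ON am.oid = opf.opfmethod
WHERE am.amname = 'btree'
  AND ap.amproclefttype = 'integer'::regtype
ORDER BY ap.amprocnum;
```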
Re: Please update the pgconf.dev Unconference notes
Hi Hannu, On Tue, May 20, 2025 at 08:49:28AM +0200, Hannu Krosing wrote: > (I added two who I *think* I remember took notes) > > Please upload the notes to the Unconference section in > https://wiki.postgresql.org/wiki/PGConf.dev_2025 > > I have also some random notes on scraps of paper from the two sessions > I attended and did not present and would like to add these if I think > something relevant is missing once the main notes are there Thanks Hannu for the reminder. I will add my notes in a few days, along with a short translation of the slides I was not able to show during the "TOAST fixing" session. These were just structure suggestions for the last bit case in toast headers, with a few notes written at 5AM the morning before the unconference. I would also be really interested in getting a copy of the slides you have yourself produced on the matter for the session. Could you send me a PDF in private at least? Let's say that you have convinced me to help with a review of what you have planned, as far as I understand, for the v19 cycle on the matter with the OID -> TID lookup replacement logic for the external toast pointers. -- Michael
Re: fixing CREATEROLE
On Tue, May 20, 2025 at 2:32 PM Robert Haas wrote: > trying to create a role that already exists. At this point, my head is > already kind of exploding, because I thought we were pretty careful to > try to make it so that pg_dump output can be restored without error even > in the face of pre-existing objects like the public schema and > the plpgsql language, but apparently we haven't applied the same principle > to pg_dumpall.[1] > This has always been my understanding, even if we are not explicitly stating it anywhere. pg_dump -> no errors. pg_dumpall -> always at least one error :) But if, as you posit above, we were to try running the output of pg_dumpall > through psql as a non-superuser, the problem is a whole lot > worse. I'm of the camp that pg_dumpall should almost always be run as superuser. That said, I find myself using pg_dumpall less and less with every year, and cannot think of the last time I advised a client to use it (other than a pg_dumpall --globals and ignore the errors as a poor-man's role duplication system. Even that is getting rarer, as we generally don't want the same passwords) Cheers, Greg -- Crunchy Data - https://www.crunchydata.com Enterprise Postgres Software Products & Tech Support
Re: Regression in statement locations
On Tue, May 20, 2025 at 02:04:20PM +, David Steele wrote: > I did a careful examination of the remaining diffs (there are quite a few) > and in every case I consider them to be beneficial, i.e. they make the > output more targeted and readable. > > I did not do a real code review, but I did notice that the test table column > is called weird_name as in our tests. I would argue that since it is missing > the quotes and space it is not really all that weird and should maybe get a > normal name so developers in the future don't wonder what is weird about it. I have fixed that, as it is not a weird attribute, removed the unneeded DO blocks in the tests as pointed out by Anthonin, and moved one comment as pointed out by Jian. > Thank you for this improvement and the quick fix! Yeah, thanks all for pointing out that sometimes my analysis of things can go off track. The fix has been applied now on HEAD. I've also checked the state of pgaudit on branch dev-pg18, with the regression getting fixed. Things look clear now, at least from my side. -- Michael
Minor adjustment to pg_aios output naming
Hi, I've noticed a minor inconsistency in the output of pg_aios. According to the documentation, the values for 'operation' are described as: readv, a vectored read ... writev, a vectored write However, in actual output, they appear as read and write -- without the trailing v: =# select operation from pg_aios; .. operation | read While this discrepancy is unlikely to cause any real issues, it would be better to keep the naming consistent. I was a bit unsure which form to align to. There are currently no other types of read/write operations in pg_aios, so the shorter form might have been sufficient. However, using the 'v'-suffixed names makes the vectored nature of these operations explicit and future-proofs the naming in case other variants are introduced later. What do you think? Best regards, -- Regards, -- Atsushi Torikoshi Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.From 71c5d0b8f1a67026e8db2e7ee27ce18f6e0689e0 Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Wed, 21 May 2025 10:50:53 +0900 Subject: [PATCH v1] Adjust pg_aios operation names to match documentation pg_aios output currently shows 'read' and 'write' for vectored I/O operations, while the documentation refers to them as 'readv' and 'writev'. This patch updates the output to use the 'v'-suffixed forms for consistency. --- src/backend/storage/aio/aio_io.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/backend/storage/aio/aio_io.c b/src/backend/storage/aio/aio_io.c index 00e176135a..520b5077df 100644 --- a/src/backend/storage/aio/aio_io.c +++ b/src/backend/storage/aio/aio_io.c @@ -181,9 +181,9 @@ pgaio_io_get_op_name(PgAioHandle *ioh) case PGAIO_OP_INVALID: return "invalid"; case PGAIO_OP_READV: - return "read"; + return "readv"; case PGAIO_OP_WRITEV: - return "write"; + return "writev"; } return NULL;/* silence compiler */ base-commit: a6060f1cbec39575634043baeeaeb11e86042fa6 -- 2.43.0
Re: Assert("vacrel->eager_scan_remaining_successes > 0")
On Tue, May 20, 2025 at 5:23 PM Masahiko Sawada wrote: > > On Fri, May 2, 2025 at 11:59 AM Masahiko Sawada > wrote: > > > > However, there is a possibility that we have already eagerly scanned > > another page and returned it to the read stream before we freeze the > > eagerly-scanned page and disable the eager scan. In this case, the > > next block that we retrieved from the read stream could also be an > > eagerly-scanned page. > > I've added it to Open Items for v18. > > If I understand correctly, there's a potential scenario where we might > eagerly scan more pages than permitted by the success and failure > caps. One question is: what is the practical magnitude of this excess > scanning? If the overflow could be substantial (for instance, eagerly > scanning 30% of table pages), we should consider revising our eager > scanning mechanism. Thanks for reporting this. Sorry I missed it initially. I need to do some more investigation, but IIUC you are saying that this is an interaction between the read stream and eager scan code? I tried to confirm that was the case by setting io_combine_limit and maintenance_io_concurrency to 1, which should be similar behavior to not using the read stream WRT read combining etc. But, even doing so, your repro tripped the assert. What made you suspect an interaction with the read stream? - Melanie
Re: Assert("vacrel->eager_scan_remaining_successes > 0")
On Tue, May 20, 2025 at 3:22 PM Melanie Plageman wrote: > > > On Tue, May 20, 2025 at 5:23 PM Masahiko Sawada wrote: >> >> >> On Fri, May 2, 2025 at 11:59 AM Masahiko Sawada >> wrote: >> > >> > However, there is a possibility that we have already eagerly scanned >> > another page and returned it to the read stream before we freeze the >> > eagerly-scanned page and disable the eager scan. In this case, the >> > next block that we retrieved from the read stream could also be an >> > eagerly-scanned page. >> >> I've added it to Open Items for v18. >> >> If I understand correctly, there's a potential scenario where we might >> eagerly scan more pages than permitted by the success and failure >> caps. One question is: what is the practical magnitude of this excess >> scanning? If the overflow could be substantial (for instance, eagerly >> scanning 30% of table pages), we should consider revising our eager >> scanning mechanism. > > > Thanks for reporting this. Sorry I missed it initially. > > I need to do some more investigation, but IIUC you are saying that this is an > interaction between the read stream and eager scan code? Yes. > I tried to confirm that was the case by setting io_combine_limit and > maintenance_io_concurrency to 1, which should be similar behavior to not > using the read stream WRT read combining etc. But, even doing so, your repro > tripped the assert. What made you suspect an interaction with the read stream? While I haven't identified how exactly the read stream is related to this issue, what I've observed through debugging is that, during a single read_stream_next_buffer() call, heap_vac_scan_next_block() is invoked multiple times to return blocks to the read stream, and that we continue processing eagerly scanned pages even after the success counter reaches zero while processing the previous block. Even when I set both io_combine_limit and maintenance_io_concurrency to 1, the debug logs showed that vacuum still performs eager scanning two blocks ahead of the current block being processed. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Make wal_receiver_timeout configurable per subscription
On Tue, May 20, 2025 at 2:13 AM vignesh C wrote: > > On Tue, 20 May 2025 at 03:16, Michael Paquier wrote: > > > > On Mon, May 19, 2025 at 11:19:48AM -0400, Robert Haas wrote: > > > The advantage of Fujii-san's proposal is that it is very simple to > > > implement. A subscription option would indeed be better, but it would > > > also be considerably more complex. Why not start simple and if someone > > > wants to do the work to add something more complicated, that is fine? > > > > Logically, adding that as an option of CREATE SUBSCRIPTION would just > > be a duplication of what a connection strings are already able to do > > with "options='-c foo=fooval'", isn't it? I think there is a difference in the point that Vignesh made below; the worker can detect wal_receiver_timeout change and restart. > > > It seems to me that the issue of downgrading wal_receiver_timeout to > > become user-settable is if we're OK to allow non-superusers play with > > it in the code path where it's used currently. Knowing that physical > > WAL receivers are only spawned in a controlled manner by the startup > > process, this does not sound like an issue. > > If we set the wal_receiver_timeout configuration using ALTER ROLE for > the subscription owner's role, the apply worker will start with that > value. However, any changes made via ALTER ROLE ... SET > wal_receiver_timeout will not take effect for an already running apply > worker unless the subscription is disabled and re-enabled. In > contrast, this is handled automatically during CREATE SUBSCRIPTION, > where parameter changes are detected. Right. But given changing wal_receiver_timeout doesn't happen frequently in practice I guess this would not be a big downside of the proposed idea. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Add comment explaining why queryid is int64 in pg_stat_statements
On Tue, May 20, 2025 at 11:28:17PM +1200, David Rowley wrote: > Certainly, a bit late, yes. It basically requires implementing > unsigned types on the SQL level. I believe there are a few sunken > ships along that coastline, and plenty of history in the archives if > you want to understand some of the difficulties. To provide some more context and information beyond this reply, the latest thread that I can find on the matter is this one, from December 2024: https://www.postgresql.org/message-id/CAN3gO4sPBKbfWYK10i294u3kzsfDb4WX891FMbjLnKjMS08u7A%40mail.gmail.com It summarizes the situation nicely and contains some more general points. This particular point from Tom about numeric promotion and casting hierarchy resonates as a very good one: https://www.postgresql.org/message-id/3180774.1733588677%40sss.pgh.pa.us My own bet is that, if we don't want to lose any existing functionality, perhaps we should just be more aggressive with casting requirements on input to begin with, even if it means more work when writing queries and comes at some loss of usability. FWIW, I've wanted an unsigned in-core type more than once. It would be a lot of work, but we have a lot of things that exist in the core code that map to unsigned 32b or 64b integer values. Just naming one, WAL LSN difference calculations. XLogRecPtr is a 64b unsigned value, but its representation at SQL level is not, meaning that we'd overflow once the sign bit is reached. It's true that a system would need to live long enough to reach such LSNs, but we also have 32b integers plugged in here and there. Another one is block numbers, or OID, which is an in-core type. Having an in-core unsigned integer type would scale better than creating types mapping to every single internal core concept. -- Michael signature.asc Description: PGP signature
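For illustration, the signedness problem is easy to see at the SQL level: a 64-bit pattern whose high bit is set has no positive bigint representation, so it shows up as a negative number once reinterpreted as signed. A minimal sketch:

```sql
-- Minimal sketch: reinterpreting 64-bit patterns as signed bigint.
SELECT x'FFFFFFFFFFFFFFFF'::bit(64)::bigint;  -- -1
SELECT x'8000000000000000'::bit(64)::bigint;  -- -9223372036854775808
-- A query ID computed as uint64 can therefore appear negative when exposed
-- through a signed bigint column such as pg_stat_statements.queryid.
```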
Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
On Mon, May 12, 2025 at 4:31 PM Dmitry Koval wrote: > > Hi! > > We (with colleagues) discussed further improvements to SPLIT/MERGE > PARTITION(S). As a result of the discussion, the following answers to > the questions remained: > > 1. Who should be the owner of new partitions (SPLIT PARTITION command) > if owner of the partition being split is not the same as the current user? > a) current user (since he is the one who creates new tables); > b) the owner of the partitioned partition (since it is the owner of the > table and should continue to own it). Per https://www.postgresql.org/docs/current/sql-altertable.html, "You must own the table to use ALTER TABLE." That means the current user must own the partition being split, and the new partitions' owner should be the current user (the one who did the ALTER TABLE). > 2. Who should be the owner of the new partition (MERGE PARTITIONS > command) if the partitions being merged have different owners? > a) current user (since he is the one who creates new table); > b) merging of partitions should be forbidden if they have different owners. > Let's say: alter table whatever_range merge partitions ( whatever_range_c, whatever_range_de ) into whatever_range_cde; In this case, the current user must own whatever_range_c and whatever_range_de to perform this operation, and the current user will be the owner of whatever_range_cde.
Re: queryId constant squashing does not support prepared statements
> At the same time AFAICT there isn't much more code paths > to worry about in case of a LocationExpr as a node I can imagine there are others like value expressions, row expressions, json array expressions, etc. that we may want to also normalize. > I believe it's worth to not only to keep amount of work to support > LocationExpr as minimal as possible, but also impact on the existing > code. What I see as a problem is keeping such specific information as > the location boundaries in such a generic expression as A_Expr, where it > will almost never be used. Do I get it right, you folks are ok with > that? There are other examples of fields that are minimally used in other structs. Here is one I randomly spotted in SelectStmt such as SortClause, Limit options, etc. I think the IN list is quite a common case, otherwise we would not care as much as we do. Adding another 8 bytes to the structs does not seem like that big of a problem to me, especially since the structs will remain below 64 bytes. ``` (gdb) ptype /o A_Expr type = struct A_Expr { /* 0 | 4 */NodeTag type; /* 4 | 4 */A_Expr_Kind kind; /* 8 | 8 */List *name; /* 16 | 8 */Node *lexpr; /* 24 | 8 */Node *rexpr; /* 32 | 4 */ParseLoc location; /* XXX 4-byte padding */ /* total size (bytes): 40 */ } (gdb) ptype \o A_ArrayExpr Invalid character '\' in expression. (gdb) ptype /o A_ArrayExpr type = struct A_ArrayExpr { /* 0 | 4 */NodeTag type; /* XXX 4-byte hole */ /* 8 | 8 */List *elements; /* 16 | 4 */ParseLoc location; /* XXX 4-byte padding */ /* total size (bytes): 24 */ } ``` In general, making something like T_LocationExpr a query node seems totally wrong to me. It's not a node, but rather a temporary wrapper of some location information, and it does not seem to have any business being used by the time we get to the expression transformations. It seems very odd considering that location information consists of simple fields in the parse node itself. >I was a bit worried about not using a Node but Sami has reminded me > last week that we already have in gram.y the concept of using some > private structures to track intermediate results done by the parsing Attached is a sketch of what I mean. There is a private struct that tracks the list boundaries and this can wrap in_expr or whatever else makes sense in the future. +typedef struct ListWithBoundary +{ + Node *expr; + ParseLocstart; + ParseLocend; +} ListWithBoundary; + /* ConstraintAttributeSpec yields an integer bitmask of these flags: */ #define CAS_NOT_DEFERRABLE 0x01 #define CAS_DEFERRABLE 0x02 @@ -269,6 +276,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); struct KeyAction *keyaction; ReturningClause *retclause; ReturningOptionKind retoptionkind; + struct ListWithBoundary *listwithboundary; } +%type in_expr The values are then added to start_location/end_location ParseLoc in A_ArrayExpr and A_Expr. Doing it this way will keep changes to the parse_expr.c code to a minimum; only the IN transformation will need to set the values of the A_Expr into the final A_ArrayExpr.
-- Sami Imseih Amazon Web Services (AWS) diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 0b5652071d1..bec24aab720 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -136,6 +136,13 @@ typedef struct KeyActions KeyAction *deleteAction; } KeyActions; +typedef struct ListWithBoundary +{ + Node *expr; + ParseLocstart; + ParseLocend; +} ListWithBoundary; + /* ConstraintAttributeSpec yields an integer bitmask of these flags: */ #define CAS_NOT_DEFERRABLE 0x01 #define CAS_DEFERRABLE 0x02 @@ -269,6 +276,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); struct KeyAction *keyaction; ReturningClause *retclause; ReturningOptionKind retoptionkind; + struct ListWithBoundary *listwithboundary; } %typestmt toplevel_stmt schema_stmt routine_body_stmt @@ -523,8 +531,9 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); %type def_elem reloption_elem old_aggr_elem operator_def_elem %typedef_arg columnElem where_clause where_or_current_clause a_expr b_expr c_expr Aexp
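For readers unfamiliar with the squashing work being discussed, here is a minimal sketch of the target case (illustrative only; the exact normalized text produced by pg_stat_statements may differ):

```sql
-- A long IN list of constants...
SELECT * FROM orders WHERE status IN ('new', 'paid', 'shipped', 'cancelled');
-- ...which constant squashing aims to record as a single squashed placeholder,
-- roughly: SELECT * FROM orders WHERE status IN (...)
-- rather than one parameter symbol per constant ($1, $2, $3, $4).
```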
Re: Re: proposal: schema variables
On Tue, May 20, 2025 at 10:28:31PM +0200, Pavel Stehule wrote: > This topic is difficult, because there is no common solution. SQL/PSM is > almost dead. T-SQL (and MySQL) design is weak and cannot be used for > security. > Oracle's design is joined with just one environment. And although almost > all widely used databases have supported session variables for decades, no > one design > is perfect. Proposed design is not perfect too (it introduces possible > ambiguity) , but I think it can support most wanted use cases (can be > enhanced in future), > and it is consistent with Postgres. There are more ways to reduce risk of > unwanted ambiguity to zero. But it increases the size of the patch. One thing that I keep hearing about this feature is that this would be really helpful for migration from Oracle to PostgreSQL, helping a lot with rewrites of pl/pgsql functions. There is one page on the wiki about private variables, dating back to 2016: https://wiki.postgresql.org/wiki/CREATE_PRIVATE_VARIABLE Perhaps it would help to summarize the pros and cons of existing implementations on a wiki page, to help decide which implementation would be the most suitable? The way standards are defined is by overwriting existing standards, and we don't have one in the SQL specification. It's hard to say if there will be one at some point, but if the main SQL products around the world have one, it pretty much is a point in favor of having a standard. Another possible angle that could be taken is to invest in a proposal for the SQL committee to consider, forcing an actual standard that PostgreSQL could rely on rather than implementing one behavior only to have it become standard-incompatible a few years later. And luckily, we have Vik Fearing and Peter Eisentraut in this community who invest time and resources in this area. FWIW, I do agree with the opinion that if you want to propose rebased versions of this patch set on a periodic basis, you are free to do so. This is the core concept of an open community. In terms of my committer time, I'm pretty much already booked in terms of features planned for the next release, so I guess that I won't be able to invest time on this thread. Just saying. I know that this patch set has been discussed at FOSDEM at some point, so others may be able to comment more about that. That's just one opinion among many others. -- Michael signature.asc Description: PGP signature
Addition of %b/backend_type in log_line_prefix of TAP test logs
Hi all, While navigating through the logs in the CI, the buildfarm, or even my own machine, I've found myself spending a lot of time looking at very specific log entries to find out the PID of a specific process, sometimes mistaking a normal backend vs a logical WAL sender while looking for a PID, or just looking for an auxiliary process. I'd like to suggest the following change in Cluster.pm: - print $conf "log_line_prefix = '%m [%p] %q%a '\n"; + print $conf "log_line_prefix = '%m [%b,%p] %q%a '\n"; It is then possible to match a backend_type with a PID. That increases the quantity of the logs, of course, but I'm finding that really helpful to have. But perhaps it's only me? Thanks, -- Michael diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm index 1c11750ac1d0..3bbd4a3f7548 100644 --- a/src/test/perl/PostgreSQL/Test/Cluster.pm +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm @@ -684,7 +684,7 @@ sub init print $conf "\n# Added by PostgreSQL::Test::Cluster.pm\n"; print $conf "fsync = off\n"; print $conf "restart_after_crash = off\n"; - print $conf "log_line_prefix = '%m [%p] %q%a '\n"; + print $conf "log_line_prefix = '%m [%b,%p] %q%a '\n"; print $conf "log_statement = all\n"; print $conf "log_replication_commands = on\n"; print $conf "wal_retrieve_retry_interval = '500ms'\n"; signature.asc Description: PGP signature
Re: POC: Parallel processing of indexes in autovacuum
On Thu, May 15, 2025 at 10:10 PM Daniil Davydov <3daniss...@gmail.com> wrote: > > Hi, > > On Fri, May 16, 2025 at 4:06 AM Matheus Alcantara > wrote: > > I've reviewed the v1-0001 patch, the build on MacOS using meson+ninja is > > failing: > > ❯❯❯ ninja -C build install > > ninja: Entering directory `build' > > [1/126] Compiling C object > > src/backend/postgres_lib.a.p/utils_misc_guc_tables.c.o > > FAILED: src/backend/postgres_lib.a.p/utils_misc_guc_tables.c.o > > ../src/backend/utils/misc/guc_tables.c:3613:4: error: incompatible > > pointer to integer conversion initializing 'int' with an expression of > > type 'void *' [-Wint-conversion] > > 3613 | NULL, > > | ^~~~ > > > > Thank you for reviewing this patch! > > > It seems that the "autovacuum_reserved_workers_num" declaration on > > guc_tables.c has an extra gettext_noop() call? > > Good catch, I fixed this warning in the v2 version. > > > > > One other point is that as you've added TAP tests for the autovacuum I > > think you also need to create a meson.build file as you already create > > the Makefile. > > > > You also need to update the src/test/modules/meson.build and > > src/test/modules/Makefile to include the new test/modules/autovacuum > > path. > > > > OK, I should clarify this moment : modules/autovacuum is not a normal > test but a sandbox - just an example of how we can trigger parallel > index autovacuum. Also it may be used for debugging purposes. > In fact, 001_autovac_parallel.pl is not verifying anything. > I'll do as you asked (add all meson and Make stuff), but please don't > focus on it. The creation of the real test is still in progress. (I'll > try to complete it as soon as possible). > > In this letter I will divide the patch into 2 parts : implementation > and sandbox. What do you think about implementation? Thank you for updating the patches. I have some comments on v2-0001 patch: + { + {"autovacuum_reserved_workers_num", PGC_USERSET, RESOURCES_WORKER_PROCESSES, + gettext_noop("Number of worker processes, reserved for participation in parallel index processing during autovacuum."), + gettext_noop("This parameter is depending on \"max_worker_processes\" (not on \"autovacuum_max_workers\"). " +"*Only* autovacuum workers can use these additional processes. " +"Also, these processes are taken into account in \"max_parallel_workers\"."), + }, + &av_reserved_workers_num, + 0, 0, MAX_BACKENDS, + check_autovacuum_reserved_workers_num, NULL, NULL + }, I find that the name "autovacuum_reserved_workers_num" is generic. It would be better to have a more specific name for parallel vacuum such as autovacuum_max_parallel_workers. This parameter is related to neither autovacuum_worker_slots nor autovacuum_max_workers, which seems fine to me. Also, max_parallel_maintenance_workers doesn't affect this parameter. Which number does this parameter mean to specify: the maximum number of parallel vacuum workers that can be used during autovacuum or the maximum number of parallel vacuum workers that each autovacuum can use? --- The patch includes the changes to bgworker.c so that we can reserve some slots for autovacuums. I guess that this change is not necessarily necessary because if the user sets the related GUC parameters correctly the autovacuum workers can use parallel vacuum as expected. Even if we need this change, I would suggest implementing it as a separate patch. 
--- + { + { + "parallel_idx_autovac_enabled", + "Allows autovacuum to process indexes of this table in parallel mode", + RELOPT_KIND_HEAP, + ShareUpdateExclusiveLock + }, + false + }, The proposed reloption name doesn't align with our naming conventions: looking at our existing reloptions, we typically write out full words rather than using abbreviations such as 'autovac' or 'idx'. I guess we can implement this parameter as an integer parameter so that the user can specify the number of parallel vacuum workers for the table. For example, we can have a reloption autovacuum_parallel_workers. Setting 0 (by default) means to disable parallel vacuum during autovacuum, and setting the special value -1 means to let PostgreSQL calculate the parallel degree for the table (same as the default VACUUM command behavior). I've also considered some alternative names. If we were to use parallel_maintenance_workers, it sounds like it controls the parallel degree for all operations using max_parallel_maintenance_workers, including CREATE INDEX. Similarly, vacuum_parallel_workers could be interpreted as affecting both autovacuum and manual VACUUM commands, suggesting that when users run "VACUUM (PARALLEL) t", the sys
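To make the naming proposals above concrete, here is a hypothetical usage sketch; neither the autovacuum_parallel_workers reloption nor the autovacuum_max_parallel_workers GUC exists yet, these are only the names suggested in this review:

```sql
-- Hypothetical reloption, per the suggestion above:
ALTER TABLE big_table SET (autovacuum_parallel_workers = 4);   -- fixed parallel degree
ALTER TABLE big_table SET (autovacuum_parallel_workers = -1);  -- let the server decide
ALTER TABLE big_table SET (autovacuum_parallel_workers = 0);   -- disabled (default)

-- Hypothetical GUC capping the parallel vacuum workers that autovacuum may use in total:
ALTER SYSTEM SET autovacuum_max_parallel_workers = 8;
```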
Re: Please update the pgconf.dev Unconference notes
Thanks a lot! On Tue, May 20, 2025 at 9:42 AM Tomas Vondra wrote: > > On 5/20/25 08:49, Hannu Krosing wrote: > > Hi to all note-takers > > > > (I added two who I *think* I remember took notes) > > > > Please upload the notes to the Unconference section in > > https://wiki.postgresql.org/wiki/PGConf.dev_2025 > > > > I have also some random notes on scraps of paper from the two sessions > > I attended and did not present and would like to add these if I think > > something relevant is missing once the main notes are there > > > > I only took notes during the "better logging" session, and I already > posted those to the wiki. The discussion moved too quickly in different > directions for me to catch all the details, so the notes have gaps etc. > If others can improve that / clarify, that'd be great. > > > regards > > -- > Tomas Vondra >
Re: Addition of %b/backend_type in log_line_prefix of TAP test logs
On Wed, May 21, 2025 at 8:51 AM Michael Paquier wrote: > > Hi all, > > While navigating through the logs in the CI, the buildfarm, or even my > own machine, I've found myself spending a lot of time looking at > very specific log entries to find out the PID of a specific process, > sometimes mistaking a normal backend vs a logical WAL sender while > looking for a PID, or just looking for an auxiliary process. > > I'd like to suggest the following change in Cluster.pm: > - print $conf "log_line_prefix = '%m [%p] %q%a '\n"; > + print $conf "log_line_prefix = '%m [%b,%p] %q%a '\n"; +1 to this change. Since pg_regress uses log_line_prefix = '%m %b[%p] %q%a ', isn't it better to use the same setting in TAP tests as well? Regards, -- Fujii Masao
Re: Minor adjustment to pg_aios output naming
On Wed, May 21, 2025 at 11:14:11AM +0900, torikoshia wrote: > Hi, > > I've noticed a minor inconsistency in the output of pg_aios. > > According to the documentation, the values for 'operation' are described as: > > readv, a vectored read > ... > writev, a vectored write > > However, in actual output, they appear as read and write -- without the > trailing v: > > =# select operation from pg_aios; > .. > operation | read > > While this discrepancy is unlikely to cause any real issues, it would be > better to keep the naming consistent. > > I was a bit unsure which form to align to. > There are currently no other types of read/write operations in pg_aios, so > the shorter form might have been sufficient. > However, using the 'v'-suffixed names makes the vectored nature of these > operations explicit and future-proofs > the naming in case other variants are introduced later. > > What do you think? --- a/src/backend/storage/aio/aio_io.c +++ b/src/backend/storage/aio/aio_io.c @@ -181,9 +181,9 @@ pgaio_io_get_op_name(PgAioHandle *ioh) case PGAIO_OP_INVALID: return "invalid"; case PGAIO_OP_READV: -return "read"; +return "readv"; case PGAIO_OP_WRITEV: -return "write"; +return "writev"; } I think that your suggested fix is right. The properties are marked as "WRITEV" and "READV" for vectored operations. So the documentation is right, not the name used in the code. Will fix, thanks for the report. -- Michael signature.asc Description: PGP signature
Re: Addition of %b/backend_type in log_line_prefix of TAP test logs
On Wed, May 21, 2025 at 11:41:18AM +0900, Fujii Masao wrote: > Since pg_regress uses log_line_prefix = '%m %b[%p] %q%a ', > isn't it better to use the same setting in TAP tests as well? Oh, right, good point. I've managed to miss this part in pg_regress.c. Using the same value makes even more sense. -- Michael signature.asc Description: PGP signature
Re: [BUG] Skipped initialization of some xl_xact_parsed_prepare fields
On Thu, May 15, 2025 at 8:30 PM Fujii Masao wrote: > > I suppose to add initialization of `nstats` and `nabortstats` to > > ParsePrepareRecord (see attached patch). > > LGTM. Barring any objection, I will commit this patch. I've pushed the patch. Thanks! Regards, -- Fujii Masao
Re: domain over virtual generated column
On Mon, Apr 28, 2025 at 10:45 AM jian he wrote: > > summary of attached patch: > v1-0001 > we need to compute the generation expression for the domain with constraints, > thus rename ExecComputeStoredGenerated to ExecComputeGenerated. > > v1-0002 > soft error variant of ExecPrepareExpr, ExecInitExpr. for soft error > processing > of CoerceToDomain. we don't want error messages like "value for domain d2 > violates check constraint "d2_check"" while validating existing domain data, > we > want something like: > ERROR: column "b" of table "gtest24" contains values that violate the > new constraint > > v1-0003 virtual generation columns over domain. hi. new patches attached. mainly code refactoring, also try to reduce some unnecessary regress tests. From ffd307605c1cfadd5a28090f90d54dceb75e6bc1 Mon Sep 17 00:00:00 2001 From: jian he Date: Wed, 21 May 2025 10:59:07 +0800 Subject: [PATCH v2 1/3] rename ExecComputeStoredGenerated to ExecComputeGenerated to support virtual generated column over domain type, we actually need compute the virtual generated expression. we did it at ExecComputeGenerated for stored, we can use it for virtual. so do the rename. discussion: https://postgr.es/m/CACJufxFNUHxuSE=g20c2alo3d+4t_j9h0x8esmirzcc4frl...@mail.gmail.com --- src/backend/commands/copyfrom.c| 4 ++-- src/backend/executor/execReplication.c | 8 src/backend/executor/nodeModifyTable.c | 20 ++-- src/include/executor/nodeModifyTable.h | 5 ++--- 4 files changed, 18 insertions(+), 19 deletions(-) diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index fbbbc09a97b..906b6581e11 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -1346,8 +1346,8 @@ CopyFrom(CopyFromState cstate) /* Compute stored generated columns */ if (resultRelInfo->ri_RelationDesc->rd_att->constr && resultRelInfo->ri_RelationDesc->rd_att->constr->has_generated_stored) - ExecComputeStoredGenerated(resultRelInfo, estate, myslot, - CMD_INSERT); + ExecComputeGenerated(resultRelInfo, estate, myslot, + CMD_INSERT); /* * If the target is a plain table, check the constraints of diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c index 53ddd25c42d..ca300ac0f00 100644 --- a/src/backend/executor/execReplication.c +++ b/src/backend/executor/execReplication.c @@ -587,8 +587,8 @@ ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo, /* Compute stored generated columns */ if (rel->rd_att->constr && rel->rd_att->constr->has_generated_stored) - ExecComputeStoredGenerated(resultRelInfo, estate, slot, - CMD_INSERT); + ExecComputeGenerated(resultRelInfo, estate, slot, + CMD_INSERT); /* Check the constraints of the tuple */ if (rel->rd_att->constr) @@ -684,8 +684,8 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo, /* Compute stored generated columns */ if (rel->rd_att->constr && rel->rd_att->constr->has_generated_stored) - ExecComputeStoredGenerated(resultRelInfo, estate, slot, - CMD_UPDATE); + ExecComputeGenerated(resultRelInfo, estate, slot, + CMD_UPDATE); /* Check the constraints of the tuple */ if (rel->rd_att->constr) diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index 46d533b7288..ab41ad8a8bc 100644 --- a/src/backend/executor/nodeModifyTable.c +++ b/src/backend/executor/nodeModifyTable.c @@ -537,12 +537,12 @@ ExecInitGenerated(ResultRelInfo *resultRelInfo, } /* - * Compute stored generated columns for a tuple + * Compute generated columns for a tuple. 
+ * we might support virtual generated column in future, currently not. */ void -ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo, - EState *estate, TupleTableSlot *slot, - CmdType cmdtype) +ExecComputeGenerated(ResultRelInfo *resultRelInfo, EState *estate, + TupleTableSlot *slot, CmdType cmdtype) { Relation rel = resultRelInfo->ri_RelationDesc; TupleDesc tupdesc = RelationGetDescr(rel); @@ -931,8 +931,8 @@ ExecInsert(ModifyTableContext *context, */ if (resultRelationDesc->rd_att->constr && resultRelationDesc->rd_att->constr->has_generated_stored) - ExecComputeStoredGenerated(resultRelInfo, estate, slot, - CMD_INSERT); + ExecComputeGenerated(resultRelInfo, estate, slot, + CMD_INSERT); /* * If the FDW supports batching, and batching is requested, accumulate @@ -1058,8 +1058,8 @@ ExecInsert(ModifyTableContext *context, */ if (resultRelationDesc->rd_att->constr && resultRelationDesc->rd_att->constr->has_generated_stored) - ExecComputeStoredGenerated(resultRelInfo, estate, slot, - CMD_INSERT); + ExecComputeGenerated(resultRelInfo, estate, slot, + CMD_INSERT); /* * Check any RLS WITH CHECK policies. @@ -2146,8 +2146,8 @@ ExecUpdatePrepareSlot(ResultRelInfo *re
Re: Assertion failure in smgr.c when using pg_prewarm with partitioned tables
On Mon, May 19, 2025 at 5:18 PM Masahiro Ikeda wrote: > > Thanks for your work and feedback! > > I've updated the patches and added regular regression tests for > both pg_prewarm and amcheck. Thanks for updating the patches! Regarding the 0001 patch: +CREATE TABLE test_part1 PARTITION OF test FOR VALUES FROM (1) TO (1000); +INSERT INTO test SELECT generate_series(1, 100); These lines don't seem necessary for the test. How about removing them? It would simplify the test and slightly reduce the execution time - though the time savings are very minimal. +-- To specify the relation which does not have storage should fail. This comment could be clearer as: "pg_prewarm() should fail if the target relation has no storage." + /* Check that the storage exists. */ Maybe rephrase to: "Check that the relation has storage."? Regarding the 0002 patch: - errdetail("Relation \"%s\" is a %s index.", -RelationGetRelationName(rel), NameStr(((Form_pg_am) GETSTRUCT(amtuprel))->amname; + errdetail("Relation \"%s\" is a %s %sindex.", +RelationGetRelationName(rel), NameStr(((Form_pg_am) GETSTRUCT(amtuprel))->amname), +(rel->rd_rel->relkind == RELKIND_PARTITIONED_INDEX) ? "partitioned " : ""))); Instead of manually building the message, how about using errdetail_relkind_not_supported()? It would be more consistent with similar error reporting elsewhere. > I agree with back-patching v3-0001. I was able to reproduce the issue > on the REL_17_STABLE branch. Thanks for the confirmation. > One concern is that this patch changes > the error message in production: > > * v17.5 (without --enable-cassert) > > ERROR: fork "main" does not exist for this relation > > * REL_17_STABLE with the v3-0001 patch (without --enable-cassert) > > ERROR: relation "test" does not have storage > > However, I think preventing the assertion failure should take priority. Yes. Regards, -- Fujii Masao
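As a minimal sketch of the case under discussion (table names are illustrative, not the exact regression test):

```sql
-- A partitioned table has no storage of its own, so prewarming it should
-- raise an error rather than trip an assertion.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
CREATE TABLE test (a int) PARTITION BY RANGE (a);
SELECT pg_prewarm('test');  -- expected: ERROR, the relation has no storage
```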
Re: Add comment explaining why queryid is int64 in pg_stat_statements
On Tue, May 20, 2025 at 10:35:39AM -0500, Sami Imseih wrote: > 1/ this was missed in pg_stat_statements > ./contrib/pg_stat_statements/pg_stat_statements.c:uint64 > saved_queryId = pstmt->queryId; Right. Some greps based on "queryid" and "query_id" point that this is the last reference to uint in the tree. > 2/ Should "DatumGetInt64(hash_any_extended(.." be turned into a > macro since it's used in > several places? Also, we can add a comment in the macro such as > " > Output the queryId as an int64 rather than a uint64, to match the > BIGINT column used to emit > queryId in system views > " We are talking about two places in queryjumblefuncs.c. Leaving the code as in David's original proposal is fine IMO. Adding a comment about the implied uint64 -> int64 conversion when calling hash_any_extended() sounds like a good idea to document what we want from the hash. I've had a closer look at David's patch, and I did not spot other places that required similar adjustments. I may have missed something, of course. David, how would you like to move on with this patch set? I can take responsibility for both patches as the plan ID change can qualify as an open item. -- Michael From 64df69093c52a17287aabb8551e8ab2b62d1b16a Mon Sep 17 00:00:00 2001 From: David Rowley Date: Tue, 20 May 2025 12:43:18 +1200 Subject: [PATCH v2 1/2] Change internal queryid type to int64 Previously, this was uint64, which perhaps was chosen in cff440d36 as the previous type was uint32 prior to that widening. Having this as uint64 does require us to have to cast the value in various places since the value is output in its signed form in various places, such as EXPLAIN VERBOSE and in the pg_stat_statements.queryid column. We should likely note down this change in the release notes "Source Code" section as some extensions may wish to adjust their code. Discussion: https://postgr.es/m/50cb0c8b-994b-48f9-a1c4-13039eb35...@eisentraut.org --- src/include/nodes/parsenodes.h| 2 +- src/include/nodes/plannodes.h | 2 +- src/include/utils/backend_status.h| 6 +-- src/backend/access/brin/brin.c| 2 +- src/backend/access/nbtree/nbtsort.c | 2 +- src/backend/commands/explain.c| 8 +--- src/backend/commands/vacuumparallel.c | 2 +- src/backend/nodes/gen_node_support.pl | 5 +++ src/backend/nodes/outfuncs.c | 6 +++ src/backend/nodes/queryjumblefuncs.c | 28 -- src/backend/nodes/readfuncs.c | 6 +++ src/backend/rewrite/rewriteHandler.c | 2 +- src/backend/tcop/postgres.c | 4 +- src/backend/utils/activity/backend_status.c | 12 +++--- src/backend/utils/adt/pgstatfuncs.c | 4 +- .../pg_stat_statements/pg_stat_statements.c | 38 +-- 16 files changed, 73 insertions(+), 56 deletions(-) diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index 4610fc61293b..b976a40a75d6 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -128,7 +128,7 @@ typedef struct Query * might not be set; also not stored. This is the result of the query * jumble, hence ignored. */ - uint64 queryId pg_node_attr(equal_ignore, query_jumble_ignore, read_write_ignore, read_as(0)); + int64 queryId pg_node_attr(equal_ignore, query_jumble_ignore, read_write_ignore, read_as(0)); /* do I set the command result tag? 
*/ bool canSetTag pg_node_attr(query_jumble_ignore); diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h index 658d76225e47..56cfdca074de 100644 --- a/src/include/nodes/plannodes.h +++ b/src/include/nodes/plannodes.h @@ -53,7 +53,7 @@ typedef struct PlannedStmt CmdType commandType; /* query identifier (copied from Query) */ - uint64 queryId; + int64 queryId; /* plan identifier (can be set by plugins) */ uint64 planId; diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h index 430ccd7d78e4..bbebe517501f 100644 --- a/src/include/utils/backend_status.h +++ b/src/include/utils/backend_status.h @@ -170,7 +170,7 @@ typedef struct PgBackendStatus int64 st_progress_param[PGSTAT_NUM_PROGRESS_PARAM]; /* query identifier, optionally computed using post_parse_analyze_hook */ - uint64 st_query_id; + int64 st_query_id; /* plan identifier, optionally computed using planner_hook */ uint64 st_plan_id; @@ -321,7 +321,7 @@ extern void pgstat_clear_backend_activity_snapshot(void); /* Activity reporting functions */ extern void pgstat_report_activity(BackendState state, const char *cmd_str); -extern void pgstat_report_query_id(uint64 query_id, bool force); +extern void pgstat_report_query_id(int64 query_id, bool force); extern void pgstat_report_plan_id(uint64 plan_id, bool force); extern void pgstat_report_tempfile(size_t filesize); extern void pgstat_report_appname(const char *appname); @@ -329
Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
On Wed, May 21, 2025 at 12:45 AM Masahiko Sawada wrote: > > On Mon, May 19, 2025 at 2:05 AM Amit Kapila wrote: > > > > On Sun, May 18, 2025 at 1:09 AM Masahiko Sawada > > wrote: > > > > > > On Sat, May 10, 2025 at 7:08 AM Amit Kapila > > > wrote: > > > > > > > > > > > > Can we have a parameter like immediately_reserve in > > > > create_logical_slot API, similar to what we have for physical slots? > > > > We need to work out the details, but that should address the kind of > > > > use case you are worried about, unless I am missing something. > > > > > > Interesting idea. One concern in my mind is that in the use case I > > > mentioned above, users would need to carefully manage the extra > > > logical slot to keep the logical decoding active. The logical decoding > > > is deactivated on the standby as soon as users drop all logical slots > > > on the primary. > > > > > > > Yes, but the same is true for a physical slot in the case of physical > > replication used via primary_slot_name parameter. > > Could you elaborate on this? > I am trying to correlate this with the case where the standby no longer needs the physical slot for some reason, like the standby machine failing, or say someone uses pg_createsubscriber on the standby to make it a subscriber, etc. In such a case, the user needs to manually remove the physical slot on the primary. There is a difference between the two cases, but the point is that one may need to manage a physical slot as well. > > I recently had a discussion with Ashtosh at PGConf.dev regarding an > alternative approach: introducing a new command syntax such as "ALTER > SYSTEM UPDATE wal_level TO 'logical'". In his presentation[1], he > outlined this proposed command as a means to modify specific GUC > parameters synchronously. The backend executing this command would > manage the transition, allowing users to interrupt the process via > Ctrl-C if necessary. In the specific context of wal_level change, this > command could be designed to reject operations like "ALTER SYSTEM > UPDATE wal_level TO 'minimal'" with an error, effectively preventing > undesirable wal_level transitions to or from 'minimal'. While this > approach shares similarities with our previous proposal of > implementing a dedicated SQL function for WAL level modifications, it > offers a more standardized interface for users. > > Though I find merit in this proposal, I remain uncertain about its > implementation details and whether it represents the optimal solution > for online wal_level changes, particularly given that our current > approach of automatic WAL level adjustment appears viable. > Yeah, I find the idea that the presence of a logical slot will allow the user to enable logical decoding/replication more appealing than this new alternative, leaving aside the challenges of realizing it. -- With Regards, Amit Kapila.
Re: Fix slot synchronization with two_phase decoding enabled
On Tue, May 6, 2025 at 4:52 PM Zhijie Hou (Fujitsu) wrote: > > On Mon, May 5, 2025 at 6:59 PM Amit Kapila wrote: > > > > On Sun, May 4, 2025 at 2:33 PM Masahiko Sawada > > wrote: > > > > > > While I cannot be entirely certain of my analysis, I believe the root > > > cause might be related to the backward movement of the confirmed_flush > > > LSN. The following scenario seems possible: > > > > > > 1. The walsender enables the two_phase and sets two_phase_at (which > > > should be the same as confirmed_flush). > > > 2. The slot's confirmed_flush regresses for some reason. > > > 3. The slotsync worker retrieves the remote slot information and > > > enables two_phase for the local slot. > > > > > > > Yes, this is possible. Here is my theory as to how it can happen in the > > current > > case. In the failed test, after the primary has prepared a transaction, the > > transaction won't be replicated to the subscriber as two_phase was not > > enabled for the slot. However, subsequent keepalive messages can send the > > latest WAL location to the subscriber and get the confirmation of the same > > from > > the subscriber without its origin being moved. Now, after we restart the > > apply > > worker (due to disable/enable for a subscription), it will use the previous > > origin_lsn to temporarily move back the confirmed flush LSN as explained in > > one of the previous emails in another thread [1]. During this temporary > > movement of confirm flush LSN, the slotsync worker fetches the two_phase_at > > and confirm_flush_lsn values, leading to the assertion failure. We see this > > issue intermittently because it depends on the timing of slotsync worker's > > request to fetch the slot's value. > > Based on this theory, I can reproduce the BF failure in the 040 tap-test on > HEAD after applying the 0001 patch. This is achieved by using the injection > point to stop the walsender from sending a keepalive before receiving the old > origin position from the apply worker, ensuring the confirmed_flush > consistently moves backward before slotsync. > > Additionally, I've reproduced the duplicate data issue on HEAD without > slotsync > using the attached script (after applying the injection point patch). This > issue arises if we immediately disable the subscription after the > confirm_flush_lsn moves backward, preventing the walsender from advancing the > confirm_flush_lsn. > > In this case, if a prepared transaction exists before two_phase_at, then after > re-enabling the subscription, it will replicate that prepared transaction when > decoding the PREPARE record and replicate that again when decoding the COMMIT > PREPARED record. In such cases, the apply worker keeps reporting the error: > > ERROR: transaction identifier "pg_gid_16387_755" is already in use. > > Apart from above, we're investigating whether the same issue can occur in > back-branches and will share the results once ready. > The issue was confirmed to occur on back branches as well, due to confirmed_flush_lsn moving backward. It has now been fixed on HEAD and all supported back-branches down to PG13. For details, refer to the separate thread [1]; the fix was committed (commit: ad5eaf3)[2]. The BF failure has not occurred since the fix, but we’ll continue to keep an eye. [1] https://www.postgresql.org/message-id/CAJpy0uDZ29P=BYB1JDWMCh-6wXaNqMwG1u1mB4=10ly0x7h...@mail.gmail.com [2] https://github.com/postgres/postgres/commit/ad5eaf390c58294e2e4c1509aa87bf13261a5d15 -- Thanks, Nisha
Re: fix: propagate M4 env variable to flex subprocess
Hi, On Tue, 13 May 2025 at 18:54, Andres Freund wrote: > > Hi, > > On 2025-05-12 23:14:59 -0400, J. Javier Maestro wrote: > > The pgflex wrapper runs flex with an explicit environment, so it doesn't > > inherit environment variables from the parent process. However, flex can > > use the M4 env variable and/or the PATH (via execvp) to find the m4 macro > > processor. > > > Thus, since flex honors the M4 env variable, it should be propagated to the > > subprocess environment if it's set in the parent environment. This patch > > fixes it. > > I think it probably was not intentional to fully specify the environment, > rather than just *adding* FLEX_TMP_DIR to the caller's environment. Bilal, I > think you wrote this originally, do you recall? I found the original pull request [1] but it does not include the 'FLEX_TMP_DIR' part. I tried to find where the 'FLEX_TMP_DIR' part was added [2]; it seems that it was added while rebasing, so there is no specific commit. [1] https://github.com/anarazel/postgres/pull/51 [2] https://github.com/anarazel/postgres/commit/cd817afd4ab006b90307a940d96b5116d649165c -- Regards, Nazir Bilal Yavuz Microsoft
Re: Regression in statement locations
On Tue, May 20, 2025 at 5:59 AM Michael Paquier wrote: > Once I have fixed that, I've been a bit puzzled by the difference in > output in the tests of pg_overexplain, but I think that the new output > is actually the correct one: the queries whose plan outputs have > changed are passed as arguments of the explain_filter() function, > hence the location of the inner queries point at the start location of > the inner query instead of the start of the top-level query. Note > that if you add a semicolon at the end of these three queries in the > pg_overexplain tests, we finish with an end location reported. Indeed, by dumping more details on the query string and parsed string in pg_overexplain, the reported location does match the inner SELECT. This seems appropriate since it is for the planned select statement. Executor Parameter Types: none Query String: EXPLAIN (DEBUG, RANGE_TABLE, COSTS OFF) SELECT genus, array_agg(name ORDER BY name) FROM vegetables GROUP BY genus Parse Location: 41 to end Parsed String: SELECT genus, array_agg(name ORDER BY name) FROM vegetables GROUP BY genus Looking at the tests, there are 2 additional empty DO blocks: +DO $$ +DECLARE BEGIN +END $$; What's the point of those? They won't be visible in the output since we have 'toplevel IS FALSE' in the pg_stat_statements calls and they don't fit the "DO block --- multiple inner queries with separators". Other than that, the patch looks good.
Re: wrong query results on bf leafhopper
On 5/20/25 07:50, David Rowley wrote: > On Tue, 20 May 2025 at 16:07, Tom Lane wrote: >> Failures like this one [1]: >> >> @@ -340,9 +340,13 @@ >> create function myinthash(myint) returns integer strict immutable language >>internal as 'hashint4'; >> NOTICE: argument type myint is only a shell >> +ERROR: ROWS is not applicable when function does not return a set >> >> are hard to explain as anything besides "that machine is quite >> broken". Whether it's flaky hardware, broken compiler, or what is >> undeterminable from here, but I don't believe it's our bug. So I'm >> unexcited about putting effort into it. > > There are certainly much fewer moving parts in PostgreSQL code for > that one as this failure doesn't seem to rely on anything stored in > any tables or the catalogues. > > I'd have thought it would be unlikely to be a compiler bug as wouldn't > that mean it'd fail every time? > > Are there any Prime95-like stress testers for ARM that could be run on > this machine? > > It would be good to kick this one out the pool if there's hardware issues. > There are tools like "stress" and "stressant", etc. Works on my rpi5, but depends on the packager. I'd probably just look at dmesg first. In my experience hardware issues are often pretty visible there - reports of failed I/O requests, thermal issues on the CPU, that kind of stuff. regards -- Tomas Vondra
Re: Should we optimize the `ORDER BY random() LIMIT x` case?
Nico, > > Can the detection of such queries be done by the yacc/bison parser > > grammar? > > Maybe the `sortby` rule could check if the expression is `random()`, > then `sort_clause` could check if `$3` is a one-item `sortby_list` of > just `random()` and mark `$$` as special -- this should be cheap, yes? > We'd still need to check for `LIMIT` somewhere else. Although partially implementing an optimizer on the parser level is possible, it doesn't strike me as the right long-term solution. Firstly, it's a hacky solution: it might work for random() but, for some reason, not for -random()*2. Secondly, imagine adding a dozen optimizations like this in the upcoming years, all of them interacting with each other. Imagine you are the person who has to maintain this and not break things when adding another optimization. All in all, this would be a poor design choice in my opinion. -- Best regards, Aleksander Alekseev
Re: generic plans and "initial" pruning
On 5/20/25 05:06, Tom Lane wrote: > Amit Langote writes: >> Pushed after some tweaks to comments and the test case. > > My attention was drawn to commit 525392d57 after observing that > Valgrind complained about a memory leak in some code that commit added > to BuildCachedPlan(). I tried to make sense of said code so I could > remove the leak, and eventually arrived at the attached patch, which > is part of a series of leak-fixing things hence the high sequence > number. > > Unfortunately, the bad things I speculated about in the added comments > seem to be reality. The second attached file is a test case that > triggers > > ... FYI I added this as a PG18 open item: https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items regards -- Tomas Vondra
Re: Add comment explaining why queryid is int64 in pg_stat_statements
On Tue, 20 May 2025 at 18:12, Julien Rouhaud wrote: > I don't think it was discussed, but why not go the other way, keep everything > as uint64 and expose an uint64 datatype at the SQL level instead? That's > something I actually want pretty often so I would be happy to have that > natively, whether it's called bigoid, oid8 or anything else. That's probably > too late for pg18 though. Certainly, a bit late, yes. It basically requires implementing unsigned types on the SQL level. I believe there are a few sunken ships along that coastline, and plenty of history in the archives if you want to understand some of the difficulties. David
Re: [PATCH] Allow parallelism for plpgsql return expression after commit 556f7b7
> On Tue, May 20, 2025 at 11:57 AM Dilip Kumar wrote: > > On Mon, May 5, 2025 at 11:19 AM DIPESH DHAMELIYA > wrote: > > > > Hello everyone, > > > > With the commit 556f7b7bc18d34ddec45392965c3b3038206bb62, Any plpgsql > > function that returns scalar value would not be able to use parallelism to > > evaluate a return statement. It will not be considered for parallel > > execution because we are passing maxtuples = 2 to exec_run_select from > > exec_eval_expr to evaluate the return expression of the function. > > > I could not find commit '556f7b7bc18d34ddec45392965c3b3038206bb62' in > git log on the master branch, but here is my analysis after looking at > your patch. Here is the github link to commit - https://github.com/postgres/postgres/commit/556f7b7bc18d34ddec45392965c3b3038206bb62 and discussion - https://www.postgresql.org/message-id/flat/20241206062549.710dc01cf91224809dd6c0e1%40sraoss.co.jp > > I don't think we can remove the 'maxtuples' parameter from > exec_run_select(). In this particular case, the query itself is > returning a single tuple, so we are good. Still, in other cases where > the query returns more tuples, it makes sense to stop the execution as > soon as we have got enough tuples otherwise, it will do the execution > until we produce all the tuples. Consider the below example where we > just need to use the first tuple, but if we apply your patch, the > executor will end up processing all the tuples, and it will impact the > performance. So IMHO, the benefit you get by enabling a parallelism > in some cases may hurt badly in other cases, as you will end up > processing more tuples than required. > > CREATE OR REPLACE FUNCTION get_first_user_email() > RETURNS TEXT AS $$ > DECLARE > user_email TEXT; > BEGIN > user_email = (SELECT email FROM users); > RETURN user_email; > END; > $$ LANGUAGE plpgsql; > I understand, but aren't we blocking parallelism for genuine cases with a very complex query where parallelism can help improve execution time to some extent? Users can always rewrite a query (for example, using a LIMIT clause) if they are expecting one tuple to be returned.
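For instance, the earlier example could be rewritten so the subquery itself is bounded and the executor stops after one tuple, independently of how the return expression is planned (a sketch only):

```sql
CREATE OR REPLACE FUNCTION get_first_user_email()
RETURNS TEXT AS $$
DECLARE
    user_email TEXT;
BEGIN
    -- Bound the subquery explicitly instead of relying on maxtuples = 2
    -- in exec_run_select() to cut execution short.
    user_email := (SELECT email FROM users LIMIT 1);
    RETURN user_email;
END;
$$ LANGUAGE plpgsql;
```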
cookConstraint dead code
hi. in cookConstraint /* * Make sure no outside relations are referred to (this is probably dead * code now that add_missing_from is history). */ if (list_length(pstate->p_rtable) != 1) ereport(ERROR, (errcode(ERRCODE_INVALID_COLUMN_REFERENCE), errmsg("only table \"%s\" can be referenced in check constraint", relname))); It looks like this is indeed dead code, because we cannot use a subquery in a check constraint, and if you want to reference another table, you need to mention it explicitly, like: CREATE TABLE t3 (a int CHECK ( (a > v3.a))); Here "v3.a" is a ColumnRef node, and transformColumnRef/refnameNamespaceItem will error out if there is no table with that name specified in the FROM clause.
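A quick illustration of why that path looks unreachable (a sketch; the exact error text may vary across versions): both ways of referring to another relation are rejected before cookConstraint's check is reached.

```sql
-- A subquery is rejected while transforming the expression:
CREATE TABLE t3 (a int CHECK (a > (SELECT max(a) FROM v3)));
-- ERROR:  cannot use subquery in check constraint

-- A direct column reference to another relation fails in transformColumnRef,
-- since only the table being defined is in the range table:
CREATE TABLE t4 (a int CHECK (a > v3.a));
-- ERROR:  missing FROM-clause entry for table "v3"
```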
Re: Log connection establishment timings
> On 20 May 2025, at 10:52, Peter Eisentraut wrote: > src/test/ssl/t/SSL/Server.pm While I don't have an immediate use case or test in mind, having the SSL tests use log_connections=all could be handy when hacking on the TLS support. -- Daniel Gustafsson
Re: Conflict detection for update_deleted in logical replication
On Tue, May 20, 2025 at 8:38 AM shveta malik wrote: > > Please find few more comments: > > 1) > ProcessStandbyPSRequestMessage: > + /* > + * This shouldn't happen because we don't support getting primary status > + * message from standby. > + */ > + if (RecoveryInProgress()) > + elog(ERROR, "the primary status is unavailable during recovery"); > > > a) This error is not clear. Is it supposed to be user oriented error > or internal error? As a user, it is difficult to interpret this error > and take some action. > > b) What I understood is that there is no user of enabling > 'retain_conflict_info' for a subscription which is subscribing to a > publisher which is physical standby too. So shall we emit such an > ERROR during 'Create Sub(retain_conflict_info=on)' itself? But that > would need checking whether the publisher is physical standby or not > during CREATE-SUB. Is that a possibility? > > 2) > > -- > send_feedback(last_received, requestReply, requestReply); > > + maybe_advance_nonremovable_xid(&data, false); > + > /* > * Force reporting to ensure long idle periods don't lead to > * arbitrarily delayed stats. Stats can only be reported outside > -- > > Why do we need this call to 'maybe_advance_nonremovable_xid' towards > end of LogicalRepApplyLoop() i.e. the last call? Can it make any > further 'data.phase' change here? IIUC, there are 2 triggers for > 'data.phase' change through LogicalRepApplyLoop(). First one is for > the very first time when we start this loop, it will set data.phase to > 0 (RCI_GET_CANDIDATE_XID) and will call > maybe_advance_nonremovable_xid(). After that, LogicalRepApplyLoop() > function can trigger a 'data.phase' change only when it receives a > response from the publisher. Shouldn't the first 4 calls > to maybe_advance_nonremovable_xid() from LogicalRepApplyLoop() suffice? > > 3) > Code is almost the same for GetOldestActiveTransactionId() and > GetOldestTransactionIdInCommit(). I think we can merge these two. > GetOldestActiveTransactionId() can take new arg "getTxnInCommit". > GetOldestTransactionIdInCommit() can call > GetOldestActiveTransactionId() with that arg as true, whereas other 2 > callers can pass it as false. > Few more comments mostly for patch001: 4) For this feature, since we are only interested in remote UPDATEs happening concurrently, so shall we ask primary to provide oldest "UPDATE" transaction-id in commit-phase rather than any operation's transaction-id? This may avoid unnecessarily waiting and pinging at subscriber's end in order to advance oldest_nonremovable-xid. Thoughts? 5) + +/* + * GetOldestTransactionIdInCommit() + * + * Similar to GetOldestActiveTransactionId but returns the oldest transaction ID + * that is currently in the commit phase. + */ +TransactionId +GetOldestTransactionIdInCommit(void) If there is no transaction currently in 'commit' phase, this function will then return the next transaction-id. Please mention this in the comments. I think the doc 'protocol-replication.html' should also be updated for the same. 6) + data->flushpos_update_time = 0; Do we really need to reset this 'flushpos_update_time' at the end of wait_for_local_flush()? Even in the next cycle (when the phase restarts from RCI_GET_CANDIDATE_XID), we can reuse this time to decide whether we should call get_flush_position() again or skip it, when in wait_for_local_flush(). 7) +/* + * The remote WAL position that has been applied and flushed locally. 
We + * record this information while sending feedback to the server and use this + * both while sending feedback and advancing oldest_nonremovable_xid. + */ +static XLogRecPtr last_flushpos = InvalidXLogRecPtr; I think we record/update last_flushpos in wait_for_local_flush() as well. Shall we update comments accordingly? 8) Shall we rename "check_conflict_info_retaintion" to "check_conflict_info_retention" or "check_remote_for_retainconflictinfo"? ('retaintion' is not a correct word) 9) In RetainConflictInfoData, we can keep reply_time along with other remote_* variables. The idea is to separate the variables received in remote's response from the ones purely set and reset by the local node. thanks Shveta
Re: Logical Replication of sequences
On Tue, May 20, 2025 at 8:35 AM Nisha Moond wrote: > > > > > Thanks for the comments, these are handled in the attached v20250516 > > version patch. > > > > Thanks for the patches. Here are my review comments - > > Patch-0004: src/backend/replication/logical/sequencesync.c > Hi, Currently, the behavior of the internal query used to fetch sequence info from the pub is inconsistent and potentially misleading. Case 1: If a single non-existent sequence is passed (e.g., VALUES ('public','n10')), the query throws an ERROR, so we get an error on the subscriber - ERROR: could not receive list of sequence information from the publisher: ERROR: sequence "public.n10" does not exist Case 2: If multiple non-existent sequences are passed (e.g., VALUES ('public','n8'),('public','n9')), it silently returns zero rows, resulting only in a LOG message instead of an error. LOG: Logical replication sequence synchronization for subscription "subs" - total unsynchronized: 2; batch #1 = 2 attempted, 0 succeeded, 0 mismatched IMO, this inconsistency can be confusing for users. I think we should make the behavior uniform. Either - (a) Raise an error if any/all of the requested sequences are missing on the publisher, or (b) Instead of raising an error, emit a LOG (as is done in case 2) and maybe include the count of missing sequences too. I'm fine with either option. -- Thanks, Nisha
Re: plan shape work
On 5/19/25 20:01, Robert Haas wrote: > Hi, > > A couple of people at pgconf.dev seemed to want to know more about my > ongoing plan shape work, so here are the patches I have currently. > This is a long way from something that actually looks like a usable > feature, but these are bits of infrastructure that I think will be > necessary to get to a usable feature. As a recap, my overall goal here > is to make it so that you can examine a finished plan, figure out what > decisions the planner made, and then somehow get the planner to make > those same decisions over again in a future planning cycle. Since > doing this for all types of planner decisions seems too difficult for > an initial goal, I'm focusing on scans and joins for now. A further > goal is that I want it to be possible for extensions to use this > infrastructure to implement a variety of different policies that they > might feel to be beneficial, so I'm looking to minimize the amount of > stuff that has to be done in core PostgreSQL or can only be used by > core PostgreSQL. > > ... Thanks for the overview. I don't have any immediate feedback, but it sounds like it might be related to the "making planner decisions clear" session from the unconference ... The basic premise of that session was about how to give users better info about the planner decisions - why paths were selected/rejected, etc. A simple example would be "why was the index not used", and the possible answers include "dominated by cost by another path" or "does not match the index keys" etc. I wonder if this work might be useful for something like that. regards -- Tomas Vondra
Re: Conflict detection for update_deleted in logical replication
On Tue, May 20, 2025 at 10:24 AM Amit Kapila wrote: > > On Tue, May 20, 2025 at 8:38 AM shveta malik wrote: > > > > Please find few more comments: > > > > 1) > > ProcessStandbyPSRequestMessage: > > + /* > > + * This shouldn't happen because we don't support getting primary status > > + * message from standby. > > + */ > > + if (RecoveryInProgress()) > > + elog(ERROR, "the primary status is unavailable during recovery"); > > > > > > a) This error is not clear. Is it supposed to be user oriented error > > or internal error? As a user, it is difficult to interpret this error > > and take some action. > > > > This is an internal error for developers to understand that they have > sent the wrong message. Do you have any suggestions to change it? The current message is fine if point b) is already taken care of. > > > b) What I understood is that there is no user of enabling > > 'retain_conflict_info' for a subscription which is subscribing to a > > publisher which is physical standby too. So shall we emit such an > > ERROR during 'Create Sub(retain_conflict_info=on)' itself? But that > > would need checking whether the publisher is physical standby or not > > during CREATE-SUB. Is that a possibility? > > > > The 0003 patch already took care of this, see check_conflict_info_retaintion. > Okay, thanks. Missed it somehow during review. thanks Shveta
Re: Prevent an error on attaching/creating a DSM/DSA from an interrupt handler.
> On 20 May 2025, at 04:29, Tom Lane wrote: > I want to argue for reverting, at least for v18. I do not think that > ProcessGetMemoryContextInterrupt is anywhere near release-quality. > I found out while poking into Valgrind leak reports that it leaks > memory --- and does so in TopMemoryContext. Ugh, that's indeed a showstopper. Attached is a revert of the feature; I want to re-read and re-test before pushing to make sure I didn't miss anything, but I can go ahead with it later today. -- Daniel Gustafsson 0001-Revert-function-to-get-memory-context-stats-for-proc.patch Description: Binary data
Re: PG 18 release notes draft committed
On Tue, May 13, 2025 at 08:05:24AM +0200, Laurenz Albe wrote: > On Mon, 2025-05-12 at 20:27 -0700, David G. Johnston wrote: > > Should all columns removed from system views and/or catalogs be listed in > > “Migration” or is there some filtering criteria? There are at minimum quite > > a few statistics related ones we’ve dropped that only appear in the Changes > > section (e.g., pg_stat_io, pg_stat_wal). > > I am not sure. > > On the one hand, the catalogs don't promise to be a stable API, so there > would be no need to enumerate such changes as compatibility breaks. > The "Migration" section also doesn't list changes to the exported > PostgreSQL functins, which has bitten me as extension developer several > times. > > On the other hand, the catalogs are described in the documentation, which > gives them more exposure, and it doesn't seem unreasonable to document > breaking changes as well. > > Do you have an idea how many changes there are? If there are not too many, > and somebody is willing to do the work, I wouldn't be against it. First, I apologize for the delay in my replying --- I was on vacation last week. Second, let me explain the criteria I use for table changes, and then we can discuss if the criteria is correct, and whether I followed the criteria accurately for PG 18. So, there are system views and system tables. Most system views are important to users, because we created them mostly for user consumption, while system tables might or might not hold useful information for users. Second, we have three possible changes --- column addition, column renaming, and column removal. And third, we can list the changes in the incompatibility section, or in the main release notes. So, for column additions, I would never list them in the incompatibility section, though it could break SELECT *. For renames and deletes, they would normally appear in the incompatibility section, unless they are system tables that do not normally hold user-helpful information, in which case I might list it in the main section, or not at all. I believe I followed that criteria for PG 18. There might be a few cases in PG 18 where columns used for monitoring were renamed or deleted because they were replaced, and I felt it was too complex to list them in the incompatibility section because there were new features mixed into the process so I listed them in the main section. I thought that was the proper balance. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: strange perf regression with data checksums
On Mon, May 19, 2025 at 8:21 PM Tomas Vondra wrote: > With v2 the regression disappears, both for index-only scans and regular > index scans. I haven't tried anything with bitmap scans - I hit the > regression mostly by accident, on a workload that does IOS only. I may > try constructing something with bitmap scans, but I didn't have time for > that right now. I imagine bitmap index scans will be similar to plain index scans. > I don't know what "fully fix" means in this context. I see a massive > improvement with v2, I have no idea if that's the best we could do. You expected there to be *zero* performance impact from enabling checksums for this workload, since it is a pure read-only workload. That's what I meant by "fully fix". > But you're right - it seems sensitive to how many rows are returned, and > at some point the contention goes away and there's no regression. > > I need to do proper automated testing, to get reliable results. I've > been doing manual testing, but it's easy to make mistakes that way. > > Do you have any suggestions what cases you'd like me to test? Nothing comes to mind. Again, just be aware that completely avoiding the call to BufferGetLSNAtomic is only possible when:

* Scan is an index-only scan OR
* Scan is a bitmap index scan OR
* Scan is a plain index scan, reading a page that _bt_readpage() returned "false" for when called.

In other words, plain index scans that read a lot of tuples might receive no benefit whatsoever. It's possible that it already matters less there anyway. It's also possible that there is some unforeseen benefit from merely *delaying* the call to BufferGetLSNAtomic. But in all likelihood these "unfixed" plain index scans are just as fast with v2 as they are when run on master/baseline. -- Peter Geoghegan
Re: generic plans and "initial" pruning
Amit Langote writes: > Thanks for pointing out the hole in the current handling of > CachedPlan->stmt_list. You're right that the approach of preserving > the list structure while replacing its contents in-place doesn’t hold > up when the rewriter adds or removes statements dynamically. There > might be other cases that neither of us have tried. I don’t think > that mechanism is salvageable. > To address the issue without needing a full revert, I’m considering > dropping UpdateCachedPlan() and removing the associated MemoryContext > dance to preserve CachedPlan->stmt_list structure. Instead, the > executor would replan the necessary query into a transient list of > PlannedStmts, leaving the original CachedPlan untouched. That avoids > mutating shared plan state during execution and still enables deferred > locking in the vast majority of cases. Yeah, I think messing with the CachedPlan is just fundamentally wrong. It breaks the invariant that the executor should not scribble on what it's handed --- maybe not as obviously as some other cases, but it's still not a good design. I kind of feel that we ought to take two steps back and think about what it even means to have a generic plan in this situation. Perhaps we should simply refuse to use that code path if there are prunable partitioned tables involved? > Let me know what you think -- I’ll hold off on posting a revert or a > replacement until we’ve agreed on the path forward. I had not looked at 525392d57 in any detail before (the claim in the commit message that I reviewed it is a figment of someone's imagination). Now that I have, I'm still going to argue for revert. Aside from the points above, I really hate what's been done to the fundamental executor APIs. The fact that ExecutorStart callers have to know about this is as ugly as can be. I also don't like the fact that it's added overhead in cases where there can be no benefit (notice that my test case doesn't even involve a partitioned table). I still like the core idea of deferring locking, but I don't like anything about this implementation of it. It seems like there has to be a better and simpler way. regards, tom lane
Re: Suggestion: Update Copyright Year to 2025 in Recently Added Files
On Sun, May 18, 2025 at 11:34:07PM -0400, Tom Lane wrote: > Michael Paquier writes: > > On Sun, May 18, 2025 at 12:09:23PM +0530, Shaik Mohammad Mujeeb wrote: > >> Just wanted to check - should this be updated to reflect 1996-2025 instead? > > > This process is automated by src/tools/copyright.pl on a yearly-basis, > > but it is possible that holes appear when some code gets committed > > that predates the new year. > > I had the idea that this was part of our pre-branch checklist, > but it was not mentioned there. I added it. I only run it in early January of each year. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: Violation of principle that plan trees are read-only
Andres Freund writes: > Afaict the mtstate->ps.plan->targetlist assignment, and the ExecTypeFromTL(), > ExecAssignResultTypeFromTL() before that, are completely superfluous. Am I > missing something? I think you are right. The two tlists are completely identical in most cases, because of this bit in setrefs.c: /* * Set up the visible plan targetlist as being the same as * the first RETURNING list. This is for the use of * EXPLAIN; the executor won't pay any attention to the * targetlist. We postpone this step until here so that * we don't have to do set_returning_clause_references() * twice on identical targetlists. */ splan->plan.targetlist = copyObject(linitial(newRL)); I added a quick check + if (!equal(mtstate->ps.plan->targetlist, + linitial(returningLists))) + elog(WARNING, "not matched targetlist"); to verify that. There's one regression case in which it complains, and that's one where we pruned the first result relation so that linitial(returningLists) is now the second result rel's RETURNING list. But as you say, that should still produce the same tupdesc. I think we can just delete this assignment (and fix these comments). > Wonder if the targetlist assignment is superfluous made me wonder if we would > detect mismatches - and afaict we largely wouldn't. There's basically no > verification in ExecBuildProjectionInfo(). And indeed, adding something very > basic shows: Hmm, seems like an independent issue. regards, tom lane
Re: fixing CREATEROLE
On Wed, Apr 30, 2025 at 4:29 PM Tom Lane wrote: > However, I'd at least like to complain about the fact that > it breaks pg_dumpall, which is surely not expecting anything > but the default behavior. If for any reason the restore is > run under a non-default setting of createrole_self_grant, > there's a potential of creating role grants that were not > there in the source database. Admittedly the damage is > probably limited by the fact that it only applies if the > restoring user has CREATEROLE but not SUPERUSER, which > I imagine is a rare case. But don't we need to add > createrole_self_grant to the set of GUCs that pg_dump[all] > resets in the emitted SQL? I spent some time on this today. As you might imagine, it's quite easy to make pg_dumpall emit SET createrole_self_grant = '', but doing so seems less useful than I had expected. I wonder if I'm missing something important here, so I thought I'd better inquire before proceeding further. As I see it, the core difficulty is that the output of pg_dumpall always contains a CREATE ROLE statement for every single role in the system, even the bootstrap superuser. For starters, that means that if you simply run initdb and create cluster #1, run initdb again and create cluster #2, dump the first and restore to the second, you will get an error, because the same bootstrap superuser will exist in both systems, so when you feed the output of pg_dumpall to psql, you end up trying to create a role that already exists. At this point, my head is already kind of exploding, because I thought we were pretty careful to try to make it so that pg_dump output can be restored without error even in the face of pre-existing objects like the public schema and the plpgsql language, but apparently we haven't applied the same principle to pg_dumpall.[1] But if, as you posit above, we were to try running the output of pg_dumpall through psql as a non-superuser, the problem is a whole lot worse. You can imagine a pg_dumpall feature that only tries to dump (on the source system) roles that the dumping user can administer, and only tries to recreate those roles on the target system, but we haven't got that feature, so we're going to try to recreate every single source role on the target system, including the bootstrap user and the non-superuser who is restoring the dump if they exist on the source side and any other superusers and any other users created by other CREATEROLE superusers and it seems to me that under any set of somewhat-reasonable assumptions you're going to expect a bunch of error messages to start showing up at this point. In short, trying to restore pg_dumpall output as a non-superuser appears to be an unsupported scenario, so the fact that we don't SET createrole_self_grant = '' to cater to it doesn't really seem like a bug to many any more. In fact, I think there's a decent argument that we ought to let the prevailing value of createrole_self_grant take effect in this scenario. One pretty likely scenario, at least as it seems to me, is that the user was superuser on the source system and is not superuser on the target system but wants to recreate the same set of roles. If they want to freely access objects owned by those roles as they could on the source system, then they're going to need self-grants, and we have no better clue than the value of createrole_self_grant to help us figure out whether they want that or not. 
To state the concern another way, if this is a bug, it should be possible to construct a test case that fails without the patch and passes with the patch. But it appears to me that the only way I could do that is if I programmatically edit the dump. And that seems like cheating, because if we are talking about a scenario where the user is editing the dump, they can also add SET createrole_self_grant = '' if desired. I don't want to make it sound like I now hate the idea of doing as you proposed here, because I do see the point of nailing down critical GUCs that can affect the interpretation of SQL statements in places like pg_dumpall output, and maybe we should do that here ... kinda just in case? But I'm not altogether sure that's a sufficient justification, and at any rate I think we need to be clear on whether that *is* the justification or whether there's something more concrete that we're trying to make work. -- Robert Haas EDB: http://www.enterprisedb.com [1] Exception: When --binary-upgrade is used, we emit only ALTER ROLE and not CREATE ROLE for the bootstrap superuser. Why we think the error is only worth avoiding in the --binary-upgrade case is unclear to me.
Re: proposal: schema variables
On Tue, 2025-05-20 at 16:28 -0400, Bruce Momjian wrote: > On Tue, May 20, 2025 at 08:47:36PM +0200, Daniel Gustafsson wrote: > > > On 20 May 2025, at 18:39, Bruce Momjian wrote: > > > My only point is that we should only be using email lists for work that > > > is being actively worked on to be added to community Postgres. There > > > has been talk of a trimmed-down version of this being applied, but I > > > don't see any work in that direction. > > > > > > This patch should be moved to a separate location where perhaps people > > > can subscribe to updates when they are posted, perhaps github. > > > > As a project with no roadmap governed by open forum consensus I don't think > > we > > have any right to tell community members what they can or cannot work on > > here, > > any technical discussion which conforms with our published policies should > > be > > welcome. If Pavel want's to continue rebasing his patchset here then he > > has, > > IMHO, every right to do so. > > > > Whether or not a committer will show interest at some point is another > > thing, > > but we are seeing a very good role-model for taking responsibility for ones > > work here at the very least =) > > Well, we do have a right, e.g., we would not allow someone to repeatedly > post patches for a Postgres extension we don't manage, or the jdbc > driver. I also don't think we would allow someone to continue posting > patches for a feature we have decided to reject, and I think we have > decided to reject the patch in in its current form. I think we might > accept a trimmed-down version, but I don't see the patch moving in that > direction. > > Now, of course, if I am the only one who feels this way, I can suppress > these emails on my end. In my opinion, this patch set is adding something that would be valuable to have in core. If no committer intends to pick it up and commit it, I think the proper action would be to step up and reject the patch set, not complain about the insistence of the author. Yours, Laurenz Albe
Re: Violation of principle that plan trees are read-only
Andres Freund writes: > Largely makes sense, the only thing I see is that the !returningLists branch > does: > /* >* We still must construct a dummy result tuple type, because > InitPlan >* expects one (maybe should change that?). >*/ > mtstate->ps.plan->targetlist = NIL; > which we presumably shouldn't do anymore either. It never changes anything > afaict, but still. D'oh ... I had seen that branch before, but missed fixing it. Yeah, the targetlist will be NIL already, but it's still wrong. >> I'm tempted to back-patch this: the plan tree damage seems harmless at >> present, but maybe it'd become less harmless with future fixes. > There are *some* cases where this changes the explain output, but but the new > output is more correct, I think: > ... > I suspect this is an argument for backpatching, not against - seems that > deparsing could end up creating bogus output in cases where it could matter? > Not sure if such cases are reachable via views (and thus pg_dump) or > postgres_fdw, but it seems possible. I don't believe that we guarantee EXPLAIN output to be 100% valid SQL, so I doubt there's a correctness argument here; certainly it'd not affect pg_dump. I'm curious though: what was the test case you were looking at? regards, tom lane
Re: Re: proposal: schema variables
Hi út 20. 5. 2025 v 18:39 odesílatel Bruce Momjian napsal: > On Tue, May 20, 2025 at 01:33:18PM -0300, Marcos Pegoraro wrote: > > Em ter., 20 de mai. de 2025 às 11:56, Bruce Momjian > > escreveu: > > > > I will again ask why this patch set is being reposted when there is > no > > plan to apply it to git master > > > > Too bad. I would love to have this functionality, from the user's point > of view > > there are problems where it would solve them wonderfully. I don't know > > technically of what prevents it from being natively on core, but it > would be > > great, it would definitely be. > > My only point is that we should only be using email lists for work that > is being actively worked on to be added to community Postgres. There > has been talk of a trimmed-down version of this being applied, but I > don't see any work in that direction. > I sent a reduced version a few months ago - from 21 patches to 8 (and it can be reduced to six if we postpone tools for detection ambiguity). The timing was not perfect - the focus was and it is concentrated to finish pg18. I am very sorry if this topic and patches bother anyone. I am afraid if I close it to some personal github, it will not be visible, and I am sure this feature is missing in Postgres. Today we have few workarounds. Some workarounds are not available everywhere, some workarounds cannot be used for security. With integrated solutions some scenarios can be done more easily, more secure, faster, more comfortable. It is true, so mentioned scenarios are not "hot" today. Stored procedures or RLS or migration procedures from other databases are not extra common. But who uses it, then he misses session variables. This topic is difficult, because there is no common solution. SQL/PSM is almost dead. T-SQL (and MySQL) design is weak and cannot be used for security. Oracle's design is joined with just one environment. And although almost all widely used databases have supported session variables for decades, no one design is perfect. Proposed design is not perfect too (it introduces possible ambiguity) , but I think it can support most wanted use cases (can be enhanced in future), and it is consistent with Postgres. There are more ways to reduce risk of unwanted ambiguity to zero. But it increases the size of the patch. Regards Pavel > > This patch should be moved to a separate location where perhaps people > can subscribe to updates when they are posted, perhaps github. > > -- > Bruce Momjian https://momjian.us > EDB https://enterprisedb.com > > Do not let urgent matters crowd out time for investment in the future. >
Re: proposal: schema variables
On Tue, May 20, 2025 at 08:47:36PM +0200, Daniel Gustafsson wrote: > > On 20 May 2025, at 18:39, Bruce Momjian wrote: > > My only point is that we should only be using email lists for work that > > is being actively worked on to be added to community Postgres. There > > has been talk of a trimmed-down version of this being applied, but I > > don't see any work in that direction. > > > > This patch should be moved to a separate location where perhaps people > > can subscribe to updates when they are posted, perhaps github. > > As a project with no roadmap governed by open forum consensus I don't think we > have any right to tell community members what they can or cannot work on here, > any technical discussion which conforms with our published policies should be > welcome. If Pavel want's to continue rebasing his patchset here then he has, > IMHO, every right to do so. > > Whether or not a committer will show interest at some point is another thing, > but we are seeing a very good role-model for taking responsibility for ones > work here at the very least =) Well, we do have a right, e.g., we would not allow someone to repeatedly post patches for a Postgres extension we don't manage, or the jdbc driver. I also don't think we would allow someone to continue posting patches for a feature we have decided to reject, and I think we have decided to reject the patch in its current form. I think we might accept a trimmed-down version, but I don't see the patch moving in that direction. Now, of course, if I am the only one who feels this way, I can suppress these emails on my end. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: Violation of principle that plan trees are read-only
Hi, On 2025-05-20 16:18:57 -0400, Tom Lane wrote: > Andres Freund writes: > >> I'm tempted to back-patch this: the plan tree damage seems harmless at > >> present, but maybe it'd become less harmless with future fixes. > > > There are *some* cases where this changes the explain output, but but the > > new > > output is more correct, I think: > > ... > > I suspect this is an argument for backpatching, not against - seems that > > deparsing could end up creating bogus output in cases where it could matter? > > Not sure if such cases are reachable via views (and thus pg_dump) or > > postgres_fdw, but it seems possible. > > I don't believe that we guarantee EXPLAIN output to be 100% valid SQL, > so I doubt there's a correctness argument here; certainly it'd not > affect pg_dump. I wasn't thinking of EXPLAIN itself, but was wondering whether it's possible to create a view, rule or such that is affected by the output change. Would be a weird case, if it existed. > I'm curious though: what was the test case you were looking at? It's a modified query from our regression tests, I had added some debugging to find cases where the targetlists differed. I attached the extracted, somewhat modified, sql script. Greetings, Andres Freund repro-dml-returning.sql Description: application/sql
Re: Violation of principle that plan trees are read-only
On Mon, May 19, 2025 at 08:41:06AM -0400, Isaac Morland wrote: > I assume this question has an obvious negative answer, but why can't we > attach const declarations to the various structures that make up the plan > tree (at all levels, all the way down)? I know const doesn't actually > prevent a value from changing, but at least the compiler would complain if > code accidentally tried. What you want is for C to have a type attribute that denotes immutability all the way down. `const` doesn't do that. One thing that could be done is to write a utility that creates const-all-the-way-down clones of given types, but such a tool can't be written as C pre-processor macros -- it would have to be a tool that parses the actual type definitions, or uses DWARF or similar output from the compilation of a file (I've done the latter before, but this weds you to compilers that output DWARF, which MSVC doesn't, for example). Really, the C committee ought to add this at some point, darn it. It would be the opposite of Rust's &mut. Nico --
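To make the point concrete, here is a minimal compilable illustration using stand-in struct definitions (not the real Plan/List nodes): const applies to the object the pointer designates, but stops at the first pointer-typed member, so everything below it stays mutable.

typedef struct List
{
	int		length;
} List;

typedef struct Plan
{
	List   *targetlist;
} Plan;

void
mutate(const Plan *plan)
{
	/* plan->targetlist = NULL;        error: plan points to const */
	plan->targetlist->length = 0;	/* fine: const does not reach the List */
}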
Re: Join removal and attr_needed cleanup
Given that some time has passed I'd like to raise this again. Any chance of this making it back to 16 to fix the regression? We'll be using 16 for the better part of a year still and we're having to resort to some annoying workarounds for this. Regards, Bennie On 2024/11/10 18:17, Tom Lane wrote: Bennie Swart writes: -- join elimination fails -- expect both b and c to be eliminated, but b remains explain (costs off) select a.* from foo a left join foo b on (b.id1, b.id2) = (a.id1, a.id2) left join foo c on (c.id1, c.id2) = (a.id1, b.id2); -- ^^ This does work in HEAD, presumably as a consequence of a3179ab69: regression=# explain (costs off) select a.* from foo a left join foo b on (b.id1, b.id2) = (a.id1, a.id2) left join foo c on (c.id1, c.id2) = (a.id1, b.id2); QUERY PLAN --- Seq Scan on foo a (1 row) I think it's still too soon to consider back-patching that though, since it's only been in the tree for six weeks. regards, tom lane
Re: Violation of principle that plan trees are read-only
I wrote: > [ confused... ] I get the same output from that script with or > without the patch. Oh, scratch that: I'd gotten confused about which branch I was working in. It does change the output as you say. I still think it's irrelevant to view display, including pg_dump, though. Those operations work from a parse tree not a plan tree. regards, tom lane
Re: Disable parallel query by default
On Wed, May 14, 2025, at 4:06 AM, Laurenz Albe wrote: > On Tue, 2025-05-13 at 17:53 -0400, Scott Mead wrote: > > On Tue, May 13, 2025, at 5:07 PM, Greg Sabino Mullane wrote: > > > On Tue, May 13, 2025 at 4:37 PM Scott Mead wrote: > > > > I'll open by proposing that we prevent the planner from automatically > > > > selecting parallel plans by default > > > > > > That seems a pretty heavy hammer, when we have things like > > > parallel_setup_cost that should be tweaked first. > > > > I agree it's a big hammer and I thought through parallel_setup_cost > > quite a bit myself. The problem with parallel_setup_cost is that it > > doesn't actually represent the overhead of a setting up parallel > > query for a busy system. It does define the cost of setup for a > > *single* parallel session, but it cannot accurately express the > > cost of CPU and other overhead associated with the second, third, > > fourth, etc... query that is executed as parallel. The expense to > > the operating system is a function of the _rate_ of parallel query > > executions being issued. Without new infrastructure, there's no way > > to define something that will give me a true representation of the > > cost of issuing a query with parallelism. > > There is no way for the optimizer to represent that your system is > under CPU overload currently. But I agree with Greg that > parallel_setup_cost is the setting that should be adjusted. > If PostgreSQL is more reluctant to even start considering a parallel plan, > that would be a move in the right direction in a case like this: > > > > > What is the fallout? When a high-volume, low-latency query flips to > > > > parallel execution on a busy system, we end up in a situation where > > > > the database is effectively DDOSing itself with a very high rate of > > > > connection establish and tear-down requests. Even if the query ends > > > > up being faster (it generally does not), the CPU requirements for the > > > > same workload rapidly double or worse, with most of it being spent > > > > in the OS (context switch, fork(), destroy()). When looking at the > > > > database, you'll see a high load average, and high wait for CPU with > > > > very little actual work being done within the database. > > You are painting a bleak picture indeed. I get to see PostgreSQL databases > in trouble regularly, but I have not seen anything like what you describe. > If a rather cheap, very frequent query is suddenly estimated to be > expensive enough to warrant a parallel plan, I'd suspect that the estimates > must be seriously off. > > With an argument like that, you may as well disable nested loop joins. > I have seen enough cases where disabling nested loop joins, without any > deeper analysis, made very slow queries reasonably fast. My argument is that parallel query should not be allowed to be invoked without user intervention. Yes, nestedloop can have a similar impact, but let's take a look at the breakdown at scale of PQ: 1. pgbench -i -s 100 2. Make a query that will execute in parallel SELECT aid, a.bid, bbalance FROM pgbench_accounts a, pgbench_branches b WHERE a.bid = b.bid ORDER BY bbalance desc; Non Parallel query = 4506.559 ms Parallel query = 2849.073 Arguably, much better. 3. Use pgbench to execute these with a concurrency of 10, rate limit of 1 tps pgbench -R 1 -r -T 120 -P 5 --no-vacuum -f pselect.sql -c 10 4. 
The parallel query was executing in ~2.8 seconds in isolation, but when running with 10 concurrent sessions it breaks down to 5.8 seconds; the non-parallel version executes in 5.5 seconds on average. You've completely erased the gains and only have a concurrency of 5 (that's with max_parallel_workers = 8). If you increase max_parallel_workers, this quickly becomes worse. Even though parallel query is faster in isolation, even a small amount of concurrency has a compounding effect that degrades performance very quickly (again, defaults with a 16 processor machine).

Concurrency - Non Parallel Runtime - Parallel Runtime
1  - 5003.951 - 3681.452
5  - 4936.565 - 4565.171
10 - 5573.239 - 5894.397
15 - 6224.292 - 8470.982
20 - 5632.948 - 13277.857

Even with max_parallel_workers protecting us with '8' (default), we erase our advantage by the time we go to a concurrency of 5 clients. Going back to the original commit which enabled PQ by default[1], it was done so that the feature would be tested during beta. I think it's time that we limit the accidental impact this can have on users by disabling the feature by default. [1]- https://github.com/postgres/postgres/commit/77cd477c4ba885cfa1ba67beaa82e06f2e182b85 " Enable parallel query by default. Change max_parallel_degree default from 0 to 2. It is possible that this is not a good idea, or that we should g
Re: proposal: schema variables
On Tue, May 20, 2025 at 10:36:54PM +0200, Laurenz Albe wrote: > On Tue, 2025-05-20 at 16:28 -0400, Bruce Momjian wrote: > > Well, we do have a right, e.g., we would not allow someone to repeatedly > > post patches for a Postgres extension we don't manage, or the jdbc > > driver. I also don't think we would allow someone to continue posting > > patches for a feature we have decided to reject, and I think we have > > decided to reject the patch in in its current form. I think we might > > accept a trimmed-down version, but I don't see the patch moving in that > > direction. > > > > Now, of course, if I am the only one who feels this way, I can suppress > > these emails on my end. > > In my opinion, this patch set is adding something that would be valuable to > have in core. > > If no committer intends to pick it up and commit it, I think the proper > action would be to step up and reject the patch set, not complain about the > insistence of the author. Are you saying I should not complain until we have officially rejected the patch set? If we officially reject it, the patch author would no longer post it? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: Re: proposal: schema variables
On Tue, May 20, 2025 at 10:28:31PM +0200, Pavel Stehule wrote: > út 20. 5. 2025 v 18:39 odesílatel Bruce Momjian napsal: > I sent a reduced version a few months ago - from 21 patches to 8 (and it can > be > reduced to six if we postpone tools for detection ambiguity). > The timing was not perfect - the focus was and it is concentrated to finish > pg18. It was not clear to me that this patch set was being reduced to make it more likely it would be accepted? I thought it was the same patch set since 20??. > I am very sorry if this topic and patches bother anyone. I am afraid if I > close > it to some personal github, it will not be visible, and I am sure this > feature is missing in Postgres. Today we have few workarounds. Some > workarounds > are not available everywhere, some workarounds cannot > be used for security. With integrated solutions some scenarios can be done > more > easily, more secure, faster, more comfortable. It is true, so > mentioned scenarios are not "hot" today. Stored procedures or RLS or migration > procedures from other databases are not extra common. But > who uses it, then he misses session variables. Understood. If people feel it is progressing toward acceptance, I certainly withdraw my objection and apologize. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: Violation of principle that plan trees are read-only
Andres Freund writes: > On 2025-05-20 16:18:57 -0400, Tom Lane wrote: >> I'm curious though: what was the test case you were looking at? > It's a modified query from our regression tests, I had added some debugging to > find cases where the targetlists differed. I attached the extracted, somewhat > modified, sql script. [ confused... ] I get the same output from that script with or without the patch. regards, tom lane
Re: Avoid orphaned objects dependencies, take 3
On Mon, 2025-05-19 at 14:07 -0400, Robert Haas wrote: > I agree with that, but I think that it may also be error-prone to > assume that it's OK to acquire heavyweight locks on other catalog > objects at any place in the code where we record a dependency. I will > not be surprised at all if that turns out to have some negative > consequences. For example, I think it might result in acquiring the > locks on those other objects at a subtly wrong time (leading to race > conditions) or acquiring two locks on the same object with different > lock modes where we should really only acquire one. I'm all in favor > of solving this problem using heavyweight locks, but I think that > implicitly acquiring them as a side effect of recording dependencies > feels too surprising. I see what you mean now, in the sense that other code that calls LockDatabaseObject (and other variants of LockAcquire) are mostly higher-level operations like AlterPublication(), and not side-effects of something else. But relation_open() is sort of an exception. There are lots of places where that takes a lock because we happen to want something out of the relcache, like generate_partition_qual() taking a lock on the parent or CheckAttributeType() taking a lock on the typrelid. You could say those are fairly obvious, but that's because we already know, and we could make it more widely known that recording a dependency takes a lock. One compromise might be to have recordDependencyOn() take a LOCKMODE parameter, which would both inform the caller that a lock will be taken, and allow the caller to do it their own way and specify NoLock if necessary. That still results in a huge diff, but the end result would not be any more complex than the current code. Regards, Jeff Davis
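For illustration, the compromise could look something like the sketch below. The existing three-argument recordDependencyOn() is real; the added LOCKMODE parameter and the particular lock modes shown are only meant to convey the idea, not a worked-out proposal.

extern void recordDependencyOn(const ObjectAddress *depender,
							   const ObjectAddress *referenced,
							   DependencyType behavior,
							   LOCKMODE lockmode);

/* typical caller: let the dependency machinery take the lock */
recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL, AccessShareLock);

/* caller that already holds a suitable lock, or must not take one */
recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL, NoLock);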
Re: Violation of principle that plan trees are read-only
On 20/5/25 22:43, Nico Williams wrote: [snip] What you want is for C to have a type attribute that denotes immutability all the way down. `const` doesn't do that. One thing that could be done is to write a utility that creates const-all-the-way-down clones of given types, but such a tool can't be written as C pre-processor macros -- it would have to be a tool that parses the actual type definitions, or uses DWARF or similar output from the compilation of a file (I've done the latter before, but this weds you to compilers that output DWARF, which MSVC doesn't, for example). Really, the C committee ought to add this at some point, darn it. It would be the opposite of Rust's &mut. Like C++'s const specifier, especially const references to objects? This is actually natively compatible with C code, "just" by throwing extern "C" around. https://en.cppreference.com/w/cpp/language/cv (most of Postgres' code is already "Object-oriented in C" in my view...) The fact that the C++ compiler is usually able to optimize deeper/better than a C one due to aggressive inlining+global optimization and inferred "strict"ness doesn't hurt either :) My €.02. HTH. / J.L. -- Parkinson's Law: Work expands to fill the time allotted to it.
Re: Assert("vacrel->eager_scan_remaining_successes > 0")
(CC'ed to Melanie) On Fri, May 2, 2025 at 11:59 AM Masahiko Sawada wrote: > > Hi, > > I hit the assertion failure in the subject line with the following script: > > create table t (a int) with (autovacuum_enabled = off); > insert into t select generate_series(1, 1_000_000); > vacuum t; > insert into t select generate_series(1, 1_000_000); > set vacuum_freeze_min_age to 0; > vacuum t; > > When the success count reaches to 0, we disable the eager scan by > resetting related fields as follows: > > /* > * If we hit our success cap, permanently disable eager > * scanning by setting the other eager scan management > * fields to their disabled values. > */ > vacrel->eager_scan_remaining_fails = 0; > vacrel->next_eager_scan_region_start = InvalidBlockNumber; > vacrel->eager_scan_max_fails_per_region = 0; > > However, there is a possibility that we have already eagerly scanned > another page and returned it to the read stream before we freeze the > eagerly-scanned page and disable the eager scan. In this case, the > next block that we retrieved from the read stream could also be an > eagerly-scanned page. I've added it to Open Items for v18. If I understand correctly, there's a potential scenario where we might eagerly scan more pages than permitted by the success and failure caps. One question is: what is the practical magnitude of this excess scanning? If the overflow could be substantial (for instance, eagerly scanning 30% of table pages), we should consider revising our eager scanning mechanism. One potential solution would be to implement a counter tracking the number of eagerly scanned but unprocessed pages. This counter could then inform the decision-making process in find_next_unskippable_block() regarding whether to proceed with eager scanning of additional pages. Alternatively, if the excess scanning proves negligible in practice, we could adopt a simpler solution by allowing vacrel->eager_scan_remaining_successes to accept negative values and removing the assertion in question. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
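To sketch the counter idea (every name below is invented for this illustration; none of it comes from a patch): track how many pages have been eagerly handed to the read stream but not yet processed, and stop queueing further eager pages once that in-flight count reaches the remaining success budget.

/* hypothetical new field in LVRelState */
BlockNumber eager_pages_in_flight;	/* eagerly queued, not yet processed */

/* in find_next_unskippable_block(), when deciding whether to eager-scan */
if (vacrel->eager_pages_in_flight >= vacrel->eager_scan_remaining_successes)
	consider_eager_scan = false;	/* budget may already be spoken for */
else
	vacrel->eager_pages_in_flight++;

/* once an eagerly scanned page has actually been processed */
vacrel->eager_pages_in_flight--;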
Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly
Hi Alexander, Thank you very much for the review! > The patchset doesn't seem to build after 371f2db8b0, which adjusted > the signature of the INJECTION_POINT() macro. Could you, please, > update the patchset accordingly. I've updated the patch (see attached). Thanks. > I see in 0004 patch we're calling XLogGetReplicationSlotMinimumLSN() > before slots synchronization then use it for WAL truncation. > Generally looks good. But what about the "if > (InvalidateObsoleteReplicationSlots(...))" branch? It calls > XLogGetReplicationSlotMinimumLSN() again. Why would the value > obtained from the latter call reflect slots as they are synchronized > to the disk? In patch 0004 I call XLogGetReplicationSlotMinimumLSN() again to keep the old behaviour - this function was called in KeepLogSeg prior to my change. I also call CheckPointReplicationSlots at the next line to save the invalidated and other dirty slots on disk again to make sure, the new oldest LSN is in sync. The problem I tried to solve in this if-branch is to fix test src/test/recovery/t/019_replslot_limit.pl which was failed because the WAL was not truncated enought for the test to pass ok. In general, this branch is not necessary and we may fix the test by calling checkpoint twice (please, see the alternative.rej patch for this case). If you think, we should incorporate this new change, I'm ok to do it. But the WAL will be truncating more lazily. Furthermore, I think we can save slots on disk right after invalidation, not in CheckPointGuts to avoid saving invalidated slots twice. With best regards, Vitaly From 41eed2a90d68f4d9ac1ee3d00c89879358d19fd1 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Thu, 21 Nov 2024 23:07:22 +0100 Subject: [PATCH 2/5] Add TAP test to check logical repl slot advance during checkpoint The test verifies that logical replication slot is still valid after immediate restart on checkpoint completion in case when the slot was advanced during checkpoint. 
Original patch by: Tomas Vondra Modified by: Vitaly Davydov Discussion: https://www.postgresql.org/message-id/flat/1d12d2-67235980-35-19a406a0%4063439497 --- src/test/modules/Makefile | 4 +- src/test/modules/meson.build | 1 + .../test_replslot_required_lsn/Makefile | 18 +++ .../test_replslot_required_lsn/meson.build| 15 +++ .../t/001_logical_slot.pl | 124 ++ 5 files changed, 160 insertions(+), 2 deletions(-) create mode 100644 src/test/modules/test_replslot_required_lsn/Makefile create mode 100644 src/test/modules/test_replslot_required_lsn/meson.build create mode 100644 src/test/modules/test_replslot_required_lsn/t/001_logical_slot.pl diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile index aa1d27bbed3..53d3dd8e0ed 100644 --- a/src/test/modules/Makefile +++ b/src/test/modules/Makefile @@ -46,9 +46,9 @@ SUBDIRS = \ ifeq ($(enable_injection_points),yes) -SUBDIRS += injection_points gin typcache +SUBDIRS += injection_points gin typcache test_replslot_required_lsn else -ALWAYS_SUBDIRS += injection_points gin typcache +ALWAYS_SUBDIRS += injection_points gin typcache test_replslot_required_lsn endif ifeq ($(with_ssl),openssl) diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build index 9de0057bd1d..ac0dbd1f10f 100644 --- a/src/test/modules/meson.build +++ b/src/test/modules/meson.build @@ -43,3 +43,4 @@ subdir('typcache') subdir('unsafe_tests') subdir('worker_spi') subdir('xid_wraparound') +subdir('test_replslot_required_lsn') diff --git a/src/test/modules/test_replslot_required_lsn/Makefile b/src/test/modules/test_replslot_required_lsn/Makefile new file mode 100644 index 000..e5ff8af255b --- /dev/null +++ b/src/test/modules/test_replslot_required_lsn/Makefile @@ -0,0 +1,18 @@ +# src/test/modules/test_replslot_required_lsn/Makefile + +EXTRA_INSTALL=src/test/modules/injection_points \ + contrib/test_decoding + +export enable_injection_points +TAP_TESTS = 1 + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = src/test/modules/test_replslot_required_lsn +top_builddir = ../../../.. +include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/src/test/modules/test_replslot_required_lsn/meson.build b/src/test/modules/test_replslot_required_lsn/meson.build new file mode 100644 index 000..999c16201fb --- /dev/null +++ b/src/test/modules/test_replslot_required_lsn/meson.build @@ -0,0 +1,15 @@ +# Copyright (c) 2025, PostgreSQL Global Development Group + +tests += { + 'name': 'test_replslot_required_lsn', + 'sd': meson.current_source_dir(), + 'bd': meson.current_build_dir(), + 'tap': { +'env': { + 'enable_injection_points': get_option('injection_points') ? 'yes' : 'no', +}, +'tests': [ + 't/001_logical_slot.pl' +], + }, +} diff --git a/src/test/modules/test_replslot_required_ls
Re: Violation of principle that plan trees are read-only
Hi, On 2025-05-20 10:59:22 -0400, Andres Freund wrote: > On 2025-05-18 19:31:33 -0400, Tom Lane wrote: > > While chasing down Valgrind leakage reports, I was disturbed > > to realize that some of them arise from a case where the > > executor scribbles on the plan tree it's given, which it is > > absolutely not supposed to do: > > > > /* > > * Initialize result tuple slot and assign its rowtype using the > > first > > * RETURNING list. We assume the rest will look the same. > > */ > > mtstate->ps.plan->targetlist = (List *) linitial(returningLists); > > > > A bit of git archaeology fingers Andres' commit 4717fdb14, which we > > can't easily revert since he later got rid of ExecAssignResultType > > altogether. But I think we need to do something about it --- it's > > purest luck that this doesn't cause serious problems in some cases. > > I have no idea what I was smoking at that time, this clearly is wrong. > > I think the reason it doesn't cause problems is that we're just using the > first child's targetlist every time and we're just assigning the same value > back every time. > > > What's even more absurd is: Why do we even need to assign the result type at > all? Before & after 4717fdb14. The planner ought to have already figured this > out, no? > > It seems that if not, we'd have a problem anyway, who says the "calling" nodes > (say a wCTE) can cope with whatever output we come up with? > > Except of course, there is exactly one case in our tests where the tupledescs > aren't equal :( That's a harmless difference, as it turns out. The varnos / varattnos differ, but that's fine, because we don't actually build the projection with that tlist, we just use to allocate the slot, and that doesn't care about the tlist. The building of the projection uses the specific child's returning list. Afaict the mtstate->ps.plan->targetlist assignment, and the ExecTypeFromTL(), ExecAssignResultTypeFromTL() before that, are completely superfluous. Am I missing something? Wonder if the targetlist assignment is superfluous made me wonder if we would detect mismatches - and afaict we largely wouldn't. There's basically no verification in ExecBuildProjectionInfo(). 
And indeed, adding something very basic shows: --- /home/andres/src/postgresql/src/test/regress/expected/merge.out 2025-04-06 22:54:14.078394968 -0400 +++ /srv/dev/build/postgres/m-dev-assert/testrun/regress/regress/results/merge.out 2025-05-20 11:51:51.549525728 -0400 @@ -2653,20 +2653,95 @@ MERGE into measurement m USING new_measurement nm ON (m.city_id = nm.city_id and m.logdate=nm.logdate) WHEN MATCHED AND nm.peaktemp IS NULL THEN DELETE WHEN MATCHED THEN UPDATE SET peaktemp = greatest(m.peaktemp, nm.peaktemp), unitsales = m.unitsales + coalesce(nm.unitsales, 0) WHEN NOT MATCHED THEN INSERT (city_id, logdate, peaktemp, unitsales) VALUES (city_id, logdate, peaktemp, unitsales); +WARNING: type mismatch: resno 1: slot type 0 vs expr type 23 +WARNING: TargetEntry: +DETAIL: {TARGETENTRY + :expr + {VAR + :varno -1 + :varattno 3 + :vartype 23 + :vartypmod -1 + :varcollid 0 + :varnullingrels (b) + :varlevelsup 0 + :varreturningtype 0 + :varnosyn 2 + :varattnosyn 1 + :location 384 + } + :resno 1 + :resname city_id + :ressortgroupref 0 + :resorigtbl 0 + :resorigcol 0 + :resjunk false + } + +WARNING: type mismatch: resno 2: slot type 23 vs expr type 1082 +WARNING: TargetEntry: +DETAIL: {TARGETENTRY + :expr + {VAR + :varno -1 + :varattno 4 + :vartype 1082 + :vartypmod -1 + :varcollid 0 + :varnullingrels (b) + :varlevelsup 0 + :varreturningtype 0 + :varnosyn 2 + :varattnosyn 2 + :location 393 + } + :resno 2 + :resname logdate + :ressortgroupref 0 + :resorigtbl 0 + :resorigcol 0 + :resjunk false + } + +WARNING: type mismatch: resno 3: slot type 1082 vs expr type 23 +WARNING: TargetEntry: +DETAIL: {TARGETENTRY + :expr + {VAR + :varno -1 + :varattno 1 + :vartype 23 + :vartypmod -1 + :varcollid 0 + :varnullingrels (b) + :varlevelsup 0 + :varreturningtype 0 + :varnosyn 2 + :varattnosyn 3 + :location 402 + } + :resno 3 ... Greetings, Andres Freund
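The debugging check itself isn't shown above, but judging from the WARNING format it is presumably something along these lines -- comparing the slot's declared column types against the types of the projection's target expressions (a guess at the shape of the check, not the actual patch):

ListCell   *lc;

foreach(lc, targetList)
{
	TargetEntry *tle = (TargetEntry *) lfirst(lc);
	Oid			exprtype = exprType((Node *) tle->expr);
	Form_pg_attribute attr;

	if (tle->resno <= 0 || tle->resno > slot->tts_tupleDescriptor->natts)
		continue;

	attr = TupleDescAttr(slot->tts_tupleDescriptor, tle->resno - 1);
	if (attr->atttypid != exprtype)
		elog(WARNING, "type mismatch: resno %d: slot type %u vs expr type %u",
			 tle->resno, attr->atttypid, exprtype);
}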
Re: Re: proposal: schema variables
On Tue, May 20, 2025 at 01:33:18PM -0300, Marcos Pegoraro wrote: > Em ter., 20 de mai. de 2025 às 11:56, Bruce Momjian > escreveu: > > I will again ask why this patch set is being reposted when there is no > plan to apply it to git master > > Too bad. I would love to have this functionality, from the user's point of > view > there are problems where it would solve them wonderfully. I don't know > technically of what prevents it from being natively on core, but it would be > great, it would definitely be. My only point is that we should only be using email lists for work that is being actively worked on to be added to community Postgres. There has been talk of a trimmed-down version of this being applied, but I don't see any work in that direction. This patch should be moved to a separate location where perhaps people can subscribe to updates when they are posted, perhaps github. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: plan shape work
+1, this seems like it could be very useful. A somewhat related issue is being able to tie plan nodes back to the query text: it can be hard to understand the planner's decisions if it's not even clear what part of the query it's making decisions about. I'm sure this is not an easy problem in general, but I wonder if you think that could be improved in the course of this work, or if you have other thoughts about it. Thanks, Maciek
Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
On Mon, May 19, 2025 at 2:05 AM Amit Kapila wrote: > > On Sun, May 18, 2025 at 1:09 AM Masahiko Sawada wrote: > > > > On Sat, May 10, 2025 at 7:08 AM Amit Kapila wrote: > > > > > > > > > Can we have a parameter like immediately_reserve in > > > create_logical_slot API, similar to what we have for physical slots? > > > We need to work out the details, but that should address the kind of > > > use case you are worried about, unless I am missing something. > > > > Interesting idea. One concern in my mind is that in the use case I > > mentioned above, users would need to carefully manage the extra > > logical slot to keep the logical decoding active. The logical decoding > > is deactivated on the standby as soon as users drop all logical slots > > on the primary. > > > > Yes, but the same is true for a physical slot in the case of physical > replication used via primary_slot_name parameter. Could you elaborate on this? IIUC the purpose of using a physical slot in a physical replication case is obvious; users don't want to lose WAL files necessary for replication. On the other hand, this empty logical slot needs to be maintained just for keeping the logical decoding active. > > > Also, with this idea of automatically increasing WAL level, do we want > > to keep the 'logical' WAL level? If so, it requires an extra step of > > creating a non-reserved logical slot on the primary in order for the > > standby to activate the logical decoding. On the other hand, we can > > also keep the 'logical' WAL level for the compatibility and for making > > the logical decoding enabled without the coordination of WAL level > > transition. > > Right, I also feel we should retain both ways to enable logical > replication at least initially. Once we get some feedback, we may > think of removing 'logical' as wal_level. > > > But wal_level GUC parameter would no longer tell the > > actual WAL level to users when 'replica' + logical slots. > > > > Right. > > > Is it > > sufficient to provide a read-only GUC parameter, say > > effective_wal_level showing the actual WAL level being used? > > > > I am not so sure about how we want to communicate this to the user, > but I guess to start with, this is a good idea. I recently had a discussion with Ashtosh at PGConf.dev regarding an alternative approach: introducing a new command syntax such as "ALTER SYSTEM UPDATE wal_level TO 'logical'". In his presentation[1], he outlined this proposed command as a means to modify specific GUC parameters synchronously. The backend executing this command would manage the transition, allowing users to interrupt the process via Ctrl-C if necessary. In the specific context of wal_level change, this command could be designed to reject operations like "ALTER SYSTEM UPDATE wal_level TO 'minimal'" with an error, effectively preventing undesirable wal_level transitions to or from 'minimal'. While this approach shares similarities with our previous proposal of implementing a dedicated SQL function for WAL level modifications, it offers a more standardized interface for users. Though I find merit in this proposal, I remain uncertain about its implementation details and whether it represents the optimal solution for online wal_level changes, particularly given that our current approach of automatic WAL level adjustment appears viable. Ashtosh plans to initiate a separate discussion thread where we can explore these considerations in greater detail. 
Regards, [1] https://www.pgevents.ca/events/pgconfdev2025/schedule/session/286-changing-shared_buffers-on-the-fly/ -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: queryId constant squashing does not support prepared statements
> On Tue, May 20, 2025 at 06:30:25AM GMT, Michael Paquier wrote: > On Mon, May 12, 2025 at 06:40:43PM -0400, Sami Imseih wrote: > > Also, LocationExpr is not really an expression node, but a wrapper to > > an expression node, so I think it's wrong to define it as a Node and be > > required to add the necessary handling for it in nodeFuncs.c. I think we > > can just define it as a struct in gram.y so it can carry the locations of > > the > > expression and then set the List of the location boundaries in > > A_Expr and A_ArrayExpr. right? > > Right. LocationExpr is not a full Node, so if we can do these > improvements without it we have less maintenance to worry about across > the board with less code paths. At the end, I think that we should > try to keep the amount of work done by PGSS as minimal as possible. > > I was a bit worried about not using a Node but Sami has reminded me > last week that we already have in gram.y the concept of using some > private structures to track intermediate results done by the parsing > that we sometimes do not want to push down to the code calling the > parser. If we can do the same, the result could be nicer. I believe it's worth to not only to keep amount of work to support LocationExpr as minimal as possible, but also impact on the existing code. What I see as a problem is keeping such specific information as the location boundaries in such a generic expression as A_Expr, where it will almost never be used. Do I get it right, you folks are ok with that? At the same time AFAICT there isn't much more code paths to worry about in case of a LocationExpr as a node -- in the end all options would have to embed the location information into ArrayExpr during transformation, independently from how this information was conveyed. Aside that the only extra code we've got is node functions (exprType, etc). Is there anything I'm missing here? > By the way, the new test cases for ARRAY lists are sent in the last > patch of the series posted on this thread: > https://www.postgresql.org/message-id/7zbzwk4btnxoo4o3xbtzefoqvht54cszjj4bol22fmej5nmgkf@dbcn4wtakw4y > > These should be first in the list, IMO, so as it is possible to track > what the behavior was before the new logic as of HEAD, and what the > behavior would become after the new logic. Sure, I can reshuffle that. BTW, I'm going to be away for a couple of weeks soon. So if you want to decide one way or another soonish, let's do it now.
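As a concrete illustration of the gram.y-private approach discussed above (the field names are illustrative only): because the wrapper is not a Node, it never leaves the parser -- the boundaries get copied into the A_Expr / A_ArrayExpr during the grammar reduction and the wrapper itself is discarded.

/* private to gram.y, never emitted as part of the parse tree */
typedef struct LocationExpr
{
	Node   *expr;		/* the wrapped expression */
	int		startloc;	/* offset of the first token of the list */
	int		endloc;		/* offset just past the last token */
} LocationExpr;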
Re: Statistics Import and Export
Ah yes, forgot that reltuples is not always accurate. This sounds reasonable to me. On Mon, May 19, 2025 at 2:32 PM Nathan Bossart wrote: > On Mon, May 19, 2025 at 02:13:45PM -0700, Hari Krishna Sunder wrote: > > I think it would be better to revert 9879105 since there can be a > > considerable number of true empty tables that we don't need to process. > > I'm not sure that's a use-case we really need to optimize. Even with > 100,000 empty tables, "vacuumdb --analyze-only --missing-stats-only --jobs > 64" completes in ~5.5 seconds on my laptop. Plus, even if reltuples is 0, > there might actually be rows in the table, in which case analyzing it will > produce rows in pg_statistic. > > -- > nathan >
Re: proposal: schema variables
> On 20 May 2025, at 18:39, Bruce Momjian wrote: > > On Tue, May 20, 2025 at 01:33:18PM -0300, Marcos Pegoraro wrote: >> On Tue., May 20, 2025 at 11:56, Bruce Momjian >> wrote: >> >>I will again ask why this patch set is being reposted when there is no >>plan to apply it to git master >> >> Too bad. I would love to have this functionality, from the user's point of view >> there are problems where it would solve them wonderfully. I don't know >> technically of what prevents it from being natively on core, but it would be >> great, it would definitely be. > > My only point is that we should only be using email lists for work that > is being actively worked on to be added to community Postgres. There > has been talk of a trimmed-down version of this being applied, but I > don't see any work in that direction. > > This patch should be moved to a separate location where perhaps people > can subscribe to updates when they are posted, perhaps github. As a project with no roadmap governed by open forum consensus, I don't think we have any right to tell community members what they can or cannot work on here; any technical discussion which conforms with our published policies should be welcome. If Pavel wants to continue rebasing his patchset here then he has, IMHO, every right to do so. Whether or not a committer will show interest at some point is another thing, but we are seeing a very good role model for taking responsibility for one's work here at the very least =) -- Daniel Gustafsson
Re: Hash table scans outside transactions
Ashutosh Bapat wrote 2023-03-28 15:58:

Bumping it to attract some attention.

On Tue, Mar 21, 2023 at 12:51 PM Ashutosh Bapat wrote:

Hi,

Hash table scans (seq_scan_table/level) are cleaned up at the end of a transaction in AtEOXact_HashTables(). If a hash seq scan continues beyond transaction end it will meet "ERROR: no hash_seq_search scan for hash table" in deregister_seq_scan(). That seems like a limiting the hash table usage.

Our use case is
1. Add/update/remove entries in hash table
2. Scan the existing entries and perform one transaction per entry
3. Close scan
repeat above steps in an infinite loop. Note that we do not add/modify/delete entries in step 2. We can't use linked lists since the entries need to be updated or deleted using hash keys. Because the hash seq scan is cleaned up at the end of the transaction, we encounter error in the 3rd step. I don't see that the actual hash table scan depends upon the seq_scan_table/level[] which is cleaned up at the end of the transaction.

I have following questions
1. Is there a way to avoid cleaning up seq_scan_table/level() when the transaction ends?
2. Is there a way that we can use hash table implementation in PostgreSQL code for our purpose?

--
Best Wishes,
Ashutosh Bapat

Hi!

I tried to resend this thread directly to myself, but for some reason it ended up in the whole hackers list.. I thought I'd chime in on this topic since it hasn't really been discussed anywhere else (or maybe I missed it). I've attached two patches: the first one is a little test extension to demonstrate the problem (just add "hash_test" to "shared_preload_libraries"), and the second is a possible solution. Basically, the idea is that we don't reset the scan counter if we find scans that started outside of the current transaction at the end. I have to admit, though, that I can't immediately say what other implications this might have or what else we need to watch out for if we try this. Maybe any thoughts on that?

regards, Aidar Imamov

diff --git a/contrib/Makefile b/contrib/Makefile
index 2f0a88d3f77..b4ff1391387 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -21,6 +21,7 @@ SUBDIRS = \
 	earthdistance \
 	file_fdw \
 	fuzzystrmatch \
+	hash_test \
 	hstore \
 	intagg \
 	intarray \
diff --git a/contrib/hash_test/Makefile b/contrib/hash_test/Makefile
new file mode 100644
index 000..a4ce6692127
--- /dev/null
+++ b/contrib/hash_test/Makefile
@@ -0,0 +1,18 @@
+# contrib/hash_test/Makefile
+
+MODULE_big = hash_test
+OBJS = \
+	hash_test.o
+
+PGFILEDESC = "hash_test - demostrate hash_seq_search and transaction issue"
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/hash_test
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/hash_test/hash_test.c b/contrib/hash_test/hash_test.c
new file mode 100644
index 000..7c9e24f40cc
--- /dev/null
+++ b/contrib/hash_test/hash_test.c
@@ -0,0 +1,73 @@
+
+#include "postgres.h"
+
+#include "access/xact.h"
+#include "fmgr.h"
+#include "miscadmin.h"
+#include "postmaster/bgworker.h"
+#include "utils/hsearch.h"
+
+#define HASH_TEST_TABLE_NELEMS 2
+
+PG_MODULE_MAGIC;
+
+PGDLLEXPORT void hash_test_workhorse(Datum main_arg);
+
+static HTAB *hash_test_table;
+
+typedef struct
+{
+	/* key */
+	int			num_key;
+} hash_test_table_entry;
+
+void
+_PG_init(void)
+{
+	BackgroundWorker worker;
+
+	if (!process_shared_preload_libraries_in_progress)
+		return;
+
+	MemSet(&worker, 0, sizeof(BackgroundWorker));
+	worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	worker.bgw_start_time = BgWorkerStart_ConsistentState;
+	worker.bgw_restart_time = BGW_NEVER_RESTART;
+	sprintf(worker.bgw_library_name, "hash_test");
+	sprintf(worker.bgw_function_name, "hash_test_workhorse");
+	sprintf(worker.bgw_name, "hash_test proccess");
+
+	RegisterBackgroundWorker(&worker);
+}
+
+void
+hash_test_workhorse(Datum main_arg)
+{
+	hash_test_table_entry *h_entry;
+	HASHCTL		ctl;
+	HASH_SEQ_STATUS hs;
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	ctl.keysize = sizeof(int);
+	ctl.entrysize = sizeof(hash_test_table_entry);
+
+	hash_test_table = hash_create("hash_test_table",
+								  HASH_TEST_TABLE_NELEMS,
+								  &ctl,
+								  HASH_ELEM | HASH_BLOBS);
+
+	/* insert elements */
+	for (int i = 0; i < HASH_TEST_TABLE_NELEMS; i++)
+		hash_search(hash_test_table, &i, HASH_ENTER, NULL);
+
+	/* go through hash table */
+	hash_seq_init(&hs, hash_test_table);
+	while ((h_entry = hash_seq_search(&hs)) != NULL)
+	{
+		StartTransactionCommand();
+		/* do some stuff */
+		CommitTransactionCommand();
+	}
+}
diff --git a/contrib/hash_test/meson.build b/contrib/hash_test/meson.build
new file mode 100644
index 000..bc3aac4e66d
--- /dev/null
+++ b/contrib/hash_test/meson.build
@@ -0,0 +1,10 @@
+
+hash_test_sources = files(
+	'has
Re: [BUG] Skipped initialization of some xl_xact_parsed_prepare fields
Hi, On Wed, May 21, 2025 at 9:59 AM Fujii Masao wrote: > I've pushed the patch. Thanks! > Glad to hear it, thank you! -- Best regards, Daniil Davydov
Re: Avoid orphaned objects dependencies, take 3
Hi, On Tue, May 20, 2025 at 02:12:41PM -0700, Jeff Davis wrote: > On Mon, 2025-05-19 at 14:07 -0400, Robert Haas wrote: > > I agree with that, but I think that it may also be error-prone to > > assume that it's OK to acquire heavyweight locks on other catalog > > objects at any place in the code where we record a dependency. I will > > not be surprised at all if that turns out to have some negative > > consequences. For example, I think it might result in acquiring the > > locks on those other objects at a subtly wrong time (leading to race > > conditions) or acquiring two locks on the same object with different > > lock modes where we should really only acquire one. I'm all in favor > > of solving this problem using heavyweight locks, but I think that > > implicitly acquiring them as a side effect of recording dependencies > > feels too surprising. > > I see what you mean now, in the sense that other code that calls > LockDatabaseObject (and other variants of LockAcquire) are mostly > higher-level operations like AlterPublication(), and not side-effects > of something else. > > But relation_open() is sort of an exception. There are lots of places > where that takes a lock because we happen to want something out of the > relcache, like generate_partition_qual() taking a lock on the parent or > CheckAttributeType() taking a lock on the typrelid. You could say those > are fairly obvious, but that's because we already know, and we could > make it more widely known that recording a dependency takes a lock. > > One compromise might be to have recordDependencyOn() take a LOCKMODE > parameter, which would both inform the caller that a lock will be > taken, and allow the caller to do it their own way and specify NoLock > if necessary. That still results in a huge diff, but the end result > would not be any more complex than the current code. Thanks for sharing your thoughts! I had in mind to "just" check if there is an existing lock (and if so, skip acquiring a new one) but your proposal sounds better. Indeed it would make the locking behavior explicit and also be flexible (allowing the callers to choose the LOCKMODE). I'll prepare a new version implementing your proposal. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
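To give that compromise a concrete shape, here is a rough sketch of a recordDependencyOn() variant that takes an explicit LOCKMODE, so the caller sees the lock and can pass NoLock when it already holds one; recordDependencyOnLocked() is a made-up name for illustration, not an actual patch, though recordDependencyOn() and LockDatabaseObject() are the existing functions:

/*
 * Sketch only, assuming the existing recordDependencyOn() and
 * LockDatabaseObject() signatures.
 */
static void
recordDependencyOnLocked(const ObjectAddress *depender,
						 const ObjectAddress *referenced,
						 DependencyType behavior,
						 LOCKMODE lockmode)
{
	/* make the lock on the referenced object explicit at the call site */
	if (lockmode != NoLock)
		LockDatabaseObject(referenced->classId,
						   referenced->objectId,
						   referenced->objectSubId,
						   lockmode);

	recordDependencyOn(depender, referenced, behavior);
}

/*
 * A call site that wants the protection would then read, e.g.:
 *		recordDependencyOnLocked(&myself, &referenced,
 *								 DEPENDENCY_NORMAL, AccessShareLock);
 * while one that already locked the referenced object passes NoLock.
 */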
Re: Violation of principle that plan trees are read-only
On Mon, May 19, 2025 at 7:45 PM Tom Lane wrote: > Isaac Morland writes: > > I assume this question has an obvious negative answer, but why can't we > > attach const declarations to the various structures that make up the plan > > tree (at all levels, all the way down)? I know const doesn't actually > > prevent a value from changing, but at least the compiler would complain > if > > code accidentally tried. > > The big problem is that a "const" attached to a top-level pointer > doesn't inherently propagate down to sub-nodes. So if I had, say, > "const Query *stmt", the compiler would complain about > > stmt->jointree = foo; > > but not about > > stmt->jointree->quals = foo; > > I guess we could imagine developing an entirely parallel set of > struct declarations with "const" on all pointer fields, like > > typedef struct ConstQuery > { > ... > const ConstFromExpr *jointree; > ... > } ConstQuery; > > but even with automated maintenance of the ConstFoo doppelganger > typedefs, it seems like that'd be a notational nightmare. For > one thing, I'm not sure how to teach the compiler that casting > "Query *" to "ConstQuery *" is okay but vice versa isn't. > > Does C++ have a better story in this area? I haven't touched it > in so long that I don't remember. > > regards, tom lane > > One unconventional but potentially effective approach to detect unexpected modifications in the plan tree can be as follows: - Implement a function that can deeply compare two plan trees for structural or semantic differences. - Before passing the original plan tree to the executor, make a deep copy of it. - After execution (or at strategic checkpoints), compare the current plan tree against the original copy. - If any differences are detected, emit a warning or log it for further inspection. Yes, this approach introduces some memory and performance overhead. However, we can limit its impact by enabling it conditionally via a compile-time flag or #define, making it suitable for debugging or assertion-enabled builds. It might sound a bit unconventional, but it could serve as a useful sanity check especially during development or when investigating plan tree integrity issues. Pardon me if this sounds naive! Yasir Data Bene
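As a small standalone illustration of the const-propagation point above (stand-in structs rather than the real Query/FromExpr definitions), the first assignment is rejected by the compiler while the second one is accepted without complaint:

/* Simplified stand-in structs, only to show the compiler behavior described */
typedef struct FromExpr
{
	void	   *quals;
} FromExpr;

typedef struct Query
{
	FromExpr   *jointree;
} Query;

void
mutate(const Query *stmt, FromExpr *foo, void *new_quals)
{
	(void) foo;							/* only used in the commented-out line */

	/* stmt->jointree = foo; */			/* error: member of a const object */
	stmt->jointree->quals = new_quals;	/* accepted: const stops at the first pointer */
}

For the copy-and-compare idea, the existing copyObject()/equal() node infrastructure would presumably be the natural building block, guarded by an assertion-only flag.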
Re: Re: proposal: schema variables
On Wed, May 21, 2025 at 2:21 AM, Michael Paquier wrote: > On Tue, May 20, 2025 at 10:28:31PM +0200, Pavel Stehule wrote: > > This topic is difficult, because there is no common solution. SQL/PSM is > > almost dead. T-SQL (and MySQL) design is weak and cannot be used for > > security. > > Oracle's design is joined with just one environment. And although almost > > all widely used databases have supported session variables for decades, no > > one design > > is perfect. Proposed design is not perfect too (it introduces possible > > ambiguity) , but I think it can support most wanted use cases (can be > > enhanced in future), > > and it is consistent with Postgres. There are more ways to reduce risk of > > unwanted ambiguity to zero. But it increases the size of the patch. > > One thing that I keep hearing about this feature is that this would be > really helpful for migration from Oracle to PostgreSQL, helping a lot > with rewrites of pl/pgsql functions. > > There is one page on the wiki about private variables, dating back to > 2016: > https://wiki.postgresql.org/wiki/CREATE_PRIVATE_VARIABLE > > I wrote this mail https://www.postgresql.org/message-id/cafj8prb8kdwqcdn2x1_63c58+07oy4z+rudk_xptup+pe8r...@mail.gmail.com and there is another wiki page https://wiki.postgresql.org/wiki/Variable_Design > Perhaps it would help to summarize a bit the pros and cons of existing > implementations to drive which implementation would be the most suited > on a wiki page? The way standards are defined is by overwriting > existing standards, and we don't have one in the SQL specification. > It's hard to say if there will be one at some point, but if the main > SQL products around the world have one, it pretty much is a point in > favor of having a standard. > Although it is maybe an unorthodox idea - I can imagine two different implementations of server-side session variables with different syntaxes (and different advantages and disadvantages, and different use cases). The implementations would not work against each other, but we would have to accept the fact that one feature is implemented twice. We should choose just the one that will be implemented first. The proposed design helps with migration from PL/SQL. > > Another possible angle that could be taken is to invest in a proposal > for the SQL committee to consider, forcing an actual standard that > PostgreSQL could rely on rather than having one behavior implemented > to have it standard-incompatible a few years after. And luckily, we > have Vik Fearing and Peter Eisentraut in this community who invest > time and resources in this area. > Theoretically, the proposed design is a subset of the implementation in DB2 - I designed it without knowledge of this DB2 feature. But without introducing the concept of modules (which is partially redundant with schemas), this design is very natural, and I am quite sure there is no other way to design it. We can ask Peter or Vik about the real possibilities in this area. I don't have any information on that; I just haven't seen changes in SQL/PSM for decades, so I didn't think about it. > FWIW, I do agree with the opinion that if you want to propose rebased > versions of this patch set on a periodic basis, you are free to do so. > This is the core concept of an open community. In terms of my > committer time, I'm pretty much already booked in terms of features > planned for the next release, so I guess that I won't be able to > invest time on this thread. Just saying. 
> thank you for the info > I know that this patch set has been discussed at FOSDEM at some point, > so others may be able to comment more about that. That's just one > opinion among many others. > -- > Michael >