Re: Fix slot synchronization with two_phase decoding enabled

2025-04-11 Thread Nisha Moond
Hi,

Please find the patches attached for all three approaches.

On Wed, Apr 9, 2025 at 10:45 AM Zhijie Hou (Fujitsu)
 wrote:
>
> On Thu, Apr 3, 2025 at 3:16 PM Zhijie Hou (Fujitsu) wrote:
> >
> > On Thu, Apr 3, 2025 at 1:38 PM Masahiko Sawada wrote:
> >
> > >
> > > On Wed, Apr 2, 2025 at 7:58 PM Amit Kapila 
> > > wrote:
> > > >
> > > > On Thu, Apr 3, 2025 at 7:50 AM Zhijie Hou (Fujitsu)
> > > >  wrote:
> > > > >
> > > > > On Thu, Apr 3, 2025 at 3:30 AM Masahiko Sawada wrote:
> > > > >
> > > > > >
> > > > > > On Wed, Apr 2, 2025 at 6:33 AM Zhijie Hou (Fujitsu)
> > > > > >  wrote:
> > > > > >
> > > > > > Thank you for the explanation! I agree that the issue happens in
> > > > > > these
> > > cases.
> > > > > >
> > > > > > As another idea, I wonder if we could somehow defer marking the
> > > > > > synced slot as 'sync-ready' until we can ensure that the slot
> > > > > > doesn't have any transactions that are prepared before the point
> > > > > > of enabling two_phase. For example, when the slotsync worker
> > > > > > fetches the remote slot, it remembers the confirmed_flush_lsn
> > > > > > (say
> > > > > > LSN-1) if the local slot's two_phase becomes true or the local
> > > > > > slot is newly created with enabling two_phase, and then it makes
> > > > > > the slot 'sync-ready' once it confirmed that the slot's
> > > > > > restart_lsn passed
> > > LSN-1. Does it work?
> > > > >
> > > > > Thanks for the idea!
> > > > >
> > > > > We considered a similar approach in [1] to confirm there are no
> > > > > prepared transactions before two_phase_at, but the issue arises
> > > > > when the two_phase flag is switched from 'false' to 'true' (as in
> > > > > the case with (copy_data=true, failover=true, two_phase=true)). In
> > > > > this case, the slot may have already been marked as sync-ready
> > > > > before the two_phase flag is enabled, as slotsync is unaware of
> > > > > potential
> > > future changes to the two_phase flag.
> > > >
> > > > This can happen because when copy_data is true, tablesync can take a
> > > > long time to complete the sync and in the meantime, slot without a
> > > > two_phase flag would have been synced to standby. Such a slot would
> > > > be marked as sync-ready even if we follow the calculation proposed
> > > > by Sawada-san. Note that we enable two_phase once all the tables are
> > > > in ready state (See run_apply_worker() and comments atop worker.c
> > > > (TWO_PHASE TRANSACTIONS)).
> > >
> > > Right. It doesn't make sense to make the slot not-sync-ready and then
> > > flip it back to sync-ready.
> > >
> > > While I agree with the approach for HEAD and it seems difficult to
> > > find a solution, I'm concerned that disallowing the use of both failover
> > > and two_phase in a minor release would affect users too much. Users who
> > > are already using that combination might end up needing to re-think
> > > their system architecture. So I'm trying to narrow down use cases
> > > where we're going to prohibit or to find workarounds.
>
> We analyzed more for the backbranch fix, and here is a summary of different
> approaches that could be used for PG17.
>
> --
> Approach 1
> --
>
> This is the original approach implemented in V4 patch.
>
> In this approach, we entirely disallow enabling both failover and the 
> two-phase
> feature together for a replication slot or subscription.
>
> pros:
>
> This restriction is simple to implement and easy for users to comprehend.
>
> cons:
>
> Users would be unable to use the two-phase feature in conjunction with
> failover.
>
> Following the upgrade to a new release with this fix, existing subscriptions
> that have both failover and two-phase enabled would require manual 
> re-creation,
> which is time-consuming.
>

Patch "v5_aprch_1-0001" implements the above Approach 1.

> --
> Approach 2
> --
>
> Instead of disallowing the use of two-phase and failover together, a more
> flexible strategy could be to restrict failover only for slots with two-phase
> enabled when there is a possibility that prepared transactions exist before
> two_phase_at that are not yet replicated. During slot creation with two-phase
> and failover, we could check for any decoded prepared transactions when
> determining the decoding start point (DecodingContextFindStartpoint). For
> subsequent attempts to alter failover to true, we ensure that two_phase_at is
> less than restart_lsn, indicating that all prepared transactions have been
> committed and replicated, so the bug would not occur.
>
> pros:
>
> This method minimizes restrictions for users. Especially during slot creation
> with (two_phase=on, failover=on), as it’s uncommon for transactions to prepare
> during consistent snapshot creation, the restriction becomes almost
> unnoticeable.
>
> Users are not always forced to re-create subscriptions post-upgrade.
>
> cons:
>
> The logic involved for (restart_lsn > two_phase_at) might be less intuitive 
> for
> users.
>
> After upgrading, it's recommended 

disallow ALTER VIEW SET DEFAULT when the corresponding base relation column is a generated column

2025-04-11 Thread jian he
hi.

CREATE TABLE gtest1 (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
CREATE VIEW gtest1v AS SELECT * FROM gtest1;
ALTER VIEW gtest1v ALTER COLUMN b SET DEFAULT 100;

INSERT INTO gtest1v VALUES (8, DEFAULT) returning *;
ERROR:  cannot insert a non-DEFAULT value into column "b"
DETAIL:  Column "b" is a generated column.

we can make
ALTER VIEW gtest1v ALTER COLUMN b SET DEFAULT 100;
error out,
then
INSERT INTO gtest1v VALUES (8, DEFAULT) returning *;
will work just fine.

obviously,
INSERT INTO gtest1v VALUES (8, 1) returning *;
will fail.


we can do this in ATExecColumnDefault,
by checking whether:
* gtest1v is an updatable view or not
* column b is an updatable column or not
* the base relation column corresponding to the view's column b is a
generated column or not.

if all these conditions are met, we error out saying
``cannot alter column \"%s\" on updatable view``.


what do you think?
From 8e973bcd093ce25a5728f10aa9e73eb838406758 Mon Sep 17 00:00:00 2001
From: jian he 
Date: Fri, 11 Apr 2025 15:41:10 +0800
Subject: [PATCH v1 1/1] disallow set default when baserel column is generated

Disallow changing an updatable view column's default expression when the
corresponding base column is a generated column.

discussion: https://postgr.es/m/
---
 src/backend/commands/tablecmds.c  | 54 +++
 src/backend/rewrite/rewriteHandler.c  |  2 +-
 src/include/rewrite/rewriteHandler.h  |  4 ++
 .../regress/expected/generated_stored.out | 29 +-
 .../regress/expected/generated_virtual.out| 29 +-
 src/test/regress/sql/generated_stored.sql |  6 +--
 src/test/regress/sql/generated_virtual.sql|  6 +--
 7 files changed, 95 insertions(+), 35 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 686f1850cab..5548b629a0c 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -81,6 +81,7 @@
 #include "parser/parse_relation.h"
 #include "parser/parse_type.h"
 #include "parser/parse_utilcmd.h"
+#include "parser/parsetree.h"
 #include "parser/parser.h"
 #include "partitioning/partbounds.h"
 #include "partitioning/partdesc.h"
@@ -8156,6 +8157,59 @@ ATExecColumnDefault(Relation rel, const char *colName,
  (TupleDescAttr(tupdesc, attnum - 1)->attgenerated == ATTRIBUTE_GENERATED_STORED ?
   errhint("Use %s instead.", "ALTER TABLE ... ALTER COLUMN ... DROP EXPRESSION") : 0)));
 
+
+	/* Check if this is an automatically updatable view */
+	if (rel->rd_rel->relkind == RELKIND_VIEW && newDefault != NULL)
+	{
+		Query	   *viewquery = get_view_query(rel);
+
+		if (view_query_is_auto_updatable(viewquery, true) == NULL)
+		{
+			Bitmapset  *set_col	= NULL;
+
+			set_col = bms_add_member(set_col,
+	 attnum - FirstLowInvalidHeapAttributeNumber);
+
+			if (view_cols_are_auto_updatable(viewquery, set_col, NULL, NULL) == NULL)
+			{
+RangeTblRef *rtr;
+RangeTblEntry *base_rte;
+Relation	base_rel;
+TupleDesc	rel_tupdesc;
+TargetEntry *tle;
+AttrNumber	attno;
+
+rtr = (RangeTblRef *) linitial(viewquery->jointree->fromlist);
+base_rte = rt_fetch(rtr->rtindex, viewquery->rtable);
+Assert(base_rte->rtekind == RTE_RELATION);
+
+base_rel = table_open(base_rte->relid, AccessShareLock);
+rel_tupdesc = RelationGetDescr(base_rel);
+
+
+tle = (TargetEntry *) list_nth(viewquery->targetList, attnum - 1);
+Assert(!tle->resjunk);
+Assert(IsA(tle->expr, Var));
+
+attno = ((Var *) tle->expr)->varattno;
+
+if (TupleDescAttr(rel_tupdesc, attno - 1)->attgenerated)
+{
+	Form_pg_attribute att = TupleDescAttr(rel_tupdesc, attno - 1);
+
+	ereport(ERROR,
+			errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			errmsg("cannot alter column \"%s\" default expression on view \"%s\"",
+	colName, RelationGetRelationName(rel)),
			errdetail("Column \"%s\" on base relation \"%s\" is a generated column.",
+	  NameStr(att->attname),
+	  RelationGetRelationName(base_rel)));
+}
+table_close(base_rel, AccessShareLock);
+			}
+		}
+	}
+
 	/*
 	 * Remove any old default for the column.  We use RESTRICT here for
 	 * safety, but at present we do not expect anything to depend on the
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index f0bce5f9ed9..5a3a9ed0c94 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -2776,7 +2776,7 @@ view_query_is_auto_updatable(Query *viewquery, bool check_cols)
  * We do not check whether the referenced columns of the base relation are
  * updatable.
  */
-static const char *
+const char *
 view_cols_are_auto_updatable(Query *viewquery,
 			 Bitmapset *required_cols,
 			 Bitmapset **updatable_cols,
diff --git a/src/include/rewrite/rewriteHandler.h b/src/include/rewrite/rewriteHandler.h
index 99cab1a3bfa..6a4cb14d150 100644
--- a/src/include/rewrite/rewriteHandler.h
+++ b/src/include/rewrite/rewriteHandler.h

Regression test fails when 1) old PG is installed and 2) meson/ninja build is used

2025-04-11 Thread Hayato Kuroda (Fujitsu)
Dear hackers,

While creating patches for older branches I found the $SUBJECT. I do not have
much knowledge of meson, thus I'm not sure whether it is intentional.

Reproducer
===
I could reproduce the failure with steps:

1. Install an old PG, e.g., PG16, to your system. Its .so files must be on
your $LD_LIBRARY_PATH.
2. Build a newer PG, e.g., master, with the meson build system [1].
3. Run the regression test; an ERROR is reported [2].

This issue does not happen when I used autoconf/make build system.

Analysis
=

According to the log, the instance could be started but psql could not work 
correctly:

```
--- stdout ---
# executing test in /home/hayato/builddir/testrun/regress/regress group regress 
test regress
# initializing database system by copying initdb template
# using temp instance on port 40047 with PID 949892
Bail out!# test failed
--- stderr ---
psql: symbol lookup error: psql: undefined symbol: PQservice
# command failed: "psql" -X -q -c "CREATE DATABASE \"regression\" ...

(test program exited with status code 2)
==
```

It looks like psql required the function `PQservice` in the library, but it
could not find it in the libpq.so that was used. Since the function has been
introduced in PG18, I suspect psql tried to link with the .so file of the old
(installed) PG. IIUC each command should refer to the libraries in
tmp_install, not the system's ones.

Is this an issue to be solved by the PG community, or is it a specification of
meson/ninja? Or... could it happen only in my environment?

Note

I'm using AlmaLinux 9.5. I can give more detail if needed.

[1]:
```
$ meson setup -Dinjection_points=true -Dcassert=true --optimization=0 --debug 
../postgres/
The Meson build system
Version: 0.63.3
...
Project name: postgresql
Project version: 18devel
...
$ ninja
...
```

[2]:
```
$ meson test --suite setup --suite regress
ninja: Entering directory `/home/hayato/builddir'
ninja: no work to do.
1/4 postgresql:setup / tmp_install   OK0.82s
2/4 postgresql:setup / install_test_filesOK0.05s
3/4 postgresql:setup / initdb_cache  OK1.88s
4/4 postgresql:regress / regress/regress ERROR 3.66s   exit 
status 2
...
Ok: 3   
Expected Fail:  0   
Fail:   1   
Unexpected Pass:0   
Skipped:0   
Timeout:0
```

Best regards,
Hayato Kuroda
FUJITSU LIMITED





Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions

2025-04-11 Thread Nazir Bilal Yavuz
Hi,

On Thu, 10 Apr 2025 at 16:50, Bertrand Drouvot
 wrote:
> On Tue, Apr 08, 2025 at 02:40:52AM -0400, Andres Freund wrote:
> > I think it's ok for now.  It might be worth doing a larger redesign of the
> > pgbuffercache docs at some point...
> >
> >
> > Pushed.
> >
> >
> > Thanks for your patches and thanks for all the reviewers getting this ready!
>
> Thanks for the patch! That sounds like a great addition. I was doing some
> tests and did not see any issues.

Thank you for looking into this!

> Also while doing the tests I thought that it
> could be useful to evict only from a subset of NUMA nodes (now that NUMA
> awareness is in). We'd need to figure out what to do for buffers that are 
> spread
> across NUMA nodes though.
>
> Does that sound like an idea worth spending time on? (If so, I'd be happy to 
> work
> on it).

I think it looks useful, but I’m not too familiar with NUMA, so I’m
not sure I can say much with confidence. Maybe someone else can chime
in with a more solid take?

--
Regards,
Nazir Bilal Yavuz
Microsoft




Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions

2025-04-11 Thread Nazir Bilal Yavuz
Hi,

Thank you for looking into this!

On Thu, 10 Apr 2025 at 23:06, Robert Haas  wrote:
>
> On Tue, Mar 18, 2025 at 6:03 PM Aidar Imamov  wrote:
> > > for (int buf = 1; buf < NBuffers; buf++)
> > Mb it would be more correct to use <= NBuffers?
>
> I agree that (int buf = 1; buf < NBuffers; buf++) isn't right because
> that iterates one fewer times than the number of buffers. What was
> ultimately committed was:
>
> +   for (int buf = 1; buf <= NBuffers; buf++)
> +   {
> +   BufferDesc *desc = GetBufferDescriptor(buf - 1);
>
> Curiously, there is no other instance of <= NBuffers in the code.
> Elsewhere we instead do:
>
> for (i = 0; i < NBuffers; i++)
> {
> BufferDesc *bufHdr = GetBufferDescriptor(i);
>
> Or in BufferSync:
>
> for (buf_id = 0; buf_id < NBuffers; buf_id++)
> {
> BufferDesc *bufHdr = GetBufferDescriptor(buf_id);
>
> Nonetheless what was committed seems pretty defensible, because we
> have lots of other places that do GetBufferDescriptor(buffer - 1) and
> similar. Alternating between 0-based indexing and 1-based indexing
> like this seems rather error-prone somehow. :-(

I understand your point. I did it like that because bufferids start
from 1 and go to NBuffers inclusive in the pg_buffercache view, so it
seems more natural to me to implement it like that. I am okay with
replacing these loops with [1] to make it standard everywhere:

[1]
for (int buf = 0; buf < NBuffers; buf++)
{
BufferDesc *desc = GetBufferDescriptor(buf);

--
Regards,
Nazir Bilal Yavuz
Microsoft




Re: Correct documentation for protocol version

2025-04-11 Thread Fujii Masao




On 2025/04/11 5:17, Dave Cramer wrote:

No, you are correct.

See new patch


Thanks for updating the patch!

- Identifies the message as a protocol version negotiation
+ Identifies the message as a protocol version negotiation.
+ The server sends this message if the requested protocol is
+ not equal to the version the server supports or the client
+ requests protocol options that are not recognized.
  message.

You added the sentence starting with "The server sends..."
between "negotiation" and "message", but it should be placed
after "message", right?

Even if the requested version is not equal to the latest
version that the server supports, the message is not sent
when it's older than the latest one. So how about
wording it like this instead:

-
Identifies the message as a protocol version negotiation message.
The server sends this message when the client requests a newer
protocol version than the server supports, or when the client
includes protocol options that the server does not recognize.
-

+ The protocol version requested by the client unless it is higher than 
the
+ latest version we support in which case the latest protocol version 
we support.

Maybe rewording this for clarity and using “the server”
instead of “we” would help. For example:

-
The latest protocol version supported by the server if the client
requests a newer protocol version than the server supports.
The protocol version requested by the client, otherwise.
-


Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION





Add pg_buffercache_mark_dirty[_all] functions to the pg_buffercache

2025-04-11 Thread Nazir Bilal Yavuz
Hi,

There is another thread [1] to add both pg_buffercache_evict_[relation
| all] and pg_buffercache_mark_dirty[_all] functions to the
pg_buffercache. I decided to create another thread as
pg_buffercache_evict_[relation | all] functions are committed but
pg_buffercache_mark_dirty[_all] functions still need review.

pg_buffercache_mark_dirty(): This function takes a buffer id as an
argument and tries to mark this buffer as dirty. Returns true on
success.
pg_buffercache_mark_dirty_all(): This is very similar to the
pg_buffercache_mark_dirty() function. The difference is
pg_buffercache_mark_dirty_all() does not take an argument. Instead it
just loops over the shared buffers and tries to mark all of them as
dirty. It returns the number of buffers marked as dirty.

Since that patch is targeted for the PG 19, pg_buffercache is bumped to v1.7.

Latest version is attached and people who already reviewed the patches are CCed.

[1] 
postgr.es/m/CAN55FZ0h_YoSqqutxV6DES1RW8ig6wcA8CR9rJk358YRMxZFmw%40mail.gmail.com

--
Regards,
Nazir Bilal Yavuz
Microsoft
From 06f12f6174c0b6e1c5beeb4c1a8f4b33b89cc158 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Fri, 4 Apr 2025 13:39:49 +0300
Subject: [PATCH v7] Add pg_buffercache_mark_dirty[_all]() functions for
 testing

This commit introduces two new functions for marking shared buffers as
dirty:

pg_buffercache_mark_dirty(): Marks a specific shared buffer as dirty.
pg_buffercache_mark_dirty_all(): Marks all shared buffers as dirty in a
single operation.

The pg_buffercache_mark_dirty_all() function provides an efficient
way to dirty the entire buffer pool (e.g., ~550ms vs. ~70ms for 16GB of
shared buffers), complementing pg_buffercache_mark_dirty() for more
granular control.

These functions are intended for developer testing and debugging
scenarios, enabling users to simulate various buffer pool states and
test write-back behavior. Both functions are superuser-only.

Author: Nazir Bilal Yavuz 
Reviewed-by: Andres Freund 
Reviewed-by: Aidar Imamov 
Reviewed-by: Joseph Koshakow 
Discussion: https://postgr.es/m/CAN55FZ0h_YoSqqutxV6DES1RW8ig6wcA8CR9rJk358YRMxZFmw%40mail.gmail.com
---
 src/include/storage/bufmgr.h  |  2 +
 src/backend/storage/buffer/bufmgr.c   | 75 +++
 doc/src/sgml/pgbuffercache.sgml   | 54 -
 contrib/pg_buffercache/Makefile   |  2 +-
 .../expected/pg_buffercache.out   | 30 +++-
 contrib/pg_buffercache/meson.build|  1 +
 .../pg_buffercache--1.6--1.7.sql  | 14 
 contrib/pg_buffercache/pg_buffercache.control |  2 +-
 contrib/pg_buffercache/pg_buffercache_pages.c | 33 
 contrib/pg_buffercache/sql/pg_buffercache.sql | 10 ++-
 10 files changed, 216 insertions(+), 7 deletions(-)
 create mode 100644 contrib/pg_buffercache/pg_buffercache--1.6--1.7.sql

diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 33a8b8c06fb..ec7fec6368a 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -312,6 +312,8 @@ extern void EvictRelUnpinnedBuffers(Relation rel,
 	int32 *buffers_evicted,
 	int32 *buffers_flushed,
 	int32 *buffers_skipped);
+extern bool MarkUnpinnedBufferDirty(Buffer buf);
+extern void MarkAllUnpinnedBuffersDirty(int32 *buffers_dirtied);
 
 /* in buf_init.c */
 extern void BufferManagerShmemInit(void);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index db8f2b1754e..e2bdb86525b 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -6699,6 +6699,81 @@ EvictRelUnpinnedBuffers(Relation rel, int32 *buffers_evicted,
 	}
 }
 
+/*
+ * Try to mark the provided shared buffer as dirty.
+ *
+ * This function is intended for testing/development use only!
+ *
+ * Same as EvictUnpinnedBuffer() but with MarkBufferDirty() call inside.
+ *
+ * Returns true if the buffer was already dirty or it has successfully been
+ * marked as dirty.
+ */
+bool
+MarkUnpinnedBufferDirty(Buffer buf)
+{
+	BufferDesc *desc;
+	uint32		buf_state;
+
+	Assert(!BufferIsLocal(buf));
+
+	/* Make sure we can pin the buffer. */
+	ResourceOwnerEnlarge(CurrentResourceOwner);
+	ReservePrivateRefCountEntry();
+
+	desc = GetBufferDescriptor(buf - 1);
+
+	/* Lock the header and check if it's valid. */
+	buf_state = LockBufHdr(desc);
+	if ((buf_state & BM_VALID) == 0)
+	{
+		UnlockBufHdr(desc, buf_state);
+		return false;
+	}
+
+	/* Check that it's not pinned already. */
+	if (BUF_STATE_GET_REFCOUNT(buf_state) > 0)
+	{
+		UnlockBufHdr(desc, buf_state);
+		return false;
+	}
+
+	PinBuffer_Locked(desc);		/* releases spinlock */
+
+	/* If it was not already dirty, mark it as dirty. */
+	if (!(buf_state & BM_DIRTY))
+	{
+		LWLockAcquire(BufferDescriptorGetContentLock(desc), LW_EXCLUSIVE);
+		MarkBufferDirty(buf);
+		LWLockRelease(BufferDescriptorGetContentLock(desc));
+	}
+
+	UnpinBuffer(desc);
+
+	return true;
+}
+
+/*
+ * Try to mark all 

Re: Correct documentation for protocol version

2025-04-11 Thread Dave Cramer
On Fri, 11 Apr 2025 at 05:05, Fujii Masao 
wrote:

>
>
> On 2025/04/11 5:17, Dave Cramer wrote:
> > No, you are correct.
> >
> > See new patch
>
> Thanks for updating the patch!
>
> - Identifies the message as a protocol version negotiation
> + Identifies the message as a protocol version negotiation.
> + The server sends this message if the requested protocol is
> + not equal to the version the server supports or the client
> + requests protocol options that are not recognized.
>message.
>
> You added the sentence starting with "The server sends..."
> between "negotiation" and "message", but it should be placed
> after "message", right?
>
> Even if the requested version is not equal to the latest
> version that the server supports, the message is not sent
> when it's older than the latest one. So how about
> wording it like this instead:
>
> -
> Identifies the message as a protocol version negotiation message.
> The server sends this message when the client requests a newer
> protocol version than the server supports, or when the client
> includes protocol options that the server does not recognize.
> -
>
> + The protocol version requested by the client unless it is higher
> than the
> + latest version we support in which case the latest protocol
> version we support.
>
> Maybe rewording this for clarity and using “the server”
> instead of “we” would help. For example:
>
> -
> The latest protocol version supported by the server if the client
> requests a newer protocol version than the server supports.
> The protocol version requested by the client, otherwise.
> -
>
>
Reworded as suggested
Dave

>
>


protocol-5.patch
Description: Binary data


Re: merge file_exists_in_directory and _fileExistsInDirectory functions and move into common file dumputils.c

2025-04-11 Thread Álvaro Herrera
On 2025-Apr-11, Michael Paquier wrote:

> Perhaps we should just use a more centralized place, like file_utils.c
> so as all frontends could benefit of it?

I'm not sure about that.  This code looks to be making too many
assumptions that aren't acceptable for a general routine, such as
complaining only that the directory name is long without the possibility
that the culprit is the file name.  It's more or less okay in current
uses because they're all using hardcoded short names, but that would not
hold in general.  At the same time, isn't every call of this routine a
potential TOCTTOU bug?  Again it's probably fine for the current code,
but I wouldn't be too sure about making this generally available as-is.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Oh, great altar of passive entertainment, bestow upon me thy discordant images
at such speed as to render linear thought impossible" (Calvin a la TV)




Re: why there is not VACUUM FULL CONCURRENTLY?

2025-04-11 Thread Antonin Houska
Matheus Alcantara  wrote:

> Hi,
> 
> On Tue, Apr 1, 2025 at 10:31 AM Antonin Houska  wrote:
> > One more version, hopefully to make cfbot happy (I missed the bug because I
> > did not set the RELCACHE_FORCE_RELEASE macro in my environment.)
> 
> Thanks for the new version! I'm starting to study this patch series and
> I just want to share some points about the documentation on v12-0004:

Please check the next version [1]. Thanks for your input.

[1] https://www.postgresql.org/message-id/97795.1744363522%40localhost

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: Prevent an error on attaching/creating a DSM/DSA from an interrupt handler.

2025-04-11 Thread Daniel Gustafsson
> On 24 Mar 2025, at 20:31, Rahila Syed  wrote:

> Please find the attached updated and rebased patch.

Thanks for this rebase, as was mentioned in the other thread I plan to get this
committed fairly soon.  A few comments on the code, all performed in the
attached v3.


+   else
+   {
+   /* Log failure of unpinning */
+   elog(DEBUG2, "unable to unpin the segment %u as 
CurrentResourceOwner is NULL or releasing",
+seg);
+   seg->resowner = NULL;
+   }
I removed the elog() calls since I can't see it adding enough value, and the
assignment to NULL can be removed as well since we've already asserted that
seg->resowner is NULL.


+   INJECTION_POINT("dsa_create_on_res_release");
This feels like a name which limits its use to this one test, whereas it is a
general purpose injection point.  Renamed, and also moved to using dashes
rather than underscore as the former is project style.


+void
+test_dsa_inj(const char *name, const void *private_data)
Rather than duplicating the code I created an internal function for this test
which can be called from the existing basic test as well as this new test.

I also did a little bit of renaming to make it more readable.

As it can only really be tested with an injection point I plan on only
backpatching to 17 initially.  Searching the archives I didn't find any mention
of this bug ever being hit so it seems safe to let it prove itself in testable
versions before going further back with it.

--
Daniel Gustafsson



v3-0001-Don-t-try-to-enlarge-resourceowner-when-releasing.patch
Description: Binary data


Re: Silence resource leaks alerts

2025-04-11 Thread Ranier Vilela
Thanks Michael, for looking at this.


On Fri, Apr 11, 2025 at 02:09, Michael Paquier
wrote:

> On Thu, Apr 10, 2025 at 03:10:02PM -0300, Ranier Vilela wrote:
> > While it is arguable that this is a false warning, there is a benefit in
> > moving the initialization of the string buffer, silencing the warnings
> that
> > are presented in this case.
> >
> > 1. pg_overexplain.c
> > 2. ruleutils.c
>
> These code paths are far from being critical and the two ones in
> ruleutils.c are older, even if it is a practice that had better be
> discouraged particularly as initStringInfo() can allocate some memory
> for nothing.  So it could bloat the current memory context if these
> code paths are repeatedly taken.
>
Yeah, it's a bit annoying to do unnecessary work.
Plus there is a small gain from delaying the memory allocation until it is
actually needed.


> FWIW, I'm with these changes to delay these initializations as you are
> proposing.

Thanks.


>   The RMT has a say about such changes post feature-freeze,
> though, even if the one in pg_overexplain.c is new to v18.
>
I agree.

best regards,
Ranier Vilela


Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions

2025-04-11 Thread Robert Haas
On Fri, Apr 11, 2025 at 4:02 AM Nazir Bilal Yavuz  wrote:
> I understand your point. I did it like that because bufferids start
> from 1 and go to NBuffers inclusive in the pg_buffercache view, so it
> seems more natural to me to implement it like that. I am okay with
> replacing these loops with [1] to make it standard everywhere:
>
> [1]
> for (int buf = 0; buf < NBuffers; buf++)
> {
> BufferDesc *desc = GetBufferDescriptor(buf);

I'm more making an observation than asking for a change. If you and
others think it should be changed, that is fine, but I'm uncertain
myself what we ought to be doing here. I just wanted to raise the
issue.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Silence resource leaks alerts

2025-04-11 Thread Ranier Vilela
On Fri, Apr 11, 2025 at 02:37, Tom Lane  wrote:

> Michael Paquier  writes:
> > On Thu, Apr 10, 2025 at 03:10:02PM -0300, Ranier Vilela wrote:
> >> While it is arguable that this is a false warning, there is a benefit in
> >> moving the initialization of the string buffer, silencing the warnings
> that
> >> are presented in this case.
> >>
> >> 1. pg_overexplain.c
> >> 2. ruleutils.c
>
> > These code paths are far from being critical and the two ones in
> > ruleutils.c are older, even if it is a practice that had better be
> > discouraged particularly as initStringInfo() can allocate some memory
> > for nothing.  So it could bloat the current memory context if these
> > code paths are repeatedly taken.
>
> I don't believe either module ever gets run in a long-lived memory
> context.  So I think the burden of proof to show that these leaks
> are meaningful is on the proposer.
>
Despite the $subject talking about leaks, the issue in this specific case
is doing unnecessary work.
We get the silencing of these alerts for free.


> I'm totally not on board with the approach of "if Coverity says this
> is a leak then we must fix it".

I partially agree.
In some cases, it can actually be disastrous.

For example: src/backend/tcop/fastpath.c (line 456)
CID 1608897: (#1 of 1): Resource leak (RESOURCE_LEAK)
10. leaked_storage: Variable abuf going out of scope leaks the storage
abuf.data points to

"Fixing" this leak, causes several errors:

diff --strip-trailing-cr -U3
C:/dll/postgres_dev/postgres_commit/src/test/regress/expected/largeobject_1.out
C:/dll/postgres_dev/postgres_commit/build/testrun/regress/regress/results/largeobject.out
---
C:/dll/postgres_dev/postgres_commit/src/test/regress/expected/largeobject_1.out
2024-10-24 16:40:59.364703100 -0300
+++
C:/dll/postgres_dev/postgres_commit/build/testrun/regress/regress/results/largeobject.out
2025-04-11 08:55:10.085484800 -0300
@@ -37,11 +37,13 @@
 (1 row)

 \lo_unlink 42
+ERROR:  pfree called with invalid pointer 00C6593FF110 (header
0x01d03fc52f08)
 \dl
-  Large objects
- ID | Owner | Description
-+---+-
-(0 rows)
+   Large objects
+ ID |  Owner  | Description
++-+-
+ 42 | regress_lo_user | the ultimate answer
+(1 row)

 -- Load a file
 CREATE TABLE lotest_stash_values (loid oid, fd integer);
@@ -417,19 +419,354 @@
 (1 row)

 \lo_import :filename
+ERROR:  pfree called with invalid pointer 00C6593FF110 (header
0x01d03fc529e8)
 \set newloid :LASTOID
 -- just make sure \lo_export does not barf
 \set filename :abs_builddir '/results/lotest2.txt'
 \lo_export :newloid :filename
+ERROR:  pfree called with invalid pointer 00C6593FF110 (header
0x01d03fc529e8)
 -- This is a hack to test that export/import are reversible
 -- This uses knowledge about the inner workings of large object mechanism
 -- which should not be used outside it.  This makes it a HACK
 SELECT pageno, data FROM pg_largeobject WHERE loid = (SELECT loid from
lotest_stash_values)
 EXCEPT
 SELECT pageno, data FROM pg_largeobject WHERE loid = :newloid;
- pageno | data
-+--
-(0 rows)
+ pageno



> By and large, it's more efficient
> for us to leak small allocations and recover them at context reset or
> delete than it is to pfree those allocations retail.

That holds as long as you don't allocate memory in advance and unnecessarily.
Even for small amounts, allocation is an expensive operation,
and compilers are not able to remove it themselves.

> Sure, if we're
> talking about big allocations, or if there will be a lot of such
> allocations during the lifetime of the context, we'd better do the
> retail pfrees.  Sadly, such criteria are outside Coverity's ken.
>
Coverity has gotten better and better,
but it is still far from understanding the complexity
of memory usage in Postgres.
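The bulk-recovery point can be illustrated with a toy arena allocator. This is a sketch for illustration only, not PostgreSQL's actual MemoryContext implementation: small allocations are never freed individually, and a single reset reclaims them all at once.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Toy arena ("memory context") sketch: small allocations are carved out of
 * one block and never freed individually; arena_reset() reclaims them all
 * at once.  This is why "leaking" them retail is harmless, and often faster
 * than issuing a retail pfree for each one.
 */
typedef struct Arena
{
	char	   *block;
	size_t		used;
	size_t		size;
} Arena;

static void
arena_init(Arena *a, size_t size)
{
	a->block = malloc(size);
	a->used = 0;
	a->size = size;
}

static void *
arena_alloc(Arena *a, size_t n)
{
	void	   *p;

	n = (n + 7) & ~(size_t) 7;	/* keep 8-byte alignment */
	assert(a->used + n <= a->size);
	p = a->block + a->used;
	a->used += n;
	return p;
}

static void
arena_reset(Arena *a)
{
	a->used = 0;				/* O(1) bulk recovery of every allocation */
}
```

The reset is constant time no matter how many allocations were made, which is the trade-off Coverity's per-allocation leak heuristic cannot see.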

best regards,
Ranier Vilela


Re: as per commit 643a1a61985bef2590496, move create/open dir code to function using switch case of pg_backup_directory.c file also

2025-04-11 Thread Álvaro Herrera
I don't understand why the routine is called "create_or_open_dir".  In
what sense does this open the directory?  I think "check_or_create_dir"
would be closer to what this seems to be doing.

Is there no TOCTTOU bug in pg_dumpall because of the way this code is
written?  A malicious user that can create an empty directory that
pg_dumpall is going to use as output destination could remove it after
the opendir(), then replace it with another directory with a symlink
called "global.dat" that causes some other file to be overwritten with
the privileges of the user running pg_dumpall.  Maybe there's no problem
here, but I don't see what the explanation for that is.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




add some more error checks into _fileExistsInDirectory function to report "too long name" error

2025-04-11 Thread Mahendra Singh Thalor
Hi,
In another thread[1], Álvaro gave some feedback for _fileExistsInDirectory
function for "too long name" error.
Basically, in _fileExistsInDirectory function, we pass dirname and filename
but we were checking only the combined length of these two names.

Here, I am attaching a patch which will check lengths of dirname and
filename separately and will report errors if the name is too long.

I also added checks in some other places to report an error for a
"too long" name.

Please review the attached patch and let me know feedback.

[1]:
https://www.postgresql.org/message-id/202504110938.4kx73ylnv6p4%40alvherre.pgsql

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com


v01-add-some-more-error-checks-into-_fileExistsInDirecto.patch
Description: Binary data


Re: merge file_exists_in_directory and _fileExistsInDirectory functions and move into common file dumputils.c

2025-04-11 Thread Mahendra Singh Thalor
On Fri, 11 Apr 2025 at 10:21, Michael Paquier  wrote:
>
> On Thu, Apr 10, 2025 at 10:41:33PM +0530, Mahendra Singh Thalor wrote:
> > We have file_exists_in_directory function in pg_restore.c and same
> > code we are using in _fileExistsInDirectory function in
pg_backup_archiver.c
> > also.
> > Here, I am attaching a patch to move these duplicate functions into
> > dumputils.c file
>
> Indeed.  I don't quite see a reason not to remove this duplication,
> and both routines in pg_restore.c and the pg_dump code are the same.

Thanks Michael for the feedback.

>
> dumputils.h is only used by pg_dump and pg_dumpall, and its top
> comment documents exactly that, so using this location for a routine
> that would be used by a pg_restore path is a bit strange to me for
> something that is qualified as a "dump" routine in your patch.
>
> Perhaps we should just use a more centralized place, like file_utils.c
> so as all frontends could benefit of it?

I tried to add it into file_utils.c but I was getting many "symbols not
found" errors, so I moved this common function into dumputils.h, as we have
another common function in that file (e.g., create_or_open_dir).

If we want to move this function into file_utils.c, then I can try to
rewrite the patch.

>
> Please make sure to add it to the next commit fest that will begin in
> July, this refactoring proposal is too late to be considered for v18.
> --
> Michael

Okay. Thank you. I will add it.

On Fri, 11 Apr 2025 at 15:08, Álvaro Herrera  wrote:
>
> On 2025-Apr-11, Michael Paquier wrote:
>
> > Perhaps we should just use a more centralized place, like file_utils.c
> > so as all frontends could benefit of it?
>
> I'm not sure about that.  This code looks to be making too many
> assumptions that aren't acceptable for a general routine, such as
> complaining only that the directory name is long without the possibility
> that the culprit is the file name.  It's more or less okay in current
> uses because they're all using hardcoded short names, but that would not
> hold in general.  At the same time, isn't every call of this routine a
> potential TOCTTOU bug?  Again it's probably fine for the current code,
> but I wouldn't be too sure about making this generally available as-is.
>
> --
> Álvaro Herrera   48°01'N 7°57'E  —
https://www.EnterpriseDB.com/
> "Oh, great altar of passive entertainment, bestow upon me thy discordant
images
> at such speed as to render linear thought impossible" (Calvin a la TV)

Thanks Álvaro for the feedback.

/*
>  * file_exists_in_directory
>  *
>  * Returns true if the file exists in the given directory.
>  */
> static bool
> file_exists_in_directory(const char *dir, const char *filename)
> {
> struct stat st;
> charbuf[MAXPGPATH];
>
> if (strlen(dir) >= MAXPGPATH)
> pg_fatal("directory name too long: \"%s\"", dir);
>
> if (strlen(filename) >= MAXPGPATH)
> pg_fatal("file name too long: \"%s\"", filename);
>
> /* Now check path length of dir/filename */
> if (snprintf(buf, MAXPGPATH, "%s/%s", dir, filename) >= MAXPGPATH)
> pg_fatal("combined name of directory \"%s\" and file \"%s\" is too
> long", dir, filename);
>
> return (stat(buf, &st) == 0 && S_ISREG(st.st_mode));
> }
>

I made the changes as per the above code and added some checks to give an
error for a "too long" name. I started a new thread[1] for the "too long
names" check and will later post an updated patch here to move the
duplicate functions into one common file.

[1] :
https://www.postgresql.org/message-id/CAKYtNApPmWmU9rdf__D=cA7ivL6H_UrPc=w0chw74p2acxj...@mail.gmail.com

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com


Re: add some more error checks into _fileExistsInDirectory function to report "too long name" error

2025-04-11 Thread Daniel Gustafsson
> On 11 Apr 2025, at 14:26, Mahendra Singh Thalor  wrote:
> 
> Hi,
> In another thread[1], Álvaro gave some feedback for _fileExistsInDirectory 
> function for "too long name" error.
> Basically, in _fileExistsInDirectory function, we pass dirname and filename 
> but we were checking only the combined length of these two names.

My interpretation of the original problem in the other thread is that the
error message isn't applicable for a generic function, as it only mentions
the directory, not that checking the combination is inherently wrong.

> Here, I am attaching a patch which will check lengths of dirname and filename 
> separately and will report errors if the name is too long.

Since we only care about the combination of directory and filename, do we
really gain much by using separate checks?  A proposed filename exceeding
MAXPGPATH should be pretty rare in production I'd hope.

+   if (snprintf(buf, MAXPGPATH, "%s/%s", dir, filename) >= MAXPGPATH)
+   pg_fatal("combined name of directory \"%s\" and file \"%s\" is too
long", dir, filename);

snprintf() will return a negative value in case of an error so if we really
want to clamp down on path generation we should probably check that as well.
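A combined check along those lines might look like this (a sketch following the review comment, not the actual patch; MAXPGPATH stands in for PostgreSQL's definition):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAXPGPATH 1024

/*
 * Sketch: build "dir/filename" into buf and fail both on truncation and on
 * a negative snprintf() return (an encoding error), per the review comment.
 */
static bool
path_join_ok(char *buf, const char *dir, const char *filename)
{
	int			len = snprintf(buf, MAXPGPATH, "%s/%s", dir, filename);

	return len >= 0 && len < MAXPGPATH;
}
```

Checking the single return value covers both failure modes without needing separate per-component length tests.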

--
Daniel Gustafsson





Re: RFC: Allow EXPLAIN to Output Page Fault Information

2025-04-11 Thread torikoshia

On 2025-03-25 10:27, torikoshia wrote:
> On 2025-03-22 20:23, Jelte Fennema-Nio wrote:
>> On Wed, 19 Mar 2025 at 14:15, torikoshia  wrote:
>>> BTW based on your discussion, I thought this patch could not be merged
>>> anytime soon. Does that align with your understanding?
>>
>> Yeah, that aligns with my understanding. I don't think it's realistic
>> to get this merged before the code freeze, but I think both of the
>> below issues could be resolved.
>>
>>> - With bgworker-based AIO, this patch could mislead users into
>>> underestimating the actual storage I/O load, which is undesirable.
>>
>> To resolve this, I think the patch would need to change to not report
>> anything if bgworker-based AIO is used.
>
> Agreed.
> I feel the new GUC io_method can be used to determine whether
> bgworker-based AIO is being used.

I took this approach and when io_method=worker, no additional output is
shown in the attached patch.
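The measurement idea can be sketched like this (an illustration of the approach, not the patch's code; the field semantics are Linux's, as noted in the commit message):

```c
#include <assert.h>
#include <sys/resource.h>

/*
 * Sketch: snapshot getrusage(2) before and after a phase and report the
 * deltas.  On Linux, ru_inblock/ru_oublock count 512-byte blocks that
 * actually went through the storage layer, so OS-cache hits do not inflate
 * the numbers.
 */
typedef struct IoUsage
{
	long		reads;
	long		writes;
} IoUsage;

static IoUsage
io_snapshot(void)
{
	struct rusage ru;

	getrusage(RUSAGE_SELF, &ru);
	return (IoUsage) {ru.ru_inblock, ru.ru_oublock};
}

static IoUsage
io_delta(IoUsage start, IoUsage end)
{
	return (IoUsage) {end.reads - start.reads, end.writes - start.writes};
}
```

Two snapshots per phase (planning and execution) keep the getrusage() call count constant, which is why per-plan-node tracking was rejected as too expensive.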



--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
From e80e53eb36f7909ca8638b26d3cd61a58a5bc889 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Fri, 11 Apr 2025 22:01:22 +0900
Subject: [PATCH v4] Add storage I/O tracking to 'BUFFERS' option

The 'BUFFERS' option currently indicates whether a block hit the shared
buffer, but does not distinguish between a cache hit in the OS cache or
a storage I/O operation.
While shared buffers and the OS cache offer similar performance, storage
I/O is in general significantly slower.  By measuring the number of
storage reads and writes, we can better identify whether storage I/O is
a performance bottleneck.

Added tracking of storage I/O usage by calling getrusage(2) at both the
planning and execution phase start and end points.

A more granular approach, tracking at each plan node as the current
BUFFERS option does, was considered but found to be impractical due to
the high performance cost of frequent getrusage() calls.

This output is not shown when io_method=worker, since asynchronous
workers handle I/O for multiple processes, and isolating the EXPLAIN
target's I/O is difficult.

TODO:
I believe this information is mainly useful when used in auto_explain.
I'll implement it later.
---
 doc/src/sgml/ref/explain.sgml   |  25 +-
 src/backend/access/brin/brin.c  |   8 +-
 src/backend/access/gin/gininsert.c  |   8 +-
 src/backend/access/nbtree/nbtsort.c |   8 +-
 src/backend/commands/explain.c  | 125 +++-
 src/backend/commands/prepare.c  |   8 +
 src/backend/commands/vacuumparallel.c   |   8 +-
 src/backend/executor/execParallel.c |  35 +-
 src/backend/executor/instrument.c   |  79 ++-
 src/include/commands/explain.h  |   1 +
 src/include/executor/execParallel.h |   2 +
 src/include/executor/instrument.h   |  20 +-
 src/include/port/win32/sys/resource.h   |   2 +
 src/port/win32getrusage.c   |   4 +
 src/test/regress/expected/explain_1.out | 849 
 src/tools/pgindent/typedefs.list|   1 +
 16 files changed, 1145 insertions(+), 38 deletions(-)
 create mode 100644 src/test/regress/expected/explain_1.out

diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index 9ed1061b7f..493afe6a34 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -201,10 +201,27 @@ ROLLBACK;
   query processing.
   The number of blocks shown for an
   upper-level node includes those used by all its child nodes.  In text
-  format, only non-zero values are printed.  Buffers information is
-  included by default when ANALYZE is used but
-  otherwise is not included by default, but can be enabled using this
-  option.
+  format, only non-zero values are printed.
+  If possible, this option also displays the number of read and write
+  operations performed on storage during the planning and execution phases,
+  shown at the end of the plan. These values are obtained from the
+  getrusage() system call. Note that on platforms that
+  do not support getrusage(), such as Windows, no output
+  will be shown, even if reads or writes actually occur. Additionally, even
+  on platforms where getrusage() is supported, if the
+  kernel is built without the necessary options to track storage read and
+  write operations, no output will be shown.  Also, When
+  io_method is set to worker, no output
+  will be shown, as I/O handled by asynchronous workers cannot be measured
+  accurately.
+  The timing and unit of measurement for read and write operations may vary
+  depending on the platform. For example, on Linux, a read is counted only
+  if this process caused data to be fetched from the storage layer, and a
+  write is counted at the page-dirtying time. On Linux, the unit of
+  measurement for read and write operations is 512 bytes.
+  Buffers information is included by default when ANALYZE
+  is used but otherwise is not inc

COALESCE with single argument looks like identity function

2025-04-11 Thread Maksim Milyutin

Hello everyone!


I've noticed that the COALESCE function doesn't collapse to its argument 
expression when that expression is the only one left in the COALESCE 
argument list, as part of the expression simplification routine in the 
planner. This might suppress further useful transformations when 
non-strict ops are required from some expression, like converting an 
OUTER JOIN to an INNER one when a WHERE qual contains COALESCE over a 
single column from the inner side.


The patch of transformation in question for COALESCE is attached.


--
Best regard,
Maksim Milyutin
From 1287610efa3895a0ababfc66f346a6a7c7edf9b9 Mon Sep 17 00:00:00 2001
From: Maksim Milyutin 
Date: Fri, 11 Apr 2025 15:43:42 +0300
Subject: [PATCH v1] Simplify COALESCE with single argument

---
 src/backend/optimizer/util/clauses.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 26a3e050086..60f33839214 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -3332,6 +3332,8 @@ eval_const_expressions_mutator(Node *node,
 	return (Node *) makeNullConst(coalesceexpr->coalescetype,
   -1,
   coalesceexpr->coalescecollid);
+if (list_length(newargs) == 1)
+	return (Node *) linitial(newargs);
 
 newcoalesce = makeNode(CoalesceExpr);
 newcoalesce->coalescetype = coalesceexpr->coalescetype;
-- 
2.43.0



Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions

2025-04-11 Thread Andres Freund
Hi,

On 2025-04-10 13:50:36 +, Bertrand Drouvot wrote:
> Thanks for the patch! That sounds like a great addition. I was doing some
> tests and did not see any issues. Also while doing the tests I thought that it
> could be useful to evict only from a subset of NUMA nodes (now that NUMA
> awareness is in). We'd need to figure out what to do for buffers that are 
> spread
> across NUMA nodes though.
>
> Does that sound like an idea worth to spend time on? (If so, I'd be happy to 
> work
> on it).

I'm not sure that's common enough to warrant its own function. You can do that
with pg_buffercache_evict(), it'll be slower than pg_buffercache_evict_all(),
but given that determining the numa node already is somewhat expensive, I'm
not sure it's going to make that big a difference.

Greetings,

Andres Freund




Re: Correct documentation for protocol version

2025-04-11 Thread Fujii Masao




On 2025/04/11 18:27, Dave Cramer wrote:



On Fri, 11 Apr 2025 at 05:05, Fujii Masao  wrote:



On 2025/04/11 5:17, Dave Cramer wrote:
 > No, you are correct.
 >
 > See new patch

Thanks for updating the patch!

-         Identifies the message as a protocol version negotiation
+         Identifies the message as a protocol version negotiation.
+         The server sends this message if the requested protocol is
+         not equal to the version the server supports or the client
+         requests protocol options that are not recognized.
            message.

You added the sentence starting with "The server sends..."
between "negotiation" and "message", but it should be placed
after "message", right?

Even though the requested version is not equal to the latest
version that the server supports, if it's older than
the latest one, the message is not sent. So how about
wording it like this instead:

-
Identifies the message as a protocol version negotiation message.
The server sends this message when the client requests a newer
protocol version than the server supports, or when the client
includes protocol options that the server does not recognize.
-

+         The protocol version requested by the client unless it is higher 
than the
+         latest version we support in which case the latest protocol 
version we support.

Maybe rewording this for clarity and using “the server”
instead of “we” would help. For example:

-
The latest protocol version supported by the server if the client
requests a newer protocol version than the server supports.
The protocol version requested by the client, otherwise.
-


Reworded as suggested


Thanks for updating the patch!


While checking the code in older branches, I noticed that the returned
protocol version is always the latest version supported by the server.
However, as we discussed, in master, the server may return the version
requested by the client. The change was introduced in commit 516b87502dc.
So, probably we'll need to update the documentation differently for
master and the older branches.


The patch adds a new explanation about when the NegotiateProtocolVersion
message is sent. But a similar explanation already exists in protocol.sgml:

  NegotiateProtocolVersion
  
   
The server does not support the minor protocol version requested
by the client, but does support an earlier version of the protocol;
this message indicates the highest supported minor version.  This
message will also be sent if the client requested unsupported protocol
options (i.e., beginning with _pq_.) in the
startup packet.

Given that, I'm now wondering if the new description in the patch
might be redundant.


Also, your original concern was that the phrase "Newest minor protocol version"
is inaccurate since the field contains both major and minor version numbers
(e.g., 3.2). However, based on other usage in protocol.sgml and source
comments in related code, "minor version" seems to refer to the full version
like 3.2, i.e., not just the minor part, so we might not need to reword it
after all.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION





Re: Regression test fails when 1) old PG is installed and 2) meson/ninja build is used

2025-04-11 Thread Andres Freund
Hi,

On 2025-04-11 07:53:07 +, Hayato Kuroda (Fujitsu) wrote:
> Dear hackers,
> 
> While creating patches for older branches I found the $SUBJECT. I do not have 
> much knowledge
> for meson thus I'm not sure it is intentional.

> Reproducer
> ===
> I could reproduce the failure with steps:
> 
> 1. install old PG, e.g., PG16, to your system. The .so file must be put on your
> $LD_LIBRARY_PATH.
> 2. build newer PG, e.g., master, with meson build system [1].
> 3. run regression test and ERROR would be reported [2].
> 
> This issue does not happen when I used autoconf/make build system.
>
> Analysis
> =
> 
> According to the log, the instance could be started but psql could not work 
> correctly:
> 
> ```
> --- stdout ---
> # executing test in /home/hayato/builddir/testrun/regress/regress group 
> regress test regress
> # initializing database system by copying initdb template
> # using temp instance on port 40047 with PID 949892
> Bail out!# test failed
> --- stderr ---
> psql: symbol lookup error: psql: undefined symbol: PQservice
> # command failed: "psql" -X -q -c "CREATE DATABASE \"regression\" ...
> 
> (test program exited with status code 2)
> ==

I can't reproduce this.  For me the psql started by pg_regress is the one in
tmp_install and so is the libpq it links to.

$ killall -STOP psql
$ ps aux|grep psql
andres   3375208  0.0  0.0  28696  9972 pts/5TApr10   0:00 psql tpch_10
andres   3597915  1.0  0.0  36036 10120 ?T09:42   0:00 psql -X -a 
-q -d regression -v HIDE_TABLEAM=on -v HIDE_TOAST_COMPRESSION=on
andres   3597916  1.0  0.0  36036 10120 ?T09:42   0:00 psql -X -a 
-q -d regression -v HIDE_TABLEAM=on -v HIDE_TOAST_COMPRESSION=on
andres   3597918  0.6  0.0  36036 10144 ?T09:42   0:00 psql -X -a 
-q -d regression -v HIDE_TABLEAM=on -v HIDE_TOAST_COMPRESSION=on
andres   3597920  0.3  0.0  36036 10104 ?T09:42   0:00 psql -X -a 
-q -d regression -v HIDE_TABLEAM=on -v HIDE_TOAST_COMPRESSION=on
andres   3597922  0.6  0.0  36036 10120 ?T09:42   0:00 psql -X -a 
-q -d regression -v HIDE_TABLEAM=on -v HIDE_TOAST_COMPRESSION=on
andres   3597955  0.0  0.0   6608  2180 pts/0S+   09:42   0:00 grep psql
$ ls -l /proc/3597918/exe
lrwxrwxrwx 1 andres andres 0 Apr 11 09:42 /proc/3597918/exe -> 
/srv/dev/build/postgres/m-dev-assert/tmp_install/srv/dev/install/postgres/m-dev-assert/bin/psql

$ less /proc/3597918/maps
...
000 103:06 4831894711
/srv/dev/build/postgres/m-dev-assert/tmp_install/srv/dev/install/postgres/m-dev-assert/lib/x86_64-linux-gnu/libpq.so.5.18


And meson-logs/testlog.txt shows that the command is executed with
PATH=/srv/dev/build/postgres/m-dev-assert/tmp_install//srv/dev/install/postgres/m-dev-assert/bin:
LD_LIBRARY_PATH=/srv/dev/build/postgres/m-dev-assert/tmp_install//srv/dev/install/postgres/m-dev-assert/lib/x86_64-linux-gnu

Can you check whether your meson-logs/testlog.txt shows the appropriate
PATH/LD_LIBRARY_PATH and whether libpq is in the right place?

Greetings,

Andres Freund




Re: COALESCE with single argument looks like identity function

2025-04-11 Thread Tom Lane
Maksim Milyutin  writes:
> I've noticed that the COALESCE function doesn't collapse to its argument 
> expression when that expression is the only one left in the COALESCE 
> argument list, as part of the expression simplification routine in the 
> planner. This might suppress further useful transformations when 
> non-strict ops are required from some expression, like converting an 
> OUTER JOIN to an INNER one when a WHERE qual contains COALESCE over a 
> single column from the inner side.

Seems like a reasonable idea --- it's probably a rare case, but the
check is cheap enough.  I'd add some comments though.

Please add this to the open commitfest so we don't lose track of it.

regards, tom lane




Re: disallow ALTER VIEW SET DEFAULT when the corresponding base relation column is a generated column

2025-04-11 Thread Tom Lane
jian he  writes:
> CREATE TABLE gtest1 (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
> CREATE VIEW gtest1v AS SELECT * FROM gtest1;
> ALTER VIEW gtest1v ALTER COLUMN b SET DEFAULT 100;

> INSERT INTO gtest1v VALUES (8, DEFAULT) returning *;
> ERROR:  cannot insert a non-DEFAULT value into column "b"
> DETAIL:  Column "b" is a generated column.

> we can make
> ALTER VIEW gtest1v ALTER COLUMN b SET DEFAULT 100;
> error out,

This is not an improvement over having the error happen at run time.

(1) What if the state of the underlying column changes between the
ALTER VIEW and the INSERT?  Either you have rejected something
that could have worked, or in the other direction you're going to get
the run-time error anyway.

(2) I don't see anything wrong or surprising about the run-time
error anyway, thus I fail to see that this is an improvement,
even aside from (1).

regards, tom lane




Conflicting updates of command progress

2025-04-11 Thread Antonin Houska
While working on [1] I realized that some field of pg_stat_progress_cluster has
a weird value. On closer look I found out that cluster_rel() indirectly calls
index_build() and that overwrites the progress parameter. Following are the
parameter values in the master branch:

/* Phases of cluster (as advertised via PROGRESS_CLUSTER_PHASE) */
#define PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP    1
#define PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP  2
#define PROGRESS_CLUSTER_PHASE_SORT_TUPLES      3
#define PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP   4
#define PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES   5
#define PROGRESS_CLUSTER_PHASE_REBUILD_INDEX    6
#define PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP    7

/* Progress parameters for CREATE INDEX */
/* 3, 4 and 5 reserved for "waitfor" metrics */
#define PROGRESS_CREATEIDX_COMMAND              0
#define PROGRESS_CREATEIDX_INDEX_OID            6
#define PROGRESS_CREATEIDX_ACCESS_METHOD_OID    8
#define PROGRESS_CREATEIDX_PHASE                9	/* AM-agnostic phase # */
#define PROGRESS_CREATEIDX_SUBPHASE             10	/* phase # filled by AM */
#define PROGRESS_CREATEIDX_TUPLES_TOTAL         11
#define PROGRESS_CREATEIDX_TUPLES_DONE          12
#define PROGRESS_CREATEIDX_PARTITIONS_TOTAL     13
#define PROGRESS_CREATEIDX_PARTITIONS_DONE      14
/* 15 and 16 reserved for "block number" metrics */

The only conflicting parameters I see here are
PROGRESS_CLUSTER_PHASE_REBUILD_INDEX vs PROGRESS_CREATEIDX_INDEX_OID (number
6). Fortunately, index_build() does not set PROGRESS_CREATEIDX_INDEX_OID so
there's no live bug. However [1] adds some PROGRESS_CLUSTER_ parameters, thus
making conflicts realistic.
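The overwrite can be demonstrated with a toy version of the per-backend progress array (the slot number and values below are illustrative, not taken from the real reporting code):

```c
#include <assert.h>
#include <stdint.h>

#define PGSTAT_NUM_PROGRESS_PARAM 20

/*
 * Toy illustration of the conflict: both commands report into the same
 * per-backend parameter array, so when cluster_rel() reaches index_build(),
 * any slot number used by both command types is silently clobbered by the
 * nested command.
 */
static int64_t st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];

static void
progress_update_param(int index, int64_t val)
{
	st_progress_param[index] = val;
}

static int64_t
run_cluster_with_nested_index_build(void)
{
	progress_update_param(6, 42);	/* CLUSTER writes its slot 6 */
	progress_update_param(6, 7);	/* nested CREATE INDEX reporting */
	return st_progress_param[6];	/* CLUSTER's value is gone */
}
```

The array has no notion of which command a slot belongs to, which is why either suppressing nested updates or stacking per-command views would require a design change rather than a local fix.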

AFAICS the current design does not consider that one progress-reporting
command can be called by another one. Not sure what the correct fix is. We can
either ignore update requests from the "nested" commands, or display the
progress of both. The latter approach is probably more invasive - is that
worth the effort?

[1] https://commitfest.postgresql.org/patch/5117/

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: Improve a few appendStringInfo calls new to v18

2025-04-11 Thread Nathan Bossart
On Fri, Apr 11, 2025 at 10:40:57AM +1200, David Rowley wrote:
> On Fri, 11 Apr 2025 at 02:51, Nathan Bossart  wrote:
>> This probably isn't v18 material, but this reminds me of my idea to change
>> appendStringInfoString() into a macro for appendBinaryStringInfo() so that
>> the compiler can remove the runtime strlen() calls for string literals [0].
>> In most cases, the benefits are probably negligible, but StringInfo is
>> sometimes used in hot paths.
> 
> That one has come up a few times. The most lengthy discussion I
> remember was in [1]. It didn't come to anything, but I don't think
> there were any objections to it, so maybe we should just do it.
> 
> In the thread I did some measurements of binary size increases.  For
> non-compile-time consts, it does mean putting the strlen() call in the
> calling function, which is a bit of overhead in terms of size.  The
> macro trick I suggested should have fixed that, but I admit the macro
> is a bit ugly. The macro version also still has the overhead of having
> to pass the length of the string when it detects a compile-time const.

Thanks for the additional context.
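The idea, sketched against a stand-in StringInfo (not the real implementation): routed through the binary variant, strlen() of a string literal becomes a compile-time constant the compiler folds away, while non-literal arguments still pay one strlen() at the call site.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for StringInfo, for illustration only. */
typedef struct MiniStringInfo
{
	char		data[256];
	size_t		len;
} MiniStringInfo;

static void
appendBinaryStringInfo(MiniStringInfo *str, const char *data, size_t datalen)
{
	memcpy(str->data + str->len, data, datalen);
	str->len += datalen;
	str->data[str->len] = '\0';
}

/*
 * Macro form of appendStringInfoString(): for a literal argument,
 * strlen("...") is folded at compile time, so no runtime length scan
 * remains; for non-literals the strlen() moves into the caller, which is
 * the binary-size cost discussed above.
 */
#define appendStringInfoString(str, s) \
	appendBinaryStringInfo((str), (s), strlen(s))
```

This is the simple single-expansion form; the cited thread's fancier macro additionally used compile-time-constant detection to avoid passing a length for non-literals.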

-- 
nathan




Re: Changing shared_buffers without restart

2025-04-11 Thread Ashutosh Bapat
On Mon, Apr 7, 2025 at 2:13 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> Yes, you're right, plain dynamic Barrier does not ensure all available
> processes will be synchronized. I was aware about the scenario you
> describe, it's mentioned in commentaries for the resize function. I was
> under the impression this should be enough, but after some more thinking
> I'm not so sure anymore. Let me try to structure it as a list of
> possible corner cases that we need to worry about:
>
> * New backend spawned while we're busy resizing shared memory. Those
>   should wait until the resizing is complete and get the new size as well.
>
> * Old backend receives a resize message, but exits before attempting to
>   resize. Those should be excluded from coordination.

Should we detach barrier in on_exit()?

>
> * A backend is blocked and not responding before or after the
>   ProcSignalBarrier message was sent. I'm thinking about a failure
>   situation, when one rogue backend is doing something without checking
>   for interrupts. We need to wait for those to become responsive, and
>   potentially abort shared memory resize after some timeout.

Right.

>
> I think a relatively elegant solution is to extend ProcSignalBarrier
> mechanism to track not only pss_barrierGeneration, as a sign that
> everything was processed, but also something like
> pss_barrierReceivedGeneration, indicating that the message was received
> everywhere but not processed yet. That would be enough to allow
> processes to wait until the resize message was received everywhere, then
> use a global Barrier to wait until all processes are finished.  It's
> somehow similar to your proposal to use two signals, but has less
> implementation overhead.

The way it's implemented in v4 still has the disjoint group problem.
Assume backends p1, p2, p3. All three of them are executing
ProcessProcSignalBarrier(). All three of them updated
pss_barrierReceivedGeneration

/* The message is observed, record that */
pg_atomic_write_u64(&MyProcSignalSlot->pss_barrierReceivedGeneration,
shared_gen);

p1, p2 moved faster and reached following code from ProcessBarrierShmemResize()
if (BarrierAttach(barrier) == SHMEM_RESIZE_REQUESTED)
  WaitForProcSignalBarrierReceived(pg_atomic_read_u64(&ShmemCtrl->Generation));

Since all the processes have received the barrier message, p1, p2 move
ahead and go through all the next phases and finish resizing even
before p3 gets a chance to call ProcessBarrierShmemResize() and attach
itself to Barrier. This could happen because it processed some other
ProcSignalBarrier message. p1 and p2 won't wait for p3 since it has
not attached itself to the barrier. Once p1, p2 finish, p3 will attach
itself to the barrier and resize buffers again - reinitializing the
shared memory, which might have been already modified by p1 or p2. Boom
- there's memory corruption.

Either every process has to make sure that all the other extant
backends have attached themselves to the barrier OR somebody has to
ensure that and signal all the backends to proceed. The implementation
doesn't do either.

>
> * Shared memory address space is now reserved for future usage, making
>   shared memory segments clash (e.g. due to memory allocation)
>   impossible.  There is a new GUC to control how much space to reserve,
>   which is called max_available_memory -- on the assumption that most of
>   the time it would make sense to set its value to the total amount of
>   memory on the machine. I'm open for suggestions regarding the name.

With 0006 applied
+ /* Clean up some reserved space to resize into */
+ if (munmap(m->shmem + m->shmem_size, new_size - m->shmem_size) == -1)
... snip ...
+ ptr = mremap(m->shmem, m->shmem_size, new_size, 0);

We unmap the portion of reserved address space where the existing
segment would expand into. As long as we are just expanding this will
work. I am wondering how would this work for shrinking buffers? What
scheme do you have in mind?

>
> * There is one more patch to address hugepages remap. As mentioned in
>   this thread above, Linux kernel has certain limitations when it comes
>   to mremap for segments allocated with huge pages. To work around it's
>   possible to replace mremap with a sequence of unmap and map again,
>   relying on the anon file behind the segment to keep the memory
>   content. I haven't found any downsides of this approach so far, but it
>   makes the anonymous file patch 0007 mandatory.

In 0008
if (munmap(m->shmem, m->shmem_size) < 0)
... snip ...
/* Resize the backing anon file. */
if(ftruncate(m->segment_fd, new_size) == -1)
...
/* Reclaim the space */
ptr = mmap(m->shmem, new_size, PROT_READ | PROT_WRITE,
mmap_flags | MAP_FIXED, m->segment_fd, 0);

How are we preventing something from getting mapped into the space after
m->shmem + newsize? We will need to add an unallocated but reserved
address space mapping after m->shmem + newsize, right?

--
Best Wishes,
Ashutosh Bapat




Re: Correct documentation for protocol version

2025-04-11 Thread Dave Cramer
On Fri, 11 Apr 2025 at 09:39, Fujii Masao 
wrote:

>
>
> On 2025/04/11 18:27, Dave Cramer wrote:
> >
> >
> > On Fri, 11 Apr 2025 at 05:05, Fujii Masao  wrote:
> >
> >
> >
> > On 2025/04/11 5:17, Dave Cramer wrote:
> >  > No, you are correct.
> >  >
> >  > See new patch
> >
> > Thanks for updating the patch!
> >
> > - Identifies the message as a protocol version negotiation
> > + Identifies the message as a protocol version negotiation.
> > + The server sends this message if the requested protocol is
> > + not equal to the version the server supports or the client
> > + requests protocol options that are not recognized.
> > message.

>
> > You added the sentence starting with "The server sends..."
> > between "negotiation" and "message", but it should be placed
> > after "message", right?
> >
> > Even though the requested version is not equal to the latest
> > version that the server supports, if it's older than
> > the latest one, the message is not sent. So how about
> > wording it like this instead:
> >
> > -
> > Identifies the message as a protocol version negotiation message.
> > The server sends this message when the client requests a newer
> > protocol version than the server supports, or when the client
> > includes protocol options that the server does not recognize.
> > -
> >
> > + The protcol version requested by the client unless it is
> higher than the
> > + latest version we support in which case the latest
> protocol version we support.
> >
> > Maybe rewording this for clarity and using “the server”
> > instead of “we” would help. For example:
> >
> > -
> > The latest protocol version supported by the server if the client
> > requests a newer protocol version than the server supports.
> > The protocol version requested by the client, otherwise.
> > -
> >
> >
> > Reworded as suggested
>
> Thanks for updating the patch!
>
>
> While checking the code in older branches, I noticed that the returned
> protocol version is always the latest version supported by the server.
> However, as we discussed, in master, the server may return the version
> requested by the client. The change was introduced in commit 516b87502dc.
> So, probably we'll need to update the documentation differently for
> master and the older branches.
>
>
> The patch adds a new explanation about when the NegotiateProtocolVersion
> message is sent. But a similar explanation already exists in protocol.sgml:
>
>NegotiateProtocolVersion
>
> 
>  The server does not support the minor protocol version requested
>  by the client, but does support an earlier version of the
> protocol;
>  this message indicates the highest supported minor version.  This
>  message will also be sent if the client requested unsupported
> protocol
>  options (i.e., beginning with _pq_.) in the
>  startup packet.
>

Well, this isn't quite true, since if you request 3.0 and have invalid
options it will return 3.0, which is not the highest supported minor
version.


>
> Given that, I'm now wondering if the new description in the patch
> might be redundant.
>
>
> Also, your original concern was that the phrase "Newest minor protocol
> version"
> is inaccurate since the field contains both major and minor version numbers
> (e.g., 3.2). However, based on other usage in protocol.sgml and source
> comments in related code, "minor version" seems to refer to the full
> version
> like 3.2, i.e., not just the minor part, so we might not need to reword it
> after all.
>

IMO the comments should be changed to reflect reality. If 3.2 is a minor
version, what is a major version?

Dave


Re: Correct documentation for protocol version

2025-04-11 Thread Jelte Fennema-Nio
On Fri, 11 Apr 2025 at 22:57, Dave Cramer  wrote:
> Well this isn't quite true since if you request 3.0 and have invalid options 
> it will return 3.0, which is not the highest supported minor version.

Probably good to update this section too, then, so it is as correct as
the section you already updated. It might also be good to clarify further
that the version the server responds with is the protocol version
that will be used during the following communication.




Re: Changing shared_buffers without restart

2025-04-11 Thread Dmitry Dolgov
> On Fri, Apr 11, 2025 at 08:04:39PM GMT, Ashutosh Bapat wrote:
> On Mon, Apr 7, 2025 at 2:13 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> >
> > Yes, you're right, plain dynamic Barrier does not ensure all available
> > processes will be synchronized. I was aware of the scenario you
> > describe; it's mentioned in the comments for the resize function. I was
> > under the impression this should be enough, but after some more thinking
> > I'm not so sure anymore. Let me try to structure it as a list of
> > possible corner cases that we need to worry about:
> >
> > * New backend spawned while we're busy resizing shared memory. Those
> >   should wait until the resizing is complete and get the new size as well.
> >
> > * Old backend receives a resize message, but exits before attempting to
> >   resize. Those should be excluded from coordination.
>
> Should we detach barrier in on_exit()?

Yeah, good point.

> > I think a relatively elegant solution is to extend ProcSignalBarrier
> > mechanism to track not only pss_barrierGeneration, as a sign that
> > everything was processed, but also something like
> > pss_barrierReceivedGeneration, indicating that the message was received
> > everywhere but not processed yet. That would be enough to allow
> > processes to wait until the resize message was received everywhere, then
> > use a global Barrier to wait until all processes are finished.  It's
> > somehow similar to your proposal to use two signals, but has less
> > implementation overhead.
>
> The way it's implemented in v4 still has the disjoint group problem.
> Assume backends p1, p2, p3. All three of them are executing
> ProcessProcSignalBarrier(). All three of them updated
> pss_barrierReceivedGeneration
>
> /* The message is observed, record that */
> pg_atomic_write_u64(&MyProcSignalSlot->pss_barrierReceivedGeneration,
> shared_gen);
>
> p1, p2 moved faster and reached the following code from 
> ProcessBarrierShmemResize()
> if (BarrierAttach(barrier) == SHMEM_RESIZE_REQUESTED)
>   
> WaitForProcSignalBarrierReceived(pg_atomic_read_u64(&ShmemCtrl->Generation));
>
> Since all the processes have received the barrier message, p1, p2 move
> ahead and go through all the next phases and finish resizing even
> before p3 gets a chance to call ProcessBarrierShmemResize() and attach
> itself to Barrier. This could happen because it processed some other
> ProcSignalBarrier message. p1 and p2 won't wait for p3 since it has
> not attached itself to the barrier. Once p1, p2 finish, p3 will attach
> itself to the barrier and resize buffers again - reinitializing the
> shared memory, which might have already been modified by p1 or p2. Boom
> - there's memory corruption.

It won't reinitialize anything, since this logic is controlled by
ShmemCtrl->NSharedBuffers; if it's already updated, nothing will be
changed.

About the race condition you mention: there is indeed a window between
receiving the ProcSignalBarrier and attaching to the global Barrier in
resize, but I don't think any process will be able to touch the buffer
pool while inside this window. Even if the remapping itself is so blazing
fast that this window is enough to make one process late (e.g. if it was
busy handling some other signal, as you mention), as I've shown above it
shouldn't be a problem.

I can experiment with this case though; maybe there is a way to
completely close this window so we don't have to think about even
potential scenarios.

> > * Shared memory address space is now reserved for future usage, making
> >   shared memory segments clash (e.g. due to memory allocation)
> >   impossible.  There is a new GUC to control how much space to reserve,
> >   which is called max_available_memory -- on the assumption that most of
> >   the time it would make sense to set its value to the total amount of
> >   memory on the machine. I'm open for suggestions regarding the name.
>
> With 0006 applied
> + /* Clean up some reserved space to resize into */
> + if (munmap(m->shmem + m->shmem_size, new_size - m->shmem_size) == -1)
> ... snip ...
> + ptr = mremap(m->shmem, m->shmem_size, new_size, 0);
>
> We unmap the portion of reserved address space into which the existing
> segment would expand. As long as we are just expanding, this will
> work. I am wondering how this would work for shrinking buffers? What
> scheme do you have in mind?

I didn't like this part originally, and after the changes to support
hugetlb I think it's worth it to just replace mremap with munmap/mmap.
That way there will be no such question; e.g., if a segment is getting
shrunk, the unmapped area will again become part of the reserved space.

> > * There is one more patch to address hugepages remap. As mentioned in
> >   this thread above, Linux kernel has certain limitations when it comes
>   to mremap for segments allocated with huge pages. To work around it, it's
>   possible to replace mremap with a sequence of unmap and map again,
>   relying on the anon file behind the segment to keep the memory
>   content.

Re: stats.sql fails during installcheck on mac

2025-04-11 Thread Sami Imseih
> The code in xlog.c filters out the syncs for WAL_SYNC_METHOD_OPEN and
> WAL_SYNC_METHOD_OPEN_DSYNC, wouldn't it be more consistent to do the
> same in the code and the SQL test, using an IN clause with the two
> values that block the syncs rather than a NOT IN clause with the three
> values that allow the syncs?

I actually originally had it this way, but for some reason felt it would
be better to be explicit about the methods we want to test rather than
the ones we don't. I can't think of a very compelling reason to go either
way, so v2 LGTM.

>> Hmm, that's a little nasty, because it's not showing up in the
>> buildfarm.  It appears from a little testing that the issue only
>> manifests if you have fsync = on, which we generally don't on
> >buildfarm animals.

> right, "make check" does not encounter this because it runs
> with fsync=off, as I mentioned at the top of the thread.

What do you think of this? I think we should set fsync = on
at least for the part of the test that precedes the 2 checkpoints, and
set it back to off at the end of the tests for fsync stats. It is
concerning that the tests for the fsync stats are not being exercised in
the buildfarm.


--
Sami Imseih
Amazon Web Services (AWS)




Re: Correct documentation for protocol version

2025-04-11 Thread Jelte Fennema-Nio
On Fri, 11 Apr 2025 at 21:39, Fujii Masao  wrote:
> While checking the code in older branches, I noticed that the returned
> protocol version is always the latest version supported by the server.
> However, as we discussed, in master, the server may return the version
> requested by the client. The change was introduced in commit 516b87502dc.
> So, probably we'll need to update the documentation differently for
> master and the older branches.

No need for different docs. Given that older branches only support 3.0
protocol, there's no way for a client to request a version earlier
than the "latest version supported by the server".

> The patch adds a new explanation about when the NegotiateProtocolVersion
> message is sent. But a similar explanation already exists in protocol.sgml:

Side-comment: I think our protocol docs are pretty annoyingly spread
across two pages.

> Given that, I'm now wondering if the new description in the patch
> might be redundant.
>
>
> Also, your original concern was that the phrase "Newest minor protocol 
> version"
> is inaccurate since the field contains both major and minor version numbers
> (e.g., 3.2). However, based on other usage in protocol.sgml and source
> comments in related code, "minor version" seems to refer to the full version
> like 3.2, i.e., not just the minor part, so we might not need to reword it
> after all.

I quite like the new wording from Dave so +1 from me. I also think for
protocol docs it's especially important to be very precise and leave
very little room for interpretation.

One thing that we should probably clarify though (which was somewhat
clarified in the previous wording) is that we only send this message
if the client requested a major version that matches the major version
that the server supports, i.e. a 3.2 server will never send a
NegotiateProtocolVersion message if the client requested 4.0.




Re: Some problems regarding the self-join elimination code

2025-04-11 Thread Andrei Lepikhov

On 4/10/25 14:39, Andrei Lepikhov wrote:

On 4/10/25 13:36, Alexander Korotkov wrote:
On Wed, Apr 9, 2025 at 10:39 AM Andrei Lepikhov  
wrote:

It seems we are coming to the conclusion that the join removal optimisation
may do something outside of ChangeVarNodes' responsibility. Before further
complicating this function's code I would like to know the opinion of Tom,
who initially proposed [1] to use this routine. Maybe it would be better to
a) return to the more specialised change_relid / sje_walker machinery, or
b) move ChangeVarNodes out of rewriteManip and make it a multi-purpose
routine, allowing it to transform expressions after a Var node change?


What about adding a callback to ChangeVarNodes_context that would be
called for each RestrictInfo after changing the var nodes themselves?  SJE
could use a callback that replaces OpExpr with NullTest when needed.
I think it is doable, of course. Just looking forward a little, it may 
need more complication in the future (SJE definitely should be widened 
to partitioned tables) and it may be simpler to have two different 
routines for two different stages of planning. 
To provide some food for thought, here is a draft in the attachment which 
addresses both issues: RestrictInfo relid replacement and moving 
SJE-specific code out of the ChangeVarNodes routine (callback approach).


--
regards, Andrei Lepikhov

From 6b68703d7b38326393da58e42618e4915a0e4590 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" 
Date: Fri, 11 Apr 2025 14:30:33 +0200
Subject: [PATCH v0] Switch the approach to ChangeVarNodes's extensibility.

---
 src/backend/optimizer/plan/analyzejoins.c | 149 +++---
 src/backend/rewrite/rewriteManip.c|  95 ++
 src/include/rewrite/rewriteManip.h|  14 +-
 3 files changed, 161 insertions(+), 97 deletions(-)

diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 6b58567f511..0f0ed1785e6 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -74,6 +74,7 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
    List *restrictlist,
    List **extra_clauses);
 static int	self_join_candidates_cmp(const void *a, const void *b);
+static bool ChangeVarNodes_callback(Node *node, void *arg);
 
 
 /*
@@ -397,7 +398,8 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
 		{
 			Assert(subst > 0);
 
-			ChangeVarNodes((Node *) sjinf->semi_rhs_exprs, relid, subst, 0);
+			ChangeVarNodesExtended((Node *) sjinf->semi_rhs_exprs, relid, subst,
+	0, ChangeVarNodes_callback);
 		}
 	}
 
@@ -458,7 +460,8 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
 			   sjinfo->ojrelid, subst);
 			Assert(!bms_is_empty(phv->phrels));
 
-			ChangeVarNodes((Node *) phv->phexpr, relid, subst, 0);
+			ChangeVarNodesExtended((Node *) phv->phexpr, relid, subst, 0,
+	ChangeVarNodes_callback);
 
 			Assert(phv->phnullingrels == NULL); /* no need to adjust */
 		}
@@ -512,7 +515,8 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
 		}
 
 		if (subst > 0)
-			ChangeVarNodes((Node *) otherrel->lateral_vars, relid, subst, 0);
+			ChangeVarNodesExtended((Node *) otherrel->lateral_vars, relid,
+	subst, 0, ChangeVarNodes_callback);
 	}
 }
 
@@ -746,7 +750,8 @@ remove_rel_from_eclass(EquivalenceClass *ec, SpecialJoinInfo *sjinfo,
 		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
 
 		if (sjinfo == NULL)
-			ChangeVarNodes((Node *) rinfo, relid, subst, 0);
+			ChangeVarNodesExtended((Node *) rinfo, relid, subst, 0,
+	ChangeVarNodes_callback);
 		else
 			remove_rel_from_restrictinfo(rinfo, relid, sjinfo->ojrelid);
 	}
@@ -1537,7 +1542,8 @@ update_eclasses(EquivalenceClass *ec, int from, int to)
 		em->em_jdomain->jd_relids = adjust_relid_set(em->em_jdomain->jd_relids, from, to);
 
 		/* We only process inner joins */
-		ChangeVarNodes((Node *) em->em_expr, from, to, 0);
+		ChangeVarNodesExtended((Node *) em->em_expr, from, to, 0,
+ChangeVarNodes_callback);
 
 		foreach_node(EquivalenceMember, other, new_members)
 		{
@@ -1571,7 +1577,8 @@ update_eclasses(EquivalenceClass *ec, int from, int to)
 			continue;
 		}
 
-		ChangeVarNodes((Node *) rinfo, from, to, 0);
+		ChangeVarNodesExtended((Node *) rinfo, from, to, 0,
+ChangeVarNodes_callback);
 
 		/*
 		 * After switching the clause to the remaining relation, check it for
@@ -1666,6 +1673,108 @@ add_non_redundant_clauses(PlannerInfo *root,
 	}
 }
 
+static bool
+ChangeVarNodes_callback(Node *node, void *arg)
+{
+	ChangeVarNodes_context *context = (ChangeVarNodes_context *) arg;
+
+	if (node == NULL)
+		return false;
+
+	if (IsA(node, RangeTblRef))
+	{
+		return false;
+	}
+	else if (IsA(node, RestrictInfo))
+	{
+		RestrictInfo   *rinfo = (RestrictInfo *) node;
+		int				relid = -1;
+		bool			is_req_equal =
+			(rinfo->required_relids == rinfo->clause_relids);
+		bool			is_multiple =
+		(bms_membership(rinfo->clause_relids) == B

Re: Feature Recommendations for Logical Subscriptions

2025-04-11 Thread YeXiu
Amit Kapila, Yes, as you mentioned, but I’d like to add that when using the 
exclusion method for newly added columns, there’s no need to modify the 
publication. This is similar to how fields are automatically synchronized when 
columns are unspecified during initial setup. This is also a key reason why 
this approach is valuable.


YeXiu
1518981...@qq.com







---- Original Message ----
From: Amit Kapila 

Re: Feature Recommendations for Logical Subscriptions

2025-04-11 Thread YeXiu
Another permission-related issue involves scenarios where multiple logical 
replication slots exist. If a replication slot grants full data access 
permissions and user accounts are not explicitly bound to specific slots, there 
could be security risks where accounts might connect to high-privilege 
replication slots, potentially leading to data security vulnerabilities.


YeXiu
1518981...@qq.com







---- Original Message ----
From: Amit Kapila 

Re: Improve a few appendStringInfo calls new to v18

2025-04-11 Thread Tom Lane
Peter Eisentraut  writes:
> Would it be useful to augment appendStringInfo() something like this:

> if (VA_ARGS_NARGS() == 0)
>  return appendStringInfoString(str, fmt);

That would change the behavior in edge cases, for instance
appendStringInfo(str, "foo%%bar").  Maybe we'd never hit those,
but on the whole I'm not in love with the idea.

regards, tom lane




Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-04-11 Thread Wolfgang Walther

Jacob Champion:

On Wed, Apr 9, 2025 at 4:42 PM Jelte Fennema-Nio  wrote:

I think your suggestion of not using any .so files would be best there
(from a user perspective). I'd be quite surprised if a static build still
resulted in me having to manage shared library files anyway.

Done this way in v5. I had planned to separate the implementations by
a #define, but I ran into issues with Makefile.shlib, so I split the
shared and dynamic versions into separate files. I just now realized
that we do something about this exact problem in src/common, so I'll
see if I can copy its technique for the next go round.


I tried to apply this patch to nixpkgs' libpq build [1]. First, I pinned 
a recent commit from master (one where the v5 patch will apply cleanly 
later) and enabled --with-libcurl [2].


At this stage, without the patch applied, I observe the following:

1. The default, dynamically linked, build succeeds and libpq.so is 
linked to libcurl.so as expected!


2. The statically linked build fails during configure:

  checking for curl_multi_init in -lcurl... no
  configure: error: library 'curl' does not provide curl_multi_init

config.log tells me that it can't link to libcurl, because of undefined 
references, for example:


  undefined reference to `psl_is_cookie_domain_acceptable'
  undefined reference to `nghttp2_session_check_request_allowed'

I assume the many libs listed in Libs.private in libcurl.pc are not 
added automatically for this check?



Next, I applied the v5 patch and:

3. Running the same build as in step 1 above (dynamically linked), I can 
see that libpq.so does have some reference to dlopen / libpq-oauth in it 
- good. But libpq-oauth.so itself is not built. The commands I am using 
to build just the libpq package are essentially like this:


  make submake-libpgport
  make submake-libpq
  make -C src/bin/pg_config install
  make -C src/common install
  make -C src/include install
  make -C src/interfaces/libpq install
  make -C src/port install

I tried adding "make submake-libpq-oauth", but that doesn't exist.

When I do "make -C src/interfaces/libpq-oauth", I get this error:

  make: *** No rule to make target 'oauth-curl.o', needed by 
'libpq-oauth-18.so'.  Stop.


Not sure how to proceed to build libpq-oauth.so.


4. The statically linked build fails with the same configure error as above.


I can only test autoconf right now, not meson - don't have a working 
setup for that, yet.


Best,

Wolfgang

[1]: 
https://github.com/NixOS/nixpkgs/blob/master/pkgs/servers/sql/postgresql/libpq.nix






someone else to do the list of acknowledgments

2025-04-11 Thread Peter Eisentraut

I would like for someone else to prepare the list of acknowledgments in
the release notes this year.

I have been preparing the list of acknowledgments in the release notes
(example: [0]) since PostgreSQL 10 (launched from discussions at PGCon
2017 [1]).  I'm looking to hand this off now, so that I'm not hogging
this job forever.

[0]: 
https://www.postgresql.org/docs/17/release-17.html#RELEASE-17-ACKNOWLEDGEMENTS
[1]: 
https://wiki.postgresql.org/wiki/PgCon_2017_Developer_Meeting#Release_notes_scope.2C_and_giving_credit


I'm happy to train the next person and hand them my tips and scripts,
or they can of course define their own processes.

So that prospective candidates know what they are getting into, the
(my) process is approximately:

1. collect names from git logs in semi-automated way
2. sort, organize, fix, and normalize names
3. check manually against git log
4. commit
5. fix up based on public feedback
6. keep updated until release

The whole thing might take about 20 to 30 hours wall-clock time.

I have found it not useful to start this too early, since you'll get a
lot of new names during the beta period.  I have lately usually
started after the August beta release.  (Or you can start early and
keep it updated.  Again, it's your process.)

Anyone can do this, you don't need to be a committer or developer (but
you'll need to be able to produce a well-formed documentation patch).
However, I suggest that because there is a fair amount of work to
normalize, fix, and transliterate names, it would help if you've been
around for a while and have some passing familiarity with the names of
the people around here.  Also, since this list is often cited for
public credit, some care and attention to detail is needed.

So, there is some time to think about this.  Please discuss here if
you're interested or have questions.

(This is presupposing that we still want to do this.  If you have
other ideas for a better list or no list, this is also the time to
discuss this.)




Re: Things I don't like about \du's "Attributes" column

2025-04-11 Thread David G. Johnston
On Sun, Feb 9, 2025 at 2:11 AM Pavel Luzanov 
wrote:

>
> I don't understand from new commitfest transition guidance
> what to do with status of commitfest entry in this case.
> May be it needs to be closed. And in a case I will be able to propose
> a new version, I will open a new entry.
>
> The commitfest entry now has Needs Review status and stayed in
> the closed January commitfest.
>
> 0. 
> https://www.postgresql.org/message-id/flat/003e3a66-8fcc-4ca0-9e0e-c0afda1c9424%40eisentraut.org
> 1. https://commitfest.postgresql.org/51/4738/
> 2. 
> https://www.postgresql.org/message-id/5341835b-e7be-44dc-b6e5-400e9e3f3...@postgrespro.ru
> 3. 
> https://www.postgresql.org/message-id/CA%2BTgmoZ_uGDb3N8AKHG6nOc5HZPp5Y_ogFhrRbhoVnPHN%2B4t3g%40mail.gmail.com
>
>
As the author, you are encouraged to decide whether Waiting on Author or
Withdrawn is the desired status for this patch if you do not want it, as
presented, to be committed.  If you are content with it being committed
as-is, it should be marked Needs Review in an open commitfest.  Abandoning
this approach and going for what Robert suggested would mean withdrawing
this CF entry.

However, I do think we are at something committable, though I'd make one,
maybe two, changes to v8.

Valid until -> Password valid until: the timestamp value already forces a
wide column, adding the word Password to the header to clarify what is
valid simply provides the same context that the create role page provides
when it shows the valid until attribute immediately below the password
attribute.  Leaving "valid until" alone retains the attribute name tieback.

Connection limit -> Con. limit: maybe this gets rejected on translation
grounds but the abbreviation here seems obvious and shaves 7 characters off
the mandatory width for a column that occupies 12 characters more than the
values require.

Even without those changes I as reviewer would concur with the proposal and
try to move this on from bike-shedding to a go/no-go decision (by marking
it Ready to Commit) as to whether this is an improvement over the status
quo.

David J.


Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-04-11 Thread Peter Eisentraut

On 08.04.25 19:44, Jacob Champion wrote:

Would anybody following along be opposed to a situation where
- dynamiclib builds go through the dlopen() shim
- staticlib builds always rely on statically linked symbols


If this can be implemented in a straightforward way, that would be the 
best way, I think.






remove unnecessary explicit type conversion (to char) for appendStringInfoChar function calls

2025-04-11 Thread Mahendra Singh Thalor
Hi,
In the current master code, there are 3 places where we use an
appendStringInfoChar call with an explicit type conversion to char. This
is not needed, as everywhere else we append the character directly.

--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -302,7 +302,7 @@ InteractiveBackend(StringInfo inBuf)
 */

/* Add '\0' to make it look the same as message case. */
-   appendStringInfoChar(inBuf, (char) '\0');
+   appendStringInfoChar(inBuf, '\0');

Here, I am attaching a small patch to fix these 3 type conversions on head.

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com


v01_remove-unnecessary-type-conversion-into-char-for-appendStringInfoChar.patch
Description: Binary data


Re: disallow ALTER VIEW SET DEFAULT when the corresponding base relation column is a generated column

2025-04-11 Thread David G. Johnston
On Friday, April 11, 2025, Tom Lane  wrote:

> jian he  writes:
> > CREATE TABLE gtest1 (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
> > CREATE VIEW gtest1v AS SELECT * FROM gtest1;
> > ALTER VIEW gtest1v ALTER COLUMN b SET DEFAULT 100;
>
> > INSERT INTO gtest1v VALUES (8, DEFAULT) returning *;
> > ERROR:  cannot insert a non-DEFAULT value into column "b"
> > DETAIL:  Column "b" is a generated column.
>
> > we can make
> > ALTER VIEW gtest1v ALTER COLUMN b SET DEFAULT 100;
> > error out,
>
> This is not an improvement over having the error happen at run time.
>
> (1) What if the state of the underlying column changes between the
> ALTER VIEW and the INSERT?  Either you have rejected something
> that could have worked, or in the other direction you're going to get
> the run-time error anyway.
>

I concur.  The view is only loosely coupled to the base relation, via the
rewrite rule which is applied at runtime.  Putting checks in place that
strongly couple the two relations adds a coupling burden that we are
better off avoiding.

David J.


Re: getting "shell command argument contains a newline or carriage return:" error with pg_dumpall when db name have new line in double quote

2025-04-11 Thread Nathan Bossart
On Thu, Apr 10, 2025 at 11:58:41PM +0530, Mahendra Singh Thalor wrote:
> As per above discussions, for v18, we will not do any change to server
> side to fix the issue of \n\r in database names. But as a cleanup
> patch, we can give an alert to the user by "pg_upgrade --check". As
> per current code, pg_dump and pg_upgrade will fail with "shell
> command" error but in the attached patch, we will give some extra info
> to the user by "pg_upgrade --check" so that they can fix database
> names before trying to upgrade.
> 
> Here, I am attaching a patch which will give a list of invalid
> database names in "pg_upgrade --check". We can consider this as a
> cleanup patch.

Are you proposing this for v18?  I think this is all v19 material at this
point.  Perhaps we could argue this is a bug fix that should be
back-patched, but IMHO that's a bit of a stretch.  I don't sense a
tremendous amount of urgency, either.

-- 
nathan




Re: Prevent an error on attaching/creating a DSM/DSA from an interrupt handler.

2025-04-11 Thread Rahila Syed
Hi Daniel,

Thank you for your review and code improvements.

Please find below some observations.

1. dsm_unpin_mapping(dsm_segment *seg)
+   if (CurrentResourceOwner &&
IsResourceOwnerReleasing(CurrentResourceOwner))
+   return;

Given that the function can return without setting resowner to a
CurrentResourceOwner which is not NULL, shall we change the function
signature to return true when the "unpin" is successful and false when it
is not? This behavior existed earlier too, but adding the check has made
it explicit. Although this function is not currently in use, it would be
beneficial to make the API more self-documenting.

2.  If the value of IsResourceOwnerReleasing changes between
dsm_create_descriptor and attach_internal, the dsm segment and dsa area
will end up with different resource owners. Earlier, the chances of
CurrentResourceOwner changing between the two functions were zero.
Maybe something can be done to keep the resowner assignments in both
these functions in sync.

Thank you,
Rahila Syed


Re: n_ins_since_vacuum stats for aborted transactions

2025-04-11 Thread Sami Imseih
I spent some time thinking about this today.

> "The tuple counters below, except where noted, are incremented even if the 
> transaction aborts."

I like this idea, and I think it fits good as a blurb under "27.2.2.
Viewing Statistics"

I suggest a slight re-write however.

+  
+   An aborted transaction will also increment tuple-related counters,
unless otherwise noted.
+  

> So, here are the relevant counters, with their treatment of aborted 
> transaction tuples:
>
> seq_tup_read - says live
> idx_tup_fetch - says live
> n_tup_ins - default notice
> n_tup_upd - default notice
> n_tup_del - default notice
> n_mod_since_analyze - inline reason for non-default
> n_ins_since_vacuum - default notice

All the counters mentioned above will increment the number of rows
modified/accessed even in the case of an aborted transaction, except
for n_mod_since_analyze.

> n_live_tup - says live (is this a counter?)
> n_dead_tup - says dead (is this a counter?)

They are not purely incremental values. They are incremented
by insert/update/delete for committed transactions, but are also updated
by VACUUM or VACUUM FULL. So, these will need some inlined description
of their behavior.

> I'm also thinking to reword n_tup_upd, something like:
>
> Total number of rows updated.  Subsets of these updates are also tracked in 
> n_tup_hot_upd and n_tup_newpage_upd to facilitate performance monitoring.

I think the current explanation is clear enough. I am also not too
thrilled about the "...to facilitate performance monitoring" part, since
the cumulative stats system as a whole is known to be used to facilitate
perf monitoring.

What do you think of the attached?

--
Sami Imseih
Amazon Web Services (AWS)





v3-0001-Clarify-when-aborted-rows-are-tracked-for-tuple-r.patch
Description: Binary data


Re: Horribly slow pg_upgrade performance with many Large Objects

2025-04-11 Thread Nathan Bossart
On Tue, Apr 08, 2025 at 12:22:00PM -0500, Nathan Bossart wrote:
> On Tue, Apr 08, 2025 at 01:07:09PM -0400, Tom Lane wrote:
>> Nathan Bossart  writes:
>>> I do think it's worth considering going back to copying
>>> pg_largeobject_metadata's files for upgrades from v16 and newer.
>> 
>> (If we do this) I don't see why we'd need to stop at v16.  I'm
>> envisioning that we'd use COPY, which will be dealing in the
>> text representation of aclitems, and I don't think that's changed
>> in a long time.  The sort of thing that would break it is changes
>> in the set of available/default privilege bits for large objects.
> 
> I was thinking of actually reverting commit 12a53c7 for upgrades from v16,
> which AFAICT is the last release where any relevant storage formats changed
> (aclitem changed in v16).  But if COPY gets us pretty close to that and is
> less likely to be disrupted by future changes, it could be a better
> long-term approach.
> 
>> That is, where the dump currently contains something like
>> 
>> SELECT pg_catalog.lo_create('2121');
>> ALTER LARGE OBJECT 2121 OWNER TO postgres;
>> GRANT ALL ON LARGE OBJECT 2121 TO joe;
>> 
>> we'd have
>> 
>> COPY pg_largeobject_metadata FROM STDIN;
>> ...
>> 2121 10  {postgres=rw/postgres,joe=rw/postgres}
>> ...
>> 
>> and some appropriate COPY data for pg_shdepend too.

I did some more research here.  For many large objects without ACLs to
dump, I noticed that the vast majority of time is going to restoring the
ALTER OWNER commands.  For 1 million such large objects, restoring took ~73
seconds on my machine.  If I instead invented an lo_create_with_owner()
function and created 100 per SELECT command, the same restore takes ~7
seconds.  Copying the relevant pg_shdepend rows out and back in takes ~2.5
seconds.  I imagine using COPY for pg_largeobject_metadata would also take
a couple of seconds in this case.

For upgrading, I don't think there's any huge benefit to optimizing the
restore commands versus using COPY.  It might make future catalog changes
for large object stuff easier, but I'd expect those to be rare.  However,
the optimized restore commands could be nice for non-pg_upgrade use-cases.

-- 
nathan




Re: type cache cleanup improvements

2025-04-11 Thread Noah Misch
On Tue, Oct 22, 2024 at 08:33:24PM +0300, Alexander Korotkov wrote:
> On Tue, Oct 22, 2024 at 6:10 PM Pavel Borisov  wrote:
> > On Tue, 22 Oct 2024 at 11:34, Alexander Korotkov  
> > wrote:
> >> I'm going to push this if no objections.

(This became commit b85a9d0.)

> + /* Call delete_rel_type_cache() if we actually cleared something */
> + if (hadTupDescOrOpclass)
> + delete_rel_type_cache_if_needed(typentry);

I think the intent was to maintain the invariant that a RelIdToTypeIdCacheHash
entry exists if and only if certain kinds of data appear in the TypeCacheHash
entry.  However, TypeCacheOpcCallback() clears TCFLAGS_OPERATOR_FLAGS without
maintaining RelIdToTypeIdCacheHash.  Is it right to do that?




Re: Fixing various typos in comments and docs

2025-04-11 Thread Daniel Gustafsson
> On 3 Mar 2025, at 01:39, Jacob Brazeal  wrote:
> 
> This patch fixes various typos I've found, most of them from recent commits.

Thanks, I've applied the fixes for typos introduced during the v18 cycle.  I
did leave a few out from your patch though:

> - Because not all statistics are not transferred by
> + Because not all statistics are transferred by

I skipped this as it changes the sentence completely rather than fix a typo.
It should perhaps still be fixed but not as part of a typo cleanup.


- * Many thanks to Adisak Pochanayon, who's article about SLZ
+ * Many thanks to Adisak Pochanayon, whose article about SLZ
This particular case is Jan's personal writing and not documentation, so I don't
think we should change that.  The other instance of "who's" is probably a
correct fix, but since that's an old typo it would require backpatching to avoid
risking conflicts when backpatching surrounding code, so I left that one out as
well.

> Separately from this, I have been working on some tooling to flag typos in 
> new commits. Is that something we'd ever want to automate?

Existing spellcheckers for code usually have quite high rates of false
positives, so any automated tooling would have to avoid that to not become a
burden rather than a help.  Personally I think it's something which is best
suited for manual processing with manual review of findings, much like static
code analysis.

--
Daniel Gustafsson





New committer: Jacob Champion

2025-04-11 Thread Jonathan S. Katz
The Core Team would like to extend our congratulations to Jacob 
Champion, who has accepted an invitation to become our newest PostgreSQL 
committer.


Please join us in wishing Jacob much success and few reverts!

Thanks,

Jonathan


OpenPGP_signature.asc
Description: OpenPGP digital signature


Re: New committer: Jacob Champion

2025-04-11 Thread Joe Conway

On 4/11/25 16:26, Jonathan S. Katz wrote:

The Core Team would like to extend our congratulations to Jacob
Champion, who has accepted an invitation to become our newest PostgreSQL
committer.

Please join us in wishing Jacob much success and few reverts!


\o/

Congrats Jacob!

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: New committer: Jacob Champion

2025-04-11 Thread Nathan Bossart
On Fri, Apr 11, 2025 at 01:26:04PM -0700, Jonathan S. Katz wrote:
> The Core Team would like to extend our congratulations to Jacob Champion,
> who has accepted an invitation to become our newest PostgreSQL committer.
> 
> Please join us in wishing Jacob much success and few reverts!

Congrats!

-- 
nathan




Re: Large expressions in indexes can't be stored (non-TOASTable)

2025-04-11 Thread Nathan Bossart
On Wed, Apr 09, 2025 at 08:54:21PM -0300, Euler Taveira wrote:
> LGTM. I have a few suggestions.

Thanks for reviewing.

> +   /*
> +* To avoid needing a TOAST table for pg_replication_origin, we restrict
> +* replication origin names to 512 bytes.  This should be more than enough
> +* for all practical use.
> +*/
> +   if (strlen(roname) > MAX_RONAME_LEN)
> +   ereport(ERROR,
> 
> I wouldn't duplicate the comment. Instead, I would keep it only in origin.h.

Hm.  I agree that duplicating the comment isn't great, but I'm also not
wild about forcing readers to jump to the macro definition to figure out
why there is a length restriction.

> +errdetail("Repilcation origin names must be no longer than 
> %d bytes.",
> +  MAX_RONAME_LEN)));
> 
> s/Repilcation/Replication/

Fixed.

> +#define MAX_RONAME_LEN (512)
> 
> It is just a cosmetic suggestion but I usually use parentheses when it is an
> expression but don't include it for a single number.

We use both styles, but the no-parentheses style does seem to be preferred.

$ grep -E "^#define\s[A-Z_]+\s\([0-9]+\)$" src/* -rI | wc -l
  23
$ grep -E "^#define\s[A-Z_]+\s[0-9]+$" src/* -rI | wc -l
 861

> Should we document this maximum length?

I've added a note.

-- 
nathan
>From 6762a3786b76379c82ccfa4c9da1812e928179b5 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 9 Apr 2025 14:00:31 -0500
Subject: [PATCH v2 1/1] Remove pg_replication_origin's TOAST table.

---
 doc/src/sgml/func.sgml   |  1 +
 src/backend/catalog/catalog.c|  2 --
 src/backend/replication/logical/origin.c | 12 
 src/include/catalog/pg_replication_origin.h  |  2 --
 src/include/replication/origin.h |  7 +++
 src/test/regress/expected/misc_functions.out |  4 
 src/test/regress/expected/misc_sanity.out|  6 +-
 src/test/regress/sql/misc_functions.sql  |  3 +++
 src/test/regress/sql/misc_sanity.sql |  3 +++
 9 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 1c5cfee25d1..a9c5c855524 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -29941,6 +29941,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset

 Creates a replication origin with the given external
 name, and returns the internal ID assigned to it.
+The name must be no longer than 512 bytes.

   
 
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c
index a6edf614606..a7eb60dd2d2 100644
--- a/src/backend/catalog/catalog.c
+++ b/src/backend/catalog/catalog.c
@@ -315,8 +315,6 @@ IsSharedRelation(Oid relationId)
relationId == PgDbRoleSettingToastIndex ||
relationId == PgParameterAclToastTable ||
relationId == PgParameterAclToastIndex ||
-   relationId == PgReplicationOriginToastTable ||
-   relationId == PgReplicationOriginToastIndex ||
relationId == PgShdescriptionToastTable ||
relationId == PgShdescriptionToastIndex ||
relationId == PgShseclabelToastTable ||
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 6583dd497da..c17c0348c6e 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -264,6 +264,18 @@ replorigin_create(const char *roname)
SysScanDesc scan;
ScanKeyData key;
 
+   /*
+* To avoid needing a TOAST table for pg_replication_origin, we restrict
+* replication origin names to 512 bytes.  This should be more than enough
+* for all practical use.
+*/
+   if (strlen(roname) > MAX_RONAME_LEN)
+   ereport(ERROR,
+   (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+errmsg("replication origin name is too long"),
+errdetail("Replication origin names must be no longer than %d bytes.",
+  MAX_RONAME_LEN)));
+
roname_d = CStringGetTextDatum(roname);
 
Assert(IsTransactionState());
diff --git a/src/include/catalog/pg_replication_origin.h b/src/include/catalog/pg_replication_origin.h
index deb43065fe9..7ade8bbda39 100644
--- a/src/include/catalog/pg_replication_origin.h
+++ b/src/include/catalog/pg_replication_origin.h
@@ -54,8 +54,6 @@ CATALOG(pg_replication_origin,6000,ReplicationOriginRelationId) BKI_SHARED_RELAT
 
 typedef FormData_pg_replication_origin *Form_pg_replication_origin;
 
-DECLARE_TOAST_WITH_MACRO(pg_replication_origin, 4181, 4182, PgReplicationOriginToastTable, PgReplicationOriginToastIndex);
-
DECLARE_UNIQUE_INDEX_PKEY(pg_replication_origin_roiident_index, 6001, ReplicationOriginIdentIndex, pg_replication_origin

Re: MergeJoin beats HashJoin in the case of multiple hash clauses

2025-04-11 Thread Alexander Korotkov
On Fri, Apr 11, 2025 at 5:06 AM Andres Freund  wrote:
> On 2025-04-11 00:47:19 +0200, Matthias van de Meent wrote:
> > On Fri, 11 Apr 2025 at 00:27, Andres Freund  wrote:
> > >
> > > Hi,
> > >
> > > On 2025-03-09 14:13:52 +0200, Alexander Korotkov wrote:
> > > > I've revised commit message, comments, formatting etc.
> > > > I'm going to push this if no objections.
> > >
> > > I'm rather confused as to why this is a thing to push at this point? This
> > > doesn't seem to be a bugfix and it's post feature freeze.
> >
> > I think the patch from that mail got committed as 6bb6a62f about a
> > month ago, which was shortly after Alexander's message. Did you get
> > confused about the month of his message, or by the incorrect state of
> > the CF entry?
>
> Sorry for that Alexander - for some reason the email just showed up as new in
> my inbox and I only looked at the date, not the month :(

Not a problem at all!

--
Regards,
Alexander Korotkov
Supabase




Re: type cache cleanup improvements

2025-04-11 Thread Alexander Korotkov
On Fri, Apr 11, 2025 at 11:32 PM Noah Misch  wrote:
>
> On Tue, Oct 22, 2024 at 08:33:24PM +0300, Alexander Korotkov wrote:
> > On Tue, Oct 22, 2024 at 6:10 PM Pavel Borisov  
> > wrote:
> > > On Tue, 22 Oct 2024 at 11:34, Alexander Korotkov  
> > > wrote:
> > >> I'm going to push this if no objections.
>
> (This became commit b85a9d0.)
>
> > + /* Call delete_rel_type_cache() if we actually cleared something */
> > + if (hadTupDescOrOpclass)
> > + delete_rel_type_cache_if_needed(typentry);
>
> I think the intent was to maintain the invariant that a RelIdToTypeIdCacheHash
> entry exists if and only if certain kinds of data appear in the TypeCacheHash
> entry.  However, TypeCacheOpcCallback() clears TCFLAGS_OPERATOR_FLAGS without
> maintaining RelIdToTypeIdCacheHash.  Is it right to do that?

Thank you for the question.  I'll recheck this in next couple of days.

--
Regards,
Alexander Korotkov
Supabase




Re: n_ins_since_vacuum stats for aborted transactions

2025-04-11 Thread David G. Johnston
On Fri, Apr 11, 2025 at 12:33 PM Sami Imseih  wrote:

>
> > I'm also thinking to reword n_tup_upd, something like:
> >
> > Total number of rows updated.  Subsets of these updates are also tracked
> in n_tup_hot_upd and n_tup_newpage_upd to facilitate performance monitoring.
>
> I think the current explanation is clear enough, I am also not too
> thrilled about the "...to facilitate performance monitoring." since
> the cumulative stats system
> as a whole is known to be used to facilitate perf monitoring.
>

Yeah, it was mostly a style thing - I was trying to avoid using
parentheses, but the existing text does make the needed point.


> What do you think of the attached?
>
>
WFM.  Though is there a reason to avoid adding the "why" of the exception
for n_mod_since_analyze?

David J.


Re: New committer: Jacob Champion

2025-04-11 Thread Michael Paquier
On Fri, Apr 11, 2025 at 01:26:04PM -0700, Jonathan S. Katz wrote:
> Please join us in wishing Jacob much success and few reverts!

Congratulations and welcome, Jacob!

May your path lead to a peaceful buildfarm and few reverts.
--
Michael


signature.asc
Description: PGP signature


Re: stats.sql fails during installcheck on mac

2025-04-11 Thread Michael Paquier
On Fri, Apr 11, 2025 at 10:44:59AM -0500, Sami Imseih wrote:
> I actually originally had it this way, but for some reason
> felt it would be better to be explicit about the methods we want to test 
> rather
> than not test. I can't think of a very compelling reason to go either way, so 
> v2
> LGTM.

I will proceed with v2 then, thanks.

> What do you think of this? I think we should set fsync = on
> at least for the part of the test that precedes the 2 checkpoints and
> set it back to off at the end of the tests for fsync stats. It is concerning
> that the tests for the fsync stats are not being exercised in
> the buildfarm.

One thing I fear here is the impact for animals with little capacity,
like PIs and the like.  On the other hand, I could just switch one of
my animals to use fsync = on on at least one branch.
--
Michael


signature.asc
Description: PGP signature


Re: New committer: Jacob Champion

2025-04-11 Thread Bharath Rupireddy
Congratulations Jacob!

Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


On Fri, Apr 11, 2025 at 1:26 PM Jonathan S. Katz 
wrote:

> The Core Team would like to extend our congratulations to Jacob
> Champion, who has accepted an invitation to become our newest PostgreSQL
> committer.
>
> Please join us in wishing Jacob much success and few reverts!
>
> Thanks,
>
> Jonathan
>


Re: Fixing various typos in comments and docs

2025-04-11 Thread Jacob Brazeal
Thank you! I had completely forgotten about this, I appreciate that you dug
this one out of the archives!

> Existing spellcheckers for code usually have quite high rates of false
> positives, so any automated tooling would have to avoid that to not become a
> burden rather than a help.  Personally I think it's something which is best
> suited for manual processing with manual review of findings, much like static
> code analysis.

Sounds good.


Re: n_ins_since_vacuum stats for aborted transactions

2025-04-11 Thread Sami Imseih
> WFM.  Though is there a reason to avoid adding the "why" of the exception for 
> n_mod_since_analyze?

I did think about that. Since this is a field that deals with ANALYZE,
I thought it would be understood that only committed changes matter
here, and not worth adding more text to the description. But maybe
it's worth it?

--
Sami




Re: stats.sql fails during installcheck on mac

2025-04-11 Thread Sami Imseih
> > What do you think of this? I think we should set fsync = on
> > at least for the part of the test that precedes the 2 checkpoints and
> > set it back to off at the end of the tests for fsync stats. It is concerning
> > that the tests for the fsync stats are not being exercised in
> > the buildfarm.
>
> One thing I fear here is the impact for animals with little capacity,
> like PIs and the like.  On the other hand, I could just switch one of
> my animals to use fsync = on on at least one branch.

Yes, there should be some tests running for these stats,
so if it's possible to enable fsync on one or a few animals, that
will be better than nothing.

--
Sami




Re: Add missing PGDLLIMPORT markings

2025-04-11 Thread Peter Eisentraut

On 09.04.25 12:02, Peter Eisentraut wrote:
I ran src/tools/mark_pgdllimport.pl and it detected a few new global 
variables with missing markings.  See attached patch.  Please point out 
if any of these should not be marked or if they are special cases in 
some other way.  I'm Cc'ing various people involved with that new code.


I have committed the remaining ones.





Re: Improve a few appendStringInfo calls new to v18

2025-04-11 Thread Peter Eisentraut

On 10.04.25 05:51, David Rowley wrote:

Looks like v18 has grown a few appendStringInfo misusages, e.g. using
appendStringInfo() when no formatting is needed or just using format
"%s" instead of using appendStringInfoString().


Would it be useful to augment appendStringInfo() something like this:

if (VA_ARGS_NARGS() == 0)
return appendStringInfoString(str, fmt);

?





Re: n_ins_since_vacuum stats for aborted transactions

2025-04-11 Thread David G. Johnston
On Fri, Apr 11, 2025 at 5:19 PM Sami Imseih  wrote:

> > WFM.  Though is there a reason to avoid adding the "why" of the
> exception for n_mod_since_analyze?
>
> I did think about that. I thought it will be understood that since
> this is a field that deals with ANALYZE,
> it will be understood that only committed changes matter here, and not
> worth adding more text to the
> description. but, maybe it's worth it?
>
>
Absent field questions I'm content to think it is sufficiently obvious or
discoverable for others.

David J.


Re: New committer: Jacob Champion

2025-04-11 Thread Srinath Reddy
Congrats, Jacob.

Thanks && Regards,
Srinath Reddy Sadipiralla,
EDB: https://www.enterprisedb.com 


Re: New committer: Jacob Champion

2025-04-11 Thread Amul Sul
On Saturday, 12 April 2025, Jonathan S. Katz  wrote:

> The Core Team would like to extend our congratulations to Jacob Champion,
> who has accepted an invitation to become our newest PostgreSQL committer.
>
> Please join us in wishing Jacob much success and few reverts!
>

Many congratulations, Jacob.

Regards,
Amul


-- 
Regards,
Amul Sul
EDB: http://www.enterprisedb.com


Re: stats.sql fails during installcheck on mac

2025-04-11 Thread Michael Paquier
On Fri, Apr 11, 2025 at 07:37:49PM -0500, Sami Imseih wrote:
> Yes, there should be some tests running for these stats,
> so if it's possible to enable fsync on one or a few animals, that
> will be better than nothing.

I have just done that on batta that only tests HEAD, that's a start.
--
Michael


signature.asc
Description: PGP signature


Small patch fixing a query correctness issue in Gin with operator classes implementing Consistent functions

2025-04-11 Thread Vinod Sridharan
Hi All,

Please find a small patch to fix an existing bug in the GIN index that
impacts the correctness of query results for operator classes that
implement a consistent function but do not specify a triConsistent
function. This patch is against master and fixes an issue not present
in any release branches.

Please find the thread discussing this issue and the fix here:
https://www.postgresql.org/message-id/flat/CAFMdLD4Ks5b%3DCbBh1PjFSytm0zdNv9-ddyeE%2BopusAKCVph7%3Dg%40mail.gmail.com

--
Thanks and Regards,
Vinod Sridharan
[Microsoft]


v1-0001-Fix-shimTriConsistentFn-mutating-the-entryRes-val.patch
Description: Binary data


Re: New committer: Jacob Champion

2025-04-11 Thread Peter Geoghegan
On Fri, Apr 11, 2025 at 4:26 PM Jonathan S. Katz  wrote:
> Please join us in wishing Jacob much success and few reverts!

Well done, Jacob.

-- 
Peter Geoghegan




Re: TOAST versus toast

2025-04-11 Thread David G. Johnston
On Sun, Mar 16, 2025 at 8:33 PM Peter Smith  wrote:

> Thanks for your suggestions. At this point option (1) is looking most
> attractive. Probably, I will just withdraw the CF entry soon unless
> there is some new interest. Just chipping away fixing a few places
> isn't going to achieve the consistency this thread was aiming for.
>
>
I've moved this back to waiting on author pending a final decision.
Interested parties might still chime in but it doesn't seem like it is
actively looking for reviewers at this point.

David J.