Re: pg_stat_statements and "IN" conditions

2025-03-17 Thread Dmitry Dolgov
> On Mon, Mar 17, 2025 at 12:07:44PM GMT, vignesh C wrote:
>
> I noticed that the feedback from Sami at [1] has not yet been
> addressed, I have changed the status to Waiting on Author, kindly
> address them and update the status to Needs review.
> [1] - 
> https://www.postgresql.org/message-id/CAA5RZ0vt29Om%2BtKFOcUNhXV%2BkKpNnj0yj6OFho3-wngcMHWnAQ%40mail.gmail.com

I'm afraid there is a disagreement about this part of the feedback. I'm
not yet convinced about the idea suggested over there (treating mutable
functions in the same way as constants) and not planning to change
anything, at least not in the current version of the patch.




Re: speedup COPY TO for partitioned table.

2025-03-17 Thread newtglobal postgresql_contributors
Hi Jian,
Tested this patch with COPY sales TO STDOUT; (~1.909 ms), improving performance
over the older COPY (SELECT * FROM sales) TO STDOUT; (~3.80 ms) method. This
eliminates query planning overhead and significantly speeds up data export from
partitioned tables.
Our test setup involved creating a partitioned table (sales), inserting 500
records, and comparing execution times.

-- Step 1: Create Partitioned Parent Table
CREATE TABLE sales (
id SERIAL NOT NULL,
sale_date DATE NOT NULL,
region TEXT NOT NULL,
amount NUMERIC(10,2) NOT NULL,
category TEXT NOT NULL,
PRIMARY KEY (id, sale_date,region)
) PARTITION BY RANGE (sale_date);

-- Step 2: Create Range Partitions (2023 & 2024)
CREATE TABLE sales_2023 PARTITION OF sales
FOR VALUES FROM ('2023-01-01') TO ('2024-01-01')
PARTITION BY HASH (region);

CREATE TABLE sales_2024 PARTITION OF sales
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01')
PARTITION BY HASH (region);

-- Step 3: Create Hash Partitions for sales_2023
CREATE TABLE sales_2023_part1 PARTITION OF sales_2023
FOR VALUES WITH (MODULUS 2, REMAINDER 0);

CREATE TABLE sales_2023_part2 PARTITION OF sales_2023
FOR VALUES WITH (MODULUS 2, REMAINDER 1);

-- Step 4: Create Hash Partitions for sales_2024
CREATE TABLE sales_2024_part1 PARTITION OF sales_2024
FOR VALUES WITH (MODULUS 2, REMAINDER 0);

CREATE TABLE sales_2024_part2 PARTITION OF sales_2024
FOR VALUES WITH (MODULUS 2, REMAINDER 1);

-- Step 5: Insert Data **AFTER** Creating Partitions
INSERT INTO sales (sale_date, region, amount, category)
SELECT
    ('2023-01-01'::DATE + (random() * 730)::int) AS sale_date,  -- random date in the 2023-2024 range
    CASE WHEN random() > 0.5 THEN 'North' ELSE 'South' END AS region,  -- random region
    (random() * 1000)::NUMERIC(10,2) AS amount,  -- random amount (0 to 1000)
    CASE WHEN random() > 0.5 THEN 'Electronics' ELSE 'Furniture' END AS category  -- random category
FROM generate_series(1, 500);

COPY SALES TO STDOUT;  ~ 1.909ms

COPY (SELECT * FROM SALES) TO STDOUT; ~ 3.80ms

This change is recommended for better performance in PostgreSQL partitioned 
tables.

Supporting TCP_SYNCNT in libpq

2025-03-17 Thread Francesco Canovai
This patch introduces support for a `tcp_syn_count` parameter in
libpq, allowing control over the number of SYN retransmissions when
initiating a connection.

The primary goal is to prevent the walreceiver from getting stuck
resending SYNs for an extended period in the event of network
disruptions. The duration is governed by `net.ipv4.tcp_syn_retries`:
with the default of 6 retries, the retransmission intervals double each
time (1+2+4+8+16+32+64), so an attempt can hang for about 127 seconds.

A specific scenario where this can occur is during a failover in Kubernetes:

* The primary node fails, a standby is promoted, and other standbys
attempt to reconnect to the service representing the new primary.
* The `primary_conninfo` of the standby points to a service, usually
managed via iptables rules.
* If the walreceiver's initial SYN is dropped due to outdated rules,
the connection may remain stranded until the system timeout is
reached.
* As a result, a second standby may reattach after a couple of
minutes. In the case of synchronous replication, this can block the
writes from the application.

In this scenario, `tcp_user_timeout` could close a connection that is
still retrying SYNs (even though the documentation doesn't suggest it
would, it works in practice), but that parameter affects the entire
connection. `connect_timeout` doesn't work with `PQconnectPoll`, so it
can't bound the walreceiver's connection attempt either.
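
For illustration, a minimal libpq client sketch using the proposed
parameter (host and dbname are placeholders; with tcp_syn_count=2 the
kernel gives up after two SYN retransmissions, roughly 7 seconds):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	/* tcp_syn_count is the parameter added by this patch */
	PGconn	   *conn = PQconnectdb("host=primary-svc dbname=postgres "
								   "tcp_syn_count=2");

	if (PQstatus(conn) != CONNECTION_OK)
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
	PQfinish(conn);
	return 0;
}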

Thank you,
Francesco
From 34eb35abdfdc660d90f104544f40331d659718a9 Mon Sep 17 00:00:00 2001
From: Francesco Canovai 
Date: Fri, 31 Jan 2025 13:39:51 +0100
Subject: [PATCH] Support TCP_SYNCNT in libpq

Add a tcp_syn_count parameter to libpq.

This will allow the user to define the maximum number of retransmissions
that TCP will send before aborting the attempt to establish a
connection. This can be used only on systems where TCP_SYNCNT is
available.
---
 doc/src/sgml/libpq.sgml   | 15 ++
 src/interfaces/libpq/fe-connect.c | 49 +++
 src/interfaces/libpq/libpq-int.h  |  1 +
 3 files changed, 65 insertions(+)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 8fa0515c6a0..8029819a25e 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1576,6 +1576,21 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
  
 
+ 
+  tcp_syn_count
+  
+   
+Controls the maximum number of SYN retransmissions that TCP
+will send before aborting the attempt to establish a connection.
+A higher value increases the number of retries before giving up,
+while a lower value results in faster failure detection.
+This parameter is only effective on systems where
+TCP_SYNCNT is available; on other systems,
+it has no effect.
+   
+  
+ 
+
  
   replication
   
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index d5051f5e820..889c1c771d4 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -267,6 +267,10 @@ static const internalPQconninfoOption PQconninfoOptions[] = {
 		"TCP-User-Timeout", "", 10, /* strlen(INT32_MAX) == 10 */
 	offsetof(struct pg_conn, pgtcp_user_timeout)},
 
+	{"tcp_syn_count", NULL, NULL, NULL,
+		"TCP-SYN-Count", "", 10, /* strlen(INT32_MAX) == 10 */
+	offsetof(struct pg_conn, tcp_syn_count)},
+
 	/*
 	 * ssl options are allowed even without client SSL support because the
 	 * client can still handle SSL modes "disable" and "allow". Other
@@ -2627,6 +2631,41 @@ setTCPUserTimeout(PGconn *conn)
 	return 1;
 }
 
+/*
+* Set the maximum number of SYN retransmissions.
+*/
+static int
+setTCPSynCnt(PGconn *conn)
+{
+	int			count;
+
+	if (conn->tcp_syn_count == NULL)
+		return 1;
+
+	if (!pqParseIntParam(conn->tcp_syn_count, &count, conn,
+		 "tcp_syn_count"))
+		return 0;
+
+	if (count < 0)
+		count = 0;
+
+#ifdef TCP_SYNCNT
+	if (setsockopt(conn->sock, IPPROTO_TCP, TCP_SYNCNT,
+   (char *) &count, sizeof(count)) < 0)
+	{
+		char		sebuf[PG_STRERROR_R_BUFLEN];
+
+		libpq_append_conn_error(conn, "%s(%s) failed: %s",
+"setsockopt",
+"TCP_SYNCNT",
+SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
+		return 0;
+	}
+#endif
+
+	return 1;
+}
+
 /* --
  * pqConnectDBStart -
  *		Begin the process of making a connection to the backend.
@@ -3368,6 +3407,15 @@ keep_going:		/* We will come back to here until there is
 		}
 	}
 
+	if (addr_cur->family != AF_UNIX)
+	{
+		if (!setTCPSynCnt(conn))
+		{
+			conn->try_next_addr = true;
+			goto keep_going;
+		}
+	}
+
 	/*--
 	 * We have three methods of blocking SIGPIPE during
 	 * send() calls to this socket:
@@ -5008,6 +5056,7 @@ freePGconn(PGconn *conn)
 	free(conn->keepalives_idle);
 	free(conn->keepalives_interval);
 	free(conn->keepalives_count);
+	free(conn->tcp_syn_count);
 	free(conn->sslmode);
 	free(conn->sslnegotiation);
 	free(conn->sslcert);
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index f36f7f19d58..d25f440a940 100644
--- a/src/in

Re: Forbid to DROP temp tables of other sessions

2025-03-17 Thread Daniil Davydov
Hi,

On Mon, Mar 17, 2025 at 1:15 PM David G. Johnston
 wrote:
>
> It’s seems like the bug “session A can read and write to session B’s tables” 
> has gotten lost in this thread that pertains to something related but 
> different.  Solving the bug is not going to involve adding a new GUC.
I don't think it's worth putting this in a separate discussion. Everything
is now moving towards prohibiting the superuser from changing the
temporary tables of other sessions. In fact, the superuser can't do it
anyway (except for the DROP command), see [1]. But currently the error for
INSERT, UPDATE, DELETE and SELECT commands may not surface immediately due
to errors in the code. The only question now is whether the superuser
should be allowed to DROP other sessions' temp tables. Since opinions are
divided, I suggested adding a GUC that controls only this feature.

> If they specify the precise pg_temp schema to affect they likely didn’t make 
> a mistake.
Yep, I agree. However, the internals of the postgres kernel do not
allow the superuser to correctly perform INSERT, UPDATE, DELETE and
SELECT operations, because a temporary table's pages cannot be stored in
shared buffers.

> If instead we are discussing the owner of the temporary table there is 
> probably a discussion to be had and decision to be documented somewhere - 
> maybe that central place of testing being wished for.
As far as I understand, only the superuser and the table's owner (within
its session) can access a session's temp tables. We want CRUD
operations to be performed only by the owner.

--
Best regards,
Daniil Davydov

[1] 
https://www.postgresql.org/message-id/CAJDiXgj6TBzn=6ezx7+9bna9hpbitbu+muv-n3mhen_zs3n...@mail.gmail.com




Re: Fwd: lwlocknames.h beautification attempt

2025-03-17 Thread Álvaro Herrera
On 2025-Mar-16, Gurjeet Singh wrote:

> (resending the email because it was held for moderation; replaced image
> attachment with a link, which might be the reason for being put in the
> moderation queue)

Kindly don't do this (specifically: cancel the original message and send
a different copy).  Just have patience with the moderation queue.
Messages are typically approved within a day anyway.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Unify a recently-added inconsistent message

2025-03-17 Thread Kyotaro Horiguchi
Subject: Unify a recently-added inconsistent message

Hello.

I noticed that the recent commit cd3c45125d2 introduced the following
three inconsistent messages at the same time:

pg_dump.c:
+   printf(_("  --no-policies                do not dump row security policies\n"));

pg_dumpall.c:
+   printf(_("  --no-policies                do not dump row security policies\n"));

pg_restore.c:
+   printf(_("  --no-policies                do not restore row level security policies\n"));

The existing user-facing messages commonly use "row security".
Therefore, I believe the newly introduced message should follow the
same wording for consistency.

Please find the attached patch that updates the inconsistent message.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 01db29ea295ec4541a870ed2dbbcd63184f03bef Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Mon, 17 Mar 2025 11:54:58 +0900
Subject: [PATCH] Unify wording of user-facing "row security"

Row-level security is mostly referred to as "row security" in
user-facing messages. However, a recent commit (cd3c45125d2)
introduced the first instance of "row-level security" in such
messages, while the corresponding message added in that commit uses
"row security." This commit updates the wording for consistency but
does not change non-user-facing wording.
---
 src/bin/pg_dump/pg_restore.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c
index d947b2d2068..337e64a8a29 100644
--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -508,7 +508,7 @@ usage(const char *progname)
 	printf(_("  --no-data                    do not restore data\n"));
 	printf(_("  --no-data-for-failed-tables  do not restore data of tables that could not be\n"
 			 "                               created\n"));
-	printf(_("  --no-policies                do not restore row level security policies\n"));
+	printf(_("  --no-policies                do not restore row security policies\n"));
 	printf(_("  --no-publications            do not restore publications\n"));
 	printf(_("  --no-schema                  do not restore schema\n"));
 	printf(_("  --no-security-labels         do not restore security labels\n"));
-- 
2.43.5



Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.

2025-03-17 Thread Shubham Khanna
On Fri, Mar 14, 2025 at 5:43 PM Nisha Moond  wrote:
>
> Hi Shubham,
>
> Here are a few comments for the v12 patch.
>
> doc/src/sgml/ref/pg_createsubscriber.sgml :
>
> 1.
> +  
> +   For all source server non-template databases create subscriptions for
> +   create subscriptions for databases with the same names on the
> +   target server.
>
> is “create subscriptions for” repeated? The sentence doesn’t make sense.
>

Fixed.

> 2. Typo
> +   switches. This option cannot be together with --all.
>
> /cannot be together/cannot be used together/
> ~~~
>

Fixed.

> 3. src/bin/pg_basebackup/pg_createsubscriber.c :
> + /* Establish a connection to the PostgreSQL server */
> + conn = connect_database(opt->pub_conninfo_str, true);
>
> I think comment will be more clear if say “ Establish a connection to
> the primary server */ or “source server”
> ~~~
>

Fixed.

> src/bin/pg_basebackup/t/042_all_option.pl :
>
> 4. As per Ashutosh's comment at [1], tests need to be added to verify
> logical replication behavior after using the --all option.
> 
>
> Please refer to the attached top-up fix patch, which includes the
> above changes along with some cosmetic fixes in the test file
> 042_all_option.pl.
>
> [1] 
> https://www.postgresql.org/message-id/CAExHW5uJHYAge99oS_iPfGWwZ_eCr2xFCNnifQkGs2GXeMQKGQ%40mail.gmail.com
>
> --

Fixed.

The attached patch contains the suggested changes.

Thanks and regards,
Shubham Khanna.


v13-0001-Enhance-pg_createsubscriber-to-fetch-and-append-.patch
Description: Binary data


Re: Forbid to DROP temp tables of other sessions

2025-03-17 Thread Steven Niu

Hi,

I have some comments:

1. namespace.c, if relation->schemaname is pg_temp but myTempNamespace 
isn't set, the error information might be misleading. Consider checking 
OidIsValid(myTempNamespace) first.


2."you have not any temporary relations" --> "you have no any temporary 
relations"


3. Regarding the code "strncmp(nspname, "pg_temp", 7)", is it ok when
nspname contains something like "pg_temp_1234"? I think we should
use strcmp instead of strncmp for exact matching. (See the sketch below.)
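
A standalone sketch of the prefix-match hazard (an illustration, not code
from the patch):

#include <stdio.h>
#include <string.h>

int
main(void)
{
	const char *nspname = "pg_temp_1234";

	/* strncmp compares only the first 7 bytes, so this matches: prints 0 */
	printf("strncmp: %d\n", strncmp(nspname, "pg_temp", 7));
	/* strcmp compares the whole string, so this is nonzero */
	printf("strcmp:  %d\n", strcmp(nspname, "pg_temp"));
	return 0;
}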


Thanks,
Steven

On 2025/3/17 17:03, Daniil Davydov wrote:

Hi,
I see that the presence of isolation tests in the patch is
controversial. First things first, let's concentrate on fixing the
bug.
I attach a new version of patch (for `master` branch) to this letter.
It contains better comments and a few small improvements.

P.S.
Sorry for the bad formatting in the previous email (I don't know how to fix it in the Gmail client)

--
Best regards,
Daniil Davydov






Re: Not-terribly-safe checks for CRC intrinsic support

2025-03-17 Thread John Naylor
On Sat, Mar 15, 2025 at 6:04 AM Tom Lane  wrote:
>
> In short, I think we ought to apply and perhaps back-patch something
> like the attached.

Seems like reasonable defensive coding and consistency.

-/* return computed value, to prevent the above being optimized away */
+/* else this function could get optimized away altogether: */

-/* return computed value, to prevent the above being optimized away */
+/* return computed value, just to be extra sure this isn't
optimized away */

I'd be okay with keeping the original comment, though, since it seems
to be explaining the choice well enough.

> BTW, it looks to me like PGAC_AVX512_POPCNT_INTRINSICS is at similar
> hazard, but I'm not entirely sure how to fix that one.

"buf" is the variable there that we're loading from, so that would be
the one to make global.
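
For readers following along, a toy sketch of the probe pattern under
discussion (an illustration only, not PostgreSQL's actual configure test):
a globally visible sink keeps the computed value observable, so the
compiler cannot elide the intrinsic being probed.

#include <nmmintrin.h>			/* SSE4.2 CRC intrinsics */

unsigned int crc_sink;			/* global: result must stay observable */

int
main(void)
{
	crc_sink = _mm_crc32_u32(0, 42);	/* compiles only with SSE4.2 enabled */
	return 0;
}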

-- 
John Naylor
Amazon Web Services




Re: Conflict detection for multiple_unique_conflicts in logical replication

2025-03-17 Thread Amit Kapila
On Thu, Mar 13, 2025 at 4:30 PM Nisha Moond  wrote:
>
> Attached is the v4 patch (test case changes only).
>

Comments:
=
1.
+ /*
+ * Report an INSERT_EXISTS or UPDATE_EXISTS conflict when only one unique
+ * constraint is violated.
+ */
+ if (conflicts == 1)
+ {
+ Oid uniqueidx;
+ RepOriginId origin;
+ TimestampTz committs;
+ TransactionId xmin;
+
+ uniqueidx = linitial_oid(conflictIndexes);
+ conflictslot = linitial(conflictSlots);
+
+ GetTupleTransactionInfo(conflictslot, &xmin, &origin, &committs);
+ ReportApplyConflict(estate, resultRelInfo, ERROR, type,
+ searchslot, conflictslot, remoteslot,
+ uniqueidx, xmin, origin, committs);
+ }
+
+ /*
+ * Report MULTIPLE_UNIQUE_CONFLICTS when two or more unique constraints
+ * are violated.
+ */
+ else if (conflicts > 1)
+ ReportMultipleUniqueConflict(estate, resultRelInfo, ERROR,
+ CT_MULTIPLE_UNIQUE_CONFLICTS,
+ searchslot, remoteslot,
+ conflictSlots, conflictIndexes);

It looks a bit odd to have different functions for one or multiple
conflicts. We can improve this coding pattern by extending the current
function ReportApplyConflict to report one or multiple conflicts
depending on the length of conflictSlots.

2. From the commit message: "Also, the patch adds a new column
'confl_multiple_unique_conflicts' in view pg_stat_subscription_stats
to support stats collection for this conflict type.". This part can be
split into a second patch. Let's try to get the core patch first.

-- 
With Regards,
Amit Kapila.




Re: Add Pipelining support in psql

2025-03-17 Thread Anthonin Bonnefoy
Here is a new patch set:

0001: This introduces the \sendpipeline meta-command and forbids \g in
a pipeline. This addresses the formatting options of \g, which are not
supported in a pipeline.

0002: Allows ';' to send a query using extended protocol when within a
pipeline by using PQsendQueryParams with 0 parameters. It is not
possible to send parameters with extended protocol this way and
everything will be propagated through the query string, similar to a
simple query.
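
For reference, a rough libpq sketch of what this corresponds to at the
protocol level (result draining simplified; the connection string is a
placeholder):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=postgres");
	PGresult   *res;

	PQenterPipelineMode(conn);
	/* extended protocol, zero parameters, as 0002 does for ';' */
	PQsendQueryParams(conn, "SELECT 1", 0, NULL, NULL, NULL, NULL, 0);
	PQsendQueryParams(conn, "SELECT 2", 0, NULL, NULL, NULL, NULL, 0);
	PQpipelineSync(conn);

	/* each query yields results then NULL; the sync yields one more result */
	for (int i = 0; i < 3; i++)
		while ((res = PQgetResult(conn)) != NULL)
			PQclear(res);

	PQexitPipelineMode(conn);
	PQfinish(conn);
	return 0;
}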


v02-0001-psql-Create-new-sendpipeline-meta-command.patch
Description: Binary data


v02-0002-psql-Allow-to-add-queries-in-an-ongoing-pipeline.patch
Description: Binary data


Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

2025-03-17 Thread Michael Paquier
On Mon, Mar 17, 2025 at 07:33:42AM +, Bykov Ivan wrote:
> See:
> https://www.postgresql.org/message-id/flat/5ac172e0b77a4baba50671cd1a15285f%40localhost.localdomain#6c43f354f5f42d2a27e6824faa660a86
> 
> Is it really worth spending extra execution time to increase entropy
> when we have non-NULL nodes?

The compute time here is already quite cheap, so I'm not much worried
about that, FWIW.

> We could also add entropy if we see a change in the node->type value for
> non-NULL variants.

I am not sure I get this one.  The issue shows up when the same Node is
computed successively, as reported on this thread.  It would still
be a problem, though less likely, for different node types when two
node types have similar fields.
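
To make the discussed idea concrete, a sketch in the context of
src/backend/nodes/queryjumblefuncs.c (AppendJumble and _jumbleNode are the
existing static helpers there; this function itself is hypothetical, not
the committed fix):

/*
 * Jumble a possibly-NULL sub-node together with a per-field tag, so that
 * (DISTINCT clause, NULL) and (NULL, LIMIT clause) hash differently.
 */
static void
JumbleTaggedNode(JumbleState *jstate, Node *node, unsigned char tag)
{
	AppendJumble(jstate, &tag, sizeof(tag));
	if (node != NULL)
		_jumbleNode(jstate, node);
}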
--
Michael


signature.asc
Description: PGP signature


Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.

2025-03-17 Thread vignesh C
On Mon, 17 Mar 2025 at 11:28, Shubham Khanna
 wrote:
>
> On Fri, Mar 14, 2025 at 5:43 PM Nisha Moond  wrote:
>
> Fixed.
>
> The attached patch contains the suggested changes.

I feel like we're trying to address two separate tasks in this thread:
a) Enhancing pg_createsubscriber to automatically retrieve databases
when none is provided. b) Refactoring all pg_createsubscriber tests.

I suggest we keep this thread focused solely on retrieving all
databases and start a new thread for test refactoring. This will allow
us to tackle each task separately and ensure a cleaner commit.

Regards,
Vignesh




Re: 64 bit numbers vs format strings

2025-03-17 Thread Thomas Munro
On Mon, Mar 17, 2025 at 8:09 PM Tom Lane  wrote:
> Peter Eisentraut  writes:
> > This is not really possible.  The  behavior is baked deeply into
> > the gettext code.  (Also note that you don't only need support in
> > xgettext, which is part of our build system, but also in the runtime
> > library, which we don't control.)
>
> Hmm, I find that comment fairly scary.  How do we know that the
> runtime library actually gets this right on every supported platform?
> It's surely not because we test it, because we do not.

I don't know too much about libintl and its history other than what
I've looked up for these discussions, but I can't find any other
implementations other than Sun's, GNU's and NetBSD's.  Sun/Oracle and
NetBSD went out of their way to understand these and other GNUisms.  I
am not sure if they should even be called "extensions"...  extensions
to what?  I guess the historical answer would have been "Sun's
version", but see below for a new development which raises
philosophical questions.

1.  Solaris -- the original implementation has special support for the
things GNU's added, and certainly covers this  stuff:

https://docs.oracle.com/cd/E36784_01/html/E39536/gnkbn.html#ILDEVgnosj

I just tried it out on a cfarm Solaris box (well I thought I already
knew this from an earlier round of discussions about this but wanted
to check again before replying and found my old test program still
there...).  Note the "<PRId64>" in the catalogue:

tmunro@s11-sparc:~/gettext-hacking$ uname -a
SunOS s11-sparc.cfarm 5.11 11.4.78.189.2 sun4v sparc sun4v logical-domain
tmunro@s11-sparc:~/gettext-hacking$ tail -5 locales/fr/LC_MESSAGES/messages.po

#: test.c:8
#, c-format
msgid "the answer is %\n"
msgstr "la réponse est %\n"
tmunro@s11-sparc:~/gettext-hacking$ cat test.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <libintl.h>
#include <inttypes.h>

#define GETTEXT_DOMAIN "messages"
#define GETTEXT_OUTPUT_DIR "locales"

int
main()
{
setenv("LANGUAGE", "fr", 1);
setlocale(LC_ALL, "fr_FR.UTF-8");
bindtextdomain(GETTEXT_DOMAIN, GETTEXT_OUTPUT_DIR);
textdomain(GETTEXT_DOMAIN);
printf(gettext("the answer is %" PRId64 "\n"), (int64_t) 42);
}
tmunro@s11-sparc:~/gettext-hacking$ gcc test.c
tmunro@s11-sparc:~/gettext-hacking$ ./a.out
la réponse est 42

You can also see that stuff in the illumos source tree:

https://github.com/illumos/illumos-gate/blob/master/usr/src/lib/libc/port/i18n/gettext_gnu.c

2.  NetBSD -- I haven't tried it myself (I can send my test program if
you are interested) but it clearly knows about GNU's system-dependent
macros, and its stated goal was to be "based on the specifications
from GNU gettext":

https://wiki.netbsd.org/projects/project/libintl/
https://github.com/NetBSD/src/blob/trunk/lib/libintl/sysdep.c

What aspect of that might not work portably?  Are there any other
implementations I'm missing?  What standard would an implementation
follow, if it were to exist?

POSIX 2024 also finally standardised gettext() and associated tools.
I don't see these macros mentioned there (after an admittedly cursory
scan of the relevant sections), or for that matter any mention of the
portability problem they solve (perhaps I'll write in about that), but
it doesn't seem to make any sense to deprive ourselves of features
supported by all known implementations that solve a real problem, just
because a standard suddenly appeared retroactively rendering them
"extensions" in some sense.  I mean, GNU is clearly functioning as a
of de facto standard of very long standing, which I think the POSIX
discussion[1] acknowledged succinctly in the description field "POSIX
defines catgets() but most software rather uses gettext()".  I don't
think I've ever seen catgets() in several decades around C and Unix.

(Amusingly the GNU maintainer showed up to say (paraphrasing) "don't
do it", and (paraphrasing) "if you want to solve a problem that we
actually have why don't you add all the missing _l function so we can
write portable multithreaded programs".  Hear hear!)

[1] https://www.austingroupbugs.net/view.php?id=1122




Re: Forbid to DROP temp tables of other sessions

2025-03-17 Thread Steven Niu




On 2025/3/17 18:13, Daniil Davydov wrote:

Hi,

On Mon, Mar 17, 2025 at 4:48 PM Steven Niu  wrote:


1. namespace.c, if relation->schemaname is pg_temp but myTempNamespace
isn't set, the error information might be misleading. Consider checking
OidIsValid(myTempNamespace) first.

Could you please clarify exactly which place in the code we are
talking about? I think we handle this case in the
LookupExplicitNamespace call (with all appropriate error information).



I mean RangeVarGetRelidExtended(), you deleted the following code:

if (!OidIsValid(myTempNamespace))
   relId = InvalidOid; /* this probably can't happen? */




2."you have not any temporary relations" --> "you have no any temporary
relations"

I am not an English speaker, but it seems that "have not" would be
more correct. Someone has to judge us :)



3. Regarding to the code "strncmp(nspname, "pg_temp", 7)", is it ok when
the nspname contains something like "pg_temp_1234"? I think we should
use strcmp instead of strncmp for exact matching.

Great catch! I'll fix it. Please, see v3 patch.

--
Best regards,
Daniil Davydov






Re: Forbid to DROP temp tables of other sessions

2025-03-17 Thread Steven Niu




On 2025/3/17 18:56, Daniil Davydov wrote:

Hi,

On Mon, Mar 17, 2025 at 5:33 PM Steven Niu  wrote:


I mean RangeVarGetRelidExtended(), you deleted the following code:

if (!OidIsValid(myTempNamespace))
 relId = InvalidOid; /* this probably can't happen? */


Hm, I got it. Let's start with the fact that the comment "this
probably can't happen?" is incorrect. Even if we don't have a
temporary namespace in our session, we can still specify "pg_temp_N"
in the psql query.
Next, if relation->schemaname is pg_temp, then we first call
LookupExplicitNamespace, where you can find the
if (!OidIsValid(myTempNamespace)) check.

--
Best regards,
Daniil Davydov


If (relation->relpersistence == RELPERSISTENCE_TEMP) ensures that
myTempNamespace is always valid, then my comment can be ignored.
Otherwise I think the modified RangeVarGetRelidExtended() should keep
the check of myTempNamespace, like this:


if (relation->relpersistence == RELPERSISTENCE_TEMP)
{
	Oid		namespaceId;

	if (!OidIsValid(myTempNamespace))
		relId = InvalidOid; /* this probably can't happen? */
	else
	{
		if (relation->schemaname)
		{
			namespaceId = LookupExplicitNamespace(relation->schemaname, missing_ok);

			if (namespaceId != myTempNamespace)
				ereport(ERROR,
						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
						 errmsg("could not access temporary relations of other sessions")));
		}
		else
		{
			namespaceId = myTempNamespace;
			Assert(OidIsValid(namespaceId));
		}
		if (missing_ok && !OidIsValid(namespaceId))
			relId = InvalidOid;
		else
			relId = get_relname_relid(relation->relname, namespaceId);
	}
...
...

Thanks,
Steven




Prune partitions by ScalarArrayOpExpr with an array parameter (partkey = ANY($1))

2025-03-17 Thread Andrei Lepikhov

Hi,

As I see it, initial pruning doesn't work when a ScalarArrayOpExpr
contains a parameter as the RHS of the expression, like the following:

partkey = ANY($1)

As colleagues say, it is quite typical to use stored procedures, pass an
array of IDs as a parameter, and use it in a SELECT clause.


So, here I propose a patch that extends pruning machinery. It is nothing 
innovative or complicated, but I'm not sure it is fully operational so 
far: it may need some discussion, review and polishing.


I intended to add it to the next commitfest if this feature makes sense.
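
For concreteness, a hedged libpq sketch of how such a parameterized array
typically arrives from an application (the table name is borrowed from the
patch's regression test; error handling is minimal):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=postgres");
	const char *params[1] = {"{1,2,3}"};	/* int4[] in text form */
	PGresult   *res;

	/* $1 arrives as a single Param node, not an ArrayExpr */
	res = PQexecParams(conn,
					   "SELECT * FROM array_prune WHERE id = ANY($1::int4[])",
					   1, NULL, params, NULL, NULL, 0);
	if (PQresultStatus(res) != PGRES_TUPLES_OK)
		fprintf(stderr, "%s", PQerrorMessage(conn));
	PQclear(res);
	PQfinish(conn);
	return 0;
}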

--
regards, Andrei Lepikhov
From f6daa91cd6630e85a0ba37bcba41ae06c4a1fa34 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" 
Date: Fri, 14 Mar 2025 12:51:42 +0100
Subject: [PATCH] Enhance partition pruning for an array parameter.

It is designed to prune partitions when the incoming clause looks like
the following: 'partkey = ANY($1)'.

This seems to be quite a common case when the array is a parameter of a function.

Although the code is covered by tests, it should be carefully reviewed and
tested.
---
 src/backend/partitioning/partprune.c  |  71 +-
 src/test/regress/expected/inherit.out | 180 ++
 src/test/regress/sql/inherit.sql  |  64 +
 3 files changed, 311 insertions(+), 4 deletions(-)

diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 48a35f763e..622f0d230a 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -2179,6 +2179,9 @@ match_clause_to_partition_key(GeneratePruningStepsContext *context,
 		List	   *elem_exprs,
    *elem_clauses;
 		ListCell   *lc1;
+		int			strategy;
+		Oid			lefttype,
+	righttype;
 
 		if (IsA(leftop, RelabelType))
 			leftop = ((RelabelType *) leftop)->arg;
@@ -2206,10 +2209,6 @@ match_clause_to_partition_key(GeneratePruningStepsContext *context,
 			negator = get_negator(saop_op);
 			if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
 			{
-int			strategy;
-Oid			lefttype,
-			righttype;
-
 get_op_opfamily_properties(negator, partopfamily,
 		   false, &strategy,
 		   &lefttype, &righttype);
@@ -2219,6 +2218,12 @@ match_clause_to_partition_key(GeneratePruningStepsContext *context,
 			else
 return PARTCLAUSE_NOMATCH;	/* no useful negator */
 		}
+		else
+		{
+			get_op_opfamily_properties(saop_op, partopfamily, false,
+	   &strategy, &lefttype,
+	   &righttype);
+		}
 
 		/*
 		 * Only allow strict operators.  This will guarantee nulls are
@@ -2365,6 +2370,64 @@ match_clause_to_partition_key(GeneratePruningStepsContext *context,
 			 */
 			elem_exprs = arrexpr->elements;
 		}
+		else if (IsA(rightop, Param))
+		{
+			Oidcmpfn;
+			PartClauseInfo *partclause;
+
+			if (righttype == part_scheme->partopcintype[partkeyidx])
+cmpfn = part_scheme->partsupfunc[partkeyidx].fn_oid;
+			else
+			{
+switch (part_scheme->strategy)
+{
+	/*
+	 * For range and list partitioning, we need the ordering
+	 * procedure with lefttype being the partition key's type,
+	 * and righttype the clause's operator's right type.
+	 */
+case PARTITION_STRATEGY_LIST:
+case PARTITION_STRATEGY_RANGE:
+	cmpfn =
+		get_opfamily_proc(part_scheme->partopfamily[partkeyidx],
+		  part_scheme->partopcintype[partkeyidx],
+		  righttype, BTORDER_PROC);
+	break;
+
+	/*
+	 * For hash partitioning, we need the hashing procedure
+	 * for the clause's type.
+	 */
+case PARTITION_STRATEGY_HASH:
+	cmpfn =
+		get_opfamily_proc(part_scheme->partopfamily[partkeyidx],
+		  righttype, righttype,
+		  HASHEXTENDED_PROC);
+	break;
+
+default:
+	elog(ERROR, "invalid partition strategy: %c",
+		 part_scheme->strategy);
+	cmpfn = InvalidOid; /* keep compiler quiet */
+	break;
+}
+
+if (!OidIsValid(cmpfn))
+	return PARTCLAUSE_NOMATCH;
+			}
+
+			partclause = (PartClauseInfo *) palloc(sizeof(PartClauseInfo));
+			partclause->keyno = partkeyidx;
+			partclause->opno = saop_op;
+			partclause->op_is_ne = false;
+			partclause->op_strategy = strategy;
+			partclause->expr = rightop;
+			partclause->cmpfn = cmpfn;
+
+			*pc = partclause;
+
+			return PARTCLAUSE_MATCH_CLAUSE;
+		}
 		else
 		{
 			/* Give up on any other clause types. */
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index e671975a28..d5f3137b93 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -3823,3 +3823,183 @@ select * from tuplesest_tab join
 
 drop table tuplesest_parted;
 drop table tuplesest_tab;
+--
+-- Test the cases for partition pruning by an expression like:
+-- partkey = ANY($1)
+--
+CREATE TABLE array_prune (id int)
+PARTITION BY HASH(id);
+CREATE TABLE array_prune_t0
+ PARTITION OF array_prune FOR VALUES WITH (modulus 2, remainder 0

Re: Update Unicode data to Unicode 16.0.0

2025-03-17 Thread Tom Lane
Jeff Davis  writes:
> That was discussed a few times, but:

> (a) That doesn't exactly solve the problem, because people still need
> indexes on LOWER() or CASEFOLD(); and

> (b) If we change IMMUTABLE to mean "returns the same results on every
> platform for all time", that would be too strict for many purposes,
> like the planner doing constant folding.

Yeah.  Not only would the set of functions meeting such a standard be
vanishingly small, but so would the set of use-cases.  What we need is
some sort of understanding that "this is okay to use in indexes",
"this is okay to constant-fold when planning", etc.  Maybe it's less
about "is it okay to just assume this" and more about "can we devise
a method for figuring out when we have to reindex, replan, etc".
We've got bits of that in our collation versioning infrastructure,
but that doesn't cover every source of infrequently-mutating behavior.

> I have been thinking about ways we can express the right dependencies,
> and I may be making some proposals along those lines.

I await a proposal with interest.

regards, tom lane




Re: Add -k/--link option to pg_combinebackup

2025-03-17 Thread Robert Haas
On Thu, Mar 6, 2025 at 6:18 AM Israel Barth Rubio  wrote:
> > I'm happy with this now, so barring objections or complaints from CI,
> > I plan to commit it soon.
>
> Great, thank you!

"Soon" ended up being somewhat later than planned, because I've been
out of town, but done now.

Now, to see if the buildfarm explodes...

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add -k/--link option to pg_combinebackup

2025-03-17 Thread Nathan Bossart
On Mon, Mar 17, 2025 at 02:10:14PM -0400, Robert Haas wrote:
> Now, to see if the buildfarm explodes...

I think koel might complain.  pgindent on my machine yields the following:

diff --git a/src/bin/pg_combinebackup/copy_file.c b/src/bin/pg_combinebackup/copy_file.c
index 97ecda5a66d..b0c94f6ee31 100644
--- a/src/bin/pg_combinebackup/copy_file.c
+++ b/src/bin/pg_combinebackup/copy_file.c
@@ -72,9 +72,10 @@ copy_file(const char *src, const char *dst,
 }

 #ifdef WIN32
+
 /*
- * We have no specific switch to enable CopyFile on Windows, because
- * it's supported (as far as we know) on all Windows machines. So,
+ * We have no specific switch to enable CopyFile on Windows, because it's
+ * supported (as far as we know) on all Windows machines. So,
  * automatically enable it unless some other strategy was selected.
  */
 if (copy_method == COPY_METHOD_COPY)

-- 
nathan




Re: TOAST versus toast

2025-03-17 Thread Jan Wieck

On 3/17/25 00:24, Tom Lane wrote:

Note the lack of any upper case.  Shortly later we reverse-engineered
an acronym for it [2], with the winner being Tom Lockhart's

 The Oversized-Attribute Storage Technique


Which made it into an acronym. Acronyms are typically capitalized to 
distinguish them from ordinary words.



Best Regards, Jan




Re: lwlocknames.h beautification attempt

2025-03-17 Thread Robert Haas
On Mon, Mar 17, 2025 at 2:43 AM Álvaro Herrera  wrote:
> Tom didn't say he didn't like this change.  He said he didn't like a
> different change, which is not the one I committed.

Sorry, I should have read the emails more carefully. I missed the fact
that there were two different proposals. It was the idea of
right-aligning things that I was unhappy about.

So, no objection to what you actually committed... except that I don't
think that using % specifiers in SOME places in a format string is
better than using them in ALL the places. It's not broken because
$lockidx can't contain a % sign, but in general I think when we switch
from print to printf it's better for us to have the format string be a
constant so that it's clear that we can't accidentally get an extra %
escape in there depending on the values of variables being
interpolated.
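
The C analogue of the concern, for concreteness (the generator script is
Perl, but the hazard is the classic printf one):

#include <stdio.h>

int
main(void)
{
	const char *lockname = "Oops%sLock";	/* hostile interpolated value */

	/* unsafe: the variable becomes the format string, so the stray %s
	 * would be interpreted; deliberately not executed here */
	/* printf(lockname); */

	/* safe: constant format string, value passed as an argument */
	printf("%s\n", lockname);
	return 0;
}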

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: lwlocknames.h beautification attempt

2025-03-17 Thread Álvaro Herrera
On 2025-Mar-17, Robert Haas wrote:

> On Mon, Mar 17, 2025 at 2:43 AM Álvaro Herrera  
> wrote:
> > Tom didn't say he didn't like this change.  He said he didn't like a
> > different change, which is not the one I committed.
> 
> Sorry, I should have read the emails more carefully. I missed the fact
> that there were two different proposals. It was the idea of
> right-aligning things that I was unhappy about.

Ah, okay.

> So, no objection to what you actually committed... except that I don't
> think that using % specifiers in SOME places in a format string is
> better than using them in ALL the places. It's not broken because
> $lockidx can't contain a % sign, but in general I think when we switch
> from print to printf it's better for us to have the format string be a
> constant so that it's clear that we can't accidentally get an extra %
> escape in there depending on the values of variables being
> interpolated.

I suppose this is a reasonable complaint; however, I don't see an actual
problem here.  Even if I hack the regexp in generate-lwlocknames.pl to
accept a %-sign in the lock name (and introduce a matching % in
wait_event_names.txt), then that % is emitted verbatim rather than
being expanded further.  Is this because this is Perl rather than
C?  I'm not sure.

Note that a % in the lock number (which also needs a regexp hack) can't
cause a problem either, because of the check that the lock numbers are
an ordered sequence.

I think it's quite difficult to cause actual problems here.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: Unify a recently-added inconsistent message

2025-03-17 Thread Tom Lane
Kyotaro Horiguchi  writes:
> I noticed that the recent commit cd3c45125d2 introduced the following
> three inconsistent messages at the same time:
> pg_dump.c:
> +   printf(_("  --no-policies                do not dump row security policies\n"));
> pg_dumpall.c:
> +   printf(_("  --no-policies                do not dump row security policies\n"));
> pg_restore.c:
> +   printf(_("  --no-policies                do not restore row level security policies\n"));

Ooops :-(.  I should have caught that during review.

> Please find the attached patch that updates the inconsistent message.

Pushed, thanks for noticing it.

regards, tom lane




Re: Some read stream improvements

2025-03-17 Thread Andres Freund
Hi,

On 2025-03-14 22:03:15 +1300, Thomas Munro wrote:
> I have pushed the new pin limit patches, after some more testing and
> copy editing. I dropped an unnecessary hunk (in read_stream_reset(), a
> change I'd like to make but it didn't belong in this commit) and
> dropped the word "Soft" from GetSoftPinLimit() as it wasn't helping
> comprehension and isn't even true for the local buffer version (which
> I still think might be an issue, will come back to that, but again it
> seemed independent).

Something odd:

I was looking to push a test that increases localbuf.c coverage, which passed
with a previous version of these changes.  However, it failed, and it did so
on FreeBSD alone.  The same test doesn't fail when tested together with the
rest of the AIO work.

https://cirrus-ci.com/build/5951090857869312
https://cirrus-ci.com/task/6177637229395968

I do not understand what could be OS dependent here. I tried to build with
exactly the same options as CI on freebsd, but couldn't repro the issue.

It's perhaps worth noting that this failed even before my recent localbuf:
commits.


I ran CI with a patch that adds a bunch of debug information and just runs the
relevant test:
https://cirrus-ci.com/task/6516935619248128

2025-03-17 16:19:14.270 UTC client backend[5526] pg_regress/temp LOG:  
statement: SELECT count(*), max(a) max_a, min(a) min_a, max(cnt) max_cnt FROM 
test_temp;
2025-03-17 16:19:14.271 UTC client backend[5526] pg_regress/temp DEBUG:  
NlocalPinnedBuffers=98++
2025-03-17 16:19:14.271 UTC client backend[5526] pg_regress/temp DEBUG:  
NlocalPinnedBuffers=99--
2025-03-17 16:19:14.271 UTC client backend[5526] pg_regress/temp DEBUG:  pgsr 
create 0xde34f1f57f8: test_temp, max_pinned: 100, NLocPin: 98
2025-03-17 16:19:14.271 UTC client backend[5526] pg_regress/temp DEBUG:  pgsr 
0xde34f1f57f8: buffer_limit: 2, pinned_buffers: 0, max_pinned: 100, ios_i_p: 0, 
distance: 1, pending_read_nblocks: 1, NLocPin: 98

comparing that with a local run:

2025-03-17 12:18:55.989 EDT client backend[4117093] pg_regress/temp LOG:  
statement: SELECT count(*), max(a) max_a, min(a) min_a, max(cnt) max_cnt FROM 
test_temp;
2025-03-17 12:18:55.989 EDT client backend[4117093] pg_regress/temp DEBUG:  
pgsr create 0x56029555cad8: test_temp, max_pinned: 100, NLocPin: 97
2025-03-17 12:18:55.989 EDT client backend[4117093] pg_regress/temp DEBUG:  
pgsr 0x56029555cad8: buffer_limit: 3, pinned_buffers: 0, max_pinned: 100, 
ios_i_p: 0, distance: 1, pending_read_nblocks: 1, NLocPin: 97
2025-03-17 12:18:55.989 EDT client backend[4117093] pg_regress/temp DEBUG:  
pgsr 0x56029555cad8: StartReadBuffers: wait: 0, pinned: 0 += 1, distance: 1
2025-03-17 12:18:55.989 EDT client backend[4117093] pg_regress/temp DEBUG:  
pgsr 0x56029555cad8: buffer_limit: 3, pinned_buffers: 0, max_pinned: 100, 
ios_i_p: 0, distance: 1, pending_read_nblocks: 1, NLocPin: 97
2025-03-17 12:18:55.989 EDT client backend[4117093] pg_regress/temp DEBUG:  
pgsr 0x56029555cad8: StartReadBuffers: wait: 0, pinned: 0 += 1, distance: 1

So one thing is that the pin count differs by 1 at the start of the scan. No
idea why.


But knowing that I was able to repro the issue locally too, by just ensuring
the pin count is one higher by the start of the query.


I still don't know what drives the difference between freebsd and the rest,
but IIUC the reason this fails is just that we are holding too many buffers
pinned, due to some buffers being pinned outside of read_stream.c.


Greetings,

Andres Freund




RE: AIX support

2025-03-17 Thread Srirama Kucherlapati
Hi Team,
Here are the updates on meson support for AIX. Our team has pushed the fixes
to the meson GitHub here:

https://github.com/mesonbuild/meson/pull/14335

#14335 Enhance AIX shared library build to use an export list.





Re: Prune partitions by ScalarArrayOpExpr with an array parameter (partkey = ANY($1))

2025-03-17 Thread Pavel Stehule
po 17. 3. 2025 v 14:28 odesílatel Andrei Lepikhov 
napsal:

> Hi,
>
> As I see, initial pruning doesn't work in the case when a
> ScalarArrayOpExpr contains a parameter as the RHS of the expression,
> like following:
>
> partkey = ANY($1)
>
> As colleagues say, it is quite typical to use stored procedures, pass an
> array of IDs as a parameter, and use it in a SELECT clause.
>
> So, here I propose a patch that extends pruning machinery. It is nothing
> innovative or complicated, but I'm not sure it is fully operational so
> far: it may need some discussion, review and polishing.
>
> I intended to add it to the next commitfest if this feature makes sense.
>

+1

Pavel


> --
> regards, Andrei Lepikhov
>


Re: Skip collecting decoded changes of already-aborted transactions

2025-03-17 Thread Masahiko Sawada
On Thu, Mar 13, 2025 at 10:04 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear hackers,
>
> I hope I'm in the correct thread. In abfb296, rbtxn_skip_prepared() was
> removed from SnapBuildDistributeNewCatalogSnapshot(). ISTM it was the only
> caller of the function.
>
> Is it intentional for external projects? Or can it be removed, as
> attached?

I think we can keep it, as all RBTXN_xxx flags have corresponding
macros, and the comments on these macros help in understanding what
each flag indicates.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Forbid to DROP temp tables of other sessions

2025-03-17 Thread Daniil Davydov
Hi,

On Mon, Mar 17, 2025 at 8:29 PM Steven Niu  wrote:
>
> If the (relation->relpersistence == RELPERSISTENCE_TEMP) can ensure the
> myTempNamespace is always valid, then my comment can be ignored.

No, even if we see a temporary table in RangeVarGetRelidExtended,
MyTempNamespace can still be `InvalidOid` (I mentioned this in the
previous email).
Thus, the comment "this probably can't happen?" should be removed.

> Otherwise I think the modified RangeVarGetRelidExtended() should keep
> check of myTempNamespace, like this:
>
> if (relation->relpersistence == RELPERSISTENCE_TEMP)
> {
>Oid namespaceId;
>
>   if (!OidIsValid(myTempNamespace))
>   relId = InvalidOid; /* this probably can't happen? */
>  ...

OK, let's assume that MyTempNamespace == InvalidOid and the caller
specified "pg_temp_N" in the query. In this case we want to throw an
error, because access to other sessions' temp tables is prohibited.
If we keep code like "if (!OidIsValid(myTempNamespace)) => relId =
InvalidOid", the caller will eventually get the error "relation ...
does not exist".
Yes, we have saved the caller from making a mistake, but we are giving
the wrong error message.

In my implementation the caller gets the correct error message, and I
still think we should keep the v3 patch logic.

--
Best regards,
Daniil Davydov




Re: Make COPY format extendable: Extract COPY TO format implementations

2025-03-17 Thread Masahiko Sawada
On Tue, Mar 4, 2025 at 4:06 PM Sutou Kouhei  wrote:
>
> Hi,
>
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Mon, 3 Mar 2025 11:06:39 -0800,
>   Masahiko Sawada  wrote:
>
> > I agree with the fix and the patch looks good to me. I've updated the
> > commit message and am going to push, barring any objections.
>
> Thanks!
>
> I've rebased the patch set. Here is a summary again:

Thank you for updating the patches. Here are some review comments on
the 0001 patch:

+   if (strcmp(format, "text") == 0)
+   {
+   /* "csv_mode == false && binary == false" means "text" */
+   return;
+   }
+   else if (strcmp(format, "csv") == 0)
+   {
+   opts_out->csv_mode = true;
+   return;
+   }
+   else if (strcmp(format, "binary") == 0)
+   {
+   opts_out->binary = true;
+   return;
+   }
+
+   /* custom format */
+   if (!is_from)
+   {
+   funcargtypes[0] = INTERNALOID;
+   handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
+   funcargtypes, true);
+   }
+   if (!OidIsValid(handlerOid))
+   ereport(ERROR,
+   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("COPY format \"%s\" not recognized", format),
+parser_errposition(pstate, defel->location)));

I think that built-in formats also need to have their handler
functions. This seems to be a conventional way for customizable
features such as tablesample and access methods, and we can simplify
this function.

---
I think we need to update the documentation to describe how users can
define the handler functions and what each callback function is
responsible for.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Re: proposal: schema variables

2025-03-17 Thread Marcos Pegoraro
Em seg., 17 de mar. de 2025 às 15:33, Pavel Stehule 
escreveu:

> I was asked for sending a reduced patchset
>

It would be good to explain what this reduced patchset is, along the lines
of: the complete patchset contains this and that; the reduced patchset
contains only this.

Regards
Marcos


Re: AIO v2.5

2025-03-17 Thread Melanie Plageman
On Fri, Mar 14, 2025 at 3:43 PM Andres Freund  wrote:
>
> Open items:
>
> - Right now effective_io_concurrency cannot be set > 0 on Windows and other
>   platforms that lack posix_fadvise. But with AIO we can read ahead without
>   posix_fadvise().
>
>   It'd not really make anything worse than today to not remove the limit, but
>   it'd be pretty weird to prevent windows etc from benefiting from AIO.  Need
>   to look around and see whether it would require anything other than doc
>   changes.

I've attached a patch that removes the limit for
effective_io_concurrency and maintenance_io_concurrency. I tested both
GUCs with fadvise manually disabled on my system and I think it is
working for those read stream users I tried (vacuum and BHS).

I checked around to make sure no one was using only the value of the
guc to guard prefetches, and it seems like we're safe.

The one thing I am wondering about with the docs is whether or not we
need to make it more clear that only a subset of the "simultaneous
I/O" behavior controlled by eic/mic is available if your system
doesn't have fadvise. I tried to do that a bit, but I avoided getting
into too many details.

- Melanie
From 398282293c38f9dfb4f03839a4a7962887c9c0c9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Mon, 17 Mar 2025 15:17:21 -0400
Subject: [PATCH v1] Enable IO concurrency on all systems

Previously effective_io_concurrency and maintenance_io_concurrency could
not be set above 0 on machines without fadvise support. AIO enables
IO concurrency without such support.

Currently only subsystems using the read stream API will take advantage
of this. Other users of maintenance_io_concurrency (like recovery
prefetching) which leverage OS advice directly will not benefit from
this change. In those cases, maintenance_io_concurrency will have no
effect on I/O behavior.
---
 doc/src/sgml/config.sgml  | 15 +-
 doc/src/sgml/ref/alter_tablespace.sgml|  2 +-
 doc/src/sgml/ref/create_tablespace.sgml   |  2 +-
 src/backend/access/common/reloptions.c|  8 -
 src/backend/commands/variable.c   | 30 ---
 src/backend/utils/misc/guc_tables.c   |  4 +--
 src/backend/utils/misc/postgresql.conf.sample |  5 ++--
 src/bin/initdb/initdb.c   |  5 
 src/include/storage/bufmgr.h  |  6 
 src/include/utils/guc_hooks.h |  4 ---
 10 files changed, 14 insertions(+), 67 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 69638d7c1f9..2a7587de320 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2581,8 +2581,7 @@ include_dir 'conf.d'
  session attempts to initiate in parallel.  The allowed range is
  1 to 1000, or
  0 to disable issuance of asynchronous I/O requests.
- The default is 16 on supported systems, otherwise
- 0.
+ The default is 16.
 
 
 
@@ -2593,8 +2592,8 @@ include_dir 'conf.d'
 
 
 
- On systems without prefetch advice support, attempting to configure
- any value other than 0 will error out.
+ On systems with prefetch advice support,
+ effective_io_concurrency also controls the prefetch distance.
 
 
 
@@ -2617,10 +2616,10 @@ include_dir 'conf.d'
  for maintenance work that is done on behalf of many client sessions.
 
 
- The default is 10 on supported systems, otherwise 0.  This value can
- be overridden for tables in a particular tablespace by setting the
- tablespace parameter of the same name (see
- ).
+ The default is 10.  This value can be overridden
+ for tables in a particular tablespace by setting the tablespace
+ parameter of the same name (see ).
 

   
diff --git a/doc/src/sgml/ref/alter_tablespace.sgml b/doc/src/sgml/ref/alter_tablespace.sgml
index 6ec863400d1..d0e08089ddb 100644
--- a/doc/src/sgml/ref/alter_tablespace.sgml
+++ b/doc/src/sgml/ref/alter_tablespace.sgml
@@ -88,7 +88,7 @@ ALTER TABLESPACE name RESET ( ,
   ,
diff --git a/doc/src/sgml/ref/create_tablespace.sgml b/doc/src/sgml/ref/create_tablespace.sgml
index 9d5ab025261..b77e774c53f 100644
--- a/doc/src/sgml/ref/create_tablespace.sgml
+++ b/doc/src/sgml/ref/create_tablespace.sgml
@@ -110,7 +110,7 @@ CREATE TABLESPACE tablespace_name
 and maintenance_io_concurrency.
 Setting these values for a particular tablespace will override the
 planner's usual estimate of the cost of reading pages from tables in
-that tablespace, and the executor's prefetching behavior, as established
+that tablespace, and how many concurrent I/Os are issued, as established
 by the configuration parameters of the
 same name (see ,
 ,
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 59

Re: Add -k/--link option to pg_combinebackup

2025-03-17 Thread Robert Haas
On Mon, Mar 17, 2025 at 2:17 PM Nathan Bossart  wrote:
> I think koel might complain.  pgindent on my machine yields the following:

Indeed, it did. I still think it's a mistake to have testing for this
in the buildfarm and not in 'meson test' or CI.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: vacuumdb changes for stats import/export

2025-03-17 Thread Nathan Bossart
On Fri, Mar 14, 2025 at 02:05:30PM -0500, Nathan Bossart wrote:
> If no feedback or objections materialize, I'm planning to commit these
> early next week.

While preparing this for commit, I noticed that the expression index part
of the query was disregarding attstattarget.  To fix, I've modified that
part to look at the index's pg_attribute entries.

-- 
nathan
>From b242f4376433c252291850f3b899f6cb0a6b24ad Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Mon, 17 Mar 2025 16:11:19 -0500
Subject: [PATCH v7 1/3] vacuumdb: Teach vacuum_one_database() to reuse query
 results.

Presently, each call to vacuum_one_database() performs a catalog
query to retrieve the list of tables to process.  A proposed
follow-up commit would add a "missing only" feature to
--analyze-in-stages, which requires us to save the results of the
catalog query (since tables without statistics would have them
after the first stage).  This commit adds this ability via a new
parameter for vacuum_one_database() that specifies either a
previously-retrieved list to process or a place to return the
results of the catalog query.  Note that this commit does not make
use of this new parameter for anything.  That is left for a
follow-up commit.

Author: Corey Huinker 
Co-authored-by: Nathan Bossart 
Reviewed-by: John Naylor 
Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan
---
 src/bin/scripts/vacuumdb.c | 330 ++---
 1 file changed, 194 insertions(+), 136 deletions(-)

diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 982bf070be6..e28f82a0eba 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -62,10 +62,16 @@ typedef enum
 
 static VacObjFilter objfilter = OBJFILTER_NONE;
 
+static SimpleStringList *retrieve_objects(PGconn *conn,
+										  vacuumingOptions *vacopts,
+										  SimpleStringList *objects,
+										  bool echo);
+
 static void vacuum_one_database(ConnParams *cparams,
 								vacuumingOptions *vacopts,
 								int stage,
 								SimpleStringList *objects,
+								SimpleStringList **found_objs,
 								int concurrentCons,
 								const char *progname, bool echo, bool quiet);
 
@@ -405,7 +411,7 @@ main(int argc, char *argv[])
 		{
 			vacuum_one_database(&cparams, &vacopts,
 								stage,
-								&objects,
+								&objects, NULL,
 								concurrentCons,
 								progname, echo, quiet);
 		}
@@ -413,7 +419,7 @@ main(int argc, char *argv[])
 	else
 		vacuum_one_database(&cparams, &vacopts,
 							ANALYZE_NO_STAGE,
-							&objects,
+							&objects, NULL,
 							concurrentCons,
 							progname, echo, quiet);
 }
@@ -461,8 +467,36 @@ escape_quotes(const char *src)
 /*
  * vacuum_one_database
  *
- * Process tables in the given database.  If the 'objects' list is empty,
- * process all tables in the database.
+ * Process tables in the given database.
+ *
+ * There are two ways to specify the list of objects to process:
+ *
+ * 1) The "found_objs" parameter is a double pointer to a fully qualified list
+ *of objects to process, as returned by a previous call to
+ *vacuum_one_database().
+ *
+ * a) If both "found_objs" (the double pointer) and "*found_objs" (the
+ *once-dereferenced double pointer) are not NULL, this list takes
+ *priority, and anything specified in "objects" is ignored.
+ *
+ * b) If "found_objs" (the double pointer) is not NULL but "*found_objs"
+ *(the once-dereferenced double pointer) _is_ NULL, the "objects"
+ *parameter takes priority, and the results of the catalog query
+ *described in (2) are stored in "found_objs".
+ *
+ * c) If "found_objs" (the double pointer) is NULL, the "objects"
+ *parameter again

Re: vacuumdb changes for stats import/export

2025-03-17 Thread Corey Huinker
>
>
> While preparing this for commit, I noticed that the expression index part
> of the query was disregarding attstattarget.  To fix, I've modified that
> part to look at the index's pg_attribute entries.
>

+1, should have been there all along.


Re: support fix query_id for temp table

2025-03-17 Thread Christoph Berg
Re: Michael Paquier
> On Thu, Feb 01, 2024 at 07:37:32AM +, ma lz wrote:
> > session 1:
> > create temp table ttt ( a int );
> > insert into ttt values(3); -- query_id is XXX  from 
> > pg_stat_activity
> > 
> > session 2:
> > create temp table ttt ( a int );
> > insert into ttt values(3);-- query_id is YYY  from 
> > pg_stat_activity
> > 
> > I know temp table has different oid, so query_id is different, is
> > there a way to use table name for temp table  instead of oid?
> 
> The CREATE TABLE statements have indeed the same query ID (in 16~),
> and the inserts have a different one as they touch different schemas
> and relations.  That's quite an old problem, that depends on the
> RangeVar attached to an InsertStmt.  I don't quite see a way to
> directly handle that except by using a custom implementation in query
> jumbling for this node and its RangeVar, so there is no "easy" way to
> tackle that :/

A customer reported that pg_stat_statements is not useful for them
because they are seeing 160k different query ids in 6-8 hours. They
also proposed to use the temp table name for query jumbling and wrote
a patch for it, which I would also see as the obvious solution to the
problem.

Here's that patch with regression tests added. I would think changing
this would be a big usability improvement for anyone using temp tables
a lot.

There does not seem to be a performance impact - all test were run
with pg_stat_statements active:

Standard pgbench -S (-s 10):
without patch: tps = 154155.407337 (without initial connection time)
   with patch: tps = 154223.966534 (without initial connection time)

pgbench -S on temp tables where each table has just one record:
without patch: tps = 184430.801954 (without initial connection time)
   with patch: tps = 185692.602764 (without initial connection time)

Christoph
>From c50dbb614f5e7696cb687aa156eb4149dcdb231d Mon Sep 17 00:00:00 2001
From: Christoph Berg 
Date: Mon, 17 Mar 2025 17:17:17 +0100
Subject: [PATCH v1] Jumble temp tables by name

Query jumbling considers everything by oid, which is fine for regular
objects. But for temp tables, which have to be recreated in each session
(or even transaction), this means that the same temp table query run
from the next session will never be aggregated in pg_stat_statements.
Instead, the statistics are polluted with a large number of 1-call
entries.

Fix by using the temp table name instead. This has the risk of
aggregating structurally different temp tables together if they share the
same name, but practically, the queries will likely differ in other
details anyway. And even if not, aggregating the entries in
pg_stat_statements instead of polluting the stats seems the better
choice. (The user still has the option to simply change the name of the
temp table to keep the queries separate. In the old scheme, the user
has no way to change the behavior at all.)
---
 .../pg_stat_statements/expected/select.out| 31 
 contrib/pg_stat_statements/sql/select.sql | 12 +++
 src/backend/nodes/queryjumblefuncs.c  | 35 +++
 src/include/nodes/parsenodes.h|  2 +-
 4 files changed, 79 insertions(+), 1 deletion(-)

diff --git a/contrib/pg_stat_statements/expected/select.out b/contrib/pg_stat_statements/expected/select.out
index 37a30af034a..8a7e237298c 100644
--- a/contrib/pg_stat_statements/expected/select.out
+++ b/contrib/pg_stat_statements/expected/select.out
@@ -241,6 +241,37 @@ SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C";
 (12 rows)
 
 DROP TABLE pgss_a, pgss_b CASCADE;
+-- temp tables
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t 
+---
+ t
+(1 row)
+
+BEGIN;
+  CREATE TEMP TABLE temp_t (id int) ON COMMIT DROP;
+  SELECT * FROM temp_t;
+ id 
+----
+(0 rows)
+
+COMMIT;
+BEGIN;
+  CREATE TEMP TABLE temp_t (id int) ON COMMIT DROP;
+  SELECT * FROM temp_t;
+ id 
+----
+(0 rows)
+
+COMMIT;
+SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C"; -- one query with two calls
+ calls |                                  query                                  
+-------+-------------------------------------------------------------------------
+     2 | SELECT * FROM temp_t
+     0 | SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C"
+     1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
+(3 rows)
+
 --
 -- access to pg_stat_statements_info view
 --
diff --git a/contrib/pg_stat_statements/sql/select.sql b/contrib/pg_stat_statements/sql/select.sql
index e0be58d5e24..81b9d50ecec 100644
--- a/contrib/pg_stat_statements/sql/select.sql
+++ b/contrib/pg_stat_statements/sql/select.sql
@@ -90,6 +90,18 @@ SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C";
 
 DROP TABLE pgss_a, pgss_b CASCADE;
 
+-- temp tables
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+BEGIN;
+  CREATE TEMP TABLE temp_t (id int) ON COMMIT DROP;
+  SELECT * FROM temp_t;

Re: pgsql: pg_upgrade: Preserve default char signedness value from old clus

2025-03-17 Thread Masahiko Sawada
On Mon, Mar 17, 2025 at 10:20 AM Robert Haas  wrote:
>
> On Fri, Feb 21, 2025 at 1:20 PM Masahiko Sawada  
> wrote:
> > pg_upgrade: Preserve default char signedness value from old cluster.
>
> Hi,
>
> I noticed that after running 'meson test --suite setup --suite
> pg_upgrade', the file delete_old_cluster.sh is left behind in the
> source directory, which should not happen. Everything created for the
> tests should be created in the meson directories. I traced the problem
> down to 005_char_signedness.pl. I believe the problem is likely that
> other pg_upgrade TAP tests include this locution, whereas
> 005_char_signedness.pl does not:
>
> # In a VPATH build, we'll be started in the source directory, but we want
> # to run pg_upgrade in the build directory so that any files generated finish
> # in it, like delete_old_cluster.{sh,bat}.
> chdir ${PostgreSQL::Test::Utils::tmp_check};

Thank you for the report.

I've confirmed the issue and attached a patch to fix it.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


0001-Fix-the-test-005_char_signedness.patch
Description: Binary data


Re: Separate GUC for replication origins

2025-03-17 Thread Euler Taveira
On Fri, Mar 14, 2025, at 5:48 AM, vignesh C wrote:
> 1) Should we add a test case to verify that if
> max_active_replication_origins is set to -1, it will use
> max_replication_slots value:

I don't think so. I added the following assert to test exactly this condition.

	if (max_active_replication_origins == -1)	/* failed to apply it? */
		SetConfigOption("max_active_replication_origins", buf,
						PGC_POSTMASTER, PGC_S_OVERRIDE);
}
Assert(max_active_replication_origins != -1);

> 2) Should we consider about the origin's created by user using the
> replication managment function at [1] or is it intentionally left out:
> - linkend="guc-max-replication-slots-subscriber">max_replication_slots
> + linkend="guc-max-active-replication-origins">max_active_replication_origins
>  must be set to at least the number of subscriptions that will be added to
>  the subscriber, plus some reserve for table synchronization.

I kept a minimal patch. The current sentence is vague because (a) it doesn't
refer to replication solutions that don't create subscription (as you said) and
(b) it doesn't specify the maximum number of replication origins that are
simultaneously used for table synchronization.

We can certainly expand the replication origin documentation but I don't think
it is material for this patch. I also don't think it is appropriate to expose
implementation details about table synchronization in a end user page. If we
can explain it without saying too much about the implementation details, fine.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Separate GUC for replication origins

2025-03-17 Thread Masahiko Sawada
On Thu, Mar 13, 2025 at 5:55 PM Euler Taveira  wrote:
>
> On Thu, Mar 13, 2025, at 11:10 AM, vignesh C wrote:
>
> Few comments:
>
>
> Thanks for taking a look.
>
> 1) After selecting max_active_replication_origins setting in the
> SELECT query having order by, the first record returned will be the
> one with max_active_replication_origins, rather than the second
> record, because max_active_replication_origins appears before
> max_logical_replication_workers in the order.
>
>
> Fixed.
>
> 2) I felt max_replication_slots must be max_active_replication_origins
> here in logical-replication.sgml:
>
>
> Fixed.

Thank you for updating the patch. I have one comment:

 #max_logical_replication_workers = 4   # taken from max_worker_processes
                                        # (change requires restart)
+#max_active_replication_origins = 10   # maximum number of active replication origins
+                                       # (change requires restart)
 #max_sync_workers_per_subscription = 2 # taken from max_logical_replication_workers
 #max_parallel_apply_workers_per_subscription = 2   # taken from max_logical_replication_workers

I would suggest putting the new max_active_replication_origins after
max_parallel_apply_workers_per_subscription as both
max_sync_workers_per_subscription and
max_parallel_apply_workers_per_subscription are related to
max_logical_replication_workers.

The rest looks good to me.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Add semi-join pushdown to postgres_fdw

2025-03-17 Thread Alexander Korotkov
Hi, Robins!

On Tue, Mar 18, 2025 at 2:20 AM Robins Tharakan  wrote:
> On Mon, 4 Dec 2023 at 07:22, Alexander Korotkov  wrote:
> >
> >
> > Now, I think this looks good.  I'm going to push this if no objections.
>
> After this commit, I began seeing an unexpected ERROR - see this bug-report.
> https://www.postgresql.org/message-id/18852-fb75b88160678f78%40postgresql.org

Thank you for pointing this out.
I'll check this in the next couple of days.

--
Regards,
Alexander Korotkov
Supabase




Optimize truncation logic to reduce AccessExclusive lock impact

2025-03-17 Thread Stepan Neretin
Hi hackers,

We've encountered an issue where autovacuum holds an AccessExclusive lock
for an extended period when attempting to truncate heap tables. The root
cause appears to be the way autovacuum handles block truncation,
specifically when scanning shared buffers during smgrtruncate. This issue
is particularly painful for large shared buffer configurations (e.g., 1.8
TiB with NUMA and multiple workers), and pg_statistic is among the most
affected tables.

When autovacuum detects freeable blocks at the tail of a relation, it
attempts to truncate them. It acquires an AccessExclusive lock and scans
all shared buffers to remove the corresponding blocks. With a large shared
buffer configuration (e.g., 1.8 TiB, 200 million blocks, and multiple NUMA
nodes), this process is slow and significantly impacts performance.

We propose modifying the truncation condition in should_attempt_truncation
to avoid unnecessary full buffer scans. The new formula ensures we only
attempt truncation when we can free at least 3% of the relation size + 2
pages. This change prevents excessive truncation attempts on small tables
while reducing the impact on larger relations. Previously, small tables
could theoretically be truncated to a single page, but with this change,
they may remain around 3 pages instead. We don't see this as a significant
issue.
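
To make the new threshold concrete, here is a small standalone sketch of
the proposed condition (the constants mirror the patch below; the table
sizes are only illustrative):

	#include <stdio.h>

	#define REL_TRUNCATE_MINIMUM  2
	#define REL_TRUNCATE_FRACTION 32

	/* Proposed rule: truncate only when at least rel_pages/32 + 2 pages
	 * (roughly 3% of the relation plus two pages) can be freed. */
	static int
	should_attempt_truncation(long rel_pages, long possibly_freeable)
	{
		return possibly_freeable >=
			rel_pages / REL_TRUNCATE_FRACTION + REL_TRUNCATE_MINIMUM;
	}

	int
	main(void)
	{
		/* a 10-page table needs at least 2 freeable pages ... */
		printf("%d\n", should_attempt_truncation(10, 2));          /* 1 */
		/* ... a 1,000,000-page table needs at least 31,252 of them */
		printf("%d\n", should_attempt_truncation(1000000, 31251)); /* 0 */
		printf("%d\n", should_attempt_truncation(1000000, 31252)); /* 1 */
		return 0;
	}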

The idea for this patch was proposed by Yuriy Sokolov.

There is also an open discussion about optimizing the shared buffer
traversal to minimize the impact of AccessExclusive locks. This patch is a
step toward improving the situation.

The patch is attached to this message.

Best regards, Stepan Neretin.
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 159b822740ab32989d8e5d4da71a61c3450ec0f3..d8211470fc34cac00b3da2d8c1bd786ba9ed1e08 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -67,8 +67,8 @@
  * REL_TRUNCATE_MINIMUM or (relsize / REL_TRUNCATE_FRACTION) (whichever
  * is less) potentially-freeable pages.
  */
-#define REL_TRUNCATE_MINIMUM	1000
-#define REL_TRUNCATE_FRACTION	16
+#define REL_TRUNCATE_MINIMUM	2
+#define REL_TRUNCATE_FRACTION	32
 
 /*
  * Timing parameters for truncate locking heuristics.
@@ -3089,9 +3089,7 @@ should_attempt_truncation(LVRelState *vacrel)
 		return false;
 
 	possibly_freeable = vacrel->rel_pages - vacrel->nonempty_pages;
-	if (possibly_freeable > 0 &&
-		(possibly_freeable >= REL_TRUNCATE_MINIMUM ||
-		 possibly_freeable >= vacrel->rel_pages / REL_TRUNCATE_FRACTION))
+	if (possibly_freeable >= vacrel->rel_pages / REL_TRUNCATE_FRACTION + REL_TRUNCATE_MINIMUM)
 		return true;
 
 	return false;


Re: pgsql: Avoid invalidating all RelationSyncCache entries on publication

2025-03-17 Thread Amit Kapila
On Mon, Mar 17, 2025 at 9:35 PM Robert Haas  wrote:
>
> On Thu, Mar 13, 2025 at 12:00 AM Amit Kapila  wrote:
> > Avoid invalidating all RelationSyncCache entries on publication rename.
> >
> > On Publication rename, we need to only invalidate the RelationSyncCache
> > entries corresponding to relations that are part of the publication being
> > renamed.
> >
> > As part of this patch, we introduce a new invalidation message to
> > invalidate the cache maintained by the logical decoding output plugin. We
> > can't use existing relcache invalidation for this purpose, as that would
> > unnecessarily cause relcache invalidations in other backends.
>
> This seems like too much infrastructure for a niche optimization. If
> there are plans to use this new invalidation message type to optimize
> a bunch of other cases, then maybe it's worth it, but adding this just
> to cover the presumably-rare case of ALTER PUBLICATION .. RENAME
> doesn't seem worth it to me.
>

This commit will be helpful for many other existing commands like:
CREATE PUBLICATION ...
ALTER PUBLICATION name SET (publication_parameter)
ALTER PUBLICATION name OWNER ...
DROP PUBLICATION ...

Before this commit, any publication command that modified the
pg_publication catalog used to invalidate the entire RelationSyncCache.
The code that led to entire RelationSyncCache invalidation was removed
in this commit:
publication_invalidation_cb(Datum arg, int cacheid, uint32 hashvalue)
 {
publications_valid = false;
-
-   /*
-* Also invalidate per-relation cache so that next time the filtering info
-* is checked it will be updated with the new publication settings.
-*/
-   rel_sync_cache_publication_cb(arg, cacheid, hashvalue);
 }

We can't remove this without having a solution for ALTER PUBLICATION
.. RENAME command to invalidate the specific RelationSyncCache
entries. As mentioned in the commit message, the other idea was to
register a relcache invalidation for all relations that are part of a
publication which we felt could be harmful to other backends that are
not even involved in decoding. Now, the ALTER PUBLICATION name OWNER
command doesn't need to use this new invalidation at this stage
because it doesn't impact the RelationSyncCache entries, but it
benefits from not having to invalidate the entire cache. Similarly,
the other commands mentioned above use relcache invalidation for this
purpose, but the difference is that those commands do need relcache
invalidation for correctness, whereas the ALTER PUBLICATION
.. RENAME command doesn't.

In the future, we can use this invalidation in existing publication
commands, such as altering a publication that only publishes 'INSERT'
(and/or 'TRUNCATE') while changing other publication properties like
'publish_via_partition_root' or 'publish_generated_columns'. Similarly,
I think we can even use the new invalidation for the SET/ADD/DROP
variants of ALTER PUBLICATION where the publication publishes only
'INSERT' and/or 'TRUNCATE'. Before this commit, we didn't have a better
way, so we used relcache invalidations for these cases as well. Also,
any new publication property or command should prefer this new
invalidation over a relcache invalidation, unless the latter is
required for correctness.

The impact of not doing any solution for this problem (aka removing
the above code in publication_invalidation_cb) could be magnified in
future commits where we are trying to solve a long-standing data loss
issue in logical decoding/replication via distributing invalidations
to all in-progress transactions [1]. You can read the commit message
to understand that problem and solution.

I think I should have mentioned the future use and other improvements
of this work in the commit message.

[1] - 
https://www.postgresql.org/message-id/OSCPR01MB14966DDB92FA7DA8FA8658216F5DE2%40OSCPR01MB14966.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.




Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.

2025-03-17 Thread David G. Johnston
On Monday, March 17, 2025, Shubham Khanna 
wrote:

>
> I have added validation for "all" to address both issues at once.
>
>
Usage needs to be changed to refer to object types and we should try and
specify which are valid there too.

The sgml docs need to mention “all” as a valid value and what it means
(namely, if new object types are added and “all” is specified those new
types will be removed as well).  Suggest not using it in scripts.

There are still more places where “object type” is needed instead of just
“object”.  I’ll pinpoint or fix them tomorrow if you don’t beat me to it.

It would be good if we could get this to play nicely with --dry-run; maybe
connecting to the source for the queries instead of the target.  That would
help alleviate my issue with the current auto-drop behavior.

David J.


Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.

2025-03-17 Thread Amit Kapila
On Tue, Mar 18, 2025 at 12:07 PM David G. Johnston
 wrote:
>
> On Monday, March 17, 2025, Shubham Khanna  wrote:
>>
>>
>> I have added validation for "all" to address both issues at once.
>>
>
> Usage needs to be changed to refer to object types and we should try and 
> specify which are valid there too.
>
> The sgml docs need to mention “all” as a valid value and what it means 
> (namely, if new object types are added and “all” is specified those new types 
> will be removed as well).  Suggest not using it in scripts.
>

Hmm, I think at this stage, we don't need to add the provision for
'all'. We can add it once more object types than 'publication' are
allowed to be removed.

With Regards,
Amit Kapila.




Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET

2025-03-17 Thread David Rowley
On Mon, 17 Mar 2025 at 13:53, Michael Paquier  wrote:
> So, here is attached a counter-proposal, where we simply add a
> counter tracking a node count in _jumbleNode() to add more entropy to
> the mix, incrementing it as well for NULL nodes.

I had a look at this and it seems the main difference will be that
this patch will protect against the case that a given node is non-null
but has a custom jumble function which selects to not jumble anything
in some cases. Since Ivan's patch only adds jumbling for non-null,
that could lead to the same bug if a custom jumble function decided to
not jumble anything at all.
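
To see why mixing in a per-node counter fixes this class of collision,
here is a toy, self-contained model of the idea (a plain FNV-1a hash
stands in for the real jumbling; all names are invented for
illustration, not taken from the patch):

	#include <stdio.h>
	#include <string.h>
	#include <stdint.h>

	typedef struct { uint64_t hash; uint32_t node_count; } ToyJumbleState;

	static void
	toy_append_jumble(ToyJumbleState *js, const void *item, size_t size)
	{
		const unsigned char *p = item;

		for (size_t i = 0; i < size; i++)
			js->hash = (js->hash ^ p[i]) * 1099511628211ULL;	/* FNV-1a */
	}

	/* The counter is mixed in even for NULL nodes, so a NULL appearing
	 * in a different position produces a different hash. */
	static void
	toy_jumble_node(ToyJumbleState *js, const char *node)
	{
		js->node_count++;
		toy_append_jumble(js, &js->node_count, sizeof(js->node_count));
		if (node != NULL)
			toy_append_jumble(js, node, strlen(node));
	}

	int
	main(void)
	{
		ToyJumbleState a = {14695981039346656037ULL, 0};
		ToyJumbleState b = {14695981039346656037ULL, 0};

		toy_jumble_node(&a, "LIMIT"); toy_jumble_node(&a, NULL);
		toy_jumble_node(&b, NULL);    toy_jumble_node(&b, "LIMIT");
		printf("%d\n", a.hash != b.hash);	/* prints 1: hashes differ */
		return 0;
	}

Without the counter, both sequences would feed exactly "LIMIT" into the
hash and collide, which is the shape of the original bug.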

FWIW, I did the same test to jumble all make check queries with your
patch, same as [1]:

master: 73.66 milliseconds
0001-Query-ID-Calculation-Fix-Variant-B.patch: 80.36 milliseconds
v4-0001-Add-more-entropy-to-query-jumbling.patch: 88.84 milliseconds

I know you've said in [2] that you're not worried about performance,
but I wanted to see if that's true... I can measure the cost of
compute_query_id=on with pgbench -S workload on my AMD 3990x machine.
I tried with that on "auto", then with "on" with master and again with
your v4 patch when set to "on":

(the -c 156 -j 156 is just there to ensure this machine properly
clocks up, which it still seems hesitant to do sometimes, even with
the performance power settings)

drowley@amd3990x:~$ echo master compute_query_id = auto && for i in
{1..10}; do pg_ctl stop -D pgdata > /dev/null && pg_ctl start -D
pgdata -l pg.log > /dev/null && pgbench -n -M simple -S -T 60 -c 156
-j 156 postgres | grep tps; done
master compute_query_id = auto
tps = 1202451.945998
tps = 1197645.856236
tps = 1182345.403291
tps = 1182440.562773
tps = 1180731.003390
tps = 1185277.478449
tps = 1174983.732094
tps = 1176310.828235
tps = 1179040.622103
tps = 1177068.520272
drowley@amd3990x:~$ echo master compute_query_id = on && for i in
{1..10}; do pg_ctl stop -D pgdata > /dev/null && pg_ctl start -D
pgdata -l pg.log > /dev/null && pgbench -n -M simple -S -T 60 -c 156
-j 156 postgres | grep tps; done
master compute_query_id = on
tps = 1158684.006821
tps = 1165808.752641
tps = 1158269.999683
tps = 1146730.269628
tps = 1154200.484194
tps = 1152750.152235
tps = 1150438.486321
tps = 1150649.669337
tps = 1144745.761593
tps = 1157743.530383
drowley@amd3990x:~$ echo
v4-0001-Add-more-entropy-to-query-jumbling.patch compute_query_id = on
&& for i in {1..10}; do pg_ctl stop -D pgdata > /dev/null && pg_ctl
start -D pgdata -l pg.log > /dev/null && pgbench -n -M simple -S -T 60
-c 156 -j 156 postgres | grep tps; done
v4-0001-Add-more-entropy-to-query-jumbling.patch compute_query_id = on
tps = 1156782.516710
tps = 1162780.019911
tps = 1142104.075069
tps = 1145651.299640
tps = 1143157.945891
tps = 1150275.033626
tps = 1145817.267485
tps = 1135915.694987
tps = 1138703.153873
tps = 1138436.802882

It's certainly hard to see much slowdown between master and v4, but it
is visible if I sort the results and put them in a line graph and
scale the vertical axis a bit. A 0.7% slowdown. See attached.

I didn't run the same test on the
0001-Query-ID-Calculation-Fix-Variant-B.patch patch, but with the
first set of results I posted above, you could assume it's less than
0.7% overhead. Locally, I did play around with internalising the
AppendJumble function so my compiler would compile a dedicated
JumbleNull function, which removes a bit of branching and code for the
memcpy. I managed to get it going a bit faster that way.

I do agree that your v4 fixes the issue at hand and I'm not going to
stand in your way if you want to proceed with it. However, from my
point of view, it seems that we could achieve the same goal with much
less overhead by just insisting that custom jumble functions always
jumble something. We could provide a jumble function to do that if the
custom function conditionally opts to jumble nothing. That would save
having to add this additional jumbling of the node count.

Also, am I right in thinking that there's no way for an extension to
add a custom jumble function? If so, then we have full control over
any custom jumble function. That would make it easier to ensure these
always jumble something.

David

[1] 
https://postgr.es/m/CAApHDvqaKySJdBf2v2_ybNuT=kobauvbu4_5kgzfftmsjr-...@mail.gmail.com
[2] https://postgr.es/m/z9flafnlh4-1y...@paquier.xyz


Re: Statistics Import and Export

2025-03-17 Thread Corey Huinker
On Mon, Mar 17, 2025 at 10:24 AM Nathan Bossart 
wrote:

> On Sun, Mar 16, 2025 at 05:32:15PM -0400, Corey Huinker wrote:
> >>
> >> * The custom format actually does two WriteToc() calls, and since these
> >>   patches move the queries to this part of pg_dump, it means we'll run
> all
> >>   the queries twice.  The comments around this code suggest that the
> second
> >>   pass isn't strictly necessary and that it is really only useful for
> >>   data/parallel restore, so we could probably skip it for no-data dumps.
> >>
> >
> > Is there any reason we couldn't have stats objects remove themselves from
> > the list after completion?
>
> I'm assuming that writing a completely different TOC on the second pass
> would corrupt the dump file.  Perhaps we could teach it to skip stats
> entries on the second pass or something, but I'm not too wild about adding
> to the list of invasive changes we're making last-minute for v18.


I'm confused, are they needed in both places? If so, would it make sense to
write out each stat entry to a file and then re-read the file on the second
pass, or maybe do a \i filename in the sql script?

Not suggesting we do any of this for v18, but when I hear about doing
something twice when that thing was painful the first time, I look for ways
to avoid doing it, or set pan_is_hot = true for the next person.


Re: dblink: Add SCRAM pass-through authentication

2025-03-17 Thread Jacob Champion
On Mon, Mar 17, 2025 at 11:54 AM Matheus Alcantara
 wrote:
> > If the check functions aren't going to check those because it's
> > unnecessary, then that's fine, but then the comments should be
> > adjusted.
> >
> Ok, now I get all of your points. I've misinterpreted your comments,
> sorry about that. I've changed on v7 to validate the scram keys using
> the same approach implemented for require_auth, so that now we correctly
> check for overwritten scram keys on connection options. I think that the
> code comments should be correct now?

Looks good.

> > I was referring to the discussion upthread [1]; you'd mentioned that
> > the only reason that get_connect_string() didn't call GetUserMapping()
> > itself was because we needed that mapping later on for
> > UseScramPassthrough(). But that's no longer true for this patch,
> > because the later call to UseScramPassthrough() has been removed. So I
> > think we can move GetUserMapping() back down, and remove that part of
> > the refactoring from patch 0001.
> >
> Ok, it totally makes sense. Fixed on v7.

The fix is in, but it needs to be part of 0001 rather than 0002;
otherwise 0001 doesn't compile.

--

A pgperltidy run is needed for some of the more recent test changes.
But I'm rapidly running out of feedback, so I think this is very
close.

Thanks!
--Jacob




Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

2025-03-17 Thread jian he
On Wed, Mar 12, 2025 at 4:26 PM Jim Jones  wrote:
>
> >> 2) Inconsistent terminology. Invalid values in "on_error set_to_null"
> >> mode are names as "erroneous", but as "invalid" in "on_error stop" mode.
> >> I don't want to get into the semantics of erroneous or invalid, but
> >> sticking to one terminology would IMHO look better.
> >>
> > I am open to changing it.
> > what do you think "invalid values in %llu row was replaced with null"?
>
> LGTM: "invalid values in %llu rows were replaced with null"
>
Changed based on this.

Also made some minor documentation tweaks.
From 3553eee56c8dd0c3ce334d1f37b511acbbc640af Mon Sep 17 00:00:00 2001
From: jian he 
Date: Tue, 18 Mar 2025 11:51:48 +0800
Subject: [PATCH v13 1/1] COPY (on_error set_to_null)

Extent "on_error action", introduce new option:  on_error set_to_null.
Current grammar makes us unable to use "on_error null", so we choose "on_error set_to_null".

Any data type conversion errors during the COPY FROM process will result in the
affected column being set to NULL. This only applies when using the non-binary
format for COPY FROM.

However, the not-null constraint will still be enforced.
If a column have not-null constraint, successful (on_error set_to_null)
action will cause not-null constraint violation.
This also apply to column type is domain with not-null constraint.

A regression test for a domain with a not-null constraint has been added.

Author: Jian He 
Author: Kirill Reshke 

Reviewed-by:
Fujii Masao 
Jim Jones 
"David G. Johnston" 
Yugo NAGATA 
torikoshia 

discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=bp3d1_asfe...@mail.gmail.com
---
 doc/src/sgml/ref/copy.sgml   | 34 +++-
 src/backend/commands/copy.c  |  6 ++-
 src/backend/commands/copyfrom.c  | 29 +-
 src/backend/commands/copyfromparse.c | 43 +++-
 src/bin/psql/tab-complete.in.c   |  2 +-
 src/include/commands/copy.h  |  1 +
 src/test/regress/expected/copy2.out  | 60 
 src/test/regress/sql/copy2.sql   | 46 +
 8 files changed, 196 insertions(+), 25 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index df093da97c5..cdb725d7565 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -394,23 +394,34 @@ COPY { table_name [ ( error_action value of
-  stop means fail the command, while
-  ignore means discard the input row and continue with the next one.
+  stop means fail the command,
+  ignore means discard the input row and continue with the next one, and
+  set_to_null means replace columns containing invalid input values with
+  NULL and move to the next field.
   The default is stop.
  
  
-  The ignore option is applicable only for COPY FROM
+  The ignore and set_to_null
+  options are applicable only for COPY FROM
   when the FORMAT is text or csv.
  
  
-  A NOTICE message containing the ignored row count is
-  emitted at the end of the COPY FROM if at least one
-  row was discarded. When LOG_VERBOSITY option is set to
-  verbose, a NOTICE message
+  For ignore option,
+  a NOTICE message containing the ignored row count is
+  emitted at the end of the COPY FROM if at least one row was discarded.
+  For set_to_null option,
+  a NOTICE message indicating the number of rows where invalid input values were replaced with
+  null is emitted at the end of the COPY FROM if at least one row was replaced.
+ 
+ 
+  When LOG_VERBOSITY option is set to verbose,
+  for ignore option, a NOTICE message
   containing the line of the input file and the column name whose input
-  conversion has failed is emitted for each discarded row.
-  When it is set to silent, no message is emitted
-  regarding ignored rows.
+  conversion has failed is emitted for each discarded row;
+  for set_to_null option,
+  a NOTICE message containing the line of the input file and the column name
+  where the value was replaced with NULL is emitted for each input conversion failure.
+  When it is set to silent, no message is emitted regarding input conversion failed rows.
  
 

@@ -458,7 +469,8 @@ COPY { table_name [ ( 
  
   This is currently used in COPY FROM command when
-  ON_ERROR option is set to ignore.
+  ON_ERROR option is set to ignore
+  or set_to_null.
   
 

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfca9d9dc29..afe60758d40 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
  parser_errposition(pstate, def->location)));
 
 	/*
-	 * Allow "stop", or "ignore" values.
+	 * Allow "stop", "ignore", "set_to_null" values.
 	 */
 	if (pg_strcasecmp(sval, "stop") == 0)
 		return COPY_ON_ERROR_STOP;
 	

Re: support fix query_id for temp table

2025-03-17 Thread Michael Paquier
On Mon, Mar 17, 2025 at 10:38:36PM +0100, Christoph Berg wrote:
> Here's that patch with regression tests added. I would think changing
> this would be a big usability improvement for anyone using temp tables
> a lot.

Not the first time I am seeing this argument this month.  It is the
second time.

+    /*
+     * If this is a temp table, jumble the name instead of the table oid.
+     */
+    if (expr->rtekind == RTE_RELATION &&
+        isAnyTempNamespace(get_rel_namespace(expr->relid)))
+    {
+        rel_name = get_rel_name(expr->relid);
+        AppendJumble(jstate, (const unsigned char *) rel_name, strlen(rel_name));
+    }
+    else
+        JUMBLE_FIELD(relid);

This is OK on its own, but it still feels a bit incomplete, as the relid
also includes an assumption about the namespace.  I would suggest adding
a hardcoded "pg_temp" here, at least to keep track of this assumption.
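
In code form, that suggestion might look roughly like the following
(a sketch of the proposed adjustment, not a tested patch):

    /*
     * Sketch: record the namespace assumption explicitly by jumbling a
     * fixed "pg_temp" label before the relation name.
     */
    if (expr->rtekind == RTE_RELATION &&
        isAnyTempNamespace(get_rel_namespace(expr->relid)))
    {
        const char *tempnsp = "pg_temp";

        AppendJumble(jstate, (const unsigned char *) tempnsp, strlen(tempnsp));
        rel_name = get_rel_name(expr->relid);
        AppendJumble(jstate, (const unsigned char *) rel_name, strlen(rel_name));
    }
    else
        JUMBLE_FIELD(relid);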

 typedef struct RangeTblEntry
 {
-pg_node_attr(custom_read_write)
+pg_node_attr(custom_read_write, custom_query_jumble)

This structure still includes some query_jumble_ignore markers, which are
no longer required once custom_query_jumble is added.

We had better document at the top of RangeTblEntry why we are using a
custom function.
--
Michael


signature.asc
Description: PGP signature


Re: Update Unicode data to Unicode 16.0.0

2025-03-17 Thread vignesh C
On Mon, 17 Mar 2025 at 23:03, Jeff Davis  wrote:
>
> On Sun, 2025-03-16 at 19:10 +0530, vignesh C wrote:
> > We currently have two Commitfest entries for the same thread at [1]
> > and [2]. Are both still necessary, or can we consolidate tracking
> > into
> > a single entry?
>
> I'm fine removing my CF entry, but unfortunately there's no "withdrawn
> -- duplicate", so it might send the wrong message.

Yes, we don't have "withdrawn duplicate", I have closed it as
withdrawn, anyway we can get it reviewed and committed using the other
entry.

Regards,
Vignesh




Re: BitmapHeapScan streaming read user and prelim refactoring

2025-03-17 Thread Melanie Plageman
On Mon, Mar 17, 2025 at 2:55 PM Andres Freund  wrote:
>
> On 2025-03-17 14:52:02 -0400, Melanie Plageman wrote:
> > I don't feel strongly that we need to be as rigorous for
> > maintenance_io_concurrency, but I'm also not sure 160 seems reasonable
> > (which would be the same ratio as before).
>
> I'd lean towards just setting them to the same value for now.

Cool, here's the patch I intend to commit. I had a bit of trouble
clearly explaining the situation in the commit message, which is why I
didn't commit it directly. If no one has ideas on how to clarify it
by tomorrow, I'll just push it.

- Melanie
From babd220052bcef744e38302e35c6eaee8364027d Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Mon, 17 Mar 2025 17:08:49 -0400
Subject: [PATCH v1] Increase default maintenance_io_concurrency to 16

Since its introduction in fc34b0d9de27a, the default
maintenance_io_concurrency has been larger than the default
effective_io_concurrency. This made sense at the time because
maintenance_io_concurrency controlled I/O prefetching done on behalf of
many client sessions (such as in recovery).

ff79b5b2ab increased effective_io_concurrency to 16, so
we'll increase maintenance_io_concurrency as well.

For now, though, we'll keep the defaults of effective_io_concurrency and
maintenance_io_concurrency equal to one another (16).

Since 9256822608f and c3e775e608f, maintenance_io_concurrency also
controls the I/O concurrency of each vacuum worker. Since many
autovacuum workers may be simultaneously issuing I/Os, an appropriate
default value for maintenance_io_concurrency is likely closer to that of
the default effective_io_concurrency rather than the previous ratio
(10x).

Suggested-by: Jakub Wartak 
Discussion: https://postgr.es/m/CAKZiRmxdHQaU%2B2Zpe6d%3Dx%3D0vigJ1sfWwwVYLJAf%3Dud_wQ_VcUw%40mail.gmail.com
---
 doc/src/sgml/config.sgml  | 8 
 src/backend/utils/misc/postgresql.conf.sample | 2 +-
 src/include/storage/bufmgr.h  | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 3d62c8bd274..a3aae67746b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2617,10 +2617,10 @@ include_dir 'conf.d'
  for maintenance work that is done on behalf of many client sessions.
 
 
- The default is 10 on supported systems, otherwise 0.  This value can
- be overridden for tables in a particular tablespace by setting the
- tablespace parameter of the same name (see
- ).
+ The default is 16 on supported systems, otherwise
+ 0.  This value can be overridden for tables in a
+ particular tablespace by setting the tablespace parameter of the same
+ name (see ).
 

   
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 8de86e0c945..613d5b3ee39 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -199,7 +199,7 @@
 
 #backend_flush_after = 0		# measured in pages, 0 disables
 #effective_io_concurrency = 16		# 1-1000; 0 disables prefetching
-#maintenance_io_concurrency = 10	# 1-1000; 0 disables prefetching
+#maintenance_io_concurrency = 16	# 1-1000; 0 disables prefetching
 #io_combine_limit = 128kB		# usually 1-32 blocks (depends on OS)
 
 # - Worker Processes -
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 79a89f87fcc..7f5def6bada 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -153,7 +153,7 @@ extern PGDLLIMPORT bool track_io_timing;
 /* only applicable when prefetching is available */
 #ifdef USE_PREFETCH
 #define DEFAULT_EFFECTIVE_IO_CONCURRENCY 16
-#define DEFAULT_MAINTENANCE_IO_CONCURRENCY 10
+#define DEFAULT_MAINTENANCE_IO_CONCURRENCY 16
 #else
 #define DEFAULT_EFFECTIVE_IO_CONCURRENCY 0
 #define DEFAULT_MAINTENANCE_IO_CONCURRENCY 0
-- 
2.34.1



Re: Bump soft open file limit (RLIMIT_NOFILE) to hard limit on startup

2025-03-17 Thread Jelte Fennema-Nio

On Mon Feb 24, 2025 at 12:01 PM CET, Jelte Fennema-Nio wrote:

Right after pressing send I realized I could remove two more useless
lines...


Rebased patchset attached (trivial conflict against pg_noreturn
changes).

From 249ebbac1b6c01b99795cfe9b0201ab7ee6bacfa Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio 
Date: Sun, 23 Feb 2025 16:52:29 +0100
Subject: [PATCH v7 1/3] Adds a helper for places that shell out to system()

We need to call system() in a few places, and to do so safely we need
some pre/post-fork logic. This encapsulates that logic into a single
System helper function. The main reason to do this is that in a
follow-up commit we want to add even more logic pre- and post-fork.
---
 src/backend/access/transam/xlogarchive.c | 28 +---
 src/backend/archive/shell_archive.c  |  6 +--
 src/backend/postmaster/startup.c | 54 +---
 src/include/postmaster/startup.h |  3 +-
 4 files changed, 43 insertions(+), 48 deletions(-)

diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 1ef1713c91a..59a0f191339 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -158,27 +158,7 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 			(errmsg_internal("executing restore command \"%s\"",
 			 xlogRestoreCmd)));
 
-	fflush(NULL);
-	pgstat_report_wait_start(WAIT_EVENT_RESTORE_COMMAND);
-
-	/*
-	 * PreRestoreCommand() informs the SIGTERM handler for the startup process
-	 * that it should proc_exit() right away.  This is done for the duration
-	 * of the system() call because there isn't a good way to break out while
-	 * it is executing.  Since we might call proc_exit() in a signal handler,
-	 * it is best to put any additional logic before or after the
-	 * PreRestoreCommand()/PostRestoreCommand() section.
-	 */
-	PreRestoreCommand();
-
-	/*
-	 * Copy xlog from archival storage to XLOGDIR
-	 */
-	rc = system(xlogRestoreCmd);
-
-	PostRestoreCommand();
-
-	pgstat_report_wait_end();
+	rc = System(xlogRestoreCmd, WAIT_EVENT_RESTORE_COMMAND);
 	pfree(xlogRestoreCmd);
 
 	if (rc == 0)
@@ -325,11 +305,7 @@ ExecuteRecoveryCommand(const char *command, const char *commandName,
 	/*
 	 * execute the constructed command
 	 */
-	fflush(NULL);
-	pgstat_report_wait_start(wait_event_info);
-	rc = system(xlogRecoveryCmd);
-	pgstat_report_wait_end();
-
+	rc = System(xlogRecoveryCmd, wait_event_info);
 	pfree(xlogRecoveryCmd);
 
 	if (rc != 0)
diff --git a/src/backend/archive/shell_archive.c b/src/backend/archive/shell_archive.c
index 828723afe47..7977e5a5092 100644
--- a/src/backend/archive/shell_archive.c
+++ b/src/backend/archive/shell_archive.c
@@ -22,6 +22,7 @@
 #include "archive/shell_archive.h"
 #include "common/percentrepl.h"
 #include "pgstat.h"
+#include "postmaster/startup.h"
 
 static bool shell_archive_configured(ArchiveModuleState *state);
 static bool shell_archive_file(ArchiveModuleState *state,
@@ -75,10 +76,7 @@ shell_archive_file(ArchiveModuleState *state, const char *file,
 			(errmsg_internal("executing archive command \"%s\"",
 			 xlogarchcmd)));
 
-	fflush(NULL);
-	pgstat_report_wait_start(WAIT_EVENT_ARCHIVE_COMMAND);
-	rc = system(xlogarchcmd);
-	pgstat_report_wait_end();
+	rc = System(xlogarchcmd, WAIT_EVENT_ARCHIVE_COMMAND);
 
 	if (rc != 0)
 	{
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 27e86cf393f..2e3d0ea8cf0 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -33,6 +33,7 @@
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/timeout.h"
+#include "utils/wait_event_types.h"
 
 
 #ifndef USE_POSTMASTER_DEATH_SIGNAL
@@ -264,24 +265,45 @@ StartupProcessMain(const void *startup_data, size_t startup_data_len)
 	proc_exit(0);
 }
 
-void
-PreRestoreCommand(void)
+/*
+ * A custom wrapper around the system() function that calls the necessary
+ * functions pre/post-fork.
+ */
+int
+System(const char *command, uint32 wait_event_info)
 {
-	/*
-	 * Set in_restore_command to tell the signal handler that we should exit
-	 * right away on SIGTERM. We know that we're at a safe point to do that.
-	 * Check if we had already received the signal, so that we don't miss a
-	 * shutdown request received just before this.
-	 */
-	in_restore_command = true;
-	if (shutdown_requested)
-		proc_exit(1);
-}
+	int			rc;
 
-void
-PostRestoreCommand(void)
-{
-	in_restore_command = false;
+	fflush(NULL);
+	pgstat_report_wait_start(wait_event_info);
+
+	if (wait_event_info == WAIT_EVENT_RESTORE_COMMAND)
+	{
+		/*
+		 * Set in_restore_command to tell the signal handler that we should
+		 * exit right away on SIGTERM. This is done for the duration of the
+		 * system() call because there isn't a good way to break out while it
+		 * is executing.  Since we might call proc_exit() in a signal handler,
+		 * it is best to put any additional logic outside of this section
+		 * where in_restor

Re: md.c vs elog.c vs smgrreleaseall() in barrier

2025-03-17 Thread Andres Freund
Hi,

On 2025-03-14 12:57:51 +1300, Thomas Munro wrote:
> On Fri, Mar 14, 2025 at 12:23 PM Andres Freund  wrote:
> 
> > > 3.  Considering errfinish()'s stated goal, a sort of backstop to help
> > > you cancel looping message-spewing code only, I wonder if we should
> > > just restrict the kinds of interrupts it processes, so that it only
> > > calls CFI() if we're definitely throwing ERROR or FATAL?
> >
> > I'm not even sure it'd be safe to ERROR out in all the relevant places...
> >
> > E.g.
> > /* implementation-specific initialization */
> > smgrsw[reln->smgr_which].smgr_open(reln);
> >
> > /* it is not pinned yet */
> > reln->pincount = 0;
> > dlist_push_tail(&unpinned_relns, &reln->node);
> >
> > doesn't this mean that ->pincount is uninitialized in case smgr_open()
> > errors out and that we'd lose track of the reln because it wasn't yet
> > added to unpinned_relns?
> 
> Ugh, right.

Patch for that attached.


> > > > If we go for that, I can see an argument for doing that in smgr.c 
> > > > instead of
> > > > md.c, afaict any plausible smgr implementation would run into this, as 
> > > > the
> > > > smgrcloseall() is triggered from the smgr level.
> > >
> > > Seems like maybe not a great idea to assume that future smgrs will be
> > > OK with being non-interruptible, if it can be avoided?
> >
> > I think you'd need a fairly large surgery of smgr.c to make that viable - I
> > rather doubt that smgr.c itself is actually safe against interrupts.
> >
> > I can see some arguable uses of smgr handling interrupts, say an smgr
> > implementation based on a network backed store, but you'd need rather 
> > massive
> > changes to actually be sure that smgr.c is called while accepting 
> > interrupts -
> > e.g. today sgmrwritev() will largely be called with interrupts held. Plenty
> > reads too.  I doubt that undoing a few HOLD_INTERRUPTS() is a meaningful 
> > part
> > of the necessary work.
> 
> Right, exactly the case I was thinking of: if some hypothetical future
> network smgr wants to be able to process *some* kinds of things
> carefully while waiting for the network.  I don't think we want to
> constrain ourselves to NFS-style "pretend it's local and make it
> non-interruptible" without any escape hatches, but you're quite right
> that that's probably a whole research project of its own and we just
> haven't refined the interrupt system enough for that yet, so yeah I
> see how you arrived at that conclusion and it makes sense.

Here's a proposed patch for this. It turns out that the bug might already be
reachable, even without defining FDDEBUG. There's a debug ereport() in
register_dirty_segment() - but it's hard to reach in practice.

I don't really know whether that means we ought to backpatch or not - which
makes me want to be conservative and not backpatch.

Greetings,

Andres Freund
>From 79c33df9e53038b7ebc9e1e6d5e575f6ef46e4f2 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 17 Mar 2025 19:41:54 -0400
Subject: [PATCH v2 1/2] smgr: Hold interrupts in most smgr functions

We need to hold interrupts across most of the smgr.c/md.c functions, as
otherwise interrupt processing, e.g. due to a debug elog/ereport, can trigger
procsignal processing, which in turn can trigger smgrreleaseall(). As the
relevant code is not reentrant, we quickly end up in a bad situation.

The only reason we haven't noticed this before is that there is only one
non-error ereport called in affected routines, in register_dirty_segments(),
and that one is extremely rarely reached. If one enables fd.c's FDDEBUG it's
easy to reproduce crashes.

It seems better to put the HOLD_INTERRUPTS()/RESUME_INTERRUPTS() in smgr.c,
instead of trying to push them down to md.c where possible: For one, every
smgr implementation would be vulnerable, for another, a good bit of smgr.c
code itself is affected too.

Eventually we might want a more targeted solution, allowing e.g. a networked
smgr implementation to be interrupted, but many other, more complicated,
problems would need to be fixed for that to be viable (e.g. smgr.c is often
called with interrupts already held).

Discussion: https://postgr.es/m/3vae7l5ozvqtxmd7rr7zaeq3qkuipz365u3rtim5t5wdkr6f4g@vkgf2fogjirl
---
 src/backend/storage/smgr/smgr.c | 97 -
 1 file changed, 94 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index ebe35c04de5..53a09fe4aaa 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -40,6 +40,18 @@
  * themselves, as there could pointers to them in active use.  See
  * smgrrelease() and smgrreleaseall().
  *
+ * NB: We need to hold interrupts across most of the functions in this file,
+ * as otherwise interrupt processing, e.g. due to a debug elog/ereport, can
+ * trigger procsignal processing, which in turn can trigger
+ * smgrreleaseall(). None of the releva

Re: optimize file transfer in pg_upgrade

2025-03-17 Thread Nathan Bossart
On Mon, Mar 17, 2025 at 04:04:45PM -0400, Robert Haas wrote:
> On Mon, Mar 17, 2025 at 3:34 PM Nathan Bossart  
> wrote:
>> * Once committed, I should update one of my buildfarm animals to use
>>   PG_TEST_PG_UPGRADE_MODE=--swap.
> 
> It would be better if we didn't need a separate buildfarm animal to
> test this, because that means you won't get notified by local testing
> OR by CI if you break this. Can we instead have one test that checks
> this which is part of the normal test run?

That's what I set out to do before I discovered PG_TEST_PG_UPGRADE_MODE.
The commit message for b059a24 seemed to indicate that we don't want to
automatically test all supported modes, but I agree that it would be nice
to have some basic coverage for --swap on CI/buildfarm regardless of
PG_TEST_PG_UPGRADE_MODE.  How about we add a simple TAP test (attached),
and I still plan on switching a buildfarm animal to --swap for more
in-depth testing?

-- 
nathan
>From 0a228f150d101ef2ffe38c88fe290e313142a2d9 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 19 Feb 2025 09:14:51 -0600
Subject: [PATCH v6 1/3] initdb: Add --no-sync-data-files.

This new option instructs initdb to skip synchronizing any files
in database directories and the database directories themselves,
i.e., everything in the base/ subdirectory and any other
tablespace directories.  Other files, such as those in pg_wal/ and
pg_xact/, will still be synchronized unless --no-sync is also
specified.  --no-sync-data-files is primarily intended for internal
use by tools that separately ensure the skipped files are
synchronized to disk.  A follow-up commit will use this to help
optimize pg_upgrade's file transfer step.

Discussion: https://postgr.es/m/Zyvop-LxLXBLrZil%40nathan
---
 doc/src/sgml/ref/initdb.sgml| 27 +++
 src/bin/initdb/initdb.c | 10 ++-
 src/bin/initdb/t/001_initdb.pl  |  1 +
 src/bin/pg_basebackup/pg_basebackup.c   |  2 +-
 src/bin/pg_checksums/pg_checksums.c |  2 +-
 src/bin/pg_combinebackup/pg_combinebackup.c |  2 +-
 src/bin/pg_rewind/file_ops.c|  2 +-
 src/common/file_utils.c | 85 +
 src/include/common/file_utils.h |  2 +-
 9 files changed, 96 insertions(+), 37 deletions(-)

diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 0026318485a..2f1f9a42f90 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -527,6 +527,33 @@ PostgreSQL documentation
   
  
 
+ 
+  --no-sync-data-files
+  
+   
+By default, initdb safely writes all database files
+to disk.  This option instructs initdb to skip
+synchronizing all files in the individual database directories, the
+database directories themselves, and the tablespace directories, i.e.,
+everything in the base subdirectory and any other
+tablespace directories.  Other files, such as those in
+pg_wal and pg_xact, will still be
+synchronized unless the --no-sync option is also
+specified.
+   
+   
+Note that if --no-sync-data-files is used in
+conjunction with --sync-method=syncfs, some or all of
+the aforementioned files and directories will be synchronized because
+syncfs processes entire file systems.
+   
+   
+This option is primarily intended for internal use by tools that
+separately ensure the skipped files are synchronized to disk.
+   
+  
+ 
+
  
   --no-instructions
   
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 21a0fe3ecd9..22b7d31b165 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -168,6 +168,7 @@ static bool data_checksums = true;
 static char *xlog_dir = NULL;
 static int wal_segment_size_mb = (DEFAULT_XLOG_SEG_SIZE) / (1024 * 1024);
 static DataDirSyncMethod sync_method = DATA_DIR_SYNC_METHOD_FSYNC;
+static bool sync_data_files = true;
 
 
 /* internal vars */
@@ -2566,6 +2567,7 @@ usage(const char *progname)
printf(_("  -L DIRECTORY  where to find the input 
files\n"));
printf(_("  -n, --no-cleando not clean up after errors\n"));
printf(_("  -N, --no-sync do not wait for changes to be 
written safely to disk\n"));
+   printf(_("  --no-sync-data-files  do not sync files within database 
directories\n"));
printf(_("  --no-instructions do not print instructions for 
next steps\n"));
printf(_("  -s, --showshow internal settings, then 
exit\n"));
printf(_("  --sync-method=METHOD  set method for syncing files to 
disk\n"));
@@ -3208,6 +3210,7 @@ main(int argc, char *argv[])
{"icu-rules", required_argument, NULL, 18},
{"sync-method", required_argument, NULL, 19},
{"no-data-checksums", no_argument, NULL, 20},
+   

Re: pgsql: pg_upgrade: Preserve default char signedness value from old clus

2025-03-17 Thread Masahiko Sawada
On Mon, Mar 17, 2025 at 8:02 PM Robert Haas  wrote:
>
> On Mon, Mar 17, 2025 at 6:02 PM Masahiko Sawada  wrote:
> > I've confirmed the issue and attached a patch to fix it.
>
> Cool. The commit message refers to 003_char_signedness, but the test
> name is 005, not 003.

Thank you for reviewing the patch. I've pushed the patch after fixing it.


Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: making EXPLAIN extensible

2025-03-17 Thread Sami Imseih
> You only
> need the second hook if you want to check the values of options
> against the values of other options.
+1

I did not think of adding a new hook, because there must be a really good
reason to add a new hook. I think it's justified for this case. It's better than
my approach since the extension author can just put all their checks in one
place rather than having to register a bunch of handlers.

> some loadable module adds two new options LEFT and RIGHT and wants to
> check that you don't specify LEFT and RIGHT together? Either they
> register the same validate handler for both, or they register the real
> validate handler for one and a no-op handler for the other. Neither of
> those options seems very appealing.

When I thought about this, I figured that one of the options will register
a validate handler and the other option will set the handler to NULL. But,
I do see why this is not appealing.

--
Sami




Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.

2025-03-17 Thread David G. Johnston
On Mon, Mar 17, 2025 at 4:01 PM Euler Taveira  wrote:

> On Mon, Mar 17, 2025, at 9:34 AM, Shubham Khanna wrote:
>
> I have incorporated the "--remove/-r" parameter in the attached patch,
> as it seems more intuitive and straightforward for users.
> The attached patch contains the latest changes.
>
>
> There were a lot of discussion around the single vs multiple options since
> my
> last review [1] so I'm answering some of the questions here.
>
>

> Regarding the name, my preference is
> --drop since we already have other binaries with similar options
> (pg_receivewal
> and pg_recvlogical have --drop-slot).
>

A short form seems desired here and we cannot use -d/-D.  Also, the "act
and quit" nature of the command-like options in those two client
applications leads me to believe that this server application modifier-like
option, which behaves differently than a simple "drop named object and
return", should not have the same naming as those others.

We are not dropping named objects - the wording "Remove all given objects"
is incorrect.

"Remove all objects of the specified type from specified databases on the
target server."

"Multiple object types can be specified by writing multiple --remove
switches." (accepting switches instead of options pending bulk change)

More changes of this sort are needed.


>
> -   drop_publication(conn, &dbinfo[i]);
> +   if (dbinfos.remove_objects & OBJECT_PUBLICATIONS)
> +   drop_all_publications(conn, dbinfo);
> +   else
> +   drop_publication(conn, dbinfo, dbinfo->pubname);
>
> At first glance, I didn't like this change. You inform dbinfo->pubname as
> a 3rd
> parameter but the 2nd parameter is dbinfo. After reading
> drop_all_publication(), I realized that's the cause for this change. Is
> there a
> better way to do it?
>

I had the same impression.  I'm inclined to accept it as-is and let whoever
writes the next --remove object type implementation deal with cleaning it
up.  This is clear enough when talking only about publications since
whether you just remove the one or all of them the special one we created
goes away.


+   pg_log_error("wrong remove object is specified");
> +   pg_log_error_hint("Only \"publications\" can be removed.");
> +   exit(1);
> The main message sounds strange. Did you mean 'wrong object is specified'?
>

Error: invalid object type "%s" specified for --remove
Hint: The valid options are: "publications"

(yes, plural for a single option is "wrong", but extensible...)

If we want to just give the DBA the choice to say "all" now we could solve
two annoyances at once.

The valid options are: "all", "publications"
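
In code form, the suggested wording might look like this (a sketch only;
the surrounding --remove validation code is assumed):

    /* Sketch of the suggested messages inside the --remove validation: */
    pg_log_error("invalid object type \"%s\" specified for --remove", optarg);
    pg_log_error_hint("The valid object types are: \"all\", \"publications\".");
    exit(1);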

Should we be accepting these case-insensitively?

David J.


Re: Re: proposal: schema variables

2025-03-17 Thread Pavel Stehule
Hi

po 17. 3. 2025 v 21:53 odesílatel Marcos Pegoraro 
napsal:

> Em seg., 17 de mar. de 2025 às 15:33, Pavel Stehule <
> pavel.steh...@gmail.com> escreveu:
>
>> I was asked for sending a reduced patchset
>>
>
> Would be good to explain what this reduced patchset is.
> Complete patch contains this and that
> Reduced patch contains only this.
>

Reduced patch contains:

* possibility to create and drop session variables (support for catalog
pg_variable and related operations)
* possibility to set the session variable (command LET) and possibility to
use session variable in the query
* access right support
* support for DISCARD command
* memory cleaning at transaction end when the variable is dropped
* warning when variable is shadowed by column
* introduction of variable's fences - syntax VARIABLE(varname)

Complete patch contains plus

* LET can be described by EXPLAIN
* LET can be prepared statement
* temporary session variables
* RESET at transaction end
* implementation of DEFAULT value for the variable
* implementation IMMUTABLE and NOT NULL clauses for the variable
* variable can be used as an argument of CALL statement (and doesn't block
simple evaluation in plpgsql)
* a used variable doesn't block parallel execution
* LET from plpgsql can use simple expression evaluation (performance
optimization for PL/pgSQL)
* variables don't block inlining of SQL functions
* fix message "column doesn't exist" to "column or variable doesn't exist"
* support for transactional variables (content can be transactional)
* possibility to specify the name of restored variable for pg_restore

Regards

Pavel





>
> Regards
> Marcos
>


Re: Update Unicode data to Unicode 16.0.0

2025-03-17 Thread Jeff Davis
On Sat, 2025-03-15 at 12:15 -0400, Tom Lane wrote:
> In fact, on the analogy of timezones, I think we should not only
> adopt newly-published Unicode versions pretty quickly but push
> them into released branches as well.

That approach suggests that we consider something like my previous
STRICT_UNICODE proposal[1]. If Postgres updates Unicode quickly enough,
there's not much reason that users would need to use unassigned code
points, so it would be practical to just reject them (as an option).
That would dramatically reduce the practical problems people would
encounter when we do update Unicode.

Note that assigned code points can still change behavior in later
versions, but not in ways that would typically cause a problem for
things like indexes. For instance, U+0363 changed from non-Alphabetic
to Alphabetic in Unicode 16, which changes the results of the
expression:

  U&'\0363' ~ '[[:alpha:]]' COLLATE PG_C_UTF8

from false to true, even though U+0363 is assigned in both Unicode
15.1.0 and 16.0.0. That might plausibly matter, but such cases would be
more obscure than case folding.

Regards,
Jeff Davis

[1] https://commitfest.postgresql.org/patch/4876/





Re: ecdh support causes unnecessary roundtrips

2025-03-17 Thread Jacob Champion
On Thu, Mar 13, 2025 at 2:41 PM Daniel Gustafsson  wrote:
> OpenSSL 3.4 also doesn't like it and AFAICT neither does the upcoming 3.5, 
> just
> haven't had the cycles yet to ship out a new patch with all the time-consuming
> testing it requires =)

Here is a squash fix to change the capitalization -- this passes
locally for LibreSSL 3.4 to 3.8 and OpenSSL 1.1.1 to 3.5.

(Hmm, my LibreSSLs are a little out of date; it looks like I need to
add 3.9 and 4.0 to the suite.)

Thanks,
--Jacob


v2-0001-Add-x25519-to-the-default-set-of-curves.patch
Description: Binary data


v2-0002-squash-Add-x25519-to-the-default-set-of-curves.patch
Description: Binary data


Re: Add Pipelining support in psql

2025-03-17 Thread Michael Paquier
On Mon, Mar 17, 2025 at 10:50:50AM +0100, Anthonin Bonnefoy wrote:
> 0001: This introduces the \sendpipeline meta-command and forbid \g in
> a pipeline. This is to fix the formatting options of \g that are not
> supported in a pipeline.

- count 

- 4
-(1 row)

This removal done in the regression tests was not intentional.

I have done some reordering of the code around the new meta-command so
that things are ordered alphabetically, and applied the result.

> 0002: Allows ';' to send a query using extended protocol when within a
> pipeline by using PQsendQueryParams with 0 parameters. It is not
> possible to send parameters with extended protocol this way and
> everything will be propagated through the query string, similar to a
> simple query.

I like the simplicity of what you are doing here, relying on
PSQL_SEND_QUERY being the default so that we use PQsendQueryParams()
with no parameters rather than PQsendQuery() when the pipeline mode is
not off.
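
For context, a hedged sketch of what 0002 enables in a psql script (the
pipeline meta-commands are the ones from this patch series; exact output
handling is glossed over):

\startpipeline
SELECT 1;      -- with 0002: sent via PQsendQueryParams with zero parameters
\syncpipeline
\endpipeline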

How about adding a check on PIPELINE_COMMAND_COUNT when sending a
query through this path?  Should we check for more scenarios with
syncs and flushes as well when sending these queries?
--
Michael


signature.asc
Description: PGP signature


Re: pgsql: pg_upgrade: Preserve default char signedness value from old clus

2025-03-17 Thread Robert Haas
On Mon, Mar 17, 2025 at 6:02 PM Masahiko Sawada  wrote:
> I've confirmed the issue and attached a patch to fix it.

Cool. The commit message refers to 003_char_signedness, but the test
name is 005, not 003.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Allow io_combine_limit up to 1MB

2025-03-17 Thread Thomas Munro
Here's a new version that also adjusts the code that just landed in da722699:

-   /*
-    * Each IO handle can have an PG_IOV_MAX long iovec.
-    *
-    * XXX: Right now the amount of space available for each IO is PG_IOV_MAX.
-    * While it's tempting to use the io_combine_limit GUC, that's
-    * PGC_USERSET, so we can't allocate shared memory based on that.
-    */
+   /* each IO handle can have up to io_max_combine_limit iovec objects */
    return mul_size(sizeof(struct iovec),
-                   mul_size(mul_size(PG_IOV_MAX, AioProcs()),
+                   mul_size(mul_size(io_max_combine_limit, AioProcs()),
                             io_max_concurrency));

While doing that, this juxtaposition jumped out at me:

+   uint32  per_backend_iovecs = io_max_concurrency * max_io_combine_limit;

Surely one of these names has to give!  First commit wins, so the new
name in this version is "io_max_combine_limit".

I was contemplating making the same change to the MAX_IO_COMBINE_LIMIT
macro when it occurred to me that we could just delete it.  It has few
references and they could just use PG_IOV_MAX instead; it's perhaps a
little less clear as names go, but it's also pretty confusing that
there's a macro with the same name as a GUC...
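
For reference, a hedged sketch of the capping behavior (the Min() rule
is from the patch below; the concrete values are made up):

-- postgresql.conf: io_max_combine_limit = 128kB   (fixed at server start)
SET io_combine_limit = '256kB';  -- accepted without error
-- internally: io_combine_limit = Min(io_max_combine_limit, io_combine_limit_guc),
-- so the effective combine limit stays at 128kB
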
From 1200802f3d647a8c9107b2148595dab4b6a1bd82 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Wed, 12 Feb 2025 13:52:22 +1300
Subject: [PATCH v4 1/3] Introduce io_max_combine_limit.

The existing io_combine_limit setting can be changed by users.  The new
io_max_combine_limit setting is defined only at server startup time, and
applies as a silent cap.

This allows aio_init.c to know what the maximum could possibly be when
allocating shared memory at startup, instead of having to assume
PG_IOV_MAX (which a later commit will quadruple).

Since our GUC system doesn't allow dependencies between GUCs, the
user-settable one now assigns a "raw" value to io_combine_limit_guc, and
the lower of io_combine_limit_guc and io_max_combine_limit is maintained
in io_combine_limit.  Most code should continue to refer to
io_combine_limit directly, and io_combine_limit_guc is an implementation
detail that could eventually be removed if the GUC system ever learns
how to deal with dependencies.

Reviewed-by: Andres Freund 
Discussion: https://postgr.es/m/CA%2BhUKG%2B2T9p-%2BzM6Eeou-RAJjTML6eit1qn26f9twznX59qtCA%40mail.gmail.com
---
 doc/src/sgml/config.sgml  | 23 ++-
 src/backend/commands/variable.c   | 18 +++
 src/backend/storage/aio/aio_init.c| 17 +-
 src/backend/storage/buffer/bufmgr.c   |  5 +++-
 src/backend/utils/misc/guc_tables.c   | 16 -
 src/backend/utils/misc/postgresql.conf.sample |  2 ++
 src/include/storage/bufmgr.h  |  4 +++-
 src/include/utils/guc_hooks.h |  2 ++
 8 files changed, 72 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7ec18bb7627..d1080dac97f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2625,6 +2625,24 @@ include_dir 'conf.d'

   
 
+  
+   io_max_combine_limit (integer)
+   
+io_max_combine_limit configuration parameter
+   
+   
+   
+
+ Controls the largest I/O size in operations that combine I/O, and silently
+ limits the user-settable parameter io_combine_limit.
+ This parameter can only be set in
+ the postgresql.conf file or on the server
+ command line.
+ The default is 128kB.
+
+   
+  
+
   
io_combine_limit (integer)

@@ -2633,7 +2651,10 @@ include_dir 'conf.d'


 
- Controls the largest I/O size in operations that combine I/O.
+ Controls the largest I/O size in operations that combine I/O.  If set
+ higher than the io_max_combine_limit parameter, the
+ smaller value will silently be used instead, so both may need to be raised
+ to increase the I/O size.
  The default is 128kB.
 

diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index 4ad6e236d69..f550a3c0c63 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -1156,6 +1156,24 @@ assign_maintenance_io_concurrency(int newval, void *extra)
 #endif
 }
 
+/*
+ * GUC assign hooks that recompute io_combine_limit whenever
+ * io_combine_limit_guc and io_max_combine_limit are changed.  These are needed
+ * because the GUC subsystem doesn't support dependencies between GUCs, and
+ * they may be assigned in either order.
+ */
+void
+assign_io_max_combine_limit(int newval, void *extra)
+{
+	io_max_combine_limit = newval;
+	io_combine_limit = Min(io_max_combine_limit, io_combine_limit_guc);
+}
+void
+assign_io_combine_limit(int newval,

Re: Fix couple of typos

2025-03-17 Thread vignesh C
On Mon, 17 Mar 2025 at 16:57, Ashutosh Bapat
 wrote:
>
> On Mon, Mar 17, 2025 at 4:46 PM Amit Kapila  wrote:
> >
> > On Mon, Mar 17, 2025 at 3:44 PM vignesh C  wrote:
> > >
> > > While reviewing another patch, I found a couple of typos:
> > > 1) subid should have been pubid in AlterPublicationOwner and
> > > AlterPublicationOwner_oid functions.
> > > 2) Only tuples with XIDs/MXIDs older than the the
> > > FreezeLimit/MultiXactCutoff are frozen in the common case.
> > > Should have been"
> > > Only tuples with XIDs/MXIDs older than the FreezeLimit/MultiXactCutoff
> > > are frozen in the common case.
> > >
> > > The attached patch has the changes for the same.
> > >
> >
> > The changes look good to me.
>
> +1
>
> However they look unrelated. We should split them in two commits
> s/subid/pubid/ to one and s/the the/the/ to another. I also verified
> that this is the only occurrence of "the the" in the code base.

Thanks, the updated v2 has the changes for the same.

Regards,
Vignesh
From 8c1eb7d8962e4c7b06c2c7516746a49bfa368b4c Mon Sep 17 00:00:00 2001
From: Vignesh 
Date: Tue, 18 Mar 2025 08:15:30 +0530
Subject: [PATCH v2 1/2] Fix typo

Doubled "the" in a comment.
---
 src/backend/access/heap/vacuumlazy.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 3b91d02605a..b1fbfca087e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -534,7 +534,7 @@ heap_vacuum_eager_scan_setup(LVRelState *vacrel, VacuumParams *params)
 	 * Tuples with XIDs older than OldestXmin or MXIDs older than OldestMxact
 	 * are technically freezable, but we won't freeze them unless the criteria
 	 * for opportunistic freezing is met. Only tuples with XIDs/MXIDs older
-	 * than the the FreezeLimit/MultiXactCutoff are frozen in the common case.
+	 * than the FreezeLimit/MultiXactCutoff are frozen in the common case.
 	 *
 	 * So, as a heuristic, we wait until the FreezeLimit has advanced past the
 	 * relfrozenxid or the MultiXactCutoff has advanced past the relminmxid to
-- 
2.43.0

From 7ff3e6a120d985ab87369d6ef530025b56b10145 Mon Sep 17 00:00:00 2001
From: Vignesh 
Date: Tue, 18 Mar 2025 08:19:34 +0530
Subject: [PATCH v2 2/2] Fix typo

subid should be pubid.
---
 src/backend/commands/publicationcmds.c | 12 ++--
 src/include/commands/publicationcmds.h |  2 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c
index 3091d36ce98..0b23d94c38e 100644
--- a/src/backend/commands/publicationcmds.c
+++ b/src/backend/commands/publicationcmds.c
@@ -2052,7 +2052,7 @@ AlterPublicationOwner_internal(Relation rel, HeapTuple tup, Oid newOwnerId)
 ObjectAddress
 AlterPublicationOwner(const char *name, Oid newOwnerId)
 {
-	Oid			subid;
+	Oid			pubid;
 	HeapTuple	tup;
 	Relation	rel;
 	ObjectAddress address;
@@ -2068,11 +2068,11 @@ AlterPublicationOwner(const char *name, Oid newOwnerId)
  errmsg("publication \"%s\" does not exist", name)));
 
 	pubform = (Form_pg_publication) GETSTRUCT(tup);
-	subid = pubform->oid;
+	pubid = pubform->oid;
 
 	AlterPublicationOwner_internal(rel, tup, newOwnerId);
 
-	ObjectAddressSet(address, PublicationRelationId, subid);
+	ObjectAddressSet(address, PublicationRelationId, pubid);
 
 	heap_freetuple(tup);
 
@@ -2085,19 +2085,19 @@ AlterPublicationOwner(const char *name, Oid newOwnerId)
  * Change publication owner -- by OID
  */
 void
-AlterPublicationOwner_oid(Oid subid, Oid newOwnerId)
+AlterPublicationOwner_oid(Oid pubid, Oid newOwnerId)
 {
 	HeapTuple	tup;
 	Relation	rel;
 
 	rel = table_open(PublicationRelationId, RowExclusiveLock);
 
-	tup = SearchSysCacheCopy1(PUBLICATIONOID, ObjectIdGetDatum(subid));
+	tup = SearchSysCacheCopy1(PUBLICATIONOID, ObjectIdGetDatum(pubid));
 
 	if (!HeapTupleIsValid(tup))
 		ereport(ERROR,
 (errcode(ERRCODE_UNDEFINED_OBJECT),
- errmsg("publication with OID %u does not exist", subid)));
+ errmsg("publication with OID %u does not exist", pubid)));
 
 	AlterPublicationOwner_internal(rel, tup, newOwnerId);
 
diff --git a/src/include/commands/publicationcmds.h b/src/include/commands/publicationcmds.h
index e41df6db038..f90cf1ef896 100644
--- a/src/include/commands/publicationcmds.h
+++ b/src/include/commands/publicationcmds.h
@@ -29,7 +29,7 @@ extern void RemovePublicationRelById(Oid proid);
 extern void RemovePublicationSchemaById(Oid psoid);
 
 extern ObjectAddress AlterPublicationOwner(const char *name, Oid newOwnerId);
-extern void AlterPublicationOwner_oid(Oid subid, Oid newOwnerId);
+extern void AlterPublicationOwner_oid(Oid pubid, Oid newOwnerId);
 extern void InvalidatePublicationRels(List *relids);
 extern bool pub_rf_contains_invalid_column(Oid pubid, Relation relation,
 		   List *ancestors, bool pubviaroot);
-- 
2.43.0



Re: Support a wildcard in backtrace_functions

2025-03-17 Thread Jelte Fennema-Nio
On Wed, 26 Feb 2025 at 16:40, Tom Lane  wrote:
> If there's no place where
> that can plausibly be done, maybe you don't need the typedef after
> all?

Makes sense, I removed the unused LogBacktraceVerbosity typedef now.
It caused a rebase conflict just now, so this also solves that.
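
For reference, a hedged sketch of how the three GUCs in the patch below
combine (names and defaults are taken from the commit message):

SET log_backtrace_mode = 'internal_error';  -- none (default) | internal_error | all
SET backtrace_min_level = 'error';          -- minimum level that gets a backtrace
SET backtrace_functions = 'some_func';      -- optional filter on the C function name
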
From 579ac166ab91f5b5e7c2d79d2c9a7f5e4a52fc6e Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio 
Date: Tue, 18 Mar 2025 01:31:57 +0100
Subject: [PATCH v12] Allow logging backtraces in more cases

We previously only had the backtrace_functions GUC to control when
backtraces were logged. Based on mailing list discussion there were two
common cases where people wanted backtraces that were not covered by this
GUC:

1. Logging backtraces for all internal errors
2. Logging backtraces for all errors

To support those two use cases, as well as to allow users to continue to
log backtraces for specific warnings using `backtrace_functions`, this
modifies the GUCs in the following way:

1. Adds a `log_backtrace_mode` GUC, which can be set to `none` (default),
   `internal_error` and `all` to log different types of backtraces.
2. Change `backtrace_functions` to behave as an additional filter on top
   of `log_backtrace_mode`. The empty string (the default) is now interpreted
   as doing no filtering based on function name.
3. Add a `backtrace_min_level` GUC, which limits for which log entries
   backtraces are written, based on their log level. This defaults to
   ERROR.

This does mean that setting `backtrace_functions=some_func` alone is not
enough to get backtraces for some_func. You now need to combine that
with `log_backtrace_mode=all` and if you want to get backtraces for
non-errors (which you previously got), you also need to change
backtrace_min_level to whichever level you want backtraces for.
---
 doc/src/sgml/config.sgml  | 82 +--
 src/backend/utils/error/elog.c| 30 ++-
 src/backend/utils/misc/guc_tables.c   | 50 +++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/utils/elog.h  |  8 ++
 src/include/utils/guc.h   |  1 +
 6 files changed, 161 insertions(+), 11 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 3d62c8bd274..da8cbb09d69 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7291,6 +7291,44 @@ local0.*/var/log/postgresql
   
  
 
+ 
+  log_backtrace_mode (enum)
+  
+log_backtrace_mode configuration parameter
+  
+  
+  
+   
+If this parameter is set to all then for all log
+entries a backtrace is written to the server log together with the log
+message. If this parameter is set to internal_error then
+such a backtrace is only written for logs with error code XX000
+(internal error; see also the appendix on error codes).
+This can be used to debug such internal errors (which should normally
+not happen in production). Finally, if this parameter is set to
+none (the default), no backtraces are ever written
+to the server log.
+   
+
+   
+The logs for which a backtrace is written can be further restricted
+using backtrace_min_level (default:
+ERROR) and backtrace_functions
+(default: empty string, meaning all).
+   
+
+   
+Backtrace support is not available on all platforms, and the quality
+of the backtraces depends on compilation options.
+   
+
+   
+Only superusers and users with the appropriate SET
+privilege can change this setting.
+   
+  
+ 
+
  
   log_checkpoints (boolean)
   
@@ -11841,16 +11879,45 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
   
   

-This parameter contains a comma-separated list of C function names.
-If an error is raised and the name of the internal C function where
-the error happens matches a value in the list, then a backtrace is
-written to the server log together with the error message.  This can
-be used to debug specific areas of the source code.
+This parameter can contain a comma-separated list of C function names,
+which can be used to filter for which logs a backtrace is written to
+the server log.
+If a log entry is raised and the name of the
+internal C function where the error happens does not match any of the
+values in the list, then no backtrace is written to the server log.
+This can be used to only debug specific areas of the source code.

 

-Backtrace support is not available on all platforms, and the quality
-of the backtraces depends on compilation options.
+The empty string (the default) disables any such filtering. So for any
+logs that match both log_backtrace_mode and
+backtrace_min_level a backtrace is
+written to the server log.
+   
+  
+ 

Re: Simplify the logic a bit (src/bin/scripts/reindexdb.c)

2025-03-17 Thread Ranier Vilela
Em sex., 7 de mar. de 2025 às 16:14, Ranier Vilela 
escreveu:

>
>
> Em sex., 7 de mar. de 2025 às 16:01, Álvaro Herrera <
> alvhe...@alvh.no-ip.org> escreveu:
>
>> On 2025-Mar-07, Álvaro Herrera wrote:
>>
>> > Anyway, my version of this is attached.  It fixes the problems with your
>> > patch, but not Orlov's fundamental bug.
>>
>> And of course I forgot to actually attach the patch.  Good grief.
>>
Tested with your v4 patch on 64-bit Windows.
>
Rebased against 682c5be


Tests:
reindexdb -U postgres -d postgres -j4 --echo -i foo1 -i foo2 -i bar1 -i
bar2 -i baz1 -i baz2 -i baz3 -i baz4 | grep REINDEX
File STDIN:
REINDEX INDEX public.baz4;
REINDEX INDEX public.baz1;
REINDEX INDEX public.baz2;
REINDEX INDEX public.baz3;
REINDEX INDEX public.foo1;
REINDEX INDEX public.foo2;
REINDEX INDEX public.bar1;
REINDEX INDEX public.bar2;

best regards,
Ranier Vilela


v5-try-but-fail-to-unbreak-reindexdb.patch
Description: Binary data


RE: long-standing data loss bug in initial sync of logical replication

2025-03-17 Thread Hayato Kuroda (Fujitsu)
Dear hackers,

The attached patch set contains a proper commit message. It briefly describes
the background and the handling. For PG13, the same commit message is used for
0001; 0002 is still rough.

Renamed the backpatch files to .txt, to make cfbot happy.

Thanks to Hou for working on it.

Best regards,
Hayato Kuroda
FUJITSU LIMITED


From 09c4c9482290a126f716482132813c9b0c09abc8 Mon Sep 17 00:00:00 2001
From: Shlok Kyal 
Date: Fri, 23 Aug 2024 14:02:20 +0530
Subject: [PATCH v20_REL_13 1/2] Fix data loss in logical replication

Previously, logical replication could lose data if one user modified a
publication to add a table while another user concurrently modified that table
and committed later than the publication modification transaction. The issue
arose during the decoding of transactions modifying the table: if the initial
catalog cache was built using a snapshot taken before the publication DDL
execution, all subsequent changes to that table were decoded with an outdated
catalog cache, which caused them to be filtered from replication. This happened
because invalidation messages were only present in the publication modification
transaction, which was decoded before these subsequent changes.

This issue is not limited to publication DDLs; similar problems can occur with
ALTER TYPE statements executed concurrently with DMLs, leading to incorrect
decoding under outdated type contexts.

To address this, the commit improves logical decoding by ensuring that
invalidation messages from catalog-modifying transactions are distributed to
all concurrent in-progress transactions. This allows the necessary rebuild of
the catalog cache when decoding new changes, similar to handling historic
catalog snapshots (see SnapBuildDistributeNewCatalogSnapshot()).

Following this change, some performance regression is observed, primarily
during frequent execution of publication DDL statements that modify published
tables. This is an expected trade-off due to cache rebuild and distribution
overhead. The regression is minor or nearly nonexistent when DDLs do not affect
published tables or occur infrequently, making this a worthwhile cost to
resolve a longstanding data loss issue.

An alternative approach considered was to take a strong lock on each affected
table during publication modification. However, this would only address issues
related to publication DDLs and require locking every relation in the database
for publications created as FOR ALL TABLES, which is impractical. Thus, this
commit chooses to distribute invalidation messages as outlined above.
---
 contrib/test_decoding/Makefile|  2 +-
 .../expected/invalidation_distrubution.out| 20 ++
 .../specs/invalidation_distrubution.spec  | 32 ++
 .../replication/logical/reorderbuffer.c   | 64 ---
 src/backend/replication/logical/snapbuild.c   | 63 ++
 src/include/replication/reorderbuffer.h   |  4 ++
 6 files changed, 164 insertions(+), 21 deletions(-)
 create mode 100644 contrib/test_decoding/expected/invalidation_distrubution.out
 create mode 100644 contrib/test_decoding/specs/invalidation_distrubution.spec

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 735b7e7653c..f122dc3a82d 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -8,7 +8,7 @@ REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
spill slot truncate
 ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
	oldest_xmin snapshot_transfer subxact_without_top catalog_change_snapshot \
-   skip_snapshot_restore
+   skip_snapshot_restore invalidation_distrubution
 
 REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
 ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
diff --git a/contrib/test_decoding/expected/invalidation_distrubution.out b/contrib/test_decoding/expected/invalidation_distrubution.out
new file mode 100644
index 000..eb70eda9042
--- /dev/null
+++ b/contrib/test_decoding/expected/invalidation_distrubution.out
@@ -0,0 +1,20 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1_insert_tbl1 s1_begin s1_insert_tbl1 s2_alter_pub_add_tbl s1_commit s1_insert_tbl1 s2_get_binary_changes
+step s1_insert_tbl1: INSERT INTO tbl1 (val1, val2) VALUES (1, 1);
+step s1_begin: BEGIN;
+step s1_insert_tbl1: INSERT INTO tbl1 (val1, val2) VALUES (1, 1);
+step s2_alter_pub_add_tbl: ALTER PUBLICATION pub ADD TABLE tbl1;
+step s1_commit: COMMIT;
+step s1_insert_tbl1: INSERT INTO tbl1 (val1, val2) VALUES (1, 1);
+step s2_get_binary_changes: SELECT count(data) FROM pg_logical_slot_get_binary_changes('isolation_slot', NULL, NULL, 'proto_version', '1', 'publication_names', 'pub') WHERE get_byte(data, 0) = 73;
+count
+-
+1
+(1 row)
+
+?column?
+
+stop
+(1 row)
+
diff --git a/contrib/test_decoding/specs/invalidation_distrubution.spec 
b/contrib/test_decodin

Re: Forbid to DROP temp tables of other sessions

2025-03-17 Thread Daniil Davydov
Hi,

On Tue, Mar 18, 2025 at 2:36 AM David G. Johnston
 wrote:
>
> "I want to give a better error message" is not a good enough reason to change 
> this long-standing behavior in a back-patchable bug fix.
> and I do not agree that it is an improvement worth making in HEAD.

OK, In this case I'd rather agree.


On Tue, Mar 18, 2025 at 3:30 AM David G. Johnston
 wrote:
>
> I sound like a broken record but this is a bug fix; introducing a GUC and 
> changing unrelated behaviors is not acceptable in a back-patch.
>
> Superusers have been able to drop the temporary tables of other sessions for 
> a long time - now is not the place to change that behavior.

Well, we can keep this option for the superuser, but we need to
explicitly indicate that we are inside the DROP operation.
Please see the v5 patch. It is a bit experimental:
1) The original behavior of the LookupExplicitNamespace function has
been restored.
2) A new RVROption has been added to indicate that we are inside a
DROP operation. Thus, a superuser can drop other sessions' temp tables.

>
> Separately:
>
> Is the general logic here that we assume r->relpersistence is 
> RELPERSISTENCE_PERMANENT until and unless we can prove it to be 
> RELPERSISTENCE_TEMP?
>
> I'm trying to figure out whether there is still an issue when dealing with an 
> unqualified name that would be found in the temporary schema.
> We've obviously already marked it permanent since we didn't have pg_temp in 
> the query to inform us otherwise.
> But if we are changing permanent to temporary later because of search_path 
> resolution then why did we have to get it right in the case where pg_temp is 
> specified?
> I question whether "persistence" is something that gram.y should be dealing 
> with at all.  Shouldn't that property be solely determined in post-parse 
> analysis via catalog lookup?

Hm, let's dive into this question. Why do I consider the original
behavior to be incorrect?

1)
If schema is not specified, then gram.y marks the table as PERMANENT.
1.1)
If we are looking for our temporary table, then "pg_temp_N" is present
in search_path, and RelnameGetRelid will return us this table.
1.2)
If we are looking for other temporary table, then RelnameGetRelid will
return `InvalidOid` and we will get a "relation not found" error.

2)
If schema is specified, then gram.y ALSO marks the table as PERMANENT.
2.1)
If we are looking for our temporary table, then
LookupExplicitNamespace will return us MyTempNamespace and we can get
our table without search_path lookup.
2.2)
If we are looking for other temporary table, then
LookupExplicitNamespace will return some valid oid, and we can get
other temp table without search_path lookup. Then, we will perform any
specified operation, assuming that we are working with a PERMANENT
table. This is a bug.

Let's summarize. If schema is not specified, we can safely mark the
table as PERMANENT, because we can always get the table from
search_path (and if the schema is not specified, then it is obvious
that the user wants to refer specifically to his table).
BUT, if schema is specified, we must know that the given relation is
TEMP before calling RangeVarGetRelidExtended (otherwise, there will be
an error, which I wrote about in paragraph 2.2).
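
A hedged SQL sketch of case 2.2 (the temp schema name pg_temp_3 is
illustrative):

-- session A:
CREATE TEMP TABLE t (a int);   -- lands in some pg_temp_N schema, say pg_temp_3

-- session B:
DROP TABLE pg_temp_3.t;        -- schema is specified, so gram.y marked the
                               -- RangeVar PERMANENT and the operation ran
                               -- under that wrong assumption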

>
> IOW, the original code here seems incorrect if this is the definitive place 
> to determine relpersistence.  in "case 1:", where there is no schema name, 
> one cannot deduce relpersistence and it should remain NULL, IMO (not knowing 
> what that might break...).  The code this patch replaces is wrong for the 
> same reason.

I hope that I managed to clarify this issue. Since the "pg_temp_"
prefix is reserved by the postgres kernel, we can definitely say that
we have encountered a temporary table when we see this prefix. IMO
there is no problem with gram.y doing it.

>
> diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
> index 271ae26cbaf..f68948d475c 100644
> --- a/src/backend/parser/gram.y
> +++ b/src/backend/parser/gram.y
> @@ -19424,7 +19424,11 @@ makeRangeVarFromAnyName(List *names, int position, 
> core_yyscan_t yyscanner)
>   break;
>   }
>
> - r->relpersistence = RELPERSISTENCE_PERMANENT;
> + if (r->schemaname && strncmp(r->schemaname, "pg_temp", 7) == 0)
> + r->relpersistence = RELPERSISTENCE_TEMP;
> + else
> + r->relpersistence = RELPERSISTENCE_PERMANENT;
> +
>   r->location = position;
>
> The qualified name variant is fine since r->schemaname must be present by 
> definition.  Passing through the same if block, once with schemaname null (in 
> makeRangeVar) and once with it populated (end of 
> makeRangeVarFromQualifiedName) is a bit annoying.
>
> makeRangeVar has the same problem, assuming permanent when the schemaname 
> argument is null.

Thank you for noticing it. I suggest we first confirm that the logic
(with persistence definition) remains in gram.y and then fix this
problem.

--
Best regards,
Daniil Davydov
From 137a768b94bbe29a03ca8e527b9b483da5d8 Mon Sep 17 00:00:00 200

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

2025-03-17 Thread Matthias van de Meent
On Tue, 11 Mar 2025 at 16:53, Peter Geoghegan  wrote:
>
> On Sat, Mar 8, 2025 at 11:43 AM Peter Geoghegan  wrote:
> > I plan on committing this one soon. It's obviously pretty pointless to
> > make the BTMaxItemSize operate off of a page header, and not requiring
> > it is more flexible.
>
> Committed. And committed a revised version of "Show index search count
> in EXPLAIN ANALYZE" that addresses the issues with non-parallel-aware
> index scan executor nodes that run from a parallel worker.
>
> Attached is v28. This is just to keep the patch series applying
> cleanly -- no real changes here.

You asked off-list for my review of 0003. I'd already reviewed 0001
before that, so that review also included. I'll see if I can spend
some time on the other patches too, but for 0003 I think I got some
good consistent feedback.

0001:

> src/backend/access/nbtree/nbtsearch.c
> _bt_readpage

This hasn't changed meaningfully in this patch, but I noticed that
pstate.finaltup is never set for the final page of the scan direction
(i.e. P_RIGHTMOST or P_LEFTMOST for forward or backward,
respectively). If it isn't used more than once after the first element
of non-P_RIGHTMOST/LEFTMOST pages, why is it in pstate? Or, if it is
used more than once, why shouldn't it be used in

Apart from that, 0001 looks good to me.

0003:

> _bt_readpage

In forward scan mode, recovery from forcenonrequired happens after the
main loop over all page items. In backward mode, it's in the loop:

> +if (offnum == minoff && pstate.forcenonrequired)
> +{
> +Assert(so->skipScan);

I think there's a comment missing that details _why_ we do this;
probably something like:

/*
 * We're about to process the final item on the page.
 * Un-set forcenonrequired, so the next _bt_checkkeys will
 * evaluate required scankeys and signal an end to this
 * primitive scan if we've reached a stopping point.
 */

In line with that, could you explain a bit more about the
pstate.forcenonrequired optimization? I _think_ it's got something to
do with "required" scankeys adding some overhead per scankey, which
can be significant with skipscan evaluations and ignoring the
requiredness can thus save some cycles, but the exact method doesn't
seem to be very well articulated.

> _bt_skip_ikeyprefix

I _think_ it's worth special-casing firstchangingattnum=1, as in that
case we know in advance there is no (immediate) common ground between
the index tuples and thus any additional work we do towards parsing
the scankeys would be wasted - except for matching inequality bounds
for firstchangingatt, or matching "open" skip arrays for a prefix of
attributes starting at firstchangingattnum (as per the
array->null_elem case).

I also noticed some other missed opportunities for optimizing
page accesses:

> +if (key->sk_strategy != BTEqualStrategyNumber)

The code halts optimizing "prefix prechecks" when we notice a
non-equality key. It seems to me that we can do the precheck on shared
prefixes with non-equality keys just the same as with equality keys;
and it'd improve performance in those cases, too.

> +if (!(key->sk_flags & SK_SEARCHARRAY))
> +if (key->sk_attno < firstchangingattnum)
> +{
> +if (result == 0)
> +continue;/* safe, = key satisfied by every tuple */
> +}
> +break;/* pstate.ikey to be set to scalar key's 
> ikey */

This code finds out that no tuple on the page can possibly match the
scankey (idxtup=scalar returns non-0 value) but doesn't (can't) use it
to exit the scan. I think that's a missed opportunity for
optimization; now we have to figure that out for every tuple in the
scan. Same applies to the SAOP -array case (i.e. non-skiparray).

Thank you for working on this.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: Statistics Import and Export

2025-03-17 Thread Nathan Bossart
On Mon, Mar 17, 2025 at 07:24:46PM -0400, Corey Huinker wrote:
> On Mon, Mar 17, 2025 at 10:24 AM Nathan Bossart 
> wrote:
>> I'm assuming that writing a completely different TOC on the second pass
>> would corrupt the dump file.  Perhaps we could teach it to skip stats
>> entries on the second pass or something, but I'm not too wild about adding
>> to the list of invasive changes we're making last-minute for v18.
> 
> I'm confused, are they needed in both places?

AFAICT yes.  The second pass rewrites the TOC to update the data offset
information.  If we wrote a different TOC the second time around, then the
dump file would be broken, right?

/*
 * If possible, re-write the TOC in order to update the data 
offset
 * information.  This is not essential, as pg_restore can cope 
in most
 * cases without it; but it can make pg_restore significantly 
faster
 * in some situations (especially parallel restore).
 */
if (ctx->hasSeek &&
fseeko(AH->FH, tpos, SEEK_SET) == 0)
WriteToc(AH);

-- 
nathan




Re: Separate GUC for replication origins

2025-03-17 Thread Euler Taveira
On Mon, Mar 17, 2025, at 8:44 PM, Masahiko Sawada wrote:
> I would suggest putting the new max_active_replication_origins after
> max_parallel_apply_workers_per_subscription as both
> max_sync_workers_per_subscription and
> max_parallel_apply_workers_per_subscription are related to
> max_logical_replication_workers.

Good point. Looking at the documentation, the old max_replication_slots
parameter was the first one in that section so I decided to use the same order
for the postgresql.conf.sample.
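
For reference, a hedged sketch of the new subscriber-side setting from
the patch below:

-- postgresql.conf default: max_active_replication_origins = -1
--   (-1 means: fall back to max_replication_slots)
ALTER SYSTEM SET max_active_replication_origins = 10;  -- takes effect at restart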


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From bdd76cf3698d61eab455cd327de63c87769ef368 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Tue, 3 Sep 2024 12:10:20 -0300
Subject: [PATCH v7 1/2] Separate GUC for replication origins

The new GUC (max_active_replication_origins) defines the maximum number
of active replication origins. max_replication_slots was used for
this purpose, but it is confusing (when you are learning about logical
replication) and introduces a limitation (you cannot have a small number
of replication slots and a high number of subscriptions).

For backward compatibility, the default is -1, indicating that the value
of max_replication_slots is used instead.
---
 doc/src/sgml/config.sgml  | 27 ++
 doc/src/sgml/logical-replication.sgml |  8 +-
 doc/src/sgml/ref/pg_createsubscriber.sgml |  2 +-
 src/backend/replication/logical/launcher.c|  6 +-
 src/backend/replication/logical/origin.c  | 92 ---
 src/backend/utils/misc/guc_tables.c   | 12 +++
 src/backend/utils/misc/postgresql.conf.sample |  3 +
 src/bin/pg_basebackup/pg_createsubscriber.c   | 20 ++--
 .../t/040_pg_createsubscriber.pl  |  4 +-
 src/bin/pg_upgrade/check.c| 18 ++--
 src/bin/pg_upgrade/t/004_subscription.pl  | 17 ++--
 src/include/replication/origin.h  |  3 +
 12 files changed, 120 insertions(+), 92 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 3d62c8bd274..5371324c369 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4376,13 +4376,6 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
  to replica or higher to allow replication slots to
  be used.
 
-
-
- Note that this parameter also applies on the subscriber side, but with
- a different meaning. See 
- in  for more
- details.
-

   
 
@@ -5128,10 +5121,10 @@ ANY num_sync ( 
-  max_replication_slots (integer)
+ 
+  max_active_replication_origins (integer)

-max_replication_slots configuration parameter
+max_active_replication_origins configuration parameter
 in a subscriber

   
@@ -5143,18 +5136,14 @@ ANY num_sync ( pg_replication_origin_status)
-will prevent the server from starting.
-max_replication_slots must be set to at least the
+will prevent the server from starting. It defaults to -1, indicating
+that the value of max_replication_slots should be
+used instead. This parameter can only be set at server start.
+
+max_active_replication_origins must be set to at least the
 number of subscriptions that will be added to the subscriber, plus some
 reserve for table synchronization.

-
-   
-Note that this parameter also applies on a sending server, but with
-a different meaning. See 
-in  for more
-details.
-   
   
  
 
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 3d18e507bbc..0b86ccc4907 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -2371,9 +2371,7 @@ CONTEXT:  processing remote data for replication origin "pg_16395" during "INSER
 
   
Logical replication requires several configuration options to be set. Most
-   options are relevant only on one side of the replication. However,
-   max_replication_slots is used on both the publisher and
-   the subscriber, but it has a different meaning for each.
+   options are relevant only on one side of the replication.
   
 
   
@@ -2413,7 +2411,7 @@ CONTEXT:  processing remote data for replication origin "pg_16395" during "INSER
Subscribers
 

-max_replication_slots
+max_active_replication_origins
 must be set to at least the number of subscriptions that will be added to
 the subscriber, plus some reserve for table synchronization.

@@ -2580,7 +2578,7 @@ CONTEXT:  processing remote data for replication origin "pg_16395" during "INSER
 
  
   The new cluster must have
-  max_replication_slots
+  max_active_replication_origins
   configured to a value greater than or equal to the number of
   subscriptions present in the old cluster.
  
diff --git a/doc/src/sgml/ref/pg_createsubscriber.sgml b/doc/src/sgml/ref/pg_createsubscrib

Re: dsm_registry: Add detach and destroy features

2025-03-17 Thread Sungwoo Chang
The regression test failed because I didn't add an extra newline in the
expected file.

On Mon, 17 Mar 2025 at 09:55, Sungwoo Chang wrote:
>
> There was a minor typo in the test module. I built and ran the
> test_dsm_registry extension before submitting the patch but perhaps it
> got mixed up with an older version.
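
For context, a hedged sketch of exercising the test module
(set_val_in_shmem/get_val_in_shmem already exist in test_dsm_registry;
the patch below adds detach/destroy counterparts tested the same way):

CREATE EXTENSION test_dsm_registry;
SELECT set_val_in_shmem(1236);
SELECT get_val_in_shmem();
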
From ef58c9f26a73f8b3714ad02053749d2799b9fd4f Mon Sep 17 00:00:00 2001
From: swchangdev 
Date: Fri, 14 Mar 2025 10:00:12 +0900
Subject: [PATCH] Add detach and destroy feature to dsm_registry

---
 src/backend/storage/ipc/dsm_registry.c| 167 +-
 src/include/storage/dsm_registry.h|   8 +
 .../expected/test_dsm_registry.out|  12 ++
 .../sql/test_dsm_registry.sql |   2 +
 .../test_dsm_registry--1.0.sql|   6 +
 .../test_dsm_registry/test_dsm_registry.c |  28 +++
 6 files changed, 222 insertions(+), 1 deletion(-)

diff --git a/src/backend/storage/ipc/dsm_registry.c b/src/backend/storage/ipc/dsm_registry.c
index 1d4fd31ffed..7915b70e47d 100644
--- a/src/backend/storage/ipc/dsm_registry.c
+++ b/src/backend/storage/ipc/dsm_registry.c
@@ -47,6 +47,12 @@ typedef struct DSMRegistryEntry
 	size_t		size;
 } DSMRegistryEntry;
 
+typedef struct DSMDetachCallbackContext
+{
+	void (*on_detach_callback) (void *);
+	void *arg;
+} DSMDetachCallbackContext;
+
 static const dshash_parameters dsh_params = {
 	offsetof(DSMRegistryEntry, handle),
 	sizeof(DSMRegistryEntry),
@@ -121,6 +127,36 @@ init_dsm_registry(void)
 	LWLockRelease(DSMRegistryLock);
 }
 
+static void
+call_on_detach_callback(dsm_segment *seg, Datum arg)
+{
+	char *ptr = DatumGetPointer(arg);
+	DSMDetachCallbackContext *cb_ctx = (DSMDetachCallbackContext *)ptr;
+	cb_ctx->on_detach_callback(cb_ctx->arg);
+}
+
+static void
+detach_dsm_segment(dsm_handle handle, void (*detach_cb) (void *), void *arg)
+{
+	dsm_segment *seg = dsm_find_mapping(handle);
+
+	/* Detach the segment only if it is already attached */
+	if (seg != NULL)
+	{
+		if (detach_cb != NULL)
+		{
+			DSMDetachCallbackContext *cb_ctx = palloc(sizeof(DSMDetachCallbackContext));
+			cb_ctx->on_detach_callback = detach_cb;
+			cb_ctx->arg = arg;
+
+			on_dsm_detach(seg, call_on_detach_callback, PointerGetDatum(cb_ctx));
+		}
+
+		dsm_unpin_mapping(seg);
+		dsm_detach(seg);
+	}
+}
+
 /*
  * Initialize or attach a named DSM segment.
  *
@@ -172,6 +208,8 @@ GetNamedDSMSegment(const char *name, size_t size,
 	}
 	else if (entry->size != size)
 	{
+		dshash_release_lock(dsm_registry_table, entry);
+		MemoryContextSwitchTo(oldcontext);
 		ereport(ERROR,
 (errmsg("requested DSM segment size does not match size of "
 		"existing segment")));
@@ -185,8 +223,11 @@ GetNamedDSMSegment(const char *name, size_t size,
 		{
 			seg = dsm_attach(entry->handle);
 			if (seg == NULL)
+			{
+dshash_release_lock(dsm_registry_table, entry);
+MemoryContextSwitchTo(oldcontext);
 elog(ERROR, "could not map dynamic shared memory segment");
-
+			}
 			dsm_pin_mapping(seg);
 		}
 
@@ -198,3 +239,127 @@ GetNamedDSMSegment(const char *name, size_t size,
 
 	return ret;
 }
+
+/*
+ * Detach a named DSM segment
+ *
+ * This routine detaches the DSM segment. If the DSM segment was not attached
+ * by this process, then the routine just returns. on_detach_callback is passed
+ * on to dsm_segment by calling on_dsm_detach for the corresponding dsm_segment
+ * before actually detaching.
+ */
+void
+DetachNamedDSMSegment(const char *name, size_t size,
+	  void (*on_detach_callback) (void *), void *arg)
+{
+	DSMRegistryEntry *entry;
+	MemoryContext oldcontext;
+
+	if (!name || *name == '\0')
+		ereport(ERROR,
+(errmsg("DSM segment name cannot be empty")));
+
+	if (strlen(name) >= offsetof(DSMRegistryEntry, handle))
+		ereport(ERROR,
+(errmsg("DSM segment name too long")));
+
+	if (size == 0)
+		ereport(ERROR,
+(errmsg("DSM segment size must be nonzero")));
+
+	/* Be sure any local memory allocated by DSM/DSA routines is persistent. */
+	oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+
+	/* Connect to the registry. */
+	init_dsm_registry();
+
+	entry = dshash_find(dsm_registry_table, name, true);
+
+	if (entry == NULL)
+	{
+		MemoryContextSwitchTo(oldcontext);
+		ereport(ERROR,
+(errmsg("cannot detach a DSM segment that does not exist")));
+	}
+
+	if (entry->size != size)
+	{
+		dshash_release_lock(dsm_registry_table, entry);
+		MemoryContextSwitchTo(oldcontext);
+		ereport(ERROR,
+(errmsg("requested DSM segment size does not match size of "
+		"existing segment")));
+	}
+
+	detach_dsm_segment(entry->handle, on_detach_callback, arg);
+
+	dshash_release_lock(dsm_registry_table, entry);
+	MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Attempt to destroy a named DSM segment
+ *
+ * This routine attempts to destroy the DSM segment. We unpin the dsm_segment
+ * and delete the entry from dsm_registry_table. This may not destroy the
+ * dsm_segment instantly, but it would die out once all the other processes
+ 

Re: Vacuum statistics

2025-03-17 Thread Alena Rybakina


On 17.03.2025 09:42, vignesh C wrote:

On Thu, 27 Feb 2025 at 23:30, Alena Rybakina  wrote:

Hi!
On 17.02.2025 17:46, Alena Rybakina wrote:

On 04.02.2025 18:22, Alena Rybakina wrote:

Hi! Thank you for your review!

On 02.02.2025 23:43, Alexander Korotkov wrote:

On Mon, Jan 13, 2025 at 3:26 PM Alena Rybakina
  wrote:

I noticed that the cfbot is failing; the reason seems to be related to
the lack of a parameter in
src/backend/utils/misc/postgresql.conf.sample. I added it, which
should help.

The patch doesn't apply cleanly.  Please rebase.

I rebased them.

The patch needed a rebase again. There is nothing new since version
18, only a rebase.

The patch needed a new rebase.

I noticed that the CI failure reported at [1], Ilia's comment from
[2], changed the status to Waiting on Author, please address them and
update it to Needs review.
[1] - https://www.postgresql.org/message-id/CALdSSPiw_-0_L3YV%3DQn7oopPqY2XVrXwDSGLdSXS69QvMdXisQ%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/47a7b784-5218-43f2-96e3-65f9a729c5a5%40tantorlabs.com

Okay, thank you!

--
Regards,
Alena Rybakina
Postgres Professional


Re: Fix couple of typos

2025-03-17 Thread Ashutosh Bapat
On Tue, Mar 18, 2025 at 8:49 AM vignesh C  wrote:
>
> On Mon, 17 Mar 2025 at 16:57, Ashutosh Bapat
>  wrote:
> >
> > On Mon, Mar 17, 2025 at 4:46 PM Amit Kapila  wrote:
> > >
> > > On Mon, Mar 17, 2025 at 3:44 PM vignesh C  wrote:
> > > >
> > > > While reviewing another patch, I found a couple of typos:
> > > > 1) subid should have been pubid in AlterPublicationOwner and
> > > > AlterPublicationOwner_oid functions.
> > > > 2) Only tuples with XIDs/MXIDs older than the the
> > > > FreezeLimit/MultiXactCutoff are frozen in the common case.
> > > > Should have been"
> > > > Only tuples with XIDs/MXIDs older than the FreezeLimit/MultiXactCutoff
> > > > are frozen in the common case.
> > > >
> > > > The attached patch has the changes for the same.
> > > >
> > >
> > > The changes look good to me.
> >
> > +1
> >
> > However they look unrelated. We should split them in two commits
> > s/subid/pubid/ to one and s/the the/the/ to another. I also verified
> > that this is the only occurrence of "the the" in the code base.
>
> Thanks, the updated v2 has the changes for the same.

LGTM. I would mention the function names where the typos are fixed in
the commit message. But that can be done at the time of commit.

--
Best Wishes,
Ashutosh Bapat




Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.

2025-03-17 Thread Shubham Khanna
On Tue, Mar 18, 2025 at 4:31 AM Euler Taveira  wrote:
>
> On Mon, Mar 17, 2025, at 9:34 AM, Shubham Khanna wrote:
>
> I have incorporated the "--remove/-r" parameter in the attached patch,
> as it seems more intuitive and straightforward for users.
> The attached patch contains the latest changes.
>
>
> There were a lot of discussion around the single vs multiple options since my
> last review [1] so I'm answering some of the questions here.
>
> Due to the nature of removing multiple objects, I would say that a general
> option that has a value and can be informed multiple times is the way to go. I
> saw that the discussion led to this. Regarding the name, my preference is
> --drop since we already have other binaries with similar options 
> (pg_receivewal
> and pg_recvlogical have --drop-slot).
>
> The commit message needs some adjustments. There is redundant information
> (1st
> and last paragraph).
>

Fixed.

> +   publications. This option is useful when
> +   transitioning from streaming replication to logical replication as
> +   existing objects may no longer be needed. Before using this option,
>
> Use physical replication. That's what we used in the current
> pg_createsubscriber documentation and it is also the terminology referred in
> the glossary (see Replication).
>

Fixed.

> booltwo_phase;  /* enable-two-phase option */
> +   uint32  remove_objects; /* flag to remove objects on subscriber */
>
> uint32 -> bits32.
>

Fixed.

> -static void setup_subscriber(struct LogicalRepInfo *dbinfo,
> -const char *consistent_lsn);
> +static void setup_subscriber(const char *consistent_lsn);
>
> This is unrelated to this patch.
>

Fixed.

> - * replication setup.
> + * replication setup. If 'remove' parameter is specified, existing 
> publications
> + * on the subscriber will be dropped before creating new subscriptions.
>   */
> static void
> -setup_subscriber(struct LogicalRepInfo *dbinfo, const char *consistent_lsn)
> +setup_subscriber(const char *consistent_lsn)
>
> There is no parameter 'remove' in this function. I understand that you want to
> provide information about the remove option but it is not the right place to
> add it. If for some reason you need to change or remove such parameter, it is
> easier to left this comment because it is not near the place you are removing
> some lines of code.
>

Fixed.

> +   struct LogicalRepInfo *dbinfo = &dbinfos.dbinfo[i];
>
> /* Connect to subscriber. */
> -   conn = connect_database(dbinfo[i].subconninfo, true);
> +   conn = connect_database(dbinfo->subconninfo, true);
>
> This new dbinfo pointer is just a way to increase your patch size without
> improving readability.
>

Fixed.

> -   drop_publication(conn, &dbinfo[i]);
> +   if (dbinfos.remove_objects & OBJECT_PUBLICATIONS)
> +   drop_all_publications(conn, dbinfo);
> +   else
> +   drop_publication(conn, dbinfo, dbinfo->pubname);
>
> At first glance, I didn't like this change. You inform dbinfo->pubname as a 
> 3rd
> parameter but the 2nd parameter is dbinfo. After reading
> drop_all_publication(), I realized that's the cause for this change. Is there 
> a
> better way to do it?
>

I see your point. The use of dbinfo->pubname as the third parameter
while passing dbinfo as the second parameter does feel a bit
inconsistent. However, since drop_all_publications() relies on dbinfo
for context, this approach seemed necessary to keep the interface
consistent with drop_publication().
Currently, I do not have a better alternative that maintains clarity
and consistency, but I am open to suggestions if anyone has ideas to
improve this.

> +static void
> +drop_all_publications(PGconn *conn, struct LogicalRepInfo *dbinfo)
> +{
> +   PGresult   *res;
> +
>
> I would add an Assert(conn != NULL) here to follow the same pattern as the
> other functions.
>

Fixed.

> +   res = PQexec(conn, "SELECT pubname FROM pg_catalog.pg_publication;");
> +   if (PQresultStatus(res) != PGRES_TUPLES_OK)
> +   {
> +   pg_log_error("could not obtain publication information: %s",
> +PQresultErrorMessage(res));
> +   PQclear(res);
> +   return;
> +   }
>
> Shouldn't it disconnect_database() here? See the pattern in other functions
> that error out while querying the catalog.
>

Fixed.

> +   SimpleStringListCell *cell;
> +
> +   for (cell = opt.remove_objects.head; cell; cell = cell->next)
> +   {
>
> You could use SimpleStringListCell in the for loop without a separate 
> declaration.
>

Fixed.

> +   pg_log_error("wrong remove object is specified");
> +   pg_log_error_hint("Only \"publications\" can be removed.");
> +   exit(1);
>
> The main message sounds strange. Did you mean 'wrong object is specified'?
>

Fixed.

> +ok($node_s->safe_psql($db1, "SELECT COUNT(*) = 2 FROM pg_publication"),
> +   'two publications created before --remove is run');
>

Re: pg_recvlogical requires -d but not described on the documentation

2025-03-17 Thread David G. Johnston
On Monday, March 17, 2025, Fujii Masao  wrote:

>
> On 2025/03/12 14:59, David G. Johnston wrote:
>
>> On Monday, February 24, 2025, Hayato Kuroda (Fujitsu) <
>> kuroda.hay...@fujitsu.com > wrote:
>>
>>
>> OK, so I will exclude the point in the thread. The patch I've posted
>> contains all of fixes
>> which is required.
>>
>>
>> The patch fixes the synopsis and the mention of the default value.  Only
>> the latter is required.  I cannot figure out a policy that would alter the
>> synopsis in the proposed manner.  I’d suggest leaving it alone for now and
>> tweak any of the option descriptions that may need clarification.
>>
>
> I agree that the synopsis doesn't need to be updated. Attached patch
> clarifies
> the required options for each action in the documentation. Thoughts?


Will look again tomorrow but seems ok aside from needing an editing pass.

The usage section for help probably needs a look as well.


>
> BTW, I'm curious why --dbname isn't required for the --drop-slot action.
> When I ran
> pg_recvlogical --drop-slot without --dbname, I got the following error:
>
>   pg_recvlogical: error: could not establish database-specific replication
> connection
>

That would be a bug.

I think this is a bug too:

set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_basebackup"));

Though I’m not sure what it is doing.

David J.


Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.

2025-03-17 Thread Shubham Khanna
On Tue, Mar 18, 2025 at 5:48 AM David G. Johnston
 wrote:
>
> On Mon, Mar 17, 2025 at 4:01 PM Euler Taveira  wrote:
>>
>> On Mon, Mar 17, 2025, at 9:34 AM, Shubham Khanna wrote:
>>
>> I have incorporated the "--remove/-r" parameter in the attached patch,
>> as it seems more intuitive and straightforward for users.
>> The attached patch contains the latest changes.
>>
>>
>> There were a lot of discussion around the single vs multiple options since my
>> last review [1] so I'm answering some of the questions here.
>>
>
>>
>> Regarding the name, my preference is
>> --drop since we already have other binaries with similar options 
>> (pg_receivewal
>> and pg_recvlogical have --drop-slot).
>
>
> A short form seems desired here and we cannot use -d/-D.  Also, the "act and 
> quit" nature of the command-like options in those two client applications 
> leads me to believe that this server application modifier-like option, which 
> behaves differently than a simple "drop named object and return", should not 
> have the same naming as those others.
>
> We are not dropping named objects - the wording "Remove all given objects" is 
> incorrect.
>
> "Remove all objects of the specified type from specified databases on the 
> target server."
>
> "Multiple object types can be specified by writing multiple --remove 
> switches." (accepting switches instead of options pending bulk change)
>
> More changes of this sort are needed.
>
>>
>>
>> -   drop_publication(conn, &dbinfo[i]);
>> +   if (dbinfos.remove_objects & OBJECT_PUBLICATIONS)
>> +   drop_all_publications(conn, dbinfo);
>> +   else
>> +   drop_publication(conn, dbinfo, dbinfo->pubname);
>>
>> At first glance, I didn't like this change. You inform dbinfo->pubname as a 
>> 3rd
>> parameter but the 2nd parameter is dbinfo. After reading
>> drop_all_publication(), I realized that's the cause for this change. Is 
>> there a
>> better way to do it?
>
>
> I had the same impression.  I'm inclined to accept it as-is and let whoever 
> writes the next --remove object type implementation deal with cleaning it up. 
>  This is clear enough when talking only about publications since whether you 
> just remove the one or all of them the special one we created goes away.
>
>
>> +   pg_log_error("wrong remove object is specified");
>> +   pg_log_error_hint("Only \"publications\" can be removed.");
>> +   exit(1);
>> The main message sounds strange. Did you mean 'wrong object is specified'?
>
>
> Error: invalid object type "%s" specified for --remove
> Hint: The valid options are: "publications"
>
> (yes, plural for a single option is "wrong", but extensible...)
>
> If we want to just give the DBA the choice to say "all" now we could solve 
> two annoyances at once.
>
> The valid options are: "all", "publications"
>
> Should we be accepting these case-insensitively?
>

I have added validation for "all" to address both issues at once.
Additionally, the option parsing is now case-insensitive for better
usability.

The attached patch at [1] contains the suggested changes.

[1] - 
https://www.postgresql.org/message-id/CAHv8RjK8q%2BmWPWPh9K7CeH2tKF31vGn%2BoPV3qY7pdPCmtbjzkg%40mail.gmail.com

Thanks and regards,
Shubham Khanna.




Re: Next commitfest app release is planned for March 18th

2025-03-17 Thread Jelte Fennema-Nio
On Tue, 4 Mar 2025 at 02:21, Jelte Fennema-Nio  wrote:
>
> The next commitfest app release is planned for March 18th

I deployed the latest release of the commitfest app. Below is a
changelog that's slightly updated.

1. Major change: There's a new /me page which shows a dashboard of
open patches where you are author/reviewer/committer. These patches
are ordered & grouped in a hopefully useful way. Peter Geoghegan
suggested adding a "dashboard" of this kind. It's the first attempt
and can probably use some fine tuning based on feedback. So please try
it out and let me know what you think. I almost certainly won't have
time to make big changes to it in the next ~month though.
2. Show name of a committer in the "Committer" column (instead of only
the username).
3. Fix the "Review" form so that all checkboxes can actually be
clicked. Thanks to Maciek.
4. Allow sorting patches by "failing since", this can be done by
clicking the header. This will take a few days to work correctly,
since a "failing since" timestamp was not being collected before.
5. Remove the "latest activity" column. This did not contain useful
information that "latest email" didn't already provide.
6. The "latest email" column now shows "time since" (e.g. 1 week ago)
by default instead of an exact timestamp. You can still see the exact
timestamp by hovering over the cell. If you prefer to see the exact
timestamp only, you can change this as a setting using the "edit
profile" link in the top right corner.
7. Searching patches by author/reviewer now isn't a dropdown with a
ton of options, but instead has become a dropdown with a search box.
This also greatly improves page load performance: By not putting all
users in the HTML as a dropdown option it's saving 600-700ms in my
testing.
8. Bugfix: Correctly show CI timeout as failure.




Re: RFC: Allow EXPLAIN to Output Page Fault Information

2025-03-17 Thread Jelte Fennema-Nio
On Mon, 10 Feb 2025 at 14:23, torikoshia  wrote:
> Thanks for reviewing the patch and comments!
> Fixed issues you pointed out and attached v2 patch.

This patch needs a rebase, because it's failing to compile currently.
So I marked this as "Waiting on Author" in the commitfest app.




Re: Add semi-join pushdown to postgres_fdw

2025-03-17 Thread Robins Tharakan
Hi Alexander,

On Mon, 4 Dec 2023 at 07:22, Alexander Korotkov 
wrote:
>
>
> Now, I think this looks good.  I'm going to push this if no objections.

After this commit, I began seeing an unexpected ERROR - see this bug report.
https://www.postgresql.org/message-id/18852-fb75b88160678f78%40postgresql.org

-
robins
https://robins.in

Ref:
1.
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=824dbea3e41efa3b35094163c834988dea7eb139


Re: maintenance_work_mem = 64kB doesn't work for vacuum

2025-03-17 Thread David Rowley
On Tue, 18 Mar 2025 at 05:49, Masahiko Sawada  wrote:
> I've attached the patch. I added the minimum regression tests for that.

I think the change to vacuumlazy.c is ok. The new test you've added
creates a table called pvactst2 but then adds a test that uses the
pvactst table.

Did you mean to skip the DROP TABLE pvactst2;?

Is there a reason to keep the maintenance_work_mem=64 for the
subsequent existing test?

David




Re: Forbid to DROP temp tables of other sessions

2025-03-17 Thread Daniil Davydov
Hi,
I see that the presence of isolation tests in the patch is
controversial. First things first, let's concentrate on fixing the
bug.
I attach a new version of the patch (for the `master` branch) to this email.
It contains better comments and a few small improvements.

P.S.
Sorry for the bad formatting in the previous email (idk how to fix it in the gmail client)

--
Best regards,
Daniil Davydov
From e5f568ece877a33d307ac6480cef9dd088412aae Mon Sep 17 00:00:00 2001
From: Daniil Davidov 
Date: Mon, 17 Mar 2025 15:43:15 +0700
Subject: [PATCH v2] Fix accessing other sessions temp tables

---
 src/backend/catalog/namespace.c | 71 +++--
 src/backend/nodes/makefuncs.c   |  6 ++-
 src/backend/parser/gram.y   | 11 -
 3 files changed, 57 insertions(+), 31 deletions(-)

diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index d97d632a7ef..688819299af 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -499,28 +499,40 @@ RangeVarGetRelidExtended(const RangeVar *relation, LOCKMODE lockmode,
 		 */
 		if (relation->relpersistence == RELPERSISTENCE_TEMP)
 		{
-			if (!OidIsValid(myTempNamespace))
-relId = InvalidOid; /* this probably can't happen? */
-			else
-			{
-if (relation->schemaname)
-{
-	Oid			namespaceId;
+			Oid	namespaceId;
 
-	namespaceId = LookupExplicitNamespace(relation->schemaname, missing_ok);
+			if (relation->schemaname)
+			{
+namespaceId = LookupExplicitNamespace(relation->schemaname, missing_ok);
 
+if (namespaceId != myTempNamespace)
+{
 	/*
-	 * For missing_ok, allow a non-existent schema name to
-	 * return InvalidOid.
+	 * We don't allow users to access temp tables of other
+	 * sessions (consider adding a GUC to allow DROP of such
+	 * tables?).
 	 */
-	if (namespaceId != myTempNamespace)
-		ereport(ERROR,
-(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
- errmsg("temporary tables cannot specify a schema name")));
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("could not access temporary relations of other sessions")));
 }
+			}
+			else
+			{
+namespaceId = myTempNamespace;
 
-relId = get_relname_relid(relation->relname, myTempNamespace);
+/*
+ * If this table was recognized as temporary, it means that we
+ * found it because the backend's temporary namespace was specified
+ * in search_path. Thus, myTempNamespace must contain a valid OID.
+ */
+Assert(OidIsValid(namespaceId));
 			}
+
+			if (missing_ok && !OidIsValid(namespaceId))
+relId = InvalidOid;
+			else
+relId = get_relname_relid(relation->relname, namespaceId);
 		}
 		else if (relation->schemaname)
 		{
@@ -3387,17 +3399,18 @@ LookupExplicitNamespace(const char *nspname, bool missing_ok)
 	Oid			namespaceId;
 	AclResult	aclresult;
 
-	/* check for pg_temp alias */
+	/*
+	 * If the namespace is specified as 'pg_temp' (without an owner backend
+	 * number), we assume that this is our own temporary namespace.
+	 */
 	if (strcmp(nspname, "pg_temp") == 0)
 	{
-		if (OidIsValid(myTempNamespace))
-			return myTempNamespace;
+		if (!OidIsValid(myTempNamespace))
+			ereport(ERROR,
+	(errcode(ERRCODE_INVALID_SCHEMA_NAME),
+	 errmsg("pg_temp was specified, but you have not any temporary relations")));
 
-		/*
-		 * Since this is used only for looking up existing objects, there is
-		 * no point in trying to initialize the temp namespace here; and doing
-		 * so might create problems for some callers --- just fall through.
-		 */
+		return myTempNamespace;
 	}
 
 	namespaceId = get_namespace_oid(nspname, missing_ok);
@@ -3553,21 +3566,19 @@ get_namespace_oid(const char *nspname, bool missing_ok)
 RangeVar *
 makeRangeVarFromNameList(const List *names)
 {
-	RangeVar   *rel = makeRangeVar(NULL, NULL, -1);
+	RangeVar   *rel;
 
 	switch (list_length(names))
 	{
 		case 1:
-			rel->relname = strVal(linitial(names));
+			rel = makeRangeVar(NULL, strVal(linitial(names)), -1);
 			break;
 		case 2:
-			rel->schemaname = strVal(linitial(names));
-			rel->relname = strVal(lsecond(names));
+			rel = makeRangeVar(strVal(linitial(names)), strVal(lsecond(names)), -1);
 			break;
 		case 3:
+			rel = makeRangeVar(strVal(lsecond(names)), strVal(lthird(names)), -1);
 			rel->catalogname = strVal(linitial(names));
-			rel->schemaname = strVal(lsecond(names));
-			rel->relname = strVal(lthird(names));
 			break;
 		default:
 			ereport(ERROR,
@@ -3774,6 +3785,8 @@ GetTempNamespaceProcNumber(Oid namespaceId)
 		return INVALID_PROC_NUMBER; /* no such namespace? */
 	if (strncmp(nspname, "pg_temp_", 8) == 0)
 		result = atoi(nspname + 8);
+	else if (strncmp(nspname, "pg_temp", 7) == 0)
+		result = MyProcNumber;
 	else if (strncmp(nspname, "pg_toast_temp_", 14) == 0)
 		result = atoi(nspname + 14);
 	else
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index e2d9e9be41a..62edf24b5c2 100644
-

Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.

2025-03-17 Thread Shubham Khanna
On Mon, Mar 17, 2025 at 8:44 AM Amit Kapila  wrote:
>
> On Sat, Mar 15, 2025 at 8:03 PM David G. Johnston
>  wrote:
> >
> > On Friday, March 14, 2025, Amit Kapila  wrote:
> >>
> >>
> >> Style-1 sounds reasonable to me, but how exactly do we want to do it? One
> >> idea is to have short and long switches like -r and
> >> --remove_existing_object=publication. The user would be allowed to give
> >> multiple options like -r publications -r slots, etc.
> >
> >
> > Neither “existing” nor “object” is needed; a one-word long form suffices.  
> > Drop, remove, or prune.  If we want a short form option we should choose 
> > Remove and use -r; both D and P are already taken.
> >
> > So, I marginally prefer --prune with no short-form option; followed by 
> > --remove/-r
> >
>
> I am inclined towards "--remove/-r" as that will be relatively more
> straightforward to follow for users.
>
> > Communicating in the option name the semantic meaning of “prune” (that we 
> > aren’t removing all objects of the given type) tips the balance for me.  But 
> > that can just be communicated in the description, so it isn’t a strong desire.
> >
>
> BTW, with this option, we will be removing all the publications
> present on the subscriber because on standby there shouldn't be any
> more. But that may not be true for other objects, so we must
> communicate it via the description.
>
> --

I have incorporated the "--remove/-r" parameter in the attached patch,
as it seems more intuitive and straightforward for users.
The attached patch contains the latest changes.

Thanks and regards,
Shubham Khanna,


v17-0001-Support-for-dropping-all-publications-in-pg_crea.patch
Description: Binary data


Re: Allow default \watch interval in psql to be configured

2025-03-17 Thread Ashutosh Bapat
Hi Daniel,

On Fri, Mar 14, 2025 at 2:26 PM Daniel Gustafsson  wrote:
>
> On 13 Mar 2025, at 15:03, Greg Sabino Mullane  wrote:
>
> Patch looks good. One minor issue:
>
> greg=# \set WATCH_INTERVAL -1
> invalid value "-1" for "WATCH_INTERVAL": must be greater than 0.00
> greg=# \set WATCH_INTERVAL 0.00
> greg=#
>
> We should disallow 0 as the error message implies
>
>
> Ah, nice catch, fixed in the attached along with a test for the minimum bound 
> (ie zero).

#\watch c=4 i=
Mon 17 Mar 2025 05:52:50 PM IST (every 0s)

 ?column?
--
1
(1 row)

Mon 17 Mar 2025 05:52:50 PM IST (every 0s)

 ?column?
--
1
(1 row)

Mon 17 Mar 2025 05:52:50 PM IST (every 0s)

 ?column?
--
1
(1 row)

Mon 17 Mar 2025 05:52:50 PM IST (every 0s)

 ?column?
--
1
(1 row)

0 is an accepted value for the interval, even though it might look nonsensical.

The behaviour should be the same in both cases: \watch i=<value> and
\set WATCH_INTERVAL <value> followed by \watch. In this case it's not. In
fact, we should use the same validation code in both cases. Why
don't we perform the same additional validations from
ParseVariableDouble() in exec_watch_command() as well?
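
For instance, a minimal illustration of the mismatch (hypothetical
session; the \set error text follows the patch's message quoted above):

postgres=# \set WATCH_INTERVAL 0
invalid value "0" for "WATCH_INTERVAL": must be greater than 0.00
postgres=# SELECT 1 \watch i=0
(accepted, and loops with a zero-second delay)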

The test only validates the default variable value. We need a test where we
see the variable value being honored, like the tests between lines 369 and 376
in the same file.

-- 
Best Wishes,
Ashutosh Bapat




Re: 64 bit numbers vs format strings

2025-03-17 Thread Thomas Munro
On Mon, Mar 17, 2025 at 8:03 PM Peter Eisentraut  wrote:
> On 15.03.25 03:42, Thomas Munro wrote:
> > So we should make pgoff_t a typedef for int64_t everywhere.  It's a
> > bit annoying that we have to teach everyone to remember to use PRId64
> > to print it, though.
>
> The ramifications of such a change are not clear to me.  I thought
> pgoff_t is supposed to be off_t on Unix systems.  If we change that,
> then how will this affect pointer type arguments, function pointers,
> etc.  This seems to be a much larger problem than what this thread is
> originally about.
>
> I think we should leave pgoff_t the way it is (mostly?) done now: Cast
> to long long int and print using %lld.

WFM.
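
(That is, something along the lines of errmsg("unexpected file size: %lld",
(long long) offset): the cast-plus-%lld idiom, with a made-up message text.)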

> > How reasonable would it be to add an extra
> > filter into whatever script is used to run xgettext on our source
> > tree?  It could replace a very small number of agreed useful tokens to
> > match some macros that we would also define in our tree, so that we
> > could write PRI_PGOFF_T in our messages, but xgettext would see PRId64
> > and still emit those magic % tokens that GNU/NetBSD/Solaris
> > gettext() know how to translate on the fly when loading message
> > catalogues.
>
> This is not really possible.  The %<PRId64> behavior is baked deeply into
> the gettext code.  (Also note that you don't only need support in
> xgettext, which is part of our build system, but also in the runtime
> library, which we don't control.)

Hmm, but that's why I was asking about filtering the source *before*
xgettext sees it, but it sounds like I may still be confused about how
that works and I'm very happy to abandon that idea and leave those
bits unchanged.  Will update the patch shortly to incorporate your
other feedback.

Thanks!




Re: Forbid to DROP temp tables of other sessions

2025-03-17 Thread Daniil Davydov
Hi,

On Mon, Mar 17, 2025 at 5:33 PM Steven Niu  wrote:
>
> I mean in RangeVarGetRelidExtended(), where you deleted the following code:
>
> if (!OidIsValid(myTempNamespace))
> relId = InvalidOid; /* this probably can't happen? */

Hm, I got it. Let's start with the fact that the comment "this
probably can't happen?" is incorrect. Even if we don't have a
temporary namespace in our session, we can still specify "pg_temp_N"
in a query.
Next, if relation->schemaname is pg_temp, then we first call
LookupExplicitNamespace(), where you can find the
if (!OidIsValid(myTempNamespace)) check.
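
For example (backend number 3 and the table name are hypothetical; the
error text is the one proposed in the patch upthread):

DROP TABLE pg_temp_3.tab;
ERROR:  could not access temporary relations of other sessions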

--
Best regards,
Daniil Davydov




Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints

2025-03-17 Thread jian he
hi.
I played around with it.

With the current syntax, we don't need to deal with the column constraint
grammar; the following fails directly:
create table t0(a int constraint nn not null a not valid);
we only support table constraint cases like:
alter table lp add constraint nc1 not null a not valid;

Since CREATE TABLE with an invalid constraint does not make sense,
we can just issue a warning, like:
create table t0(a int, constraint nn not null a not valid);
WARNING:  CREATE TABLE NOT NULL NOT VALID CONSTRAINT WILL SET TO VALID
the wording needs to change...

for not null not valid syntax, we only need to support:
ALTER TABLE ADD CONSTRAINT conname NOT NULL column_name NOT VALID
ALTER TABLE ADD NOT NULL column_name NOT VALID
ALTER TABLE VALIDATE CONSTRAINT conname

The attached is what I came up with:

ALTER TABLE ADD CONSTRAINT conname NOT NULL column_name NOT VALID
will create a not-yet-validated check constraint, like
ALTER TABLE ADD CONSTRAINT conname CHECK (column_name IS NOT NULL) NOT VALID

When you validate the not-null constraint (internally it's a check
constraint), it will drop the check constraint and install a not-null
constraint with the same name.

Dropping the check constraint calls RemoveConstraintById(), which locks
pg_constraint.conrelid in AccessExclusiveLock mode. That is not ideal,
because ALTER TABLE VALIDATE CONSTRAINT only needs
ShareUpdateExclusiveLock, so we have to find a way to avoid that
AccessExclusiveLock.

Because we have converted a not-null constraint to a check constraint,
we need to somehow distinguish this case, so pg_constraint gains
another column: coninternaltype (the naming is not good, I guess).
Because we drop an invalid check constraint, but an inherited
constraint cannot be dropped, this ALTER TABLE VALIDATE CONSTRAINT
will not work for partitions, but it will work for the root
partitioned table (same logic for table inheritance).
--
demo:

create table t(a int);
alter table t add constraint nc1 not null a not valid;
\d t
 Table "public.t"
 Column |  Type   | Collation | Nullable | Default
+-+---+--+-
 a  | integer |   |  |
Check constraints:
"nc1" CHECK (a IS NOT NULL) NOT VALID

insert into t default values;
ERROR:  new row for relation "t" violates check constraint "nc1"
DETAIL:  Failing row contains (null).

alter table t validate constraint nc1;
\d+ t
Table "public.t"
 Column |  Type   | Collation | Nullable | Default | Storage |
Compression | Stats target | Description
+-+---+--+-+-+-+--+-
 a  | integer |   | not null | | plain   |
|  |
Not-null constraints:
"nc1" NOT NULL "a"
Access method: heap

---
some regress tests added.
need more polishing, but overall it works as described above.

not sure if this idea is crazy or not,
what do you think?
From 5a66a8da2a8c5fef88d7730e127900e5b83e0cde Mon Sep 17 00:00:00 2001
From: jian he 
Date: Sun, 16 Mar 2025 20:50:53 +0800
Subject: [PATCH v4 1/1] not null not valid implementation

---
 src/backend/catalog/heap.c|  14 +-
 src/backend/catalog/index.c   |   1 +
 src/backend/catalog/pg_constraint.c   |   3 +
 src/backend/commands/tablecmds.c  | 168 +-
 src/backend/commands/trigger.c|   1 +
 src/backend/commands/typecmds.c   |   2 +
 src/backend/parser/gram.y |   5 +-
 src/backend/parser/parse_utilcmd.c|  63 +++-
 src/include/catalog/heap.h|   1 +
 src/include/catalog/pg_constraint.h   |   3 +
 src/include/nodes/parsenodes.h|   1 +
 src/test/regress/expected/constraints.out |  63 
 src/test/regress/sql/constraints.sql  |  41 ++
 13 files changed, 355 insertions(+), 11 deletions(-)

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index bd3554c0bfd..6dc0267bd5a 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -104,7 +104,7 @@ static ObjectAddress AddNewRelationType(const char *typeName,
 static void RelationRemoveInheritance(Oid relid);
 static Oid	StoreRelCheck(Relation rel, const char *ccname, Node *expr,
 		  bool is_enforced, bool is_validated, bool is_local,
-		  int16 inhcount, bool is_no_inherit, bool is_internal);
+		  int16 inhcount, bool is_no_inherit, char coninternaltype, bool is_internal);
 static void StoreConstraints(Relation rel, List *cooked_constraints,
 			 bool is_internal);
 static bool MergeWithExistingConstraint(Relation rel, const char *ccname, Node *expr,
@@ -2136,7 +2136,7 @@ SetAttrMissing(Oid relid, char *attname, char *value)
 static Oi

Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints

2025-03-17 Thread Alvaro Herrera
On 2025-Mar-17, jian he wrote:

> hi.
> I played around with it.
> 
> With the current syntax, we don't need to deal with the column constraint
> grammar; the following fails directly:
> create table t0(a int constraint nn not null a not valid);
> we only support table constraint cases like:
> alter table lp add constraint nc1 not null a not valid;
> 
> Since CREATE TABLE with an invalid constraint does not make sense,
> we can just issue a warning, like:
> create table t0(a int, constraint nn not null a not valid);
> WARNING:  CREATE TABLE NOT NULL NOT VALID CONSTRAINT WILL SET TO VALID
> the wording needs to change...

Yeah, we discussed this elsewhere.  I have an alpha-quality patch for
that, but I wasn't too sure about it ...

[1] 
https://postgr.es/m/cacjufxeqchnhn6m18jy1mqcgqq9gn9ofmeop47sdfde5b8w...@mail.gmail.com

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
From 4eb5f71595753d035d5267aca93c8b2cca93b4a6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= 
Date: Tue, 14 Jan 2025 18:43:47 +0100
Subject: [PATCH] Throw warning if NOT VALID is given during CREATE TABLE

Discussion: https://postgr.es/m/cacjufxeqchnhn6m18jy1mqcgqq9gn9ofmeop47sdfde5b8w...@mail.gmail.com
---
 src/backend/catalog/heap.c| 8 
 src/backend/commands/tablecmds.c  | 7 +++
 src/backend/parser/gram.y | 2 ++
 src/include/nodes/parsenodes.h| 1 +
 src/test/regress/expected/alter_table.out | 1 +
 src/test/regress/expected/foreign_key.out | 3 ++-
 src/test/regress/sql/foreign_key.sql  | 2 +-
 7 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index bd3554c0bfd..0eb19f3c5cf 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -2572,6 +2572,12 @@ AddRelationNewConstraints(Relation rel,
 checknames = lappend(checknames, ccname);
 			}
 
+			/* XXX improve error report */
+			if (cdef->was_not_valid && cdef->initially_valid && cdef->is_enforced)
+ereport(WARNING,
+		errcode(ERRCODE_WARNING),
+		errmsg("NOT VALID flag for constraint \"%s\" ignored", ccname));
+
 			/*
 			 * OK, store it.
 			 */
@@ -3039,6 +3045,8 @@ AddRelationNotNullConstraints(Relation rel, List *constraints,
 		   nnnames);
 		nnnames = lappend(nnnames, conname);
 
+		/* XXX if NOT VALID is given during CREATE TABLE, throw warning */
+
 		StoreRelNotNull(rel, conname,
 		attnum, true, true,
 		inhcount, constr->is_no_inherit);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 129c97fdf28..b59f25b79eb 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -10546,6 +10546,13 @@ addFkConstraint(addFkConstraintSides fkside,
 		connoinherit = rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE;
 	}
 
+	/* XXX improve error report */
+	if (fkconstraint->was_not_valid && fkconstraint->initially_valid)
+		ereport(WARNING,
+errcode(ERRCODE_WARNING),
+errmsg("NOT VALID flag for constraint \"%s\" ignored",
+	   fkconstraint->conname));
+
 	/*
 	 * Record the FK constraint in pg_constraint.
 	 */
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 271ae26cbaf..29b12a2ff46 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -4213,6 +4213,7 @@ ConstraintElem:
    NULL, NULL, &n->is_enforced, &n->skip_validation,
    &n->is_no_inherit, yyscanner);
 	n->initially_valid = !n->skip_validation;
+	n->was_not_valid = n->skip_validation;
 	$$ = (Node *) n;
 }
 			| NOT NULL_P ColId ConstraintAttributeSpec
@@ -4347,6 +4348,7 @@ ConstraintElem:
    NULL, &n->skip_validation, NULL,
    yyscanner);
 	n->initially_valid = !n->skip_validation;
+	n->was_not_valid = n->skip_validation;
 	$$ = (Node *) n;
 }
 		;
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 23c9e3c5abf..a1878e565ff 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2819,6 +2819,7 @@ typedef struct Constraint
 	bool		is_enforced;	/* enforced constraint? */
 	bool		skip_validation;	/* skip validation of existing rows? */
 	bool		initially_valid;	/* mark the new constraint as valid? */
+	bool		was_not_valid;	/* did the user request NOT VALID? */
 	bool		is_no_inherit;	/* is constraint non-inheritable? */
 	Node	   *raw_expr;		/* CHECK or DEFAULT expression, as
  * untransformed parse tree */
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index 362f38856d2..59736648ce4 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -574,6 +574,7 @@ DROP TABLE attmp2;
 -- exclusion until validated
 set constraint_exclusion TO 'partition';
 create table nv_parent (d date, check (false) no inherit not valid);
+WARNING:  NOT VALID flag for constraint "nv_

RE: long-standing data loss bug in initial sync of logical replication

2025-03-17 Thread Hayato Kuroda (Fujitsu)
Dear hackers,

> Regarding the PG13, it cannot be
> applied
> as-is thus some adjustments are needed. I will share it in upcoming posts.

Here is a patch set for PG13. Unlike for PG14-17, the patch could not be
created as-is, because...

1. The WAL record for invalidation messages (XLOG_XACT_INVALIDATIONS) does not
   exist.
2. Thus the ReorderBufferChange for the invalidation does not exist.
   Our patch tries to distribute it, but that cannot be done as-is.
3. The code assumed that invalidation messages can be added only once.
4. The timing when invalidation messages are consumed is limited:
  a. a COMMAND_ID change is popped,
  b. start of decoding a transaction, or
  c. end of decoding a transaction.

The above means that invalidations cannot be executed while a transaction is
being decoded. I created two patch sets to resolve the data loss issue. 0001
has fewer code changes but resolves only part of the issue; 0002 has larger
changes but provides a complete solution.

0001 - mostly the same as the patches for other versions.
   ReorderBufferAddInvalidations() was adjusted to allow being called
   several times. As I said above, 0001 cannot execute inval messages
   while decoding the transactions.
0002 - introduces a new ReorderBufferChange type to indicate inval messages.
   It would be handled like PG14+.

Here is an example. Assume that the table foo exists on both nodes, with a
publication "pub" which publishes all tables and a subscription "sub" which
subscribes to "pub". What happens if this workload is executed?

```
S1  S2
BEGIN;
INSERT INTO foo VALUES (1)
ALTER PUBLICATION pub RENAME TO pub_renamed;
INSERT INTO foo VALUES (2)
COMMIT;
LR -> ?
```

With 0001, tuples (1) and (2) would be replicated to the subscriber.
An error "publication "pub" does not exist" would raise when new changes are 
done
later.

0001+0002 works more aggressively; the error would be raised when the S1
transaction is decoded. The behavior is the same as for patched PG14-PG17.

Best regards,
Hayato Kuroda
FUJITSU LIMITED



v19_REL_13-0001-Distribute-invalidatons-if-change-in-cata.patch
Description:  v19_REL_13-0001-Distribute-invalidatons-if-change-in-cata.patch


v19_REL_13-0002-Backpatch-introducing-invalidation-messag.patch
Description:  v19_REL_13-0002-Backpatch-introducing-invalidation-messag.patch


Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.

2025-03-17 Thread Ashutosh Bapat
On Mon, Mar 17, 2025 at 4:02 PM vignesh C  wrote:
>
> On Mon, 17 Mar 2025 at 11:28, Shubham Khanna
>  wrote:
> >
> > On Fri, Mar 14, 2025 at 5:43 PM Nisha Moond  
> > wrote:
> >
> > Fixed.
> >
> > The attached patch contains the suggested changes.
>
> I feel like we're trying to address two separate tasks in this thread:
> a) Enhancing pg_createsubscriber to automatically retrieve databases
> when none is provided. b) Refactoring all pg_createsubscriber tests.
>
> I suggest we keep this thread focused solely on retrieving all
> databases and start a new thread for test refactoring. This will allow
> us to tackle each task separately and ensure a cleaner commit.

I was expecting that the argument validation tests would go in one test -
the existing ones as well as the ones for --all. But that's not how the
patch splits them; it has split only the existing test. I am fine if we
add a new test for the --all option as the patch does and leave the
existing test as is. Cramming everything into one test makes it
unmaintainable, though.

-- 
Best Wishes,
Ashutosh Bapat




Re: Fix couple of typos

2025-03-17 Thread Ashutosh Bapat
On Mon, Mar 17, 2025 at 4:46 PM Amit Kapila  wrote:
>
> On Mon, Mar 17, 2025 at 3:44 PM vignesh C  wrote:
> >
> > While reviewing another patch, I found a couple of typos:
> > 1) subid should have been pubid in AlterPublicationOwner and
> > AlterPublicationOwner_oid functions.
> > 2) Only tuples with XIDs/MXIDs older than the the
> > FreezeLimit/MultiXactCutoff are frozen in the common case.
> > Should have been"
> > Only tuples with XIDs/MXIDs older than the FreezeLimit/MultiXactCutoff
> > are frozen in the common case.
> >
> > The attached patch has the changes for the same.
> >
>
> The changes look good to me.

+1

However, they look unrelated. We should split them into two commits:
s/subid/pubid/ in one and s/the the/the/ in another. I also verified
that this is the only occurrence of "the the" in the code base.

-- 
Best Wishes,
Ashutosh Bapat




Fwd: lwlocknames.h beautification attempt

2025-03-17 Thread Gurjeet Singh
(resending the email because it was held for moderation; replaced image
attachment with a link, which might be the reason for being put in the
moderation queue)

On Sun, Mar 16, 2025 at 7:53 PM Robert Haas  wrote:
>
> On Fri, Mar 14, 2025 at 3:38 PM Álvaro Herrera  
> wrote:
> > I forgot to send a note here that I pushed this patch. Thank you.
>
> I'm confused. Tom and I both said we didn't like this change,

To me, Tom's feedback felt somewhere between ambivalent about the change and
perhaps agreeing with it, as long as pgindent did not throw a fit, which
it did not.
It definitely did not feel like a -1.

On Mon, Mar 3, 2025 at 8:40 AM Robert Haas  wrote:
>
> On Sun, Mar 2, 2025 at 1:10 AM Gurjeet Singh  wrote:
> > I propose the following change to the generation script, 
> > generate-lwlocknames.pl
> >
> > - print $h "#define ${lockname}Lock (&MainLWLockArray[$lockidx].lock)\n";
> > + printf $h "#define %-30s %s\n", "${lockname}Lock", 
> > "(&MainLWLockArray[$lockidx].lock)";
>
> -1. That seems worse to me.

I read your objection to be about the perl code change, whereas the real change
the patch was seeking is in the generated output (lwlocknames.h). I'm guessing
some others may also have taken your feedback to mean the same as I did.

Alvaro committed the following patch, which is better code than mine. If your
objection was about the perl code, perhaps this addresses your concern.

- print $h "#define ${lockname}Lock (&MainLWLockArray[$lockidx].lock)\n";
+ printf $h "#define %-32s (&MainLWLockArray[$lockidx].lock)\n",
+ $lockname . "Lock";

I have ~attached~ linked a comparison of before and after screenshots of the
generated output. It's hard to argue that the generated code in the left/before
image is better than the right/after image.

https://photos.app.goo.gl/hNL3FaUMuEwnaYTt9

Best regards,
Gurjeet
http://Gurje.et




Re: Fix couple of typos

2025-03-17 Thread Amit Kapila
On Mon, Mar 17, 2025 at 3:44 PM vignesh C  wrote:
>
> While reviewing another patch, I found a couple of typos:
> 1) subid should have been pubid in AlterPublicationOwner and
> AlterPublicationOwner_oid functions.
> 2) Only tuples with XIDs/MXIDs older than the the
> FreezeLimit/MultiXactCutoff are frozen in the common case.
> Should have been"
> Only tuples with XIDs/MXIDs older than the FreezeLimit/MultiXactCutoff
> are frozen in the common case.
>
> The attached patch has the changes for the same.
>

The changes look good to me.

-- 
With Regards,
Amit Kapila.




Re: Proposal: Deferred Replica Filtering for PostgreSQL Logical Replication

2025-03-17 Thread Amit Kapila
On Sun, Mar 16, 2025 at 12:59 AM Dean  wrote:
>
> I'd like to propose an enhancement to PostgreSQL's logical replication 
> system: Deferred Replica Filtering (DRF). The goal of this feature is to 
> provide more granular control over which rows are replicated by applying 
> publication filters after the WAL decoding process, before sending data to 
> subscribers.
>
> Currently, PostgreSQL's logical replication filters apply deterministically. 
> Deferred filtering, however, operates after the WAL has been decoded, giving 
> it access to the complete row data and making filtering decisions based on 
> mutable values. Additionally, record columns may be omitted by the filter.
>
> This opens up several possibilities for granular control. Consider the 
> following examples:
> Alice and Bob subscribe to changes on a table with RLS enabled, allowing CRUD 
> operations based on user's IDs.
> 1. Alice needs to know the timestamp at which Bob updated the table. With 
> DRF, we can omit all columns except for the timestamp.
> 2. Bob wants to track DELETEs on the table. Without DRF, Bob can see all 
> columns on any deleted row, potentially exposing complete records he 
> shouldn't be authorized to view. DRF can filter these rows out.
>
> Deferred replica filtering allows for session-specific, per-row, and 
> per-column filtering - features currently not supported by existing 
> replication filters, enhancing security and data privacy.
>

We provide column lists [1] and row filters [2]. Don't those suffice?
If not, kindly let us know what exactly you need, with some
examples.
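
For example, the two can be combined using the documented syntax (the table
and predicate here are illustrative):

CREATE PUBLICATION pub_filtered
FOR TABLE orders (order_id, status, updated_at)
WHERE (customer_id = 42);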

[1] - https://www.postgresql.org/docs/devel/logical-replication-col-lists.html
[2] - https://www.postgresql.org/docs/devel/logical-replication-row-filter.html

-- 
With Regards,
Amit Kapila.




Re: Snapshot related assert failure on skink

2025-03-17 Thread Tomas Vondra
On 3/4/25 23:25, Andres Freund wrote:
> Hi,
> 
> I just saw a BF failure on skink (valgrind) that asserts out.
> 
> Check the 002_compare_backups failure in:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-03-04%2017%3A35%3A01
> 
> TRAP: failed Assert("TransactionIdPrecedesOrEquals(TransactionXmin, 
> RecentXmin)"), File: "../pgsql/src/backend/storage/ipc/procarray.c", Line: 
> 2132, PID: 3115649
> postgres: pitr2: bf postgres [local] SELECT(ExceptionalCondition+0x5f) 
> [0x6e45e0]
> postgres: pitr2: bf postgres [local] SELECT(+0x467c3d) [0x56fc3d]
> postgres: pitr2: bf postgres [local] SELECT(GetSnapshotData+0x60) [0x570fbd]
> postgres: pitr2: bf postgres [local] 
> SELECT(GetNonHistoricCatalogSnapshot+0x53) [0x7285c5]
> postgres: pitr2: bf postgres [local] SELECT(GetCatalogSnapshot+0x28) 
> [0x72969b]
> postgres: pitr2: bf postgres [local] SELECT(systable_beginscan+0xc9) 
> [0x252602]
> postgres: pitr2: bf postgres [local] SELECT(+0x5c1e56) [0x6c9e56]
> postgres: pitr2: bf postgres [local] SELECT(+0x5c20e9) [0x6ca0e9]
> postgres: pitr2: bf postgres [local] SELECT(SearchCatCache+0x18) [0x6ca71c]
> postgres: pitr2: bf postgres [local] SELECT(SearchSysCache+0x21) [0x6deeae]
> postgres: pitr2: bf postgres [local] SELECT(GetSysCacheOid+0x23) [0x6df219]
> postgres: pitr2: bf postgres [local] SELECT(get_namespace_oid+0x2f) [0x2dbae0]
> postgres: pitr2: bf postgres [local] SELECT(+0x1d3c13) [0x2dbc13]
> postgres: pitr2: bf postgres [local] SELECT(+0x1d3e1b) [0x2dbe1b]
> postgres: pitr2: bf postgres [local] SELECT(+0x1d3e72) [0x2dbe72]
> postgres: pitr2: bf postgres [local] SELECT(FuncnameGetCandidates+0xaf) 
> [0x2ddef1]
> postgres: pitr2: bf postgres [local] SELECT(func_get_detail+0x96) [0x4bf521]
> postgres: pitr2: bf postgres [local] SELECT(ParseFuncOrColumn+0x670) 
> [0x4c13b9]
> postgres: pitr2: bf postgres [local] SELECT(+0x3b1123) [0x4b9123]
> postgres: pitr2: bf postgres [local] SELECT(+0x3b12ec) [0x4b92ec]
> postgres: pitr2: bf postgres [local] SELECT(+0x3b2c60) [0x4bac60]
> postgres: pitr2: bf postgres [local] SELECT(+0x3b12dc) [0x4b92dc]
> postgres: pitr2: bf postgres [local] SELECT(transformExpr+0x20) [0x4b8faa]
> postgres: pitr2: bf postgres [local] SELECT(transformTargetEntry+0x7c) 
> [0x4cf4f9]
> postgres: pitr2: bf postgres [local] SELECT(transformTargetList+0x96) 
> [0x4cf5a6]
> postgres: pitr2: bf postgres [local] SELECT(+0x39a5b9) [0x4a25b9]
> postgres: pitr2: bf postgres [local] SELECT(transformStmt+0x141) [0x4a304b]
> postgres: pitr2: bf postgres [local] SELECT(+0x39c390) [0x4a4390]
> postgres: pitr2: bf postgres [local] SELECT(transformTopLevelStmt+0x19) 
> [0x4a43cc]
> postgres: pitr2: bf postgres [local] SELECT(parse_analyze_fixedparams+0x56) 
> [0x4a4424]
> postgres: pitr2: bf postgres [local] 
> SELECT(pg_analyze_and_rewrite_fixedparams+0x40) [0x59b021]
> postgres: pitr2: bf postgres [local] SELECT(+0x4937f6) [0x59b7f6]
> postgres: pitr2: bf postgres [local] SELECT(PostgresMain+0x940) [0x59db54]
> postgres: pitr2: bf postgres [local] SELECT(BackendMain+0x59) [0x596b38]
> postgres: pitr2: bf postgres [local] SELECT(postmaster_child_launch+0xeb) 
> [0x4ee282]
> postgres: pitr2: bf postgres [local] SELECT(+0x3e9271) [0x4f1271]
> postgres: pitr2: bf postgres [local] SELECT(+0x3eadb6) [0x4f2db6]
> postgres: pitr2: bf postgres [local] SELECT(PostmasterMain+0x118b) [0x4f43b2]
> postgres: pitr2: bf postgres [local] SELECT(main+0x1e2) [0x421da2]
> /lib/x86_64-linux-gnu/libc.so.6(+0x29ca8) [0x5a27ca8]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x5a27d65]
> postgres: pitr2: bf postgres [local] SELECT(_start+0x21) [0x1eb421]
> 2025-03-04 19:11:37.941 UTC [3104796][postmaster][:0] LOG:  client backend 
> (PID 3115649) was terminated by signal 6: Aborted
> 2025-03-04 19:11:37.941 UTC [3104796][postmaster][:0] DETAIL:  Failed process 
> was running: SELECT NOT pg_is_in_recovery();
> 2025-03-04 19:11:37.943 UTC [3104796][postmaster][:0] LOG:  terminating any 
> other active server processes
> 
> 
> The code directly in question hasn't changed in a few years. I haven't seen
> this crash before.  The only recent-ish changes in the wider area are:
> 
> commit 952365cded6
> Author: Heikki Linnakangas 
> Date:   2024-12-23 12:42:39 +0200
>  
> Remove unnecessary GetTransactionSnapshot() calls
> 
> commit 578a7fe7b6f8484f6d7caa2fda288abb3fe87aa0
> Author: Heikki Linnakangas 
> Date:   2024-12-21 23:42:39 +0200
>  
> Update TransactionXmin when MyProc->xmin is updated
> 
> but neither is immediately yelling "here" to me.
> 
> 
> I don't really see how pg_combinebackup could lead to this issue either.
> 
> 
> One potentially interesting bit is that the statement is running *during* a
> shutdown checkpoint:
> 
> 2025-03-04 19:11:37.377 UTC [3106505][checkpointer][:0] LOG:  checkpoint 
> starting: end-of-recovery immediate wait
> 2025-03-04 19:11:37.454 UTC [3115649][client backend][2/2:0] LOG:  statement: 
> SELECT NOT pg_is_in_recovery();
> TRAP: failed Assert("T

Re: BitmapHeapScan streaming read user and prelim refactoring

2025-03-17 Thread Jakub Wartak
On Thu, Mar 13, 2025 at 9:34 PM Melanie Plageman
 wrote:
> On Thu, Mar 13, 2025 at 5:46 AM Jakub Wartak
>  wrote:
> >
> > Cool, anything > 1 is just better. Just quick question, so now we have:
> >
> > #define DEFAULT_EFFECTIVE_IO_CONCURRENCY 16
> > #define DEFAULT_MAINTENANCE_IO_CONCURRENCY 10
> >
> > Shouldn't maintenance be now also at the same value (16) instead of
> > 10? Also, fc34b0d9de27 (5 years ago by Thomas) states about m_io_c:
> > "Use the new setting in heapam.c instead of the hard-coded formula
> > effective_io_concurrency + 10 introduced by commit 558a9165e08.", so
> > there was always assumption that m_io_c > e_io_c (?) Now it's kind of
> > inconsistent to see bitmap heap scans pushing more IOPs than
> > recovery(!)/ANALYZE/VACUUM or am I missing something? No pressure to
> > change, just asking what Your thoughts are.
>
> Yea, I'm not really sure what if any relationship the two GUCs should
> have to one another.
> As long as they are in the same ballpark, since
> they are tuning for the same machine, I don't see any reason their
> values need to be adjusted together. However, if it is confusing for
> maintenance_io_concurrency default to be 10 while
> effective_io_concurrency default is 16, I'm happy to change it.

Hi Melanie,

dunno, I've just asked whether it isn't suspicious to anyone else except me
was always that to tune max IO queue depth you would do this:
a. up to N backends (up to max_connections; usually much lower) * e_io_c
b. autovacuum_max_workers * m_io_c
c. just one (standby/recovering) * m_io_c

The thing (for me) is: if we are allowing much higher IOPS in the "a"
scenario, then why can't the standby use just the same (if not higher)
IOPS for prefetching in the "c" scenario? After all, it is a much more
critical and latency-sensitive thing (lag).
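
For illustration, with assumed numbers: 100 active backends at e_io_c = 16
allow up to 100 * 16 = 1600 requests in flight under scenario "a", while the
single startup process under scenario "c" is capped at 1 * m_io_c = 10.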

> I just don't quite know what I would write in the commit message. "it seemed 
> confusing that they weren't the same?"

"To keep with the historical legacy of maintenance_io_concurrency
being traditionally higher than effective_io_concurrency ..."
OK, so right, nobody else commented on this, so maybe it's just me and
it was just a question after all.

-J.




Re: Draft for basic NUMA observability

2025-03-17 Thread Jakub Wartak
On Fri, Mar 14, 2025 at 1:08 PM Bertrand Drouvot
 wrote:
> On Fri, Mar 14, 2025 at 11:05:28AM +0100, Jakub Wartak wrote:
> > On Thu, Mar 13, 2025 at 3:15 PM Bertrand Drouvot
> >  wrote:
> >
> > Hi,
> >
> > Thank you very much for the review! I'm answering both reviews in
> > one go, and the result is the attached v12; it all should be solved
> > now:
>
> Thanks for v12!
>
> I'll review 0001 and 0003 later, but want to share what I've done for 0002.
>
> I did prepare a patch file (attached as .txt to not disturb the cfbot) to 
> apply
> on top of v11 0002 (I just rebased it a bit so that it now applies on top of
> v12 0002).

Hey Bertrand,

all LGTM (good ideas), so here's v13 attached with all of that applied
(rebased, tested). BTW: I'm sending this to make cfbot happy, as it still
tried to apply that .patch (on my side it's .patch, not .txt)

> === 9
>
> -static bool firstUseInBackend = true;
> +static bool firstNumaTouch = true;
>
> Looks better to me but still not 100% convinced by the name.

IMHO, Yes, it looks much better.

> === 10
>
>  static BufferCachePagesContext *
> -pg_buffercache_init_entries(FuncCallContext *funcctx, PG_FUNCTION_ARGS)
> +pg_buffercache_init_entries(FuncCallContext *funcctx, FunctionCallInfo 
> fcinfo)
>
> as PG_FUNCTION_ARGS is usually used for fmgr-compatible functions and there
> are a lot of examples in the code that make use of "FunctionCallInfo" for
> non-fmgr-compatible functions.

Cool, thanks.

> and also:
>
> === 11
>
> I don't like the fact that we iterate 2 times over NBuffers in
> pg_buffercache_numa_pages().
>
> But I'm having a hard time finding a better approach given the fact that
> pg_numa_query_pages() needs all the pointers "prepared" before it can be 
> called...
>
> Those 2 loops are probably the best approach, unless someone has a better 
> idea.

IMHO, it doesn't hurt and I've also not been able to think of any better idea.

> === 12
>
> Upthread you asked "Can you please take a look again on this, is this up to 
> the
> project standards?"
>
> Was the question about using pg_buffercache_numa_prepare_ptrs() as an inlined 
> wrapper?

Yes, this was for an earlier doubt regarding question "19" about
reviewing the code after removal of the `query_numa` variable. This is the
same code for the 2 loops; IMHO it is good now.

> What do you think? The comments, doc and code changes are just proposals and 
> are
> fully open to discussion.

They are great, thank You!

-J.
From 1a55056446fff06e0441d8d05a9e84832dbdc821 Mon Sep 17 00:00:00 2001
From: Jakub Wartak 
Date: Fri, 21 Feb 2025 10:19:35 +0100
Subject: [PATCH v13 1/3] Add optional dependency to libnuma (Linux-only) for
 basic NUMA awareness routines and add minimal src/port/pg_numa.c portability
 wrapper. Other platforms can be added later.

This also adds function pg_numa_available() that can be used to check if
the server was linked with NUMA support.

libnuma is unavailable on 32-bit builds, so due to the lack of an i386 shared
object we disable it there (it does not make sense anyway; i386 is a very
memory-limited platform even with PAE)

Author: Jakub Wartak 
Co-authored-by: Bertrand Drouvot 
Reviewed-by: Andres Freund 
Discussion: https://postgr.es/m/CAKZiRmxh6KWo0aqRqvmcoaX2jUxZYb4kGp3N%3Dq1w%2BDiH-696Xw%40mail.gmail.com
---
 .cirrus.tasks.yml   |  12 +-
 configure   |  87 ++
 configure.ac|  13 +++
 doc/src/sgml/func.sgml  |  13 +++
 doc/src/sgml/installation.sgml  |  21 
 meson.build |  23 
 meson_options.txt   |   3 +
 src/Makefile.global.in  |   1 +
 src/backend/utils/misc/guc_tables.c |   2 +-
 src/include/catalog/pg_proc.dat |   4 +
 src/include/pg_config.h.in  |   3 +
 src/include/port/pg_numa.h  |  46 
 src/include/storage/pg_shmem.h  |   1 +
 src/makefiles/meson.build   |   3 +
 src/port/Makefile   |   1 +
 src/port/meson.build|   1 +
 src/port/pg_numa.c  | 168 
 17 files changed, 397 insertions(+), 5 deletions(-)
 create mode 100644 src/include/port/pg_numa.h
 create mode 100644 src/port/pg_numa.c

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index 5849cbb839a..7010dff7aef 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -445,8 +445,10 @@ task:
 EOF
 
   setup_additional_packages_script: |
-#apt-get update
-#DEBIAN_FRONTEND=noninteractive apt-get -y install ...
+apt-get update
+DEBIAN_FRONTEND=noninteractive apt-get -y install \
+  libnuma1 \
+  libnuma-dev
 
   matrix:
 # SPECIAL:
@@ -471,6 +473,7 @@ task:
 --enable-cassert --enable-injection-points --enable-debug \
 --enable-tap-tests --enable-nls \
 --with-segsize-blocks=6 \
+--with-libnuma \
 \
 ${LINUX_CONFIGURE_FEATURES} \
 \
@@ -519,6 +522,7 @@ task:
  

RE: long-standing data loss bug in initial sync of logical replication

2025-03-17 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

> A few comments:
> ==
> 1.
> +SnapBuildDistributeSnapshotAndInval(SnapBuild *builder, XLogRecPtr
> lsn, TransactionId xid)
> {
> dlist_iter txn_i;
> ReorderBufferTXN *txn;
> + ReorderBufferTXN *curr_txn;
> +
> + curr_txn = ReorderBufferTXNByXid(builder->reorder, xid, false, NULL,
> InvalidXLogRecPtr, false);
> 
> The above is used to access invalidations from curr_txn. I am thinking
> about whether it would be better to expose a new function to get
> invalidations for a txn based on xid instead of getting
> ReorderBufferTXN. It would avoid any layering violation and misuse of
> ReorderBufferTXN by other modules.

Sounds reasonable. I introduced a new function,
ReorderBufferGetInvalidations(), which obtains the number of invalidations
and their list. ReorderBufferTXN() is not exported anymore.

> 2. The patch has a lot of tests to verify the same points. Can't we
> have one simple test using SQL API based on what Andres presented in
> an email [1]?

You meant that we need to test only the case reported by Andres, right? The
new version does that. To make the test faster, it was migrated to the
isolation tester instead of the TAP test.

> 3. I have made minor changes in the comments in the attached.

Thanks, I included.

PSA the new version for PG14-master. Special thanks to Hou for minimizing
the test code.

Best regards,
Hayato Kuroda
FUJITSU LIMITED



v19_REL_14-0001-Distribute-invalidatons-if-change-in-cata.patch
Description:  v19_REL_14-0001-Distribute-invalidatons-if-change-in-cata.patch


v19_REL_15-0001-Distribute-invalidatons-if-change-in-cata.patch
Description:  v19_REL_15-0001-Distribute-invalidatons-if-change-in-cata.patch


v19-0001-Distribute-invalidatons-if-change-in-catalog-tab.patch
Description:  v19-0001-Distribute-invalidatons-if-change-in-catalog-tab.patch


v19_REL_17-0001-Distribute-invalidatons-if-change-in-cata.patch
Description:  v19_REL_17-0001-Distribute-invalidatons-if-change-in-cata.patch


v19_REL_16-0001-Distribute-invalidatons-if-change-in-cata.patch
Description:  v19_REL_16-0001-Distribute-invalidatons-if-change-in-cata.patch


Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

2025-03-17 Thread Ashutosh Bapat
On Fri, Mar 14, 2025 at 5:36 PM Amit Langote  wrote:

>
> Thanks for the patch and the extensive benchmarking.
>
> Would you please also share a simple, self-contained example that I
> can use to reproduce and verify the performance improvements? It’s
> helpful to first see the patch in action with a representative query,
> and then refer to the more extensive benchmark results you've already
> shared as needed. I'm also having a hard time telling from the scripts
> which query was used to produce the numbers in the report.
>

Here are the steps:
1. Run setup.sql attached in [1]. It will add a few helper functions
and create the required tables (partitioned and non-partitioned ones).
2. The sheet named "raw data" attached to [1] has the queries whose
performance is measured. They use the tables created by setup.sql.

You may further use the same scripts to run benchmarks.

> Btw, in the commit message, you mention:
>
> ===
> When there are thousands of partitions in a partitioned table, there
> can be thousands of derived clauses in the list making it inefficient
> for a lookup.
> ===
>
> I haven’t found a description of how that many clauses end up in the
> ec->ec_derived list. IIUC, it's create_join_clause() where the child
> clauses get added, and it would be helpful to mention that, since that
> also appears to be the hotspot your patch is addressing.

you are right. create_join_clause() adds the derived clauses for
partitions. Please note that the optimization, being modelled after the
join rel list, is applicable to partitioned and non-partitioned cases,
with or without partitionwise join.

>
> > I think the first patch in the attached set is worth committing, just
> > to tighten things up.
>
> I agree and am happy to commit it if there are no objections.

Thanks.

[1] 
https://www.postgresql.org/message-id/CAExHW5vnwgTgfsCiNM7E4TnkxD1b_ZHPafNe1f041u=o131...@mail.gmail.com


--
Best Wishes,
Ashutosh Bapat




Re: pg_stat_statements and "IN" conditions

2025-03-17 Thread vignesh C
On Mon, 17 Mar 2025 at 13:42, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> > On Mon, Mar 17, 2025 at 12:07:44PM GMT, vignesh C wrote:
> >
> > I noticed that the feedback from Sami at [1] has not yet been
> > addressed, I have changed the status to Waiting on Author, kindly
> > address them and update the status to Needs review.
> > [1] - 
> > https://www.postgresql.org/message-id/CAA5RZ0vt29Om%2BtKFOcUNhXV%2BkKpNnj0yj6OFho3-wngcMHWnAQ%40mail.gmail.com
>
> I'm afraid there is a disagreement about this part of the feedback. I'm
> not yet convinced about the idea suggested over there (treating mutable
> functions in the same way as constants) and not planning to change
> anything, at least not in the current version of the patch.

@Sami Imseih Do you have any other suggestions to solve this?
Others: Any thoughts on which way is better in this case?

Regards,
Vignesh



