RE: [WIP]Vertical Clustered Index (columnar store extension) - take2
Hi Yura san,

> I just don't get, why it should be "in-memory"? All the same things you
> describe further, but storing in paged index on-disk with caching
> through shared_buffers - why this way it wouldn't work?

We make the columnar store resident in memory for maximum search performance. But I'm not very particular about this. Comments are welcome.

Best regards,
Aya Iwata
FUJITSU LIMITED

> -----Original Message-----
> From: Yura Sokolov
> Sent: Wednesday, January 15, 2025 11:44 PM
> To: pgsql-hackers@lists.postgresql.org
> Subject: Re: [WIP]Vertical Clustered Index (columnar store extension) - take2
>
> 07.10.2024 17:53, Aya Iwata (Fujitsu) wrote:
> > Hi All,
> >
> > Suggestions
> > ==
> >
> > When analyzing real-time data collected by PostgreSQL,
> > it can be difficult to tune the current PostgreSQL server for
> > satisfactory performance.
> >
> > Therefore, we propose Vertical Clustered Indexing (VCI), an in-memory
> > column store function that holds data in a state suitable for business
> > analysis and is also expected to improve analysis performance.
>
> I just don't get, why it should be "in-memory"? All the same things you
> describe further, but storing in paged index on-disk with caching
> through shared_buffers - why this way it wouldn't work?
Feature Recommendations for Logical Subscriptions
Business Scenario:

The BI department requires real-time data from the operational database. In our current approach (on PostgreSQL 14), we create a separate database within our department's real-time backup instance, set up a logical replication account, replicate the required tables to this isolated database via logical replication, and then create a dedicated account with column-level permissions on specific tables for the BI department.

Recommendations:

1. Column Filtering Functionality: During implementation, some tables may contain sensitive data or long fields (e.g., text columns), while other fields in these tables still need to be replicated to another database or instance. Manually specifying individual columns becomes cumbersome, especially for tables with many fields, and complicates future field additions. We recommend adding a column filtering feature to logical replication to streamline this process (see the sketch below).

2. Logical Replication Account Permissions: Logical replication permissions should be decoupled from general database access permissions. Proposed workflow: create a dedicated account with logical replication privileges only, create a logical replication slot, and grant this account access only to the authorized replication slot. This account would have no direct access to the database itself but could subscribe to and consume data from the permitted replication slot. This approach allows securely providing the account to the BI department. They can subscribe to the replication slot and perform downstream processing independently, without risking exposure of unauthorized data.

YeXiu
1518981...@qq.com
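Regarding recommendation 1, newer PostgreSQL releases (15 and later) already support per-table column lists on publications, which is the kind of filtering being asked for here. A minimal sketch with hypothetical table, column, and connection names:

```sql
-- Publisher: replicate only the non-sensitive columns of "orders".
CREATE PUBLICATION bi_pub FOR TABLE public.orders (order_id, created_at, amount);

-- Subscriber (the isolated BI database): receives only the listed columns.
CREATE SUBSCRIPTION bi_sub
    CONNECTION 'host=primary dbname=proddb user=replicator'
    PUBLICATION bi_pub;
```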
Re: Extend ALTER DEFAULT PRIVILEGES for large objects
On Fri, 4 Apr 2025 19:18:11 +0900 Fujii Masao wrote:
>
> On 2025/04/04 0:21, Fujii Masao wrote:
> > Thanks for updating the patch!
> >
> > If there are no objections, I'll proceed with committing it using the
> > following commit log.
>
> I've pushed the patch. Thanks!

Thank you!

> While testing the feature, I noticed that psql doesn't complete
> "ALTER DEFAULT PRIVILEGES GRANT/REVOKE ... ON LARGE OBJECTS" or
> "GRANT/REVOKE ... ON LARGE OBJECT ..." with TO/FROM. The attached
> patch adds tab-completion support for both cases.

This patch looks good to me. It works as expected.

While looking into this patch, I found that the tab completion suggests TO/FROM even after "LARGE OBJECT", which is not correct because a large object ID should appear at that place. The same applies to "FOREIGN SERVER": server names should be suggested rather than TO/FROM in this case. The additional patch 0002 fixes this by no longer suggesting TO or FROM right after LARGE OBJECT or FOREIGN SERVER, and by suggesting the list of foreign server names after FOREIGN SERVER. The 0001 patch is the same as the one you proposed.

Regards,
Yugo Nagata

--
Yugo NAGATA

>From 64d5aead5ab080d40fa85f9bd51cb0a09490266f Mon Sep 17 00:00:00 2001
From: Yugo Nagata
Date: Tue, 8 Apr 2025 12:15:45 +0900
Subject: [PATCH 2/2] psql: Some improvements to tab completion for GRANT/REVOKE

This prevents suggesting TO or FROM just after LARGE OBJECT or FOREIGN
SERVER, because a large object ID or server name should appear at that
place. It also suggests the list of foreign server names after
FOREIGN SERVER.
---
 src/bin/psql/tab-complete.in.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index bdeb95fb3c8..c58ed27e872 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -4567,10 +4567,18 @@ match_previous_words(int pattern_id,
 	else if (Matches("ALTER", "DEFAULT", "PRIVILEGES", MatchAnyN, "TO", MatchAny))
 		COMPLETE_WITH("WITH GRANT OPTION");
 	/* Complete "GRANT/REVOKE ... ON * *" with TO/FROM */
-	else if (Matches("GRANT", MatchAnyN, "ON", MatchAny, MatchAny))
-		COMPLETE_WITH("TO");
-	else if (Matches("REVOKE", MatchAnyN, "ON", MatchAny, MatchAny))
-		COMPLETE_WITH("FROM");
+	else if (Matches("GRANT|REVOKE", MatchAnyN, "ON", MatchAny, MatchAny))
+	{
+		if (TailMatches("FOREIGN", "SERVER"))
+			COMPLETE_WITH_QUERY(Query_for_list_of_servers);
+		else if (!TailMatches("LARGE", "OBJECT"))
+		{
+			if (Matches("GRANT", MatchAnyN, "ON", MatchAny, MatchAny))
+				COMPLETE_WITH("TO");
+			else
+				COMPLETE_WITH("FROM");
+		}
+	}
 	/* Complete "GRANT/REVOKE * ON ALL * IN SCHEMA *" with TO/FROM */
 	else if (TailMatches("GRANT|REVOKE", MatchAny, "ON", "ALL", MatchAny, "IN", "SCHEMA", MatchAny) ||
-- 
2.34.1

>From 8402036dcc5e32a249a7af0fb9e27c31ba8b72a6 Mon Sep 17 00:00:00 2001
From: Fujii Masao
Date: Fri, 4 Apr 2025 10:51:20 +0900
Subject: [PATCH 1/2] psql: Improve psql tab completion for GRANT/REVOKE on
 large objects.

This commit enhances psql's tab completion to support TO/FROM after
"GRANT/REVOKE ... ON LARGE OBJECT ...". Additionally, since "ALTER
DEFAULT PRIVILEGES" now supports large objects, tab completion is also
updated for "GRANT/REVOKE ... ON LARGE OBJECTS" with TO/FROM.
---
 src/bin/psql/tab-complete.in.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index c916b9299a8..bdeb95fb3c8 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -4602,6 +4602,26 @@ match_previous_words(int pattern_id,
 		COMPLETE_WITH("FROM");
 	}
 
+	/* Complete "GRANT/REVOKE * ON LARGE OBJECT *" with TO/FROM */
+	else if (TailMatches("GRANT|REVOKE", MatchAny, "ON", "LARGE", "OBJECT", MatchAny) ||
+			 TailMatches("REVOKE", "GRANT", "OPTION", "FOR", MatchAny, "ON", "LARGE", "OBJECT", MatchAny))
+	{
+		if (TailMatches("GRANT", MatchAny, MatchAny, MatchAny, MatchAny, MatchAny))
+			COMPLETE_WITH("TO");
+		else
+			COMPLETE_WITH("FROM");
+	}
+
+	/* Complete "GRANT/REVOKE * ON LARGE OBJECTS" with TO/FROM */
+	else if (TailMatches("GRANT|REVOKE", MatchAny, "ON", "LARGE", "OBJECTS") ||
+			 TailMatches("REVOKE", "GRANT", "OPTION", "FOR", MatchAny, "ON", "LARGE", "OBJECTS"))
+	{
+		if (TailMatches("GRANT", MatchAny, MatchAny, MatchAny, MatchAny))
+			COMPLETE_WITH("TO");
+		else
+			COMPLETE_WITH("FROM");
+	}
+
 	/* GROUP BY */
 	else if (TailMatches("FROM", MatchAny, "GROUP"))
 		COMPLETE_WITH("BY");
-- 
2.34.1
Re: [PATCH] clarify palloc comment on quote_literal_cstr
On Tue, Apr 8, 2025 at 10:49 AM Michael Paquier wrote: > > On Tue, Apr 08, 2025 at 08:47:53AM +0530, Ashutosh Bapat wrote: > > Thanks for addressing the comment. > > > > In PG code, we start a multiline comment with just /* on the first > > line and end with just */ on the last line. All the lines in-between > > start with * . Please check other comments in the file. > > > > I would write a., b. c. d. on separate lines. > > Hmm. I don't think that grouping all these details in the same > comment block is an improvement in this case compared to the first > version, where it is clear which part of the calculation applies to > what. In short, I'm OK with how things are on HEAD. Ah! I didn't realize it was committed already. -- Best Wishes, Ashutosh Bapat
Re: BAS_BULKREAD vs read stream
On Sun, Apr 6, 2025 at 10:15 PM Andres Freund wrote:
>
> Hi,
[..]
> The obvious solution to that would be to increase BAS_BULKREAD substantially
> above 256kB.
>
> For quite a while I was worried about increasing the size, because somewhere (I
> couldn't find it while writing this email, will add the reference once I
> refound it) we have a comment explaining that a small size was chosen because
> it helps with CPU cache efficiency.

Hi,

FWIW, I was trying to understand the scope of this change, and GetAccessStrategy() actually points the reader to src/backend/storage/buffer/README, which explains the logic behind the old (pre-commit) rationale and value. It says

```
For sequential scans, a 256KB ring is used. That's small enough to fit in L2
cache, which makes transferring pages from OS cache to shared buffer cache
efficient.
```

-J.
Re: track generic and custom plans in pg_stat_statements
After the introduction of the pg_overexplain extension and Robert's comment [0], I'm starting to have doubts about whether it's still appropriate to add this information to EXPLAIN itself. If there are compelling reasons why showing the plan type in EXPLAIN would be broadly useful to users, I'm happy to proceed. Otherwise, perhaps this is something better suited for an extension.

[0]: https://www.postgresql.org/message-id/CA%2BTgmoZ8qXiZmmn4P9Mk1cf2mjMMLFPOjSasCjuKSiHFcm-ncw%40mail.gmail.com

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
Re: Some problems regarding the self-join elimination code
On 4/2/25 15:26, Richard Guo wrote:
> While working on another patch, I looked at ChangeVarNodes() and found
> some issues introduced by the self-join elimination commit. I think
> it'd be better to fix or improve them.
>
> * ChangeVarNodes_walker includes special handling for RestrictInfo
> nodes. And it adjusts RestrictInfo.orclause only if the rt_index of the
> relation to be removed is contained in RestrictInfo.clause_relids. I
> don't think this is correct. Even if the to-be-removed relation is not
> present in clause_relids, it can still be found somewhere within the
> orclause. As an example, consider:

Yeah. While figuring out how we tolerated such a gaffe, I found that the comment on the clause_relids field is quite vague: "The relids (varnos+varnullingrels) actually referenced in the clause:", and only in the RestrictInfo description can you find some additional information. I think we should modify the check, uniting clause_relids and required_relids before searching for the relid being removed.

> + rinfo->num_base_rels = bms_num_members(rinfo->clause_relids);
>
> I don't think this is correct either. num_base_rels is the number of
> base rels in clause_relids and should not count outer-join relids. And
> we have no guarantee that clause_relids does not include any outer-join
> relids.

It is a clear mistake, no excuses for me.

> To fix, I think we should exclude all outer-join relids. Perhaps we can
> leverage root->outer_join_rels to achieve this.

I think we may use the difference in size of rinfo->clause_relids before and after the adjustment. If something was removed, just decrease num_base_rels.

> * Speaking of comments, I've noticed that some changes are lacking
> comments. For example, there are almost no comments regarding the
> special handling of RestrictInfo nodes in ChangeVarNodes_walker, except
> for an explanation about adding the NullTest.

Ok, it seems that comments were removed too hastily. I have not addressed this yet.

I also added a little code refactoring to make it more readable. See fix.diff in the attachment as my proposal for future discussion.
-- regards, Andrei Lepikhovdiff --git a/src/backend/rewrite/rewriteManip.c b/src/backend/rewrite/rewriteManip.c index 1c61085a0a7..df96b1bba6a 100644 --- a/src/backend/rewrite/rewriteManip.c +++ b/src/backend/rewrite/rewriteManip.c @@ -637,41 +637,58 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context) } if (IsA(node, RestrictInfo)) { - RestrictInfo *rinfo = (RestrictInfo *) node; - int relid = -1; - bool is_req_equal = - (rinfo->required_relids == rinfo->clause_relids); - bool clause_relids_is_multiple = - (bms_membership(rinfo->clause_relids) == BMS_MULTIPLE); - - if (bms_is_member(context->rt_index, rinfo->clause_relids)) + RestrictInfo *rinfo = (RestrictInfo *) node; + intrelid = -1; + bool is_req_equal = + (rinfo->required_relids == rinfo->clause_relids); + bool is_multiple = + (bms_membership(rinfo->clause_relids) == BMS_MULTIPLE); + + if (bms_is_member(context->rt_index, rinfo->clause_relids + /*bms_union(rinfo->clause_relids, rinfo->required_relids)*/)) { - expression_tree_walker((Node *) rinfo->clause, ChangeVarNodes_walker, (void *) context); - expression_tree_walker((Node *) rinfo->orclause, ChangeVarNodes_walker, (void *) context); + Relids new_clause_relids; - rinfo->clause_relids = -adjust_relid_set(rinfo->clause_relids, context->rt_index, context->new_index); + expression_tree_walker((Node *) rinfo->clause, + ChangeVarNodes_walker, (void *) context); + expression_tree_walker((Node *) rinfo->orclause, + ChangeVarNodes_walker, (void *) context); + + new_clause_relids = adjust_relid_set(rinfo->clause_relids, + context->rt_index, + context->new_index); + + /* This operation is legal until we remove only baserels */ + rinfo->num_base_rels -= bms_num_members(rinfo->clause_relids) - + bms_num_members(new_clause_relids); + + rinfo->clause_relids = new_clause_relids; rinfo->num_base_rels = bms_num_members(rinfo->clause_relids); - rinfo->left_relids = -adjust_relid_set(rinfo->left_relids, context->rt_index, context->new_index); - rinfo->right_relids = -adjust_relid_set(rinfo->right_relids, context->rt_index, context->new_index); + rinfo->left_relids = adjust_relid_set(rinfo->left_relids, + context->rt_index, + context->new_index); + rinfo->right_relids = adjust_relid_set(rinfo->right_relids, + context->rt_index, + context->new_index); } if (is_req_equal) rinfo->required_relids = rinfo->clause_relids; else - rinfo->required_relids = -adjust_relid_set(rinfo->required_relids, context->rt_index, context->new_index); + rinfo->required_relids = adjust_relid_set(rinfo->required_relids, + context->rt_index, + context->new_index); - rinfo->outer_relids = - adjust_relid_set(rinfo->outer_relids, context
Re: CREATE DATABASE with filesystem cloning
On Wed, Mar 26, 2025 at 12:47 AM Nazir Bilal Yavuz wrote:
> Rebased version of the patch is attached.

I noticed that it wasn't working on a Mac I tested because guc_tables.c needed an additional include, and fixed that. I also did some minor editing of the docs, mostly removing the mention of specific file systems, because that's bound to be incomplete (it already was, to my knowledge), so I think it's better to simply say which system calls it uses, since the effects will vary. And pushed.

Thanks for working on this! I think this might be a popular feature for some users.
Re: Move tests of contrib/spi/ out of the core regression tests
On 08/04/2025 04:59, Tom Lane wrote:
> The attached patch removes test cases concerned with contrib/spi/ from
> the core regression tests and instead puts them into new test files in
> contrib/spi/ itself.

+1

> My original motivation for looking at this was the discussion in [1]
> about whether to remove contrib/spi/refint.c entirely, since it's
> rather buggy and not a great example of our modern coding practices.
> So I wondered whether the core test cases that use it were contributing
> any significant amount of code coverage on the core code. (Spoiler:
> nope.) But I think this is generally good cleanup anyway, because it
> locates the test code for contrib/spi where a person would expect to
> find that, and removes some rather grotty coding in src/test/regress's
> Makefile and meson.build. As a side benefit, it removes some small
> number of cycles from the core tests, which seems like a good thing.
>
> The tests for the refint module are just moved over verbatim, except
> for using CREATE EXTENSION instead of manual declaration of the C
> functions. Also, I kept the tests for COMMENT ON TRIGGER in the core
> tests, by applying them to a different trigger.
>
> The tests for autoinc were kind of messy, because the behavior of
> autoinc itself was impossibly intertwined with the behavior of
> "ttdummy", which is an undocumented C function in regress.c. After
> some thought I decided we could just nuke ttdummy altogether, so the
> new autoinc.sql test is much simpler and more straightforward.

My only worry would be if 'ttdummy' were a good showcase for how to write a trigger function in C. But it's not a particularly good example. There is a better example in the docs, and there's the 'autoinc' trigger too.

> This is too late for v18 of course, so I'll park it in the next CF.

In my opinion this would still be totally OK for v18. It's just test changes. I would rather commit cleanups like this sooner than later.

--
Heikki Linnakangas
Neon (https://neon.tech)
Re: Horribly slow pg_upgrade performance with many Large Objects
Looked like a bit illogical order on re-reading it so I want to make clear that the pg_upgrade-like test showing 100min for 100 million LOs is at the end of last message and the proposed solution is at the beginning On Tue, Apr 8, 2025 at 9:15 AM Hannu Krosing wrote: > > I was testing on version 17 > > > On Tue, Apr 8, 2025 at 6:52 AM Michael Paquier wrote: > > > > On Mon, Apr 07, 2025 at 05:25:32PM -0400, Tom Lane wrote: > > > What version are you testing? We did some work in that area in the > > > v17 cycle (a45c78e32). > > > > I am puzzled by the target version used here, as well. > > I was testing on version 17 > > Here is how you can easily test too (as --binary-upgrade does not dump > the actual data it is ok for the test to not put anything there) > > hannuk@db01-c1a:~/work/lo-testing$ createdb -p 5433 lodb > hannuk@db01-c1a:~/work/lo-testing$ psql -p 5433 lodb > psql (17.4 (Ubuntu 17.4-1.pgdg22.04+2)) > Type "help" for help. > > lodb=# insert into pg_largeobject_metadata(oid, lomowner) SELECT i, > 16384 FROM generate_series(1, 100_000_000) g(i); > INSERT 0 1 > Time: 162414.216 ms (02:42.414) > lodb=# > \q > > hannuk@db01-c1a:~/work/lo-testing$ time pg_dump --data-only -t > pg_largeobject_metadata -p 5433 lodb | gzip > > pg_largeobject_metadata.data.gz > real 0m22.094s > user 0m20.741s > sys 0m2.085s > > hannuk@db01-c1a:~/work/lo-testing$ time pg_dump --data-only -t > pg_largeobject_metadata --format=custom -p 5433 lodb -f > pg_largeobject_metadata.dump > real 0m20.226s > user 0m18.068s > sys 0m0.824s > > > If there is > > more that can be improved, v19 would be the version to consider for > > future improvements at this stage. > > If the internal format has changed in 16 the correct way would be to > go through the data-only dump of pg_largeobject_metadata in all cases. > Even for the 100M case where you get the restore in 2 minutes instead > of 100 minutes > > hannuk@db01-c1a:~/work/lo-testing$ createdb -p 5434 lodb > hannuk@db01-c1a:~/work/lo-testing$ time pg_restore -p 5434 > --exit-on-error --transaction-size=1000 --dbname lodb > pg_largeobject_metadata.dump > > real 2m2.277s > user 0m2.594s > sys 0m0.549s > > And even in case of the user-visible format change in acl format it is > most likely that changing the visible format using some regexp magic, > or even a dedicated function, would still me much faster than creating > all the LOs though creation commands. > > -- > The commands I used to do the pg_upgrade-like test were > > hannuk@db01-c1a:~/work/lo-testing$ time pg_dump --schema-only > --quote-all-identifiers --binary-upgrade --format=custom > --file=lodb100m.dump -p 5433 lodb > real 1m58.241s > user 0m35.229s > sys 0m17.854s > > hannuk@db01-c1a:~/work/lo-testing$ time pg_restore -p 5434 > --exit-on-error --transaction-size=1000 --dbname lodb lodb100m.dump > real100m54.878s > user3m23.885s > sys 20m33.761s > > (I left out the --verbose part that pg_upgrade also sets as I did not > want to get 100M lines of "large object created " messages ) > > also the postgres server at -p 5434 needs to be started with -b flag > to accept the loading a dump from --binary-upgrade. In Debian/Ubuntu > this can be directly passed to pg_ctlcluster as follows > > sudo pg_ctlcluster 17 target -o -b > > > Hannu > > > > > > -- > > Michael
RE: [WIP]Vertical Clustered Index (columnar store extension) - take2
Hi Alvaro san, I am sorry for my late reply. I continue to work on proposing VCI feature to the community. > I think this is definitely an important and welcome development. > I'm looking forward to patches in this area. Thank you! I am currently preparing to share VCI designs with PGConf.dev. I look forward to sharing more about VCI with you. Best regards, Aya Iwata FUJITSU LIMITED
Re: Horribly slow pg_upgrade performance with many Large Objects
On Tue, Apr 8, 2025 at 12:17 AM Nathan Bossart wrote:
>
> On Mon, Apr 07, 2025 at 10:33:47PM +0200, Hannu Krosing wrote:
> > The obvious solution would be to handle the table
> > `pg_largeobject_metadata` the same way as we currently handle
> > `pg_largeobject` by not doing anything with it in `pg_dump
> > --binary-upgrade` and just handle its contents like we do for user
> > tables in pg_upgrade itself.
> >
> > This should work fine for all source database versions starting from PgSQL
> > v12.
>
> Unfortunately, the storage format for aclitem changed in v16, so this would
> need to be restricted to upgrades from v16 and newer.

Have we also changed the external format of aclitem any time since v9.2, or are the changes just to storage?

If external formats have been stable, we can still get reasonable performance by dumping the data (2 min for 100M rows). Plus, dumping data would work for all the supported source versions.

The worst case would still be quite bad, with 80+ min for the full set of 4 billion LOs, but even that would be much better than the 3 days with the current way.

> That being said, I
> regularly hear about slow upgrades with many LOs, so I think it'd be
> worthwhile to try to improve matters in v19.

Changing the LO export to dumping pg_largeobject_metadata content instead of creating the LOs should be a nice small change confined to pg_dump --binary-upgrade only, so perhaps we could still squeeze it into v18.

--
Hannu
Re: Fix 035_standby_logical_decoding.pl race conditions
Hi,

On 2025-04-08 02:00:35 +, Hayato Kuroda (Fujitsu) wrote:
> > I have changed quite a few comments and commit message for the PG17
> > patch in the attached. Can you update PG16 patch based on this and
> > also use the same commit message as used in attached for all the three
> > patches?
>
> Your patch looks good to me and it could pass on my env. PSA patches for PG16.
> Patch for PG17 is not changed, just renamed.

Thanks all for working together to fix this. These test failures were really rather painful!

Now we just need to fix the issue that causes random CI failures on Windows and the one that causes similar, but different, random failures on macOS...

Greetings,

Andres Freund
Re: [PATCH] clarify palloc comment on quote_literal_cstr
On Tue, Apr 8, 2025 at 12:08 AM Steve Chavez wrote: > > I haven't found a similar style of comment on any other function call. > > I've attached a new patch using the style you suggest. > > That being said, I do find the first form much more readable, but I > understand this is a debatable subject. Thanks for addressing the comment. In PG code, we start a multiline comment with just /* on the first line and end with just */ on the last line. All the lines in-between start with * . Please check other comments in the file. I would write a., b. c. d. on separate lines. -- Best Wishes, Ashutosh Bapat
Re: BUG #18815: Logical replication worker Segmentation fault
On Mon, Apr 7, 2025 at 3:28 PM Zhijie Hou (Fujitsu) wrote: > > > What is the > > behavior of conflict reporting code in case of exclusion constraints? > > Under logical replication context, since we do not detect conflicts for > exclusion > constraints, it would simply report the original constraint violation ERROR. > Fair enough. On considering it again, I find your idea of building conflict-related information when it is actually required sounds better, as it may also save us performance in some corner cases. -- With Regards, Amit Kapila.
Remove unnecessary static type qualifiers
Hi, all

When I read the libpq source code, I found an unnecessary static type qualifier in PQsetClientEncoding().

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 0258d9ace3c..300ddfffd55 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -7717,7 +7717,7 @@ int
 PQsetClientEncoding(PGconn *conn, const char *encoding)
 {
 	char		qbuf[128];
-	static const char query[] = "set client_encoding to '%s'";
+	const char	query[] = "set client_encoding to '%s'";
 	PGresult   *res;
 	int			status;

--
Regards,
Japin Li
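For readers following along, here is a minimal standalone C sketch (not libpq code) of what the qualifier changes: with `static`, the array has static storage duration and is materialized once for the whole program run; without it, the array is an automatic variable that is conceptually initialized on each call (compilers usually generate equivalent code for a read-only string like this).

```c
#include <stdio.h>

static void with_static(void)
{
	/* one copy for the whole program run, static storage duration */
	static const char query[] = "set client_encoding to '%s'";

	printf("with static:    %zu bytes\n", sizeof(query));
}

static void without_static(void)
{
	/* automatic storage, (re)initialized each time the function runs */
	const char query[] = "set client_encoding to '%s'";

	printf("without static: %zu bytes\n", sizeof(query));
}

int main(void)
{
	with_static();
	without_static();
	return 0;
}
```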
Re: Can we use Statistics Import and Export feature to perforamance testing?
On Tue, 8 Apr 2025 at 12:21, Ryohei Takahashi (Fujitsu) wrote:
> By using the Statistics Import and Export feature, is it possible to achieve the
> above request by the following procedure?
> (1) Export the statistics from the production environment by using pg_dump --statistics-only.
> (2) On the staging environment, set the autovacuum related parameters to prevent autovacuum from running.
> (3) Import the statistics to the staging environment by using the result of (1).

You could certainly test the performance, but this method isn't guaranteed to give meaningful results just because the table stats match.

One important thing to remember is that the planner also looks at the *actual size* of the relation and takes that into account when scaling the statistics (see table_block_relation_estimate_size() in tableam.c). If the table sizes don't match between the two servers, then there's no guarantee the planner will produce the same plan.

Also, there might be other subtleties regarding OIDs of indexes, which are not guaranteed to be the same after dump/restore. Given some fuzzily close enough cost estimates (see add_path() and compare_path_costs_fuzzily()), it is possible a plan would switch to using another index if the order of the indexes sorted by OID didn't match on each server. The chances of that might be fairly small, but not zero.

You'd also need to ensure the configs are the same in terms of the GUCs that are used for costs.

You could probably use get_relation_info_hook to overwrite the sizes and make sure the indexes are in the same order, etc.

David
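For concreteness, a rough sketch of the three-step procedure quoted above, with hypothetical database names and connection details; it assumes client tools new enough to support pg_dump --statistics-only, and it is subject to all of the caveats described in the reply:

```sh
# (1) Export only statistics from production (no schema, no data).
pg_dump --statistics-only -Fc -f prod_stats.dump -d proddb

# (2) On staging, keep autovacuum/autoanalyze from overwriting the imported stats.
psql -d stagingdb -c "ALTER SYSTEM SET autovacuum = off;"
psql -d stagingdb -c "SELECT pg_reload_conf();"

# (3) Import the statistics into the staging database.
pg_restore -d stagingdb prod_stats.dump
```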
Re: Horribly slow pg_upgrade performance with many Large Objects
I was testing on version 17

On Tue, Apr 8, 2025 at 6:52 AM Michael Paquier wrote:
>
> On Mon, Apr 07, 2025 at 05:25:32PM -0400, Tom Lane wrote:
> > What version are you testing? We did some work in that area in the
> > v17 cycle (a45c78e32).
>
> I am puzzled by the target version used here, as well.

I was testing on version 17

Here is how you can easily test too (as --binary-upgrade does not dump the actual data, it is OK for the test to not put anything there):

hannuk@db01-c1a:~/work/lo-testing$ createdb -p 5433 lodb
hannuk@db01-c1a:~/work/lo-testing$ psql -p 5433 lodb
psql (17.4 (Ubuntu 17.4-1.pgdg22.04+2))
Type "help" for help.

lodb=# insert into pg_largeobject_metadata(oid, lomowner) SELECT i,
16384 FROM generate_series(1, 100_000_000) g(i);
INSERT 0 1
Time: 162414.216 ms (02:42.414)
lodb=# \q

hannuk@db01-c1a:~/work/lo-testing$ time pg_dump --data-only -t
pg_largeobject_metadata -p 5433 lodb | gzip > pg_largeobject_metadata.data.gz
real	0m22.094s
user	0m20.741s
sys	0m2.085s

hannuk@db01-c1a:~/work/lo-testing$ time pg_dump --data-only -t
pg_largeobject_metadata --format=custom -p 5433 lodb -f pg_largeobject_metadata.dump
real	0m20.226s
user	0m18.068s
sys	0m0.824s

> If there is
> more that can be improved, v19 would be the version to consider for
> future improvements at this stage.

If the internal format has changed in 16, the correct way would be to go through the data-only dump of pg_largeobject_metadata in all cases. Even for the 100M case, where you get the restore in 2 minutes instead of 100 minutes:

hannuk@db01-c1a:~/work/lo-testing$ createdb -p 5434 lodb
hannuk@db01-c1a:~/work/lo-testing$ time pg_restore -p 5434
--exit-on-error --transaction-size=1000 --dbname lodb
pg_largeobject_metadata.dump
real	2m2.277s
user	0m2.594s
sys	0m0.549s

And even in the case of the user-visible change in the acl format, it is most likely that changing the visible format using some regexp magic, or even a dedicated function, would still be much faster than creating all the LOs through creation commands.

--

The commands I used to do the pg_upgrade-like test were

hannuk@db01-c1a:~/work/lo-testing$ time pg_dump --schema-only
--quote-all-identifiers --binary-upgrade --format=custom
--file=lodb100m.dump -p 5433 lodb
real	1m58.241s
user	0m35.229s
sys	0m17.854s

hannuk@db01-c1a:~/work/lo-testing$ time pg_restore -p 5434
--exit-on-error --transaction-size=1000 --dbname lodb lodb100m.dump
real	100m54.878s
user	3m23.885s
sys	20m33.761s

(I left out the --verbose part that pg_upgrade also sets, as I did not want to get 100M lines of "large object created" messages.)

Also, the postgres server at -p 5434 needs to be started with the -b flag to accept loading a dump from --binary-upgrade. In Debian/Ubuntu this can be passed directly to pg_ctlcluster as follows:

sudo pg_ctlcluster 17 target -o -b

Hannu

> --
> Michael
Re: Draft for basic NUMA observability
On 4/8/25 01:26, Shinoda, Noriyoshi (SXD Japan FSI) wrote: > Hi, > > Thanks for developing this great feature. > The manual says that the 'size' column of the pg_shmem_allocations_numa view > is 'int4', but the implementation is 'int8'. > The attached small patch fixes the manual. > Thank you for noticing this and for the fix! Pushed. This also reminded me we agreed to change page_num to bigint, which I forgot to change before commit. So I adjusted that too, separately. regards -- Tomas Vondra
Re: BAS_BULKREAD vs read stream
Hi, On 2025-04-08 18:11:04 +1200, Thomas Munro wrote: > On Tue, Apr 8, 2025 at 2:20 PM Andres Freund wrote: > > In the attached I implemented the above idea, with some small additional > > refinements: > > LGTM. Thanks for checking. > How I wish EXPLAIN would show some clues about strategies... Indeed. There will be some interesting piercing of layers to make that work... Greetings, Andres Freund
Re: Enhancing Memory Context Statistics Reporting
> On 8 Apr 2025, at 10:03, Daniel Gustafsson wrote: > There was a bug in the shmem init function which caused it to fail on Windows, > the attached fixes that. With this building green in CI over several re-builds, and another pass over the docs and code with pgindent etc done, I pushed this earlier today. A few BF animals have built green so far but I will continue to monitor it. -- Daniel Gustafsson
RE: Can we use Statistics Import and Export feature to perforamance testing?
Hi, Thank you for your reply. I understand that the access plans are not guaranteed to be the same. Can we add these notes to the pg_dump page in the PostgreSQL Documentation in order to prevent users from asking the same question? Regards, Ryohei Takahashi
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Tue, Apr 8, 2025 at 08:04:22AM -0700, Jacob Champion wrote: > On Tue, Apr 8, 2025 at 7:32 AM Bruce Momjian wrote: > > Uh, where are we on the inclusion of curl in our build? Maybe it was > > explained but I have not seen it. > > The above is discussing a patch to split this into its own loadable > module. Andres and Christoph's feedback has been shaping where we put > that module, exactly. Uh, I was afraid that was the case, which is why I asked. We have just hit feature freeze, so it is not good that we are still "shaping" the patch. Should we consider reverting it? It is true we still "adjust" patches after feature freeze. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: Feature freeze
Daniel Gustafsson writes: > On 8 Apr 2025, at 16:59, Bruce Momjian wrote: >> Frankly, I think the name "anywhere on Earth" is confusing, since it >> really is "everywhere on Earth": > I find both of the above needlessly confusing when we instead could use UTC > which is a more universally understood concept. Yeah, I always have to mentally translate "0:00 AoE" to "noon UTC", and then I can figure out when it is. I'd prefer we used the UTC formulation. regards, tom lane
Re: Draft for basic NUMA observability
Hi,

On 2025-04-08 17:44:19 +0500, Kirill Reshke wrote:
> On Mon, 7 Apr 2025 at 23:00, Tomas Vondra wrote:
> > I'll let the CI run the tests on it, and
> > then will push, unless someone has more comments.
>
> Hi! I noticed a strange failure after this commit[0]
>
> Looks like it is related to 65c298f61fc70f2f960437c05649f71b862e2c48
>
> In file included from ../pgsql/src/include/postgres.h:49,
>                  from ../pgsql/src/port/pg_numa.c:16:
> ../pgsql/src/include/utils/elog.h:79:10: fatal error: utils/errcodes.h: No such file or directory
>    79 | #include "utils/errcodes.h"
>       |          ^~
> compilation terminated.

$ ninja -t missingdeps
Missing dep: src/port/libpgport.a.p/pg_numa.c.o uses src/include/utils/errcodes.h (generated by CUSTOM_COMMAND)
Missing dep: src/port/libpgport_shlib.a.p/pg_numa.c.o uses src/include/utils/errcodes.h (generated by CUSTOM_COMMAND)
Processed 2384 nodes.
Error: There are 2 missing dependency paths.
2 targets had depfile dependencies on 1 distinct generated inputs (from 1 rules) without a non-depfile dep path to the generator.
There might be build flakiness if any of the targets listed above are built alone, or not late enough, in a clean output directory.

I think it's not right that something in src/port defines an SQL-callable function. The set of headers needed for that pulls in a lot of things.

Since the file relies on things like GUCs, I suspect this should be in src/backend/port or such instead.

Greetings,

Andres Freund
Re: Move tests of contrib/spi/ out of the core regression tests
On 2025-04-08 Tu 5:21 AM, Heikki Linnakangas wrote: This is too late for v18 of course, so I'll park it in the next CF. In my opinion this would still be totally OK for v18. It's just test changes. I would rather commit cleanups like this sooner than later. Yeah, that was my reaction too. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Some problems regarding the self-join elimination code
On Tue, 8 Apr 2025 at 12:31, Andrei Lepikhov wrote: > > On 4/4/25 09:37, Richard Guo wrote: > > On Wed, Apr 2, 2025 at 10:26 PM Richard Guo wrote: > > With more exposure to the changes in ChangeVarNodes(), I have some > > more concerns. As I understand it, ChangeVarNodes() is designed to > > update the varno fields of Var nodes in the given tree that belong to > > the specific relation; neat and clear. However, now, within this > > function, we also create new NullTest quals for SJE, which doesn't > > seem like something this function should be handling. It makes this > > function look a bit like a kludge. > To be precise, inside the function we replace old clause with the > NullTest, because relid replacement action made it degenerate. It seems > essential to me and I think it may be ok to add a comment at the top of > ChangeVarNodes, describing this minor optimisation. > I recall raising the same concerns when I originally reviewed the patch. I don't think this code belongs in the rewriter, because it's very specific to how SJE wants to handle these kinds of nodes. Note also that RestrictInfo is defined in nodes/pathnodes.h, which describes its contents as internal data structures for the planner, suggesting that the rewriter has no business processing those kinds of nodes. I don't think simply adding a comment addresses those concerns. > > Additionally, I have a minor suggestion regarding the new boolean > > parameter, "change_RangeTblRef". AFAIU, by convention, we typically > > use flags to control the behavior of walker functions, like in > > pull_var_clause. While it's fine to use a boolean parameter here > > given that we currently only have one flag, for the sake of future > > extensibility, I think using flags might be a better choice. > That was exactly why I wasn't happy with replacing the change_relid > routine with ChangeVarNodes. > But variant with a flag seems non-trivial to implement. Do you have any > proposal about the code? > Perhaps the way to do it is to make ChangeVarNodesExtended() take a callback function to be invoked for each node visited. The callback (which would then be in the planner, with the other SJE code) would do any special handling it needed (for RangeTblRef and RestrictInfo nodes), and call ChangeVarNodes_walker() for any other types of node, to get the default behaviour. Regards, Dean
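To make the suggestion concrete, here is a rough, hypothetical sketch of what a callback-based ChangeVarNodesExtended() could look like; the names, fields, and signatures below are illustrative only and are not the actual PostgreSQL code:

```c
/* Hypothetical callback type: return true if the node was fully handled. */
typedef bool (*ChangeVarNodes_callback) (Node *node,
										 ChangeVarNodes_context *context);

/*
 * Hypothetical walker variant that gives the caller a chance to handle
 * special node types (e.g. RangeTblRef and RestrictInfo for self-join
 * elimination, with that code living in the planner) before falling back
 * to the default Var-renumbering behaviour.
 */
static bool
ChangeVarNodesExtended_walker(Node *node, ChangeVarNodes_context *context)
{
	if (node == NULL)
		return false;

	/* Let the caller-supplied hook claim the node first. */
	if (context->callback && context->callback(node, context))
		return false;

	/* Default behaviour for everything else. */
	return ChangeVarNodes_walker(node, (void *) context);
}
```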
Re: prevent 006_transfer_modes.pl from leaving files behind
Andrew Dunstan writes: > On 2025-04-07 Mo 7:41 PM, Michael Paquier wrote: >> delete_old_cluster.sh would be left around even if not using a VPATH >> build with ./configure (your commit message does not mention that). >> Even if .gitignore discards it, the file is here. > I don't think that matters. In non-vpath builds we expect the source > directory to be scribbled on. All sorts of things might be left around. It's okay as long as .gitignore ignores it and "make clean" removes it. When/if we give up makefile builds, a lot of these maintenance issues will go away... regards, tom lane
Re: Feature freeze
On Tue, Apr 8, 2025 at 9:30 PM Peter Eisentraut wrote: > > On 08.04.25 16:59, Bruce Momjian wrote: > > On Tue, Apr 8, 2025 at 10:36:45AM -0400, Bruce Momjian wrote: > >> Since we recorded feature freeze as April 8, 2025 0:00 AoE (anywhere on > >> Earth): > >> > >> > >> https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items#Important_Dates > >> https://www.timeanddate.com/time/zones/aoe > >> > >> and it is now 2:34 AM AoE, I guess we are now in feature freeze. > > > > Frankly, I think the name "anywhere on Earth" is confusing, since it > > really is "everywhere on Earth": > > > > https://en.wikipedia.org/wiki/Anywhere_on_Earth > > > > Anywhere on Earth (AoE) is a calendar designation that indicates > > that a period expires when the date passes everywhere on Earth. > > Yes, that works intuitively when you specify that sometimes ends when a > certain day ends, for example: > > "The feature development phase ends at the end of day of April 7, AoE." > > That means, everyone everywhere can just look up at their clock and see, > it's still April 7, it's still going. (Of course, others can then do > the analysis and keep going until some time on April 8, but that would > be sort of against the spirit.) > > If you use it as a time zone with a time of day, it doesn't make > intuitive sense. +1. I think the idea is too simple to be acceptable. Any timezone based deadline might be seen as unfair to those for whom that time falls in their respective nights. AoE removes that unfairness. -- Best Wishes, Ashutosh Bapat
Re: Enhancing Memory Context Statistics Reporting
On 2025/04/08 18:46, Daniel Gustafsson wrote:
> > On 8 Apr 2025, at 10:03, Daniel Gustafsson wrote:
> > There was a bug in the shmem init function which caused it to fail on Windows,
> > the attached fixes that.
>
> With this building green in CI over several re-builds, and another pass over
> the docs and code with pgindent etc done, I pushed this earlier today. A few
> BF animals have built green so far but I will continue to monitor it.

Thanks for committing this feature!

I noticed that the third argument of pg_get_process_memory_contexts() is named "retries" in pg_proc.dat, while the documentation refers to it as "timeout". Since "retries" is misleading, how about renaming it to "timeout" in pg_proc.dat? Patch attached.

Also, as I mentioned earlier, I encountered an issue when calling pg_get_process_memory_contexts() on the PID of a backend that had just encountered an error but hadn't finished rolling back. It led to the following situation:

Session 1 (PID=70011):
=# begin;
=# select 1/0;
ERROR: division by zero

Session 2:
=# select * from pg_get_process_memory_contexts(70011, false, 10);

Session 1 terminated with:
ERROR: ResourceOwnerEnlarge called after release started
FATAL: terminating connection because protocol synchronization was lost

Shouldn't this be addressed?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

From a79084fe3caaf791f2aa8d466603c085ccb8c5af Mon Sep 17 00:00:00 2001
From: Fujii Masao
Date: Wed, 9 Apr 2025 01:27:48 +0900
Subject: [PATCH v1] Rename misleading argument in
 pg_get_process_memory_contexts().

Previously, the third argument of pg_get_process_memory_contexts() was
named retries in pg_proc.dat, even though it actually specifies a
timeout value in seconds. This name was misleading to users and
inconsistent with the documentation, which correctly referred to it as
timeout.

This commit renames the argument to timeout in pg_proc.dat to improve
clarity and maintain consistency with the documentation.
---
 src/include/catalog/pg_proc.dat | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 4708f55be18..62beb71da28 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8578,7 +8578,7 @@
   prorettype => 'record', proargtypes => 'int4 bool float8',
   proallargtypes => '{int4,bool,float8,text,text,text,_int4,int4,int8,int8,int8,int8,int8,int4,timestamptz}',
   proargmodes => '{i,i,i,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid, summary, retries, name, ident, type, path, level, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts, stats_timestamp}',
+  proargnames => '{pid, summary, timeout, name, ident, type, path, level, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts, stats_timestamp}',
   prosrc => 'pg_get_process_memory_contexts' },

 # non-persistent series generator
-- 
2.49.0
Re: [PoC] Federated Authn/z with OAUTHBEARER
> On 8 Apr 2025, at 18:33, Bruce Momjian wrote:
> Would we have to put out minor releases for curl CVEs? I don't think we
> have to for OpenSSL so would curl be the same?

Why do you envision this being different from all other dependencies we have? For OpenSSL we also happily build against a version (and until recently, several versions) which is EOL and doesn't even receive security fixes.

--
Daniel Gustafsson
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Tue, Apr 8, 2025 at 9:33 AM Bruce Momjian wrote: > On Tue, Apr 8, 2025 at 09:17:03AM -0700, Jacob Champion wrote: > > It allows packagers to ship the OAuth library separately, so end users > > that don't want the additional exposure don't have to install it at > > all. > > Okay, so how would they do that? I understand how that would happen if > it was an external extension, but how if it is under /src or /contrib. By adding the new .so to a different package. For example, RPM specs would just let you say "hey, this .so I just built doesn't go into the main client package, it goes into an add-on that depends on the client package." It's the same way separate client and server packages get generated from the same single build of Postgres. > Would we have to put out minor releases for curl CVEs? In general, no. Thanks, --Jacob
Re: Feature freeze
On Tue, Apr 8, 2025 at 12:29 PM Andrew Dunstan wrote: > The fact that there is this confusion is an indication that the AoE > experiment is a failure. If it's not obvious, and people have to think > about it, then it's not working. And I bet there is a huge number of > people who have never even heard of it. Specify some time and data at > UTC and everyone will understand. +1. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Enhancing Memory Context Statistics Reporting
> On 8 Apr 2025, at 18:41, Fujii Masao wrote:
> On 2025/04/08 18:46, Daniel Gustafsson wrote:
>>> On 8 Apr 2025, at 10:03, Daniel Gustafsson wrote:
>>> There was a bug in the shmem init function which caused it to fail on
>>> Windows, the attached fixes that.
>> With this building green in CI over several re-builds, and another pass over
>> the docs and code with pgindent etc done, I pushed this earlier today. A few
>> BF animals have built green so far but I will continue to monitor it.
>
> Thanks for committing this feature!
>
> I noticed that the third argument of pg_get_process_memory_contexts() is named
> "retries" in pg_proc.dat, while the documentation refers to it as "timeout".
> Since "retries" is misleading, how about renaming it to "timeout" in pg_proc.dat?
> Patch attached.

Ugh, that's my bad. It was changed from using retries to a timeout and I missed that.

> Also, as I mentioned earlier, I encountered an issue when calling
> pg_get_process_memory_contexts() on the PID of a backend that had just
> encountered an error but hadn't finished rolling back. It led to
> the following situation:
>
> Session 1 (PID=70011):
> =# begin;
> =# select 1/0;
> ERROR: division by zero
>
> Session 2:
> =# select * from pg_get_process_memory_contexts(70011, false, 10);
>
> Session 1 terminated with:
> ERROR: ResourceOwnerEnlarge called after release started
> FATAL: terminating connection because protocol synchronization was lost
>
> Shouldn't this be addressed?

Sorry, this must've been missed in this fairly long thread; I will have a look at it tonight.

--
Daniel Gustafsson
Finalizing read stream users' flag choices
Hi,

Over the course of the last two releases, we have added many read stream users. Each user specifies any number of flags (defined at the top of read_stream.h) which govern different aspects of the read stream behavior. There are a few inconsistencies (many caused by me) that I want to iron out and gain consensus on.

The first is whether maintenance_io_concurrency or effective_io_concurrency affects the readahead distance. We've said before that maintenance_io_concurrency should govern work done on behalf of many different sessions. That was said to include at least vacuum and recovery. I need to change the index vacuum users to use READ_STREAM_MAINTENANCE. But I wonder about the other users like amcheck and autoprewarm. Another related question is if/how we should document which of these are controlled by effective_io_concurrency or maintenance_io_concurrency.

The second is related to how they ramp up the size of IOs and the number read ahead: READ_STREAM_DEFAULT ramps up the prefetch distance gradually, while READ_STREAM_FULL starts at full distance immediately. Some of the users specify DEFAULT and others don't (it is defined as 0, so this is fine technically). Perhaps that should be explicit for all of them? Separately, Thomas Munro has mentioned he thinks we should remove READ_STREAM_FULL.

And a somewhat related point: with buffered IO, READ_STREAM_SEQUENTIAL disables prefetching to encourage OS readahead. I don't know if any users other than sequential scan should do this.

Other than the obvious issue with index vacuuming read stream users needing to set READ_STREAM_MAINTENANCE, the other questions are subjective. Below are all of the read stream users in master and their current flags.

sequential scan: READ_STREAM_SEQUENTIAL | READ_STREAM_USE_BATCHING

bitmap heap scan: READ_STREAM_DEFAULT | READ_STREAM_USE_BATCHING

phase I heap vacuum: READ_STREAM_MAINTENANCE

phase II index vacuuming:
  btree index vacuuming: READ_STREAM_FULL | READ_STREAM_USE_BATCHING
  spgist vacuum: READ_STREAM_FULL | READ_STREAM_USE_BATCHING
  gist vacuum: READ_STREAM_FULL | READ_STREAM_USE_BATCHING

phase III heap vacuuming: READ_STREAM_MAINTENANCE | READ_STREAM_USE_BATCHING

analyze: READ_STREAM_MAINTENANCE | READ_STREAM_USE_BATCHING

amcheck:
  with skipping: READ_STREAM_DEFAULT
  without skipping: READ_STREAM_SEQUENTIAL | READ_STREAM_FULL | READ_STREAM_USE_BATCHING

autoprewarm: READ_STREAM_DEFAULT | READ_STREAM_USE_BATCHING

pg_prewarm: READ_STREAM_FULL | READ_STREAM_USE_BATCHING

pg_visibility:
  collect visibility data: READ_STREAM_FULL | READ_STREAM_USE_BATCHING
  collect corrupt items: READ_STREAM_FULL

createdb: READ_STREAM_FULL | READ_STREAM_USE_BATCHING

- Melanie
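For readers less familiar with the API, here is a rough backend-code sketch (based on the read_stream.h interface, with details simplified) of how a read stream user picks its flags when starting a stream; READ_STREAM_MAINTENANCE is the flag being discussed for the index-vacuum users:

```c
#include "postgres.h"

#include "storage/bufmgr.h"
#include "storage/read_stream.h"
#include "utils/rel.h"

/*
 * Simplified sketch: stream all blocks of "rel" using the generic
 * block-range callback. The flag choice on read_stream_begin_relation()
 * is the point of interest here.
 */
static void
scan_relation_with_stream(Relation rel)
{
	BlockRangeReadStreamPrivate p;
	ReadStream *stream;
	Buffer		buf;

	p.current_blocknum = 0;
	p.last_exclusive = RelationGetNumberOfBlocks(rel);

	/* A maintenance-style user would pass READ_STREAM_MAINTENANCE here. */
	stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE |
										READ_STREAM_USE_BATCHING,
										NULL,	/* buffer access strategy */
										rel,
										MAIN_FORKNUM,
										block_range_read_stream_cb,
										&p,
										0);	/* no per-buffer data */

	while ((buf = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
	{
		/* ... inspect the page here ... */
		ReleaseBuffer(buf);
	}

	read_stream_end(stream);
}
```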
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Tue, Apr 8, 2025 at 09:43:01AM -0700, Jacob Champion wrote: > On Tue, Apr 8, 2025 at 9:33 AM Bruce Momjian wrote: > > On Tue, Apr 8, 2025 at 09:17:03AM -0700, Jacob Champion wrote: > > > It allows packagers to ship the OAuth library separately, so end users > > > that don't want the additional exposure don't have to install it at > > > all. > > > > Okay, so how would they do that? I understand how that would happen if > > it was an external extension, but how if it is under /src or /contrib. > > By adding the new .so to a different package. For example, RPM specs > would just let you say "hey, this .so I just built doesn't go into the > main client package, it goes into an add-on that depends on the client > package." It's the same way separate client and server packages get > generated from the same single build of Postgres. Do we have any idea how many packagers are interested in doing this? > > Would we have to put out minor releases for curl CVEs? > > In general, no. Good. FYI, I saw bug bounty dollar amounts next to each curl CVE: https://curl.se/docs/security.html No wonder some people ask for bounties. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Commitfest 2025 March is now closed
Hi,

Thanks a lot to all the members who participated in the commitfest. Here are the final numbers at the end of the commitfest:

 status               | End of Commitfest
----------------------+------------------
 Needs review:        |  54
 Waiting on Author:   |  47
 Ready for Committer: |  18
 Committed:           | 150
 Moved to next CF:    |  38
 Withdrawn:           |  22
 Rejected:            |   6
 RWF:                 |   8
 Total:               | 343

In comparison to previous March commitfests, we've made remarkable progress this time by committing an impressive 150 entries.

2025: 150 committed
2024: 131 committed
2023: 127 committed
2022: 98 committed
2021: 122 committed
2020: 90 committed

A special thanks to the reviewers/committers who spent tireless effort in moving the patches forward.

As per the developer meeting at FOSDEM, patches should only be moved forward by someone involved in the patch who knows that the patch is actually being worked on, as mentioned at [1]. So I request the respective members to move their commitfest entries to the next commitfest.

[1] - https://www.postgresql.org/message-id/flat/003e3a66-8fcc-4ca0-9e0e-c0afda1c9424%40eisentraut.org

Regards,
Vignesh
Re: Adding NetBSD and OpenBSD to Postgres CI
Hi,

The meson version was upgraded to 1.7.0 in NetBSD, and that causes ecpg tests to fail with [1]:

'Could not open file /home/postgres/postgres/build/src/interfaces/ecpg/test/compat_informix/dec_test.c for reading'

This is already fixed in b2bdb972c0; the attached patch just adds $MBUILD_TARGET to the NetBSD and OpenBSD CI tasks.

[1] https://cirrus-ci.com/task/5126123371102208?logs=test_world#L144

--
Regards,
Nazir Bilal Yavuz
Microsoft

From 0a265615e7bd9d6da2cbf1d7edfbe68a8bdf39f1 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz
Date: Tue, 8 Apr 2025 14:21:42 +0300
Subject: [PATCH] ci: Add MBUILD_TARGET to OpenBSD/NetBSD meson builds

Otherwise, these tasks don't build test dependencies.
---
 .cirrus.tasks.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index 6f4f5c674a1..98f3455eb72 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -322,7 +322,7 @@ task:
       build
       EOF

-  build_script: su postgres -c 'ninja -C build -j${BUILD_JOBS}'
+  build_script: su postgres -c 'ninja -C build -j${BUILD_JOBS} ${MBUILD_TARGET}'
   upload_caches: ccache

   test_world_script: |
-- 
2.49.0
Re: prevent 006_transfer_modes.pl from leaving files behind
On 2025-04-07 Mo 7:41 PM, Michael Paquier wrote: On Mon, Apr 07, 2025 at 04:45:52PM -0500, Nathan Bossart wrote: The other pg_upgrade tests chdir to tmp_check prior to running pg_upgrade to avoid leaving behind delete_old_cluster.{sh,bat}. 006_transfer_modes.pl should, too. However, this test is a little different because it loops over all of the available transfer modes and runs pg_upgrade for each one supported by the platform. From my testing and code analysis, it seems sufficient to change the directory once at the beginning of the test, but we could alternatively save the current directory and change back to it in each iteration to be safe. Thoughts? Hmm. Doing one chdir at the beginning of the test should be OK, because we don't do a move to a different directory within the test for each transfer mode tested. So your patch looks like the simplest thing to do to avoid the generation of these files. Note that delete_old_cluster.sh would be left around even if not using a VPATH build with ./configure (your commit message does not mention that). Even if .gitignore discards it, the file is here. I don't think that matters. In non-vpath builds we expect the source directory to be scribbled on. All sorts of things might be left around. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Fix 035_standby_logical_decoding.pl race conditions
On Tue, Apr 08, 2025 at 06:19:02AM +, Bertrand Drouvot wrote:
> - A new injection_points_wakeup_detach() function that is holding the spinlock
> during the whole duration to ensure that no process can wait in between the
> wakeup and the detach.

That would not be a correct spinlock use. injection_points_detach() and injection_points_wakeup_internal() do many more actions than what we can do internally while holding a spinlock, including both Postgres-specific calls as well as system calls. strcmp() and strlcpy() are still OK-ish, even as system calls, as they work directly on the strings given in input.

--
Michael

signature.asc
Description: PGP signature
Re: Draft for basic NUMA observability
Hi,

On April 8, 2025 9:21:57 AM EDT, Tomas Vondra wrote:
>On 4/8/25 15:06, Andres Freund wrote:
>> Hi,
>>
>> On 2025-04-08 17:44:19 +0500, Kirill Reshke wrote:
>>> On Mon, 7 Apr 2025 at 23:00, Tomas Vondra wrote: I'll let the CI run the tests on it, and then will push, unless someone has more comments.
>>>
>>> Hi! I noticed a strange failure after this commit[0]
>>>
>>> Looks like it is related to 65c298f61fc70f2f960437c05649f71b862e2c48
>>>
>>> In file included from ../pgsql/src/include/postgres.h:49,
>>>                  from ../pgsql/src/port/pg_numa.c:16:
>>> ../pgsql/src/include/utils/elog.h:79:10: fatal error: utils/errcodes.h: No such file or directory
>>>    79 | #include "utils/errcodes.h"
>>>       |          ^~
>>> compilation terminated.
>>
>> $ ninja -t missingdeps
>> Missing dep: src/port/libpgport.a.p/pg_numa.c.o uses
>> src/include/utils/errcodes.h (generated by CUSTOM_COMMAND)
>> Missing dep: src/port/libpgport_shlib.a.p/pg_numa.c.o uses
>> src/include/utils/errcodes.h (generated by CUSTOM_COMMAND)
>> Processed 2384 nodes.
>> Error: There are 2 missing dependency paths.
>> 2 targets had depfile dependencies on 1 distinct generated inputs (from 1
>> rules) without a non-depfile dep path to the generator.
>> There might be build flakiness if any of the targets listed above are built
>> alone, or not late enough, in a clean output directory.
>>
>> I think it's not right that something in src/port defines an SQL callable
>> function. The set of headers for that pulls in a lot of things.
>>
>> Since the file relies on things like GUCs, I suspect this should be in
>> src/backend/port or such instead.
>
>Yeah, I think you're right, src/backend/port seems like a better place
>for this. I'll look into moving that in the evening.

On a second look, I wonder if just the SQL function and perhaps the page size function should be moved. There are FE programs that could potentially benefit from NUMA awareness (e.g. pgbench).

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: Horribly slow pg_upgrade performance with many Large Objects
On Tue, Apr 08, 2025 at 01:36:58PM -0400, Tom Lane wrote: > Hmm ... one annoying thing for this project is that AFAICS pg_upgrade > does *not* preserve database OIDs, which is problematic for using > COPY to load pg_shdepend rows. I think it does; see commit aa01051. -- nathan
Re: Horribly slow pg_upgrade performance with many Large Objects
Nathan Bossart writes: > Unless I'm missing something, we don't seem to have had any dependency > handling before commit 12a53c7. Was that broken before we moved to SQL > commands? Sounds like it :-( regards, tom lane
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Tue, Apr 8, 2025 at 9:32 AM Wolfgang Walther wrote: > This means that shipping another .so file will not happen with this approach. > Assuming OAuth will be picked up by some of the bigger providers, that > would... make me feel quite bad about it, actually. It occurs to me that I didn't respond to this point explicitly. I would like to avoid making your life harder. Would anybody following along be opposed to a situation where - dynamiclib builds go through the dlopen() shim - staticlib builds always rely on statically linked symbols Or do we need to be able to mix and match? --Jacob
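For anyone picturing what the shim side of that could look like, here is an illustrative-only sketch (library and symbol names are made up, and error handling is minimal); the real libpq-oauth loader is more involved than this:

#include <dlfcn.h>
#include <stdio.h>

typedef int (*oauth_flow_fn) (void *conn);

static void *oauth_module;
static oauth_flow_fn oauth_flow;

static int
run_oauth_flow(void *conn)
{
	if (oauth_module == NULL)
	{
		/* load the implementation module on first use */
		oauth_module = dlopen("libpq-oauth.so", RTLD_NOW | RTLD_LOCAL);
		if (oauth_module == NULL)
		{
			fprintf(stderr, "could not load OAuth module: %s\n", dlerror());
			return -1;
		}
		oauth_flow = (oauth_flow_fn) dlsym(oauth_module, "run_oauth_flow_impl");
		if (oauth_flow == NULL)
		{
			fprintf(stderr, "missing symbol: %s\n", dlerror());
			return -1;
		}
	}
	return oauth_flow(conn);
}

A static build would skip the dlopen() path entirely and resolve the same entry point at link time, which matches the dynamiclib/staticlib split proposed above.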
Re: Feature freeze
On Tue, Apr 08, 2025 at 05:13:15PM +0200, Daniel Gustafsson wrote: >> On 8 Apr 2025, at 16:59, Bruce Momjian wrote: >> Frankly, I think the name "anywhere on Earth" is confusing, since it >> really is "everywhere on Earth": > > I find both of the above needlessly confusing when we instead could use UTC > which is a more universally understood concept. +1 for UTC. -- nathan
Re: Horribly slow pg_upgrade performance with many Large Objects
Nathan Bossart writes: > On Tue, Apr 08, 2025 at 01:36:58PM -0400, Tom Lane wrote: >> Hmm ... one annoying thing for this project is that AFAICS pg_upgrade >> does *not* preserve database OIDs, which is problematic for using >> COPY to load pg_shdepend rows. > I think it does; see commit aa01051. Ah --- I thought I remembered something having been done about that, but I failed to find it because I was looking in pg_upgrade not pg_dump. Too bad aa01051 didn't update the comment at the top of pg_upgrade.c. regards, tom lane
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Tue, Apr 8, 2025 at 10:10 AM Wolfgang Walther wrote: > And if that means making libpq modular at run-time, then this should be > planned and built with all deps, and other use-cases (like static linking) in > mind - and not like it is right now. I think that'd be neat in concept, but specifically this thread is discussing a PG18 open item. For future releases, if we're happy with how Curl gets split out, maybe that would be fuel for other delay-loaded client dependencies. I'm not sure. --Jacob
Re: [PoC] Federated Authn/z with OAUTHBEARER
Hi, On 2025-04-08 13:02:11 -0400, Bruce Momjian wrote: > On Tue, Apr 8, 2025 at 06:57:18PM +0200, Wolfgang Walther wrote: > > Jacob Champion: > > > On Tue, Apr 8, 2025 at 9:32 AM Wolfgang Walther > > > wrote: > > > > And that should also not be a problem for distributions - they could > > > > offer a libpq and a libpq_oauth package, where only one of them can be > > > > installed at the same time, I guess? * > > > My outsider understanding is that maintaining this sort of thing > > > becomes a major headache, because of combinatorics. You don't really > > > want to ship a libpq and libpq-with-gss and libpq-with-oauth and > > > libpq-with-oauth-and-gss and ... > > > > That would only be the case, if you were to consider those other > > dependencies as "dangerous" as cURL. But we already depend on them. So if > > it's really the case that cURL is that much worse, that we consider loading > > it as a module... then the combinatorics should not be a problem either. > > > > However, if the other deps are considered problematic as well, then the ship > > has already sailed, and there is not point for a special case here anymore. > > Yes, I think this is what I am asking too. For me it was curl's > security reputation and whether that would taint the security reputation > of libpq. For Tom, I think it was the dependency additions. I'd say that curl's security reputation is higher than most of our other dependencies. We have dependencies for libraries with regular security issues, with those issues at times not getting addressed for prolonged amounts of time. I do agree that there's an issue increasing libpq's indirect footprint substantially, but I don't think that's due to curl's reputation or anything. It's just needing a significantly higher number of shared libraries, which comes with runtime and distribution overhead. Greetings, Andres Freund
Re: long-standing data loss bug in initial sync of logical replication
On Tue, Mar 18, 2025 at 3:25 PM Amit Kapila wrote: > > On Mon, Mar 17, 2025 at 4:56 PM Hayato Kuroda (Fujitsu) > wrote: > > > > > Regarding the PG13, it cannot be > > > applied > > > as-is thus some adjustments are needed. I will share it in upcoming posts. > > > > Here is a patch set for PG13. Apart from PG14-17, the patch could be > > created as-is, > > because... > > > > 1. WAL record for invalidation messages (XLOG_XACT_INVALIDATIONS) does not > > exist. > > 2. Thus the ReorderBufferChange for the invalidation does not exist. > >Our patch tries to distribute it but cannot be done as-is. > > 3. Codes assumed that invalidation messages can be added only once. > > 4. The timing when invalidation messages are consumed is limited: > > a. COMMAND_ID change is poped, > > b. start of decoding a transaction, or > > c. end of decoding a transaction. > > > > Above means that invalidations cannot be executed while being decoded. > > I created two patch sets to resolve the data loss issue. 0001 has less code > > changes but could resolve a part of issue, 0002 has huge changes but > > provides a > > complete solution. > > > > 0001 - mostly same as patches for other versions. > > ReorderBufferAddInvalidations() > >was adjusted to allow being called several times. As I said above, > >0001 cannot execute inval messages while decoding the transacitons. > > 0002 - introduces new ReorderBufferChange type to indicate inval messages. > >It would be handled like PG14+. > > > > Here is an example. Assuming that the table foo exists on both nodes, a > > publication "pub" which publishes all tables, and a subscription "sub" which > > subscribes "pub". What if the workload is executed? > > > > ``` > > S1 S2 > > BEGIN; > > INSERT INTO foo VALUES (1) > > ALTER PUBLICATION pub RENAME TO pub_renamed; > > INSERT INTO foo VALUES (2) > > COMMIT; > > LR -> ? > > ``` > > > > With 0001, tuples (1) and (2) would be replicated to the subscriber. > > An error "publication "pub" does not exist" would raise when new changes > > are done > > later. > > > > 0001+0002 works more aggressively; the error would raise when S1 > > transaction is decoded. > > The behavior is same as for patched PG14-PG17. > > > > I understand that with 0001 the fix is partial in the sense that > because invalidations are processed at the transaction end, the > changes of concurrent DDL will only be reflected for the next > transaction. Now, on one hand, it is prudent to not add a new type of > ReorderBufferChange in the backbranch (v13) but the change is not that > invasive, so we can go with it as well. My preference would be to go > with just 0001 for v13 to minimize the risk of adding new bugs or > breaking something unintentionally. > > Sawada-San, and others involved here, do you have any suggestions on > this matter? > Seeing no responses for a long time, I am planning to push the fix till 14 tomorrow unless there are some opinions on the fix for 13. We can continue to discuss the scope of the fix for 13. -- With Regards, Amit Kapila.
Re: Large expressions in indexes can't be stored (non-TOASTable)
On Fri, Apr 4, 2025 at 7:58 PM Nathan Bossart wrote: > > On Fri, Apr 04, 2025 at 05:16:43PM +0530, Amit Kapila wrote: > > Can we dodge adding this push call if we restrict the length of the > > origin name to some reasonable limit like 256 or 512 and avoid the > > need of toast altogether? > > We did consider just removing pg_replication_origin's TOAST table earlier > [0], but we decided against it at that time. Maybe it's worth > reconsidering... > I don't see any good reason in that email for not removing the TOAST table for pg_replication_origin. I would prefer to remove it rather than add protection related to its access. -- With Regards, Amit Kapila.
Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
On Tue, Apr 8, 2025 at 6:42 AM Masahiko Sawada wrote: > > > BTW have you measured the overheads of calling InputFunctionCallSafe > twice? If it's significant, we might want to find other ways to > achieve it as it would not be good to incur overhead just for > relatively rare cases. > Please check the attached two patches v17-0001-COPY-on_error-set_null.original, v17-0001-COPY-on_error-set_null.patch for non-domain types, (on_error set_null), the performance of these two are the same. for domain type with or without constraint, (on_error set_null): v17.original is slower than v17.patch. test script: create unlogged table t2(a text); insert into t2 select 'a' from generate_Series(1, 10_000_000) g; copy t2 to '/tmp/2.txt'; CREATE DOMAIN d1 AS INT ; CREATE DOMAIN d2 AS INT check (value > 0); create unlogged table t3(a int); create unlogged table t4(a d1); create unlogged table t5(a d2); performance result: v17-0001-COPY-on_error-set_null.patch -- 764.903 ms copy t3 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1 -- 779.253 ms copy t4 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1 -- Time: 750.390 ms copy t5 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1 v17-0001-COPY-on_error-set_null.original -- 774.943 ms copy t3 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1 -- 867.671 ms copy t4 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1 -- 927.685 ms copy t5 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1 > Here are some comments: > > + if (InputFunctionCallSafe(&in_functions[m], > + NULL, > + typioparams[m], > + att->atttypmod, > + NULL, > + &values[m])) > > Given that we pass NULL to escontext, does this function return false > in an error case? Or can we use InputFunctionCall instead? > > I think we should mention that SET_NULL still could fail if the data > type of the column doesn't accept NULL. > > How about restructuring the codes around handling data incompatibility > errors like: > > else if (!InputFunctionCallSafe(...)) > { > if (cstate->opts.on_error == IGNORE) > { > cstate->num_errors++; > if (cstate->opts.log_verbosity == VERBOSE) > write a NOTICE message; > return true; // ignore whole row. > } > else if (cstate->opts.on_error == SET_NULL) > { > current_row_erroneous = true; > set NULL to the column; > if (cstate->opts.log_verbosity == VERBOSE) > write a NOTICE message; > continue; // go to the next column. > } > > That way, we have similar structures for both on_error handling and > don't need to reset cstate->cur_attname at the end of SET_NULL > handling. > I think we still need to reset cstate->cur_attname. the current code structure is `` foreach(cur, cstate->attnumlist) { if (condition x) continue; cstate->cur_attname = NULL; cstate->cur_attval = NULL; } `` In some cases (last column , condition x is satisfied), once we reach the ``continue``, then we cannot reach. `` cstate->cur_attname = NULL; cstate->cur_attval = NULL; `` > --- > From the regression tests: > > --fail, column a is domain with not-null constraint > COPY t_on_error_null FROM STDIN WITH (on_error set_null); > a 11 14 > \. > ERROR: domain d_int_not_null does not allow null values > CONTEXT: COPY t_on_error_null, line 1, column a: "a" > > I guess that the log messages could confuse users since while the > actual error was caused by setting NULL to the non-NULL domain type > column, the context message says the data 'a' was erroneous. > if the second function is InputFunctionCall, then we cannot customize the error message. we can't have both. 
I guess we need a second InputFunctionCallSafe with escontext NOT NULL. now i change it to if (!cstate->domain_with_constraint[m] || InputFunctionCallSafe(&in_functions[m], NULL, typioparams[m], att->atttypmod, (Node *) cstate->escontext, &values[m])) else if (string == NULL) ereport(ERROR, errcode(ERRCODE_NOT_NULL_VIOLATION), errmsg("domain %s does not allow null values", format_type_be(typioparams[m])), errdatatype(typioparams[m])); else ereport(ERROR, errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), errmsg("invalid input value for domain %s: \"%s\"", format_type_be(typiopara
Re: An incorrect check in get_memoize_path
On 4/8/25 08:32, Richard Guo wrote: On Mon, Apr 7, 2025 at 9:54 PM Andrei Lepikhov wrote: On 4/7/25 09:50, Richard Guo wrote: Consider the join to t3. It is a unique join, and not all of its restriction clauses are parameterized. Despite this, the check still passes. At the same time, I think the term 'Incorrect' is not good unless you show an example where the data returned is not consistent with what is expected. I think this inequality check has worked in tandem with get_equal_hashops. Do you mean this check is designed to work in conjunction with the clause_sides_match_join check in paraminfo_get_equal_hashops? I would consider this a very poor design. As I have written before, I am quite happy with the change you propose. I just pointed out that the term 'incorrect' usually means you can provide a query which causes an error or inconsistent data, which we could then add to the test set. The current code lines may be described as a 'kludge' - but I'm not a native speaker, so don't bikeshed here too much. -- regards, Andrei Lepikhov
Re: Feature freeze
> On 8 Apr 2025, at 16:59, Bruce Momjian wrote: > > On Tue, Apr 8, 2025 at 10:36:45AM -0400, Bruce Momjian wrote: >> Since we recorded feature freeze as April 8, 2025 0:00 AoE (anywhere on >> Earth): >> >> https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items#Important_Dates >> https://www.timeanddate.com/time/zones/aoe >> >> and it is now 2:34 AM AoE, I guess we are now in feature freeze. > > Frankly, I think the name "anywhere on Earth" is confusing, since it > really is "everywhere on Earth": I find both of the above needlessly confusing when we instead could use UTC which is a more universally understood concept. -- Daniel Gustafsson
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Tue, Apr 8, 2025 at 11:20:11AM -0400, Andres Freund wrote: > Hi, > > On 2025-04-08 11:13:51 -0400, Bruce Momjian wrote: > > On Tue, Apr 8, 2025 at 08:04:22AM -0700, Jacob Champion wrote: > > > On Tue, Apr 8, 2025 at 7:32 AM Bruce Momjian wrote: > > > > Uh, where are we on the inclusion of curl in our build? Maybe it was > > > > explained but I have not seen it. > > > > > > The above is discussing a patch to split this into its own loadable > > > module. Andres and Christoph's feedback has been shaping where we put > > > that module, exactly. > > > > Uh, I was afraid that was the case, which is why I asked. We have just > > hit feature freeze, so it is not good that we are still "shaping" the > > patch. Should we consider reverting it? It is true we still "adjust" > > patches after feature freeze. > > You brought the dependency concern up well after the feature was merged, after > it had been in development for a *long* time. It wasn't a secret that it had a > dependency on curl. I don't think it's fair to penalize a feature's authors > to not finish implementing a complicated and completely new requirement within > 17 days. Fair point --- I was just asking. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: Move tests of contrib/spi/ out of the core regression tests
On 08/04/2025 16:55, Tom Lane wrote: Well, I don't mind pushing it, but does anyone want to review it first? It sounded like Heikki had at least eyeballed the patch, but I'm not sure if he's ready to sign off on it. It looks good to me. diff --git a/doc/src/sgml/contrib-spi.sgml b/doc/src/sgml/contrib-spi.sgml index 55d3fac7a69..6fa9479d1b9 100644 --- a/doc/src/sgml/contrib-spi.sgml +++ b/doc/src/sgml/contrib-spi.sgml @@ -81,10 +81,12 @@ autoinc() is a trigger that stores the next value of a sequence into an integer field. This has some overlap with the - built-in serial column feature, but it is not the same: - autoinc() will override attempts to substitute a - different field value during inserts, and optionally it can be - used to increment the field during updates, too. + built-in serial column feature, but it is not the same. + The trigger will replace the field's value only if that value is + initially zero or null (after the action of the SQL statement that + inserted or updated the row). Also, if the sequence's next value is + zero, nextval() will be called a second time in + order to obtain a non-zero value. That's a much clearer explanation of the behavior, but now that I read that paragraph, I wonder *why* it behaves like that. I'm sure it's just historical reasons. But perhaps we should nuke autoinc altogether? -- Heikki Linnakangas Neon (https://neon.tech)
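For readers trying to picture the documented behavior, here is a loose C paraphrase of it (this is not the actual contrib/spi/autoinc.c source, and the helper shown is only illustrative):

#include "postgres.h"
#include "commands/sequence.h"

static int32
autoinc_pick_value(bool isnull, int32 current, Oid seqoid)
{
	int64		next;

	/* a non-null, non-zero value supplied by the statement is kept as-is */
	if (!isnull && current != 0)
		return current;

	/* otherwise take the sequence's next value, skipping a zero result once */
	next = nextval_internal(seqoid, true);
	if (next == 0)
		next = nextval_internal(seqoid, true);

	return (int32) next;
}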
Re: Large expressions in indexes can't be stored (non-TOASTable)
On Tue, Apr 08, 2025 at 04:44:02PM +0530, Amit Kapila wrote: > On Fri, Apr 4, 2025 at 7:58 PM Nathan Bossart > wrote: >> On Fri, Apr 04, 2025 at 05:16:43PM +0530, Amit Kapila wrote: >> > Can we dodge adding this push call if we restrict the length of the >> > origin name to some reasonable limit like 256 or 512 and avoid the >> > need of toast altogether? >> >> We did consider just removing pg_replication_origin's TOAST table earlier >> [0], but we decided against it at that time. Maybe it's worth >> reconsidering... > > I don't see any good reason in that email for not removing the TOAST > table for pg_replication_origin. I would prefer to remove it rather > than add protection related to its access. The only reason I can think of is that folks might have existing replication origins with extremely long names that would cause upgrades to fail. While I think it would probably be okay to remove the TOAST table and restrict origin names to 512 bytes (like we did for password hashes in commit 8275325), folks routinely complain about NAMEDATALEN, so I think we should be cautious here. -- nathan
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Tue, Apr 8, 2025 at 06:42:19PM +0200, Daniel Gustafsson wrote: > > On 8 Apr 2025, at 18:33, Bruce Momjian wrote: > > > Would we have to put out minor releases for curl CVEs? I don't think we > > have to for OpenSSL so would curl be the same? > > Why do you envision this being different from all other dependencies we have? > For OpenSSL we also happily build against a version (and until recently, > several versions) which is EOL and don't even receive security fixes. I don't know. I know people scan our downloads and report when the scanners detect something, but I am unclear what those scanners are doing. Would they see some new risks with curl? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: [PoC] Reducing planning time when tables have many partitions
On Wed, 9 Apr 2025 at 02:24, Tom Lane wrote: > > David Rowley writes: > > I've pushed the patch now. Thanks for all the reviews of my adjustments. > > Shouldn't the CF entry be marked committed? I've done that now. 88f55bc97 added code to do faster lookups of ec_derives clauses, so I think that likely renders part of the remaining patches obsolete. The v35-0004 also had some indexing of ec_sources. It would be good to know if there's still any gains to be had from indexing those. If there's work to do, then a new CF entry can be made for that. David
Re: Feature freeze
On Wed, 9 Apr 2025 at 03:54, Bruce Momjian wrote: > We did have this discussion when AoE was chosen for PG 18 and the idea > was that as long as it is before April 8 midnight wherever you are, it > is not feature freeze yet. I think it maybe once made sense for the moment to stop accepting new patches into a commitfest so that nobody got upset from their patch missing the cut-off because the CF was changed to in progress "too early" as it was still $previous_month in their timezone. Holding that moment back til it was the correct month in every timezone didn't stop people who lived further East from reviewing patches and committing things, so I think it was done to keep everyone happy. Far fewer of these reasons apply to feature freeze. Committers want to calculate the time in their timezone when the freeze hits so they can plan and not commit anything beyond that. That's easier to do from UTC, as people are generally more used to it. There's also the whole "which day does midnight fall on" problem, which, for some reason, is ambiguous to some. That's why governments and airline companies sometimes do 23:59 or 00:01. For me, I'm exactly 24 hours ahead of AoE, so it should be an easy calc, but I still have more confidence that I'm correct if I'm calculating from UTC. So, +1 for UTC. David
Re: [PoC] Federated Authn/z with OAUTHBEARER
Jacob Champion: On Tue, Apr 8, 2025 at 9:32 AM Wolfgang Walther wrote: And that should also not be a problem for distributions - they could offer a libpq and a libpq_oauth package, where only one of them can be installed at the same time, I guess? * My outsider understanding is that maintaining this sort of thing becomes a major headache, because of combinatorics. You don't really want to ship a libpq and libpq-with-gss and libpq-with-oauth and libpq-with-oauth-and-gss and ... That would only be the case, if you were to consider those other dependencies as "dangerous" as cURL. But we already depend on them. So if it's really the case that cURL is that much worse, that we consider loading it as a module... then the combinatorics should not be a problem either. However, if the other deps are considered problematic as well, then the ship has already sailed, and there is not point for a special case here anymore. Best, Wolfgang
Re: Move tests of contrib/spi/ out of the core regression tests
Andrew Dunstan writes: > On 2025-04-08 Tu 5:21 AM, Heikki Linnakangas wrote: >>> This is too late for v18 of course, so I'll park it in the next CF. >> In my opinion this would still be totally OK for v18. It's just test >> changes. I would rather commit cleanups like this sooner than later. > Yeah, that was my reaction too. Well, I don't mind pushing it, but does anyone want to review it first? It sounded like Heikki had at least eyeballed the patch, but I'm not sure if he's ready to sign off on it. regards, tom lane
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Tue, Apr 8, 2025 at 9:14 AM Bruce Momjian wrote: > How does this patch help us avoid having to handle curl CVEs and its > curl's additional dependencies? As I understand the patch, it makes > libpq _not_ have additional dependencies but moves the dependencies to a > special loadable library that libpq can use. It allows packagers to ship the OAuth library separately, so end users that don't want the additional exposure don't have to install it at all. --Jacob
Re: Feature freeze
On Tue, Apr 8, 2025 at 06:00:27PM +0200, Peter Eisentraut wrote: > On 08.04.25 16:59, Bruce Momjian wrote: > > On Tue, Apr 8, 2025 at 10:36:45AM -0400, Bruce Momjian wrote: > > > Since we recorded feature freeze as April 8, 2025 0:00 AoE (anywhere on > > > Earth): > > > > > > > > > https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items#Important_Dates > > > https://www.timeanddate.com/time/zones/aoe > > > > > > and it is now 2:34 AM AoE, I guess we are now in feature freeze. > > > > Frankly, I think the name "anywhere on Earth" is confusing, since it > > really is "everywhere on Earth": > > > > https://en.wikipedia.org/wiki/Anywhere_on_Earth > > > > Anywhere on Earth (AoE) is a calendar designation that indicates > > that a period expires when the date passes everywhere on Earth. > > Yes, that works intuitively when you specify that sometimes ends when a > certain day ends, for example: > > "The feature development phase ends at the end of day of April 7, AoE." > > That means, everyone everywhere can just look up at their clock and see, > it's still April 7, it's still going. (Of course, others can then do the > analysis and keep going until some time on April 8, but that would be sort > of against the spirit.) > > If you use it as a time zone with a time of day, it doesn't make intuitive > sense. Well, they kind of did this by saying midnight on April 8 AoE, rather than end-of-day in April 7 AoE. Actually, I had originally said April 8 AoE and then was told I had to specify a time --- maybe the time was the mistake, and we still have April 8 to add features. ;-) -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: Feature freeze
On 08/04/2025 19:11, Bruce Momjian wrote: On Tue, Apr 8, 2025 at 06:00:27PM +0200, Peter Eisentraut wrote: On 08.04.25 16:59, Bruce Momjian wrote: On Tue, Apr 8, 2025 at 10:36:45AM -0400, Bruce Momjian wrote: Since we recorded feature freeze as April 8, 2025 0:00 AoE (anywhere on Earth): https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items#Important_Dates https://www.timeanddate.com/time/zones/aoe and it is now 2:34 AM AoE, I guess we are now in feature freeze. Frankly, I think the name "anywhere on Earth" is confusing, since it really is "everywhere on Earth": https://en.wikipedia.org/wiki/Anywhere_on_Earth Anywhere on Earth (AoE) is a calendar designation that indicates that a period expires when the date passes everywhere on Earth. Yes, that works intuitively when you specify that sometimes ends when a certain day ends, for example: "The feature development phase ends at the end of day of April 7, AoE." That means, everyone everywhere can just look up at their clock and see, it's still April 7, it's still going. (Of course, others can then do the analysis and keep going until some time on April 8, but that would be sort of against the spirit.) If you use it as a time zone with a time of day, it doesn't make intuitive sense. Well, they kind of did this by saying midnight on April 8 AoE, rather than end-of-day in April 7 AoE. Actually, I had originally said April 8 AoE and then was told I had to specify a time --- maybe the time was the mistake, and we still have April 8 to add features. ;-) At the end of the day (pun not intended), it doesn't matter much. Nothing special happens when the feature freeze begins. If some committers interpret it a little differently, it doesn't matter. That said, +1 for using UTC in the future for clarity. - Heikki
Re: Remove unnecessary static type qualifiers
On 08.04.25 14:22, Junwang Zhao wrote: When I read the libpq source code, I found unnecessary static type qualifiers in PQsetClientEncoding(). diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c index 0258d9ace3c..300ddfffd55 100644 --- a/src/interfaces/libpq/fe-connect.c +++ b/src/interfaces/libpq/fe-connect.c @@ -7717,7 +7717,7 @@ int PQsetClientEncoding(PGconn *conn, const char *encoding) { charqbuf[128]; - static const char query[] = "set client_encoding to '%s'"; + const char query[] = "set client_encoding to '%s'"; PGresult *res; int status; I doubt that, if you remove the *static*, it will allocate more memory on stack and need more instructions to copy the string to that area. To avoid creating an array on the stack, you could maybe write it with a pointer instead, like: const char * const query = "..."; I haven't tested whether that results in different or better compiled code. The original code is probably good enough.
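A small stand-alone illustration of the three variants under discussion (how much any of this matters in practice depends on the compiler and optimization level, so treat it as a sketch rather than a claim about generated code):

#include <stdio.h>

static void
variant_static_array(void)
{
	/* one array object in static storage, shared by every call */
	static const char query[] = "set client_encoding to '%s'";

	printf("%zu\n", sizeof(query));
}

static void
variant_local_array(void)
{
	/* formally a per-call array; naive codegen copies the literal onto the stack */
	const char	query[] = "set client_encoding to '%s'";

	printf("%zu\n", sizeof(query));
}

static void
variant_pointer(void)
{
	/* a pointer to the string literal, which itself lives in static storage */
	const char *const query = "set client_encoding to '%s'";

	printf("%s\n", query);
}

int
main(void)
{
	variant_static_array();
	variant_local_array();
	variant_pointer();
	return 0;
}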
Re: Feature freeze
On Tue, Apr 08, 2025 at 06:00:27PM +0200, Peter Eisentraut wrote: > On 08.04.25 16:59, Bruce Momjian wrote: >> Anywhere on Earth (AoE) is a calendar designation that indicates >> that a period expires when the date passes everywhere on Earth. > > Yes, that works intuitively when you specify that sometimes ends when a > certain day ends, for example: > > "The feature development phase ends at the end of day of April 7, AoE." > > That means, everyone everywhere can just look up at their clock and see, > it's still April 7, it's still going. (Of course, others can then do the > analysis and keep going until some time on April 8, but that would be sort > of against the spirit.) I always forget if AoE is UTC+12 or UTC-12. "Anywhere on Earth" sounds to me like it means "the first moment it's this time anywhere on Earth," which would be some point during April 7th for me. So every year, I go to Wikipedia, which reminds me it actually means "the moment this time has passed everywhere on Earth." At this point, I can finally convert to UTC and then to my own time zone in my head. If we just said April 8th, 12:00:00 UTC, I'd immediately know that my entire April 7th was fair game. Of course, I hope to usually be done committing things much earlier... -- nathan
Re: Feature freeze
On 2025-04-08 Tu 12:11 PM, Bruce Momjian wrote: On Tue, Apr 8, 2025 at 06:00:27PM +0200, Peter Eisentraut wrote: On 08.04.25 16:59, Bruce Momjian wrote: On Tue, Apr 8, 2025 at 10:36:45AM -0400, Bruce Momjian wrote: Since we recorded feature freeze as April 8, 2025 0:00 AoE (anywhere on Earth): https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items#Important_Dates https://www.timeanddate.com/time/zones/aoe and it is now 2:34 AM AoE, I guess we are now in feature freeze. Frankly, I think the name "anywhere on Earth" is confusing, since it really is "everywhere on Earth": https://en.wikipedia.org/wiki/Anywhere_on_Earth Anywhere on Earth (AoE) is a calendar designation that indicates that a period expires when the date passes everywhere on Earth. Yes, that works intuitively when you specify that sometimes ends when a certain day ends, for example: "The feature development phase ends at the end of day of April 7, AoE." That means, everyone everywhere can just look up at their clock and see, it's still April 7, it's still going. (Of course, others can then do the analysis and keep going until some time on April 8, but that would be sort of against the spirit.) If you use it as a time zone with a time of day, it doesn't make intuitive sense. Well, they kind of did this by saying midnight on April 8 AoE, rather than end-of-day in April 7 AoE. Actually, I had originally said April 8 AoE and then was told I had to specify a time --- maybe the time was the mistake, and we still have April 8 to add features. ;-) The fact that there is this confusion is an indication that the AoE experiment is a failure. If it's not obvious, and people have to think about it, then it's not working. And I bet there is a huge number of people who have never even heard of it. Specify some time and date in UTC and everyone will understand. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: [PoC] Federated Authn/z with OAUTHBEARER
Jacob Champion: The currently proposed patch would have you package and install a separate .so module implementing OAuth, which the staticlib would load once when needed. Similarly to how you still have to somehow dynamically link your static app against Curl. As a staticlib user, how do you feel about that? When linking statically, I am producing entirely statically linked single binaries. Those contain libpq, all other dependencies, and would also contain curl. The "entirely statically linked" thing is actually enforced by the build system (NixOS' pkgsStatic here), so dlopen() might just not be possible. Not exactly sure right now, whether it's stubbed out or just not available at all. This means that shipping another .so file will not happen with this approach. Assuming OAuth will be picked up by some of the bigger providers, that would... make me feel quite bad about it, actually. I'm not seeing the overall problem, yet. When I build with --enable-curl... ofc, I have a dependency on cURL. That's kind of the point. When I don't want that, then I just disable it. And that should also not be a problem for distributions - they could offer a libpq and a libpq_oauth package, where only one of them can be installed at the same time, I guess? * Best, Wolfgang * Currently, the two build systems don't handle the "please build only libpq" scenario well. If that was supported better, building a second package with oauth support could be much easier.
Re: Feature freeze
On Tue, Apr 8, 2025 at 11:45:09AM -0400, Peter Geoghegan wrote: > On Tue, Apr 8, 2025 at 11:20 AM Nathan Bossart > wrote: > > +1 for UTC. > > +1, I think that AoE is needlessly obscure We did have this discussion when AoE was chosen for PG 18 and the idea was that as long as it is before April 8 midnight wherever you are, it is not feature freeze yet. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: Enhancing Memory Context Statistics Reporting
> On 8 Apr 2025, at 18:41, Fujii Masao wrote: > I noticed that the third argument of pg_get_process_memory_contexts() is named > "retries" in pg_proc.dat, while the documentation refers to it as "timeout". I've committed this patch as it was obviously correct, thanks! > Also, as I mentioned earlier, I encountered an issue when calling > pg_get_process_memory_contexts() on the PID of a backend that had just > encountered an error but hadn't finished rolling back. It led to > the following situation: I reconfirmed that the bugfix that Rahila shared in [0] fixes this issue (and will fix others like it, as it's not related to this patch in particular but is a bug in DSM attaching). My plan is to take that for a more thorough review and test tomorrow and see how far it can be safely backpatched. Thanks for bringing this up, sorry about it getting a bit lost among all the emails. -- Daniel Gustafsson [0] cah2l28shr0j3je5v3cxdfmdh-agtsnh2v8pr23x0uhrmbdq...@mail.gmail.com
Re: Horribly slow pg_upgrade performance with many Large Objects
On 4/8/25 15:41, Hannu Krosing wrote: On Tue, Apr 8, 2025 at 8:39 PM Nathan Bossart wrote: ... I've also verified that the dependency information is carried over in upgrades to later versions (AFAICT all the supported ones). If I remember correctly, the change to stop copying the pg_largeobject_metadata data file and instead move LOs as part of the schema was done in v12, when oid, which had been a system column in v11, became a user column; so an upgrade to v11 is likely also missing the dependencies. I remember an incident where large amounts of LOs ran pg_upgrade into a transaction-ID wraparound, because the restore part would create individual single-statement transactions per LO to create it, then change permissions and ownership, and finally fill in the data. Could that be related here? Regards, Jan
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Mon, Apr 7, 2025 at 9:41 AM Jacob Champion wrote: > > Not sure, the code looks correct at first glance. However, you could > > also just keep the libpq-oauth strings in the libpq catalog. There > > isn't really a need to make a separate one, since the versions you end > > up installing are locked to each other. So you could for example in > > libpq's nls.mk just add > > > > ../libpq-oauth/oauth-curl.c > > > > etc. to the files. > > Oh, that's an interesting idea. Thanks, I'll give it a try. A consequence of this is that our copy of libpq_binddomain isn't using the same mutex as libpq's copy to protect the "libpq-18" message domain. We could discuss whether or not it matters, since we don't support Windows, but it doesn't feel architecturally sound to me. If we want to reuse the same domain, I think the module should be using libpq's libpq_gettext(). (Which we could do, again through the magic of dependency injection.) > > Maybe it would also make sense to make libpq-oauth a subdirectory of the > > libpq directory instead of a peer. > > Works for me. It does not, however, work for our $(recurse) setup in the makefiles -- a shared library depending on a parent directory's shared library leads to infinite recursion, with the current tools -- so I'll keep it at the current directory level for now. Thanks, --Jacob
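For what it's worth, the "dependency injection" idea above could look something like this minimal sketch (entirely hypothetical names and shape, not the actual patch): libpq hands the module a struct of callbacks, including its own libpq_gettext, when the module is loaded.

/* hypothetical callbacks passed from libpq to the loaded module */
typedef struct OAuthModuleCallbacks
{
	const char *(*gettext_fn) (const char *msgid);
} OAuthModuleCallbacks;

static const char *(*module_gettext) (const char *msgid);

void
libpq_oauth_init(const OAuthModuleCallbacks *callbacks)
{
	/* translations then go through libpq's own domain and mutex */
	module_gettext = callbacks->gettext_fn;
}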
Re: Finalizing read stream users' flag choices
hi, On 2025-04-08 12:06:47 -0400, Melanie Plageman wrote: > And a somewhat related point, with buffered IO, READ_STREAM_SEQUENTIAL > disables prefetching to encourage OS readahead. I don't know if any > other users than sequential scan should do this. Worth adding that prefetches are only issued when io_method==sync and thus READ_STREAM_SEQUENTIAL only has an effect if io_method==sync. I suspect we should retire READ_STREAM_SEQUENTIAL in 19 or so: a) The sequential-ness detection has gotten smarter / more granular, reducing the need for forcing read_stream's hand. b) There are plenty of cases where READ_STREAM_SEQUENTIAL *hurts* with seqscans and io_method==sync, e.g. if there are plenty of pre-existing buffers in s_b. c) It doesn't have an effect with io_method != sync But it'll depend a bit on our experiences. Greetings, Andres Freund
Re: Draft for basic NUMA observability
On 4/8/25 16:59, Andres Freund wrote: > Hi, > > On 2025-04-08 09:35:37 -0400, Andres Freund wrote: >> On April 8, 2025 9:21:57 AM EDT, Tomas Vondra wrote: >>> On 4/8/25 15:06, Andres Freund wrote: On 2025-04-08 17:44:19 +0500, Kirill Reshke wrote: I think it's not right that something in src/port defines an SQL callable function. The set of headers for that pull in a lot of things. Since the file relies on things like GUCs, I suspect this should be in src/backend/port or such instead. >>> >>> Yeah, I think you're right, src/backend/port seems like a better place >>> for this. I'll look into moving that in the evening. >> >> On a second look I wonder if just the SQL function and perhaps the page size >> function should be moved. There are FE programs that could potentially >> benefit from num a awareness (e.g. pgbench). > > I would move pg_numa_available() to something like > src/backend/storage/ipc/shmem.c. > Makes sense, done in the attached patch. > I wonder if pg_numa_get_pagesize() should loose the _numa_ in the name, it's > not actually directly NUMA related? If it were pg_get_pagesize() it'd fit in > with shmem.c or such. > True. It's true it's not really "NUMA page", but page size for the shmem segment. So renamed to pg_get_shmem_pagesize() and moved to shmem.c, same as pg_numa_available(). The rest of pg_numa.c is moved to src/backend/port > Or we could just make it work in FE code by making this part > > Assert(IsUnderPostmaster); > Assert(huge_pages_status != HUGE_PAGES_UNKNOWN); > > if (huge_pages_status == HUGE_PAGES_ON) > GetHugePageSize(&os_page_size, NULL); > > #ifndef FRONTEND - we don't currently support using huge pages in FE programs > after all. But querying the page size might still be useful. > I don't really like this. Why shouldn't the FE program simply call sysconf(_SC_PAGESIZE)? It'd be just confusing if in backend it'd also verify huge page status. > > Regardless of all of that, I don't think the include of fmgr.h in pg_numa.h is > needed? > Right, that was left there after moving the prototype into the .c file. regards -- Tomas Vondra diff --git a/contrib/pg_buffercache/pg_buffercache_pages.c b/contrib/pg_buffercache/pg_buffercache_pages.c index c9ceba604b1..e1701bd56ef 100644 --- a/contrib/pg_buffercache/pg_buffercache_pages.c +++ b/contrib/pg_buffercache/pg_buffercache_pages.c @@ -343,7 +343,7 @@ pg_buffercache_numa_pages(PG_FUNCTION_ARGS) * This information is needed before calling move_pages() for NUMA * node id inquiry. 
*/ - os_page_size = pg_numa_get_pagesize(); + os_page_size = pg_get_shmem_pagesize(); /* * The pages and block size is expected to be 2^k, so one divides the diff --git a/src/backend/port/Makefile b/src/backend/port/Makefile index 47338d99229..5dafbf7c0c0 100644 --- a/src/backend/port/Makefile +++ b/src/backend/port/Makefile @@ -24,6 +24,7 @@ include $(top_builddir)/src/Makefile.global OBJS = \ $(TAS) \ atomics.o \ + pg_numa.o \ pg_sema.o \ pg_shmem.o diff --git a/src/backend/port/meson.build b/src/backend/port/meson.build index 09d54e01d13..a9f7120aef4 100644 --- a/src/backend/port/meson.build +++ b/src/backend/port/meson.build @@ -2,6 +2,7 @@ backend_sources += files( 'atomics.c', + 'pg_numa.c', ) diff --git a/src/port/pg_numa.c b/src/backend/port/pg_numa.c similarity index 71% rename from src/port/pg_numa.c rename to src/backend/port/pg_numa.c index 5e2523cf798..20be13f669d 100644 --- a/src/port/pg_numa.c +++ b/src/backend/port/pg_numa.c @@ -20,7 +20,6 @@ #include #endif -#include "fmgr.h" #include "miscadmin.h" #include "port/pg_numa.h" #include "storage/pg_shmem.h" @@ -36,8 +35,6 @@ #include #include -Datum pg_numa_available(PG_FUNCTION_ARGS); - /* libnuma requires initialization as per numa(3) on Linux */ int pg_numa_init(void) @@ -66,8 +63,6 @@ pg_numa_get_max_node(void) #else -Datum pg_numa_available(PG_FUNCTION_ARGS); - /* Empty wrappers */ int pg_numa_init(void) @@ -89,32 +84,3 @@ pg_numa_get_max_node(void) } #endif - -Datum -pg_numa_available(PG_FUNCTION_ARGS) -{ - PG_RETURN_BOOL(pg_numa_init() != -1); -} - -/* This should be used only after the server is started */ -Size -pg_numa_get_pagesize(void) -{ - Size os_page_size; -#ifdef WIN32 - SYSTEM_INFO sysinfo; - - GetSystemInfo(&sysinfo); - os_page_size = sysinfo.dwPageSize; -#else - os_page_size = sysconf(_SC_PAGESIZE); -#endif - - Assert(IsUnderPostmaster); - Assert(huge_pages_status != HUGE_PAGES_UNKNOWN); - - if (huge_pages_status == HUGE_PAGES_ON) - GetHugePageSize(&os_page_size, NULL); - - return os_page_size; -} diff --git a/src/backend/storage/ipc/shmem.c b/src/backend/storage/ipc/shmem.c index e10b380e5c7..0903eb50f54 100644 --- a/src/backend/storage/ipc/shmem.c +++ b/src/backend/storage/ipc/shmem.c @@ -93,6 +93,8 @@ static HTAB *ShmemIndex = NULL; /* primary index hashtable for shmem */ /* To get reliable results for NU
Re: BitmapHeapScan streaming read user and prelim refactoring
On Mon, Apr 7, 2025 at 7:34 PM Thomas Munro wrote: > > On Thu, Feb 13, 2025 at 1:40 PM Melanie Plageman > wrote: > > Thomas mentioned this to me off-list, and I think he's right. We > > likely need to rethink the way parallel bitmap heap scan workers get > > block assignments for reading and prefetching to make it more similar > > to parallel sequential scan. The workers should probably get > > assignments of a range of blocks. On master, each worker does end up > > issuing reads/fadvises for a bunch of blocks in a row -- even though > > that isn't the intent of the parallel bitmap table scan > > implementation. We are losing some of that with the patch -- but only > > because it is behaving as you would expect given the implementation > > and design. I don't consider that a blocker, though. > > I had a crack at this one a few weeks back, and wanted to share some > thoughts. Fwiw, I've had reasonably good results, in a similar situation, by just batching + double-buffering. Since you're using AIO (as Melanie points out) there should be no real penalty to having a single worker request the next batch of blocks, when the current block is used up. Then you can separate "reading" from "prefetching": whoever reads the last block in the current batch, prefetches the next batch. This way, you could preserve your existing "reading" logic, and you wouldn't need to create several new, related queues. James
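To make the shape of that concrete, here is a toy sketch of the idea only (not the actual read stream or parallel bitmap heap scan code; all names and the elided synchronization are hypothetical): a shared counter hands out slots from the current batch, and whichever worker claims the last slot fills and prefetches the next batch.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define BATCH_SIZE 64

typedef struct SharedBatchState
{
	atomic_uint	next_index;				/* next unclaimed slot in the current batch */
	uint32_t	blocks[2][BATCH_SIZE];	/* double buffer: current and next batch */
	int			current;				/* which of the two buffers is active */
} SharedBatchState;

/* hypothetical helpers that would pull block numbers out of the bitmap iterator */
extern int	fill_next_batch(uint32_t *blocks, int max_blocks);
extern void	prefetch_blocks(const uint32_t *blocks, int nblocks);

/* Returns false when the current batch is exhausted (caller waits for the swap). */
static bool
claim_next_block(SharedBatchState *state, uint32_t *blockno)
{
	unsigned int idx = atomic_fetch_add(&state->next_index, 1);

	if (idx >= BATCH_SIZE)
		return false;

	*blockno = state->blocks[state->current][idx];

	if (idx == BATCH_SIZE - 1)
	{
		/* the last consumer of this batch readies the next one */
		int		next = 1 - state->current;
		int		n = fill_next_batch(state->blocks[next], BATCH_SIZE);

		prefetch_blocks(state->blocks[next], n);
		/* swapping 'current' and waking waiters is elided in this sketch */
	}

	return true;
}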
Re: Some problems regarding the self-join elimination code
On Tue, Apr 8, 2025 at 11:12 PM Dean Rasheed wrote: > On Tue, 8 Apr 2025 at 12:31, Andrei Lepikhov wrote: > > On 4/4/25 09:37, Richard Guo wrote: > > > With more exposure to the changes in ChangeVarNodes(), I have some > > > more concerns. As I understand it, ChangeVarNodes() is designed to > > > update the varno fields of Var nodes in the given tree that belong to > > > the specific relation; neat and clear. However, now, within this > > > function, we also create new NullTest quals for SJE, which doesn't > > > seem like something this function should be handling. It makes this > > > function look a bit like a kludge. > > To be precise, inside the function we replace old clause with the > > NullTest, because relid replacement action made it degenerate. It seems > > essential to me and I think it may be ok to add a comment at the top of > > ChangeVarNodes, describing this minor optimisation. > I recall raising the same concerns when I originally reviewed the > patch. I don't think this code belongs in the rewriter, because it's > very specific to how SJE wants to handle these kinds of nodes. Correct. And I don't think it's this function's responsibility to create new NullTest quals for SJE. > Note also that RestrictInfo is defined in nodes/pathnodes.h, which > describes its contents as internal data structures for the planner, > suggesting that the rewriter has no business processing those kinds of > nodes. > I don't think simply adding a comment addresses those concerns. Agreed. We may need more than just a comment change. > > > Additionally, I have a minor suggestion regarding the new boolean > > > parameter, "change_RangeTblRef". AFAIU, by convention, we typically > > > use flags to control the behavior of walker functions, like in > > > pull_var_clause. While it's fine to use a boolean parameter here > > > given that we currently only have one flag, for the sake of future > > > extensibility, I think using flags might be a better choice. > > That was exactly why I wasn't happy with replacing the change_relid > > routine with ChangeVarNodes. > > But variant with a flag seems non-trivial to implement. Do you have any > > proposal about the code? > Perhaps the way to do it is to make ChangeVarNodesExtended() take a > callback function to be invoked for each node visited. The callback > (which would then be in the planner, with the other SJE code) would do > any special handling it needed (for RangeTblRef and RestrictInfo > nodes), and call ChangeVarNodes_walker() for any other types of node, > to get the default behaviour. Yeah, this might be a better approach. Perhaps we can borrow some ideas from replace_rte_variables. Thanks Richard
Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.
Hi, I was looking at the recent push for the pg_createsubscription "--all" option [1], because I was absent for several weeks prior. I found some minor non-functional issues as follows: == doc/src/sgml/ref/pg_createsubscriber.sgml 1. + If the database name is not specified in publisher-server, the postgres + database will be used, or if that does not exist, template1 will be used. 1a. There is missing SGML markup in the description of "--all" - "publisher-server" ==> "the --publisher-server connection string" - "postgres database" ==> "postgres database" - "template1 database" ==> "template1 database" ~ 1b. That sentence "If the database name ..." doesn't give any context about what it is referring to. I think it needs to be worded more similarly to the vacuum --maintenance-db option [2] to say it is connecting to this database in order to "discover" the list of databases. SUGGESTION To discover the list of all databases, connect to the source server using the database name specified in the --publisher-server connection string, or if not specified, the postgres database will be used, or if that does not exist, template1 will be used. ~ 1c. I didn't think the sentence quite belongs here because it isn't describing anything about the --all option. Instead, it is describing how the source server connection is made. So, I felt this might be better put later in the "Notes/Prerequisites" section where other source server connection information is described. == src/bin/pg_basebackup/pg_createsubscriber.c get_publisher_databases: 2. +/* + * Fetch a list of all not-template databases from the source server. + * Internally, this is treated as if the user specified multiple --database + * options, one for each source database. + */ /not-template/connectable non-template/ == Patches are attached to implement the above. 0001. Docs some fixes to the --all description. (see #1a and #1b) 0002. Docs moves part of the --all description to the Note/Prerequisites section. (see #1c) 0003. Fixes to code comments. (see #2) == [1] https://github.com/postgres/postgres/commit/fb2ea12f42b9453853be043b8ed107e136e1ccb7 [2] https://www.postgresql.org/docs/current/app-vacuumdb.html Kind Regards, Peter Smith. Fujitsu Australia v20250409-0001-DOCS-change-markup-and-wording-for-the-all.patch Description: Binary data v20250409-0003-Minor-comment-fix.patch Description: Binary data v20250409-0002-DOCS-move-the-description-how-the-all-conn.patch Description: Binary data
Re: Large expressions in indexes can't be stored (non-TOASTable)
On Tue, Apr 8, 2025, at 5:25 PM, Nathan Bossart wrote: > On Tue, Apr 08, 2025 at 04:44:02PM +0530, Amit Kapila wrote: > > On Fri, Apr 4, 2025 at 7:58 PM Nathan Bossart > > wrote: > >> On Fri, Apr 04, 2025 at 05:16:43PM +0530, Amit Kapila wrote: > >> > Can we dodge adding this push call if we restrict the length of the > >> > origin name to some reasonable limit like 256 or 512 and avoid the > >> > need of toast altogether? > >> > >> We did consider just removing pg_replication_origin's TOAST table earlier > >> [0], but we decided against it at that time. Maybe it's worth > >> reconsidering... > > > > I don't see any good reason in that email for not removing the TOAST > > table for pg_replication_origin. I would prefer to remove it rather > > than add protection related to its access. > > The only reason I can think of is that folks might have existing > replication origins with extremely long names that would cause upgrades to > fail. While I think it would probably be okay to remove the TOAST table > and restrict origin names to 512 bytes (like we did for password hashes in > commit 8275325), folks routinely complain about NAMEDATALEN, so I think we > should be cautious here. Logical replication creates origin names as pg_SUBOID_RELID or pg_SUBOID, which means the maximum origin name length is 24. A similar limit applies to pglogical, which restricts the name to 54 IIRC. I think that covers the majority of logical replication setups. There might be a small number of custom logical replication systems that possibly use long names for replication origins. I've never seen a replication origin name longer than NAMEDATALEN. If you consider that the maximum number of replication origins is limited to 2 bytes (65k distinct names), it is reasonable to restrict replication origin names to 512 given the number of possible combinations. We generally expect that a catalog string uses "name" as its type if it is an identifier; that could have been the case for roname if the author had decided to be strict. This additional TOAST table has little or no use. +1 for removing it. It is one less file, one less table and one less index; in short, one less source of data corruption. ;) -- Euler Taveira EDB https://www.enterprisedb.com/
pg_createsubscriber: Adding another synopsis for the --all option
Hi, In another thread I had previously suggested adding a 2nd synopsis to the pg_createsubscriber docs. See [1, comment #5]. -- CURRENT pg_createsubscriber [option...] { -d | --database }dbname { -D | --pgdata }datadir { -P | --publisher-server }connstr SUGGESTION pg_createsubscriber [option...] {-a | --all} { -D | --pgdata }datadir { -P | --publisher-server }connstr pg_createsubscriber [option...] { -d | --database }dbname { -D | --pgdata }datadir { -P | --publisher-server }connstr -- A patch was made for doing this [2, v23-0002] but after some debate it was omitted from the push for the --all option [3]. I was away at the time so I missed a chance to defend why I felt adding the extra synopsis may be a good idea. I'll try to do that now: Nominating which databas(es) the pg_createsubscriber should process is maybe the most fundamental option of this tool. My reason for the 2nd synopsis boils down to the fact that "--all" option and "--database" options are incompatible. E.g there are 2 ways to nominate the databases: 1) specify them individually or 2) use all of them. It already describes this in all the option descriptions, but I felt adding the --all synopsis just makes that point more clear. Now, vacuumdb was previously cited as a precedent to leave this unchanged. E.g. vacuumdb doesn't have a separate synopsis for just for "--all", so pg_createsubscriber doesn't need to either. I think that's an invalid comparison. The vacuumdb actually does make the distinction between specifically naming a database and saying --all: e.g. "[ dbname | -a | --all ]". But, it can do that because as far as I know vacuumdb only accepts a single "--dbname", whereas pg_createsubscriber allows multiple "--database" options. This means it would too become complex trying to include --all along with --database in just one single pg_createsubscriber synopsis. Which is why, I think 2 separate synopses are warranted. Thoughts? (I've reattached the same v23-0002 patch here because it still works) == [1] https://www.postgresql.org/message-id/CAHut%2BPuyBsOJTSygus2-yp60sw_phwYQ-JyC%2BU6fCBMos9x2LA%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAHv8RjKU24jCHR2fOHocmdSTqhu7ige5UQsUQMkaTZniLc9DbA%40mail.gmail.com [3] https://github.com/postgres/postgres/commit/fb2ea12f42b9453853be043b8ed107e136e1ccb7 Kind Regards, Peter Smith. Fujitsu Australia v23-0002-Synopsis-for-all-option.patch Description: Binary data
Re: Fix 035_standby_logical_decoding.pl race conditions
On Tue, Apr 08, 2025 at 06:42:53AM +0000, Bertrand Drouvot wrote: > Fully agree. Will need to find another way to prevent a process from waiting > between the > wakeup and the detach. I'll open a dedicated thread. By the way, there is a small thing that's itching me a bit about the change done in LogStandbySnapshot() for 105b2cb33617. Could it be useful for debugging to add an elog(DEBUG1) with the LSN returned by GetInsertRecPtr() when taking the short path? We don't have any visibility when the shortcut path is taken, which seems annoying in the long term if we use the injection point skip-log-running-xacts for other tests, and I suspect that there will be some, as the standby snapshots can be really annoying in tests where we want a predictable set of WAL records when wal_level is "replica" or "logical". -- Michael
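Something along these lines is what I have in mind for the shortcut path (only a rough sketch of the suggestion, not committed code; the surrounding guard on the injection point is omitted here):

	/* sketch only: emit the insert LSN when skipping the running-xacts record */
	XLogRecPtr	insert_lsn = GetInsertRecPtr();

	elog(DEBUG1,
		 "skipping logging of standby snapshot, returning insert LSN %X/%X",
		 LSN_FORMAT_ARGS(insert_lsn));
	return insert_lsn;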
Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.
On Wed, Mar 26, 2025 at 3:54 PM Amit Kapila wrote: > > On Tue, Mar 25, 2025 at 5:30 PM Ashutosh Bapat > wrote: > > > > On Thu, Mar 20, 2025 at 5:54 PM Amit Kapila wrote: > > > > > > * > > > + > > > + pg_createsubscriber > > > + option > > > + > > > + > > > + -a > > > + --all > > > + > > > + > > > + -D > > > + --pgdata > > > + > > > +datadir > > > + > > > + -P > > > + --publisher-server > > > + > > > +connstr > > > > > > Most of this is unrelated to this patch. I suggest making a top-up > > > patch, we can commit it separately. > > > > Isn't this documenting -a/--all option in synopsis. Why do you think > > it's unrelated? > > > > I was not clear at that stage whether to include it along with the > core patch, but today, I looked at it while responding to Euler and > found that it is not required. We can still discuss whether to include > it, but the main patch can be committed even without this. > I've created a separate thread [1] to continue the discussion about the synopsis. == [1] https://www.postgresql.org/message-id/CAHut%2BPueY_DyuBy6SuvJev2DWJVGtg%3D9WG9WXvYQDJu39gV6TQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
RE: Fix slot synchronization with two_phase decoding enabled
On Thu, Apr 3, 2025 at 3:16 PM Zhijie Hou (Fujitsu) wrote: > > On Thu, Apr 3, 2025 at 1:38 PM Masahiko Sawada wrote: > > > > > On Wed, Apr 2, 2025 at 7:58 PM Amit Kapila > > wrote: > > > > > > On Thu, Apr 3, 2025 at 7:50 AM Zhijie Hou (Fujitsu) > > > wrote: > > > > > > > > On Thu, Apr 3, 2025 at 3:30 AM Masahiko Sawada wrote: > > > > > > > > > > > > > > On Wed, Apr 2, 2025 at 6:33 AM Zhijie Hou (Fujitsu) > > > > > wrote: > > > > > > > > > > Thank you for the explanation! I agree that the issue happens in > > > > > these > > cases. > > > > > > > > > > As another idea, I wonder if we could somehow defer to make the > > > > > synced slot as 'sync-ready' until we can ensure that the slot > > > > > doesn't have any transactions that are prepared before the point > > > > > of enabling two_phase. For example, when the slotsync worker > > > > > fetches the remote slot, it remembers the confirmed_flush_lsn > > > > > (say > > > > > LSN-1) if the local slot's two_phase becomes true or the local > > > > > slot is newly created with enabling two_phase, and then it makes > > > > > the slot 'sync-ready' once it confirmed that the slot's > > > > > restart_lsn passed > > LSN-1. Does it work? > > > > > > > > Thanks for the idea! > > > > > > > > We considered a similar approach in [1] to confirm there is no > > > > prepared transactions before two_phase_at, but the issue is that > > > > when the two_phase flag is switched from 'false' to 'true' (as in > > > > the case with (copy_data=true, failover=true, two_phase=true)). In > > > > this case, the slot may have already been marked as sync-ready > > > > before the two_phase flag is enabled, as slotsync is unaware of > > > > potential > > future changes to the two_phase flag. > > > > > > This can happen because when copy_data is true, tablesync can take a > > > long time to complete the sync and in the meantime, slot without a > > > two_phase flag would have been synced to standby. Such a slot would > > > be marked as sync-ready even if we follow the calculation proposed > > > by Sawada-san. Note that we enable two_phase once all the tables are > > > in ready state (See run_apply_worker() and comments atop worker.c > > > (TWO_PHASE TRANSACTIONS)). > > > > Right. It doesn't make sense to make the slot not-sync-ready and then > > back to sync-ready. > > > > While I agree with the approach for HEAD and it seems difficult to > > find a solution, I'm concerned that disallowing to use both failover > > and two_phase in a minor release would affect users much. Users who > > are already using that combination might end up needing to re-think > > their system architecture. So I'm trying to narrow down use cases > > where we're going to prohibit or to find workarounds. We analyzed more for the backbranch fix, and here is a summary of different approaches that could be used for PG17. -- Approach 1 -- This is the original approach implemented in V4 patch. In this approach, we entirely disallow enabling both failover and the two-phase feature together for a replication slot or subscription. pros: This restriction is simple to implement and easy for users to comprehend. cons: Users would be unable to use the two-phase feature in conjunction with failover. Following the upgrade to a new release with this fix, existing subscriptions that have both failover and two-phase enabled would require manual re-creation, which is time-consuming. 
-- Approach 2 -- Instead of disallowing the use of two-phase and failover together, a more flexible strategy could be to restrict failover only for slots with two-phase enabled when there is a possibility of prepared transactions existing before two_phase_at that are not yet replicated. During slot creation with two-phase and failover, we could check for any decoded prepared transactions when determining the decoding start point (DecodingContextFindStartpoint). For subsequent attempts to alter failover to true, we ensure that two_phase_at is less than restart_lsn, indicating that all prepared transactions have been committed and replicated, so the bug cannot occur. pros: This method minimizes restrictions for users. Especially during slot creation with (two_phase=on, failover=on), as it’s uncommon for transactions to prepare during consistent snapshot creation, the restriction becomes almost unnoticeable. Users are not always forced to re-create subscriptions post-upgrade. cons: The logic involving (restart_lsn > two_phase_at) might be less intuitive for users. After upgrading, it is recommended that users verify whether two_phase_at of a source slot is less than restart_lsn of the synced slot. If it isn't, users should be careful when using the synced slot on a standby. This might be complex to understand. -- Approach 3 -- This approach is similar to Approach 2 but simplifies some aspects by avoiding checks for prepared transactions during slot creation. It disallows enabling both
Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints
hi. attached patch is for address pg_dump inconsistency when parent is "not null not valid" while child is "not null". The following query before/after pg_dump should return the same result. select conrelid::regclass::text, conname, convalidated, coninhcount, conislocal, conparentid, contype from pg_constraint where conrelid::regclass::text = ANY('{inhnn, inhnn_cc, inhnn_cc_1}') order by 1,2; --test cases: CREATE TABLE inhnn (a INTEGER); ALTER TABLE inhnn ADD CONSTRAINT cc not null a NOT VALID; CREATE TABLE inhnn_cc(a INTEGER) INHERITS(inhnn); CREATE TABLE inhnn_cc_1(a INTEGER) INHERITS(inhnn_cc, inhnn); master pg_dump output is: CREATE TABLE public.inhnn (a integer); CREATE TABLE public.inhnn_cc (a integer) INHERITS (public.inhnn); CREATE TABLE public.inhnn_cc_1 (a integer) INHERITS (public.inhnn_cc, public.inhnn); ALTER TABLE public.inhnn ADD CONSTRAINT cc NOT NULL a NOT VALID; with the attached patch, pg_dump output is: CREATE TABLE public.inhnn (a integer); CREATE TABLE public.inhnn_cc (a integer CONSTRAINT cc NOT NULL) INHERITS (public.inhnn); CREATE TABLE public.inhnn_cc_1 (a integer CONSTRAINT cc NOT NULL) INHERITS (public.inhnn_cc, public.inhnn); ALTER TABLE public.inhnn ADD CONSTRAINT cc NOT NULL a NOT VALID; - As you can see, in master, pg_dump will make {inhnn, inhnn_cc, inhnn_cc_1} not-null constraint's pg_constraint.convalidated set as false. but we should only make inhnn's not-null constraint convalidated as false. From 9505f36287403aa8efd7642dddf71b77996796dd Mon Sep 17 00:00:00 2001 From: jian he Date: Wed, 9 Apr 2025 13:07:58 +0800 Subject: [PATCH v1 1/1] pg_dump not null not valid make sure pg_dump have the same pg_constraint meta before and after pg_dump. if the parent not-null constraint is invalid, child is valid. then pg_dump need locally print the not-null constraint on that child too. otherwise pg_dump may make child's convalidated may set to false. that means we also need adjust conislocal in AdjustNotNullInheritance. --- src/backend/catalog/pg_constraint.c | 10 ++ src/bin/pg_dump/common.c| 4 src/bin/pg_dump/pg_dump.c | 17 + src/bin/pg_dump/pg_dump.h | 5 + 4 files changed, 36 insertions(+) diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c index 2f73085961b..9e65b96143f 100644 --- a/src/backend/catalog/pg_constraint.c +++ b/src/backend/catalog/pg_constraint.c @@ -776,6 +776,16 @@ AdjustNotNullInheritance(Oid relid, AttrNumber attnum, ereport(ERROR, errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), errmsg("too many inheritance parents")); + + /* + * If the child already has a valid constraint and we are + * creating an invalid one with same definition on it. The + * child's constraint will remain valid, but can no longer be + * marked as local. 
+ */ + if (is_notvalid && conform->convalidated && conform->conenforced) +conform->conislocal = false; + changed = true; } else if (!conform->conislocal) diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c index 56b6c368acf..ff6a4eacda0 100644 --- a/src/bin/pg_dump/common.c +++ b/src/bin/pg_dump/common.c @@ -546,6 +546,10 @@ flagInhAttrs(Archive *fout, DumpOptions *dopt, TableInfo *tblinfo, int numTables parent->notnull_constrs[inhAttrInd] != NULL) foundNotNull = true; + if (fout->remoteVersion >= 18 && + parent->notnull_invalid[inhAttrInd]) + tbinfo->notnull_parent_invalid[j] = true; + foundDefault |= (parentDef != NULL && strcmp(parentDef->adef_expr, "NULL") != 0 && !parent->attgenerated[inhAttrInd]); diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 25264f8c9fb..8d131523366 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -9255,6 +9255,8 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables) tbinfo->attfdwoptions = (char **) pg_malloc(numatts * sizeof(char *)); tbinfo->attmissingval = (char **) pg_malloc(numatts * sizeof(char *)); tbinfo->notnull_constrs = (char **) pg_malloc(numatts * sizeof(char *)); + tbinfo->notnull_invalid = (bool *) pg_malloc(numatts * sizeof(bool)); + tbinfo->notnull_parent_invalid = (bool *) pg_malloc(numatts * sizeof(bool)); tbinfo->notnull_noinh = (bool *) pg_malloc(numatts * sizeof(bool)); tbinfo->notnull_islocal = (bool *) pg_malloc(numatts * sizeof(bool)); tbinfo->attrdefs = (AttrDefInfo **) pg_malloc(numatts * sizeof(AttrDefInfo *)); @@ -9280,6 +9282,8 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables) tbinfo->attlen[j] = atoi(PQgetvalue(res, r, i_attlen)); tbinfo->attalign[j] = *(PQgetvalue(res, r, i_attalign)); tbinfo->attislocal[j] = (PQgetvalue(res, r, i_attislocal)[0] == 't'); + tbinfo->notnull_parent_invalid[j] = false; /* it only change in flagInhAttrs */ + tbinfo->notnull_invalid[j] = false; /* it only change in determineNotNullFlags */
Re: [PoC] Reducing planning time when tables have many partitions
On Wed, 9 Apr 2025 at 17:09, Amit Langote wrote: > Should the following paragraph in src/backend/optimizer/README be > updated to reflect the new reality after recent changes? > > An EquivalenceClass can contain "em_is_child" members, which are copies > of members that contain appendrel parent relation Vars, transposed to > contain the equivalent child-relation variables or expressions. These > members are not full-fledged members of the EquivalenceClass and do not > affect the class's overall properties at all. They are kept only to > simplify matching of child-relation expressions to EquivalenceClasses. > Most operations on EquivalenceClasses should ignore child members. > > The part about these being in the EquivalenceClass might be worth > rewording now that we keep them in a separate array. I did read over that as part of the search I did for things that need to be updated, but I didn't see the need to adjust anything since the text doesn't talk about where the members are stored. The only thing I see as a hint to that is the final sentence. If the README is light on documentation about where members are stored, do we really need to start detailing that because of this change? I've tried to be fairly comprehensive about where members are stored in the header comment for struct EquivalenceClass. Wouldn't stating something similar in the README just be duplicating that? I always think of the READMEs as more of an overview on how things fit together with some high-level theory. I think talking about data structures might be a bit too much detail. I'm happy to view wording suggestions if you think we need to detail this further. Maybe there's something that can be adjusted without going into too much depth. David
Re: [PoC] Reducing planning time when tables have many partitions
On Wed, Apr 9, 2025 at 10:51 AM David Rowley wrote: > > On Wed, 9 Apr 2025 at 17:09, Amit Langote wrote: > > Should the following paragraph in src/backend/optimizer/README be > > updated to reflect the new reality after recent changes? > > > > An EquivalenceClass can contain "em_is_child" members, which are copies > > of members that contain appendrel parent relation Vars, transposed to > > contain the equivalent child-relation variables or expressions. These > > members are not full-fledged members of the EquivalenceClass and do not > > affect the class's overall properties at all. They are kept only to > > simplify matching of child-relation expressions to EquivalenceClasses. > > Most operations on EquivalenceClasses should ignore child members. > > > > The part about these being in the EquivalenceClass might be worth > > rewording now that we keep them in a separate array. > > I did read over that as part of the search I did for things that need > to be updated, but I didn't see the need to adjust anything since the > text doesn't talk about where the members are stored. The only thing > I see as a hint to that is the final sentence. > > If the README is light on documentation about where members are > stored, do we really need to start detailing that because of this > change? I've tried to be fairly comprehensive about where members are > stored in the header comment for struct EquivalenceClass. Wouldn't > stating something similar in the README just be duplicating that? I > always think of the READMEs as more of an overview on how things fit > together with some high-level theory. I think talking about data > structures might be a bit too much detail. The fact that this change didn't require us to change the README indicates that the README is at the right level. The code changes internally reorganize how child EMs are stored within an EC, so changing the comments for relevant structures seems good enough, not a change that should bubble up to the README. The only change in the EC interface is the addition of a new iterator method - maybe we could document that in README but that too seems more detail than what README is about. -- Best Wishes, Ashutosh Bapat
Re: Changing shared_buffers without restart
On Mon, Apr 7, 2025 at 2:13 PM Dmitry Dolgov <9erthali...@gmail.com> wrote: > > In the new v4 version > of the patch the first option is implemented. > The patches don't apply cleanly using git am but patch -p1 applies them cleanly. However I see following compilation errors RuntimeError: command "ninja" failed with error [1/1954] Generating src/include/utils/errcodes with a custom command [2/1954] Generating src/include/storage/lwlocknames_h with a custom command [3/1954] Generating src/include/utils/wait_event_names with a custom command [4/1954] Compiling C object src/port/libpgport.a.p/pg_popcount_aarch64.c.o [5/1954] Compiling C object src/port/libpgport.a.p/pg_numa.c.o FAILED: src/port/libpgport.a.p/pg_numa.c.o cc -Isrc/port/libpgport.a.p -Isrc/include -I../../coderoot/pg/src/include -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -g -fno-strict-aliasing -fwrapv -fexcess-precision=standard -D_GNU_SOURCE -Wmissing-prototypes -Wpointer-arith -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security -Wdeclaration-after-statement -Wno-format-truncation -Wno-stringop-truncation -fPIC -DFRONTEND -MD -MQ src/port/libpgport.a.p/pg_numa.c.o -MF src/port/libpgport.a.p/pg_numa.c.o.d -o src/port/libpgport.a.p/pg_numa.c.o -c ../../coderoot/pg/src/port/pg_numa.c In file included from ../../coderoot/pg/src/include/storage/spin.h:54, from ../../coderoot/pg/src/include/storage/condition_variable.h:26, from ../../coderoot/pg/src/include/storage/barrier.h:22, from ../../coderoot/pg/src/include/storage/pg_shmem.h:27, from ../../coderoot/pg/src/port/pg_numa.c:26: ../../coderoot/pg/src/include/storage/s_lock.h:93:2: error: #error "s_lock.h may not be included from frontend code" 93 | #error "s_lock.h may not be included from frontend code" | ^ In file included from ../../coderoot/pg/src/port/pg_numa.c:26: ../../coderoot/pg/src/include/storage/pg_shmem.h:66:9: error: unknown type name ‘pg_atomic_uint32’ 66 | pg_atomic_uint32NSharedBuffers; | ^~~~ ../../coderoot/pg/src/include/storage/pg_shmem.h:68:9: error: unknown type name ‘pg_atomic_uint64’ 68 | pg_atomic_uint64Generation; | ^~~~ ../../coderoot/pg/src/port/pg_numa.c: In function ‘pg_numa_get_pagesize’: ../../coderoot/pg/src/port/pg_numa.c:117:17: error: too few arguments to function ‘GetHugePageSize’ 117 | GetHugePageSize(&os_page_size, NULL); | ^~~ In file included from ../../coderoot/pg/src/port/pg_numa.c:26: ../../coderoot/pg/src/include/storage/pg_shmem.h:127:13: note: declared here 127 | extern void GetHugePageSize(Size *hugepagesize, int *mmap_flags, | ^~~ -- Best Wishes, Ashutosh Bapat
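For reference, the failing check is the usual guard PostgreSQL headers use to keep backend-only code out of frontend builds; a minimal sketch of the mechanism (simplified, only the guard itself is taken from s_lock.h):

/*
 * Sketch of the guard that fires in the build above: pg_numa.c is compiled
 * with -DFRONTEND for libpgport, so pulling in pg_shmem.h -> barrier.h ->
 * condition_variable.h -> spin.h -> s_lock.h trips this check.
 */
#ifdef FRONTEND
#error "s_lock.h may not be included from frontend code"
#endif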
Re: [PoC] Reducing planning time when tables have many partitions
Amit Langote writes: > Should the following paragraph in src/backend/optimizer/README be > updated to reflect the new reality after recent changes? > An EquivalenceClass can contain "em_is_child" members, which are copies > of members that contain appendrel parent relation Vars, transposed to > contain the equivalent child-relation variables or expressions. Hm. They still are "in" the EquivalenceClass in a very real sense; there is no other data structure that links to them. So this isn't incorrect. I do get your feeling that maybe some rewording is warranted, but I'm not sure what. regards, tom lane
Re: Memoize ANTI and SEMI JOIN inner
On Thu, Mar 20, 2025 at 3:02 PM David Rowley wrote: > For making this work, I think the attached should be about the guts of > the code changes. I didn't look at the comments. Right now I can't > think of any reason why this can't be done, but some experimentation > might reveal some reason that it can't. I conducted some experiments, and I'm afraid it's not safe to consider Memoize for semi or anti joins, unless the inner side is provably unique. As an example, please consider: create table t (a int, b boolean); insert into t select i%3, false from generate_series(1,100)i; analyze t; select * from t t1 where t1.a in (select a from t t2 where t2.b in (select t1.b from t t3 where t2.a > 1 offset 0)); ERROR: cache entry already complete With the proposed patch, this query results in an error. The problem is that join clauses from the upper level may be moved to the semi join. For a given outer tuple, the first inner tuple that satisfies the current parameters will mark the cache entry as complete because singlerow is set to true. However, if that inner tuple and the current outer tuple don't satisfy the join clauses, the second inner tuple that satisfies the parameters will complain that the cache entry is already marked as complete. If the inner side is provably unique, there will be no such problem, as there won't be a second matching tuple. OTOH, in this case, the semi join will be reduced to an inner join by reduce_unique_semijoins. Therefore, it doesn't make much sense to prove inner_unique for semi joins in add_paths_to_joinrel. Perhaps we could spend some planner cycles proving inner_unique for anti joins, so that Memoize nodes can be considered for them? Thanks Richard
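For contrast, a small sketch of the "provably unique inner side" case (t is the table from the example above; t_uniq is illustrative): with a unique constraint on the inner join column, at most one inner tuple can match a given parameter value, so the cache-completion failure above cannot arise, and as noted such a semi join can be reduced to a plain join by reduce_unique_semijoins.

CREATE TABLE t_uniq (a int PRIMARY KEY, b boolean);
INSERT INTO t_uniq SELECT i, false FROM generate_series(1, 100) i;
ANALYZE t_uniq;

EXPLAIN (COSTS OFF)
SELECT * FROM t t1 WHERE t1.a IN (SELECT a FROM t_uniq t2);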
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Tue, Apr 8, 2025 at 06:57:18PM +0200, Wolfgang Walther wrote: > Jacob Champion: > > On Tue, Apr 8, 2025 at 9:32 AM Wolfgang Walther > > wrote: > > > And that should also not be a problem for distributions - they could > > > offer a libpq and a libpq_oauth package, where only one of them can be > > > installed at the same time, I guess? * > > My outsider understanding is that maintaining this sort of thing > > becomes a major headache, because of combinatorics. You don't really > > want to ship a libpq and libpq-with-gss and libpq-with-oauth and > > libpq-with-oauth-and-gss and ... > > That would only be the case, if you were to consider those other > dependencies as "dangerous" as cURL. But we already depend on them. So if > it's really the case that cURL is that much worse, that we consider loading > it as a module... then the combinatorics should not be a problem either. > > However, if the other deps are considered problematic as well, then the ship > has already sailed, and there is not point for a special case here anymore. Yes, I think this is what I am asking too. For me it was curl's security reputation and whether that would taint the security reputation of libpq. For Tom, I think it was the dependency additions. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Tue, Apr 8, 2025 at 09:17:03AM -0700, Jacob Champion wrote: > On Tue, Apr 8, 2025 at 9:14 AM Bruce Momjian wrote: > > How does this patch help us avoid having to handle curl CVEs and its > > curl's additional dependencies? As I understand the patch, it makes > > libpq _not_ have additional dependencies but moves the dependencies to a > > special loadable library that libpq can use. > > It allows packagers to ship the OAuth library separately, so end users > that don't want the additional exposure don't have to install it at > all. Okay, so how would they do that? I understand how that would happen if it was an external extension, but how if it is under /src or /contrib. FYI, I see a good number of curl CVEs: https://curl.se/docs/security.html Would we have to put out minor releases for curl CVEs? I don't think we have to for OpenSSL so would curl be the same? I am asking these questions now so we can save time in getting this closed. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: Enhancing Memory Context Statistics Reporting
Hi, On 2025-04-08 01:17:17 +0200, Daniel Gustafsson wrote: > > On 7 Apr 2025, at 17:43, Andres Freund wrote: > > >> + /* > >> + * Hold the process lock to protect writes to process specific memory. Two > >> + * processes publishing statistics do not block each other. > >> + */ > > > > s/specific/process specific/ > > That's what it says though.. isn't it? I might be missing something obvious. Understandable confusion, not sure what my brain was doing anymore either... > >> +} MemoryContextState; > > > > IMO that's too generic a name for something in a header. > > > >> +} MemoryContextId; > > > > This too. Particularly because MemoryContextData->ident exists but is > > something different. > > Renamed both to use MemoryContextReporting* namespace, which leaves > MemoryContextReportingBackendState at an unwieldy long name. I'm running out > of ideas on how to improve and it does make purpose quite explicit at least. How about MemoryContextReportingBackendState -> MemoryStatsBackendState MemoryContextReportingId -> MemoryStatsContextId MemoryContextReportingSharedState -> MemoryStatsCtl MemoryContextReportingStatsEntry -> MemoryStatsEntry > >> + /* context id starts with 1 */ > >> + entry->context_id = ++(*stats_count); > > > > Given that we don't actually do anything here relating to starting with 1, I > > find that comment confusing. > > Reworded, not sure if it's much better tbh. I'd probably just remove the comment. > > Hm. First I thought we'd leak memory if this second (and subsequent) > > dsa_allocate failed. Then I thought we'd be ok, because the memory would be > > reachable from memCtxState[idx].memstats_dsa_pointer. > > > > But I think it wouldn't *quite* work, because memCtxState[idx].total_stats > > is > > only set *after* we would have failed. > > Keeping a running total in .total_stats should make the leak window smaller. Why not just initialize .total_stats *before* calling any fallible code? Afaict it's zero-allocated, so the free function should have no problem dealing with the entries that haven't yet been populated. Greetings, Andres Freund
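A generic, self-contained sketch of that ordering, in plain C with malloc/free rather than the patch's DSA allocations (all names are illustrative): record the entry count before any allocation that can fail, so that cleanup of a partially filled, zero-initialized array neither leaks nor touches unset slots.

#include <stdlib.h>

struct stats_state
{
	int		total_stats;	/* published before any fallible allocation */
	char  **entries;		/* calloc'd, so unfilled slots stay NULL */
};

static void
cleanup(struct stats_state *st)
{
	for (int i = 0; i < st->total_stats; i++)
		free(st->entries[i]);	/* free(NULL) is a no-op for unfilled slots */
	free(st->entries);
}

static int
fill(struct stats_state *st, int n)
{
	st->entries = calloc(n, sizeof(char *));
	if (st->entries == NULL)
		return -1;
	st->total_stats = n;		/* set first, before the loop below can fail */
	for (int i = 0; i < n; i++)
	{
		st->entries[i] = malloc(64);
		if (st->entries[i] == NULL)
			return -1;			/* caller runs cleanup(); nothing leaks */
	}
	return 0;
}

int
main(void)
{
	struct stats_state st = {0};

	if (fill(&st, 8) != 0)
	{
		cleanup(&st);
		return 1;
	}
	cleanup(&st);
	return 0;
}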
Re: Modern SHA2-based password hashes for pgcrypto
Hi, On 2025-04-07 09:09:39 +0200, Bernd Helmle wrote: > Am Sonntag, dem 06.04.2025 um 23:02 -0400 schrieb Andres Freund: > > On 2025-04-05 19:22:58 +0200, Alvaro Herrera wrote: > > > I have pushed this now, hoping it won't explode. > > > > I have a WIP patch that adds gcc specific allocator attributes for > > palloc et > > al. Just rebased that. It warns on the new code as follows: > > > > [1489/1804 42 82%] Compiling C object > > contrib/pgcrypto/pgcrypto.so.p/crypt-sha.c.o > > ../../../../../home/andres/src/postgresql/contrib/pgcrypto/crypt- > > sha.c: In function 'px_crypt_shacrypt': > > ../../../../../home/andres/src/postgresql/contrib/pgcrypto/crypt- > > sha.c:605:13: warning: pointer 'cp' may be used after 'pfree' [-Wuse- > > after-free] > > 605 | *cp = '\0'; > > | ^~ > > ../../../../../home/andres/src/postgresql/contrib/pgcrypto/crypt- > > sha.c:533:9: note: call to 'pfree' here > > 533 | pfree(s_bytes); > > | ^~ > > > > And it sure seems to have a point. I'm surprised this isn't causing > > wider > > issues... > > Indeed. I think this assignment is useless anyways, since s_bytes is > already allocated with palloc0. I must have overseen this one when > rearranging code...but yes, strange that it didn't cause drama. Valgrind did find it actually, I just missed it due to other failures: https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=skink&dt=2025-04-05%2020%3A52%3A38&stg=recovery-check ==2917549== VALGRINDERROR-BEGIN ==2917549== Invalid write of size 1 ==2917549==at 0x11D4AD9C: px_crypt_shacrypt (crypt-sha.c:605) ==2917549==by 0x11D54B50: run_crypt_sha (px-crypt.c:76) ==2917549==by 0x11D54BCB: px_crypt (px-crypt.c:119) ==2917549==by 0x11D4BCBB: pg_crypt (pgcrypto.c:228) ==2917549==by 0x3BBCEA: ExecInterpExpr (execExprInterp.c:1001) ==2917549==by 0x3B790E: ExecInterpExprStillValid (execExprInterp.c:2299) ==2917549==by 0x48FD40: ExecEvalExprSwitchContext (executor.h:466) ==2917549==by 0x48FD40: evaluate_expr (clauses.c:5012) ==2917549==by 0x48FF08: evaluate_function (clauses.c:4519) ==2917549==by 0x492054: simplify_function (clauses.c:4108) ==2917549==by 0x4901FA: eval_const_expressions_mutator (clauses.c:2593) ==2917549==by 0x432CDE: expression_tree_mutator_impl (nodeFuncs.c:3486) ==2917549==by 0x48FFB9: eval_const_expressions_mutator (clauses.c:3727) ==2917549== Address 0x82fa740 is 912 bytes inside a recently re-allocated block of size 8,192 alloc'd ==2917549==at 0x4844818: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so) ==2917549==by 0x71EF75: AllocSetContextCreateInternal (aset.c:444) ==2917549==by 0x3D17AE: CreateExprContextInternal (execUtils.c:260) ==2917549==by 0x3D1CA8: CreateExprContext (execUtils.c:310) ==2917549==by 0x3D1ED8: MakePerTupleExprContext (execUtils.c:462) ==2917549==by 0x48FDBF: evaluate_expr (clauses.c:5013) ==2917549==by 0x48FF08: evaluate_function (clauses.c:4519) ==2917549==by 0x492054: simplify_function (clauses.c:4108) ==2917549==by 0x4901FA: eval_const_expressions_mutator (clauses.c:2593) ==2917549==by 0x432CDE: expression_tree_mutator_impl (nodeFuncs.c:3486) ==2917549==by 0x48FFB9: eval_const_expressions_mutator (clauses.c:3727) ==2917549==by 0x432EFD: expression_tree_mutator_impl (nodeFuncs.c:3572) ==2917549== ==2917549== VALGRINDERROR-END { Memcheck:Addr1 fun:px_crypt_shacrypt fun:run_crypt_sha fun:px_crypt fun:pg_crypt fun:ExecInterpExpr fun:ExecInterpExprStillValid fun:ExecEvalExprSwitchContext fun:evaluate_expr fun:evaluate_function fun:simplify_function fun:eval_const_expressions_mutator 
fun:expression_tree_mutator_impl fun:eval_const_expressions_mutator } **2917549** Valgrind detected 1 error(s) during execution of "SELECT crypt('', '$5$Szzz0yzz');" Greetings, Andres
Re: Draft for basic NUMA observability
On 4/8/25 15:06, Andres Freund wrote: > Hi, > > On 2025-04-08 17:44:19 +0500, Kirill Reshke wrote: >> On Mon, 7 Apr 2025 at 23:00, Tomas Vondra wrote: >>> I'll let the CI run the tests on it, and >>> then will push, unless someone has more comments. >>> >> >> >> Hi! I noticed strange failure after this commit[0] >> >> Looks like it is related to 65c298f61fc70f2f960437c05649f71b862e2c48 >> >> In file included from ../pgsql/src/include/postgres.h:49, >> from ../pgsql/src/port/pg_numa.c:16: >> ../pgsql/src/include/utils/elog.h:79:10: >> fatal error: utils/errcodes.h: No such file or >> directory >> 79 | #include "utils/errcodes.h" >> | ^~ >> compilation terminated. > > $ ninja -t missingdeps > Missing dep: src/port/libpgport.a.p/pg_numa.c.o uses > src/include/utils/errcodes.h (generated by CUSTOM_COMMAND) > Missing dep: src/port/libpgport_shlib.a.p/pg_numa.c.o uses > src/include/utils/errcodes.h (generated by CUSTOM_COMMAND) > Processed 2384 nodes. > Error: There are 2 missing dependency paths. > 2 targets had depfile dependencies on 1 distinct generated inputs (from 1 > rules) without a non-depfile dep path to the generator. > There might be build flakiness if any of the targets listed above are built > alone, or not late enough, in a clean output directory. > > > I think it's not right that something in src/port defines an SQL callable > function. The set of headers for that pull in a lot of things. > > Since the file relies on things like GUCs, I suspect this should be in > src/backend/port or such instead. > Yeah, I think you're right, src/backend/port seems like a better place for this. I'll look into moving that in the evening. regards -- Tomas Vondra
Re: Large expressions in indexes can't be stored (non-TOASTable)
On Tue, Apr 08, 2025 at 11:21:31PM -0300, Euler Taveira wrote: > The logical replication creates origin names as pg_SUBOID_RELID or pg_SUBOID. > It means the maximum origin name is 24. This limited origin name also applies > to pglogical that limits the name to 54 IIRC. I think that covers the majority > of the logical replication setups. There might be a small number of custom > logical replication systems that possibly use long names for replication > origin. I've never seen a replication origin name longer than NAMEDATALEN. pg_replication_origin_create() can be used with text as input for the origin name, still your argument sounds sensible here as I would suspect that most setups of logical replication are these. > If you consider that the maximum number of replication origin is limited to 2 > bytes (65k distinct names), it is reasonable to restrict the replication > origin names to 512 due to the high number of combinations. We generally > expects that a catalog string uses "name" as type if it is an identifier; it > could be the case for roname if author decided to be strict. I would be more cautious than a limit on NAMEDATALEN. The restriction suggested by Nathan at 512 bytes should be plenty enough. > This additional TOAST table has no or rare use. +1 for removing it. It is one > less file, one less table and one less index; in short, one less source of > data > corruption. ;) I guess that's the consensus, then. No objections to the removal here. -- Michael signature.asc Description: PGP signature
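To make the name shapes concrete, a short SQL sketch (the subscription and relation OIDs are made up); pg_replication_origin_create() accepts arbitrary text today, so the third call succeeds on current servers but would be rejected under the 512-byte cap discussed above:

SELECT pg_replication_origin_create('pg_16394');        -- pg_SUBOID, created by a subscription
SELECT pg_replication_origin_create('pg_16394_16402');  -- pg_SUBOID_RELID, created by tablesync
SELECT pg_replication_origin_create(repeat('x', 600));  -- longer than the proposed limit
SELECT roident, length(roname) AS name_length FROM pg_replication_origin;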
Re: Remove unnecessary static type qualifiers
On Wed, 09 Apr 2025 at 10:34, Junwang Zhao wrote: > On Wed, Apr 9, 2025 at 5:22 AM David Rowley wrote: >> >> On Wed, 9 Apr 2025 at 03:46, Peter Eisentraut wrote: >> > To avoid creating an array on the stack, you could maybe write it with a >> > pointer instead, like: >> > >> > const char * const query = "..."; >> > >> > I haven't tested whether that results in different or better compiled >> > code. The original code is probably good enough. >> >> I expect it's been done the way it has to make the overflow detection >> code work. The problem with having a pointer to a string constant is >> that sizeof() will return the size of the pointer and not the space >> needed to store the string. >> >> We can get rid of the variable and make the overflow work by checking >> the return length of snprintf. I think that's all C99 spec now... >> >> len = snprintf(qbuf, "set client_encoding to '%s'", encoding); >> /* check query buffer overflow */ >> if (len >= sizeof(qbuf)) >> return -1; >> > > Agree, this is a better coding style. Please see if the following code makes > sense to you. > > diff --git a/src/interfaces/libpq/fe-connect.c > b/src/interfaces/libpq/fe-connect.c > index 0258d9ace3c..247d079faa6 100644 > --- a/src/interfaces/libpq/fe-connect.c > +++ b/src/interfaces/libpq/fe-connect.c > @@ -7717,9 +7717,9 @@ int > PQsetClientEncoding(PGconn *conn, const char *encoding) > { > charqbuf[128]; > - static const char query[] = "set client_encoding to '%s'"; > PGresult *res; > int status; > + int len; > > if (!conn || conn->status != CONNECTION_OK) > return -1; > @@ -7731,12 +7731,11 @@ PQsetClientEncoding(PGconn *conn, const char > *encoding) > if (strcmp(encoding, "auto") == 0) > encoding = > pg_encoding_to_char(pg_get_encoding_from_locale(NULL, true)); > > - /* check query buffer overflow */ > - if (sizeof(qbuf) < (sizeof(query) + strlen(encoding))) > + len = snprintf(qbuf, sizeof(qbuf), "set client_encoding to > '%s'", encoding); > + if (len >= sizeof(qbuf)) > return -1; > > /* ok, now send a query */ > - sprintf(qbuf, query, encoding); > res = PQexec(conn, qbuf); > > if (res == NULL) > Thank you for all the explanations! The above code looks good to me, except for a minor issue; since snprintf may return a negative value, should we check for this? -- Regrads, Japin Li
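A standalone sketch of the check under discussion, including the negative-return case raised above (an illustration only, not the libpq code itself):

#include <stdio.h>

static int
build_encoding_query(char *qbuf, size_t bufsize, const char *encoding)
{
	int		len = snprintf(qbuf, bufsize, "set client_encoding to '%s'", encoding);

	/* snprintf reports the would-be length, or a negative value on error */
	if (len < 0 || (size_t) len >= bufsize)
		return -1;				/* error or truncation: do not use qbuf */
	return 0;
}

int
main(void)
{
	char		qbuf[128];

	if (build_encoding_query(qbuf, sizeof(qbuf), "UTF8") == 0)
		printf("%s\n", qbuf);
	return 0;
}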
Re: Remove unnecessary static type qualifiers
Junwang Zhao writes: > On Tue, Apr 8, 2025 at 4:29 PM Japin Li wrote: >> - static const char query[] = "set client_encoding to '%s'"; >> + const char query[] = "set client_encoding to '%s'"; > I doubt that, if you remove the *static*, it will allocate more memory > on stack and need more instructions to copy the string to that area. Yeah, it's not an improvement as written. This might be OK: >> + const char *query = "set client_encoding to '%s'"; But I kinda doubt it's worth touching. regards, tom lane
Re: Horribly slow pg_upgrade performance with many Large Objects
On Tue, Apr 08, 2025 at 12:37:43PM -0400, Tom Lane wrote: > Hannu Krosing writes: >> I think we do preserve role oids > > Oh ... I'd been looking for mentions of "role" in > pg_upgrade_support.c, but what I should have looked for was > "pg_authid". So yeah, we do preserve role OIDs, and maybe that's > enough to make this workable, at least with source versions that > share the same rules for what goes into pg_largeobject_metadata and > pg_shdepend. It's not something I'd risk back-patching though. I do think it's worth considering going back to copying pg_largeobject_metadata's files for upgrades from v16 and newer. That sounds restrictive at the moment, but it'll mean that all but one supported major version can copy the files during upgrade to v19. I'll admit I'm a tad worried about having to go back to copying via SQL commands in the future and re-regressing things (leading to unpredictable differences in upgrade downtime), but I'm not sure that's a great reason to withhold this optimization. Of course, I wouldn't be opposed to optimizing the SQL command strategy, too, but I suspect that won't compare to copying the files. -- nathan
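For anyone wanting to reproduce the slow case, a hedged sketch (the role name and count are arbitrary): creating many large objects with one GRANT each populates both pg_largeobject_metadata and pg_shdepend, which is the per-object metadata the current dump-and-restore path replays one command at a time.

CREATE ROLE joe;
DO $$
DECLARE
	loid oid;
BEGIN
	FOR i IN 1..100000 LOOP
		loid := lo_create(0);	-- 0 lets the server assign a fresh OID
		EXECUTE format('GRANT SELECT ON LARGE OBJECT %s TO joe', loid);
	END LOOP;
END $$;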
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Tue, Apr 8, 2025 at 9:32 AM Wolfgang Walther wrote: > And that should also not be a problem for distributions - they could offer a > libpq and a libpq_oauth package, where only one of them can be installed at > the same time, I guess? * My outsider understanding is that maintaining this sort of thing becomes a major headache, because of combinatorics. You don't really want to ship a libpq and libpq-with-gss and libpq-with-oauth and libpq-with-oauth-and-gss and ... --Jacob
Re: Horribly slow pg_upgrade performance with many Large Objects
Nathan Bossart writes: > I do think it's worth considering going back to copying > pg_largobject_metadata's files for upgrades from v16 and newer. (If we do this) I don't see why we'd need to stop at v16. I'm envisioning that we'd use COPY, which will be dealing in the text representation of aclitems, and I don't think that's changed in a long time. The sort of thing that would break it is changes in the set of available/default privilege bits for large objects. That is, where the dump currently contains something like SELECT pg_catalog.lo_create('2121'); ALTER LARGE OBJECT 2121 OWNER TO postgres; GRANT ALL ON LARGE OBJECT 2121 TO joe; we'd have COPY pg_largeobject_metadata FROM STDIN; ... 212110 {postgres=rw/postgres,joe=rw/postgres} ... and some appropriate COPY data for pg_shdepend too. (The fact that this representation will contain both numeric and symbolic role OIDs is why I was concerned about OID stability.) The key thing that worries me here is if the source and target versions have different ideas of which roles are pinned, which would silently change what appears in pg_shdepend. But it'd only really break if a role mentioned in some LO's owner or ACL is pinned in the source and not in the target, which seems unlikely. (In the other direction, we'd just be adding a useless row in pg_shdepend.) regards, tom lane
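For reference, the ACL column in that COPY data is exactly the text form aclitemin parses; a quick way to exercise that parse step in isolation (the role names are illustrative and must already exist):

SELECT '{postgres=rw/postgres,joe=rw/postgres}'::aclitem[];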
Re: Horribly slow pg_upgrade performance with many Large Objects
Hannu Krosing writes: > In copy case I would expect the presence of grants to not make much > difference. aclitemin is slower than a lot of other datatype input functions, but it's still got to be faster than a GRANT. regards, tom lane