RE: [WIP]Vertical Clustered Index (columnar store extension) - take2
Hi Yura san,

> I just don't get, why it should be "in-memory"? All the same things you
> describe further, but storing in paged index on-disk with caching
> through shared_buffers - why this way it wouldn't work?

We make the columnar store resident in memory for maximum search performance. But I'm not very particular about this. Comments are welcome.

Best regards,
Aya Iwata
FUJITSU LIMITED

> -----Original Message-----
> From: Yura Sokolov
> Sent: Wednesday, January 15, 2025 11:44 PM
> To: pgsql-hackers@lists.postgresql.org
> Subject: Re: [WIP]Vertical Clustered Index (columnar store extension) - take2
>
> 07.10.2024 17:53, Aya Iwata (Fujitsu) wrote:
> > Hi All,
> >
> > Suggestions
> > ==
> >
> > When analyzing real-time data collected by PostgreSQL,
> > it can be difficult to tune the current PostgreSQL server for
> > satisfactory performance.
> >
> > Therefore, we propose Vertical Clustered Indexing (VCI), an in-memory
> > column store function that holds data in a state suitable for business
> > analysis and is also expected to improve analysis performance.
>
> I just don't get, why it should be "in-memory"? All the same things you
> describe further, but storing in paged index on-disk with caching
> through shared_buffers - why this way it wouldn't work?
Feature Recommendations for Logical Subscriptions
Business Scenario:

The BI department requires real-time data from the operational database. In our current approach (on PostgreSQL 14), we create a separate database within our department's real-time backup instance, set up a logical replication account, replicate the required tables to this isolated database via logical replication, and then create a dedicated account with column-level permissions on specific tables for the BI department.

Recommendations:

1. Column Filtering Functionality: During implementation, some tables may contain sensitive data or long fields (e.g., text columns), while other fields in these tables still need to be replicated to another database or instance. Manually specifying individual columns becomes cumbersome, especially for tables with many fields, and complicates future field additions. We recommend adding a column filtering feature to logical replication to streamline this process (see the sketch below).

2. Logical Replication Account Permissions: Logical replication permissions should be decoupled from general database access permissions. Proposed workflow: create a dedicated account with logical replication privileges only, create a logical replication slot, and grant this account access only to the authorized replication slot. This account would have no direct access to the database itself but could subscribe to and consume data from the permitted replication slot. This approach allows securely providing the account to the BI department. They can subscribe to the replication slot and perform downstream processing independently, without risking exposure of unauthorized data.

YeXiu
1518981...@qq.com
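Regarding recommendation 1, newer PostgreSQL releases (15 and later) already support per-table column lists on publications, which is the kind of filtering being asked for here. A minimal sketch with hypothetical table, column, and connection names:

```sql
-- Publisher: replicate only the non-sensitive columns of "orders".
CREATE PUBLICATION bi_pub FOR TABLE public.orders (order_id, created_at, amount);

-- Subscriber (the isolated BI database): receives only the listed columns.
CREATE SUBSCRIPTION bi_sub
    CONNECTION 'host=primary dbname=proddb user=replicator'
    PUBLICATION bi_pub;
```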
Re: Extend ALTER DEFAULT PRIVILEGES for large objects
On Fri, 4 Apr 2025 19:18:11 +0900 Fujii Masao wrote:
>
> On 2025/04/04 0:21, Fujii Masao wrote:
> > Thanks for updating the patch!
> >
> > If there are no objections, I'll proceed with committing it using the
> > following commit log.
>
> I've pushed the patch. Thanks!

Thank you!

> While testing the feature, I noticed that psql doesn't complete
> "ALTER DEFAULT PRIVILEGES GRANT/REVOKE ... ON LARGE OBJECTS" or
> "GRANT/REVOKE ... ON LARGE OBJECT ..." with TO/FROM. The attached
> patch adds tab-completion support for both cases.

This patch looks good to me. It works as expected.

While looking into this patch, I found that the tab completion suggests TO/FROM even after "LARGE OBJECT", which is not correct because a large object ID should appear at that place. The same applies to "FOREIGN SERVER": server names should be suggested rather than TO/FROM in this case. The additional patch 0002 fixes this by no longer suggesting TO or FROM right after LARGE OBJECT or FOREIGN SERVER, and by suggesting the list of foreign server names after FOREIGN SERVER. The 0001 patch is the same as the one you proposed.

Regards,
Yugo Nagata

--
Yugo NAGATA

>From 64d5aead5ab080d40fa85f9bd51cb0a09490266f Mon Sep 17 00:00:00 2001
From: Yugo Nagata
Date: Tue, 8 Apr 2025 12:15:45 +0900
Subject: [PATCH 2/2] psql: Some improvements to tab completion for GRANT/REVOKE

This prevents suggesting TO or FROM just after LARGE OBJECT or FOREIGN
SERVER, because a large object ID or server name should appear at that
place. It also suggests the list of foreign server names after
FOREIGN SERVER.
---
 src/bin/psql/tab-complete.in.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index bdeb95fb3c8..c58ed27e872 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -4567,10 +4567,18 @@ match_previous_words(int pattern_id,
 	else if (Matches("ALTER", "DEFAULT", "PRIVILEGES", MatchAnyN, "TO", MatchAny))
 		COMPLETE_WITH("WITH GRANT OPTION");
 	/* Complete "GRANT/REVOKE ... ON * *" with TO/FROM */
-	else if (Matches("GRANT", MatchAnyN, "ON", MatchAny, MatchAny))
-		COMPLETE_WITH("TO");
-	else if (Matches("REVOKE", MatchAnyN, "ON", MatchAny, MatchAny))
-		COMPLETE_WITH("FROM");
+	else if (Matches("GRANT|REVOKE", MatchAnyN, "ON", MatchAny, MatchAny))
+	{
+		if (TailMatches("FOREIGN", "SERVER"))
+			COMPLETE_WITH_QUERY(Query_for_list_of_servers);
+		else if (!TailMatches("LARGE", "OBJECT"))
+		{
+			if (Matches("GRANT", MatchAnyN, "ON", MatchAny, MatchAny))
+				COMPLETE_WITH("TO");
+			else
+				COMPLETE_WITH("FROM");
+		}
+	}
 	/* Complete "GRANT/REVOKE * ON ALL * IN SCHEMA *" with TO/FROM */
 	else if (TailMatches("GRANT|REVOKE", MatchAny, "ON", "ALL", MatchAny, "IN", "SCHEMA", MatchAny) ||
-- 
2.34.1

>From 8402036dcc5e32a249a7af0fb9e27c31ba8b72a6 Mon Sep 17 00:00:00 2001
From: Fujii Masao
Date: Fri, 4 Apr 2025 10:51:20 +0900
Subject: [PATCH 1/2] psql: Improve psql tab completion for GRANT/REVOKE on
 large objects.

This commit enhances psql's tab completion to support TO/FROM after
"GRANT/REVOKE ... ON LARGE OBJECT ...". Additionally, since "ALTER
DEFAULT PRIVILEGES" now supports large objects, tab completion is also
updated for "GRANT/REVOKE ... ON LARGE OBJECTS" with TO/FROM.
---
 src/bin/psql/tab-complete.in.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index c916b9299a8..bdeb95fb3c8 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -4602,6 +4602,26 @@ match_previous_words(int pattern_id,
 		COMPLETE_WITH("FROM");
 	}
 
+	/* Complete "GRANT/REVOKE * ON LARGE OBJECT *" with TO/FROM */
+	else if (TailMatches("GRANT|REVOKE", MatchAny, "ON", "LARGE", "OBJECT", MatchAny) ||
+			 TailMatches("REVOKE", "GRANT", "OPTION", "FOR", MatchAny, "ON", "LARGE", "OBJECT", MatchAny))
+	{
+		if (TailMatches("GRANT", MatchAny, MatchAny, MatchAny, MatchAny, MatchAny))
+			COMPLETE_WITH("TO");
+		else
+			COMPLETE_WITH("FROM");
+	}
+
+	/* Complete "GRANT/REVOKE * ON LARGE OBJECTS" with TO/FROM */
+	else if (TailMatches("GRANT|REVOKE", MatchAny, "ON", "LARGE", "OBJECTS") ||
+			 TailMatches("REVOKE", "GRANT", "OPTION", "FOR", MatchAny, "ON", "LARGE", "OBJECTS"))
+	{
+		if (TailMatches("GRANT", MatchAny, MatchAny, MatchAny, MatchAny))
+			COMPLETE_WITH("TO");
+		else
+			COMPLETE_WITH("FROM");
+	}
+
 	/* GROUP BY */
 	else if (TailMatches("FROM", MatchAny, "GROUP"))
 		COMPLETE_WITH("BY");
-- 
2.34.1
Re: [PATCH] clarify palloc comment on quote_literal_cstr
On Tue, Apr 8, 2025 at 10:49 AM Michael Paquier wrote: > > On Tue, Apr 08, 2025 at 08:47:53AM +0530, Ashutosh Bapat wrote: > > Thanks for addressing the comment. > > > > In PG code, we start a multiline comment with just /* on the first > > line and end with just */ on the last line. All the lines in-between > > start with * . Please check other comments in the file. > > > > I would write a., b. c. d. on separate lines. > > Hmm. I don't think that grouping all these details in the same > comment block is an improvement in this case compared to the first > version, where it is clear which part of the calculation applies to > what. In short, I'm OK with how things are on HEAD. Ah! I didn't realize it was committed already. -- Best Wishes, Ashutosh Bapat
Re: BAS_BULKREAD vs read stream
On Sun, Apr 6, 2025 at 10:15 PM Andres Freund wrote:
>
> Hi,
[..]
> The obvious solution to that would be to increase BAS_BULKREAD substantially
> above 256kB.
>
> For quite a while I was worried about increasing the size, because somewhere (I
> couldn't find it while writing this email, will add the reference once I
> refound it) we have a comment explaining that a small size was chosen because
> it helps with CPU cache efficiency.

Hi,

FWIW, I was trying to understand the scope of this change, and GetAccessStrategy() actually points the reader to src/backend/storage/buffer/README, which explains the logic behind the old (pre-commit) rationale and value. It says

```
For sequential scans, a 256KB ring is used. That's small enough to fit in L2
cache, which makes transferring pages from OS cache to shared buffer cache
efficient.
```

-J.
Re: track generic and custom plans in pg_stat_statements
After the introduction of the pg_overexplain extension and Robert's comment [0], I'm starting to have doubts about whether it's still appropriate to add this information to EXPLAIN itself. If there are compelling reasons why showing the plan type in EXPLAIN would be broadly useful to users, I'm happy to proceed. Otherwise, perhaps this is something better suited for an extension.

[0]: https://www.postgresql.org/message-id/CA%2BTgmoZ8qXiZmmn4P9Mk1cf2mjMMLFPOjSasCjuKSiHFcm-ncw%40mail.gmail.com

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
Re: Some problems regarding the self-join elimination code
On 4/2/25 15:26, Richard Guo wrote:
> While working on another patch, I looked at ChangeVarNodes() and found
> some issues introduced by the self-join elimination commit. I think
> it'd be better to fix or improve them.
>
> * ChangeVarNodes_walker includes special handling for RestrictInfo
> nodes. And it adjusts RestrictInfo.orclause only if the rt_index of the
> relation to be removed is contained in RestrictInfo.clause_relids. I
> don't think this is correct. Even if the to-be-removed relation is not
> present in clause_relids, it can still be found somewhere within the
> orclause. As an example, consider:

Yeah. While figuring out how we tolerated such a gaffe, I found that the comment on the clause_relids field is quite vague: "The relids (varnos+varnullingrels) actually referenced in the clause:", and only in the RestrictInfo description can you find some additional information. I think we should modify the check, uniting clause_relids and required_relids before searching for the relid being removed.

> + rinfo->num_base_rels = bms_num_members(rinfo->clause_relids);
>
> I don't think this is correct either. num_base_rels is the number of
> base rels in clause_relids and should not count outer-join relids. And
> we have no guarantee that clause_relids does not include any outer-join
> relids.

It is a clear mistake, no excuses for me.

> To fix, I think we should exclude all outer-join relids. Perhaps we can
> leverage root->outer_join_rels to achieve this.

I think we may use the difference in size of rinfo->clause_relids before and after the adjustment. If something was removed, just decrease num_base_rels.

> * Speaking of comments, I've noticed that some changes are lacking
> comments. For example, there are almost no comments regarding the
> special handling of RestrictInfo nodes in ChangeVarNodes_walker, except
> for an explanation about adding the NullTest.

Ok, it seems that comments were removed too hastily. I have not addressed this yet.

I also added a little code refactoring to make it more readable. See fix.diff in the attachment as my proposal for future discussion.
-- regards, Andrei Lepikhovdiff --git a/src/backend/rewrite/rewriteManip.c b/src/backend/rewrite/rewriteManip.c index 1c61085a0a7..df96b1bba6a 100644 --- a/src/backend/rewrite/rewriteManip.c +++ b/src/backend/rewrite/rewriteManip.c @@ -637,41 +637,58 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context) } if (IsA(node, RestrictInfo)) { - RestrictInfo *rinfo = (RestrictInfo *) node; - int relid = -1; - bool is_req_equal = - (rinfo->required_relids == rinfo->clause_relids); - bool clause_relids_is_multiple = - (bms_membership(rinfo->clause_relids) == BMS_MULTIPLE); - - if (bms_is_member(context->rt_index, rinfo->clause_relids)) + RestrictInfo *rinfo = (RestrictInfo *) node; + intrelid = -1; + bool is_req_equal = + (rinfo->required_relids == rinfo->clause_relids); + bool is_multiple = + (bms_membership(rinfo->clause_relids) == BMS_MULTIPLE); + + if (bms_is_member(context->rt_index, rinfo->clause_relids + /*bms_union(rinfo->clause_relids, rinfo->required_relids)*/)) { - expression_tree_walker((Node *) rinfo->clause, ChangeVarNodes_walker, (void *) context); - expression_tree_walker((Node *) rinfo->orclause, ChangeVarNodes_walker, (void *) context); + Relids new_clause_relids; - rinfo->clause_relids = -adjust_relid_set(rinfo->clause_relids, context->rt_index, context->new_index); + expression_tree_walker((Node *) rinfo->clause, + ChangeVarNodes_walker, (void *) context); + expression_tree_walker((Node *) rinfo->orclause, + ChangeVarNodes_walker, (void *) context); + + new_clause_relids = adjust_relid_set(rinfo->clause_relids, + context->rt_index, + context->new_index); + + /* This operation is legal until we remove only baserels */ + rinfo->num_base_rels -= bms_num_members(rinfo->clause_relids) - + bms_num_members(new_clause_relids); + + rinfo->clause_relids = new_clause_relids; rinfo->num_base_rels = bms_num_members(rinfo->clause_relids); - rinfo->left_relids = -adjust_relid_set(rinfo->left_relids, context->rt_index, context->new_index); - rinfo->right_relids = -adjust_relid_set(rinfo->right_relids, context->rt_index, context->new_index); + rinfo->left_relids = adjust_relid_set(rinfo->left_relids, + context->rt_index, + context->new_index); + rinfo->right_relids = adjust_relid_set(rinfo->right_relids, + context->rt_index, + context->new_index); } if (is_req_equal) rinfo->required_relids = rinfo->clause_relids; else - rinfo->required_relids = -adjust_relid_set(rinfo->required_relids, context->rt_index, context->new_index); + rinfo->required_relids = adjust_relid_set(rinfo->required_relids, + context->rt_index, + context->new_index); - rinfo->outer_relids = - adjust_relid_set(rinfo->outer_relids, context
Re: CREATE DATABASE with filesystem cloning
On Wed, Mar 26, 2025 at 12:47 AM Nazir Bilal Yavuz wrote:
> Rebased version of the patch is attached.

I noticed that it wasn't working on a Mac I tested because guc_tables.c needed an additional include, and fixed that. I also did some minor editing of the docs, mostly removing the mention of specific file systems, because that's bound to be incomplete (it already was, to my knowledge), so I think it's better to simply say which system calls it uses, since the effects will vary. And pushed.

Thanks for working on this! I think this might be a popular feature for some users.
Re: Move tests of contrib/spi/ out of the core regression tests
On 08/04/2025 04:59, Tom Lane wrote:
> The attached patch removes test cases concerned with contrib/spi/ from
> the core regression tests and instead puts them into new test files in
> contrib/spi/ itself.

+1

> My original motivation for looking at this was the discussion in [1]
> about whether to remove contrib/spi/refint.c entirely, since it's
> rather buggy and not a great example of our modern coding practices.
> So I wondered whether the core test cases that use it were contributing
> any significant amount of code coverage on the core code. (Spoiler:
> nope.) But I think this is generally good cleanup anyway, because it
> locates the test code for contrib/spi where a person would expect to
> find that, and removes some rather grotty coding in src/test/regress's
> Makefile and meson.build. As a side benefit, it removes some small
> number of cycles from the core tests, which seems like a good thing.
>
> The tests for the refint module are just moved over verbatim, except
> for using CREATE EXTENSION instead of manual declaration of the C
> functions. Also, I kept the tests for COMMENT ON TRIGGER in the core
> tests, by applying them to a different trigger.
>
> The tests for autoinc were kind of messy, because the behavior of
> autoinc itself was impossibly intertwined with the behavior of
> "ttdummy", which is an undocumented C function in regress.c. After
> some thought I decided we could just nuke ttdummy altogether, so the
> new autoinc.sql test is much simpler and more straightforward.

My only worry would be if 'ttdummy' were a good showcase for how to write a trigger function in C. But it's not a particularly good example. There is a better example in the docs, and there's the 'autoinc' trigger too.

> This is too late for v18 of course, so I'll park it in the next CF.

In my opinion this would still be totally OK for v18. It's just test changes. I would rather commit cleanups like this sooner than later.

--
Heikki Linnakangas
Neon (https://neon.tech)
Re: Horribly slow pg_upgrade performance with many Large Objects
Looked like a bit illogical order on re-reading it so I want to make clear that the pg_upgrade-like test showing 100min for 100 million LOs is at the end of last message and the proposed solution is at the beginning On Tue, Apr 8, 2025 at 9:15 AM Hannu Krosing wrote: > > I was testing on version 17 > > > On Tue, Apr 8, 2025 at 6:52 AM Michael Paquier wrote: > > > > On Mon, Apr 07, 2025 at 05:25:32PM -0400, Tom Lane wrote: > > > What version are you testing? We did some work in that area in the > > > v17 cycle (a45c78e32). > > > > I am puzzled by the target version used here, as well. > > I was testing on version 17 > > Here is how you can easily test too (as --binary-upgrade does not dump > the actual data it is ok for the test to not put anything there) > > hannuk@db01-c1a:~/work/lo-testing$ createdb -p 5433 lodb > hannuk@db01-c1a:~/work/lo-testing$ psql -p 5433 lodb > psql (17.4 (Ubuntu 17.4-1.pgdg22.04+2)) > Type "help" for help. > > lodb=# insert into pg_largeobject_metadata(oid, lomowner) SELECT i, > 16384 FROM generate_series(1, 100_000_000) g(i); > INSERT 0 1 > Time: 162414.216 ms (02:42.414) > lodb=# > \q > > hannuk@db01-c1a:~/work/lo-testing$ time pg_dump --data-only -t > pg_largeobject_metadata -p 5433 lodb | gzip > > pg_largeobject_metadata.data.gz > real 0m22.094s > user 0m20.741s > sys 0m2.085s > > hannuk@db01-c1a:~/work/lo-testing$ time pg_dump --data-only -t > pg_largeobject_metadata --format=custom -p 5433 lodb -f > pg_largeobject_metadata.dump > real 0m20.226s > user 0m18.068s > sys 0m0.824s > > > If there is > > more that can be improved, v19 would be the version to consider for > > future improvements at this stage. > > If the internal format has changed in 16 the correct way would be to > go through the data-only dump of pg_largeobject_metadata in all cases. > Even for the 100M case where you get the restore in 2 minutes instead > of 100 minutes > > hannuk@db01-c1a:~/work/lo-testing$ createdb -p 5434 lodb > hannuk@db01-c1a:~/work/lo-testing$ time pg_restore -p 5434 > --exit-on-error --transaction-size=1000 --dbname lodb > pg_largeobject_metadata.dump > > real 2m2.277s > user 0m2.594s > sys 0m0.549s > > And even in case of the user-visible format change in acl format it is > most likely that changing the visible format using some regexp magic, > or even a dedicated function, would still me much faster than creating > all the LOs though creation commands. > > -- > The commands I used to do the pg_upgrade-like test were > > hannuk@db01-c1a:~/work/lo-testing$ time pg_dump --schema-only > --quote-all-identifiers --binary-upgrade --format=custom > --file=lodb100m.dump -p 5433 lodb > real 1m58.241s > user 0m35.229s > sys 0m17.854s > > hannuk@db01-c1a:~/work/lo-testing$ time pg_restore -p 5434 > --exit-on-error --transaction-size=1000 --dbname lodb lodb100m.dump > real100m54.878s > user3m23.885s > sys 20m33.761s > > (I left out the --verbose part that pg_upgrade also sets as I did not > want to get 100M lines of "large object created " messages ) > > also the postgres server at -p 5434 needs to be started with -b flag > to accept the loading a dump from --binary-upgrade. In Debian/Ubuntu > this can be directly passed to pg_ctlcluster as follows > > sudo pg_ctlcluster 17 target -o -b > > > Hannu > > > > > > -- > > Michael
RE: [WIP]Vertical Clustered Index (columnar store extension) - take2
Hi Alvaro san, I am sorry for my late reply. I continue to work on proposing VCI feature to the community. > I think this is definitely an important and welcome development. > I'm looking forward to patches in this area. Thank you! I am currently preparing to share VCI designs with PGConf.dev. I look forward to sharing more about VCI with you. Best regards, Aya Iwata FUJITSU LIMITED
Re: Horribly slow pg_upgrade performance with many Large Objects
On Tue, Apr 8, 2025 at 12:17 AM Nathan Bossart wrote:
>
> On Mon, Apr 07, 2025 at 10:33:47PM +0200, Hannu Krosing wrote:
> > The obvious solution would be to handle the table
> > `pg_largeobject_metadata` the same way as we currently handle
> > `pg_largeobject` by not doing anything with it in `pg_dump
> > --binary-upgrade` and just handle its contents like we do for user
> > tables in pg_upgrade itself.
> >
> > This should work fine for all source database versions starting from PgSQL
> > v12.
>
> Unfortunately, the storage format for aclitem changed in v16, so this would
> need to be restricted to upgrades from v16 and newer.

Have we also changed the external format of aclitem any time since v9.2, or are the changes just to storage?

If external formats have been stable, we can still get reasonable performance by dumping the data (2 min for 100M rows). Plus, dumping data would work for all the supported source versions.

The worst case would still be quite bad, with 80+ min for the full set of 4 billion LOs, but even that would be much better than the 3 days with the current way.

> That being said, I
> regularly hear about slow upgrades with many LOs, so I think it'd be
> worthwhile to try to improve matters in v19.

Changing the LO export to dumping pg_largeobject_metadata content instead of creating the LOs should be a nice small change confined to pg_dump --binary-upgrade only, so perhaps we could still squeeze it into v18.

--
Hannu
Re: Fix 035_standby_logical_decoding.pl race conditions
Hi,

On 2025-04-08 02:00:35 +, Hayato Kuroda (Fujitsu) wrote:
> > I have changed quite a few comments and commit message for the PG17
> > patch in the attached. Can you update PG16 patch based on this and
> > also use the same commit message as used in attached for all the three
> > patches?
>
> Your patch looks good to me and it could pass on my env. PSA patches for PG16.
> Patch for PG17 is not changed, just renamed.

Thanks all for working together to fix this. These test failures were really rather painful!

Now we just need to fix the issue that causes random CI failures on Windows and the one that causes similar, but different, random failures on macOS...

Greetings,

Andres Freund
Re: [PATCH] clarify palloc comment on quote_literal_cstr
On Tue, Apr 8, 2025 at 12:08 AM Steve Chavez wrote: > > I haven't found a similar style of comment on any other function call. > > I've attached a new patch using the style you suggest. > > That being said, I do find the first form much more readable, but I > understand this is a debatable subject. Thanks for addressing the comment. In PG code, we start a multiline comment with just /* on the first line and end with just */ on the last line. All the lines in-between start with * . Please check other comments in the file. I would write a., b. c. d. on separate lines. -- Best Wishes, Ashutosh Bapat
Re: BUG #18815: Logical replication worker Segmentation fault
On Mon, Apr 7, 2025 at 3:28 PM Zhijie Hou (Fujitsu) wrote: > > > What is the > > behavior of conflict reporting code in case of exclusion constraints? > > Under logical replication context, since we do not detect conflicts for > exclusion > constraints, it would simply report the original constraint violation ERROR. > Fair enough. On considering it again, I find your idea of building conflict-related information when it is actually required sounds better, as it may also save us performance in some corner cases. -- With Regards, Amit Kapila.
Remove unnecessary static type qualifiers
Hi, all

When I read the libpq source code, I found an unnecessary static type qualifier in PQsetClientEncoding().

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 0258d9ace3c..300ddfffd55 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -7717,7 +7717,7 @@ int
 PQsetClientEncoding(PGconn *conn, const char *encoding)
 {
 	char		qbuf[128];
-	static const char query[] = "set client_encoding to '%s'";
+	const char	query[] = "set client_encoding to '%s'";
 	PGresult   *res;
 	int			status;

--
Regards,
Japin Li
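For readers following along, here is a minimal standalone C sketch (not libpq code) of what the qualifier changes: with `static`, the array has static storage duration and is materialized once for the whole program run; without it, the array is an automatic variable that is conceptually initialized on each call (compilers usually generate equivalent code for a read-only string like this).

```c
#include <stdio.h>

static void with_static(void)
{
	/* one copy for the whole program run, static storage duration */
	static const char query[] = "set client_encoding to '%s'";

	printf("with static:    %zu bytes\n", sizeof(query));
}

static void without_static(void)
{
	/* automatic storage, (re)initialized each time the function runs */
	const char query[] = "set client_encoding to '%s'";

	printf("without static: %zu bytes\n", sizeof(query));
}

int main(void)
{
	with_static();
	without_static();
	return 0;
}
```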
Re: Can we use Statistics Import and Export feature to perforamance testing?
On Tue, 8 Apr 2025 at 12:21, Ryohei Takahashi (Fujitsu) wrote:
> By using the Statistics Import and Export feature, is it possible to achieve the
> above request by the following procedure?
> (1) Export the statistics from the production environment by using pg_dump --statistics-only.
> (2) On the staging environment, set the autovacuum related parameters to prevent autovacuum from running.
> (3) Import the statistics to the staging environment by using the result of (1).

You could certainly test the performance, but this method isn't guaranteed to give meaningful results just because the table stats match.

One important thing to remember is that the planner also looks at the *actual size* of the relation and takes that into account when scaling the statistics (see table_block_relation_estimate_size() in tableam.c). If the table sizes don't match between the two servers, then there's no guarantee the planner will produce the same plan.

Also, there might be other subtleties regarding OIDs of indexes, which are not guaranteed to be the same after dump/restore. Given some fuzzily close enough cost estimates (see add_path() and compare_path_costs_fuzzily()), it is possible a plan would switch to using another index if the order of the indexes sorted by OID didn't match on each server. The chances of that might be fairly small, but not zero.

You'd also need to ensure the configs are the same in terms of the GUCs that are used for costs.

You could probably use get_relation_info_hook to overwrite the sizes and make sure the indexes are in the same order, etc.

David
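For concreteness, a rough sketch of the three-step procedure quoted above, with hypothetical database names and connection details; it assumes client tools new enough to support pg_dump --statistics-only, and it is subject to all of the caveats described in the reply:

```sh
# (1) Export only statistics from production (no schema, no data).
pg_dump --statistics-only -Fc -f prod_stats.dump -d proddb

# (2) On staging, keep autovacuum/autoanalyze from overwriting the imported stats.
psql -d stagingdb -c "ALTER SYSTEM SET autovacuum = off;"
psql -d stagingdb -c "SELECT pg_reload_conf();"

# (3) Import the statistics into the staging database.
pg_restore -d stagingdb prod_stats.dump
```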
Re: Horribly slow pg_upgrade performance with many Large Objects
I was testing on version 17

On Tue, Apr 8, 2025 at 6:52 AM Michael Paquier wrote:
>
> On Mon, Apr 07, 2025 at 05:25:32PM -0400, Tom Lane wrote:
> > What version are you testing? We did some work in that area in the
> > v17 cycle (a45c78e32).
>
> I am puzzled by the target version used here, as well.

I was testing on version 17

Here is how you can easily test too (as --binary-upgrade does not dump the actual data, it is OK for the test to not put anything there):

hannuk@db01-c1a:~/work/lo-testing$ createdb -p 5433 lodb
hannuk@db01-c1a:~/work/lo-testing$ psql -p 5433 lodb
psql (17.4 (Ubuntu 17.4-1.pgdg22.04+2))
Type "help" for help.

lodb=# insert into pg_largeobject_metadata(oid, lomowner) SELECT i,
16384 FROM generate_series(1, 100_000_000) g(i);
INSERT 0 1
Time: 162414.216 ms (02:42.414)
lodb=# \q

hannuk@db01-c1a:~/work/lo-testing$ time pg_dump --data-only -t
pg_largeobject_metadata -p 5433 lodb | gzip > pg_largeobject_metadata.data.gz
real	0m22.094s
user	0m20.741s
sys	0m2.085s

hannuk@db01-c1a:~/work/lo-testing$ time pg_dump --data-only -t
pg_largeobject_metadata --format=custom -p 5433 lodb -f pg_largeobject_metadata.dump
real	0m20.226s
user	0m18.068s
sys	0m0.824s

> If there is
> more that can be improved, v19 would be the version to consider for
> future improvements at this stage.

If the internal format has changed in 16, the correct way would be to go through the data-only dump of pg_largeobject_metadata in all cases. Even for the 100M case, where you get the restore in 2 minutes instead of 100 minutes:

hannuk@db01-c1a:~/work/lo-testing$ createdb -p 5434 lodb
hannuk@db01-c1a:~/work/lo-testing$ time pg_restore -p 5434
--exit-on-error --transaction-size=1000 --dbname lodb
pg_largeobject_metadata.dump
real	2m2.277s
user	0m2.594s
sys	0m0.549s

And even in the case of the user-visible change in the acl format, it is most likely that changing the visible format using some regexp magic, or even a dedicated function, would still be much faster than creating all the LOs through creation commands.

--

The commands I used to do the pg_upgrade-like test were

hannuk@db01-c1a:~/work/lo-testing$ time pg_dump --schema-only
--quote-all-identifiers --binary-upgrade --format=custom
--file=lodb100m.dump -p 5433 lodb
real	1m58.241s
user	0m35.229s
sys	0m17.854s

hannuk@db01-c1a:~/work/lo-testing$ time pg_restore -p 5434
--exit-on-error --transaction-size=1000 --dbname lodb lodb100m.dump
real	100m54.878s
user	3m23.885s
sys	20m33.761s

(I left out the --verbose part that pg_upgrade also sets, as I did not want to get 100M lines of "large object created" messages.)

Also, the postgres server at -p 5434 needs to be started with the -b flag to accept loading a dump from --binary-upgrade. In Debian/Ubuntu this can be passed directly to pg_ctlcluster as follows:

sudo pg_ctlcluster 17 target -o -b

Hannu

> --
> Michael
Re: Draft for basic NUMA observability
On 4/8/25 01:26, Shinoda, Noriyoshi (SXD Japan FSI) wrote: > Hi, > > Thanks for developing this great feature. > The manual says that the 'size' column of the pg_shmem_allocations_numa view > is 'int4', but the implementation is 'int8'. > The attached small patch fixes the manual. > Thank you for noticing this and for the fix! Pushed. This also reminded me we agreed to change page_num to bigint, which I forgot to change before commit. So I adjusted that too, separately. regards -- Tomas Vondra
Re: BAS_BULKREAD vs read stream
Hi, On 2025-04-08 18:11:04 +1200, Thomas Munro wrote: > On Tue, Apr 8, 2025 at 2:20 PM Andres Freund wrote: > > In the attached I implemented the above idea, with some small additional > > refinements: > > LGTM. Thanks for checking. > How I wish EXPLAIN would show some clues about strategies... Indeed. There will be some interesting piercing of layers to make that work... Greetings, Andres Freund
Re: Enhancing Memory Context Statistics Reporting
> On 8 Apr 2025, at 10:03, Daniel Gustafsson wrote: > There was a bug in the shmem init function which caused it to fail on Windows, > the attached fixes that. With this building green in CI over several re-builds, and another pass over the docs and code with pgindent etc done, I pushed this earlier today. A few BF animals have built green so far but I will continue to monitor it. -- Daniel Gustafsson
RE: Can we use Statistics Import and Export feature to perforamance testing?
Hi, Thank you for your reply. I understand that the access plans are not guaranteed to be the same. Can we add these notes to the pg_dump page in the PostgreSQL Documentation in order to prevent users from asking the same question? Regards, Ryohei Takahashi
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Tue, Apr 8, 2025 at 08:04:22AM -0700, Jacob Champion wrote: > On Tue, Apr 8, 2025 at 7:32 AM Bruce Momjian wrote: > > Uh, where are we on the inclusion of curl in our build? Maybe it was > > explained but I have not seen it. > > The above is discussing a patch to split this into its own loadable > module. Andres and Christoph's feedback has been shaping where we put > that module, exactly. Uh, I was afraid that was the case, which is why I asked. We have just hit feature freeze, so it is not good that we are still "shaping" the patch. Should we consider reverting it? It is true we still "adjust" patches after feature freeze. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: Feature freeze
Daniel Gustafsson writes: > On 8 Apr 2025, at 16:59, Bruce Momjian wrote: >> Frankly, I think the name "anywhere on Earth" is confusing, since it >> really is "everywhere on Earth": > I find both of the above needlessly confusing when we instead could use UTC > which is a more universally understood concept. Yeah, I always have to mentally translate "0:00 AoE" to "noon UTC", and then I can figure out when it is. I'd prefer we used the UTC formulation. regards, tom lane
Re: Draft for basic NUMA observability
Hi,

On 2025-04-08 17:44:19 +0500, Kirill Reshke wrote:
> On Mon, 7 Apr 2025 at 23:00, Tomas Vondra wrote:
> > I'll let the CI run the tests on it, and
> > then will push, unless someone has more comments.
>
> Hi! I noticed a strange failure after this commit[0]
>
> Looks like it is related to 65c298f61fc70f2f960437c05649f71b862e2c48
>
> In file included from ../pgsql/src/include/postgres.h:49,
>                  from ../pgsql/src/port/pg_numa.c:16:
> ../pgsql/src/include/utils/elog.h:79:10: fatal error: utils/errcodes.h: No such file or directory
>    79 | #include "utils/errcodes.h"
>       |          ^~
> compilation terminated.

$ ninja -t missingdeps
Missing dep: src/port/libpgport.a.p/pg_numa.c.o uses src/include/utils/errcodes.h (generated by CUSTOM_COMMAND)
Missing dep: src/port/libpgport_shlib.a.p/pg_numa.c.o uses src/include/utils/errcodes.h (generated by CUSTOM_COMMAND)
Processed 2384 nodes.
Error: There are 2 missing dependency paths.
2 targets had depfile dependencies on 1 distinct generated inputs (from 1 rules) without a non-depfile dep path to the generator.
There might be build flakiness if any of the targets listed above are built alone, or not late enough, in a clean output directory.

I think it's not right that something in src/port defines an SQL-callable function. The set of headers needed for that pulls in a lot of things.

Since the file relies on things like GUCs, I suspect this should be in src/backend/port or such instead.

Greetings,

Andres Freund
Re: Move tests of contrib/spi/ out of the core regression tests
On 2025-04-08 Tu 5:21 AM, Heikki Linnakangas wrote: This is too late for v18 of course, so I'll park it in the next CF. In my opinion this would still be totally OK for v18. It's just test changes. I would rather commit cleanups like this sooner than later. Yeah, that was my reaction too. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Some problems regarding the self-join elimination code
On Tue, 8 Apr 2025 at 12:31, Andrei Lepikhov wrote: > > On 4/4/25 09:37, Richard Guo wrote: > > On Wed, Apr 2, 2025 at 10:26 PM Richard Guo wrote: > > With more exposure to the changes in ChangeVarNodes(), I have some > > more concerns. As I understand it, ChangeVarNodes() is designed to > > update the varno fields of Var nodes in the given tree that belong to > > the specific relation; neat and clear. However, now, within this > > function, we also create new NullTest quals for SJE, which doesn't > > seem like something this function should be handling. It makes this > > function look a bit like a kludge. > To be precise, inside the function we replace old clause with the > NullTest, because relid replacement action made it degenerate. It seems > essential to me and I think it may be ok to add a comment at the top of > ChangeVarNodes, describing this minor optimisation. > I recall raising the same concerns when I originally reviewed the patch. I don't think this code belongs in the rewriter, because it's very specific to how SJE wants to handle these kinds of nodes. Note also that RestrictInfo is defined in nodes/pathnodes.h, which describes its contents as internal data structures for the planner, suggesting that the rewriter has no business processing those kinds of nodes. I don't think simply adding a comment addresses those concerns. > > Additionally, I have a minor suggestion regarding the new boolean > > parameter, "change_RangeTblRef". AFAIU, by convention, we typically > > use flags to control the behavior of walker functions, like in > > pull_var_clause. While it's fine to use a boolean parameter here > > given that we currently only have one flag, for the sake of future > > extensibility, I think using flags might be a better choice. > That was exactly why I wasn't happy with replacing the change_relid > routine with ChangeVarNodes. > But variant with a flag seems non-trivial to implement. Do you have any > proposal about the code? > Perhaps the way to do it is to make ChangeVarNodesExtended() take a callback function to be invoked for each node visited. The callback (which would then be in the planner, with the other SJE code) would do any special handling it needed (for RangeTblRef and RestrictInfo nodes), and call ChangeVarNodes_walker() for any other types of node, to get the default behaviour. Regards, Dean
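To make the suggestion concrete, here is a rough, hypothetical sketch of what a callback-based ChangeVarNodesExtended() could look like; the names, fields, and signatures below are illustrative only and are not the actual PostgreSQL code:

```c
/* Hypothetical callback type: return true if the node was fully handled. */
typedef bool (*ChangeVarNodes_callback) (Node *node,
										 ChangeVarNodes_context *context);

/*
 * Hypothetical walker variant that gives the caller a chance to handle
 * special node types (e.g. RangeTblRef and RestrictInfo for self-join
 * elimination, with that code living in the planner) before falling back
 * to the default Var-renumbering behaviour.
 */
static bool
ChangeVarNodesExtended_walker(Node *node, ChangeVarNodes_context *context)
{
	if (node == NULL)
		return false;

	/* Let the caller-supplied hook claim the node first. */
	if (context->callback && context->callback(node, context))
		return false;

	/* Default behaviour for everything else. */
	return ChangeVarNodes_walker(node, (void *) context);
}
```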
Re: prevent 006_transfer_modes.pl from leaving files behind
Andrew Dunstan writes: > On 2025-04-07 Mo 7:41 PM, Michael Paquier wrote: >> delete_old_cluster.sh would be left around even if not using a VPATH >> build with ./configure (your commit message does not mention that). >> Even if .gitignore discards it, the file is here. > I don't think that matters. In non-vpath builds we expect the source > directory to be scribbled on. All sorts of things might be left around. It's okay as long as .gitignore ignores it and "make clean" removes it. When/if we give up makefile builds, a lot of these maintenance issues will go away... regards, tom lane
Re: Feature freeze
On Tue, Apr 8, 2025 at 9:30 PM Peter Eisentraut wrote: > > On 08.04.25 16:59, Bruce Momjian wrote: > > On Tue, Apr 8, 2025 at 10:36:45AM -0400, Bruce Momjian wrote: > >> Since we recorded feature freeze as April 8, 2025 0:00 AoE (anywhere on > >> Earth): > >> > >> > >> https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items#Important_Dates > >> https://www.timeanddate.com/time/zones/aoe > >> > >> and it is now 2:34 AM AoE, I guess we are now in feature freeze. > > > > Frankly, I think the name "anywhere on Earth" is confusing, since it > > really is "everywhere on Earth": > > > > https://en.wikipedia.org/wiki/Anywhere_on_Earth > > > > Anywhere on Earth (AoE) is a calendar designation that indicates > > that a period expires when the date passes everywhere on Earth. > > Yes, that works intuitively when you specify that sometimes ends when a > certain day ends, for example: > > "The feature development phase ends at the end of day of April 7, AoE." > > That means, everyone everywhere can just look up at their clock and see, > it's still April 7, it's still going. (Of course, others can then do > the analysis and keep going until some time on April 8, but that would > be sort of against the spirit.) > > If you use it as a time zone with a time of day, it doesn't make > intuitive sense. +1. I think the idea is too simple to be acceptable. Any timezone based deadline might be seen as unfair to those for whom that time falls in their respective nights. AoE removes that unfairness. -- Best Wishes, Ashutosh Bapat
Re: Enhancing Memory Context Statistics Reporting
On 2025/04/08 18:46, Daniel Gustafsson wrote:
> > On 8 Apr 2025, at 10:03, Daniel Gustafsson wrote:
> > There was a bug in the shmem init function which caused it to fail on Windows,
> > the attached fixes that.
>
> With this building green in CI over several re-builds, and another pass over
> the docs and code with pgindent etc done, I pushed this earlier today. A few
> BF animals have built green so far but I will continue to monitor it.

Thanks for committing this feature!

I noticed that the third argument of pg_get_process_memory_contexts() is named "retries" in pg_proc.dat, while the documentation refers to it as "timeout". Since "retries" is misleading, how about renaming it to "timeout" in pg_proc.dat? Patch attached.

Also, as I mentioned earlier, I encountered an issue when calling pg_get_process_memory_contexts() on the PID of a backend that had just encountered an error but hadn't finished rolling back. It led to the following situation:

Session 1 (PID=70011):
=# begin;
=# select 1/0;
ERROR: division by zero

Session 2:
=# select * from pg_get_process_memory_contexts(70011, false, 10);

Session 1 terminated with:
ERROR: ResourceOwnerEnlarge called after release started
FATAL: terminating connection because protocol synchronization was lost

Shouldn't this be addressed?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

From a79084fe3caaf791f2aa8d466603c085ccb8c5af Mon Sep 17 00:00:00 2001
From: Fujii Masao
Date: Wed, 9 Apr 2025 01:27:48 +0900
Subject: [PATCH v1] Rename misleading argument in
 pg_get_process_memory_contexts().

Previously, the third argument of pg_get_process_memory_contexts() was
named retries in pg_proc.dat, even though it actually specifies a
timeout value in seconds. This name was misleading to users and
inconsistent with the documentation, which correctly referred to it as
timeout.

This commit renames the argument to timeout in pg_proc.dat to improve
clarity and maintain consistency with the documentation.
---
 src/include/catalog/pg_proc.dat | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 4708f55be18..62beb71da28 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8578,7 +8578,7 @@
   prorettype => 'record', proargtypes => 'int4 bool float8',
   proallargtypes => '{int4,bool,float8,text,text,text,_int4,int4,int8,int8,int8,int8,int8,int4,timestamptz}',
   proargmodes => '{i,i,i,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid, summary, retries, name, ident, type, path, level, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts, stats_timestamp}',
+  proargnames => '{pid, summary, timeout, name, ident, type, path, level, total_bytes, total_nblocks, free_bytes, free_chunks, used_bytes, num_agg_contexts, stats_timestamp}',
   prosrc => 'pg_get_process_memory_contexts' },

 # non-persistent series generator
-- 
2.49.0
Re: [PoC] Federated Authn/z with OAUTHBEARER
> On 8 Apr 2025, at 18:33, Bruce Momjian wrote:
> Would we have to put out minor releases for curl CVEs? I don't think we
> have to for OpenSSL so would curl be the same?

Why do you envision this being different from all other dependencies we have? For OpenSSL we also happily build against a version (and until recently, several versions) which is EOL and doesn't even receive security fixes.

--
Daniel Gustafsson
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Tue, Apr 8, 2025 at 9:33 AM Bruce Momjian wrote: > On Tue, Apr 8, 2025 at 09:17:03AM -0700, Jacob Champion wrote: > > It allows packagers to ship the OAuth library separately, so end users > > that don't want the additional exposure don't have to install it at > > all. > > Okay, so how would they do that? I understand how that would happen if > it was an external extension, but how if it is under /src or /contrib. By adding the new .so to a different package. For example, RPM specs would just let you say "hey, this .so I just built doesn't go into the main client package, it goes into an add-on that depends on the client package." It's the same way separate client and server packages get generated from the same single build of Postgres. > Would we have to put out minor releases for curl CVEs? In general, no. Thanks, --Jacob
Re: Feature freeze
On Tue, Apr 8, 2025 at 12:29 PM Andrew Dunstan wrote: > The fact that there is this confusion is an indication that the AoE > experiment is a failure. If it's not obvious, and people have to think > about it, then it's not working. And I bet there is a huge number of > people who have never even heard of it. Specify some time and data at > UTC and everyone will understand. +1. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Enhancing Memory Context Statistics Reporting
> On 8 Apr 2025, at 18:41, Fujii Masao wrote:
> On 2025/04/08 18:46, Daniel Gustafsson wrote:
>>> On 8 Apr 2025, at 10:03, Daniel Gustafsson wrote:
>>> There was a bug in the shmem init function which caused it to fail on
>>> Windows, the attached fixes that.
>> With this building green in CI over several re-builds, and another pass over
>> the docs and code with pgindent etc done, I pushed this earlier today. A few
>> BF animals have built green so far but I will continue to monitor it.
>
> Thanks for committing this feature!
>
> I noticed that the third argument of pg_get_process_memory_contexts() is named
> "retries" in pg_proc.dat, while the documentation refers to it as "timeout".
> Since "retries" is misleading, how about renaming it to "timeout" in pg_proc.dat?
> Patch attached.

Ugh, that's my bad. It was changed from using retries to a timeout and I missed that.

> Also, as I mentioned earlier, I encountered an issue when calling
> pg_get_process_memory_contexts() on the PID of a backend that had just
> encountered an error but hadn't finished rolling back. It led to
> the following situation:
>
> Session 1 (PID=70011):
> =# begin;
> =# select 1/0;
> ERROR: division by zero
>
> Session 2:
> =# select * from pg_get_process_memory_contexts(70011, false, 10);
>
> Session 1 terminated with:
> ERROR: ResourceOwnerEnlarge called after release started
> FATAL: terminating connection because protocol synchronization was lost
>
> Shouldn't this be addressed?

Sorry, this must've been missed in this fairly long thread; I will have a look at it tonight.

--
Daniel Gustafsson
Finalizing read stream users' flag choices
Hi,

Over the course of the last two releases, we have added many read stream users. Each user specifies any number of flags (defined at the top of read_stream.h) which govern different aspects of the read stream behavior. There are a few inconsistencies (many caused by me) that I want to iron out and gain consensus on.

The first is whether maintenance_io_concurrency or effective_io_concurrency affects the readahead distance. We've said before that maintenance_io_concurrency should govern work done on behalf of many different sessions. That was said to include at least vacuum and recovery. I need to change the index vacuum users to use READ_STREAM_MAINTENANCE. But I wonder about the other users like amcheck and autoprewarm. Another related question is if/how we should document which of these are controlled by effective_io_concurrency or maintenance_io_concurrency.

The second is related to how they ramp up the size of IOs and the number read ahead: READ_STREAM_DEFAULT ramps up the prefetch distance gradually, while READ_STREAM_FULL starts at full distance immediately. Some of the users specify DEFAULT and others don't (it is defined as 0, so this is fine technically). Perhaps that should be explicit for all of them? Separately, Thomas Munro has mentioned he thinks we should remove READ_STREAM_FULL.

And a somewhat related point: with buffered IO, READ_STREAM_SEQUENTIAL disables prefetching to encourage OS readahead. I don't know if any users other than sequential scan should do this.

Other than the obvious issue with index vacuuming read stream users needing to set READ_STREAM_MAINTENANCE, the other questions are subjective. Below are all of the read stream users in master and their current flags.

sequential scan: READ_STREAM_SEQUENTIAL | READ_STREAM_USE_BATCHING

bitmap heap scan: READ_STREAM_DEFAULT | READ_STREAM_USE_BATCHING

phase I heap vacuum: READ_STREAM_MAINTENANCE

phase II index vacuuming:
  btree index vacuuming: READ_STREAM_FULL | READ_STREAM_USE_BATCHING
  spgist vacuum: READ_STREAM_FULL | READ_STREAM_USE_BATCHING
  gist vacuum: READ_STREAM_FULL | READ_STREAM_USE_BATCHING

phase III heap vacuuming: READ_STREAM_MAINTENANCE | READ_STREAM_USE_BATCHING

analyze: READ_STREAM_MAINTENANCE | READ_STREAM_USE_BATCHING

amcheck:
  with skipping: READ_STREAM_DEFAULT
  without skipping: READ_STREAM_SEQUENTIAL | READ_STREAM_FULL | READ_STREAM_USE_BATCHING

autoprewarm: READ_STREAM_DEFAULT | READ_STREAM_USE_BATCHING

pg_prewarm: READ_STREAM_FULL | READ_STREAM_USE_BATCHING

pg_visibility:
  collect visibility data: READ_STREAM_FULL | READ_STREAM_USE_BATCHING
  collect corrupt items: READ_STREAM_FULL

createdb: READ_STREAM_FULL | READ_STREAM_USE_BATCHING

- Melanie
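For readers less familiar with the API, here is a rough backend-code sketch (based on the read_stream.h interface, with details simplified) of how a read stream user picks its flags when starting a stream; READ_STREAM_MAINTENANCE is the flag being discussed for the index-vacuum users:

```c
#include "postgres.h"

#include "storage/bufmgr.h"
#include "storage/read_stream.h"
#include "utils/rel.h"

/*
 * Simplified sketch: stream all blocks of "rel" using the generic
 * block-range callback. The flag choice on read_stream_begin_relation()
 * is the point of interest here.
 */
static void
scan_relation_with_stream(Relation rel)
{
	BlockRangeReadStreamPrivate p;
	ReadStream *stream;
	Buffer		buf;

	p.current_blocknum = 0;
	p.last_exclusive = RelationGetNumberOfBlocks(rel);

	/* A maintenance-style user would pass READ_STREAM_MAINTENANCE here. */
	stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE |
										READ_STREAM_USE_BATCHING,
										NULL,	/* buffer access strategy */
										rel,
										MAIN_FORKNUM,
										block_range_read_stream_cb,
										&p,
										0);	/* no per-buffer data */

	while ((buf = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
	{
		/* ... inspect the page here ... */
		ReleaseBuffer(buf);
	}

	read_stream_end(stream);
}
```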
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Tue, Apr 8, 2025 at 09:43:01AM -0700, Jacob Champion wrote: > On Tue, Apr 8, 2025 at 9:33 AM Bruce Momjian wrote: > > On Tue, Apr 8, 2025 at 09:17:03AM -0700, Jacob Champion wrote: > > > It allows packagers to ship the OAuth library separately, so end users > > > that don't want the additional exposure don't have to install it at > > > all. > > > > Okay, so how would they do that? I understand how that would happen if > > it was an external extension, but how if it is under /src or /contrib. > > By adding the new .so to a different package. For example, RPM specs > would just let you say "hey, this .so I just built doesn't go into the > main client package, it goes into an add-on that depends on the client > package." It's the same way separate client and server packages get > generated from the same single build of Postgres. Do we have any idea how many packagers are interested in doing this? > > Would we have to put out minor releases for curl CVEs? > > In general, no. Good. FYI, I saw bug bounty dollar amounts next to each curl CVE: https://curl.se/docs/security.html No wonder some people ask for bounties. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Commitfest 2025 March is now closed
Hi,

Thanks a lot to all the members who participated in the commitfest. Here are the final numbers at the end of the commitfest:

 status               | End of Commitfest
----------------------+------------------
 Needs review:        |  54
 Waiting on Author:   |  47
 Ready for Committer: |  18
 Committed:           | 150
 Moved to next CF:    |  38
 Withdrawn:           |  22
 Rejected:            |   6
 RWF:                 |   8
 Total:               | 343

In comparison to previous March commitfests, we've made remarkable progress this time by committing an impressive 150 entries.

2025: 150 committed
2024: 131 committed
2023: 127 committed
2022: 98 committed
2021: 122 committed
2020: 90 committed

A special thanks to the reviewers/committers who spent tireless effort in moving the patches forward.

As per the developer meeting at FOSDEM, patches should only be moved forward by someone involved in the patch who knows that the patch is actually being worked on, as mentioned at [1]. So I request the respective members to move their commitfest entries to the next commitfest.

[1] - https://www.postgresql.org/message-id/flat/003e3a66-8fcc-4ca0-9e0e-c0afda1c9424%40eisentraut.org

Regards,
Vignesh
Re: Adding NetBSD and OpenBSD to Postgres CI
Hi,

The meson version was upgraded to 1.7.0 in NetBSD, and that causes ecpg tests to fail with [1]:

'Could not open file /home/postgres/postgres/build/src/interfaces/ecpg/test/compat_informix/dec_test.c for reading'

This is already fixed in b2bdb972c0; the attached patch just adds $MBUILD_TARGET to the NetBSD and OpenBSD CI tasks.

[1] https://cirrus-ci.com/task/5126123371102208?logs=test_world#L144

--
Regards,
Nazir Bilal Yavuz
Microsoft

From 0a265615e7bd9d6da2cbf1d7edfbe68a8bdf39f1 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz
Date: Tue, 8 Apr 2025 14:21:42 +0300
Subject: [PATCH] ci: Add MBUILD_TARGET to OpenBSD/NetBSD meson builds

Otherwise, these tasks don't build test dependencies.
---
 .cirrus.tasks.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index 6f4f5c674a1..98f3455eb72 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -322,7 +322,7 @@ task:
       build
       EOF

-  build_script: su postgres -c 'ninja -C build -j${BUILD_JOBS}'
+  build_script: su postgres -c 'ninja -C build -j${BUILD_JOBS} ${MBUILD_TARGET}'
   upload_caches: ccache

   test_world_script: |
-- 
2.49.0
Re: prevent 006_transfer_modes.pl from leaving files behind
On 2025-04-07 Mo 7:41 PM, Michael Paquier wrote: On Mon, Apr 07, 2025 at 04:45:52PM -0500, Nathan Bossart wrote: The other pg_upgrade tests chdir to tmp_check prior to running pg_upgrade to avoid leaving behind delete_old_cluster.{sh,bat}. 006_transfer_modes.pl should, too. However, this test is a little different because it loops over all of the available transfer modes and runs pg_upgrade for each one supported by the platform. From my testing and code analysis, it seems sufficient to change the directory once at the beginning of the test, but we could alternatively save the current directory and change back to it in each iteration to be safe. Thoughts? Hmm. Doing one chdir at the beginning of the test should be OK, because we don't do a move to a different directory within the test for each transfer mode tested. So your patch looks like the simplest thing to do to avoid the generation of these files. Note that delete_old_cluster.sh would be left around even if not using a VPATH build with ./configure (your commit message does not mention that). Even if .gitignore discards it, the file is here. I don't think that matters. In non-vpath builds we expect the source directory to be scribbled on. All sorts of things might be left around. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Fix 035_standby_logical_decoding.pl race conditions
On Tue, Apr 08, 2025 at 06:19:02AM +, Bertrand Drouvot wrote:
> - A new injection_points_wakeup_detach() function that is holding the spinlock
> during the whole duration to ensure that no process can wait in between the
> wakeup and the detach.

That would not be a correct spinlock use. injection_points_detach() and injection_points_wakeup_internal() do many more actions than what we can do internally while holding a spinlock, including both Postgres-specific calls as well as system calls. strcmp() and strlcpy() are still OK-ish, even as system calls, as they work directly on the strings given in input.

--
Michael

signature.asc
Description: PGP signature
Re: Draft for basic NUMA observability
Hi,

On April 8, 2025 9:21:57 AM EDT, Tomas Vondra wrote:
>On 4/8/25 15:06, Andres Freund wrote:
>> Hi,
>>
>> On 2025-04-08 17:44:19 +0500, Kirill Reshke wrote:
>>> On Mon, 7 Apr 2025 at 23:00, Tomas Vondra wrote: I'll let the CI run the tests on it, and then will push, unless someone has more comments.
>>>
>>> Hi! I noticed a strange failure after this commit[0]
>>>
>>> Looks like it is related to 65c298f61fc70f2f960437c05649f71b862e2c48
>>>
>>> In file included from ../pgsql/src/include/postgres.h:49,
>>>                  from ../pgsql/src/port/pg_numa.c:16:
>>> ../pgsql/src/include/utils/elog.h:79:10: fatal error: utils/errcodes.h: No such file or directory
>>>    79 | #include "utils/errcodes.h"
>>>       |          ^~
>>> compilation terminated.
>>
>> $ ninja -t missingdeps
>> Missing dep: src/port/libpgport.a.p/pg_numa.c.o uses
>> src/include/utils/errcodes.h (generated by CUSTOM_COMMAND)
>> Missing dep: src/port/libpgport_shlib.a.p/pg_numa.c.o uses
>> src/include/utils/errcodes.h (generated by CUSTOM_COMMAND)
>> Processed 2384 nodes.
>> Error: There are 2 missing dependency paths.
>> 2 targets had depfile dependencies on 1 distinct generated inputs (from 1
>> rules) without a non-depfile dep path to the generator.
>> There might be build flakiness if any of the targets listed above are built
>> alone, or not late enough, in a clean output directory.
>>
>> I think it's not right that something in src/port defines an SQL callable
>> function. The set of headers for that pulls in a lot of things.
>>
>> Since the file relies on things like GUCs, I suspect this should be in
>> src/backend/port or such instead.
>
>Yeah, I think you're right, src/backend/port seems like a better place
>for this. I'll look into moving that in the evening.

On a second look, I wonder if just the SQL function and perhaps the page size function should be moved. There are FE programs that could potentially benefit from NUMA awareness (e.g. pgbench).

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: Horribly slow pg_upgrade performance with many Large Objects
On Tue, Apr 08, 2025 at 01:36:58PM -0400, Tom Lane wrote: > Hmm ... one annoying thing for this project is that AFAICS pg_upgrade > does *not* preserve database OIDs, which is problematic for using > COPY to load pg_shdepend rows. I think it does; see commit aa01051. -- nathan
Re: Horribly slow pg_upgrade performance with many Large Objects
Nathan Bossart writes: > Unless I'm missing something, we don't seem to have had any dependency > handling before commit 12a53c7. Was that broken before we moved to SQL > commands? Sounds like it :-( regards, tom lane
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Tue, Apr 8, 2025 at 9:32 AM Wolfgang Walther wrote: > This means that shipping another .so file will not happen with this approach. > Assuming OAuth will be picked up by some of the bigger providers, that > would... make me feel quite bad about it, actually. It occurs to me that I didn't respond to this point explicitly. I would like to avoid making your life harder. Would anybody following along be opposed to a situation where - dynamiclib builds go through the dlopen() shim - staticlib builds always rely on statically linked symbols Or do we need to be able to mix and match? --Jacob
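For anyone picturing what the shim side of that could look like, here is an illustrative-only sketch (library and symbol names are made up, and error handling is minimal); the real libpq-oauth loader is more involved than this:

#include <dlfcn.h>
#include <stdio.h>

typedef int (*oauth_flow_fn) (void *conn);

static void *oauth_module;
static oauth_flow_fn oauth_flow;

static int
run_oauth_flow(void *conn)
{
	if (oauth_module == NULL)
	{
		/* load the implementation module on first use */
		oauth_module = dlopen("libpq-oauth.so", RTLD_NOW | RTLD_LOCAL);
		if (oauth_module == NULL)
		{
			fprintf(stderr, "could not load OAuth module: %s\n", dlerror());
			return -1;
		}
		oauth_flow = (oauth_flow_fn) dlsym(oauth_module, "run_oauth_flow_impl");
		if (oauth_flow == NULL)
		{
			fprintf(stderr, "missing symbol: %s\n", dlerror());
			return -1;
		}
	}
	return oauth_flow(conn);
}

A static build would skip the dlopen() path entirely and resolve the same entry point at link time, which matches the dynamiclib/staticlib split proposed above.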
Re: Feature freeze
On Tue, Apr 08, 2025 at 05:13:15PM +0200, Daniel Gustafsson wrote: >> On 8 Apr 2025, at 16:59, Bruce Momjian wrote: >> Frankly, I think the name "anywhere on Earth" is confusing, since it >> really is "everywhere on Earth": > > I find both of the above needlessly confusing when we instead could use UTC > which is a more universally understood concept. +1 for UTC. -- nathan
Re: Horribly slow pg_upgrade performance with many Large Objects
Nathan Bossart writes: > On Tue, Apr 08, 2025 at 01:36:58PM -0400, Tom Lane wrote: >> Hmm ... one annoying thing for this project is that AFAICS pg_upgrade >> does *not* preserve database OIDs, which is problematic for using >> COPY to load pg_shdepend rows. > I think it does; see commit aa01051. Ah --- I thought I remembered something having been done about that, but I failed to find it because I was looking in pg_upgrade not pg_dump. Too bad aa01051 didn't update the comment at the top of pg_upgrade.c. regards, tom lane
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Tue, Apr 8, 2025 at 10:10 AM Wolfgang Walther wrote: > And if that means making libpq modular at run-time, then this should be > planned and built with all deps, and other use-cases (like static linking) in > mind - and not like it is right now. I think that'd be neat in concept, but specifically this thread is discussing a PG18 open item. For future releases, if we're happy with how Curl gets split out, maybe that would be fuel for other delay-loaded client dependencies. I'm not sure. --Jacob
Re: [PoC] Federated Authn/z with OAUTHBEARER
Hi, On 2025-04-08 13:02:11 -0400, Bruce Momjian wrote: > On Tue, Apr 8, 2025 at 06:57:18PM +0200, Wolfgang Walther wrote: > > Jacob Champion: > > > On Tue, Apr 8, 2025 at 9:32 AM Wolfgang Walther > > > wrote: > > > > And that should also not be a problem for distributions - they could > > > > offer a libpq and a libpq_oauth package, where only one of them can be > > > > installed at the same time, I guess? * > > > My outsider understanding is that maintaining this sort of thing > > > becomes a major headache, because of combinatorics. You don't really > > > want to ship a libpq and libpq-with-gss and libpq-with-oauth and > > > libpq-with-oauth-and-gss and ... > > > > That would only be the case, if you were to consider those other > > dependencies as "dangerous" as cURL. But we already depend on them. So if > > it's really the case that cURL is that much worse, that we consider loading > > it as a module... then the combinatorics should not be a problem either. > > > > However, if the other deps are considered problematic as well, then the ship > > has already sailed, and there is not point for a special case here anymore. > > Yes, I think this is what I am asking too. For me it was curl's > security reputation and whether that would taint the security reputation > of libpq. For Tom, I think it was the dependency additions. I'd say that curl's security reputation is higher than most of our other dependencies. We have dependencies for libraries with regular security issues, with those issues at times not getting addressed for prolonged amounts of time. I do agree that there's an issue increasing libpq's indirect footprint substantially, but I don't think that's due to curl's reputation or anything. It's just needing a significantly higher number of shared libraries, which comes with runtime and distribution overhead. Greetings, Andres Freund
Re: long-standing data loss bug in initial sync of logical replication
On Tue, Mar 18, 2025 at 3:25 PM Amit Kapila wrote: > > On Mon, Mar 17, 2025 at 4:56 PM Hayato Kuroda (Fujitsu) > wrote: > > > > > Regarding the PG13, it cannot be > > > applied > > > as-is thus some adjustments are needed. I will share it in upcoming posts. > > > > Here is a patch set for PG13. Apart from PG14-17, the patch could be > > created as-is, > > because... > > > > 1. WAL record for invalidation messages (XLOG_XACT_INVALIDATIONS) does not > > exist. > > 2. Thus the ReorderBufferChange for the invalidation does not exist. > >Our patch tries to distribute it but cannot be done as-is. > > 3. Codes assumed that invalidation messages can be added only once. > > 4. The timing when invalidation messages are consumed is limited: > > a. COMMAND_ID change is poped, > > b. start of decoding a transaction, or > > c. end of decoding a transaction. > > > > Above means that invalidations cannot be executed while being decoded. > > I created two patch sets to resolve the data loss issue. 0001 has less code > > changes but could resolve a part of issue, 0002 has huge changes but > > provides a > > complete solution. > > > > 0001 - mostly same as patches for other versions. > > ReorderBufferAddInvalidations() > >was adjusted to allow being called several times. As I said above, > >0001 cannot execute inval messages while decoding the transacitons. > > 0002 - introduces new ReorderBufferChange type to indicate inval messages. > >It would be handled like PG14+. > > > > Here is an example. Assuming that the table foo exists on both nodes, a > > publication "pub" which publishes all tables, and a subscription "sub" which > > subscribes "pub". What if the workload is executed? > > > > ``` > > S1 S2 > > BEGIN; > > INSERT INTO foo VALUES (1) > > ALTER PUBLICATION pub RENAME TO pub_renamed; > > INSERT INTO foo VALUES (2) > > COMMIT; > > LR -> ? > > ``` > > > > With 0001, tuples (1) and (2) would be replicated to the subscriber. > > An error "publication "pub" does not exist" would raise when new changes > > are done > > later. > > > > 0001+0002 works more aggressively; the error would raise when S1 > > transaction is decoded. > > The behavior is same as for patched PG14-PG17. > > > > I understand that with 0001 the fix is partial in the sense that > because invalidations are processed at the transaction end, the > changes of concurrent DDL will only be reflected for the next > transaction. Now, on one hand, it is prudent to not add a new type of > ReorderBufferChange in the backbranch (v13) but the change is not that > invasive, so we can go with it as well. My preference would be to go > with just 0001 for v13 to minimize the risk of adding new bugs or > breaking something unintentionally. > > Sawada-San, and others involved here, do you have any suggestions on > this matter? > Seeing no responses for a long time, I am planning to push the fix till 14 tomorrow unless there are some opinions on the fix for 13. We can continue to discuss the scope of the fix for 13. -- With Regards, Amit Kapila.
Re: Large expressions in indexes can't be stored (non-TOASTable)
On Fri, Apr 4, 2025 at 7:58 PM Nathan Bossart wrote: > > On Fri, Apr 04, 2025 at 05:16:43PM +0530, Amit Kapila wrote: > > Can we dodge adding this push call if we restrict the length of the > > origin name to some reasonable limit like 256 or 512 and avoid the > > need of toast altogether? > > We did consider just removing pg_replication_origin's TOAST table earlier > [0], but we decided against it at that time. Maybe it's worth > reconsidering... > I don't see any good reason in that email for not removing the TOAST table for pg_replication_origin. I would prefer to remove it rather than add protection related to its access. -- With Regards, Amit Kapila.
Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
On Tue, Apr 8, 2025 at 6:42 AM Masahiko Sawada wrote: > > > BTW have you measured the overheads of calling InputFunctionCallSafe > twice? If it's significant, we might want to find other ways to > achieve it as it would not be good to incur overhead just for > relatively rare cases. > Please check the attached two patches v17-0001-COPY-on_error-set_null.original, v17-0001-COPY-on_error-set_null.patch for non-domain types, (on_error set_null), the performance of these two are the same. for domain type with or without constraint, (on_error set_null): v17.original is slower than v17.patch. test script: create unlogged table t2(a text); insert into t2 select 'a' from generate_Series(1, 10_000_000) g; copy t2 to '/tmp/2.txt'; CREATE DOMAIN d1 AS INT ; CREATE DOMAIN d2 AS INT check (value > 0); create unlogged table t3(a int); create unlogged table t4(a d1); create unlogged table t5(a d2); performance result: v17-0001-COPY-on_error-set_null.patch -- 764.903 ms copy t3 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1 -- 779.253 ms copy t4 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1 -- Time: 750.390 ms copy t5 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1 v17-0001-COPY-on_error-set_null.original -- 774.943 ms copy t3 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1 -- 867.671 ms copy t4 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1 -- 927.685 ms copy t5 from '/tmp/2.txt' (on_error set_null) \watch c=10 i=0.1 > Here are some comments: > > + if (InputFunctionCallSafe(&in_functions[m], > + NULL, > + typioparams[m], > + att->atttypmod, > + NULL, > + &values[m])) > > Given that we pass NULL to escontext, does this function return false > in an error case? Or can we use InputFunctionCall instead? > > I think we should mention that SET_NULL still could fail if the data > type of the column doesn't accept NULL. > > How about restructuring the codes around handling data incompatibility > errors like: > > else if (!InputFunctionCallSafe(...)) > { > if (cstate->opts.on_error == IGNORE) > { > cstate->num_errors++; > if (cstate->opts.log_verbosity == VERBOSE) > write a NOTICE message; > return true; // ignore whole row. > } > else if (cstate->opts.on_error == SET_NULL) > { > current_row_erroneous = true; > set NULL to the column; > if (cstate->opts.log_verbosity == VERBOSE) > write a NOTICE message; > continue; // go to the next column. > } > > That way, we have similar structures for both on_error handling and > don't need to reset cstate->cur_attname at the end of SET_NULL > handling. > I think we still need to reset cstate->cur_attname. the current code structure is `` foreach(cur, cstate->attnumlist) { if (condition x) continue; cstate->cur_attname = NULL; cstate->cur_attval = NULL; } `` In some cases (last column , condition x is satisfied), once we reach the ``continue``, then we cannot reach. `` cstate->cur_attname = NULL; cstate->cur_attval = NULL; `` > --- > From the regression tests: > > --fail, column a is domain with not-null constraint > COPY t_on_error_null FROM STDIN WITH (on_error set_null); > a 11 14 > \. > ERROR: domain d_int_not_null does not allow null values > CONTEXT: COPY t_on_error_null, line 1, column a: "a" > > I guess that the log messages could confuse users since while the > actual error was caused by setting NULL to the non-NULL domain type > column, the context message says the data 'a' was erroneous. > if the second function is InputFunctionCall, then we cannot customize the error message. we can't have both. 
I guess we need a second InputFunctionCallSafe with escontext NOT NULL. now i change it to if (!cstate->domain_with_constraint[m] || InputFunctionCallSafe(&in_functions[m], NULL, typioparams[m], att->atttypmod, (Node *) cstate->escontext, &values[m])) else if (string == NULL) ereport(ERROR, errcode(ERRCODE_NOT_NULL_VIOLATION), errmsg("domain %s does not allow null values", format_type_be(typioparams[m])), errdatatype(typioparams[m])); else ereport(ERROR, errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), errmsg("invalid input value for domain %s: \"%s\"", format_type_be(typiopara
Re: An incorrect check in get_memoize_path
On 4/8/25 08:32, Richard Guo wrote: On Mon, Apr 7, 2025 at 9:54 PM Andrei Lepikhov wrote: On 4/7/25 09:50, Richard Guo wrote: Consider the join to t3. It is a unique join, and not all of its restriction clauses are parameterized. Despite this, the check still passes. At the same time, I think the term 'Incorrect' is not good unless you show an example where the data returned is not consistent with what is expected. I think this inequality check has worked in tandem with get_equal_hashops. Do you mean this check is designed to work in conjunction with the clause_sides_match_join check in paraminfo_get_equal_hashops? I would consider this a very poor design. As I have written before, I am quite happy with the change you propose. I just pointed out that the term 'incorrect' usually means you can provide a query which causes an error or inconsistent data, which we could then add to the test set. The current code lines may be described as a 'kludge' - but I'm not a native speaker, so don't bikeshed here too much. -- regards, Andrei Lepikhov
Re: Feature freeze
> On 8 Apr 2025, at 16:59, Bruce Momjian wrote: > > On Tue, Apr 8, 2025 at 10:36:45AM -0400, Bruce Momjian wrote: >> Since we recorded feature freeze as April 8, 2025 0:00 AoE (anywhere on >> Earth): >> >> https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items#Important_Dates >> https://www.timeanddate.com/time/zones/aoe >> >> and it is now 2:34 AM AoE, I guess we are now in feature freeze. > > Frankly, I think the name "anywhere on Earth" is confusing, since it > really is "everywhere on Earth": I find both of the above needlessly confusing when we instead could use UTC which is a more universally understood concept. -- Daniel Gustafsson
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Tue, Apr 8, 2025 at 11:20:11AM -0400, Andres Freund wrote: > Hi, > > On 2025-04-08 11:13:51 -0400, Bruce Momjian wrote: > > On Tue, Apr 8, 2025 at 08:04:22AM -0700, Jacob Champion wrote: > > > On Tue, Apr 8, 2025 at 7:32 AM Bruce Momjian wrote: > > > > Uh, where are we on the inclusion of curl in our build? Maybe it was > > > > explained but I have not seen it. > > > > > > The above is discussing a patch to split this into its own loadable > > > module. Andres and Christoph's feedback has been shaping where we put > > > that module, exactly. > > > > Uh, I was afraid that was the case, which is why I asked. We have just > > hit feature freeze, so it is not good that we are still "shaping" the > > patch. Should we consider reverting it? It is true we still "adjust" > > patches after feature freeze. > > You brought the dependency concern up well after the feature was merged, after > it had been in development for a *long* time. It wasn't a secret that it had a > dependency on curl. I don't think it's fair to penalize a feature's authors > to not finish implementing a complicated and completely new requirement within > 17 days. Fair point --- I was just asking. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: Move tests of contrib/spi/ out of the core regression tests
On 08/04/2025 16:55, Tom Lane wrote: Well, I don't mind pushing it, but does anyone want to review it first? It sounded like Heikki had at least eyeballed the patch, but I'm not sure if he's ready to sign off on it. It looks good to me. diff --git a/doc/src/sgml/contrib-spi.sgml b/doc/src/sgml/contrib-spi.sgml index 55d3fac7a69..6fa9479d1b9 100644 --- a/doc/src/sgml/contrib-spi.sgml +++ b/doc/src/sgml/contrib-spi.sgml @@ -81,10 +81,12 @@ autoinc() is a trigger that stores the next value of a sequence into an integer field. This has some overlap with the - built-in serial column feature, but it is not the same: - autoinc() will override attempts to substitute a - different field value during inserts, and optionally it can be - used to increment the field during updates, too. + built-in serial column feature, but it is not the same. + The trigger will replace the field's value only if that value is + initially zero or null (after the action of the SQL statement that + inserted or updated the row). Also, if the sequence's next value is + zero, nextval() will be called a second time in + order to obtain a non-zero value. That's a much clearer explanation of the behavior, but now that I read that paragraph, I wonder *why* it behaves like that. I'm sure it's just historical reasons. But perhaps we should nuke autoinc altogether? -- Heikki Linnakangas Neon (https://neon.tech)
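For readers trying to picture the documented behavior, here is a loose C paraphrase of it (this is not the actual contrib/spi/autoinc.c source, and the helper shown is only illustrative):

#include "postgres.h"
#include "commands/sequence.h"

static int32
autoinc_pick_value(bool isnull, int32 current, Oid seqoid)
{
	int64		next;

	/* a non-null, non-zero value supplied by the statement is kept as-is */
	if (!isnull && current != 0)
		return current;

	/* otherwise take the sequence's next value, skipping a zero result once */
	next = nextval_internal(seqoid, true);
	if (next == 0)
		next = nextval_internal(seqoid, true);

	return (int32) next;
}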
Re: Large expressions in indexes can't be stored (non-TOASTable)
On Tue, Apr 08, 2025 at 04:44:02PM +0530, Amit Kapila wrote: > On Fri, Apr 4, 2025 at 7:58 PM Nathan Bossart > wrote: >> On Fri, Apr 04, 2025 at 05:16:43PM +0530, Amit Kapila wrote: >> > Can we dodge adding this push call if we restrict the length of the >> > origin name to some reasonable limit like 256 or 512 and avoid the >> > need of toast altogether? >> >> We did consider just removing pg_replication_origin's TOAST table earlier >> [0], but we decided against it at that time. Maybe it's worth >> reconsidering... > > I don't see any good reason in that email for not removing the TOAST > table for pg_replication_origin. I would prefer to remove it rather > than add protection related to its access. The only reason I can think of is that folks might have existing replication origins with extremely long names that would cause upgrades to fail. While I think it would probably be okay to remove the TOAST table and restrict origin names to 512 bytes (like we did for password hashes in commit 8275325), folks routinely complain about NAMEDATALEN, so I think we should be cautious here. -- nathan
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Tue, Apr 8, 2025 at 06:42:19PM +0200, Daniel Gustafsson wrote: > > On 8 Apr 2025, at 18:33, Bruce Momjian wrote: > > > Would we have to put out minor releases for curl CVEs? I don't think we > > have to for OpenSSL so would curl be the same? > > Why do you envision this being different from all other dependencies we have? > For OpenSSL we also happily build against a version (and until recently, > several versions) which is EOL and don't even receive security fixes. I don't know. I know people scan our downloads and report when the scanners detect something, but I am unclear what those scanners are doing. Would they see some new risks with curl? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: [PoC] Reducing planning time when tables have many partitions
On Wed, 9 Apr 2025 at 02:24, Tom Lane wrote: > > David Rowley writes: > > I've pushed the patch now. Thanks for all the reviews of my adjustments. > > Shouldn't the CF entry be marked committed? I've done that now. 88f55bc97 added code to do faster lookups of ec_derives clauses, so I think that likely renders part of the remaining patches obsolete. The v35-0004 also had some indexing of ec_sources. It would be good to know if there's still any gains to be had from indexing those. If there's work to do, then a new CF entry can be made for that. David
Re: Feature freeze
On Wed, 9 Apr 2025 at 03:54, Bruce Momjian wrote: > We did have this discussion when AoE was chosen for PG 18 and the idea > was that as long as it is before April 8 midnight wherever you are, it > is not feature freeze yet. I think it maybe once made sense for the moment to stop accepting new patches into a commitfest so that nobody got upset from their patch missing the cut-off because the CF was changed to in progress "too early" as it was still $previous_month in their timezone. Holding that moment back til it was the correct month in every timezone didn't stop people who lived further East from reviewing patches and committing things, so I think it was done to keep everyone happy. Far fewer of these reasons apply to feature freeze. Committers want to calculate the time in their timezone when the freeze hits so they can plan and not commit anything beyond that. That's easier to do from UTC, as people are generally more used to it. There's also the whole "which day does midnight fall on" problem, which, for some reason, is ambiguous to some. That's why governments and airline companies sometimes do 23:59 or 00:01. For me, I'm exactly 24 hours ahead of AoE, so it should be an easy calc, but I still have more confidence that I'm correct if I'm calculating from UTC. So, +1 for UTC. David
Re: [PoC] Federated Authn/z with OAUTHBEARER
Jacob Champion: On Tue, Apr 8, 2025 at 9:32 AM Wolfgang Walther wrote: And that should also not be a problem for distributions - they could offer a libpq and a libpq_oauth package, where only one of them can be installed at the same time, I guess? * My outsider understanding is that maintaining this sort of thing becomes a major headache, because of combinatorics. You don't really want to ship a libpq and libpq-with-gss and libpq-with-oauth and libpq-with-oauth-and-gss and ... That would only be the case, if you were to consider those other dependencies as "dangerous" as cURL. But we already depend on them. So if it's really the case that cURL is that much worse, that we consider loading it as a module... then the combinatorics should not be a problem either. However, if the other deps are considered problematic as well, then the ship has already sailed, and there is not point for a special case here anymore. Best, Wolfgang
Re: Move tests of contrib/spi/ out of the core regression tests
Andrew Dunstan writes: > On 2025-04-08 Tu 5:21 AM, Heikki Linnakangas wrote: >>> This is too late for v18 of course, so I'll park it in the next CF. >> In my opinion this would still be totally OK for v18. It's just test >> changes. I would rather commit cleanups like this sooner than later. > Yeah, that was my reaction too. Well, I don't mind pushing it, but does anyone want to review it first? It sounded like Heikki had at least eyeballed the patch, but I'm not sure if he's ready to sign off on it. regards, tom lane
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Tue, Apr 8, 2025 at 9:14 AM Bruce Momjian wrote: > How does this patch help us avoid having to handle curl CVEs and its > curl's additional dependencies? As I understand the patch, it makes > libpq _not_ have additional dependencies but moves the dependencies to a > special loadable library that libpq can use. It allows packagers to ship the OAuth library separately, so end users that don't want the additional exposure don't have to install it at all. --Jacob
Re: Feature freeze
On Tue, Apr 8, 2025 at 06:00:27PM +0200, Peter Eisentraut wrote: > On 08.04.25 16:59, Bruce Momjian wrote: > > On Tue, Apr 8, 2025 at 10:36:45AM -0400, Bruce Momjian wrote: > > > Since we recorded feature freeze as April 8, 2025 0:00 AoE (anywhere on > > > Earth): > > > > > > > > > https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items#Important_Dates > > > https://www.timeanddate.com/time/zones/aoe > > > > > > and it is now 2:34 AM AoE, I guess we are now in feature freeze. > > > > Frankly, I think the name "anywhere on Earth" is confusing, since it > > really is "everywhere on Earth": > > > > https://en.wikipedia.org/wiki/Anywhere_on_Earth > > > > Anywhere on Earth (AoE) is a calendar designation that indicates > > that a period expires when the date passes everywhere on Earth. > > Yes, that works intuitively when you specify that sometimes ends when a > certain day ends, for example: > > "The feature development phase ends at the end of day of April 7, AoE." > > That means, everyone everywhere can just look up at their clock and see, > it's still April 7, it's still going. (Of course, others can then do the > analysis and keep going until some time on April 8, but that would be sort > of against the spirit.) > > If you use it as a time zone with a time of day, it doesn't make intuitive > sense. Well, they kind of did this by saying midnight on April 8 AoE, rather than end-of-day in April 7 AoE. Actually, I had originally said April 8 AoE and then was told I had to specify a time --- maybe the time was the mistake, and we still have April 8 to add features. ;-) -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: Feature freeze
On 08/04/2025 19:11, Bruce Momjian wrote: On Tue, Apr 8, 2025 at 06:00:27PM +0200, Peter Eisentraut wrote: On 08.04.25 16:59, Bruce Momjian wrote: On Tue, Apr 8, 2025 at 10:36:45AM -0400, Bruce Momjian wrote: Since we recorded feature freeze as April 8, 2025 0:00 AoE (anywhere on Earth): https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items#Important_Dates https://www.timeanddate.com/time/zones/aoe and it is now 2:34 AM AoE, I guess we are now in feature freeze. Frankly, I think the name "anywhere on Earth" is confusing, since it really is "everywhere on Earth": https://en.wikipedia.org/wiki/Anywhere_on_Earth Anywhere on Earth (AoE) is a calendar designation that indicates that a period expires when the date passes everywhere on Earth. Yes, that works intuitively when you specify that sometimes ends when a certain day ends, for example: "The feature development phase ends at the end of day of April 7, AoE." That means, everyone everywhere can just look up at their clock and see, it's still April 7, it's still going. (Of course, others can then do the analysis and keep going until some time on April 8, but that would be sort of against the spirit.) If you use it as a time zone with a time of day, it doesn't make intuitive sense. Well, they kind of did this by saying midnight on April 8 AoE, rather than end-of-day in April 7 AoE. Actually, I had originally said April 8 AoE and then was told I had to specify a time --- maybe the time was the mistake, and we still have April 8 to add features. ;-) At the end of the day (pun not intended), it doesn't matter much. Nothing special happens when the feature freeze begins. If some committers interpret it a little differently, it doesn't matter. That said, +1 for using UTC in the future for clarity. - Heikki
Re: Remove unnecessary static type qualifiers
On 08.04.25 14:22, Junwang Zhao wrote: When I read the libpq source code, I found unnecessary static type qualifiers in PQsetClientEncoding(). diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c index 0258d9ace3c..300ddfffd55 100644 --- a/src/interfaces/libpq/fe-connect.c +++ b/src/interfaces/libpq/fe-connect.c @@ -7717,7 +7717,7 @@ int PQsetClientEncoding(PGconn *conn, const char *encoding) { charqbuf[128]; - static const char query[] = "set client_encoding to '%s'"; + const char query[] = "set client_encoding to '%s'"; PGresult *res; int status; I doubt that, if you remove the *static*, it will allocate more memory on stack and need more instructions to copy the string to that area. To avoid creating an array on the stack, you could maybe write it with a pointer instead, like: const char * const query = "..."; I haven't tested whether that results in different or better compiled code. The original code is probably good enough.
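A small stand-alone illustration of the three variants under discussion (how much any of this matters in practice depends on the compiler and optimization level, so treat it as a sketch rather than a claim about generated code):

#include <stdio.h>

static void
variant_static_array(void)
{
	/* one array object in static storage, shared by every call */
	static const char query[] = "set client_encoding to '%s'";

	printf("%zu\n", sizeof(query));
}

static void
variant_local_array(void)
{
	/* formally a per-call array; naive codegen copies the literal onto the stack */
	const char	query[] = "set client_encoding to '%s'";

	printf("%zu\n", sizeof(query));
}

static void
variant_pointer(void)
{
	/* a pointer to the string literal, which itself lives in static storage */
	const char *const query = "set client_encoding to '%s'";

	printf("%s\n", query);
}

int
main(void)
{
	variant_static_array();
	variant_local_array();
	variant_pointer();
	return 0;
}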
Re: Feature freeze
On Tue, Apr 08, 2025 at 06:00:27PM +0200, Peter Eisentraut wrote: > On 08.04.25 16:59, Bruce Momjian wrote: >> Anywhere on Earth (AoE) is a calendar designation that indicates >> that a period expires when the date passes everywhere on Earth. > > Yes, that works intuitively when you specify that sometimes ends when a > certain day ends, for example: > > "The feature development phase ends at the end of day of April 7, AoE." > > That means, everyone everywhere can just look up at their clock and see, > it's still April 7, it's still going. (Of course, others can then do the > analysis and keep going until some time on April 8, but that would be sort > of against the spirit.) I always forget if AoE is UTC+12 or UTC-12. "Anywhere on Earth" sounds to me like it means "the first moment it's this time anywhere on Earth," which would be some point during April 7th for me. So every year, I go to Wikipedia, which reminds me it actually means "the moment this time has passed everywhere on Earth." At this point, I can finally convert to UTC and then to my own time zone in my head. If we just said April 8th, 12:00:00 UTC, I'd immediately know that my entire April 7th was fair game. Of course, I hope to usually be done committing things much earlier... -- nathan
Re: Feature freeze
On 2025-04-08 Tu 12:11 PM, Bruce Momjian wrote: On Tue, Apr 8, 2025 at 06:00:27PM +0200, Peter Eisentraut wrote: On 08.04.25 16:59, Bruce Momjian wrote: On Tue, Apr 8, 2025 at 10:36:45AM -0400, Bruce Momjian wrote: Since we recorded feature freeze as April 8, 2025 0:00 AoE (anywhere on Earth): https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items#Important_Dates https://www.timeanddate.com/time/zones/aoe and it is now 2:34 AM AoE, I guess we are now in feature freeze. Frankly, I think the name "anywhere on Earth" is confusing, since it really is "everywhere on Earth": https://en.wikipedia.org/wiki/Anywhere_on_Earth Anywhere on Earth (AoE) is a calendar designation that indicates that a period expires when the date passes everywhere on Earth. Yes, that works intuitively when you specify that sometimes ends when a certain day ends, for example: "The feature development phase ends at the end of day of April 7, AoE." That means, everyone everywhere can just look up at their clock and see, it's still April 7, it's still going. (Of course, others can then do the analysis and keep going until some time on April 8, but that would be sort of against the spirit.) If you use it as a time zone with a time of day, it doesn't make intuitive sense. Well, they kind of did this by saying midnight on April 8 AoE, rather than end-of-day in April 7 AoE. Actually, I had originally said April 8 AoE and then was told I had to specify a time --- maybe the time was the mistake, and we still have April 8 to add features. ;-) The fact that there is this confusion is an indication that the AoE experiment is a failure. If it's not obvious, and people have to think about it, then it's not working. And I bet there is a huge number of people who have never even heard of it. Specify some time and date in UTC and everyone will understand. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: [PoC] Federated Authn/z with OAUTHBEARER
Jacob Champion: The currently proposed patch would have you package and install a separate .so module implementing OAuth, which the staticlib would load once when needed. Similarly to how you still have to somehow dynamically link your static app against Curl. As a staticlib user, how do you feel about that? When linking statically, I am producing entirely statically linked single binaries. Those contain libpq, all other dependencies, and would also contain curl. The "entirely statically linked" thing is actually enforced by the build system (NixOS' pkgsStatic here), so dlopen() might just not be possible. Not exactly sure right now, whether it's stubbed out or just not available at all. This means that shipping another .so file will not happen with this approach. Assuming OAuth will be picked up by some of the bigger providers, that would... make me feel quite bad about it, actually. I'm not seeing the overall problem, yet. When I build with --enable-curl... ofc, I have a dependency on cURL. That's kind of the point. When I don't want that, then I just disable it. And that should also not be a problem for distributions - they could offer a libpq and a libpq_oauth package, where only one of them can be installed at the same time, I guess? * Best, Wolfgang * Currently, the two build systems don't handle the "please build only libpq" scenario well. If that was supported better, building a second package with oauth support could be much easier.
Re: Feature freeze
On Tue, Apr 8, 2025 at 11:45:09AM -0400, Peter Geoghegan wrote: > On Tue, Apr 8, 2025 at 11:20 AM Nathan Bossart > wrote: > > +1 for UTC. > > +1, I think that AoE is needlessly obscure We did have this discussion when AoE was chosen for PG 18 and the idea was that as long as it is before April 8 midnight wherever you are, it is not feature freeze yet. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: Enhancing Memory Context Statistics Reporting
> On 8 Apr 2025, at 18:41, Fujii Masao wrote: > I noticed that the third argument of pg_get_process_memory_contexts() is named > "retries" in pg_proc.dat, while the documentation refers to it as "timeout". I've committed this patch as it was obviously correct, thanks! > Also, as I mentioned earlier, I encountered an issue when calling > pg_get_process_memory_contexts() on the PID of a backend that had just > encountered an error but hadn't finished rolling back. It led to > the following situation: I reconfirmed that the bugfix that Rahila shared in [0] fixes this issue (and will fix others like it, as it's not related to this patch in particular but is a bug in DSM attaching). My plan is to take that for a more thorough review and test tomorrow and see how far it can be safely backpatched. Thanks for bringing this up, sorry about it getting a bit lost among all the emails. -- Daniel Gustafsson [0] cah2l28shr0j3je5v3cxdfmdh-agtsnh2v8pr23x0uhrmbdq...@mail.gmail.com
Re: Horribly slow pg_upgrade performance with many Large Objects
On 4/8/25 15:41, Hannu Krosing wrote: On Tue, Apr 8, 2025 at 8:39 PM Nathan Bossart wrote: ... I've also verified that the dependency information is carried over in upgrades to later versions (AFAICT all the supported ones). If I remember correctly, the change to stop copying the pg_largeobject_metadata data file and instead move LOs as part of the schema was done in v12, when oid, which had been a system column in v11, became a user column; so an upgrade to v11 is likely also missing the dependencies. I remember an incident where large amounts of LOs ran pg_upgrade into a transaction-ID wraparound, because the restore part would create individual single-statement transactions per LO to create it, then change permissions and ownership, and finally fill in the data. Could that be related here? Regards, Jan
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Mon, Apr 7, 2025 at 9:41 AM Jacob Champion wrote: > > Not sure, the code looks correct at first glance. However, you could > > also just keep the libpq-oauth strings in the libpq catalog. There > > isn't really a need to make a separate one, since the versions you end > > up installing are locked to each other. So you could for example in > > libpq's nls.mk just add > > > > ../libpq-oauth/oauth-curl.c > > > > etc. to the files. > > Oh, that's an interesting idea. Thanks, I'll give it a try. A consequence of this is that our copy of libpq_binddomain isn't using the same mutex as libpq's copy to protect the "libpq-18" message domain. We could discuss whether or not it matters, since we don't support Windows, but it doesn't feel architecturally sound to me. If we want to reuse the same domain, I think the module should be using libpq's libpq_gettext(). (Which we could do, again through the magic of dependency injection.) > > Maybe it would also make sense to make libpq-oauth a subdirectory of the > > libpq directory instead of a peer. > > Works for me. It does not, however, work for our $(recurse) setup in the makefiles -- a shared library depending on a parent directory's shared library leads to infinite recursion, with the current tools -- so I'll keep it at the current directory level for now. Thanks, --Jacob
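For what it's worth, the "dependency injection" idea above could look something like this minimal sketch (entirely hypothetical names and shape, not the actual patch): libpq hands the module a struct of callbacks, including its own libpq_gettext, when the module is loaded.

/* hypothetical callbacks passed from libpq to the loaded module */
typedef struct OAuthModuleCallbacks
{
	const char *(*gettext_fn) (const char *msgid);
} OAuthModuleCallbacks;

static const char *(*module_gettext) (const char *msgid);

void
libpq_oauth_init(const OAuthModuleCallbacks *callbacks)
{
	/* translations then go through libpq's own domain and mutex */
	module_gettext = callbacks->gettext_fn;
}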
Re: Finalizing read stream users' flag choices
hi, On 2025-04-08 12:06:47 -0400, Melanie Plageman wrote: > And a somewhat related point, with buffered IO, READ_STREAM_SEQUENTIAL > disables prefetching to encourage OS readahead. I don't know if any > other users than sequential scan should do this. Worth adding that prefetches are only issued when io_method==sync and thus READ_STREAM_SEQUENTIAL only has an effect if io_method==sync. I suspect we should retire READ_STREAM_SEQUENTIAL in 19 or so: a) The sequential-ness detection has gotten smarter / more granular, reducing the need for forcing read_stream's hand. b) There are plenty of cases where READ_STREAM_SEQUENTIAL *hurts* with seqscans and io_method==sync, e.g. if there are plenty of pre-existing buffers in s_b. c) It doesn't have an effect with io_method != sync But it'll depend a bit on our experiences. Greetings, Andres Freund
Re: Draft for basic NUMA observability
On 4/8/25 16:59, Andres Freund wrote: > Hi, > > On 2025-04-08 09:35:37 -0400, Andres Freund wrote: >> On April 8, 2025 9:21:57 AM EDT, Tomas Vondra wrote: >>> On 4/8/25 15:06, Andres Freund wrote: On 2025-04-08 17:44:19 +0500, Kirill Reshke wrote: I think it's not right that something in src/port defines an SQL callable function. The set of headers for that pull in a lot of things. Since the file relies on things like GUCs, I suspect this should be in src/backend/port or such instead. >>> >>> Yeah, I think you're right, src/backend/port seems like a better place >>> for this. I'll look into moving that in the evening. >> >> On a second look I wonder if just the SQL function and perhaps the page size >> function should be moved. There are FE programs that could potentially >> benefit from num a awareness (e.g. pgbench). > > I would move pg_numa_available() to something like > src/backend/storage/ipc/shmem.c. > Makes sense, done in the attached patch. > I wonder if pg_numa_get_pagesize() should loose the _numa_ in the name, it's > not actually directly NUMA related? If it were pg_get_pagesize() it'd fit in > with shmem.c or such. > True. It's true it's not really "NUMA page", but page size for the shmem segment. So renamed to pg_get_shmem_pagesize() and moved to shmem.c, same as pg_numa_available(). The rest of pg_numa.c is moved to src/backend/port > Or we could just make it work in FE code by making this part > > Assert(IsUnderPostmaster); > Assert(huge_pages_status != HUGE_PAGES_UNKNOWN); > > if (huge_pages_status == HUGE_PAGES_ON) > GetHugePageSize(&os_page_size, NULL); > > #ifndef FRONTEND - we don't currently support using huge pages in FE programs > after all. But querying the page size might still be useful. > I don't really like this. Why shouldn't the FE program simply call sysconf(_SC_PAGESIZE)? It'd be just confusing if in backend it'd also verify huge page status. > > Regardless of all of that, I don't think the include of fmgr.h in pg_numa.h is > needed? > Right, that was left there after moving the prototype into the .c file. regards -- Tomas Vondra diff --git a/contrib/pg_buffercache/pg_buffercache_pages.c b/contrib/pg_buffercache/pg_buffercache_pages.c index c9ceba604b1..e1701bd56ef 100644 --- a/contrib/pg_buffercache/pg_buffercache_pages.c +++ b/contrib/pg_buffercache/pg_buffercache_pages.c @@ -343,7 +343,7 @@ pg_buffercache_numa_pages(PG_FUNCTION_ARGS) * This information is needed before calling move_pages() for NUMA * node id inquiry. 
*/ - os_page_size = pg_numa_get_pagesize(); + os_page_size = pg_get_shmem_pagesize(); /* * The pages and block size is expected to be 2^k, so one divides the diff --git a/src/backend/port/Makefile b/src/backend/port/Makefile index 47338d99229..5dafbf7c0c0 100644 --- a/src/backend/port/Makefile +++ b/src/backend/port/Makefile @@ -24,6 +24,7 @@ include $(top_builddir)/src/Makefile.global OBJS = \ $(TAS) \ atomics.o \ + pg_numa.o \ pg_sema.o \ pg_shmem.o diff --git a/src/backend/port/meson.build b/src/backend/port/meson.build index 09d54e01d13..a9f7120aef4 100644 --- a/src/backend/port/meson.build +++ b/src/backend/port/meson.build @@ -2,6 +2,7 @@ backend_sources += files( 'atomics.c', + 'pg_numa.c', ) diff --git a/src/port/pg_numa.c b/src/backend/port/pg_numa.c similarity index 71% rename from src/port/pg_numa.c rename to src/backend/port/pg_numa.c index 5e2523cf798..20be13f669d 100644 --- a/src/port/pg_numa.c +++ b/src/backend/port/pg_numa.c @@ -20,7 +20,6 @@ #include #endif -#include "fmgr.h" #include "miscadmin.h" #include "port/pg_numa.h" #include "storage/pg_shmem.h" @@ -36,8 +35,6 @@ #include #include -Datum pg_numa_available(PG_FUNCTION_ARGS); - /* libnuma requires initialization as per numa(3) on Linux */ int pg_numa_init(void) @@ -66,8 +63,6 @@ pg_numa_get_max_node(void) #else -Datum pg_numa_available(PG_FUNCTION_ARGS); - /* Empty wrappers */ int pg_numa_init(void) @@ -89,32 +84,3 @@ pg_numa_get_max_node(void) } #endif - -Datum -pg_numa_available(PG_FUNCTION_ARGS) -{ - PG_RETURN_BOOL(pg_numa_init() != -1); -} - -/* This should be used only after the server is started */ -Size -pg_numa_get_pagesize(void) -{ - Size os_page_size; -#ifdef WIN32 - SYSTEM_INFO sysinfo; - - GetSystemInfo(&sysinfo); - os_page_size = sysinfo.dwPageSize; -#else - os_page_size = sysconf(_SC_PAGESIZE); -#endif - - Assert(IsUnderPostmaster); - Assert(huge_pages_status != HUGE_PAGES_UNKNOWN); - - if (huge_pages_status == HUGE_PAGES_ON) - GetHugePageSize(&os_page_size, NULL); - - return os_page_size; -} diff --git a/src/backend/storage/ipc/shmem.c b/src/backend/storage/ipc/shmem.c index e10b380e5c7..0903eb50f54 100644 --- a/src/backend/storage/ipc/shmem.c +++ b/src/backend/storage/ipc/shmem.c @@ -93,6 +93,8 @@ static HTAB *ShmemIndex = NULL; /* primary index hashtable for shmem */ /* To get reliable results for NU
Re: BitmapHeapScan streaming read user and prelim refactoring
On Mon, Apr 7, 2025 at 7:34 PM Thomas Munro wrote: > > On Thu, Feb 13, 2025 at 1:40 PM Melanie Plageman > wrote: > > Thomas mentioned this to me off-list, and I think he's right. We > > likely need to rethink the way parallel bitmap heap scan workers get > > block assignments for reading and prefetching to make it more similar > > to parallel sequential scan. The workers should probably get > > assignments of a range of blocks. On master, each worker does end up > > issuing reads/fadvises for a bunch of blocks in a row -- even though > > that isn't the intent of the parallel bitmap table scan > > implementation. We are losing some of that with the patch -- but only > > because it is behaving as you would expect given the implementation > > and design. I don't consider that a blocker, though. > > I had a crack at this one a few weeks back, and wanted to share some > thoughts. Fwiw, I've had reasonably good results, in a similar situation, by just batching + double-buffering. Since you're using AIO (as Melanie points out) there should be no real penalty to having a single worker request the next batch of blocks, when the current block is used up. Then you can separate "reading" from "prefetching": whoever reads the last block in the current batch, prefetches the next batch. This way, you could preserve your existing "reading" logic, and you wouldn't need to create several new, related queues. James
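To make the shape of that concrete, here is a toy sketch of the idea only (not the actual read stream or parallel bitmap heap scan code; all names and the elided synchronization are hypothetical): a shared counter hands out slots from the current batch, and whichever worker claims the last slot fills and prefetches the next batch.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define BATCH_SIZE 64

typedef struct SharedBatchState
{
	atomic_uint	next_index;				/* next unclaimed slot in the current batch */
	uint32_t	blocks[2][BATCH_SIZE];	/* double buffer: current and next batch */
	int			current;				/* which of the two buffers is active */
} SharedBatchState;

/* hypothetical helpers that would pull block numbers out of the bitmap iterator */
extern int	fill_next_batch(uint32_t *blocks, int max_blocks);
extern void	prefetch_blocks(const uint32_t *blocks, int nblocks);

/* Returns false when the current batch is exhausted (caller waits for the swap). */
static bool
claim_next_block(SharedBatchState *state, uint32_t *blockno)
{
	unsigned int idx = atomic_fetch_add(&state->next_index, 1);

	if (idx >= BATCH_SIZE)
		return false;

	*blockno = state->blocks[state->current][idx];

	if (idx == BATCH_SIZE - 1)
	{
		/* the last consumer of this batch readies the next one */
		int		next = 1 - state->current;
		int		n = fill_next_batch(state->blocks[next], BATCH_SIZE);

		prefetch_blocks(state->blocks[next], n);
		/* swapping 'current' and waking waiters is elided in this sketch */
	}

	return true;
}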
Re: Some problems regarding the self-join elimination code
On Tue, Apr 8, 2025 at 11:12 PM Dean Rasheed wrote: > On Tue, 8 Apr 2025 at 12:31, Andrei Lepikhov wrote: > > On 4/4/25 09:37, Richard Guo wrote: > > > With more exposure to the changes in ChangeVarNodes(), I have some > > > more concerns. As I understand it, ChangeVarNodes() is designed to > > > update the varno fields of Var nodes in the given tree that belong to > > > the specific relation; neat and clear. However, now, within this > > > function, we also create new NullTest quals for SJE, which doesn't > > > seem like something this function should be handling. It makes this > > > function look a bit like a kludge. > > To be precise, inside the function we replace old clause with the > > NullTest, because relid replacement action made it degenerate. It seems > > essential to me and I think it may be ok to add a comment at the top of > > ChangeVarNodes, describing this minor optimisation. > I recall raising the same concerns when I originally reviewed the > patch. I don't think this code belongs in the rewriter, because it's > very specific to how SJE wants to handle these kinds of nodes. Correct. And I don't think it's this function's responsibility to create new NullTest quals for SJE. > Note also that RestrictInfo is defined in nodes/pathnodes.h, which > describes its contents as internal data structures for the planner, > suggesting that the rewriter has no business processing those kinds of > nodes. > I don't think simply adding a comment addresses those concerns. Agreed. We may need more than just a comment change. > > > Additionally, I have a minor suggestion regarding the new boolean > > > parameter, "change_RangeTblRef". AFAIU, by convention, we typically > > > use flags to control the behavior of walker functions, like in > > > pull_var_clause. While it's fine to use a boolean parameter here > > > given that we currently only have one flag, for the sake of future > > > extensibility, I think using flags might be a better choice. > > That was exactly why I wasn't happy with replacing the change_relid > > routine with ChangeVarNodes. > > But variant with a flag seems non-trivial to implement. Do you have any > > proposal about the code? > Perhaps the way to do it is to make ChangeVarNodesExtended() take a > callback function to be invoked for each node visited. The callback > (which would then be in the planner, with the other SJE code) would do > any special handling it needed (for RangeTblRef and RestrictInfo > nodes), and call ChangeVarNodes_walker() for any other types of node, > to get the default behaviour. Yeah, this might be a better approach. Perhaps we can borrow some ideas from replace_rte_variables. Thanks Richard
Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.
Hi, I was looking at the recent push for the pg_createsubscription "--all" option [1], because I was absent for several weeks prior. I found some minor non-functional issues as follows: == doc/src/sgml/ref/pg_createsubscriber.sgml 1. + If the database name is not specified in publisher-server, the postgres + database will be used, or if that does not exist, template1 will be used. 1a. There is missing SGML markup in the description of "--all" - "publisher-server" ==> "the --publisher-server connection string" - "postgres database" ==> "postgres database" - "template1 database" ==> "template1 database" ~ 1b. That sentence "If the database name ..." doesn't give any context about what it is referring to. I think it needs to be worded more similarly to the vacuum --maintenance-db option [2] to say it is connecting to this database in order to "discover" the list of databases. SUGGESTION To discover the list of all databases, connect to the source server using the database name specified in the --publisher-server connection string, or if not specified, the postgres database will be used, or if that does not exist, template1 will be used. ~ 1c. I didn't think the sentence quite belongs here because it isn't describing anything about the --all option. Instead, it is describing how the source server connection is made. So, I felt this might be better put later in the "Notes/Prerequisites" section where other source server connection information is described. == src/bin/pg_basebackup/pg_createsubscriber.c get_publisher_databases: 2. +/* + * Fetch a list of all not-template databases from the source server. + * Internally, this is treated as if the user specified multiple --database + * options, one for each source database. + */ /not-template/connectable non-template/ == Patches are attached to implement the above. 0001. Docs some fixes to the --all description. (see #1a and #1b) 0002. Docs moves part of the --all description to the Note/Prerequisites section. (see #1c) 0003. Fixes to code comments. (see #2) == [1] https://github.com/postgres/postgres/commit/fb2ea12f42b9453853be043b8ed107e136e1ccb7 [2] https://www.postgresql.org/docs/current/app-vacuumdb.html Kind Regards, Peter Smith. Fujitsu Australia v20250409-0001-DOCS-change-markup-and-wording-for-the-all.patch Description: Binary data v20250409-0003-Minor-comment-fix.patch Description: Binary data v20250409-0002-DOCS-move-the-description-how-the-all-conn.patch Description: Binary data
Re: Large expressions in indexes can't be stored (non-TOASTable)
On Tue, Apr 8, 2025, at 5:25 PM, Nathan Bossart wrote: > On Tue, Apr 08, 2025 at 04:44:02PM +0530, Amit Kapila wrote: > > On Fri, Apr 4, 2025 at 7:58 PM Nathan Bossart > > wrote: > >> On Fri, Apr 04, 2025 at 05:16:43PM +0530, Amit Kapila wrote: > >> > Can we dodge adding this push call if we restrict the length of the > >> > origin name to some reasonable limit like 256 or 512 and avoid the > >> > need of toast altogether? > >> > >> We did consider just removing pg_replication_origin's TOAST table earlier > >> [0], but we decided against it at that time. Maybe it's worth > >> reconsidering... > > > > I don't see any good reason in that email for not removing the TOAST > > table for pg_replication_origin. I would prefer to remove it rather > > than add protection related to its access. > > The only reason I can think of is that folks might have existing > replication origins with extremely long names that would cause upgrades to > fail. While I think it would probably be okay to remove the TOAST table > and restrict origin names to 512 bytes (like we did for password hashes in > commit 8275325), folks routinely complain about NAMEDATALEN, so I think we > should be cautious here. Logical replication creates origin names as pg_SUBOID_RELID or pg_SUBOID, which means the maximum origin name length is 24. A similar limit applies to pglogical, which restricts the name to 54 IIRC. I think that covers the majority of logical replication setups. There might be a small number of custom logical replication systems that possibly use long names for replication origins. I've never seen a replication origin name longer than NAMEDATALEN. If you consider that the maximum number of replication origins is limited to 2 bytes (65k distinct names), it is reasonable to restrict replication origin names to 512 given the number of possible combinations. We generally expect that a catalog string uses "name" as its type if it is an identifier; that could have been the case for roname if the author had decided to be strict. This additional TOAST table has little or no use. +1 for removing it. It is one less file, one less table and one less index; in short, one less source of data corruption. ;) -- Euler Taveira EDB https://www.enterprisedb.com/
pg_createsubscriber: Adding another synopsis for the --all option
Hi, In another thread I had previously suggested adding a 2nd synopsis to the pg_createsubscriber docs. See [1, comment #5]. -- CURRENT pg_createsubscriber [option...] { -d | --database }dbname { -D | --pgdata }datadir { -P | --publisher-server }connstr SUGGESTION pg_createsubscriber [option...] {-a | --all} { -D | --pgdata }datadir { -P | --publisher-server }connstr pg_createsubscriber [option...] { -d | --database }dbname { -D | --pgdata }datadir { -P | --publisher-server }connstr -- A patch was made for doing this [2, v23-0002] but after some debate it was omitted from the push for the --all option [3]. I was away at the time so I missed a chance to defend why I felt adding the extra synopsis may be a good idea. I'll try to do that now: Nominating which databas(es) the pg_createsubscriber should process is maybe the most fundamental option of this tool. My reason for the 2nd synopsis boils down to the fact that "--all" option and "--database" options are incompatible. E.g there are 2 ways to nominate the databases: 1) specify them individually or 2) use all of them. It already describes this in all the option descriptions, but I felt adding the --all synopsis just makes that point more clear. Now, vacuumdb was previously cited as a precedent to leave this unchanged. E.g. vacuumdb doesn't have a separate synopsis for just for "--all", so pg_createsubscriber doesn't need to either. I think that's an invalid comparison. The vacuumdb actually does make the distinction between specifically naming a database and saying --all: e.g. "[ dbname | -a | --all ]". But, it can do that because as far as I know vacuumdb only accepts a single "--dbname", whereas pg_createsubscriber allows multiple "--database" options. This means it would too become complex trying to include --all along with --database in just one single pg_createsubscriber synopsis. Which is why, I think 2 separate synopses are warranted. Thoughts? (I've reattached the same v23-0002 patch here because it still works) == [1] https://www.postgresql.org/message-id/CAHut%2BPuyBsOJTSygus2-yp60sw_phwYQ-JyC%2BU6fCBMos9x2LA%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAHv8RjKU24jCHR2fOHocmdSTqhu7ige5UQsUQMkaTZniLc9DbA%40mail.gmail.com [3] https://github.com/postgres/postgres/commit/fb2ea12f42b9453853be043b8ed107e136e1ccb7 Kind Regards, Peter Smith. Fujitsu Australia v23-0002-Synopsis-for-all-option.patch Description: Binary data
Re: Fix 035_standby_logical_decoding.pl race conditions
On Tue, Apr 08, 2025 at 06:42:53AM +0000, Bertrand Drouvot wrote: > Fully agree. Will need to find another way to prevent a process from waiting > between the > wakeup and the detach. I'll open a dedicated thread. By the way, there is a small thing that's itching me a bit about the change done in LogStandbySnapshot() for 105b2cb33617. Could it be useful for debugging to add an elog(DEBUG1) with the LSN returned by GetInsertRecPtr() when taking the short path? We don't have any visibility when the shortcut path is taken, which seems annoying in the long term if we use the injection point skip-log-running-xacts for other tests, and I suspect that there will be some, as the standby snapshots can be really annoying in tests where we want a predictable set of WAL records when wal_level is "replica" or "logical". -- Michael
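Something along these lines is what I have in mind for the shortcut path (only a rough sketch of the suggestion, not committed code; the surrounding guard on the injection point is omitted here):

	/* sketch only: emit the insert LSN when skipping the running-xacts record */
	XLogRecPtr	insert_lsn = GetInsertRecPtr();

	elog(DEBUG1,
		 "skipping logging of standby snapshot, returning insert LSN %X/%X",
		 LSN_FORMAT_ARGS(insert_lsn));
	return insert_lsn;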
Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.
On Wed, Mar 26, 2025 at 3:54 PM Amit Kapila wrote: > > On Tue, Mar 25, 2025 at 5:30 PM Ashutosh Bapat > wrote: > > > > On Thu, Mar 20, 2025 at 5:54 PM Amit Kapila wrote: > > > > > > * > > > + > > > + pg_createsubscriber > > > + option > > > + > > > + > > > + -a > > > + --all > > > + > > > + > > > + -D > > > + --pgdata > > > + > > > +datadir > > > + > > > + -P > > > + --publisher-server > > > + > > > +connstr > > > > > > Most of this is unrelated to this patch. I suggest making a top-up > > > patch, we can commit it separately. > > > > Isn't this documenting -a/--all option in synopsis. Why do you think > > it's unrelated? > > > > I was not clear at that stage whether to include it along with the > core patch, but today, I looked at it while responding to Euler and > found that it is not required. We can still discuss whether to include > it, but the main patch can be committed even without this. > I've created a separate thread [1] to continue the discussion about the synopsis. == [1] https://www.postgresql.org/message-id/CAHut%2BPueY_DyuBy6SuvJev2DWJVGtg%3D9WG9WXvYQDJu39gV6TQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
RE: Fix slot synchronization with two_phase decoding enabled
On Thu, Apr 3, 2025 at 3:16 PM Zhijie Hou (Fujitsu) wrote: > > On Thu, Apr 3, 2025 at 1:38 PM Masahiko Sawada wrote: > > > > > On Wed, Apr 2, 2025 at 7:58 PM Amit Kapila > > wrote: > > > > > > On Thu, Apr 3, 2025 at 7:50 AM Zhijie Hou (Fujitsu) > > > wrote: > > > > > > > > On Thu, Apr 3, 2025 at 3:30 AM Masahiko Sawada wrote: > > > > > > > > > > > > > > On Wed, Apr 2, 2025 at 6:33 AM Zhijie Hou (Fujitsu) > > > > > wrote: > > > > > > > > > > Thank you for the explanation! I agree that the issue happens in > > > > > these > > cases. > > > > > > > > > > As another idea, I wonder if we could somehow defer to make the > > > > > synced slot as 'sync-ready' until we can ensure that the slot > > > > > doesn't have any transactions that are prepared before the point > > > > > of enabling two_phase. For example, when the slotsync worker > > > > > fetches the remote slot, it remembers the confirmed_flush_lsn > > > > > (say > > > > > LSN-1) if the local slot's two_phase becomes true or the local > > > > > slot is newly created with enabling two_phase, and then it makes > > > > > the slot 'sync-ready' once it confirmed that the slot's > > > > > restart_lsn passed > > LSN-1. Does it work? > > > > > > > > Thanks for the idea! > > > > > > > > We considered a similar approach in [1] to confirm there is no > > > > prepared transactions before two_phase_at, but the issue is that > > > > when the two_phase flag is switched from 'false' to 'true' (as in > > > > the case with (copy_data=true, failover=true, two_phase=true)). In > > > > this case, the slot may have already been marked as sync-ready > > > > before the two_phase flag is enabled, as slotsync is unaware of > > > > potential > > future changes to the two_phase flag. > > > > > > This can happen because when copy_data is true, tablesync can take a > > > long time to complete the sync and in the meantime, slot without a > > > two_phase flag would have been synced to standby. Such a slot would > > > be marked as sync-ready even if we follow the calculation proposed > > > by Sawada-san. Note that we enable two_phase once all the tables are > > > in ready state (See run_apply_worker() and comments atop worker.c > > > (TWO_PHASE TRANSACTIONS)). > > > > Right. It doesn't make sense to make the slot not-sync-ready and then > > back to sync-ready. > > > > While I agree with the approach for HEAD and it seems difficult to > > find a solution, I'm concerned that disallowing to use both failover > > and two_phase in a minor release would affect users much. Users who > > are already using that combination might end up needing to re-think > > their system architecture. So I'm trying to narrow down use cases > > where we're going to prohibit or to find workarounds. We analyzed more for the backbranch fix, and here is a summary of different approaches that could be used for PG17. -- Approach 1 -- This is the original approach implemented in V4 patch. In this approach, we entirely disallow enabling both failover and the two-phase feature together for a replication slot or subscription. pros: This restriction is simple to implement and easy for users to comprehend. cons: Users would be unable to use the two-phase feature in conjunction with failover. Following the upgrade to a new release with this fix, existing subscriptions that have both failover and two-phase enabled would require manual re-creation, which is time-consuming. 
-- Approach 2 -- Instead of disallowing the use of two-phase and failover together, a more flexible strategy could be to restrict failover only for slots with two-phase enabled when there is a possibility of prepared transactions existing before two_phase_at that are not yet replicated. During slot creation with two-phase and failover, we could check for any decoded prepared transactions when determining the decoding start point (DecodingContextFindStartpoint). For subsequent attempts to alter failover to true, we ensure that two_phase_at is less than restart_lsn, indicating that all prepared transactions have been committed and replicated, so the bug cannot occur. pros: This method minimizes restrictions for users. Especially during slot creation with (two_phase=on, failover=on), as it’s uncommon for transactions to prepare during consistent snapshot creation, the restriction becomes almost unnoticeable. Users are not always forced to re-create subscriptions post-upgrade. cons: The logic involving (restart_lsn > two_phase_at) might be less intuitive for users. After upgrading, it is recommended that users verify whether two_phase_at of a source slot is less than restart_lsn of the synced slot. If it isn't, users should be careful when using the synced slot on a standby. This might be complex to understand. -- Approach 3 -- This approach is similar to Approach 2 but simplifies some aspects by avoiding checks for prepared transactions during slot creation. It disallows enabling both
Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints
hi. attached patch is for address pg_dump inconsistency when parent is "not null not valid" while child is "not null". The following query before/after pg_dump should return the same result. select conrelid::regclass::text, conname, convalidated, coninhcount, conislocal, conparentid, contype from pg_constraint where conrelid::regclass::text = ANY('{inhnn, inhnn_cc, inhnn_cc_1}') order by 1,2; --test cases: CREATE TABLE inhnn (a INTEGER); ALTER TABLE inhnn ADD CONSTRAINT cc not null a NOT VALID; CREATE TABLE inhnn_cc(a INTEGER) INHERITS(inhnn); CREATE TABLE inhnn_cc_1(a INTEGER) INHERITS(inhnn_cc, inhnn); master pg_dump output is: CREATE TABLE public.inhnn (a integer); CREATE TABLE public.inhnn_cc (a integer) INHERITS (public.inhnn); CREATE TABLE public.inhnn_cc_1 (a integer) INHERITS (public.inhnn_cc, public.inhnn); ALTER TABLE public.inhnn ADD CONSTRAINT cc NOT NULL a NOT VALID; with the attached patch, pg_dump output is: CREATE TABLE public.inhnn (a integer); CREATE TABLE public.inhnn_cc (a integer CONSTRAINT cc NOT NULL) INHERITS (public.inhnn); CREATE TABLE public.inhnn_cc_1 (a integer CONSTRAINT cc NOT NULL) INHERITS (public.inhnn_cc, public.inhnn); ALTER TABLE public.inhnn ADD CONSTRAINT cc NOT NULL a NOT VALID; - As you can see, in master, pg_dump will make {inhnn, inhnn_cc, inhnn_cc_1} not-null constraint's pg_constraint.convalidated set as false. but we should only make inhnn's not-null constraint convalidated as false. From 9505f36287403aa8efd7642dddf71b77996796dd Mon Sep 17 00:00:00 2001 From: jian he Date: Wed, 9 Apr 2025 13:07:58 +0800 Subject: [PATCH v1 1/1] pg_dump not null not valid make sure pg_dump have the same pg_constraint meta before and after pg_dump. if the parent not-null constraint is invalid, child is valid. then pg_dump need locally print the not-null constraint on that child too. otherwise pg_dump may make child's convalidated may set to false. that means we also need adjust conislocal in AdjustNotNullInheritance. --- src/backend/catalog/pg_constraint.c | 10 ++ src/bin/pg_dump/common.c| 4 src/bin/pg_dump/pg_dump.c | 17 + src/bin/pg_dump/pg_dump.h | 5 + 4 files changed, 36 insertions(+) diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c index 2f73085961b..9e65b96143f 100644 --- a/src/backend/catalog/pg_constraint.c +++ b/src/backend/catalog/pg_constraint.c @@ -776,6 +776,16 @@ AdjustNotNullInheritance(Oid relid, AttrNumber attnum, ereport(ERROR, errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), errmsg("too many inheritance parents")); + + /* + * If the child already has a valid constraint and we are + * creating an invalid one with same definition on it. The + * child's constraint will remain valid, but can no longer be + * marked as local. 
+ */ + if (is_notvalid && conform->convalidated && conform->conenforced) +conform->conislocal = false; + changed = true; } else if (!conform->conislocal) diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c index 56b6c368acf..ff6a4eacda0 100644 --- a/src/bin/pg_dump/common.c +++ b/src/bin/pg_dump/common.c @@ -546,6 +546,10 @@ flagInhAttrs(Archive *fout, DumpOptions *dopt, TableInfo *tblinfo, int numTables parent->notnull_constrs[inhAttrInd] != NULL) foundNotNull = true; + if (fout->remoteVersion >= 18 && + parent->notnull_invalid[inhAttrInd]) + tbinfo->notnull_parent_invalid[j] = true; + foundDefault |= (parentDef != NULL && strcmp(parentDef->adef_expr, "NULL") != 0 && !parent->attgenerated[inhAttrInd]); diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 25264f8c9fb..8d131523366 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -9255,6 +9255,8 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables) tbinfo->attfdwoptions = (char **) pg_malloc(numatts * sizeof(char *)); tbinfo->attmissingval = (char **) pg_malloc(numatts * sizeof(char *)); tbinfo->notnull_constrs = (char **) pg_malloc(numatts * sizeof(char *)); + tbinfo->notnull_invalid = (bool *) pg_malloc(numatts * sizeof(bool)); + tbinfo->notnull_parent_invalid = (bool *) pg_malloc(numatts * sizeof(bool)); tbinfo->notnull_noinh = (bool *) pg_malloc(numatts * sizeof(bool)); tbinfo->notnull_islocal = (bool *) pg_malloc(numatts * sizeof(bool)); tbinfo->attrdefs = (AttrDefInfo **) pg_malloc(numatts * sizeof(AttrDefInfo *)); @@ -9280,6 +9282,8 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables) tbinfo->attlen[j] = atoi(PQgetvalue(res, r, i_attlen)); tbinfo->attalign[j] = *(PQgetvalue(res, r, i_attalign)); tbinfo->attislocal[j] = (PQgetvalue(res, r, i_attislocal)[0] == 't'); + tbinfo->notnull_parent_invalid[j] = false; /* it only change in flagInhAttrs */ + tbinfo->notnull_invalid[j] = false; /* it only change in determineNotNullFlags */
Re: [PoC] Reducing planning time when tables have many partitions
On Wed, 9 Apr 2025 at 17:09, Amit Langote wrote: > Should the following paragraph in src/backend/optimizer/README be > updated to reflect the new reality after recent changes? > > An EquivalenceClass can contain "em_is_child" members, which are copies > of members that contain appendrel parent relation Vars, transposed to > contain the equivalent child-relation variables or expressions. These > members are not full-fledged members of the EquivalenceClass and do not > affect the class's overall properties at all. They are kept only to > simplify matching of child-relation expressions to EquivalenceClasses. > Most operations on EquivalenceClasses should ignore child members. > > The part about these being in the EquivalenceClass might be worth > rewording now that we keep them in a separate array. I did read over that as part of the search I did for things that need to be updated, but I didn't see the need to adjust anything since the text doesn't talk about where the members are stored. The only thing I see as a hint to that is the final sentence. If the README is light on documentation about where members are stored, do we really need to start detailing that because of this change? I've tried to be fairly comprehensive about where members are stored in the header comment for struct EquivalenceClass. Wouldn't stating something similar in the README just be duplicating that? I always think of the READMEs as more of an overview on how things fit together with some high-level theory. I think talking about data structures might be a bit too much detail. I'm happy to view wording suggestions if you think we need to detail this further. Maybe there's something that can be adjusted without going into too much depth. David
Re: [PoC] Reducing planning time when tables have many partitions
On Wed, Apr 9, 2025 at 10:51 AM David Rowley wrote: > > On Wed, 9 Apr 2025 at 17:09, Amit Langote wrote: > > Should the following paragraph in src/backend/optimizer/README be > > updated to reflect the new reality after recent changes? > > > > An EquivalenceClass can contain "em_is_child" members, which are copies > > of members that contain appendrel parent relation Vars, transposed to > > contain the equivalent child-relation variables or expressions. These > > members are not full-fledged members of the EquivalenceClass and do not > > affect the class's overall properties at all. They are kept only to > > simplify matching of child-relation expressions to EquivalenceClasses. > > Most operations on EquivalenceClasses should ignore child members. > > > > The part about these being in the EquivalenceClass might be worth > > rewording now that we keep them in a separate array. > > I did read over that as part of the search I did for things that need > to be updated, but I didn't see the need to adjust anything since the > text doesn't talk about where the members are stored. The only thing > I see as a hint to that is the final sentence. > > If the README is light on documentation about where members are > stored, do we really need to start detailing that because of this > change? I've tried to be fairly comprehensive about where members are > stored in the header comment for struct EquivalenceClass. Wouldn't > stating something similar in the README just be duplicating that? I > always think of the READMEs as more of an overview on how things fit > together with some high-level theory. I think talking about data > structures might be a bit too much detail. The fact that this change didn't require us to change the README indicates that the README is at the right level. The code changes internally reorganize how child EMs are stored within an EC, so changing the comments for relevant structures seems good enough, not a change that should bubble up to the README. The only change in the EC interface is the addition of a new iterator method - maybe we could document that in README but that too seems more detail than what README is about. -- Best Wishes, Ashutosh Bapat
Re: Changing shared_buffers without restart
On Mon, Apr 7, 2025 at 2:13 PM Dmitry Dolgov <9erthali...@gmail.com> wrote: > > In the new v4 version > of the patch the first option is implemented. > The patches don't apply cleanly using git am but patch -p1 applies them cleanly. However I see following compilation errors RuntimeError: command "ninja" failed with error [1/1954] Generating src/include/utils/errcodes with a custom command [2/1954] Generating src/include/storage/lwlocknames_h with a custom command [3/1954] Generating src/include/utils/wait_event_names with a custom command [4/1954] Compiling C object src/port/libpgport.a.p/pg_popcount_aarch64.c.o [5/1954] Compiling C object src/port/libpgport.a.p/pg_numa.c.o FAILED: src/port/libpgport.a.p/pg_numa.c.o cc -Isrc/port/libpgport.a.p -Isrc/include -I../../coderoot/pg/src/include -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -g -fno-strict-aliasing -fwrapv -fexcess-precision=standard -D_GNU_SOURCE -Wmissing-prototypes -Wpointer-arith -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security -Wdeclaration-after-statement -Wno-format-truncation -Wno-stringop-truncation -fPIC -DFRONTEND -MD -MQ src/port/libpgport.a.p/pg_numa.c.o -MF src/port/libpgport.a.p/pg_numa.c.o.d -o src/port/libpgport.a.p/pg_numa.c.o -c ../../coderoot/pg/src/port/pg_numa.c In file included from ../../coderoot/pg/src/include/storage/spin.h:54, from ../../coderoot/pg/src/include/storage/condition_variable.h:26, from ../../coderoot/pg/src/include/storage/barrier.h:22, from ../../coderoot/pg/src/include/storage/pg_shmem.h:27, from ../../coderoot/pg/src/port/pg_numa.c:26: ../../coderoot/pg/src/include/storage/s_lock.h:93:2: error: #error "s_lock.h may not be included from frontend code" 93 | #error "s_lock.h may not be included from frontend code" | ^ In file included from ../../coderoot/pg/src/port/pg_numa.c:26: ../../coderoot/pg/src/include/storage/pg_shmem.h:66:9: error: unknown type name ‘pg_atomic_uint32’ 66 | pg_atomic_uint32NSharedBuffers; | ^~~~ ../../coderoot/pg/src/include/storage/pg_shmem.h:68:9: error: unknown type name ‘pg_atomic_uint64’ 68 | pg_atomic_uint64Generation; | ^~~~ ../../coderoot/pg/src/port/pg_numa.c: In function ‘pg_numa_get_pagesize’: ../../coderoot/pg/src/port/pg_numa.c:117:17: error: too few arguments to function ‘GetHugePageSize’ 117 | GetHugePageSize(&os_page_size, NULL); | ^~~ In file included from ../../coderoot/pg/src/port/pg_numa.c:26: ../../coderoot/pg/src/include/storage/pg_shmem.h:127:13: note: declared here 127 | extern void GetHugePageSize(Size *hugepagesize, int *mmap_flags, | ^~~ -- Best Wishes, Ashutosh Bapat
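For reference, the failing check is the usual guard PostgreSQL headers use to keep backend-only code out of frontend builds; a minimal sketch of the mechanism (simplified, only the guard itself is taken from s_lock.h):

/*
 * Sketch of the guard that fires in the build above: pg_numa.c is compiled
 * with -DFRONTEND for libpgport, so pulling in pg_shmem.h -> barrier.h ->
 * condition_variable.h -> spin.h -> s_lock.h trips this check.
 */
#ifdef FRONTEND
#error "s_lock.h may not be included from frontend code"
#endif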
Re: [PoC] Reducing planning time when tables have many partitions
Amit Langote writes: > Should the following paragraph in src/backend/optimizer/README be > updated to reflect the new reality after recent changes? > An EquivalenceClass can contain "em_is_child" members, which are copies > of members that contain appendrel parent relation Vars, transposed to > contain the equivalent child-relation variables or expressions. Hm. They still are "in" the EquivalenceClass in a very real sense; there is no other data structure that links to them. So this isn't incorrect. I do get your feeling that maybe some rewording is warranted, but I'm not sure what. regards, tom lane
Re: Memoize ANTI and SEMI JOIN inner
On Thu, Mar 20, 2025 at 3:02 PM David Rowley wrote: > For making this work, I think the attached should be about the guts of > the code changes. I didn't look at the comments. Right now I can't > think of any reason why this can't be done, but some experimentation > might reveal some reason that it can't. I conducted some experiments, and I'm afraid it's not safe to consider Memoize for semi or anti joins, unless the inner side is provably unique. As an example, please consider: create table t (a int, b boolean); insert into t select i%3, false from generate_series(1,100)i; analyze t; select * from t t1 where t1.a in (select a from t t2 where t2.b in (select t1.b from t t3 where t2.a > 1 offset 0)); ERROR: cache entry already complete With the proposed patch, this query results in an error. The problem is that join clauses from the upper level may be moved to the semi join. For a given outer tuple, the first inner tuple that satisfies the current parameters will mark the cache entry as complete because singlerow is set to true. However, if that inner tuple and the current outer tuple don't satisfy the join clauses, the second inner tuple that satisfies the parameters will complain that the cache entry is already marked as complete. If the inner side is provably unique, there will be no such problem, as there won't be a second matching tuple. OTOH, in this case, the semi join will be reduced to an inner join by reduce_unique_semijoins. Therefore, it doesn't make much sense to prove inner_unique for semi joins in add_paths_to_joinrel. Perhaps we could spend some planner cycles proving inner_unique for anti joins, so that Memoize nodes can be considered for them? Thanks Richard
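For contrast, a small sketch of the "provably unique inner side" case (t is the table from the example above; t_uniq is illustrative): with a unique constraint on the inner join column, at most one inner tuple can match a given parameter value, so the cache-completion failure above cannot arise, and as noted such a semi join can be reduced to a plain join by reduce_unique_semijoins.

CREATE TABLE t_uniq (a int PRIMARY KEY, b boolean);
INSERT INTO t_uniq SELECT i, false FROM generate_series(1, 100) i;
ANALYZE t_uniq;

EXPLAIN (COSTS OFF)
SELECT * FROM t t1 WHERE t1.a IN (SELECT a FROM t_uniq t2);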
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Tue, Apr 8, 2025 at 06:57:18PM +0200, Wolfgang Walther wrote: > Jacob Champion: > > On Tue, Apr 8, 2025 at 9:32 AM Wolfgang Walther > > wrote: > > > And that should also not be a problem for distributions - they could > > > offer a libpq and a libpq_oauth package, where only one of them can be > > > installed at the same time, I guess? * > > My outsider understanding is that maintaining this sort of thing > > becomes a major headache, because of combinatorics. You don't really > > want to ship a libpq and libpq-with-gss and libpq-with-oauth and > > libpq-with-oauth-and-gss and ... > > That would only be the case, if you were to consider those other > dependencies as "dangerous" as cURL. But we already depend on them. So if > it's really the case that cURL is that much worse, that we consider loading > it as a module... then the combinatorics should not be a problem either. > > However, if the other deps are considered problematic as well, then the ship > has already sailed, and there is not point for a special case here anymore. Yes, I think this is what I am asking too. For me it was curl's security reputation and whether that would taint the security reputation of libpq. For Tom, I think it was the dependency additions. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Tue, Apr 8, 2025 at 09:17:03AM -0700, Jacob Champion wrote: > On Tue, Apr 8, 2025 at 9:14 AM Bruce Momjian wrote: > > How does this patch help us avoid having to handle curl CVEs and its > > curl's additional dependencies? As I understand the patch, it makes > > libpq _not_ have additional dependencies but moves the dependencies to a > > special loadable library that libpq can use. > > It allows packagers to ship the OAuth library separately, so end users > that don't want the additional exposure don't have to install it at > all. Okay, so how would they do that? I understand how that would happen if it was an external extension, but how if it is under /src or /contrib. FYI, I see a good number of curl CVEs: https://curl.se/docs/security.html Would we have to put out minor releases for curl CVEs? I don't think we have to for OpenSSL so would curl be the same? I am asking these questions now so we can save time in getting this closed. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: Enhancing Memory Context Statistics Reporting
Hi, On 2025-04-08 01:17:17 +0200, Daniel Gustafsson wrote: > > On 7 Apr 2025, at 17:43, Andres Freund wrote: > > >> + /* > >> + * Hold the process lock to protect writes to process specific memory. Two > >> + * processes publishing statistics do not block each other. > >> + */ > > > > s/specific/process specific/ > > That's what it says though.. isn't it? I might be missing something obvious. Understandable confusion, not sure what my brain was doing anymore either... > >> +} MemoryContextState; > > > > IMO that's too generic a name for something in a header. > > > >> +} MemoryContextId; > > > > This too. Particularly because MemoryContextData->ident exists but is > > something different. > > Renamed both to use MemoryContextReporting* namespace, which leaves > MemoryContextReportingBackendState at an unwieldy long name. I'm running out > of ideas on how to improve and it does make purpose quite explicit at least. How about MemoryContextReportingBackendState -> MemoryStatsBackendState MemoryContextReportingId -> MemoryStatsContextId MemoryContextReportingSharedState -> MemoryStatsCtl MemoryContextReportingStatsEntry -> MemoryStatsEntry > >> + /* context id starts with 1 */ > >> + entry->context_id = ++(*stats_count); > > > > Given that we don't actually do anything here relating to starting with 1, I > > find that comment confusing. > > Reworded, not sure if it's much better tbh. I'd probably just remove the comment. > > Hm. First I thought we'd leak memory if this second (and subsequent) > > dsa_allocate failed. Then I thought we'd be ok, because the memory would be > > reachable from memCtxState[idx].memstats_dsa_pointer. > > > > But I think it wouldn't *quite* work, because memCtxState[idx].total_stats > > is > > only set *after* we would have failed. > > Keeping a running total in .total_stats should make the leak window smaller. Why not just initialize .total_stats *before* calling any fallible code? Afaict it's zero-allocated, so the free function should have no problem dealing with the entries that haven't yet been populated. Greetings, Andres Freund
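A generic, self-contained sketch of that ordering, in plain C with malloc/free rather than the patch's DSA allocations (all names are illustrative): record the entry count before any allocation that can fail, so that cleanup of a partially filled, zero-initialized array neither leaks nor touches unset slots.

#include <stdlib.h>

struct stats_state
{
	int		total_stats;	/* published before any fallible allocation */
	char  **entries;		/* calloc'd, so unfilled slots stay NULL */
};

static void
cleanup(struct stats_state *st)
{
	for (int i = 0; i < st->total_stats; i++)
		free(st->entries[i]);	/* free(NULL) is a no-op for unfilled slots */
	free(st->entries);
}

static int
fill(struct stats_state *st, int n)
{
	st->entries = calloc(n, sizeof(char *));
	if (st->entries == NULL)
		return -1;
	st->total_stats = n;		/* set first, before the loop below can fail */
	for (int i = 0; i < n; i++)
	{
		st->entries[i] = malloc(64);
		if (st->entries[i] == NULL)
			return -1;			/* caller runs cleanup(); nothing leaks */
	}
	return 0;
}

int
main(void)
{
	struct stats_state st = {0};

	if (fill(&st, 8) != 0)
	{
		cleanup(&st);
		return 1;
	}
	cleanup(&st);
	return 0;
}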
Re: Modern SHA2-based password hashes for pgcrypto
Hi, On 2025-04-07 09:09:39 +0200, Bernd Helmle wrote: > Am Sonntag, dem 06.04.2025 um 23:02 -0400 schrieb Andres Freund: > > On 2025-04-05 19:22:58 +0200, Alvaro Herrera wrote: > > > I have pushed this now, hoping it won't explode. > > > > I have a WIP patch that adds gcc specific allocator attributes for > > palloc et > > al. Just rebased that. It warns on the new code as follows: > > > > [1489/1804 42 82%] Compiling C object > > contrib/pgcrypto/pgcrypto.so.p/crypt-sha.c.o > > ../../../../../home/andres/src/postgresql/contrib/pgcrypto/crypt- > > sha.c: In function 'px_crypt_shacrypt': > > ../../../../../home/andres/src/postgresql/contrib/pgcrypto/crypt- > > sha.c:605:13: warning: pointer 'cp' may be used after 'pfree' [-Wuse- > > after-free] > > 605 | *cp = '\0'; > > | ^~ > > ../../../../../home/andres/src/postgresql/contrib/pgcrypto/crypt- > > sha.c:533:9: note: call to 'pfree' here > > 533 | pfree(s_bytes); > > | ^~ > > > > And it sure seems to have a point. I'm surprised this isn't causing > > wider > > issues... > > Indeed. I think this assignment is useless anyways, since s_bytes is > already allocated with palloc0. I must have overseen this one when > rearranging code...but yes, strange that it didn't cause drama. Valgrind did find it actually, I just missed it due to other failures: https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=skink&dt=2025-04-05%2020%3A52%3A38&stg=recovery-check ==2917549== VALGRINDERROR-BEGIN ==2917549== Invalid write of size 1 ==2917549==at 0x11D4AD9C: px_crypt_shacrypt (crypt-sha.c:605) ==2917549==by 0x11D54B50: run_crypt_sha (px-crypt.c:76) ==2917549==by 0x11D54BCB: px_crypt (px-crypt.c:119) ==2917549==by 0x11D4BCBB: pg_crypt (pgcrypto.c:228) ==2917549==by 0x3BBCEA: ExecInterpExpr (execExprInterp.c:1001) ==2917549==by 0x3B790E: ExecInterpExprStillValid (execExprInterp.c:2299) ==2917549==by 0x48FD40: ExecEvalExprSwitchContext (executor.h:466) ==2917549==by 0x48FD40: evaluate_expr (clauses.c:5012) ==2917549==by 0x48FF08: evaluate_function (clauses.c:4519) ==2917549==by 0x492054: simplify_function (clauses.c:4108) ==2917549==by 0x4901FA: eval_const_expressions_mutator (clauses.c:2593) ==2917549==by 0x432CDE: expression_tree_mutator_impl (nodeFuncs.c:3486) ==2917549==by 0x48FFB9: eval_const_expressions_mutator (clauses.c:3727) ==2917549== Address 0x82fa740 is 912 bytes inside a recently re-allocated block of size 8,192 alloc'd ==2917549==at 0x4844818: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so) ==2917549==by 0x71EF75: AllocSetContextCreateInternal (aset.c:444) ==2917549==by 0x3D17AE: CreateExprContextInternal (execUtils.c:260) ==2917549==by 0x3D1CA8: CreateExprContext (execUtils.c:310) ==2917549==by 0x3D1ED8: MakePerTupleExprContext (execUtils.c:462) ==2917549==by 0x48FDBF: evaluate_expr (clauses.c:5013) ==2917549==by 0x48FF08: evaluate_function (clauses.c:4519) ==2917549==by 0x492054: simplify_function (clauses.c:4108) ==2917549==by 0x4901FA: eval_const_expressions_mutator (clauses.c:2593) ==2917549==by 0x432CDE: expression_tree_mutator_impl (nodeFuncs.c:3486) ==2917549==by 0x48FFB9: eval_const_expressions_mutator (clauses.c:3727) ==2917549==by 0x432EFD: expression_tree_mutator_impl (nodeFuncs.c:3572) ==2917549== ==2917549== VALGRINDERROR-END { Memcheck:Addr1 fun:px_crypt_shacrypt fun:run_crypt_sha fun:px_crypt fun:pg_crypt fun:ExecInterpExpr fun:ExecInterpExprStillValid fun:ExecEvalExprSwitchContext fun:evaluate_expr fun:evaluate_function fun:simplify_function fun:eval_const_expressions_mutator 
fun:expression_tree_mutator_impl fun:eval_const_expressions_mutator } **2917549** Valgrind detected 1 error(s) during execution of "SELECT crypt('', '$5$Szzz0yzz');" Greetings, Andres
Re: Draft for basic NUMA observability
On 4/8/25 15:06, Andres Freund wrote: > Hi, > > On 2025-04-08 17:44:19 +0500, Kirill Reshke wrote: >> On Mon, 7 Apr 2025 at 23:00, Tomas Vondra wrote: >>> I'll let the CI run the tests on it, and >>> then will push, unless someone has more comments. >>> >> >> >> Hi! I noticed strange failure after this commit[0] >> >> Looks like it is related to 65c298f61fc70f2f960437c05649f71b862e2c48 >> >> In file included from ../pgsql/src/include/postgres.h:49, >> from ../pgsql/src/port/pg_numa.c:16: >> ../pgsql/src/include/utils/elog.h:79:10: >> fatal error: utils/errcodes.h: No such file or >> directory >> 79 | #include "utils/errcodes.h" >> | ^~ >> compilation terminated. > > $ ninja -t missingdeps > Missing dep: src/port/libpgport.a.p/pg_numa.c.o uses > src/include/utils/errcodes.h (generated by CUSTOM_COMMAND) > Missing dep: src/port/libpgport_shlib.a.p/pg_numa.c.o uses > src/include/utils/errcodes.h (generated by CUSTOM_COMMAND) > Processed 2384 nodes. > Error: There are 2 missing dependency paths. > 2 targets had depfile dependencies on 1 distinct generated inputs (from 1 > rules) without a non-depfile dep path to the generator. > There might be build flakiness if any of the targets listed above are built > alone, or not late enough, in a clean output directory. > > > I think it's not right that something in src/port defines an SQL callable > function. The set of headers for that pull in a lot of things. > > Since the file relies on things like GUCs, I suspect this should be in > src/backend/port or such instead. > Yeah, I think you're right, src/backend/port seems like a better place for this. I'll look into moving that in the evening. regards -- Tomas Vondra
Re: Large expressions in indexes can't be stored (non-TOASTable)
On Tue, Apr 08, 2025 at 11:21:31PM -0300, Euler Taveira wrote: > The logical replication creates origin names as pg_SUBOID_RELID or pg_SUBOID. > It means the maximum origin name is 24. This limited origin name also applies > to pglogical that limits the name to 54 IIRC. I think that covers the majority > of the logical replication setups. There might be a small number of custom > logical replication systems that possibly use long names for replication > origin. I've never seen a replication origin name longer than NAMEDATALEN. pg_replication_origin_create() can be used with text as input for the origin name, still your argument sounds sensible here as I would suspect that most setups of logical replication are these. > If you consider that the maximum number of replication origin is limited to 2 > bytes (65k distinct names), it is reasonable to restrict the replication > origin names to 512 due to the high number of combinations. We generally > expects that a catalog string uses "name" as type if it is an identifier; it > could be the case for roname if author decided to be strict. I would be more cautious than a limit on NAMEDATALEN. The restriction suggested by Nathan at 512 bytes should be plenty enough. > This additional TOAST table has no or rare use. +1 for removing it. It is one > less file, one less table and one less index; in short, one less source of > data > corruption. ;) I guess that's the consensus, then. No objections to the removal here. -- Michael signature.asc Description: PGP signature
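To make the name shapes concrete, a short SQL sketch (the subscription and relation OIDs are made up); pg_replication_origin_create() accepts arbitrary text today, so the third call succeeds on current servers but would be rejected under the 512-byte cap discussed above:

SELECT pg_replication_origin_create('pg_16394');        -- pg_SUBOID, created by a subscription
SELECT pg_replication_origin_create('pg_16394_16402');  -- pg_SUBOID_RELID, created by tablesync
SELECT pg_replication_origin_create(repeat('x', 600));  -- longer than the proposed limit
SELECT roident, length(roname) AS name_length FROM pg_replication_origin;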
Re: Remove unnecessary static type qualifiers
On Wed, 09 Apr 2025 at 10:34, Junwang Zhao wrote: > On Wed, Apr 9, 2025 at 5:22 AM David Rowley wrote: >> >> On Wed, 9 Apr 2025 at 03:46, Peter Eisentraut wrote: >> > To avoid creating an array on the stack, you could maybe write it with a >> > pointer instead, like: >> > >> > const char * const query = "..."; >> > >> > I haven't tested whether that results in different or better compiled >> > code. The original code is probably good enough. >> >> I expect it's been done the way it has to make the overflow detection >> code work. The problem with having a pointer to a string constant is >> that sizeof() will return the size of the pointer and not the space >> needed to store the string. >> >> We can get rid of the variable and make the overflow work by checking >> the return length of snprintf. I think that's all C99 spec now... >> >> len = snprintf(qbuf, "set client_encoding to '%s'", encoding); >> /* check query buffer overflow */ >> if (len >= sizeof(qbuf)) >> return -1; >> > > Agree, this is a better coding style. Please see if the following code makes > sense to you. > > diff --git a/src/interfaces/libpq/fe-connect.c > b/src/interfaces/libpq/fe-connect.c > index 0258d9ace3c..247d079faa6 100644 > --- a/src/interfaces/libpq/fe-connect.c > +++ b/src/interfaces/libpq/fe-connect.c > @@ -7717,9 +7717,9 @@ int > PQsetClientEncoding(PGconn *conn, const char *encoding) > { > charqbuf[128]; > - static const char query[] = "set client_encoding to '%s'"; > PGresult *res; > int status; > + int len; > > if (!conn || conn->status != CONNECTION_OK) > return -1; > @@ -7731,12 +7731,11 @@ PQsetClientEncoding(PGconn *conn, const char > *encoding) > if (strcmp(encoding, "auto") == 0) > encoding = > pg_encoding_to_char(pg_get_encoding_from_locale(NULL, true)); > > - /* check query buffer overflow */ > - if (sizeof(qbuf) < (sizeof(query) + strlen(encoding))) > + len = snprintf(qbuf, sizeof(qbuf), "set client_encoding to > '%s'", encoding); > + if (len >= sizeof(qbuf)) > return -1; > > /* ok, now send a query */ > - sprintf(qbuf, query, encoding); > res = PQexec(conn, qbuf); > > if (res == NULL) > Thank you for all the explanations! The above code looks good to me, except for a minor issue; since snprintf may return a negative value, should we check for this? -- Regrads, Japin Li
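A standalone sketch of the check under discussion, including the negative-return case raised above (an illustration only, not the libpq code itself):

#include <stdio.h>

static int
build_encoding_query(char *qbuf, size_t bufsize, const char *encoding)
{
	int		len = snprintf(qbuf, bufsize, "set client_encoding to '%s'", encoding);

	/* snprintf reports the would-be length, or a negative value on error */
	if (len < 0 || (size_t) len >= bufsize)
		return -1;				/* error or truncation: do not use qbuf */
	return 0;
}

int
main(void)
{
	char		qbuf[128];

	if (build_encoding_query(qbuf, sizeof(qbuf), "UTF8") == 0)
		printf("%s\n", qbuf);
	return 0;
}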
Re: Remove unnecessary static type qualifiers
Junwang Zhao writes: > On Tue, Apr 8, 2025 at 4:29 PM Japin Li wrote: >> - static const char query[] = "set client_encoding to '%s'"; >> + const char query[] = "set client_encoding to '%s'"; > I doubt that, if you remove the *static*, it will allocate more memory > on stack and need more instructions to copy the string to that area. Yeah, it's not an improvement as written. This might be OK: >> + const char *query = "set client_encoding to '%s'"; But I kinda doubt it's worth touching. regards, tom lane
Re: Horribly slow pg_upgrade performance with many Large Objects
On Tue, Apr 08, 2025 at 12:37:43PM -0400, Tom Lane wrote: > Hannu Krosing writes: >> I think we do preserve role oids > > Oh ... I'd been looking for mentions of "role" in > pg_upgrade_support.c, but what I should have looked for was > "pg_authid". So yeah, we do preserve role OIDs, and maybe that's > enough to make this workable, at least with source versions that > share the same rules for what goes into pg_largeobject_metadata and > pg_shdepend. It's not something I'd risk back-patching though. I do think it's worth considering going back to copying pg_largeobject_metadata's files for upgrades from v16 and newer. That sounds restrictive at the moment, but it'll mean that all but one supported major version can copy the files during upgrade to v19. I'll admit I'm a tad worried about having to go back to copying via SQL commands in the future and re-regressing things (leading to unpredictable differences in upgrade downtime), but I'm not sure that's a great reason to withhold this optimization. Of course, I wouldn't be opposed to optimizing the SQL command strategy, too, but I suspect that won't compare to copying the files. -- nathan
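For anyone wanting to reproduce the slow case, a hedged sketch (the role name and count are arbitrary): creating many large objects with one GRANT each populates both pg_largeobject_metadata and pg_shdepend, which is the per-object metadata the current dump-and-restore path replays one command at a time.

CREATE ROLE joe;
DO $$
DECLARE
	loid oid;
BEGIN
	FOR i IN 1..100000 LOOP
		loid := lo_create(0);	-- 0 lets the server assign a fresh OID
		EXECUTE format('GRANT SELECT ON LARGE OBJECT %s TO joe', loid);
	END LOOP;
END $$;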
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Tue, Apr 8, 2025 at 9:32 AM Wolfgang Walther wrote: > And that should also not be a problem for distributions - they could offer a > libpq and a libpq_oauth package, where only one of them can be installed at > the same time, I guess? * My outsider understanding is that maintaining this sort of thing becomes a major headache, because of combinatorics. You don't really want to ship a libpq and libpq-with-gss and libpq-with-oauth and libpq-with-oauth-and-gss and ... --Jacob
Re: Horribly slow pg_upgrade performance with many Large Objects
Nathan Bossart writes: > I do think it's worth considering going back to copying > pg_largobject_metadata's files for upgrades from v16 and newer. (If we do this) I don't see why we'd need to stop at v16. I'm envisioning that we'd use COPY, which will be dealing in the text representation of aclitems, and I don't think that's changed in a long time. The sort of thing that would break it is changes in the set of available/default privilege bits for large objects. That is, where the dump currently contains something like SELECT pg_catalog.lo_create('2121'); ALTER LARGE OBJECT 2121 OWNER TO postgres; GRANT ALL ON LARGE OBJECT 2121 TO joe; we'd have COPY pg_largeobject_metadata FROM STDIN; ... 212110 {postgres=rw/postgres,joe=rw/postgres} ... and some appropriate COPY data for pg_shdepend too. (The fact that this representation will contain both numeric and symbolic role OIDs is why I was concerned about OID stability.) The key thing that worries me here is if the source and target versions have different ideas of which roles are pinned, which would silently change what appears in pg_shdepend. But it'd only really break if a role mentioned in some LO's owner or ACL is pinned in the source and not in the target, which seems unlikely. (In the other direction, we'd just be adding a useless row in pg_shdepend.) regards, tom lane
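For reference, the ACL column in that COPY data is exactly the text form aclitemin parses; a quick way to exercise that parse step in isolation (the role names are illustrative and must already exist):

SELECT '{postgres=rw/postgres,joe=rw/postgres}'::aclitem[];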
Re: Horribly slow pg_upgrade performance with many Large Objects
Hannu Krosing writes: > In copy case I would expect the presence of grants to not make much > difference. aclitemin is slower than a lot of other datatype input functions, but it's still got to be faster than a GRANT. regards, tom lane