Re: bug: virtual generated column can be partition key
Sorry I missed this email while sending the patches - our emails crossed in the air. On Tue, Apr 22, 2025 at 2:15 PM jian he wrote: > On Tue, Apr 22, 2025 at 3:02 PM jian he > wrote: > > Other than that, it looks good to me for fixing this bug. > > The error message seems not that intuitive. > > +CREATE TABLE gtest_part_key (f1 date NOT NULL, f2 bigint, f3 bigint > GENERATED ALWAYS AS (f2 * 2) VIRTUAL) PARTITION BY RANGE > ((gtest_part_key)); > +ERROR: cannot use generated column in partition key > +LINE 1: ...D ALWAYS AS (f2 * 2) VIRTUAL) PARTITION BY RANGE ((gtest_par... > + ^ > +DETAIL: Column "f3" is a generated column. > Yes. It's not apparent to a lay user where f3 appears in the partition key. Users who explicitly use a whole-row expression in a partition key would know that a whole-row expression contains all columns. And the error location points to the whole-row reference. So the current state isn't that bad. > > with the attached patch. now, > +CREATE TABLE gtest_part_key (f1 date NOT NULL, f2 bigint, f3 bigint > GENERATED ALWAYS AS (f2 * 2) VIRTUAL) PARTITION BY RANGE > ((gtest_part_key)); > ERROR: cannot use generated column in partition key > LINE 1: ...D ALWAYS AS (f2 * 2) VIRTUAL) PARTITION BY RANGE ((gtest_par... > ^ > DETAIL: Generated column "f3" is part of the partition key of > relation "gtest_part_key" > > The DETAIL here just mentions what's already known from the command, so the change in the DETAIL may not be useful. If I understand your intention correctly, we would have to mention something to the effect of "the partition key expression contains a whole-row reference which in turn contains a generated column." But that might be too verbose. -- Best Wishes, Ashutosh Bapat
amcheck support for BRIN indexes
Hi, It's not for v18; I just wanted to share it with the community and register it in the upcoming Commitfest 2025-07. Here is the patch with amcheck support for BRIN indexes. The patch uses the amcheck common infrastructure that was introduced in [1]. It works, and I was able to delete all the code that I had initially copied from the btree check. Great. During the check we hold ShareUpdateExclusiveLock, so we don't block regular reads/inserts/updates, and at the same time range summarizations/desummarizations are impossible during the check, which simplifies the check logic. While checking we do ereport(ERROR) on the first issue, the same way the btree and gin checks do. There are two parts: The first part is the 'index structure check': 1) Meta page check. 2) Revmap check. Walk the revmap and check that every valid revmap item points to an index tuple with the expected range blkno, and that the index tuple is consistent with the tuple description. Also, it's not documented, but it is known from the brin code that for every empty range we should have allnulls = true, hasnulls = false. So this is also checked here. 3) Regular pages check. Walk the regular pages and check that every index tuple has a corresponding revmap item that points to it. We don't check index tuple structure here, because that was already done in step 2. The regular pages check is optional. Orphan index tuple errors in this check don't necessarily mean that the index is corrupted, but AFAIS brin is not supposed to have such orphan index tuples now, so if we encounter one then probably something is wrong with the index. The second part is the 'all consistent check': We check that all heap tuples are consistent with the index. It's the most expensive part, and it's optional. Here we call the consistent functions for every heap tuple. Also, here we check that fields like 'has_nulls', 'all_nulls', 'empty_range' are consistent with what we see in the heap. There are two patch files: 0001-brin-refactoring.patch It's just two tiny changes in the brin code. 1) For the index tuple structure check we need to know what on-disk index tuples look like. The function that lets you get that, 'brtuple_disk_tupdesc', is internal. This patch makes it extern. 2) Create a macro for BRIN_MAX_PAGES_PER_RANGE, for the meta page check. 0002-amechek-brin-support.patch It's the check functionality itself + regression tests + amcheck extension updates. Some open questions: 1) How to get the correct strategy number for the scan key in the "all heap consistent" check. The consistent function wants a scan key, and to generate it we have to pick a strategy number. We can't just use the same strategy number for all indexes because its meaning differs from opclass to opclass (for example, the equality strategy number is 1 in Bloom and 3 in min_max). We also can't pick an operator and use it for every check, because opclasses don't have any requirements about which operators they should support. The solution was to let the user define the check operator (parameter consistent_operator_names). It's an array, as we can have a multicolumn index. We can use '=' as the default check operator, because AFAIS the '=' operator is supported by all core opclasses except 'box_inclusion_ops', and IMHO it's the most obvious candidate for such a check. So if the 'consistent_operator_names' parameter is an empty array (the default value), then we use the operator '=' for all attributes. In most cases operator '=' does the job and you don't need to worry about the consistent_operator_names parameter (see the usage sketch at the end of this message). Not sure about it, what do you think? 2) The current implementation of the "all heap consistent" check uses a scan of the entire heap.
So if we have a lot of unsummarized ranges, we do a lot of wasted work because we can't use the tuples that belong to the unsummarized ranges. Instead of having one scan for the entire heap, we can walk the revmap, take only the summarized ranges, and scan only the pages that belong to those ranges. So we have one scan per range. This approach helps us avoid touching those heap tuples that we can't use for index checks. But I'm not sure if we should worry about that, because every autovacuum summarizes all the unsummarized ranges, so I don't expect a lot of unsummarized ranges on average. TODO list: 1) add TAP tests 2) update SGML docs for amcheck (I think it's better to do this after the patch is reviewed and more or less finalized) 3) pg_amcheck integration Big thanks to Tomas Vondra for the first patch idea and initial review. [1] https://www.postgresql.org/message-id/45AC9B0A-2B45-40EE-B08F-BDCF5739D1E1%40yandex-team.ru Best regards, Arseniy Mukhin From 7de60e8da824a97327a5c28ccd9d8a384c07b997 Mon Sep 17 00:00:00 2001 From: Arseniy Mukhin Date: Tue, 22 Apr 2025 11:00:36 +0300 Subject: [PATCH 2/2] amcheck - brin support --- contrib/amcheck/Makefile|5 +- contrib/amcheck/amcheck--1.5--1.6.sql | 20 + contrib/amcheck/amcheck.control |2 +- contrib/amcheck/expected/check_brin.out | 134 +++ contrib/amcheck/meson.build |3 + contrib/amcheck/sql/check_brin.sql | 10
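To make the consistent_operator_names idea more concrete, here is a rough usage sketch. The check function name and the argument order below are placeholders for illustration only; the real signature is in the attached patch.

CREATE TABLE brin_test (a int, b int);
CREATE INDEX brin_test_idx ON brin_test USING brin (a int4_minmax_ops, b int4_bloom_ops);
-- hypothetical call: structure checks plus the optional "all heap consistent"
-- part, with an empty array meaning "use '=' for every attribute"
SELECT brin_index_check('brin_test_idx'::regclass, true, '{}');
-- or with one operator name given explicitly per indexed attribute
SELECT brin_index_check('brin_test_idx'::regclass, true, '{=,=}');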
Re: Check for tuplestorestate nullness before dereferencing
In a message from Friday, October 18, 2024 02:28:22 MSK, David Rowley wrote: > It would be good to know if the optimisation added in d2c555ee5 ever > applies with today's code. If it does apply, we should likely add a > test case for it and if it never does, then we should just remove the > optimisation and always create the tuplestore when it's NULL. That sounds reasonable. It looks like removing the "node->eflags != 0" check is more logical than adding a not-null check. > If it does apply, we should likely add a test case for it Do you have any idea how to test it? -- Nikolay Shaplov aka Nataraj Fuzzing Engineer at Postgres Professional Matrix IM: @dhyan:nataraj.su signature.asc Description: This is a digitally signed message part.
Re: [PATCH] Check for TupleTableSlot nullness before dereferencing
In a message from Friday, December 13, 2024 11:54:35 MSK, Alexander Kuznetsov wrote: > Hello, > > ping. What do you think about reasoning below? Maybe we should consider > proposing different patch for removing redundant check there? Hi! Please pay attention that the commitfest entry for this patch https://commitfest.postgresql.org/patch/5662/ reports problems with the Windows build. There is a small chance that this is a false alarm; sometimes the checkers fail for their own reasons. But most probably it is a persistent error, and if it is, this problem should be researched and fixed first of all. Only after that can there be any discussion of whether this null-related problem should be fixed or not. > > 09.10.2024 18:23, Alexander Kuznetsov wrote: > > 03.10.2024 12:48, Daniel Gustafsson wrote: > >> From a quick reading we can only reach there after evaluating an > >> expression, so can it really be null though? This code hasn't changed > >> all that much since 2009, if there was a reachable segfault on a null > >> pointer deref I have a feeling we'd heard about it by now so some extra > >> care seems warranted to ensure it's not a static analyzer false > >> positive. > > > > Thanks for your response! > > It seems to me that dereferencing is possible under the following > > scenario: > > [...] > > This entire reasoning is based on the assumption that slot2 can > > theoretically be NULL, as there is such a check at line 968. Is it > > possible that no errors have occurred because this condition has always > > been satisfied and is, perhaps, redundant, or maybe I'm misunderstanding > > something? -- Nikolay Shaplov aka Nataraj Fuzzing Engineer at Postgres Professional Matrix IM: @dhyan:nataraj.su signature.asc Description: This is a digitally signed message part.
Re: jsonapi: scary new warnings with LTO enabled
> On 21 Apr 2025, at 20:58, Jacob Champion > wrote: > Personally, I'm fine with can't-fail APIs, as long as they're > documented as such. (I think the deferred error API was probably > chosen so that src/common JSON clients could be written without a lot > of pain?) My preference is that no operation can silently work on a failed object, but it's not a hill to die on (even more so given where we are in the cycle). The attached v3 allocates via the JSON API; no specific error handling should be required, as it's already handled today. -- Daniel Gustafsson v3-0001-Allocate-JsonLexContexts-on-the-heap-to-avoid-war.patch Description: Binary data
Re: pgsql: Add function to get memory context stats for processes
> On 17 Apr 2025, at 16:42, Robert Haas wrote: > > On Tue, Apr 15, 2025 at 6:11 AM Andres Freund wrote: >> There very well could be a CFI - but it better be somewhere where the >> in-memory state is consistent. Otherwise an error inside raised in the CFI >> would lead the in-memory state inconsistent which then would cause problems >> when cleaning up the dsa during resowner release or process exit. >> >> What am I missing here? > > I think maybe you're only thinking about gathering the data. What > about publishing it? If the DSA code were interrupted at a CFI and the > interrupting code went and tried to perform a DSA allocation to store > the resulting data and then returned to the interrupted DSA operation, > would you expect the code to cope with that? I do not believe we have > anywhere enough guarantees about reentrancy for that to be safe. Do you mean that an interrupt handler will make DSA allocations? I don't think that would be something we'd want to allow regardless of this patch. Or did I misunderstand the above? -- Daniel Gustafsson
disallow alter individual column if partition key contains wholerow reference
hi. while reviewing disallow generated column as partition key in [1], I found out this bug. demo: drop table if exists t4; CREATE TABLE t4(f1 int, f2 bigint) PARTITION BY list ((t4)); create table t4_1 partition of t4 for values in ((1,2)); alter table t4 alter column f2 set data type text using f2; insert into t4 select 1, '2'; ERROR: invalid memory alloc request size 18446744073709551615 turns out the fix seems pretty simple, mainly on has_partition_attrs. has_partition_attrs is used to Checks if any of the 'attnums' is a partition key attribute for rel. if partition keys have column references, then has_partition_attrs should return true. [1]: https://postgr.es/m/CACJufxF=wdgthxsaqr9thyusfx_1_t9e6n8te3b8eqxcvov...@mail.gmail.com From 7a4c9bc1cb65c3aedc92a4bf31352ba19f1135b9 Mon Sep 17 00:00:00 2001 From: jian he Date: Tue, 22 Apr 2025 19:37:24 +0800 Subject: [PATCH v1 1/1] fix wholerow as partition key reference. If the partition key contains wholerow reference, individual columns cannot be altered. discussion: https://postgr.es/m/ --- src/backend/catalog/partition.c | 12 src/test/regress/expected/alter_table.out | 14 ++ src/test/regress/sql/alter_table.sql | 7 +++ 3 files changed, 33 insertions(+) diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index 93d72157a46..e0ae3d2abe1 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -290,6 +290,18 @@ has_partition_attrs(Relation rel, Bitmapset *attnums, bool *used_in_expr) /* Find all attributes referenced */ pull_varattnos(expr, 1, &expr_attrs); + + /* + * If this is a wholerow reference, then assume unconditionally that + * 'attnums' is included as part of the partition key attributes. + */ + if (bms_is_member(0 - FirstLowInvalidHeapAttributeNumber, expr_attrs)) + { +if (used_in_expr) + *used_in_expr = true; +return true; + } + partexprs_item = lnext(partexprs, partexprs_item); if (bms_overlap(attnums, expr_attrs)) diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out index 476266e3f4b..fef20e5ae7c 100644 --- a/src/test/regress/expected/alter_table.out +++ b/src/test/regress/expected/alter_table.out @@ -3984,6 +3984,20 @@ LINE 1: ALTER TABLE partitioned ALTER COLUMN b TYPE char(5); ALTER TABLE partitioned SET (fillfactor=100); ERROR: cannot specify storage parameters for a partitioned table HINT: Specify storage parameters for its leaf partitions instead. 
+-- partition expression is whole row +CREATE TABLE partitioned1(a int, b int) PARTITION BY list((partitioned1)); +ALTER TABLE partitioned1 DROP COLUMN a; --error +ERROR: cannot drop column "a" because it is part of the partition key of relation "partitioned1" +ALTER TABLE partitioned1 DROP COLUMN b; --error +ERROR: cannot drop column "b" because it is part of the partition key of relation "partitioned1" +ALTER TABLE partitioned1 ALTER COLUMN a TYPE text; --error +ERROR: cannot alter column "a" because it is part of the partition key of relation "partitioned1" +LINE 1: ALTER TABLE partitioned1 ALTER COLUMN a TYPE text; + ^ +ALTER TABLE partitioned1 ALTER COLUMN b TYPE text; --error +ERROR: cannot alter column "b" because it is part of the partition key of relation "partitioned1" +LINE 1: ALTER TABLE partitioned1 ALTER COLUMN b TYPE text; + ^ -- partitioned table cannot participate in regular inheritance CREATE TABLE nonpartitioned ( a int, diff --git a/src/test/regress/sql/alter_table.sql b/src/test/regress/sql/alter_table.sql index 5ce9d1e429f..f5745f448ae 100644 --- a/src/test/regress/sql/alter_table.sql +++ b/src/test/regress/sql/alter_table.sql @@ -2390,6 +2390,13 @@ ALTER TABLE partitioned ALTER COLUMN b TYPE char(5); -- specifying storage parameters for partitioned tables is not supported ALTER TABLE partitioned SET (fillfactor=100); +-- partition expression is whole row +CREATE TABLE partitioned1(a int, b int) PARTITION BY list((partitioned1)); +ALTER TABLE partitioned1 DROP COLUMN a; --error +ALTER TABLE partitioned1 DROP COLUMN b; --error +ALTER TABLE partitioned1 ALTER COLUMN a TYPE text; --error +ALTER TABLE partitioned1 ALTER COLUMN b TYPE text; --error + -- partitioned table cannot participate in regular inheritance CREATE TABLE nonpartitioned ( a int, -- 2.34.1
Re: Large expressions in indexes can't be stored (non-TOASTable)
On Mon, Apr 21, 2025 at 9:26 PM Nathan Bossart wrote: > > Apparently replication origins in tests are supposed to be prefixed with > "regress_". Here is a new patch with that fixed. > Hi Nathan, Thanks for the patch! I tested it for functionality and here are a few comments: 1) While testing, I noticed an unexpected behavior with the new 512 byte length restriction in place. Since there’s no explicit restriction on the column roname’s type, it’s possible to insert an origin name exceeding 512 bytes. For instance, can use the COPY command -- similar to how pg_dump might dump and restore catalog tables. For example: postgres=# COPY pg_catalog.pg_replication_origin (roident, roname) FROM stdin; Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself, or an EOF signal. >> 44 regress_... [>512 bytes string] >> \. COPY 1 Once inserted, I was able to use the pg_replication_origin_xxx functions with this origin name without any errors. Not sure how common pg_dump/restore of catalog tables is in real systems, but should we still consider if this behavior is acceptable? ~~~ Below are a few cosmetic suggestions you might consider. 2) Creates a replication origin with the given external name, and returns the internal ID assigned to it. +The name must be no longer than 512 bytes. Would it make sense to make the phrasing a bit more formal, perhaps something like: “The name must not exceed 512 bytes.”? 3) For the code comments - + /* + * To avoid needing a TOAST table for pg_replication_origin, we restrict + * replication origin names to 512 bytes. This should be more than enough + * for all practical use. + */ + if (strlen(roname) > MAX_RONAME_LEN) a) /bytes. This/bytes. This/ (there is an extra space before “This”) b) /all practical use/all practical uses/ c) Both (a) and (b) above, also apply to the comment above the macro “MAX_RONAME_LEN”. 4) The "Detail" part of the error message could be made a bit more formal and precise - Current: DETAIL: Replication origin names must be no longer than 512 bytes. Suggestion: DETAIL: Replication origin names must not exceed 512 bytes. 5) As Euler pointed out, logical replication already has a built-in restriction for internally assigned origin names, limiting them to NAMEDATALEN. Given that, only the SQL function pg_replication_origin_create() is capable of creating longer origin names. Would it make sense to consider moving the check into pg_replication_origin_create() itself, since it seems redundant for all other callers? That said, if there's a possibility of future callers needing this restriction at a lower level, it may be reasonable to leave it as is. -- Thanks, Nisha
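As a quick way to exercise the new limit at the SQL level (assuming the check remains reachable from pg_replication_origin_create(), as in the current patch), something like the following should now be rejected while a shorter name still works:

-- 'regress_' plus 520 bytes is over the 512-byte limit added by the patch,
-- so this should fail with the new error
SELECT pg_replication_origin_create('regress_' || repeat('x', 520));
-- a name within the limit should still be accepted
SELECT pg_replication_origin_create('regress_' || repeat('x', 100));
SELECT pg_replication_origin_drop('regress_' || repeat('x', 100));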
Re: bug: virtual generated column can be partition key
On Tue, Apr 22, 2025 at 3:02 PM jian he wrote: > Other than that, it looks good to me for fixing this bug. The error message seems not that intuitive. +CREATE TABLE gtest_part_key (f1 date NOT NULL, f2 bigint, f3 bigint GENERATED ALWAYS AS (f2 * 2) VIRTUAL) PARTITION BY RANGE ((gtest_part_key)); +ERROR: cannot use generated column in partition key +LINE 1: ...D ALWAYS AS (f2 * 2) VIRTUAL) PARTITION BY RANGE ((gtest_par... + ^ +DETAIL: Column "f3" is a generated column. with the attached patch. now, +CREATE TABLE gtest_part_key (f1 date NOT NULL, f2 bigint, f3 bigint GENERATED ALWAYS AS (f2 * 2) VIRTUAL) PARTITION BY RANGE ((gtest_part_key)); ERROR: cannot use generated column in partition key LINE 1: ...D ALWAYS AS (f2 * 2) VIRTUAL) PARTITION BY RANGE ((gtest_par... ^ DETAIL: Generated column "f3" is part of the partition key of relation "gtest_part_key" what do you think? diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 42610f50b0b..a2908fdeef9 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -19770,6 +19770,7 @@ ComputePartitionAttrs(ParseState *pstate, Relation rel, List *partParams, AttrNu char partattname[16]; Bitmapset *expr_attrs = NULL; int i; + bool whole_row = false; Assert(expr != NULL); atttype = exprType(expr); @@ -19799,7 +19800,9 @@ ComputePartitionAttrs(ParseState *pstate, Relation rel, List *partParams, AttrNu * the partitioned table. */ pull_varattnos(expr, 1, &expr_attrs); - if (bms_is_member(0 - FirstLowInvalidHeapAttributeNumber, expr_attrs)) + whole_row = bms_is_member(0 - FirstLowInvalidHeapAttributeNumber, expr_attrs); + + if (whole_row) { expr_attrs = bms_add_range(expr_attrs, 1 - FirstLowInvalidHeapAttributeNumber, @@ -19839,12 +19842,23 @@ ComputePartitionAttrs(ParseState *pstate, Relation rel, List *partParams, AttrNu * used in partition keys). Seems safer to prohibit for now. */ if (TupleDescAttr(RelationGetDescr(rel), attno - 1)->attgenerated) - ereport(ERROR, - (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), - errmsg("cannot use generated column in partition key"), - errdetail("Column \"%s\" is a generated column.", - get_attname(RelationGetRelid(rel), attno, false)), - parser_errposition(pstate, pelem->location))); +{ + if (!whole_row) + ereport(ERROR, +errcode(ERRCODE_INVALID_OBJECT_DEFINITION), +errmsg("cannot use generated column in partition key"), +errdetail("Column \"%s\" is a generated column.", + get_attname(RelationGetRelid(rel), attno, false)), +parser_errposition(pstate, pelem->location)); + else + ereport(ERROR, +errcode(ERRCODE_INVALID_OBJECT_DEFINITION), +errmsg("cannot use generated column in partition key"), +errdetail("Generated column \"%s\" is part of the partition key of relation \"%s\"", + get_attname(RelationGetRelid(rel), attno, false), + RelationGetRelationName(rel)), +parser_errposition(pstate, pelem->location)); +} } if (IsA(expr, Var) &&
Re: bug: virtual generated column can be partition key
Thanks Jian for your review. On Tue, Apr 22, 2025 at 12:32 PM jian he wrote: > On Tue, Apr 22, 2025 at 11:45 AM Ashutosh Bapat > wrote: > > > > I have included your original tests, but ended up rewriting code. Please > let me know what do you think. > > > > + if (attno < 0) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), > + errmsg("partition key expressions cannot contain system column > references"))); > + > + /* > + * We do not check dropped columns explicitly since they will > + * be eliminated when expanding the the whole row expression > + * anyway. > + */ > typo: "the the". > I am confused by the above comments. > ComputePartitionAttrs only called in DefineRelation. > DefineRelation will only CREATE a table, there will be no dropped > column via DefineRelation. > You are right! Fixed. > > > + /* > + * transformPartitionSpec() should have already rejected > + * subqueries, aggregates, window functions, and SRFs, based > + * on the EXPR_KIND_ for partition expressions. > + */ > "EXPR_KIND_" can change to EXPR_KIND_PARTITION_EXPRESSION > ? > That's an existing comment which appears to be moved in diff but it's actually untouched by the patch. Maybe you are right, IDK, but since it's not related to the fix, let's leave it untouched. I did wonder whether that comment has any relation to the pull_varattnos call which has been moved earlier since pull_varattnos doesn't expect any Query node in the tree. But pondering more, I think the comment is related to the number of rows participating in the partition key computation - there should be exactly one key for one tuple. Having SRFs, subqueries in part expression means a possibility of producing more than one set of partition keys, aggregates and window functions OTOH will produce one partition key for more than one row - all of them breaking 1:1 mapping between a tuple and a partition key. Hence I think the comment is at the right place. PFA revised patch. -- Best Wishes, Ashutosh Bapat From 5467147f6b37898263c78ce64b086072c76caaad Mon Sep 17 00:00:00 2001 From: Ashutosh Bapat Date: Mon, 21 Apr 2025 18:06:58 +0530 Subject: [PATCH] Tighten check for generated column in partition key expression A generated column may end up being part of the partition key expression, if it's specified as an expression e.g. "()" or if the partition key expression contains a whole-row reference, even though we do not allow a generated column to be part of partition key expression. Fix this hole. 
Discussion: https://postgr.es/m/CACJufxF=wdgthxsaqr9thyusfx_1_t9e6n8te3b8eqxcvov...@mail.gmail.com --- src/backend/commands/tablecmds.c | 90 ++- .../regress/expected/generated_stored.out | 15 .../regress/expected/generated_virtual.out| 15 src/test/regress/sql/generated_stored.sql | 3 + src/test/regress/sql/generated_virtual.sql| 3 + 5 files changed, 86 insertions(+), 40 deletions(-) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 265b1c397fb..f7b7162a99c 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -19768,6 +19768,8 @@ ComputePartitionAttrs(ParseState *pstate, Relation rel, List *partParams, AttrNu /* Expression */ Node *expr = pelem->expr; char partattname[16]; + Bitmapset *expr_attrs = NULL; + int i; Assert(expr != NULL); atttype = exprType(expr); @@ -19791,43 +19793,36 @@ ComputePartitionAttrs(ParseState *pstate, Relation rel, List *partParams, AttrNu while (IsA(expr, CollateExpr)) expr = (Node *) ((CollateExpr *) expr)->arg; - if (IsA(expr, Var) && -((Var *) expr)->varattno > 0) + /* + * Examine all the columns in the partition key expression. When + * the whole-row reference is present, examine all the columns of + * the partitioned table. + */ + pull_varattnos(expr, 1, &expr_attrs); + if (bms_is_member(0 - FirstLowInvalidHeapAttributeNumber, expr_attrs)) { -/* - * User wrote "(column)" or "(column COLLATE something)". - * Treat it like simple attribute anyway. - */ -partattrs[attn] = ((Var *) expr)->varattno; +expr_attrs = bms_add_range(expr_attrs, + 1 - FirstLowInvalidHeapAttributeNumber, + RelationGetNumberOfAttributes(rel) - FirstLowInvalidHeapAttributeNumber); +expr_attrs = bms_del_member(expr_attrs, 0 - FirstLowInvalidHeapAttributeNumber); } - else - { -Bitmapset *expr_attrs = NULL; -int i; -partattrs[attn] = 0; /* marks the column as expression */ -*partexprs = lappend(*partexprs, expr); + i = -1; + while ((i = bms_next_member(expr_attrs, i)) >= 0) + { +AttrNumber attno = i + FirstLowInvalidHeapAttributeNumber; -/* - * transformPartitionSpec() should have already rejected - * subqueries, aggregates, window functions, and SRFs, based - * on the EXPR_KIND_ for partition expressions. - */ +Assert(attno != 0);
Re: Fix slot synchronization with two_phase decoding enabled
On Fri, Apr 11, 2025 at 1:03 PM Nisha Moond wrote: > > > Patch "v5_aprch_3-0001" implements the above Approach 3. > > Thanks Hou-san for implementing approach-2 and providing the patch. > I find Approach 2 better than Approach1. Yet to review Approach3. Please find my initial comments: Approach#1: 1) When slot is created with failover enabled and later we try to create a subscription using that pre-created slot with two-phase enabled, it does not error out. But it silently retains two_phase of the existing slot as false. Please check if an error is possible in this case too. Approach #2: 1) + ReorderBufferSkipPrepare(reorder, parsed.twophase_xid); In xact_decode, why do we have a new call to ReorderBufferSkipPrepare, can you please add some comments here? 2) + ereport(ERROR, + errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("\"%s\" and \"%s\" are mutually exclusive options when \"%s\" is specified", So like approach 1, here as well we disallow creating subscriptions with both failover and two_phase enabled when create_slot and copy_data is true? But users can alter the sub later to allow failover for two-phase enabled slot provided there are no pending PREPARE txns which are not yet consumed by the subscriber. Is my understanding correct? Can you please tell all the ways using which the user can enable both failover and two_phase together? The patch msg is not that clear. It will be good to add such details in patch message. 3) + /* + * Do not allow users to enable the failover and two_phase options together + * when create_slot is specified. + * + * See comments atop the similar check in DecodingContextFindStartpoint() + * for a detailed reason. + */ + if (!opts.create_slot && opts.twophase && opts.failover) + ereport(ERROR, + errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("\"%s\" and \"%s\" are mutually exclusive options when \"%s\" is specified", + Is the check '!opts.create_slot' correct? The error msg and comment says otherwise. thanks Shveta
Re: Fix slot synchronization with two_phase decoding enabled
On Mon, Apr 21, 2025 at 8:44 AM Zhijie Hou (Fujitsu) wrote: > > On Sat, Apr 19, 2025 at 2:19 AM Masahiko Sawada wrote: > > > > On Tue, Apr 8, 2025 at 10:14 PM Zhijie Hou (Fujitsu) > > wrote: > > > > > > > > > -- > > > Approach 2 > > > -- > > > > > > Instead of disallowing the use of two-phase and failover together, a more > > > flexible strategy could be only restrict failover for slots with two-phase > > > enabled when there's a possibility of existing prepared transactions > > > before > > the > > > two_phase_at that are not yet replicated. During slot creation with > > two-phase > > > and failover, we could check for any decoded prepared transactions when > > > determining the decoding start point (DecodingContextFindStartpoint). For > > > subsequent attempts to alter failover to true, we ensure that > > > two_phase_at is > > > less than restart_lsn, indicating that all prepared transactions have been > > > committed and replicated, thus the bug would not happen. > > > > > > pros: > > > > > > This method minimizes restrictions for users. Especially during slot > > > creation > > > with (two_phase=on, failover=on), as it’s uncommon for transactions to > > prepare > > > during consistent snapshot creation, the restriction becomes almost > > > unnoticeable. > > > > I think this approach can work for the transactions that are prepared > > while the slot is created. But if I understand the problem correctly, > > while the initial table sync is performing, the slot's two_phase is > > still false, so we need to deal with the transactions that are > > prepared during the initial table sync too. What do you think? > > > > Yes, I agree that we need to restrict this case too. Given that we haven't > started decoding when setting two_phase=true during CreateDecodingContext() > after tablesync, we could check prepared transactions afterwards during > decoding. This could involve reporting an ERROR when skipping a prepared > transaction during decoding if its prepare LSN is less than two_phase_at. > It will make it difficult for users to detect it as this happens at a later point of time. > Alternatively, a simpler method would be to prevent this situation entirely > during the CREATE SUBSCRIPTION command. For example, we could restrict slots > created with failover set to true and twophase is later modified to true after > tablesync. Although the simpler check is more user-visible, it may offer less > flexibility. > I agree with your point, but OTOH, I am also afraid of adding too many smart checks in the back-branch. If we follow what you say here, then users have the following ways in PG17 to enable both failover and two_phase. (a) During Create Subscription, users can set both 'failover' and 'two_phase', if 'copy_data' is false, or (b), if 'copy_data' is true, during Create Subscription, then users can enable 'two_phase' and wait for it to be enabled. Then use Alter Subscription to set 'failover'. -- With Regards, Amit Kapila.
Re: [PoC] Federated Authn/z with OAUTHBEARER
> On 22 Apr 2025, at 01:19, Jacob Champion > wrote: > v8 also makes the following changes: Thanks for this version, a few small comments: + if oauth_flow_supported +cdata.set('USE_LIBCURL', 1) + elif libcurlopt.enabled() +error('client OAuth is not supported on this platform') + endif We already know that libcurlopt.enabled() is true here so maybe just doing if-else-endif would make it more readable and save readers thinking it might have changed? Also, "client OAuth" reads a bit strange, how about "client-side OAuth" or "OAuth flow module"? - appendPQExpBufferStr(&conn->errorMessage, -libpq_gettext(actx->errctx)); - appendPQExpBufferStr(&conn->errorMessage, ": "); + appendPQExpBufferStr(errbuf, libpq_gettext(actx->errctx)); + appendPQExpBufferStr(errbuf, ": "); I think we should take this opportunity to turn this into a appendPQExpBuffer() with a format string instead of two calls. + len = errbuf->len; + if (len >= 2 && errbuf->data[len - 2] == '\n') Now that the actual variable, errbuf->len, is short and very descriptive I wonder if we shouldn't just use this as it makes the code even clearer IMO. -- Daniel Gustafsson
Re: bug: virtual generated column can be partition key
On Tue, Apr 22, 2025 at 11:45 AM Ashutosh Bapat wrote: > > I have included your original tests, but ended up rewriting code. Please let > me know what do you think. > + if (attno < 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), + errmsg("partition key expressions cannot contain system column references"))); + + /* + * We do not check dropped columns explicitly since they will + * be eliminated when expanding the the whole row expression + * anyway. + */ typo: "the the". I am confused by the above comments. ComputePartitionAttrs only called in DefineRelation. DefineRelation will only CREATE a table, there will be no dropped column via DefineRelation. + /* + * transformPartitionSpec() should have already rejected + * subqueries, aggregates, window functions, and SRFs, based + * on the EXPR_KIND_ for partition expressions. + */ "EXPR_KIND_" can change to EXPR_KIND_PARTITION_EXPRESSION ? Other than that, it looks good to me for fixing this bug.
Fix premature xmin advancement during fast forward decoding
Hi, When analyzing some issues in another thread [1], I found a bug where fast forward decoding could lead to premature advancement of catalog_xmin, resulting in required catalog data being removed by vacuum. The issue arises because we do not build a base snapshot when decoding changes during fast forward mode, preventing reference to the minimum transaction ID that remains visible in the snapshot when determining the candidate for catalog_xmin. As a result, catalog_xmin was directly advanced to the oldest running transaction ID found in the latest running_xacts record. At the code level, SnapBuildProcessChange -> ReorderBufferSetBaseSnapshot could not be reached during fast forward decoding, resulting in rb->txns_by_base_snapshot_lsn being NULL. When advancing catalog_xmin, the system attempts to refer to rb->txns_by_base_snapshot_lsn via ReorderBufferGetOldestXmin(). However, since rb->txns_by_base_snapshot_lsn is NULL, it defaults to directly using running->oldestRunningXid as the candidate for catalog_xmin. For reference, see the details in SnapBuildProcessRunningXacts. See the attachment for a test (0002) to prove that catalog data that is still required would be removed after premature catalog_xmin advancement during fast forward decoding. To fix this, I think we can allow the base snapshot to be built during fast forward decoding, as implemented in patch 0001 (we already build a base snapshot in fast-forward mode for logical messages in logicalmsg_decode()). I also explored the possibility of further optimizing the fix to reduce the overhead associated with building a snapshot in fast-forward mode. E.g., we could maintain a single base_snapshot_xmin rather than generating a full snapshot for each transaction, and use this base_snapshot_xmin when determining the candidate catalog_xmin. However, I am not feeling confident about maintaining some fast-forward dedicated fields in common structures and perhaps employing different handling for catalog_xmin. Moreover, I conducted a basic test [2] of the patch's impact, noting that advancing the slot incurs roughly a 4% increase in processing time after applying the patch, which appears to be acceptable. Additionally, the cost associated with building the snapshot via SnapBuildBuildSnapshot() did not show up in the profile. Therefore, I think it's reasonable to refrain from further optimization at this stage. [1] https://www.postgresql.org/message-id/OS3PR01MB5718ED3F0123737C7380BBAD94BB2%40OS3PR01MB5718.jpnprd01.prod.outlook.com [2] Create a table test(a int); Create a slot A. pgbench postgres -T 60 -j 100 -c 100 -f simple-insert simple-insert: BEGIN; INSERT INTO test VALUES(1); COMMIT; use pg_replication_slot_advance to advance slot A to the latest position and measure the time. HEAD: Time: 4189.257 ms PATCH: Time: 4371.358 ms Best Regards, Hou zj v1-0002-Add-a-test.patch Description: v1-0002-Add-a-test.patch v1-0001-Fix-premature-xmin-advancement-during-fast-forwar.patch Description: v1-0001-Fix-premature-xmin-advancement-during-fast-forwar.patch
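Spelled out, the benchmark in [2] is roughly the following runnable sketch (the slot name and the choice of test_decoding as the output plugin are arbitrary assumptions; any output plugin works with pg_replication_slot_advance):

CREATE TABLE test(a int);
SELECT pg_create_logical_replication_slot('slot_a', 'test_decoding');
-- generate WAL from a shell:
--   pgbench postgres -T 60 -j 100 -c 100 -f simple-insert
--   where simple-insert contains: BEGIN; INSERT INTO test VALUES(1); COMMIT;
-- then advance the slot to the latest position and time it
\timing on
SELECT pg_replication_slot_advance('slot_a', pg_current_wal_lsn());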
Re: [BUG] temporary file usage report with extended protocol and unnamed portals
On 4/21/25 07:46, Frédéric Yhuel wrote: On 4/20/25 00:42, Sami Imseih wrote: (In my testing, the "temporary file:" message comes out without any attached STATEMENT most of the time already, so this isn't losing much as far as that's concerned.) Indeed, this happens when using autocommit / implicit transactions. Actually, this also happens with Java-style cursors, i.e. using the setFetchSize() method, which pgJDBC converts to using named portals and EXECUTE protocol messages. The explanation is probably very similar to the one Sami gave for the implicit transaction case. In any case, my v3 patch seems to fix all these cases. (I'm not saying it's good enough to be committed as is. I think I should at least add some comments. Anything else?)
Re: Proposal: Filter irrelevant change before reassemble transactions during logical decoding
Hi Ajin, Some review comments for patch v17-0001 == Commit message 1. Additionally, transactions containing WAL records (INTERNAL_SNAPSHOT, COMMAND_ID, or INVALIDATION) that modify the historical snapshot constructed during logical decoding are deemed unfilterable. This is necessary because constructing a correct historical snapshot for searching publication information requires processing these WAL record. ~ /these WAL record./these WAL records./ == src/backend/replication/logical/reorderbuffer.c ReorderBufferFilterByRelFileLocator: 2. + if (found) + { + rb->try_to_filter_change = entry->filterable; + return entry->filterable; + } + Bad indentation. == src/include/replication/reorderbuffer.h 3. +extern bool ReorderBufferCanFilterChanges(ReorderBuffer *rb); + Why is this extern here? This function is not implemented in patch 0001. == Kind Regards, Peter Smith. Fujitsu Australia
Re: Proposal: Adding compression of temporary files
Since the patch was prepared months ago, it needs to be rebased. -Filip- On Sun, Apr 13, 2025 at 9:53 PM Dmitry Dolgov <9erthali...@gmail.com> wrote: > > On Fri, Mar 28, 2025 at 09:23:13AM GMT, Filip Janus wrote: > > > +else > > > +if (nbytesOriginal - file->pos != 0) > > > +/* curOffset must be corrected also if compression is > > > + * enabled, nbytes was changed by compression but we > > > + * have to use the original value of nbytes > > > + */ > > > +file->curOffset-=bytestowrite; > > > > > > It's not something introduced by the compression patch - the first part > > > is what we used to do before. But I find it a bit confusing - isn't it > > > mixing the correction of "logical file position" adjustment we did > > > before, and also the adjustment possibly needed due to compression? > > > > > > In fact, isn't it going to fail if the code gets multiple loops in > > > > > > while (wpos < file->nbytes) > > > { > > >... > > > } > > > > > > because bytestowrite will be the value from the last loop? I haven't > > > tried, but I guess writing wide tuples (more than 8k) might fail. > > > > > > > I will definitely test it with larger tuples than 8K. > > > > Maybe I don't understand it correctly, > > the adjustment is performed in the case that file->nbytes and file->pos > > differ. > > So it must persist also if we are working with the compressed data, but > the > > problem is that data stored and compressed on disk has different sizes > than > > data incoming uncompressed ones, so what should be the correction value. > > By debugging, I realized that the correction should correspond to the > size > > of > > bytestowrite from the last iteration of the loop. > > I agree, this looks strange. If the idea is to set curOffset to its > original value + pos, and the original value was advanced multiple times > by bytestowrite, it seems incorrect to adjust it by bytestowrite, it > seems incorrect to adjust it only once. From what I see current tests do > not exercise a case where the while will get multiple loops, so it looks > fine. > > At the same time maybe I'm missing something, but how exactly such test > for 8k tuples and multiple loops in the while block should look like? > E.g. when I force a hash join on a table with a single wide text column, > the minimal tuple that is getting written to the temporary file still > has rather small length, I assume due to toasting. Is there some other > way to achieve that? > > 0002-Add-test-for-temporary-files-compression-this-commit.patch Description: Binary data 0001-This-commit-adds-support-for-temporary-files-compres.patch Description: Binary data
Allow to collect statistics on virtual generated columns
Hi hackers, Now we can create a table with a virtual generated column, but when a condition in the WHERE clause contains a virtual generated column, the estimated rows are not correct since no statistics on this are collected. [Ex.1] test=# CREATE TABLE t (i int, v int GENERATED ALWAYS AS (i+1) VIRTUAL); CREATE TABLE test=# INSERT INTO t SELECT generate_series(1,1000); INSERT 0 1000 test=# INSERT INTO t SELECT 1 FROM generate_series(1,1000); INSERT 0 1000 test=# EXPLAIN ANALYZE SELECT * FROM t WHERE v = 2; QUERY PLAN -- Seq Scan on t (cost=0.00..36.02 rows=9 width=8) (actual time=0.093..3.059 rows=1001.00 loops=1) Filter: ((i + 1) = 2) Rows Removed by Filter: 999 Buffers: shared hit=9 Planning: Buffers: shared hit=26 Planning Time: 1.142 ms Execution Time: 3.434 ms (8 rows) Therefore, I would like to allow collecting statistics on virtual generated columns. I think there are at least three approaches for this. (1) Allow the normal ANALYZE to collect statistics on virtual generated columns. ANALYZE expands the virtual generated columns' expressions and collects statistics on the evaluated values. In this approach, the statistics on virtual generated columns are collected by default, but ANALYZE on the table would become a bit more expensive. (2) Allow creating an index on a virtual generated column. This is proposed in [1]. While this proposal itself would be useful, I believe it is better to provide a way to collect statistics without the cost of creating an index. [1] https://www.postgresql.org/message-id/flat/CACJufxGao-cypdNhifHAdt8jHfK6-HX=trbovbkgruxw063...@mail.gmail.com (3) Allow creating extended statistics on virtual generated columns. In this approach, ANALYZE processes virtual generated columns only if corresponding extended statistics are defined. Although we can create extended statistics on expressions of virtual generated columns even in the current implementation, this enables users to create useful statistics just by specifying a column name, without specifying a complex expression. I can also think of two variations for this approach. (3a) At the time an extended statistics object is created, virtual generated columns are expanded, and the statistics are defined on this expression. (3b) At the time an extended statistics object is created, virtual generated columns are NOT expanded. The statistics are defined on the virtual generated column itself, and the expression is expanded when ANALYZE processes the extended statistics. I've attached a draft patch based on (3a). However, if it is possible that we could change the definition of generated columns in the future (as proposed in [2]), (3b) might be preferred. [2] https://www.postgresql.org/message-id/flat/cacjufxh3vetr7orf5rw29gndk3n1wwboe3wdkhyd3ipgrq9...@mail.gmail.com Here is an example of how the patch works. [Ex.2] test=# CREATE STATISTICS exstat ON v FROM t; CREATE STATISTICS test=# ANALYZE t; ANALYZE test=# EXPLAIN ANALYZE SELECT * FROM t WHERE v = 2; QUERY PLAN - Seq Scan on t (cost=0.00..41.50 rows=1001 width=8) (actual time=0.067..2.422 rows=1001.00 loops=1) Filter: ((i + 1) = 2) Rows Removed by Filter: 999 Buffers: shared hit=9 Planning: Buffers: shared hit=14 Planning Time: 0.785 ms Execution Time: 2.744 ms (8 rows) What do you think of this? Which of approaches (1), (3a), or (3b) is good? Or is a completely different approach better? With your feedback, I would like to progress or rework the patch.
Regards, Yugo Nagata -- Yugo Nagata >From a6b0be714f6d4e4e0e7423f07432d3135c807a63 Mon Sep 17 00:00:00 2001 From: Yugo Nagata Date: Tue, 22 Apr 2025 17:03:50 +0900 Subject: [PATCH v1] Allow to create extended statistics on virtual generated columns --- src/backend/commands/statscmds.c | 86 +++- 1 file changed, 40 insertions(+), 46 deletions(-) diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c index e24d540cd45..9b7f27fec28 100644 --- a/src/backend/commands/statscmds.c +++ b/src/backend/commands/statscmds.c @@ -29,6 +29,7 @@ #include "miscadmin.h" #include "nodes/nodeFuncs.h" #include "optimizer/optimizer.h" +#include "rewrite/rewriteHandler.h" #include "statistics/statistics.h" #include "utils/acl.h" #include "utils/builtins.h" @@ -240,28 +241,27 @@ CreateStatistics(CreateStatsStmt *stmt) attname))); attForm = (Form_pg_attribute) GETSTRUCT(atttuple); - /* Disallow use of system attributes in extended stats */ - if (attForm->attnum <= 0) -ereport(ERROR, - (errcode(ERRCODE_FEATU
Re: What's our minimum supported Python version?
> On 19 Apr 2025, at 18:17, Tom Lane wrote: > The reason I bring this up is that I found out the hard way > that src/test/modules/oauth_validator fails on RHEL8, because > its oauth_server.py script is not compatible with the 3.6.8 > version of Python supplied by this distro. Do you have the error message/log for the failure handy? -- Daniel Gustafsson
Re: Cygwin support
Hi Andrew, > Last year the old Windows machine where I was running the buildfarm > member lorikeet died, and since then we've had no buildfarm coverage for > Cygwin. I now have a new (but slow) W11pro machine and I have been > testing out Cygwin builds on it. I wanted to have it running the TAP > tests, unlike lorikeet. Attached is a set of very small patches aimed at > enabling this. > > [...] Thanks for the patches. I wonder though if it is advisable to support Cygwin if this requires extra effort from our side, compared to a regular Linux or Windows environment. I actually like the project but I wouldn't recommend running Postgres on prod, and for development these days it's simpler to use WSL / VirtualBox / Raspberry Pi, or running Postgres natively on Windows. -- Best regards, Aleksander Alekseev
Re: Adding pg_dump flag for parallel export to pipes
If there are no objections we will add this to the commitfest On Mon, Apr 7, 2025 at 9:48 PM Hannu Krosing wrote: > > > Just to bring this out separately : Does anybody have any idea why pipe > commands close inside tests ? > > Re: 003-pg_dump_basic_tests has a few basic validation tests for > correctmflag combinations. We need to write more automated tests in > 002_pg_dump.pl but have been running into some issues with environment > setup due to which certain pipe commands result in the shell process > becoming defunct. These same commands are working fine in manual > testing. We are still looking into this. > > > Hannu > > > On Mon, Apr 7, 2025 at 7:17 PM Nitin Motiani wrote: >> >> Hi Hackers, >> >> We are proposing the ability to specify a pipe command to pg_dump by a >> flag. And attaching the patch set. >> >> Why : Currently it is quite simple to pipe the output of pg_dump for >> text format to a pipe at command line and do any manipulations >> necessary. Following is an example : >> >>pg_dump | lz4 | pv -L 10k | ssh remote.host >> "cat - > remote.dump.lz4" >> >> Here we first compress the stream using lz4 and then send it over ssh >> to a remote host to be saved as a file while rate-limiting the network >> usage to 10KB/s. >> >> Something like this is not possible for format=directory (-Fd) since >> all you can provide is the directory name to store the individual >> files. Note it is not possible to do this irrespective of the usage of >> the parallel dump option ('--jobs' flag). >> >> While the directory format supports compression using a flag, the rest >> of the operations in the above example are not possible. And a pipe >> command provides more flexibility in what compression algorithm one >> wants to use. >> >> This patch set provides pg_dump the ability to pipe the data in the >> directory mode by using a new flag '--pipe-command' (in both parallel >> and non-parallel mode). >> >> We also add a similar option to pg_restore. >> >> The following can be the major use cases of these changes : >> 1. Stream pg_dump output to a cloud storage >> 2. SSH the data to a remote host (with or without throttling) >> 3. Custom compression options >> >> >> Usage Examples : Here is an example of how the pipe-command will look like. >> >> pg_dump -Fd mydb --pipe-command="cat > dumpdir/%f" (dumpdir >> should exist beforehand.) >> >> This is equivalent to >> >> pg_dump -Fd mydb --file=dumpdir >> >> (Please note that the flags '--file' or '--pipe-command' can't be used >> together.) >> >> For the more complex scenario as mentioned above, the command will be >> (with the parallelism of 5) : >> >> pg_dump -Fd mydb -j 5 --pipe-command="lz4 | pv -L 10k | ssh >> remote.host "cat > dumpdir/%f"" >> >> Please note the use of %f in the above examples. As a user would >> almost always want to write the post-processing output to a file (or >> perhaps a cloud location), we provide a format specifier %f in the >> command. The implementation of pipe-command replaces these format >> specifiers with the corresponding file names. These file names are the >> same as they would be in the current usage of directory format with >> '--file' flag (.dat, toc.dat, blob_NNN.toc, >> blob_.dat). >> >> The usage of this flag with pg_restore will also be similar. Here is >> an example of restoring from a gzip compressed dump directory. 
>> >> pg_restore -C -Fd -d postgres --pipe-commnad="cat >> dumpdir/%f.gz | gunzip" >> >> The new flag in pg_restore also works with '-l' and '-L' options >> >> pg_restore -C -Fd -d postgres --pipe-commnad="cat dumpdir/%f" -L >> db.list >> >> >> Implementation Details : Here are the major changes : >>1. We reuse the same variables which store the file name to store >> the pipe command. And add a new bool fSpecIsPipe in _archiveHandle >> (similar bools in pg_dump.c and pg_restore.c) to specify if it's a >> pipe command. >> 2. In the cases when the above bool is set to true, we use popen >> and pclose instead of fopen and fclose. >> 3. To enable the format specifier %f in the pipe-command, we make >> changes to the file name creation logic in a few places. Currently the >> file name (corresponding to a table or large object) is appended to >> the directory name provided by '--file' command. In case of >> '--pipe-command', we use 'replace_percent_placeholders' to replace %f >> with the corresponding file name. This change is made for both table >> files and LO TOC files. >> >> With these core changes, the rest of the code continues working as-is. >> >> We are attaching 4 patches for this change : >> >> 001-pg_dump_pipe has the pg_dump pipe support code. >> 002-pg_restore_pipe has the pg_restore pipe support. >> 003-pg_dump_basic_tests has a few basic validation tests for >> correctmflag combinations. We need to write more automated tests in >> 002_pg_dump.pl but have been running into some issues with environment >> setup due to which certa
Re: Cygwin support
Andrew Dunstan writes: > I agree that I would not normally use Cygwin to run a Postgres instance. > But its psql is nicer than others on Windows because unlike the native > builds we build it with readline. That's why I've kept a buildfarm > animal going all these years. If the maintenance burden were high I > wouldn't do so - but it's not really. And these patches are tiny. I vaguely recall some discussion about whether building with readline has become possible under MSVC. I think it'd make a lot of people happy if that could happen (but I hasten to add that I'm not volunteering). regards, tom lane
Re: Add Pipelining support in psql
On Tue, Apr 22, 2025 at 2:06 AM Michael Paquier wrote: > I am wondering if we could not be smarter with the handling of > the counters, but I really doubt that there is much more we can do > under a PGRES_FATAL_ERROR. One thing that bothers me is that the reported error is silently discarded within discardAbortedPipelineResults. psql -f bug_assertion.sql psql:bug_assertion.sql:7: ERROR: unexpected message type 0x50 during COPY from stdin CONTEXT: COPY psql_pipeline, line 1 psql:bug_assertion.sql:7: Pipeline aborted, command did not run This should ideally report the "FATAL: terminating connection because protocol synchronization was lost" sent by the backend process. Also, we still have a triggered assertion failure with the following: CREATE TABLE psql_pipeline(a text); \startpipeline COPY psql_pipeline FROM STDIN; SELECT 'val1'; \syncpipeline \endpipeline ... Assertion failed: (pset.piped_syncs == 0), function ExecQueryAndProcessResults, file common.c, line 2153. A possible alternative could be to abort discardAbortedPipelineResults when we encounter a res != NULL + FATAL error and let the outer loop handle it. As you said, the pipeline flow is borked so there's not much to salvage. The outer loop would read and print all error messages until the closed connection is detected. Then, CheckConnection will reset the connection which will reset the pipeline state. While testing this change, I was initially looking for the "FATAL: terminating connection because protocol synchronization was lost" message in the tests. However, this was failing on Windows[1] as the FATAL message wasn't reported on stderr. I'm not sure why yet. [1]: https://cirrus-ci.com/task/5051031505076224?logs=check_world#L240-L246 v02-0001-PATCH-psql-Fix-assertion-failure-with-pipeline-m.patch Description: Binary data
Re: Cygwin support
On 2025-04-22 Tu 8:10 AM, Aleksander Alekseev wrote: Hi Andrew, Last year the old Windows machine where I was running the buildfarm member lorikeet died, and since then we've had no buildfarm coverage for Cygwin. I now have a new (but slow) W11pro machine and I have been testing out Cygwin builds on it. I wanted to have it running the TAP tests, unlike lorikeet. Attached is a set of very small patches aimed at enabling this. [...] Thanks for the patches. I wonder though if it is advisable to support Cygwin if this requires extra effort from our side, compared to a regular Linux or Windows environment. I actually like the project but I wouldn't recommend running Postgres on prod, and for development these days it's simpler to use WSL / VirtualBox / Raspberry Pi, or running Postgres natively on Windows. I agree that I would not normally use Cygwin to run a Postgres instance. But its psql is nicer than others on Windows because unlike the native builds we build it with readline. That's why I've kept a buildfarm animal going all these years. If the maintenance burden were high I wouldn't do so - but it's not really. And these patches are tiny. What I might do with the new animal is run just enough TAP tests to exercise psql. That would reduce the errors we get, so it would be less bother to anyone. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Fortify float4 and float8 regression tests by ordering test results
Pavel Borisov writes: > It's not a big problem, but propose a simple fix for the tests. It > just adds ORDER BY 1 to all relevant float4 and floa8 queries. You do realize that this problem is hardly confined to the float4 and float8 tests? To a first approximation, a table AM that fails to preserve insertion order would break every single one of our regression tests. (I speak from experience, as Salesforce had to deal with this when I was there...) The reason why we don't simply add ORDER BY to everything is explained at [1]: You might wonder why we don't order all the regression test queries explicitly to get rid of this issue once and for all. The reason is that that would make the regression tests less useful, not more, since they'd tend to exercise query plan types that produce ordered results to the exclusion of those that don't. At some point we will probably have to think through what we want to do about running the regression tests with table AMs that don't wish to preserve ordering as much as heapam does. It's going to need careful balancing of multiple concerns, and I don't think "blindly slap ORDER BY everywhere" is going to be an acceptable answer. regards, tom lane [1] https://www.postgresql.org/docs/devel/regress-evaluation.html#REGRESS-EVALUATION-ORDERING-DIFFERENCES
Re: Check for tuplestorestate nullness before dereferencing
In a message of Tuesday, 22 April 2025, 18:50:49 MSK, Tom Lane wrote: > The reason that the subsequent bit of code is safe is that !forward > should not possibly be true unless EXEC_FLAG_BACKWARD was given, > which'd cause us to create a tuplestore. So if we were going > to change anything, I'd propose adding something more like > > if (!forward && eof_tuplestore) > { > + Assert(node->eflags & EXEC_FLAG_BACKWARD); > if (!node->eof_underlying) > { > /* > > or perhaps more directly, Assert that tuplestore is not null. I like the Assert(node->eflags & EXEC_FLAG_BACKWARD); solution: it gives more information, namely "here we expect the BACKWARD eflag to be set". (I am not quite familiar with this part of the code, so this knowledge is not obvious to me.) Assert(tuplestorestate != NULL), by contrast, says much less about what is happening here. And I think it is better to add this Assert there. People will keep running static analyzers on the Postgres code and finding this place again and again; better to close this issue once and for all :-) -- Nikolay Shaplov aka Nataraj Fuzzing Engineer at Postgres Professional Matrix IM: @dhyan:nataraj.su signature.asc Description: This is a digitally signed message part.
Re: Cygwin support
On 2025-04-22 Tu 10:26 AM, Tom Lane wrote: Andrew Dunstan writes: I agree that I would not normally use Cygwin to run a Postgres instance. But its psql is nicer than others on Windows because unlike the native builds we build it with readline. That's why I've kept a buildfarm animal going all these years. If the maintenance burden were high I wouldn't do so - but it's not really. And these patches are tiny. I vaguely recall some discussion about whether building with readline has become possible under MSVC. I think it'd make a lot of people happy if that could happen (but I hasten to add that I'm not volunteering). Neither am I, although I'll test it if somebody sends in a patch. If that happens I'll be happy enough to retire Cygwin support. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Fortify float4 and float8 regression tests by ordering test results
On Tue, 22 Apr 2025 at 18:40, Pavel Borisov wrote: > > Hi, Tom! > > On Tue, 22 Apr 2025 at 18:23, Tom Lane wrote: > > > > Pavel Borisov writes: > > > It's not a big problem, but propose a simple fix for the tests. It > > > just adds ORDER BY 1 to all relevant float4 and floa8 queries. > > > > You do realize that this problem is hardly confined to the > > float4 and float8 tests? To a first approximation, a table AM > > that fails to preserve insertion order would break every single > > one of our regression tests. (I speak from experience, as > > Salesforce had to deal with this when I was there...) > > > > The reason why we don't simply add ORDER BY to everything is > > explained at [1]: > > > > You might wonder why we don't order all the regression test > > queries explicitly to get rid of this issue once and for all. The > > reason is that that would make the regression tests less useful, > > not more, since they'd tend to exercise query plan types that > > produce ordered results to the exclusion of those that don't. > > > > At some point we will probably have to think through what we > > want to do about running the regression tests with table AMs that > > don't wish to preserve ordering as much as heapam does. It's going > > to need careful balancing of multiple concerns, and I don't think > > "blindly slap ORDER BY everywhere" is going to be an acceptable > > answer. > > > > I agree we might need to elaborate different AM support in regression tests. > Also, I agree that these are not the only tests to be fixed. > > What I hardly agree with, is that we suppose it's right for regression > tests to require some fixed results ordering for so simple cases. > Maybe for more complicated plans it's useful, but for the float tests > mentioned in the patch it's this requirement is a total loss IMO. > > I understand your sentiment against blindly slapping order by's, but I > don't see a useful alternative for this time. Also a large number of > tests in PG were already fortified with ORDER BY 1. I forgot to mention that it's not only a question of INSERTs ordering. Extension AM can have some internal ordering e.g. index-organized tables. And besides SELECT query results following internal AM ordering just being allowed, more importantly they are good performance-wise and justify table AM extensibility. Regards, Pavel.
Re: What's our minimum supported Python version?
Daniel Gustafsson writes: >> On 19 Apr 2025, at 18:17, Tom Lane wrote: >> The reason I bring this up is that I found out the hard way >> that src/test/modules/oauth_validator fails on RHEL8, because >> its oauth_server.py script is not compatible with the 3.6.8 >> version of Python supplied by this distro. > Do you have the error message/log for the failure handy? The first problem is that this Python version seems not to like assignments embedded in if statements: File "t/oauth_server.py", line 319 if err := self._get_param("error_code", None): ^ SyntaxError: invalid syntax I was able to work around that with the attached quick hack. But then I get Traceback (most recent call last): File "t/oauth_server.py", line 19, in class OAuthHandler(http.server.BaseHTTPRequestHandler): File "t/oauth_server.py", line 26, in OAuthHandler JsonObject = dict[str, object] # TypeAlias is not available until 3.10 TypeError: 'type' object is not subscriptable which I have no idea how to work around. regards, tom lane diff --git a/src/test/modules/oauth_validator/t/oauth_server.py b/src/test/modules/oauth_validator/t/oauth_server.py index 5bc30be87fd..133fe496cfc 100755 --- a/src/test/modules/oauth_validator/t/oauth_server.py +++ b/src/test/modules/oauth_validator/t/oauth_server.py @@ -316,11 +316,13 @@ class OAuthHandler(http.server.BaseHTTPRequestHandler): return resp def token(self) -> JsonObject: -if err := self._get_param("error_code", None): +err = self._get_param("error_code", None) +if err: self._response_code = self._get_param("error_status", 400) resp = {"error": err} -if desc := self._get_param("error_desc", ""): +desc = self._get_param("error_desc", "") +if desc: resp["error_description"] = desc return resp
Re: [BUG] temporary file usage report with extended protocol and unnamed portals
> In any case, my v3 patch seems to fix all these cases. > > (I'm not saying it's good enough to be committed as is. I think I should > at least add some comments. Anything else?) The patch relies on looking up queryDesc->sourceText inside DropPortal, which Tom raised concerns about earlier in the thread [0]. So it seems to me we are better off just setting debug_query_string to NULL in DropPortal; or, alternatively, why not just log the statement automatically at the start of execution whenever log_temp_files > 0? That would allow us to safely capture the statement to blame for the temp files, and it would cover all cases. [0] https://www.postgresql.org/message-id/CAA5RZ0ssqRTz_9T0Gy74SiTViiX3V0rSFxc4N_4GNcbEBK9wow%40mail.gmail.com -- Sami Imseih Amazon Web Services (AWS)
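For context, a minimal sketch of the mechanism being argued about: with log_temp_files enabled, a spilling query makes the server emit a "temporary file" LOG entry, and the open question in this thread is which statement text (if any) should be attached to that entry. The specific GUC values and the spilling query below are illustrative assumptions, not taken from the thread:

    SET log_temp_files = 0;   -- log every temporary file, regardless of size
    SET work_mem = '64kB';    -- small enough that the sort below spills to disk
    SELECT g FROM generate_series(1, 100000) g ORDER BY g DESC;
    -- the server log then contains something along the lines of:
    --   LOG:  temporary file: path "base/pgsql_tmp/...", size ...
    -- and the bug under discussion is about the statement reported alongside
    -- that entry when the extended protocol and unnamed portals are in use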
Re: Fortify float4 and float8 regression tests by ordering test results
Hi Pavel, > It's not a big problem, but propose a simple fix for the tests. It > just adds ORDER BY 1 to all relevant float4 and floa8 queries. Thanks for the patch. That's a good change IMO. > I don't > have a strong opinion about backpatching this, but as the patch > changes only regression tests, it's maybe also worth backpatching. I don't see a reason for backpatching this but I'm not strongly opposed to the idea either. -- Best regards, Aleksander Alekseev
Re: [PATCH] contrib/xml2: xslt_process() should report XSLT-related error details
I forgot the patch of course... -- Robin Haberkorn Senior Software Engineer B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / https://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt, HRB 3537 From cc5be302aa4305403b8131f6c22c39e4dfb75bd1 Mon Sep 17 00:00:00 2001 From: Robin Haberkorn Date: Thu, 17 Apr 2025 13:15:48 +0300 Subject: [PATCH] contrib/xml2: xslt_process() now reports XSLT-related error details * It sets (and restores) the libxslt error handler (xsltSetGenericErrorFunc()). Since it only supports old-school "generic" error handlers, which are no longer used in PG's libxml code, we reintroduced a "generic" error handler xml_generic_error_handler() in xml.c. * The alternative would have been to expose PgXmlErrorContext in xml.h, so we could implement a generic handler in xslt_proc.c. This is obviously not desirable, as it would depend on USE_LIBXML. * No longer use the PG_XML_STRICTNESS_LEGACY for error handling, but query the error_occurred-flag via pg_xml_error_occurred(). The existance of this getter is another hint, that we are not supposed to expose PgXmlErrorContext in xml.h. * This change means that xslt_process() now reports not only details about XML parsing errors, but XSLT-schema deviations and missing stylesheet parameters as well. * The XSLT error messages now also contain line numbers. For that to work, we had to set a dummy "SQL" URL when parsing XML strings. This is important, since xsltPrintErrorContext() includes line numbers only if an URL is set. * The xsltSaveResultToString() error handling has been removed. It can practically only fail in OOM situations and there is no reason to handle them any different than with the other libxslt functions. * Updated test suite and added test case for detecting missing stylesheet parameters. This was initially reported here but has obviously been fixed in the meantime: https://www.postgresql.org/message-id/4C5ECEF9.3030806%40mlfowler.com --- contrib/xml2/expected/xml2.out | 17 +++ contrib/xml2/sql/xml2.sql | 8 +++ contrib/xml2/xslt_proc.c | 40 -- src/backend/utils/adt/xml.c| 37 +++ src/include/utils/xml.h| 2 ++ 5 files changed, 92 insertions(+), 12 deletions(-) diff --git a/contrib/xml2/expected/xml2.out b/contrib/xml2/expected/xml2.out index 3d97b14..157d584 100644 --- a/contrib/xml2/expected/xml2.out +++ b/contrib/xml2/expected/xml2.out @@ -261,3 +261,20 @@ $$ $$); ERROR: failed to apply stylesheet +DETAIL: runtime error: file SQL line 7 element output +File write for 0wn3d.txt refused +runtime error: file SQL line 7 element output +xsltDocumentElem: write rights for 0wn3d.txt denied +-- detecting missing stylesheet parameter +SELECT xslt_process('', +$$http://www.w3.org/1999/XSL/Transform";> + + + +$$)::xml; +ERROR: failed to apply stylesheet +DETAIL: runtime error: file SQL line 3 element value-of +Variable 'n1' has not been declared. +Undefined variable +runtime error: file SQL line 3 element value-of +XPath evaluation returned no result. 
diff --git a/contrib/xml2/sql/xml2.sql b/contrib/xml2/sql/xml2.sql index ef99d16..9d42ac8 100644 --- a/contrib/xml2/sql/xml2.sql +++ b/contrib/xml2/sql/xml2.sql @@ -153,3 +153,11 @@ $$ $$); + +-- detecting missing stylesheet parameter +SELECT xslt_process('', +$$http://www.w3.org/1999/XSL/Transform";> + + + +$$)::xml; diff --git a/contrib/xml2/xslt_proc.c b/contrib/xml2/xslt_proc.c index b720d89..f17cf9f 100644 --- a/contrib/xml2/xslt_proc.c +++ b/contrib/xml2/xslt_proc.c @@ -61,6 +61,10 @@ xslt_process(PG_FUNCTION_ARGS) xmlChar*resstr = NULL; int reslen = 0; + /* the previous libxslt error context */ + xmlGenericErrorFunc saved_errfunc; + void *saved_errcxt; + if (fcinfo->nargs == 3) { paramstr = PG_GETARG_TEXT_PP(2); @@ -74,35 +78,46 @@ xslt_process(PG_FUNCTION_ARGS) } /* Setup parser */ - xmlerrcxt = pgxml_parser_init(PG_XML_STRICTNESS_LEGACY); + xmlerrcxt = pgxml_parser_init(PG_XML_STRICTNESS_ALL); + + /* + * Save the previous libxslt error context. + */ + saved_errfunc = xsltGenericError; + saved_errcxt = xsltGenericErrorContext; + xsltSetGenericErrorFunc(xmlerrcxt, xml_generic_error_handler); PG_TRY(); { xmlDocPtr ssdoc; bool xslt_sec_prefs_error; - /* Parse document */ + /* + * Parse document. + * It's important to set an "URL", so libxslt includes + * line numbers in error messages (cf. xsltPrintErrorContext()). + */ doctree = xmlReadMemory((char *) VARDATA_ANY(doct), -VARSIZE_ANY_EXHDR(doct), NULL, NULL, +VARSIZE_ANY_EXHDR(doct), "SQL", NULL, XML_PARSE_NOENT); - if (doctree == NULL) + if (doctree == NULL || pg_xml_error_occurred(xmlerrcxt)) xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_XML_DOCUMENT, "error parsing XML document"); /* Same for stylesheet */ ssdoc = xmlReadMem
Re: Large expressions in indexes can't be stored (non-TOASTable)
On Tue, Apr 22, 2025 at 05:17:17PM +0530, Nisha Moond wrote: > Thanks for the patch! I tested it for functionality and here are a few > comments: Thank you for reviewing. > 1) While testing, I noticed an unexpected behavior with the new 512 > byte length restriction in place. Since there's no explicit > restriction on the column roname's type, it's possible to insert an > origin name exceeding 512 bytes. For instance, one can use the COPY > command -- similar to how pg_dump might dump and restore catalog > tables. > > For example: > postgres=# COPY pg_catalog.pg_replication_origin (roident, roname) FROM > stdin; > Enter data to be copied followed by a newline. > End with a backslash and a period on a line by itself, or an EOF signal. > >> 44 regress_... [>512 bytes string] > >> \. > COPY 1 > > Once inserted, I was able to use the pg_replication_origin_xxx > functions with this origin name without any errors. > Not sure how common pg_dump/restore of catalog tables is in real > systems, but should we still consider if this behavior is acceptable? I'm personally not too worried about this. The limit is primarily there to provide a nicer ERROR than "row is too big", which AFAICT is the worst-case scenario if you go to the trouble of setting allow_system_table_mods and modifying shared catalogs directly. > 5) As Euler pointed out, logical replication already has a built-in > restriction for internally assigned origin names, limiting them to > NAMEDATALEN. Given that, only the SQL function > pg_replication_origin_create() is capable of creating longer origin > names. Would it make sense to consider moving the check into > pg_replication_origin_create() itself, since it seems redundant for > all other callers? > That said, if there's a possibility of future callers needing this > restriction at a lower level, it may be reasonable to leave it as is. pg_replication_origin_create() might be a better place. After all, that's where we're enforcing the name doesn't start with "pg_" and, for testing, _does_ start with "regress_". Plus, it seems highly unlikely that any other callers of replorigin_create() will ever provide names longer than 512 bytes. -- nathan
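To make the proposed placement concrete, here is a hypothetical sketch of how the limit would surface if the check lives in pg_replication_origin_create(), as suggested above; the 512-byte figure comes from the thread, while the exact behaviour and wording of the rejection are assumptions:

    SELECT pg_replication_origin_create('regress_' || repeat('x', 600));
    -- name exceeds 512 bytes: would be rejected with a clean ERROR under the patch
    SELECT pg_replication_origin_create('regress_origin_1');
    -- well under the limit: accepted as before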
Re: long-standing data loss bug in initial sync of logical replication
On Tue, Mar 18, 2025 at 2:55 AM Amit Kapila wrote: > > On Mon, Mar 17, 2025 at 4:56 PM Hayato Kuroda (Fujitsu) > wrote: > > > > > Regarding the PG13, it cannot be > > > applied > > > as-is thus some adjustments are needed. I will share it in upcoming posts. > > > > Here is a patch set for PG13. Apart from PG14-17, the patch could be > > created as-is, > > because... > > > > 1. WAL record for invalidation messages (XLOG_XACT_INVALIDATIONS) does not > > exist. > > 2. Thus the ReorderBufferChange for the invalidation does not exist. > >Our patch tries to distribute it but cannot be done as-is. > > 3. Codes assumed that invalidation messages can be added only once. > > 4. The timing when invalidation messages are consumed is limited: > > a. COMMAND_ID change is poped, > > b. start of decoding a transaction, or > > c. end of decoding a transaction. > > > > Above means that invalidations cannot be executed while being decoded. > > I created two patch sets to resolve the data loss issue. 0001 has less code > > changes but could resolve a part of issue, 0002 has huge changes but > > provides a > > complete solution. > > > > 0001 - mostly same as patches for other versions. > > ReorderBufferAddInvalidations() > >was adjusted to allow being called several times. As I said above, > >0001 cannot execute inval messages while decoding the transacitons. > > 0002 - introduces new ReorderBufferChange type to indicate inval messages. > >It would be handled like PG14+. > > > > Here is an example. Assuming that the table foo exists on both nodes, a > > publication "pub" which publishes all tables, and a subscription "sub" which > > subscribes "pub". What if the workload is executed? > > > > ``` > > S1 S2 > > BEGIN; > > INSERT INTO foo VALUES (1) > > ALTER PUBLICATION pub RENAME TO pub_renamed; > > INSERT INTO foo VALUES (2) > > COMMIT; > > LR -> ? > > ``` > > > > With 0001, tuples (1) and (2) would be replicated to the subscriber. > > An error "publication "pub" does not exist" would raise when new changes > > are done > > later. > > > > 0001+0002 works more aggressively; the error would raise when S1 > > transaction is decoded. > > The behavior is same as for patched PG14-PG17. > > > > I understand that with 0001 the fix is partial in the sense that > because invalidations are processed at the transaction end, the > changes of concurrent DDL will only be reflected for the next > transaction. Now, on one hand, it is prudent to not add a new type of > ReorderBufferChange in the backbranch (v13) but the change is not that > invasive, so we can go with it as well. My preference would be to go > with just 0001 for v13 to minimize the risk of adding new bugs or > breaking something unintentionally. > > Sawada-San, and others involved here, do you have any suggestions on > this matter? Sorry for the late response. I agree with just 0001 for v13 as 0002 seems invasive. Given that v13 would have only a few releases until EOL and 0001 can deal with some cases in question, I'd like to avoid such invasive changes in v13. It would not be advisable to change the ReorderBufferChange format in minor release even though it would not change the struct size. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Fortify float4 and float8 regression tests by ordering test results
Hi, Tom! On Tue, 22 Apr 2025 at 21:13, Tom Lane wrote: > > Pavel Borisov writes: > > On Tue, 22 Apr 2025 at 20:22, Alexander Korotkov > > wrote: > >> On Tue, Apr 22, 2025 at 7:20 PM Tom Lane wrote: > >>> Alexander Korotkov writes: > I'd like to add that float4.out not only assumes that insert-ordering is > preserved (this could be more-or-less portable between table AMs). It > also > assumes the way UPDATE moves updated rows. That seems quite > heap-specific. You can see in the following fragment, updated rows jump > to > the bottom. > > >>> I'd be willing to consider a policy that we don't want to depend on > >>> exactly where UPDATE moves rows to. The proposed patch is not that, > >>> however. > > >> OK, that makes sense for me. > > > Thanks for this input! > > This was my first intention to fix only the test that was affected by > > UPDATE-order specifics, broke when runnung on an extension AM. > > Maybe I was too liberal and added ORDER BY's more than needed. > > I definitely agree to the proposal. > > On further reflection, this policy makes plenty of sense because even > heapam is not all that stable about row order after UPDATE. With > tables large enough to draw the attention of autovacuum, we've had > to use ORDER BY or other techniques to stabilize the results, because > the timing of background autovacuums affected where rows got put. > These tests are different only because the tables are small enough > and the updates few enough that autovacuum doesn't get involved. > I think it's reasonable that at a project-policy level we should > not consider that an excuse. For example, a change in autovacuum's > rules could probably break these tests even with heapam. > > On the other hand, as I mentioned earlier, a table AM that doesn't > reproduce original insertion order would break a whole lot more > tests than these. So that's a bridge I'd rather not cross, > at least not without a lot of consideration. > > BTW, you forgot to update expected/float4-misrounded-input.out. Added in v3. Thanks for a mention! Regards, Pavel Borisov v3-0001-Fortify-float4-and-float8-regression-tests-agains.patch Description: Binary data
Re: What's our minimum supported Python version?
Jacob Champion writes: > As for picking a version... 3.6 will have been EOL for almost three > years by the time 18 releases. It seems like we would drop it happily, > were it not for RHEL8. Agreed, but RHEL8 is out there and I don't think we can just drop support for it. I'm also not excited by the idea that an incidental test script gets to dictate what the cutoff is. I do think we should stop claiming that Python 3.2 will work. (Maybe it does, but we don't know that.) I see that the configure script only checks for Python >= 3, and it doesn't look like the meson scripts check explicitly at all, although there's a comment saying that our meson version cutoff is intended to allow working with Python 3.5. Maybe it's sufficient to make a documentation change here, and say we support Python >= 3.5? I'd be okay with saying 3.6.8 too, on the grounds that if anything older fails to work we'd almost certainly just say "too bad". But RHEL8 is widespread enough that I think we need to keep making the effort for 3.6.8. regards, tom lane
Re: Fortify float4 and float8 regression tests by ordering test results
On Tue, Apr 22, 2025 at 5:57 PM Pavel Borisov wrote: > On Tue, 22 Apr 2025 at 18:40, Pavel Borisov wrote: > > On Tue, 22 Apr 2025 at 18:23, Tom Lane wrote: > > > > > > Pavel Borisov writes: > > > > It's not a big problem, but propose a simple fix for the tests. It > > > > just adds ORDER BY 1 to all relevant float4 and floa8 queries. > > > > > > You do realize that this problem is hardly confined to the > > > float4 and float8 tests? To a first approximation, a table AM > > > that fails to preserve insertion order would break every single > > > one of our regression tests. (I speak from experience, as > > > Salesforce had to deal with this when I was there...) > > > > > > The reason why we don't simply add ORDER BY to everything is > > > explained at [1]: > > > > > > You might wonder why we don't order all the regression test > > > queries explicitly to get rid of this issue once and for all. The > > > reason is that that would make the regression tests less useful, > > > not more, since they'd tend to exercise query plan types that > > > produce ordered results to the exclusion of those that don't. > > > > > > At some point we will probably have to think through what we > > > want to do about running the regression tests with table AMs that > > > don't wish to preserve ordering as much as heapam does. It's going > > > to need careful balancing of multiple concerns, and I don't think > > > "blindly slap ORDER BY everywhere" is going to be an acceptable > > > answer. > > > > > > > I agree we might need to elaborate different AM support in regression tests. > > Also, I agree that these are not the only tests to be fixed. > > > > What I hardly agree with, is that we suppose it's right for regression > > tests to require some fixed results ordering for so simple cases. > > Maybe for more complicated plans it's useful, but for the float tests > > mentioned in the patch it's this requirement is a total loss IMO. > > > > I understand your sentiment against blindly slapping order by's, but I > > don't see a useful alternative for this time. Also a large number of > > tests in PG were already fortified with ORDER BY 1. > > I forgot to mention that it's not only a question of INSERTs ordering. > Extension AM can have some internal ordering e.g. index-organized > tables. And besides SELECT query results following internal AM > ordering just being allowed, more importantly they are good > performance-wise and justify table AM extensibility. +1, I'd like to add that float4.out not only assumes that insert-ordering is preserved (this could be more-or-less portable between table AMs). It also assumes the way UPDATE moves updated rows. That seems quite heap-specific. You can see in the following fragment, updated rows jump to the bottom. -- test the unary float4abs operator SELECT f.f1, @f.f1 AS abs_f1 FROM FLOAT4_TBL f; f1 |abs_f1 ---+--- 0 | 0 1004.3 |1004.3 -34.84 | 34.84 1.2345679e+20 | 1.2345679e+20 1.2345679e-20 | 1.2345679e-20 (5 rows) UPDATE FLOAT4_TBL SET f1 = FLOAT4_TBL.f1 * '-1' WHERE FLOAT4_TBL.f1 > '0.0'; SELECT * FROM FLOAT4_TBL; f1 0 -34.84 -1004.3 -1.2345679e+20 -1.2345679e-20 (5 rows) -- Regards, Alexander Korotkov Supabase
Re: What's our minimum supported Python version?
Jacob Champion writes: > On Tue, Apr 22, 2025 at 9:04 AM Tom Lane wrote: >> But RHEL8 is widespread >> enough that I think we need to keep making the effort for 3.6.8. > Even if we can get side-by-side versions working? That would be off-loading our problem onto the users (and the buildfarm owners). If we really had little other choice, we could go there; but it doesn't sound like this is that hard to work around. regards, tom lane
Re: Fortify float4 and float8 regression tests by ordering test results
Hi! On Tue, 22 Apr 2025 at 20:22, Alexander Korotkov wrote: > > On Tue, Apr 22, 2025 at 7:20 PM Tom Lane wrote: > > Alexander Korotkov writes: > > > I'd like to add that float4.out not only assumes that insert-ordering is > > > preserved (this could be more-or-less portable between table AMs). It > > > also > > > assumes the way UPDATE moves updated rows. That seems quite > > > heap-specific. You can see in the following fragment, updated rows jump > > > to > > > the bottom. > > > > I'd be willing to consider a policy that we don't want to depend on > > exactly where UPDATE moves rows to. The proposed patch is not that, > > however. > > OK, that makes sense for me. Thanks for this input! My first intention was to fix only the test affected by the UPDATE-order specifics, which broke when running on an extension AM. Maybe I was too liberal and added more ORDER BYs than needed. I definitely agree with the proposal. Please find attached v2, which fixes only the UPDATE-specific part of the float4/float8 tests. v2-0001-Fortify-float4-and-float8-regression-tests-agains.patch Description: Binary data
Fortify float4 and float8 regression tests by ordering test results
Hi, hackers! I noticed that SELECT results in the float4 and float8 tests lack ORDER BY clauses. This makes the test results depend on the current heap/MVCC implementation. If I try to run the float8 test on a table created with a different access method provided by an extension, I get results ordered differently. It's not a big problem, but I propose a simple fix for the tests. It just adds ORDER BY 1 to all relevant float4 and float8 queries. I don't have a strong opinion about backpatching this, but as the patch changes only regression tests, it may also be worth backpatching. Regards, Pavel Borisov, Supabase. v1-0001-Fortify-float4-and-float8-tests-by-ordering-test-.patch Description: Binary data
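For illustration, this is the kind of change involved, shown on a query that appears in float4.sql (the exact set of affected queries is in the attached patch):

    -- before: the order of the result rows depends on the table AM's physical row order
    SELECT f.f1, @f.f1 AS abs_f1 FROM FLOAT4_TBL f;

    -- after: the output order is pinned explicitly
    SELECT f.f1, @f.f1 AS abs_f1 FROM FLOAT4_TBL f ORDER BY 1;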
Re: Fortify float4 and float8 regression tests by ordering test results
Hi, Tom! On Tue, 22 Apr 2025 at 18:23, Tom Lane wrote: > > Pavel Borisov writes: > > It's not a big problem, but propose a simple fix for the tests. It > > just adds ORDER BY 1 to all relevant float4 and floa8 queries. > > You do realize that this problem is hardly confined to the > float4 and float8 tests? To a first approximation, a table AM > that fails to preserve insertion order would break every single > one of our regression tests. (I speak from experience, as > Salesforce had to deal with this when I was there...) > > The reason why we don't simply add ORDER BY to everything is > explained at [1]: > > You might wonder why we don't order all the regression test > queries explicitly to get rid of this issue once and for all. The > reason is that that would make the regression tests less useful, > not more, since they'd tend to exercise query plan types that > produce ordered results to the exclusion of those that don't. > > At some point we will probably have to think through what we > want to do about running the regression tests with table AMs that > don't wish to preserve ordering as much as heapam does. It's going > to need careful balancing of multiple concerns, and I don't think > "blindly slap ORDER BY everywhere" is going to be an acceptable > answer. > I agree that we might need to elaborate support for different AMs in the regression tests. Also, I agree that these are not the only tests to be fixed. What I find hard to agree with is the idea that it's right for regression tests to require a fixed result ordering for such simple cases. Maybe for more complicated plans it's useful, but for the float tests mentioned in the patch this requirement is a total loss IMO. I understand your sentiment against blindly slapping ORDER BYs everywhere, but I don't see a useful alternative at this time. Also, a large number of tests in PG have already been fortified with ORDER BY 1. Regards, Pavel Borisov Supabase
Re: [PATCH] contrib/xml2: xslt_process() should report XSLT-related error details
"Robin Haberkorn" writes: > I forgot the patch of course... Hi Robin, thank you for the patch! This has arrived too late to be considered for PG v18, but please add the thread to the open commitfest https://commitfest.postgresql.org/53/ so that we don't forget about it for v19, and so that it will receive automated testing from our cfbot. Just from a quick eyeballing of the patch, it seems generally sane, but I wonder why you have xml_generic_error_handler creating and then destroying an errorBuf instead of just appending the results directly to xmlerrcxt->err_buf. The extra buffer doesn't seem to be doing much except increasing our odds of an OOM failure right there. I'm also a little suspicious of - xmlerrcxt = pgxml_parser_init(PG_XML_STRICTNESS_LEGACY); + xmlerrcxt = pgxml_parser_init(PG_XML_STRICTNESS_ALL); as that looks like it is probably changing the behavior of the module in ways bigger than just providing more error details. This code has been in legacy status for so long that I feel a little hesitant about doing that. If we are willing to do it, should we go further and clean out the use of PG_XML_STRICTNESS_LEGACY in xml2/xpath.c as well? regards, tom lane
Re: What's our minimum supported Python version?
On Tue, Apr 22, 2025 at 8:05 AM Tom Lane wrote: > The first problem is that this Python version seems not to > like assignments embedded in if statements: [...] > > I was able to work around that with the attached quick hack. > But then I get [...] Thanks, I'll put a patch together. Sorry -- IIRC, both of those features are probably too new for me to have used, regardless of what we decide is the minimum version for 18. As for picking a version... 3.6 will have been EOL for almost three years by the time 18 releases. It seems like we would drop it happily, were it not for RHEL8. But RHEL also supports side-by-side Python, right? In general, I'd be more than willing to try plumbing that through the new tests, if it lets us drop end-of-life versions for new code. (I don't know if we'll need to drop 3.6 for PG18, specifically, but I also think we shouldn't plan to support 3.6 for the full RHEL8 lifetime.) Thanks, --Jacob
Re: Fortify float4 and float8 regression tests by ordering test results
On Tue, Apr 22, 2025 at 7:20 PM Tom Lane wrote: > Alexander Korotkov writes: > > I'd like to add that float4.out not only assumes that insert-ordering is > > preserved (this could be more-or-less portable between table AMs). It also > > assumes the way UPDATE moves updated rows. That seems quite > > heap-specific. You can see in the following fragment, updated rows jump to > > the bottom. > > I'd be willing to consider a policy that we don't want to depend on > exactly where UPDATE moves rows to. The proposed patch is not that, > however. OK, that makes sense for me. -- Regards, Alexander Korotkov Supabase
Re: ZStandard (with dictionaries) compression support for TOAST compression
Hi, On 2025-04-18 12:22:18 -0400, Robert Haas wrote: > On Tue, Apr 15, 2025 at 2:13 PM Nikhil Kumar Veldanda > wrote: > > Addressing Compressed Datum Leaks problem (via CTAS, INSERT INTO ... SELECT > > ...) > > > > As compressed datums can be copied to other unrelated tables via CTAS, > > INSERT INTO ... SELECT, or CREATE TABLE ... EXECUTE, I’ve introduced a > > method inheritZstdDictionaryDependencies. This method is invoked at > > the end of such statements and ensures that any dictionary > > dependencies from source tables are copied to the destination table. > > We determine the set of source tables using the relationOids field in > > PlannedStmt. > > With the disclaimer that I haven't opened the patch or thought > terribly deeply about this issue, at least not yet, my fairly strong > suspicion is that this design is not going to work out, for multiple > reasons. In no particular order: > > 1. I don't think users will like it if dependencies on a zstd > dictionary spread like kudzu across all of their tables. I don't think > they'd like it even if it were 100% accurate, but presumably this is > going to add dependencies any time there MIGHT be a real dependency > rather than only when there actually is one. > > 2. Inserting into a table or updating it only takes RowExclusiveLock, > which is not even self-exclusive. I doubt that it's possible to change > system catalogs in a concurrency-safe way with such a weak lock. For > instance, if two sessions tried to do the same thing in concurrent > transactions, they could both try to add the same dependency at the > same time. > > 3. I'm not sure that CTAS, INSERT INTO...SELECT, and CREATE > TABLE...EXECUTE are the only ways that datums can creep from one table > into another. For example, what if I create a plpgsql function that > gets a value from one table and stores it in a variable, and then use > that variable to drive an INSERT into another table? I seem to recall > there are complex cases involving records and range types and arrays, > too, where the compressed object gets wrapped inside of another > object; though maybe that wouldn't matter to your implementation if > INSERT INTO ... SELECT uses a sufficiently aggressive strategy for > adding dependencies. +1 to all of these. > I think we could add plain-old zstd compression without really > tackling this issue +1 > I'm now also curious to know whether Andres would agree that it's bad > if zstd dictionaries are un-droppable. After all, I thought it would > be bad if there was no way to eliminate a dependency on a compression > method, and he disagreed. I still am not too worried about that aspect. However: > So maybe he would also think undroppable dictionaries are fine. I'm much less sanguine about this. Imagine a schema based multi-tenancy setup, where tenants come and go, and where a few of the tables use custom dictionaries. Whereas not being able to get rid of lz4 at all has basically no cost whatsoever, collecting more and more unusable dictionaries can imply a fair amount of space usage after a while. I don't see any argument why that would be ok, really. > But maybe not. It seems even worse to me than undroppable compression > methods, because you'll probably not have that many compression methods > ever, but you could have a large number of dictionaries eventually. Agreed on the latter. Greetings, Andres Freund
Re: index prefetching
On Tue, Apr 22, 2025 at 6:46 AM Tomas Vondra wrote: > here's an improved (rebased + updated) version of the patch series, with > some significant fixes and changes. The patch adds infrastructure and > modifies btree indexes to do prefetching - and AFAIK it passes all tests > (no results, correct results). Cool! > Compared to the last patch version [1] shared on list (in November), > there's a number of significant design changes - a lot of this is based > on a number of off-list discussions I had with Peter Geoghegan, which > was very helpful. Thanks for being so receptive to my feedback. I know that I wasn't particularly clear. I mostly only gave you my hand-wavy, caveat-laden ideas about how best to layer things. But you were willing to give them full and fair consideration. > 1) patch now relies on read_stream > So I squashed these two parts, and the patch now does read_stream (for > the table reads) from the beginning. Make sense. > Based on the discussions with Peter I decided to make this a bit more > ambitious, moving the whole batch management from the index AM to the > indexam.c level. So now there are two callbacks - amgetbatch and > amfreebatch, and it's up to indexam.c to manage the batches - decide how > many batches to allow, etc. The index AM is responsible merely for > loading the next batch, but does not decide when to load or free a > batch, how many to keep in memory, etc. > > There's a section in indexam.c with a more detailed description of the > design, I'm not going to explain all the design details here. To me, the really important point about this high-level design is that it provides a great deal of flexibility around reordering work, while still preserving the appearance of an index scan that performs work in the same old fixed order. All relevant kinds of work (whether table AM and index AM related work) are under the direct control of one single module. There's one central place for a mechanism that weighs both costs and benefits, keeping things in balance. (I realize that there's still some sense in which that isn't true, partly due to the read stream interface, but for now the important thing is that we're agreed on this high level direction.) > I think the indexam.c is a sensible layer for this. I was hoping doing > this at the "executor level" would mean no need for AM code changes, but > that turned out not possible - the AM clearly needs to know about the > batch boundaries, so that it can e.g. do killtuples, etc. That's why we > need the two callbacks (not just the "amgetbatch" one). At least this > way it's "hidden" by the indexam.c API, like index_getnext_slot(). Right. But (if I'm not mistaken) the index AM doesn't actually need to know *when* to do killtuples. It still needs to have some handling for this, since we're actually modifying index pages, and we need to have handling for certain special cases (e.g., posting list tuples) on the scan side. But it can be made to work in a way that isn't rigidly tied to the progress of the scan -- it's perfectly fine to do this work somewhat out of order, if that happens to make sense. It doesn't have to happen in perfect lockstep with the scan, right after the items from the relevant leaf page have all been returned. It should also eventually be possible to do things like perform killtuples in a different process (perhaps even thread?) to the one that originally read the corresponding leaf page items. That's the kind of long term goal to keep in mind, I feel. 
> (You could argue indexam.c is "executor" and maybe it is - I don't know > where exactly to draw the line. I don't think it matters, really. The > "hidden in indexam API" is the important bit.) The term that I've used is "index scan manager", since it subsumes some of the responsibilities related to scheduling work that has traditionally been under the control of index AMs. I'm not attached to that name, but we should agree upon some name for this new concept. It is a new layer, above the index AM but below the executor proper, and so it feels like it needs to be clearly distinguished from the two adjoining layers. > Or maybe we should > even rip out the amgettuple() entirely, and only support one of those > for each AM? That's what Peter suggested, but I'm not convinced we > should do that. Just to be clear, for other people reading along: I never said that we should fully remove amgettuple as an interface. What I said was that I think that we should remove btgettuple(), and any other amgettuple routine within index AMs that switch over to using the new interface. I'm not religious about removing amgettuple() from index AMs that also support the new batch interface. It's probably useful to keep around for now, for debugging purposes. My point was only this: I know of no good reason to keep around btgettuple in the first committed version of the patch. So if you're going to keep it around, you should surely have at least one explicit reason for doing s
Re: Fortify float4 and float8 regression tests by ordering test results
Alexander Korotkov writes: > I'd like to add that float4.out not only assumes that insert-ordering is > preserved (this could be more-or-less portable between table AMs). It also > assumes the way UPDATE moves updated rows. That seems quite > heap-specific. You can see in the following fragment, updated rows jump to > the bottom. I'd be willing to consider a policy that we don't want to depend on exactly where UPDATE moves rows to. The proposed patch is not that, however. regards, tom lane
Re: What's our minimum supported Python version?
On Tue, Apr 22, 2025 at 9:04 AM Tom Lane wrote: > I'm also not excited by the idea that an incidental > test script gets to dictate what the cutoff is. I agree, and I don't intend for the script to dictate that. > Maybe it's sufficient to make a documentation change here, and > say we support Python >= 3.5? I'd be okay with saying 3.6.8 > too, on the grounds that if anything older fails to work we'd > almost certainly just say "too bad". I'm definitely on board with pulling the minimum up, as high as we think we can get away with. If you're comfortable with 3.6(.8), I'd say let's start there. > But RHEL8 is widespread > enough that I think we need to keep making the effort for 3.6.8. Even if we can get side-by-side versions working? RHEL8 has a bunch of newer Python 3 versions available, according to the docs. And virtualenvs can do a lot of lifting for us in practice. Thanks, --Jacob
Re: High CPU consumption in cascade replication with large number of walsenders and ConditionVariable broadcast issues
After playing with the patch a little more, we’ve come to the conclusion, that the idea to signal CV broadcast from the timer handler is a risky one, as it creates a large number of execution paths, which are not present in current code base. It's hard to prove correctness of application behavior in each such case, so we've decided to use a different approach. In the new version of the patch we use timer handler only to set the flag, which is then checked in the ProcessStartupProcInterrupts function. This allow us to send signal on timeout if the startup process is waiting for the arrival of new WAL records (in ReadRecord), but the notification may be delayed while record is being applied (during redo handler invocation from ApplyWalRecord). This could increase delay for some corner cases with non-trivial WAL records like ‘drop database’, but this should be a rare case and walsender process have its own limit on the wait time, so the delay won’t be indefinite even in this case. A new variant of the patch is attached in case anybody else wants to play with this patch and approach. A slightly improved test case and formatted CV patch (which is not strictly required anymore for this case) are attached as well. Thanks, Alexey From b6749b210cceabcc0c382935ec412182b047fbb4 Mon Sep 17 00:00:00 2001 From: Alexey Makhmutov Date: Fri, 28 Mar 2025 16:16:36 +0300 Subject: [PATCH 1/2] Ensure that content of the wait list on CV is aligned with cv_sleep_target value Wait on the ConditionVariable could be interleaved with interruption handlers, such as timers. If such handler uses CV calls (i.e. ConditionVariableBroadcast), then value of the cv_sleep_target could be null or point to a different CV after wakeup. In this case we should not try to add ourselves to the wakeup wait list, as subsequent call to ConditionVariableCancelSleep won't be able to find our CV to clear its list. We should instead return control to the client, so the exit condition for the CV external wait loop could be checked. If wait is still required, then next invocation of ConditionVariableTimedSleep will perform the required setup procedures for CV. Behavior before this change results in the mismatch between cv_sleep_target value and content of the wakeup list, so the next invocation of ConditionVariableBroadcast method may fail due to presence of current process in the list. --- src/backend/storage/lmgr/condition_variable.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/src/backend/storage/lmgr/condition_variable.c b/src/backend/storage/lmgr/condition_variable.c index 228303e77f7..ab4ad7e5641 100644 --- a/src/backend/storage/lmgr/condition_variable.c +++ b/src/backend/storage/lmgr/condition_variable.c @@ -165,6 +165,17 @@ ConditionVariableTimedSleep(ConditionVariable *cv, long timeout, /* Reset latch before examining the state of the wait list. */ ResetLatch(MyLatch); + /* + * First, check that we are still expected to wait on this variable. + * If sleep target has changed, then we need to return spuriously to + * allow caller to check the exit condition and invoke the wait + * function again to properly prepare for the next sleep. Note, that + * we don't need to change wait list in this case, as we should be + * deleted from the list in case of cv_sleep_target change. 
+ */ + if (cv != cv_sleep_target) + return false; + /* * If this process has been taken out of the wait list, then we know * that it has been signaled by ConditionVariableSignal (or -- 2.49.0 From 6eaeb7a7fbb40fc3a8fb04a2ea22df9a54747a0a Mon Sep 17 00:00:00 2001 From: Rustam Khamidullin Date: Fri, 14 Mar 2025 18:18:34 +0700 Subject: [PATCH 2/2] Implement batching for WAL records notification during cascade replication Currently standby server notifies walsenders after applying of each WAL record during cascade replication. This creates a bottleneck in case of large number of sender processes during WalSndWakeup invocation. This change introduces batching for such notifications, which are now sent either after certain number of applied records or specified time interval (whichever comes first). Co-authored-by: Alexey Makhmutov --- src/backend/access/transam/xlogrecovery.c | 99 ++- src/backend/postmaster/startup.c | 5 ++ src/backend/utils/misc/guc_tables.c | 22 + src/include/access/xlogrecovery.h | 8 ++ src/include/utils/timeout.h | 1 + 5 files changed, 133 insertions(+), 2 deletions(-) diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c index 6ce979f2d8b..43e95602223 100644 --- a/src/backend/access/transam/xlogrecovery.c +++ b/src/backend/access/transam/xlogrecovery.c @@ -64,6 +64,7 @@ #include "utils/pg_lsn.h" #include "utils/ps_status.h" #include "utils/pg_rusage.h" +#include "utils/timeout.h" /* Unsupported old recovery command file names (relative to $PGDATA) */ #define
[PATCH] contrib/xml2: xslt_process() should report XSLT-related error details
Dear PG hackers, here is a patch that improves the xslt_process() error handling. Error handling was one of the points criticized earlier [1] and, as far as I understand, it is one of the reasons that xslt_process() was never adopted into core and contrib/xml2 has been considered deprecated since PG 8. However, the point is also still on the TODO list ("Report errors returned by the XSLT library") and xslt_process() has been asked for several times on the mailing list. Memory handling bugs that were reported earlier [2] have evidently long since been fixed. The point raised in [1] was that xslt_process() would not report missing stylesheet parameters at all; that is no longer the case at current HEAD. However, xslt_process() still does not report any details about errors related to XSLT processing. That is what this patch fixes: it now installs an XSLT error handler. See the patch comment for details. Since the goal is to eventually get xslt_process() adopted into core, I also got rid of PG_XML_STRICTNESS_LEGACY. Perhaps you can tell me what else is preventing adoption into core. I believe that xslt_process() should also accept the `xml` type as an alternative to strings. Strings should be kept for backwards compatibility, though. Also, the stylesheet parameter passing is very crude and limited. I suggest that the current method of passing them as strings (see parse_params()) be deprecated, and that we don't try to tackle its limitations but instead accept hstores. If you agree, I will work on these two tasks next. Yours sincerely, Robin Haberkorn [1]: https://www.postgresql.org/message-id/4C5ECEF9.3030806%40mlfowler.com [2]: https://www.postgresql.org/message-id/flat/4A6A276A.6090405%40dunslane.net -- Robin Haberkorn Senior Software Engineer B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / https://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt, HRB 3537
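As a hand-reconstructed illustration of the new behaviour (the XML in the regression-test excerpt quoted elsewhere in the thread was partly stripped in transit, so the stylesheet below is an assumption; the authoritative version is in the patch):

    SELECT xslt_process('<doc/>'::text,
    $$<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/"><xsl:value-of select="$n1"/></xsl:template>
    </xsl:stylesheet>$$::text);
    -- before the patch: only the generic "ERROR: failed to apply stylesheet"
    -- with the patch: the DETAIL additionally reports, with line numbers, that
    -- variable 'n1' has not been declared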
Re: Vacuum statistics
On 10/28/24 14:40, Alexander Korotkov wrote: On Sun, Aug 25, 2024 at 6:59 PM Alena Rybakina If I missed something or misunderstood, can you explain in more detail? Actually, I mean why do we need a possibility to return statistics for all tables/indexes in one function call? User anyway is supposed to use pg_stat_vacuum_indexes/pg_stat_vacuum_tables view, which do function calls one per relation. I suppose we can get rid of possibility to get all the objects in one function call and just return a tuple from the functions like other pgstatfuncs.c functions do. I suppose it was designed this way because databases may contain thousands of tables and indexes - remember, at least, partitions. But it may be okay to use the SRF_FIRSTCALL_INIT / SRF_RETURN_NEXT API. I think by registering a prosupport routine predicting cost and rows of these calls, we may let the planner build adequate plans for queries involving those stats - people will definitely join it with something else in the database. -- regards, Andrei Lepikhov
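To make the join scenario concrete, a purely hypothetical usage of the proposed view (the view name comes from this thread; the relid column and the join below are assumptions for illustration only):

    SELECT c.relname, v.*
    FROM pg_stat_vacuum_tables v
    JOIN pg_class c ON c.oid = v.relid;   -- v.relid is an assumed column name
    -- a prosupport routine that predicts the cost and row count of the underlying
    -- function would let the planner pick a sensible plan for joins like this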
Re: Fortify float4 and float8 regression tests by ordering test results
Pavel Borisov writes: > On Tue, 22 Apr 2025 at 21:13, Tom Lane wrote: >> BTW, you forgot to update expected/float4-misrounded-input.out. > Added in v3. Thanks for a mention! v3 LGTM, pushed. regards, tom lane
Re: Fortify float4 and float8 regression tests by ordering test results
On Tue, 22 Apr 2025 at 22:25, Tom Lane wrote: > > Pavel Borisov writes: > > On Tue, 22 Apr 2025 at 21:13, Tom Lane wrote: > >> BTW, you forgot to update expected/float4-misrounded-input.out. > > > Added in v3. Thanks for a mention! > > v3 LGTM, pushed. Thank you! Regards, Pavel
Re: index prefetching
On 4/22/25 18:26, Peter Geoghegan wrote: > On Tue, Apr 22, 2025 at 6:46 AM Tomas Vondra wrote: >> here's an improved (rebased + updated) version of the patch series, with >> some significant fixes and changes. The patch adds infrastructure and >> modifies btree indexes to do prefetching - and AFAIK it passes all tests >> (no results, correct results). > > Cool! > >> Compared to the last patch version [1] shared on list (in November), >> there's a number of significant design changes - a lot of this is based >> on a number of off-list discussions I had with Peter Geoghegan, which >> was very helpful. > > Thanks for being so receptive to my feedback. I know that I wasn't > particularly clear. I mostly only gave you my hand-wavy, caveat-laden > ideas about how best to layer things. But you were willing to give > them full and fair consideration. > >> 1) patch now relies on read_stream > >> So I squashed these two parts, and the patch now does read_stream (for >> the table reads) from the beginning. > > Make sense. > >> Based on the discussions with Peter I decided to make this a bit more >> ambitious, moving the whole batch management from the index AM to the >> indexam.c level. So now there are two callbacks - amgetbatch and >> amfreebatch, and it's up to indexam.c to manage the batches - decide how >> many batches to allow, etc. The index AM is responsible merely for >> loading the next batch, but does not decide when to load or free a >> batch, how many to keep in memory, etc. >> >> There's a section in indexam.c with a more detailed description of the >> design, I'm not going to explain all the design details here. > > To me, the really important point about this high-level design is that > it provides a great deal of flexibility around reordering work, while > still preserving the appearance of an index scan that performs work in > the same old fixed order. All relevant kinds of work (whether table AM > and index AM related work) are under the direct control of one single > module. There's one central place for a mechanism that weighs both > costs and benefits, keeping things in balance. > > (I realize that there's still some sense in which that isn't true, > partly due to the read stream interface, but for now the important > thing is that we're agreed on this high level direction.) > Yeah, that makes sense, although I've been thinking about this a bit differently. I haven't been trying to establish a new "component" to manage prefetching. For me the question was what's the right layer, so that unnecessary details don't leak into AM and/or executor. The AM could issue fadvise prefetches, or perhaps even feed blocks into a read_stream, but it doesn't seem like the right place to ever do more decisions. OTOH we don't want every place in the executor to reimplement the prefetching, and indexam.c seems like a good place in between. It requires exchanging some additional details with the AM, provided by the new callbacks. It seems the indexam.c achieves both your and mine goals, more or less. >> I think the indexam.c is a sensible layer for this. I was hoping doing >> this at the "executor level" would mean no need for AM code changes, but >> that turned out not possible - the AM clearly needs to know about the >> batch boundaries, so that it can e.g. do killtuples, etc. That's why we >> need the two callbacks (not just the "amgetbatch" one). At least this >> way it's "hidden" by the indexam.c API, like index_getnext_slot(). > > Right. 
But (if I'm not mistaken) the index AM doesn't actually need to > know *when* to do killtuples. It still needs to have some handling for > this, since we're actually modifying index pages, and we need to have > handling for certain special cases (e.g., posting list tuples) on the > scan side. But it can be made to work in a way that isn't rigidly tied > to the progress of the scan -- it's perfectly fine to do this work > somewhat out of order, if that happens to make sense. It doesn't have > to happen in perfect lockstep with the scan, right after the items > from the relevant leaf page have all been returned. > > It should also eventually be possible to do things like perform > killtuples in a different process (perhaps even thread?) to the one > that originally read the corresponding leaf page items. That's the > kind of long term goal to keep in mind, I feel. > Right. The amfreebatch() does not mean the batch needs to be freed immediately, it's just handed over back to the AM, and it's up to the AM to do the necessary cleanup at some point. It might queue it for later, or perhaps even do that in a separate thread ... >> (You could argue indexam.c is "executor" and maybe it is - I don't know >> where exactly to draw the line. I don't think it matters, really. The >> "hidden in indexam API" is the important bit.) > > The term that I've used is "index scan manager", since it subsumes > some of the responsibilities related to scheduling work that has >
Re: Proposal: Filter irrelevant change before reassemble transactions during logical decoding
Hi Ajin. Here are some v17-0003 review comments (aka some v16-0003 comments that were not yet addressed or rejected) On Fri, Apr 11, 2025 at 4:05 PM Peter Smith wrote: ... > == > Commit message > > 2. missing commit message Not yet addressed. > ~~~ > > 8. > # Insert, delete, and update tests for restricted publication tables > $log_location = -s $node_publisher->logfile; > $node_publisher->safe_psql('postgres', "INSERT INTO insert_only_table > VALUES (1, 'to be inserted')"); > $node_publisher->safe_psql('postgres', "UPDATE insert_only_table SET > data = 'updated' WHERE id = 1"); > $logfile = slurp_file($node_publisher->logfile, $log_location); > ok($logfile =~ qr/Filtering UPDATE/, > 'unpublished UPDATE is filtered'); > > $log_location = -s $node_publisher->logfile; > $node_publisher->safe_psql('postgres', "DELETE FROM insert_only_table > WHERE id = 1"); > $logfile = slurp_file($node_publisher->logfile, $log_location); > ok($logfile =~ qr/Filtering DELETE/, > 'unpublished DELETE is filtered'); > > $log_location = -s $node_publisher->logfile; > $node_publisher->safe_psql('postgres', "INSERT INTO delete_only_table > VALUES (1, 'to be deleted')"); > $logfile = slurp_file($node_publisher->logfile, $log_location); > ok($logfile =~ qr/Filtering INSERT/, > 'unpublished INSERT is filtered'); > > ~ ... > 8b. > Change the comment or rearrange the tests so they are in the same > order as described > The comment was changed, and now says "Insert, update, and delete tests ..." but still, the "Filtering INSERT" test is last. IMO, that test should come first to match the comment. > ~ > > 8c. > Looking at the expected logs I wondered if it might be nicer for the > pgoutput_filter_change doing this logging to also emit the relation > name. > Not yet addressed == Kind Regards, Peter Smith. Fujitsu Australia
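For readers following along, here is a sketch of the restricted publications that the table names in the test suggest; this is an assumption about the setup, and the actual DDL in the TAP test may differ:

    CREATE PUBLICATION pub_insert_only FOR TABLE insert_only_table
        WITH (publish = 'insert');
    CREATE PUBLICATION pub_delete_only FOR TABLE delete_only_table
        WITH (publish = 'delete');
    -- with these, an UPDATE or DELETE on insert_only_table is not published, which
    -- is what the 'Filtering UPDATE' / 'Filtering DELETE' log checks above verify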
Re: Recent pg_rewind test failures in buildfarm
On Tue, Apr 22, 2025 at 06:22:51AM +, Bertrand Drouvot wrote: > Yeah, unless that might come from fc415edf8ca but I don't think that's the > case. I've been eyeing this whole area of the code for a few hours to convince myself on HEAD, and I cannot really find a defect directly related to it. I am wondering if we should do a PGSTAT_BACKEND_FLUSH_WAL in the WAL sender, but the end of any transaction happening in a logical WAL sender would make sure that this happens on a periodic basis. Anyway, sorry for making everybody wait here. I've now removed the assertion down to v15. -- Michael signature.asc Description: PGP signature
Re: Correct documentation for protocol version
On Fri, 11 Apr 2025 at 11:02, Jelte Fennema-Nio wrote: > On Fri, 11 Apr 2025 at 22:57, Dave Cramer wrote: > > Well this isn't quite true since if you request 3.0 and have invalid > options it will return 3.0, which is not the highest supported minor > version. > > Probably good to update this section too then to be similarly correct > as your already updated section. Maybe also good to clarify further > that the version that the server responds with is the protocol version > that will be used during the following communication. > I've updated the wording to specify that the negotiateProtocol message will only be sent if the client requests a major version the server supports. Also added a note saying that this will be the protocol version that will be used for the duration of the connection. See attached. protocol-6.patch Description: Binary data
Re: Check for tuplestorestate nullness before dereferencing
Nikolay Shaplov writes: > In a message of Friday, 18 October 2024 02:28:22 MSK, David Rowley wrote: >> It would be good to know if the optimisation added in d2c555ee5 ever >> applies with today's code. If it does apply, we should likely add a >> test case for it and if it never does, then we should just remove the >> optimisation and always create the tuplestore when it's NULL. > That sounds reasonable. It looks like removing the "node->eflags != 0" > check > is more logical than adding a not-null check. I don't believe that the patch as-proposed is necessary or reasonable. The case of node->eflags == 0 is reachable; I don't know why David's test didn't see this, but when I try asserting the contrary I get a regression crash on (gdb) p debug_query_string $1 = 0x1d03160 "explain (costs off) declare c1 scroll cursor for select (select 42) as x;" The reason that the subsequent bit of code is safe is that !forward cannot possibly be true unless EXEC_FLAG_BACKWARD was given, which'd cause us to create a tuplestore. So if we were going to change anything, I'd propose adding something more like if (!forward && eof_tuplestore) { + Assert(node->eflags & EXEC_FLAG_BACKWARD); if (!node->eof_underlying) { /* or perhaps more directly, Assert that tuplestore is not null. But I don't like the proposed patch because if tuplestore is null here, something's wrong, and we should complain, not silently mask the problem. regards, tom lane
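Laid out readably, the fragment being proposed for nodeMaterial.c would look roughly like this; it is a sketch of the suggestion above, not a finished patch:

    if (!forward && eof_tuplestore)
    {
        /*
         * Backward fetch past tuplestore EOF is only legal if the caller
         * declared EXEC_FLAG_BACKWARD, which forces a tuplestore to exist.
         */
        Assert(node->eflags & EXEC_FLAG_BACKWARD);
        /* or, more directly: Assert(tuplestorestate != NULL); */

        if (!node->eof_underlying)
        {
            /* existing logic: back up over the previously-returned tuple */
        }
    }

Either assertion documents the invariant while still complaining (via assertion failure) if a NULL tuplestore ever does show up here, rather than silently masking the problem.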
Re: [BUG] temporary file usage report with extended protocol and unnamed portals
> So, It seems to me we are better off just setting debug_query_string > to NULL in DropPortal, or alternatively why not just log the statement > automatically at the start of execution whenever we have log_temp_files > 0. > That will allow us to safely capture the statement to blame for the > temp files and will cover all cases? of course we only know about the temp file usage after execution, so this means we will probably end up with unnecessary statement logging, so this may not be such a good idea after all. So this tells me not logging STATEMENT along with log_temp_files is a better alternative and a statement can be logged with other existing mechanisms like log_min_duration_statement. -- Sami Imseih Amazon Web Services (AWS)
Re: long-standing data loss bug in initial sync of logical replication
On Tue, Apr 22, 2025 at 10:57 PM Masahiko Sawada wrote: > > On Tue, Mar 18, 2025 at 2:55 AM Amit Kapila wrote: > > > > On Mon, Mar 17, 2025 at 4:56 PM Hayato Kuroda (Fujitsu) > > wrote: > > > > > > > Regarding the PG13, it cannot be > > > > applied > > > > as-is thus some adjustments are needed. I will share it in upcoming > > > > posts. > > > > > > Here is a patch set for PG13. Apart from PG14-17, the patch could be > > > created as-is, > > > because... > > > > > > 1. WAL record for invalidation messages (XLOG_XACT_INVALIDATIONS) does > > > not exist. > > > 2. Thus the ReorderBufferChange for the invalidation does not exist. > > >Our patch tries to distribute it but cannot be done as-is. > > > 3. Codes assumed that invalidation messages can be added only once. > > > 4. The timing when invalidation messages are consumed is limited: > > > a. COMMAND_ID change is poped, > > > b. start of decoding a transaction, or > > > c. end of decoding a transaction. > > > > > > Above means that invalidations cannot be executed while being decoded. > > > I created two patch sets to resolve the data loss issue. 0001 has less > > > code > > > changes but could resolve a part of issue, 0002 has huge changes but > > > provides a > > > complete solution. > > > > > > 0001 - mostly same as patches for other versions. > > > ReorderBufferAddInvalidations() > > >was adjusted to allow being called several times. As I said above, > > >0001 cannot execute inval messages while decoding the transacitons. > > > 0002 - introduces new ReorderBufferChange type to indicate inval messages. > > >It would be handled like PG14+. > > > > > > Here is an example. Assuming that the table foo exists on both nodes, a > > > publication "pub" which publishes all tables, and a subscription "sub" > > > which > > > subscribes "pub". What if the workload is executed? > > > > > > ``` > > > S1 S2 > > > BEGIN; > > > INSERT INTO foo VALUES (1) > > > ALTER PUBLICATION pub RENAME TO > > > pub_renamed; > > > INSERT INTO foo VALUES (2) > > > COMMIT; > > > LR -> ? > > > ``` > > > > > > With 0001, tuples (1) and (2) would be replicated to the subscriber. > > > An error "publication "pub" does not exist" would raise when new changes > > > are done > > > later. > > > > > > 0001+0002 works more aggressively; the error would raise when S1 > > > transaction is decoded. > > > The behavior is same as for patched PG14-PG17. > > > > > > > I understand that with 0001 the fix is partial in the sense that > > because invalidations are processed at the transaction end, the > > changes of concurrent DDL will only be reflected for the next > > transaction. Now, on one hand, it is prudent to not add a new type of > > ReorderBufferChange in the backbranch (v13) but the change is not that > > invasive, so we can go with it as well. My preference would be to go > > with just 0001 for v13 to minimize the risk of adding new bugs or > > breaking something unintentionally. > > > > Sawada-San, and others involved here, do you have any suggestions on > > this matter? > > Sorry for the late response. > > I agree with just 0001 for v13 as 0002 seems invasive. Given that v13 > would have only a few releases until EOL and 0001 can deal with some > cases in question, I'd like to avoid such invasive changes in v13. > Fair enough. OTOH, we can leave the 13 branch considering following: (a) it is near EOL, (b) bug happens in rare cases (when the DDLs like ALTER PUBLICATION ... ADD TABLE ... or ALTER TYPE ... 
that don't take a strong lock on table happens concurrently to DMLs on the tables involved in the DDL.), and (c) the complete fix is invasive, even partial fix is not simple. I have a slight fear that if we make any mistake in fixing it partially (of course, we can't see any today), we may not even get a chance to fix it. Now, if the above convinces you or someone else not to push the partial fix in 13, then fine; otherwise, I'll push the 0001 to 13 day after tomorrow. -- With Regards, Amit Kapila.
Re: Fix premature xmin advancement during fast forward decoding
On Tue, Apr 22, 2025 at 12:36 PM Zhijie Hou (Fujitsu) wrote: > > Hi, > > To fix this, I think we can allow the base snapshot to be built during fast > forward decoding, as implemented in the patch 0001 (We already built base > snapshot in fast-forward mode for logical message in logicalmsg_decode()). The idea and code looks okay to me and the performance impact is also not that huge. IIUC, fast_forward decoding mode is used only in two cases. 1) pg_replication_slot_advance and 2) in upgrade flow to check if there are any pending WAL changes which are not yet replicated. See 'binary_upgrade_logical_slot_has_caught_up'-->'LogicalReplicationSlotHasPendingWal'. It seems like this change will not have any negative impact in the upgrade flow as well (in terms of performance and anything else). Thoughts? > > Moreover, I conducted a basic test[2] to test the patch's impact, noting that > advancing the slot incurs roughly a 4% increase in processing time after > applying the patch, which appears to be acceptable. Additionally, the cost > associated with building the snapshot via SnapBuildBuildSnapshot() did not > show > up in the profile. Therefore, I think it's reasonable to refrain from > further optimization at this stage. I agree on this. thanks Shveta
Re: pgsql: Update guidance for running vacuumdb after pg_upgrade.
On Tue, Apr 22, 2025 at 09:43:56PM +0200, Christoph Berg wrote: > Re: Nathan Bossart >> Update guidance for running vacuumdb after pg_upgrade. >> >> Now that pg_upgrade can carry over most optimizer statistics, we >> should recommend using vacuumdb's new --missing-stats-only option >> to only analyze relations that are missing statistics. > > I've been looking at vacuumdb --missing-stats-only because Debian's > pg_upgradecluster is using that now. > > I am wondering if this is really good advice in the pg_upgrade > documentation. Sure it's nice that optimizer statistics are carried > over by pg_upgrade, but the pg_stat_user_tables statistics are not > carried over, and afaict these are the numbers that determine when the > next autovacuum or autoanalyze run is going to happen. By removing the > "please run vacuumdb on all tables" step from the pg_upgrade docs, we > are effectively telling everyone that they should be starting with > these numbers all 0, postponing the next run to some indeterminate > point. Running `vacuumdb --missing-stats-only` does not fix that > because it's skipping the tables. Is that the message we want to send? > > (If I am misinterpreting the situation the docs should still explain > why this is ok.) relation_needs_vacanalyze() uses dead_tuples, ins_since_vacuum, and mod_since_analyze. IIUC a full post-upgrade vacuumdb run would only set dead_tuples to a nonzero value, so the worst-case scenario is that it would take longer before a vacuum is triggered based on autovacuum_vacuum_{threshold,max_threshold,scale_factor}. To address this, I think we'd need to recommend using "vacuumdb --all --analyze-only" instead. We could alternatively suggest first running "vacuumdb --all --analyze-in-stages --missing-stats-only" (to fill in any missing stats) followed by "vacuumdb --all --analyze-only" (to update dead_tuples). However, I'm not sure how concerned to be about this. It does seem bad that it might take longer for tables to be vacuumed for the first time after upgrade, but I believe that's already the case for any type of unclean shutdown (e.g., immediate shutdown, server crash, starting from a base backup, point-in-time recovery). I see that we do recommend running ANALYZE after pg_stat_reset(), though. In any case, IMO it's unfortunate that we might end up recommending roughly the same post-upgrade steps as before even though the optimizer statistics are carried over. -- nathan
Re: pgsql: Update guidance for running vacuumdb after pg_upgrade.
Re: Nathan Bossart > In any case, IMO it's unfortunate > that we might end up recommending roughly the same post-upgrade steps as > before even though the optimizer statistics are carried over. Maybe the docs (and the pg_upgrade scripts) should recommend the old procedure by default until this gap is closed? People could then still opt to use the new procedure in specific cases. Christoph
Re: What's our minimum supported Python version?
tl;dr I think requiring support of Python 3.9 for PG18 would probably be reasonable and save us a bunch of maintenance burden over the next 5 years. On Tue, 22 Apr 2025 at 21:41, Tom Lane wrote: > Yeah, that. The distros that are still shipping older Pythons > are LTS distros where part of the deal is that the vendor > is supposed to back-patch security fixes, even if the upstream > has moved on. Maybe a particular vendor is falling down on that > job, but it's not for us to rate their performance. So I don't > buy that "but security!" is a good argument here. I totally agree that "security" is not the discussion that we should be having here. I think the Python version decision (or really any minimum version decision) should be based on some analysis of costs to us for maintaining support vs benefits to the users. I'm pretty sure most users of RHEL expect most modern software not to compile/work completely effortlessly on their distros without some effort on their part or on the part of RHEL packagers. That's kinda what you're signing up for if you choose a distro like that. It might very well be that supporting Python 3.6 does not require many code changes. But it's so old that installing it on most of the recent OSes is not trivial, e.g. uv does not support installing it[1]. By saying we support it, it means that all developers need to go through serious hoops to be able to test that their python scripts work correctly on their own machines. And given our own support policy, we'd probably have to do that for 5 more years (unless we want to change supported python versions in minor PG releases, which I highly doubt we want). So that leaves the question, how much are we actually helping users by spending all that effort on keeping support for old Python versions? Based on the Red Hat docs, it's pretty easy to install newer Python3 versions on RHEL8. Depending on the X in RHEL 8.X you can install different Python3 versions from the official repos[2]. And based on the official RHEL EOL policy[3], only RHEL 8.4, 8.6, 8.8 and 8.10 are still getting some level of support. With 8.4 and 8.6 only getting the final level of support called "Update Services for SAP Solutions", and with 8.8 reaching that same final support level before PG18 is released (and 8.4 becoming unsupported before PG18). Based on that information (only taking into account only RHEL8, not other distros). It seems like we could at least bump the minimum Python support to Python 3.9, which is the newest python that's installable in RHEL 8.4 and 8.6. Anyone that's running a RHEL version that's still supported by Red Hat can then simply do "yum install python39" and run the tests (assuming we set up meson/autoconf to correctly detect the "python3.9" binary). I'd even be curious how much resistance we'd get from RHEL users if we'd bump the Python requirement for PG18 to Python 3.12, which is the newest version that RHEL 8.10 supports. RHEL 8.10 is the only RHEL 8 version that is getting regular support from Red Hat. (Full disclosure: I've used close to EOL versions of RHEL7 professionally, and I hated every second it. I constantly felt like I was walking through a software museum. Having to work around bugs that had been fixed for ages in upstream) Looking a little further than RHEL: - Ubuntu 20.04 can easily install Python3.9 from the official package repos. 
Currently this is still supported by Cristoph his pgdg repos, but looks like that support might be removed before PG18 given it's transition to "Expanded Security Maintenance" next month. - Ubuntu 22.04 can easily install Python3.11 from the official package repos. Still has full upstream support for 2 years. - Debian Bullseye (EOL in 1.5 years) can only install Python3.9 from the official package repos. - Debian Bookworm (EOL in 3 years) has Pytho - Python 3.8 itself is declared EOL by upstream - Python 3.10 will be declared EOL by upstream around the release of PG19 Based on all this info, I think that it sounds quite reasonable to start requiring Python3.9 for PG18. And for PG19 we could probably start requiring Python3.11 [1]: https://github.com/astral-sh/uv/issues/9833 [2]: https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/8/html/configuring_basic_system_settings/installing-and-using-dynamic-programming-languages_configuring-basic-system-settings#assembly_installing-and-using-python_installing-and-using-dynamic-programming-languages [3]: https://access.redhat.com/support/policy/updates/errata#RHEL8_Planning_Guide P.S. I CC-ed Devrim and Christoph, for insights into this from the packagers perspective.
Re: [PoC] Federated Authn/z with OAUTHBEARER
Hi Jacob, thank you for the detailed explanation and links! Am I right that the classic OAuth flow "create user account based on a token" is implemented using custom validators? 1) In pg_hba.conf set user to all and "delegate_ident_mapping=1" "local all all oauth issuer=$issuer scope=$scope delegate_ident_mapping=1" 2) Write a custom validator that will "execute" in C `CREATE USER token.name WITH token.listofOptions` after verification of a token. On 25-04-21 19:57, Jacob Champion wrote: We have some options for dealing with them, since their documentation instructs clients to hardcode their API entry points instead of using discovery. (That makes it easy for us to figure out when we're talking to Google, and potentially switch to a quirks mode.) What do you mean by "discovery"? An OpenID link that returns the endpoints? Google has this link https://accounts.google.com/.well-known/openid-configuration OUTPUT: { "issuer": "https://accounts.google.com", "authorization_endpoint": "https://accounts.google.com/o/oauth2/v2/auth", "device_authorization_endpoint": "https://oauth2.googleapis.com/device/code", "token_endpoint": "https://oauth2.googleapis.com/token", "userinfo_endpoint": "https://openidconnect.googleapis.com/v1/userinfo", "revocation_endpoint": "https://oauth2.googleapis.com/revoke", "jwks_uri": "https://www.googleapis.com/oauth2/v3/certs", } It's described here: https://developers.google.com/identity/openid-connect/openid-connect But! Before we do that: How do you intend to authorize tokens issued by Google? Last I checked, they still had no way to register an application-specific scope, making it very dangerous IMO to use a public flow [2]. I've also thought, as Antonin did, about https://www.googleapis.com/oauth2/v3/userinfo for verification. As I understand from [2], the current problem is security: Google doesn't want to add new scopes. -- Best wishes, Ivan Kush Tantor Labs LLC
Re: pgsql: Update guidance for running vacuumdb after pg_upgrade.
On Tue, Apr 22, 2025 at 11:03:29PM +0200, Christoph Berg wrote: > Re: Nathan Bossart >> In any case, IMO it's unfortunate >> that we might end up recommending roughly the same post-upgrade steps as >> before even though the optimizer statistics are carried over. > > Maybe the docs (and the pg_upgrade scripts) should recommend the old > procedure by default until this gap is closed? People could then still > opt to use the new procedure in specific cases. I think we'd still want to modify the --analyze-in-stages recommendation (from what is currently recommended for supported versions). If we don't, you'll wipe out the optimizer stats you brought over from the old version. Here is a rough draft of what I am thinking. -- nathan diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml index df13365b287..648c6e2967c 100644 --- a/doc/src/sgml/ref/pgupgrade.sgml +++ b/doc/src/sgml/ref/pgupgrade.sgml @@ -833,17 +833,19 @@ psql --username=postgres --file=script.sql postgres Because not all statistics are not transferred by - pg_upgrade, you will be instructed to run a command to + pg_upgrade, you will be instructed to run commands to regenerate that information at the end of the upgrade. You might need to set connection parameters to match your new cluster. - Using vacuumdb --all --analyze-only --missing-stats-only - can efficiently generate such statistics. Alternatively, + First, use vacuumdb --all --analyze-in-stages --missing-stats-only - can be used to generate minimal statistics quickly. For either command, - the use of --jobs can speed it up. + to quickly generate minimal optimizer statistics for relations without + any. Then, use vacuumdb --all --analyze-only to ensure + all relations have updated cumulative statistics for triggering vacuum and + analyze. For both commands, the use of --jobs can speed + it up. If vacuum_cost_delay is set to a non-zero value, this can be overridden to speed up statistics generation using PGOPTIONS, e.g., PGOPTIONS='-c diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c index 18c2d652bb6..f1b90c5957e 100644 --- a/src/bin/pg_upgrade/check.c +++ b/src/bin/pg_upgrade/check.c @@ -814,9 +814,12 @@ output_completion_banner(char *deletion_script_file_name) } pg_log(PG_REPORT, - "Some optimizer statistics may not have been transferred by pg_upgrade.\n" + "Some statistics are not transferred by pg_upgrade.\n" "Once you start the new server, consider running:\n" - "%s/vacuumdb %s--all --analyze-in-stages --missing-stats-only", new_cluster.bindir, user_specification.data); + "%s/vacuumdb %s--all --analyze-in-stages --missing-stats-only\n" + "%s/vacuumdb %s--all --analyze-only", + new_cluster.bindir, user_specification.data, + new_cluster.bindir, user_specification.data); if (deletion_script_file_name) pg_log(PG_REPORT,
Re: What's our minimum supported Python version?
Re: Jelte Fennema-Nio > - Ubuntu 20.04 can easily install Python3.9 from the official package > repos. Currently this is still supported by Cristoph his pgdg repos, > but looks like that support might be removed before PG18 given it's > transition to "Expanded Security Maintenance" next month. 20.04 (focal) is already too old for PG18 because of llvm version woes, so we are already not building it there. (I forgot what the precise problem was, but I figured focal would be EOL before the PG18 release anyway.) > - Ubuntu 22.04 can easily install Python3.11 from the official package > repos. Still has full upstream support for 2 years. > - Debian Bullseye (EOL in 1.5 years) can only install Python3.9 from > the official package repos. > - Debian Bookworm (EOL in 3 years) has Pytho ^ n 3.11 > - Python 3.8 itself is declared EOL by upstream > - Python 3.10 will be declared EOL by upstream around the release of PG19 > Based on all this info, I think that it sounds quite reasonable to > start requiring Python3.9 for PG18. And for PG19 we could probably > start requiring Python3.11 Ack. Bullseye LTS will be supported until August 2026, so very close to the PG19 release, but it should work out not to support PG19 there. https://wiki.debian.org/LTS/ Christoph
Re: [PATCH] contrib/xml2: xslt_process() should report XSLT-related error details
Hello Tom! On Tue Apr 22, 2025 at 18:29:02 GMT +03, Tom Lane wrote: > Hi Robin, thank you for the patch! This has arrived too late to > be considered for PG v18, but please add the thread to the open > commitfest I will do that. But what about the other changes I proposed (xml data type and hstore support)? Should they all be discussed in this thread or should I create separate threads and consequently separate patch entries for the Commitfest? And should they all be based on the master branch or can they be based upon each other? Do you think they will help to bring xslt_process() closer to adoption into core? > Just from a quick eyeballing of the patch, it seems generally > sane, but I wonder why you have xml_generic_error_handler > creating and then destroying an errorBuf instead of just > appending the results directly to xmlerrcxt->err_buf. The > extra buffer doesn't seem to be doing much except increasing > our odds of an OOM failure right there. You are right. It was a leftover from thinning out xml_errorHandler(). I attached a new version of the patch where this is fixed. I also ran all touched files through pgindent. > I'm also a little suspicious of > > - xmlerrcxt = pgxml_parser_init(PG_XML_STRICTNESS_LEGACY); > + xmlerrcxt = pgxml_parser_init(PG_XML_STRICTNESS_ALL); > > as that looks like it is probably changing the behavior of > the module in ways bigger than just providing more error > details. This code has been in legacy status for so long > that I feel a little hesitant about doing that. You wouldn't actually need that change to get more error details as the details are provided by xml_generic_error_handler() which ignores the strictness level. The reason is that libxml generic error handlers cannot discern between warnings and errors. The only thing that the strictness level does is affecting how xml_errorHandler() works: With STRICTNESS_ALL, it can actually log warnings without appending anything to the err_buf and set the err_occurred flag which is apparently useful because some libxml functions can result in error-level messages without the function calls signalling failure. At least, otherwise the code doesn't make sense to me. In other words, using the STRICTNESS_ALL should allow you to see warnings, you wouldn't otherwise see and catch errors, you wouldn't otherwise catch -- at least if they occur in libxml itself. As far as I understand from the code comment xml_errorHandler(), STRICTNESS_LEGACY was introduced, just so you don't have to touch the existing contrib/xml2 code that did not check err_occurred: /* * Legacy error handling mode. err_occurred is never set, we just add the * message to err_buf. This mode exists because the xml2 contrib module * uses our error-handling infrastructure, but we don't want to change its * behaviour since it's deprecated anyway. This is also why we don't * distinguish between notices, warnings and errors here --- the old-style * generic error handler wouldn't have done that either. */ Since we now check that flag in xslt_proc.c, we should be fine. But I might be overlooking something. > If we are willing to do it, should we go further and clean out the use > of PG_XML_STRICTNESS_LEGACY in xml2/xpath.c as well? I think this is a separate construction site. We don't necessarily have to touch xpath.c and I didn't plan to do so in the near future. We should first determine whether the XPath support is actually worth keeping (should be adopted into core) or rather stay deprecated. 
Best regards, Robin Haberkorn -- Robin Haberkorn Senior Software Engineer B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / https://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt, HRB 3537 From b4d9a6594142cf85e4e38352f6be3cea260962f7 Mon Sep 17 00:00:00 2001 From: Robin Haberkorn Date: Thu, 17 Apr 2025 13:15:48 +0300 Subject: [PATCH v2] contrib/xml2: xslt_process() now reports XSLT-related error details * It sets (and restores) the libxslt error handler (xsltSetGenericErrorFunc()). Since it only supports old-school "generic" error handlers, which are no longer used in PG's libxml code, we reintroduced a "generic" error handler xml_generic_error_handler() in xml.c. * The alternative would have been to expose PgXmlErrorContext in xml.h, so we could implement a generic handler in xslt_proc.c. This is obviously not desirable, as it would depend on USE_LIBXML. * No longer use the PG_XML_STRICTNESS_LEGACY for error handling, but query the error_occurred-flag via pg_xml_error_occurred(). The existance of this getter is another hint, that we are not supposed to expose PgXmlErrorContext in xml.h. * This change means that xslt_process() now reports not only details about XML parsing errors, but XSLT-schema deviations and missing stylesheet parameters as well. * The XSLT error messages now also contain line numbers. For that to work, we had to set a dummy "SQL" URL when parsing XML strings.
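For context, libxslt's old-style "generic" error reporting is just a printf-style callback registered with xsltSetGenericErrorFunc(). The stand-alone sketch below only illustrates the callback shape; it is independent of the patch's xml_generic_error_handler, whose buffer handling lives in xml.c, and the ErrBuf type here is an illustrative assumption:

    #include <stdarg.h>
    #include <stdio.h>
    #include <libxml/xmlerror.h>
    #include <libxslt/xsltutils.h>

    /* caller-supplied accumulation buffer (illustrative only) */
    typedef struct ErrBuf
    {
        char    data[4096];
        size_t  len;
    } ErrBuf;

    /* matches xmlGenericErrorFunc: libxslt passes printf-style arguments */
    static void
    generic_error_handler(void *ctx, const char *msg, ...)
    {
        ErrBuf     *buf = (ErrBuf *) ctx;
        va_list     args;
        int         written;

        if (buf->len >= sizeof(buf->data) - 1)
            return;             /* buffer full, drop further text */

        va_start(args, msg);
        written = vsnprintf(buf->data + buf->len,
                            sizeof(buf->data) - buf->len, msg, args);
        va_end(args);

        if (written > 0)
        {
            buf->len += (size_t) written;
            if (buf->len > sizeof(buf->data) - 1)
                buf->len = sizeof(buf->data) - 1;
        }
    }

    /* register the handler; 'buf' is passed back as ctx on every call */
    static void
    install_handler(ErrBuf *buf)
    {
        xsltSetGenericErrorFunc(buf, generic_error_handler);
    }

Because this interface cannot distinguish warnings from errors, the patch relies on checking pg_xml_error_occurred() afterwards, as described above.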
Re: [PATCH] Support older Pythons in oauth_server.py
Jacob Champion writes: > > This is tested against Python 3.6.15 (3.6 went EOL at the end of > 2021). I'm working on getting Rocky 8 installed locally to test > against. If it's decided we want to support downwards to 3.5, I will > test there too (but I hope we don't; see parent thread). > I did a quick test on 3.5. It seems to work after removing f-strings... At this point, I would be really happy with 3.6.
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Tue, Apr 22, 2025 at 3:02 AM Daniel Gustafsson wrote: > + if oauth_flow_supported > +cdata.set('USE_LIBCURL', 1) > + elif libcurlopt.enabled() > +error('client OAuth is not supported on this platform') > + endif > We already know that libcurlopt.enabled() is true here so maybe just doing > if-else-endif would make it more readable and save readers thinking it might > have changed? Features are tri-state, so libcurlopt.disabled() and libcurlopt.enabled() can both be false. :( My intent is to fall through nicely in the case where -Dlibcurl=auto. (Our minimum version of Meson is too old to switch to syntax that makes this more readable, like .allowed(), .require(), .disable_if(), etc...) > Also, "client OAuth" reads a bit strange, how about "client-side > OAuth" or "OAuth flow module"? > ... > I think we should take this opportunity to turn this into a > appendPQExpBuffer() > with a format string instead of two calls. > ... > Now that the actual variable, errbuf->len, is short and very descriptive I > wonder if we shouldn't just use this as it makes the code even clearer IMO. All three done in v9, attached. Thanks! --Jacob 1: 5f87f11b18e = 1: 5f87f11b18e Add minor-version counterpart to (PG_)MAJORVERSION 2: 4c9cc7f69af ! 2: 9e37fd7c217 oauth: Move the builtin flow into a separate module @@ configure.ac: if test "$PORTNAME" = "win32" ; then +if test "$with_libcurl" = yes ; then + # Error out early if this platform can't support libpq-oauth. + if test "$ac_cv_header_sys_event_h" != yes -a "$ac_cv_header_sys_epoll_h" != yes; then -+AC_MSG_ERROR([client OAuth is not supported on this platform]) ++AC_MSG_ERROR([client-side OAuth is not supported on this platform]) + fi +fi + @@ meson.build: if not libcurlopt.disabled() + if oauth_flow_supported +cdata.set('USE_LIBCURL', 1) + elif libcurlopt.enabled() -+error('client OAuth is not supported on this platform') ++error('client-side OAuth is not supported on this platform') + endif + else @@ src/interfaces/libpq-oauth/oauth-curl.c: pg_fe_run_oauth_flow_impl(PGconn *conn) * also the documentation for struct async_ctx. */ if (actx->errctx) - { +- { - appendPQExpBufferStr(&conn->errorMessage, - libpq_gettext(actx->errctx)); - appendPQExpBufferStr(&conn->errorMessage, ": "); -+ appendPQExpBufferStr(errbuf, libpq_gettext(actx->errctx)); -+ appendPQExpBufferStr(errbuf, ": "); - } +- } ++ appendPQExpBuffer(errbuf, "%s: ", libpq_gettext(actx->errctx)); if (PQExpBufferDataBroken(actx->errbuf)) - appendPQExpBufferStr(&conn->errorMessage, @@ src/interfaces/libpq-oauth/oauth-curl.c: pg_fe_run_oauth_flow_impl(PGconn *conn) if (actx->curl_err[0]) { - size_t len; - +- size_t len; +- - appendPQExpBuffer(&conn->errorMessage, -" (libcurl: %s)", actx->curl_err); + appendPQExpBuffer(errbuf, " (libcurl: %s)", actx->curl_err); @@ src/interfaces/libpq-oauth/oauth-curl.c: pg_fe_run_oauth_flow_impl(PGconn *conn) /* Sometimes libcurl adds a newline to the error buffer. 
:( */ - len = conn->errorMessage.len; - if (len >= 2 && conn->errorMessage.data[len - 2] == '\n') -+ len = errbuf->len; -+ if (len >= 2 && errbuf->data[len - 2] == '\n') ++ if (errbuf->len >= 2 && errbuf->data[errbuf->len - 2] == '\n') { - conn->errorMessage.data[len - 2] = ')'; - conn->errorMessage.data[len - 1] = '\0'; - conn->errorMessage.len--; -+ errbuf->data[len - 2] = ')'; -+ errbuf->data[len - 1] = '\0'; ++ errbuf->data[errbuf->len - 2] = ')'; ++ errbuf->data[errbuf->len - 1] = '\0'; + errbuf->len--; } } v9-0001-Add-minor-version-counterpart-to-PG_-MAJORVERSION.patch Description: Binary data v9-0002-oauth-Move-the-builtin-flow-into-a-separate-modul.patch Description: Binary data
DOCS - create publication (tweak for generated columns)
Hi, While re-reading documentation about logical replication of generated columns, I came across this paragraph in the CREATE PUBLICATION ... FOR TABLE description [1]. When a column list is specified, only the named columns are replicated. The column list can contain stored generated columns as well. If no column list is specified, all table columns (except generated columns) are replicated through this publication, including any columns added later. It has no effect on TRUNCATE commands. See Section 29.5 for details about column lists. That "except generated columns" part is not strictly correct because it fails to account for the 'publish_generated_columns' parameter. I've suggested a more accurate description below. SUGGESTION When a column list is specified, only the named columns are replicated. Stored generated columns may be included in the list. Specifying a column list has no effect on TRUNCATE commands. See Section 29.5 for details about column lists. If no column list is specified, all table columns are replicated through this publication, including any columns added later. Generated columns are included in this case only if publish_generated_columns is set to stored. I've attached a patch to implement this suggested wording. Thoughts? == [1] https://www.postgresql.org/docs/devel/sql-createpublication.html#SQL-CREATEPUBLICATION-PARAMS-FOR-TABLE Kind Regards, Peter Smith. Fujitsu Australia v1-0001-DOCS-create-publication-for-table.patch Description: Binary data
Re: jsonapi: scary new warnings with LTO enabled
On Tue, Apr 22, 2025 at 3:10 AM Daniel Gustafsson wrote: > My preference is that no operation can silently work on a failed object, but > it's not a hill (even more so given where we are in the cycle). Hm, okay. Something to talk about with Andrew and Peter, maybe. > The attached > v3 allocates via the JSON api, no specific error handling should be required > as > it's already handled today. pgindent shows one whitespace change on my machine (comment indentation), but other than that, LGTM! Fuzzers are happy too. Thanks, --Jacob
Re: pgsql: Add function to get memory context stats for processes
Hi, On 2025-04-17 10:42:45 -0400, Robert Haas wrote: > On Tue, Apr 15, 2025 at 6:11 AM Andres Freund wrote: > > There very well could be a CFI - but it better be somewhere where the > > in-memory state is consistent. Otherwise an error raised inside the CFI > > would leave the in-memory state inconsistent, which then would cause problems > > when cleaning up the dsa during resowner release or process exit. > > > > What am I missing here? > > I think maybe you're only thinking about gathering the data. What > about publishing it? If the DSA code were interrupted at a CFI and the > interrupting code went and tried to perform a DSA allocation to store > the resulting data and then returned to the interrupted DSA operation, > would you expect the code to cope with that? I would expect the DSA code to not call CFI() in such a place - afaict nearly all such cases would also not be safe against errors being raised in the CFI() and then later again allocating memory from the DSA (or resetting it). Similarly, aset.c better not accept interrupts in the middle of, e.g., AllocSetAllocFromNewBlock(), since we'd lose track of the allocation. Greetings, Andres Freund
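To make the placement rule concrete, here is an illustration of the argument above, not code from any patch; mutate_freelists() is a hypothetical stand-in for the real bookkeeping work:

    #include "postgres.h"
    #include "miscadmin.h"

    static void mutate_freelists(void);     /* hypothetical helper */

    static void
    allocate_with_bookkeeping(void)
    {
        /*
         * While the allocator's in-memory bookkeeping is transiently
         * inconsistent, there must be no CHECK_FOR_INTERRUPTS(): an error
         * raised there (or an interrupt handler that itself allocates from
         * the same DSA) would leave state that resowner/DSA cleanup cannot
         * cope with.  If this window could be reached with interrupts
         * otherwise enabled, wrapping it in HOLD_INTERRUPTS() /
         * RESUME_INTERRUPTS() is one way to make that explicit.
         */
        mutate_freelists();

        /*
         * Safe point: the structure is consistent again, so servicing an
         * interrupt here, even one that throws an error or allocates memory
         * to publish statistics, is fine.
         */
        CHECK_FOR_INTERRUPTS();
    }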
Re: Cygwin support
On Tue, 2025-04-22 at 10:26 -0400, Tom Lane wrote: > I vaguely recall some discussion about whether building with readline > has become possible under MSVC. I think it'd make a lot of people > happy if that could happen (but I hasten to add that I'm not > volunteering). https://postgr.es/m/CACLU5mThm-uCtERMVHMoGED2EPfyS54j83WB20s_BmzVQuMkpg%40mail.gmail.com Seems like Kirk didn't pursue this further. Yours, Laurenz Albe
Re: Fortify float4 and float8 regression tests by ordering test results
Pavel Borisov writes: > On Tue, 22 Apr 2025 at 20:22, Alexander Korotkov wrote: >> On Tue, Apr 22, 2025 at 7:20 PM Tom Lane wrote: >>> Alexander Korotkov writes: I'd like to add that float4.out not only assumes that insert-ordering is preserved (this could be more-or-less portable between table AMs). It also assumes the way UPDATE moves updated rows. That seems quite heap-specific. You can see in the following fragment, updated rows jump to the bottom. >>> I'd be willing to consider a policy that we don't want to depend on >>> exactly where UPDATE moves rows to. The proposed patch is not that, >>> however. >> OK, that makes sense for me. > Thanks for this input! > This was my first intention to fix only the test that was affected by > UPDATE-order specifics and broke when running on an extension AM. > Maybe I was too liberal and added ORDER BY's more than needed. > I definitely agree to the proposal. On further reflection, this policy makes plenty of sense because even heapam is not all that stable about row order after UPDATE. With tables large enough to draw the attention of autovacuum, we've had to use ORDER BY or other techniques to stabilize the results, because the timing of background autovacuums affected where rows got put. These tests are different only because the tables are small enough and the updates few enough that autovacuum doesn't get involved. I think it's reasonable that at a project-policy level we should not consider that an excuse. For example, a change in autovacuum's rules could probably break these tests even with heapam. On the other hand, as I mentioned earlier, a table AM that doesn't reproduce original insertion order would break a whole lot more tests than these. So that's a bridge I'd rather not cross, at least not without a lot of consideration. BTW, you forgot to update expected/float4-misrounded-input.out. regards, tom lane
Re: Logical Replication of sequences
Hi Vignesh. Review comments for patch v20250422-0001. == Commit message 1. This patch introduces a new function: pg_sequence_state function allows retrieval of sequence values including LSN. SUGGESTION This patch introduces a new function, 'pg_sequence_state', which allows retrieval of sequence values, including the associated LSN. == src/backend/commands/sequence.c pg_sequence_state: 2. + ereport(ERROR, + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), + errmsg("permission denied for sequence %s", + RelationGetRelationName(seqrel; Has redundant parentheses. == Kind Regards, Peter Smith. Fujitsu Australia
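As an aside, recent PostgreSQL releases no longer require the extra parentheses around ereport()'s auxiliary calls, so the reviewed call could be written as below. The closing parentheses of the quoted fragment appear to have been truncated in transit; this sketch restores them in the modern, paren-free style:

    ereport(ERROR,
            errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
            errmsg("permission denied for sequence %s",
                   RelationGetRelationName(seqrel)));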
Re: What's our minimum supported Python version?
Tom Lane writes: > Florents Tselai writes: >> On 19 Apr 2025, at 7:17 PM, Tom Lane wrote: >>> I think we need to do some combination of moving our >>> minimum-supported-version goalposts forward, making sure that >>> whatever we claim is the minimum Python version is actually >>> being tested in the buildfarm, and fixing oauth_server.py >>> so that it works on that version. > >> From a Python ecosystem perspective, >> 3.9 is the usual minimum that people use in CI matrices nowadays. >> So if it was up to me, that’s what I’d choose. > > Per these numbers, that would be cutting off 31% of the buildfarm, > including a lot of still-in-support distros such as RHEL8. > So I would say that's not likely to be our choice. > > regards, tom lane Also from a python world perspective, we really don't want to run EOL versions. Besides the security reasons, bugs in third-party dependencies usually are not fixed and the whole system seems to rot very quickly. Of course, this does not happen if one does not rely on third-party deps. Even in this case, I would say that it is wise to support a relatively small set because the built-in libs will vary over versions, and a wide range of versions will cause more troubles like this one. The oldest non-EOL version is 3.9 right now. My suggestion is to follow the official supported releases. I'm not familiar with the buildfarm, but I know that python3.6 does not come installed by default in RHEL8, so instead of installing this version we can simply install 3.9, 3.11 or 3.12. In a persistent workstation, I believe the following should do the trick (tested on an 8.5 box): sudo dnf install python3.12 sudo dnf remove python36 BTW, we are used to creating a virtualenv for everything, but that is another topic. Best regards, Renan Fonseca
Re: [PATCH] Support older Pythons in oauth_server.py
On Tue, Apr 22, 2025 at 2:43 PM Renan Alves Fonseca wrote: > I did a quick test on 3.5. Thanks! > It seems to work after removing > f-strings... At this point, I would be really happy with 3.6. Ah, makes sense. Well, I'm starting with the 3.6 fix regardless, since it's preventing RHEL8 testing. If we decide we must support 3.5, I can patch f-strings too. Fingers crossed... --Jacob
Re: What's our minimum supported Python version?
Jelte Fennema-Nio writes: > I totally agree that "security" is not the discussion that we should > be having here. I think the Python version decision (or really any > minimum version decision) should be based on some analysis of costs to > us for maintaining support vs benefits to the users. [ shrug... ] This thread in itself has already consumed more effort than we've spent thinking about old-Python problems for the last several years. (Otherwise, we would probably have updated that supported-version claim in installation.sgml some time ago.) So I doubt that there's all that much in cost savings to us that'd outweigh benefits to users. The fact that Jacob was able to fix oauth_server.py without (it seemed like) very much work seems to bear out that opinion. This calculus might change if we start to rely more heavily on Python for build and test infrastructure than we do today. But I think we can have that conversation when it happens. Worth noting also is that there are different requirements in different places. For example, contrib/unaccent/generate_unaccent_rules.py is probably only ever run by developers, so I doubt anyone is going to make a fuss if it doesn't work on hoary Pythons. Test infrastructure is more of a problem, since we want to be able to test on any platform that we consider supported at all. Anyway, what I propose that we do for now is replace the installation.sgml text The minimum required version is Python 3.2. with The minimum supported version is Python 3.6.8. where I use "supported" advisedly, as in "it might work with something older, but we don't promise to fix it if not". regards, tom lane
Re: What's our minimum supported Python version?
On Tue, Apr 22, 2025 at 2:28 PM Jelte Fennema-Nio wrote: > I'm pretty sure most users of RHEL expect most modern software not to > compile/work completely effortlessly on their distros without some > effort on their part or on the part of RHEL packagers. That's kinda > what you're signing up for if you choose a distro like that. This is the core of my position, too. I think it's reasonable to support Python versions for some time after they go EOL, but I don't think that we need to treat "users who want bleeding-edge Postgres with long-EOL dependencies" as something to cater to, at the expense of the committer testing matrix. Note that Meson itself has updated to Python 3.7 as a minimum version (as it now warns you, loudly). Thanks, --Jacob
Re: DOCS - create publication (tweak for generated columns)
On Tuesday, April 22, 2025, Peter Smith wrote: > Hi, > > While re-reading documentation about logical replication of generated > columns, I came across this paragraph in the CREATE PUBLICATION ... > FOR TABLE description [1]. > > > When a column list is specified, only the named columns are > replicated. The column list can contain stored generated columns as > well. If no column list is specified, all table columns (except > generated columns) are replicated through this publication, including > any columns added later. It has no effect on TRUNCATE commands. See > Section 29.5 for details about column lists. > > > That "except generated columns" part is not strictly correct because > it fails to account for the 'publish_generated_columns' parameter. > I've suggested a more accurate description below. > > > SUGGESTION > > When a column list is specified, only the named columns are > replicated. Stored generated columns may be included in the list. > Specifying a column list has no effect on TRUNCATE commands. See > Section 29.5 for details about column lists. If no column list is > specified, all table columns are replicated through this publication, > including any columns added later. Generated columns are included in > this case only if publish_generated_columns is set to stored. > > > I've attached a patch to implement this suggested wording. > > Thoughts? > > == > [1] https://www.postgresql.org/docs/devel/sql-createpublication.html#SQL- > CREATEPUBLICATION-PARAMS-FOR-TABLE > In the column list I would stick to mentioning what cannot be specified, since it would be assumed by default that any column on the table is fair game. I believe that means not mentioning stored generated and instead mentioning virtual generated. It really seems odds mentioning stored generated being allowed before mentioning when they are optional, default excluded. Alternative: The optional column list may include any non-virtual columns of the table. If omitted, tables publish all (including future) non-generated columns by default, and may publish stored generated columns if the publication option publish_generated_columns is set to stored. See Section 29.5 for details about column lists. I don’t get why truncate is mentioned here. I omitted it intentionally. David J.
Re: Row pattern recognition
> Attached are the v30 patches, just adding a patch to change the > default io_method parameter to sync, expecting this affects something > to the recent CFbot failure. > https://commitfest.postgresql.org/patch/4460/ > https://cirrus-ci.com/task/6078653164945408 > which is similar to: > https://www.postgresql.org/message-id/20250422.39.1502127917165838403.ishii%40postgresql.org CFBot has just run tests against the v30 patches and now it has turned green again! I guess io_method = sync definitely did the trick. Note that previously only the Windows Server 2019, VS 2019 - Meson & ninja test had failed. Best regards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese: http://www.sraoss.co.jp
Re: Allow database owners to CREATE EVENT TRIGGER
On Tuesday, April 22, 2025, Steve Chavez wrote: > > alter event trigger command which doesn’t need to be exercised here > > That part does need to be tested, I modified `AlterEventTriggerOwner_internal` > to allow altering owners to regular users. Before it was only restricted to > superusers. > > > Actually, leave the other member around, but not granted ownership, and > both create tables, to demonstrate that a non-superuser and non-owner is > unaffected by the trigger. > > I've updated the tests accordingly. Please let me know if that's what you > had in mind. > Pretty much. You have a bad drop table cleanup command, and I’d drop the entire alter event trigger owner test. The other thing I’m wondering, but haven’t gotten around to testing, is whether a role affected by the event trigger is able to just drop the trigger given this implementation. I always get member/member-of dynamics confused. Having member (possibly via set role…) trying to drop the trigger would be good to prove that it isn’t allowed. David J.
Re: Allow database owners to CREATE EVENT TRIGGER
On Tuesday, April 22, 2025, David G. Johnston wrote: > On Tuesday, April 22, 2025, Steve Chavez wrote: > >> > alter event trigger command which doesn’t need to be exercised here >> >> That part does need to be tested, I modified >> `AlterEventTriggerOwner_internal` to allow altering owners to regular >> users. Before it was only restricted to superusers. >> > Ok. I missed this. David J.
Re: Allow database owners to CREATE EVENT TRIGGER
On Tuesday, April 22, 2025, David G. Johnston wrote: > On Tuesday, April 22, 2025, David G. Johnston > wrote: > >> On Tuesday, April 22, 2025, Steve Chavez wrote: >> >>> > alter event trigger command which doesn’t need to be exercised here >>> >>> That part does need to be tested, I modified >>> `AlterEventTriggerOwner_internal` to allow altering owners to regular >>> users. Before it was only restricted to superusers. >>> >> > Ok. I missed this. > Sorry for the self-reply but this nagged at me. It’s probably not a big deal either way, but the prior test existed to ensure that a superuser couldn’t do something they are otherwise always permitted to do - assign objects to whomever they wish. So event_trigger.sql had a test that errored, showing this anomaly. You moved the test and now are proving it doesn’t error. But it is not expected to error; and immediately above you already show that a non-superuser can be an owner. We don’t need a test to show a superuser demonstrating their normal abilities. IOW, select test cases around the feature as it is implemented now, not its history. A personal one-off test to ensure that no super-user prohibitions remained will suffice to make sure all such code that needed to be removed is gone. David J.
Re: disabled SSL log_like tests
I wrote: > The fact that mamba hasn't been failing on the other affected > tests is a bit puzzling. mamba isn't running kerberos or ldap > or oauth_validator, so the lack of failures there is expected. > authentication/t/001_password.pl appears accidentally not vulnerable: > it's not using log_like in any connect_fails tests. (It is using > log_unlike, so those tests could be giving silent false negatives.) > But authentication/t/003_peer.pl has 8 test cases that look > vulnerable. I guess there must be some extra dollop of timing > weirdness in the ssl tests that's not there in 003_peer.pl. After pushing this patch, the probable explanation for the "extra timing weirdness" finally penetrated my brain: the error reports we are looking for are cases that are detected by OpenSSL. So it's plausible that OpenSSL has told the connecting client to go away before it returns the error indication to the backend's calling code, which would be what would log the message. That is what provides the window for a race condition. You still need a bunch of assumptions about the kernel's subsequent scheduling of psql and the TAP script versus the backend, but the whole thing is certainly dependent on psql getting the word sooner than the backend. (Not all of the tests disabled by 55828a6b6 meet this description, but a bunch of them do, and I believe that I just zapped every log_like condition in the script rather than trying to be selective about which ones were known to have failed.) It seems at least plausible that the Kerberos tests could have a similar problem. I'm less sure about LDAP or OAuth. But authentication/t/003_peer.pl would not, because elog.c emits errors to the log before sending them to the client. So that explains the lack of buildfarm reports from mamba, and it's unclear that we have any other animals with the right timing behavior to see this. regards, tom lane
Re: [PATCH] Support older Pythons in oauth_server.py
On Tue, Apr 22, 2025 at 12:36 PM Tom Lane wrote: > Thanks! I confirm this works for me on RHEL8 with 3.6.8. Great, thanks for the quick review! I'm considering committing this myself. Do you mind if it takes a day or so to get this in, while I quadruple-check everything, or are you blocked on this? --Jacob
Re: Cygwin support
On 2025-04-22 Tu 12:55 PM, Laurenz Albe wrote: On Tue, 2025-04-22 at 10:26 -0400, Tom Lane wrote: I vaguely recall some discussion about whether building with readline has become possible under MSVC. I think it'd make a lot of people happy if that could happen (but I hasten to add that I'm not volunteering). https://postgr.es/m/CACLU5mThm-uCtERMVHMoGED2EPfyS54j83WB20s_BmzVQuMkpg%40mail.gmail.com Seems like Kirk didn't pursue this further. It's worth looking at, but I don't think it's possible for release 18 let alone the older releases. We should look at retiring Cygwin when all the live branches have psql with a working readline on native Windows builds. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: AIO v2.5
On 3/18/25 21:12, Andres Freund wrote: > Hi, > > Attached is v2.10, with the following changes: > > ... > > - committed io_method=worker > There's an open item related to this commit (247ce06b883d), based on: For now the default io_method is changed to "worker". We should re-evaluate that around beta1, we might want to be careful and set the default to "sync" for 18. FWIW the open item is in the "recheck mid-beta" section, but the commit message says "around beta1" which is not that far away (especially if we choose to do beta1 on May 8, with the minor releases). What information do we need to gather to make the decision, and who's expected to do it? I assume someone needs to run some benchmarks, but is anyone working on it already? What benchmarks, what platforms? FWIW I'm asking because of the RMT, but I'm also willing to do some of the tests, if needed - but it'd be good to get some guidance. regards -- Tomas Vondra
Re: [PATCH] Support older Pythons in oauth_server.py
On Tue, Apr 22, 2025 at 3:17 PM Jelte Fennema-Nio wrote: > The way you replaced this does not have the same behaviour in the case > where the prefix/suffix is not part of the string. removeprefix/suffix > will not remove any characters in that case, but your code will always > remove the number of characters that the suffix/prefix is long. Maybe > your other checks and definition of the OAuth spec ensure that these > prefixes/suffixes are present when you remove them Correct. Directly above the changed code are the prefix checks, which set self._alt_issuer/_parameterized. Thanks! --Jacob
Re: [PATCH] Support older Pythons in oauth_server.py
Jacob Champion writes: > I'm considering committing this myself. Please do. > Do you mind if it takes a day > or so to get this in, while I quadruple-check everything, or are you > blocked on this? I'm not in a hurry. Push when you're ready. regards, tom lane
Re: [PATCH] Support older Pythons in oauth_server.py
Jacob Champion writes: > This is tested against Python 3.6.15 (3.6 went EOL at the end of > 2021). I'm working on getting Rocky 8 installed locally to test > against. If it's decided we want to support downwards to 3.5, I will > test there too (but I hope we don't; see parent thread). Thanks! I confirm this works for me on RHEL8 with 3.6.8. regards, tom lane
Re: What's our minimum supported Python version?
On Tue, Apr 22, 2025 at 12:09 PM Renan Alves Fonseca wrote: > The oldest non EOL version is 3.9 right now. My suggestion is to follow > the official supported releases. I think this policy is too aggressive. Many operating systems we support are going to ship Python versions past their EOL date (and keep them supported for a long while with security patches). There's a middle ground to be found, IMO. --Jacob
[PATCH] Support older Pythons in oauth_server.py
On Tue, Apr 22, 2025 at 8:29 AM Jacob Champion wrote: > On Tue, Apr 22, 2025 at 8:05 AM Tom Lane wrote: > > The first problem is that this Python version seems not to > > like assignments embedded in if statements: [...] > > > > I was able to work around that with the attached quick hack. > > But then I get [...] > > Thanks, I'll put a patch together. Sorry -- IIRC, both of those > features are probably too new for me to have used, regardless of what > we decide is the minimum version for 18. Splitting this off into its own $SUBJECT. I coupled oauth_server to new Python stuff without realizing it, sorry: - urllib's reclassification of `~` as a safe URL character (3.7) - walrus operator `:=` (3.8) - dict[type, type] annotations (3.9) - str.removeprefix/suffix() (3.9) Attached patch gets rid of all that. This is tested against Python 3.6.15 (3.6 went EOL at the end of 2021). I'm working on getting Rocky 8 installed locally to test against. If it's decided we want to support downwards to 3.5, I will test there too (but I hope we don't; see parent thread). Thanks, --Jacob 0001-oauth-Support-Python-3.6-in-tests.patch Description: Binary data
Re: What's our minimum supported Python version?
Jacob Champion writes: > On Tue, Apr 22, 2025 at 12:09 PM Renan Alves Fonseca > wrote: >> The oldest non EOL version is 3.9 right now. My suggestion is to follow >> the official supported releases. > I think this policy is too aggressive. Many operating systems we > support are going to ship Python versions past their EOL date (and > keep them supported for a long while with security patches). Yeah, that. The distros that are still shipping older Pythons are LTS distros where part of the deal is that the vendor is supposed to back-patch security fixes, even if the upstream has moved on. Maybe a particular vendor is falling down on that job, but it's not for us to rate their performance. So I don't buy that "but security!" is a good argument here. (Full disclosure: I worked for Red Hat for more than a decade, and still preferentially use their distros. So I've pretty well bought into that model.) regards, tom lane
Re: [PATCH] Support older Pythons in oauth_server.py
On Tue, 22 Apr 2025 at 21:26, Jacob Champion wrote: > - str.removeprefix/suffix() (3.9) The way you replaced this does not have the same behaviour in the case where the prefix/suffix is not part of the string. removeprefix/suffix will not remove any characters in that case, but your code will always remove as many characters as the suffix/prefix is long. Maybe your other checks and the definition of the OAuth spec ensure that these prefixes/suffixes are present when you remove them, but the code is not the same in the general case (you'd need to add a startswith()/endswith() check before removing the characters).

+# Strip off the magic path segment. (The more readable
+# str.removeprefix()/removesuffix() aren't available until Py3.9.)
 if self._alt_issuer:
     # The /alternate issuer uses IETF-style .well-known URIs.
     if self.path.startswith("/.well-known/"):
-        self.path = self.path.removesuffix("/alternate")
+        self.path = self.path[: -len("/alternate")]
     else:
-        self.path = self.path.removeprefix("/alternate")
+        self.path = self.path[len("/alternate") :]
 elif self._parameterized:
-    self.path = self.path.removeprefix("/param")
+    self.path = self.path[len("/param") :]
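To make the behavioural difference concrete, here is a small illustrative sketch (generic code, not part of the patch) of why an unconditional slice is not equivalent in the general case:

# str.removesuffix() (3.9+) is a no-op when the suffix is absent; a bare slice is not.
path = "/.well-known/openid-configuration"        # does NOT end with "/alternate"

sliced = path[: -len("/alternate")]
assert sliced != path                              # characters were dropped anyway

# A general-case-safe replacement guards the slice first:
def remove_suffix(s, suffix):
    return s[: -len(suffix)] if suffix and s.endswith(suffix) else s

assert remove_suffix(path, "/alternate") == path                   # left unchanged
assert remove_suffix(path + "/alternate", "/alternate") == path    # suffix stripped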
Re: Enable data checksums by default
On 10/16/24 08:54, Peter Eisentraut wrote: > On 14.10.24 11:28, Peter Eisentraut wrote: >> On 03.10.24 23:13, Nathan Bossart wrote: >>> On Tue, Oct 01, 2024 at 11:15:02AM -0400, Peter Eisentraut wrote: I have committed 0001 (the new option) and 0004 (the docs tweak). I think there is consensus for the rest, too, but I'll leave it for a few more days to think about. I guess the test failure has to be addressed. >>> >>> Here is a rebased patch with the test fix (for cfbot). I have made no >>> other changes. >> >> I have committed the test changes (patch 0002). (I renamed the option >> to no_data_checksums to keep the wording consistent with the initdb >> option.) >> >> Right now, with checksums off by default, this doesn't do much, but >> you can test this like >> >> PG_TEST_INITDB_EXTRA_OPTS=--data-checksums meson test ... >> >> and everything will pass. To make that work, I had to adjust the >> order of how the initdb options are assembled in Cluster.pm a bit. >> >> I will work on the patch that flips the default next. > > The patch that flips the default has been committed. > > I also started a PG18 open items page and made a note that we follow up > on the upgrade experience, as was discussed in this thread. > > https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items > Regarding the open item, can someone explain what exactly we are planning to evaluate mid-beta? We went through the open items at the RMT team meeting today, and my impression was that the questions are mostly about the performance of having checksums by default, but now I realize the thread talks about "upgrade experience", which seems fairly broad. So, what kind of data do we expect to gather in order to evaluate this? Who's expected to collect and evaluate it? regards -- Tomas Vondra
Re: What's our minimum supported Python version?
On Tue, Apr 22, 2025 at 3:04 PM Tom Lane wrote: > The fact that Jacob was able to fix > oauth_server.py without (it seemed like) very much work seems to > bear out that opinion. I think my ability to do it should not be used as evidence of "ease". I knew where to download 3.6, that I should build it with Clang to avoid GCC segfaults, that I should set the rpaths appropriately, and that I should wrap it all in a virtualenv for Meson to find. And I had to kick Meson to get it to clear its cached Python information, or else the contrib modules refused to build... I suspect the only way most developers are going to want to test this is by running EL8. > This calculus might change if we start to rely more heavily on > Python for build and test infrastructure than we do today. But > I think we can have that conversation when it happens. As long as the need to backport to PG18 doesn't freeze that conversation in place, I suppose. > Anyway, what I propose that we do for now is replace the > installation.sgml text > > The minimum required version is Python 3.2. > > with > > The minimum supported version is Python 3.6.8. > > where I use "supported" advisedly, as in "it might work with > something older, but we don't promise to fix it if not". No objection to that as a first step. I'm still hopeful that Devrim has some evidence in favor of bumping to 3.8 or 3.9. :) Thanks, --Jacob
Re: Allow database owners to CREATE EVENT TRIGGER
> alter event trigger command which doesn’t need to be exercised here That part does need to be tested; I modified `AlterEventTriggerOwner_internal` to allow the owner to be changed to a regular user. Before, it was restricted to superusers only. > Actually, leave the other member around, but not granted ownership, and both create tables, to demonstrate that a non-superuser and non-owner is unaffected by the trigger. I've updated the tests accordingly. Please let me know if that's what you had in mind. Best regards, Steve Chavez On Sun, 20 Apr 2025 at 23:13, David G. Johnston wrote: > On Sunday, April 20, 2025, Steve Chavez wrote: > >> > Also, this looks unconventional… >> > EventTriggerCacheItem *item = (EventTriggerCacheItem*) lfirst_oid(lc); >> >> Just noticed the mistake there, I would have expected a compilation >> error. New patch attached with the following change: >> >> EventTriggerCacheItem *item = lfirst(lc); >> >> On Sun, 20 Apr 2025 at 22:55, Steve Chavez wrote: >> >>> Sorry, attached the output file. >>> >>> > You can remove role member_1 and trigger..1 and “create table foo” from > the nosuper script without any loss of test coverage. Or member2 trigger2 > table_bar along with the alter event trigger command which doesn’t need to > be exercised here. Ownership is all that matters. Whether come to > directly or via alter. > > Actually, leave the other member around, but not granted ownership, and > both create tables, to demonstrate that a non-superuser and non-owner is > unaffected by the trigger. > > David J. > >

From e9a73cda8145a04b8375d21e37a6e713af1bed34 Mon Sep 17 00:00:00 2001
From: steve-chavez
Date: Sun, 20 Apr 2025 19:46:00 -0500
Subject: [PATCH v2] Allow regular users to create event triggers

---
 src/backend/commands/event_trigger.c        | 33 +++--
 src/backend/utils/cache/evtcache.c          |  1 +
 src/include/utils/evtcache.h                |  1 +
 src/test/regress/expected/event_trigger.out | 13 +---
 .../expected/event_trigger_nosuper.out      | 71 +++
 src/test/regress/parallel_schedule          |  4 ++
 src/test/regress/sql/event_trigger.sql      | 11 +--
 .../regress/sql/event_trigger_nosuper.sql   | 59 +++
 8 files changed, 147 insertions(+), 46 deletions(-)
 create mode 100644 src/test/regress/expected/event_trigger_nosuper.out
 create mode 100644 src/test/regress/sql/event_trigger_nosuper.sql

diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index edc2c988e29..7ae0636f7c2 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -126,18 +126,6 @@ CreateEventTrigger(CreateEventTrigStmt *stmt)
 	ListCell   *lc;
 	List	   *tags = NULL;
 
-	/*
-	 * It would be nice to allow database owners or even regular users to do
-	 * this, but there are obvious privilege escalation risks which would have
-	 * to somehow be plugged first.
-	 */
-	if (!superuser())
-		ereport(ERROR,
-				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
-				 errmsg("permission denied to create event trigger \"%s\"",
-						stmt->trigname),
-				 errhint("Must be superuser to create an event trigger.")));
-
 	/* Validate event name. */
 	if (strcmp(stmt->eventname, "ddl_command_start") != 0 &&
 		strcmp(stmt->eventname, "ddl_command_end") != 0 &&
@@ -545,14 +533,6 @@ AlterEventTriggerOwner_internal(Relation rel, HeapTuple tup, Oid newOwnerId)
 		aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_EVENT_TRIGGER,
 					   NameStr(form->evtname));
 
-	/* New owner must be a superuser */
-	if (!superuser_arg(newOwnerId))
-		ereport(ERROR,
-				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
-				 errmsg("permission denied to change owner of event trigger \"%s\"",
-						NameStr(form->evtname)),
-				 errhint("The owner of an event trigger must be a superuser.")));
-
 	form->evtowner = newOwnerId;
 	CatalogTupleUpdate(rel, &tup->t_self, tup);
@@ -698,7 +678,7 @@ EventTriggerCommonSetup(Node *parsetree,
 		if (unfiltered || filter_event_trigger(tag, item))
 		{
 			/* We must plan to fire this trigger. */
-			runlist = lappend_oid(runlist, item->fnoid);
+			runlist = lappend(runlist, item);
 		}
 	}
@@ -1084,11 +1064,16 @@ EventTriggerInvoke(List *fn_oid_list, EventTriggerData *trigdata)
 	foreach(lc, fn_oid_list)
 	{
 		LOCAL_FCINFO(fcinfo, 0);
-		Oid			fnoid = lfirst_oid(lc);
+		EventTriggerCacheItem *item = lfirst(lc);
 		FmgrInfo	flinfo;
 		PgStat_FunctionCallUsage fcusage;
+		Oid			current_user = GetUserId();
+
+		if (!is_member_of_role_nosuper(current_user, item->owneroid)) {
+			continue;
+		}
 
-		elog(DEBUG1, "EventTriggerInvoke %u", fnoid);
+		elog(DEBUG1, "EventTriggerInvoke %u", item->fnoid);
 
 		/*
 		 * We want each event trigger to be able to see the results of the
@@ -1102,7 +1087,7 @@ EventTriggerInvoke(List *fn_oid_list, EventTriggerData *trigdata)
 		CommandCounterIncrement();
 
 		/* Look up the function */
-		fmgr_info(fnoid, &flinfo);
+		fmgr_info(item->fnoid, &flinfo);