Re: Virtual generated columns
On Fri, Nov 29, 2024 at 7:14 PM Peter Eisentraut wrote:
> Here is a new patch version, with several updates.
> - Added support for ALTER TABLE ... SET EXPRESSION.

When using ALTER TABLE to set the expression for virtual generated columns, we don't force a rewrite, which means we don't have the opportunity to check whether the new values for these columns could cause an underflow or overflow. For instance,

    create table t (a int, b int generated always as (a) virtual);
    insert into t values (2147483647);

    # alter table t alter column b set expression as (a * 2);
    ALTER TABLE
    # select * from t;
    ERROR:  integer out of range

The same thing can occur with INSERT. Since we don't compute virtual generated columns on write, we may end up inserting values that cause underflow or overflow for these columns.

    create table t1 (a int, b int generated always as (a * 2) virtual);
    insert into t1 values (2147483647);

    # select * from t1;
    ERROR:  integer out of range

I'm not sure if this is expected or not, so I just wanted to point it out.

Thanks
Richard
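The contrast with STORED is what makes this surprising: a stored generated column evaluates its expression at write time, so the same overflow surfaces immediately on INSERT. A minimal sketch of that case (standard PostgreSQL behavior, not part of the patch under discussion):

    create table t2 (a int, b int generated always as (a * 2) stored);
    insert into t2 values (2147483647);
    ERROR:  integer out of range

With the virtual variant, the expression is evaluated on read, which is why the error above only appears at SELECT time (or after ALTER ... SET EXPRESSION, which does not rewrite the table).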
Re: Conflict detection for update_deleted in logical replication
On Tue, Jan 7, 2025 at 2:49 AM Amit Kapila wrote:
>
> On Mon, Jan 6, 2025 at 4:52 PM Zhijie Hou (Fujitsu) wrote:
> >
> > On Friday, January 3, 2025 2:36 PM Masahiko Sawada wrote:
> > >
> > > + /*
> > > +  * The changes made by this and later transactions are still non-removable
> > > +  * to allow for the detection of update_deleted conflicts when applying
> > > +  * changes in this logical replication worker.
> > > +  *
> > > +  * Note that this info cannot directly protect dead tuples from being
> > > +  * prematurely frozen or removed. The logical replication launcher
> > > +  * asynchronously collects this info to determine whether to advance the
> > > +  * xmin value of the replication slot.
> > > +  *
> > > +  * Therefore, FullTransactionId that includes both the transaction ID and
> > > +  * its epoch is used here instead of a single Transaction ID. This is
> > > +  * critical because without considering the epoch, the transaction ID
> > > +  * alone may appear as if it is in the future due to transaction ID
> > > +  * wraparound.
> > > +  */
> > > + FullTransactionId oldest_nonremovable_xid;
> > >
> > > The last paragraph of the comment mentions that we need to use
> > > FullTransactionId to properly compare XIDs even after the XID wraparound
> > > happens. But once we set the oldest-nonremovable-xid it prevents XIDs from
> > > being wraparound, no? I mean that workers' oldest-nonremovable-xid values
> > > and the slot's non-removal-xid (i.e., its xmin) are never more than 2^31
> > > XIDs apart.
> >
> > I think the issue is that the launcher may create the replication slot after
> > the apply worker has already set the 'oldest_nonremovable_xid', because the
> > launcher does that asynchronously. So, before the slot is created, there's
> > a window where transaction IDs might wrap around. If initially the apply
> > worker has computed a candidate_xid (755) and the XID wraps around before
> > the launcher creates the slot, causing the new current xid to be (740), then
> > the old candidate_xid (755) looks like an xid in the future, and the launcher
> > could advance the xmin to 755, which would cause dead tuples to be removed
> > prematurely. (We are trying to reproduce this to ensure that it's a real
> > issue and will share after finishing.)
> >
> > We thought of another approach, which is to create/drop this slot first as
> > soon as one enables/disables detect_update_deleted (e.g. create/drop the
> > slot during DDL). But it seems complicated to control concurrent slot
> > create/drop. For example, if one backend A enables detect_update_deleted, it
> > will create a slot. But if another backend B is disabling
> > detect_update_deleted at the same time, then the newly created slot may be
> > dropped by backend B. I thought about checking the number of subscriptions
> > that enable detect_update_deleted before dropping the slot in backend B, but
> > the subscription changes caused by backend A may not be visible yet (e.g.
> > not committed yet).
>
> This means that for the transaction whose changes are not yet visible,
> we may have already created the slot and the backend B would end up
> dropping it. Is it possible that during the change of this new option
> via DDL, we take AccessExclusiveLock on pg_subscription as we do in
> DropSubscription() to ensure that concurrent transactions can't drop
> the slot? Will that help in solving the above scenario?

If we create/stop the slot during DDL, how do we support rollback DDLs?
>
> The second idea could be that each worker first checks whether a slot
> exists along with a subscription flag (new option). Checking the
> existence of a slot each time would be costly, so we somehow need to
> cache it. But if we do that then we need to invent some cache
> invalidation mechanism for the slot. I am not sure if we can design a
> race-free mechanism for that. I mean we need to think of a solution
> for race conditions between the launcher and apply workers to ensure
> that after dropping the slot, the launcher doesn't recreate the slot (say
> if some subscription enables this option) before all the workers can
> clear their existing values of oldest_nonremovable_xid.
>
> The third idea to avoid the race condition could be that in the
> function InitializeLogRepWorker(), after CommitTransactionCommand(), we
> check if the retain_dead_tuples flag is true for MySubscription; then
> we check whether the system slot exists. If it exists then go ahead;
> otherwise, wait till the slot is created. It could be some additional
> cycles during worker startup, but it is a one-time effort and that too
> only when the flag is set. In addition to this, we anyway need to
> create the slot in the launcher before launching the workers, and
> after re-reading the subscription, the change in
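For reference, a minimal sketch of why the epoch matters in the scenario above, using the comparison helpers from access/transam.h (the wrapper function is illustrative only):

    #include "postgres.h"
    #include "access/transam.h"

    /* Illustrative only: why the launcher's check needs the epoch. */
    static void
    wraparound_demo(void)
    {
        /*
         * TransactionIdPrecedes() compares modulo 2^32: any two XIDs more
         * than 2^31 apart compare the "wrong way around", so a stale
         * candidate_xid of 755 from before a wraparound looks newer than
         * the post-wraparound current xid 740.
         */
        Assert(TransactionIdPrecedes(740, 755));    /* 755 "appears" newer */

        /*
         * FullTransactionId carries the epoch, so the comparison is plain
         * 64-bit arithmetic and the pre-wraparound XID sorts correctly.
         */
        Assert(FullTransactionIdPrecedes(
                   FullTransactionIdFromEpochAndXid(1, 755),
                   FullTransactionIdFromEpochAndXid(2, 740)));
    }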
Fix a wrong errmsg in AlterRole()
Hi,

```
postgres=# create group g1;
CREATE ROLE
postgres=# create role r1 in group g1;
CREATE ROLE
postgres=# set session authorization r1;
SET
postgres=> alter group g1 drop user r1;
ERROR: permission denied to alter role
DETAIL: Only roles with the ADMIN option on role "g1" may add members.
```

The errmsg is "add members", but we are doing a drop. Here is a small patch to fix it.

--
Regards,
ChangAo Chen

0001-Fix-a-wrong-errmsg-in-AlterRole.patch
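The attached patch is binary data, but a plausible shape for the fix is to make the errdetail depend on whether members are being added or dropped. A hypothetical sketch (the "drop" flag and "rolename" variable are invented for illustration):

    /* "drop" would be true for ALTER GROUP ... DROP USER (hypothetical) */
    ereport(ERROR,
            (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
             errmsg("permission denied to alter role"),
             drop ? errdetail("Only roles with the ADMIN option on role \"%s\" may remove members.",
                              rolename)
                  : errdetail("Only roles with the ADMIN option on role \"%s\" may add members.",
                              rolename)));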
Re: EphemeralNamedRelation and materialized view
On Mon, 30 Dec 2024 16:06:06 -0500 Tom Lane wrote:
> Yugo NAGATA writes:
> > On Wed, 20 Nov 2024 12:43:16 -0500
> > Tom Lane wrote:
> >> ... It seems to me that we should
> >> think about this, for MVs as well as those other object types,
> >> as fundamentally a dependency problem. That is, the reason
> >> we can't allow a reference to an ENR in a long-lived object
> >> is that we have no catalog representation for the reference.
> >> So that leads to thinking that the issue ought to be handled
> >> in recordDependencyOnExpr and friends. If we see an ENR while
> >> scanning a rangetable to extract dependencies, then complain.
> >> This might be a bit messy to produce good error messages for,
> >> though.
>
> > I've attached an updated patch. Uses of ENRs are now checked in
> > find_expr_references_walker() called from recordDependencyOnExpr().
>
> This looks pretty good to me, except that I question the use of
> getObjectTypeDescription() in the error message. There are a
> few things not to like about that:
>
> 1. This is kind of an off-label use of getObjectTypeDescription,
> in that we can't expect the object to be visible yet in the catalogs.
> Yeah, we can hack around that by passing missing_ok = true, but it
> still seems like a kluge.
>
> 2. The grammar isn't great, and translatability of the message
> would be poor I think.
>
> 3. As your test case demonstrates, the message is going to complain
> about a "rule" if the problem is with a view or matview, because
> we represent the dependency as being from the view's ON SELECT rule.
> This seems quite confusing for anyone not deeply versed in PG's inner
> workings.
>
> After some thought I propose that we just complain that a "persistent
> object" can't depend on a transition table, and not try to identify
> the depender any more closely than that. We can still add some
> context to the message by showing the transition table's name,
> since that's readily available from the RTE. See attached v3,
> where I also did a bit of editing of the comments and test case.

Thank you for reviewing and editing the patch! I agree with your proposal
on the error message handling.

> BTW, I'm not entirely convinced that the first addition (in Var
> processing) is necessary. Such a Var must refer to an RTE
> somewhere, and I'm having a hard time coming up with a case
> where the RTE wouldn't also be part of what we scan for
> dependencies. It's harmless enough to have the extra check,
> but can you think of a case where it's actually needed?

On second thought, I could not think of such a case. This part can be
removed. I attached the v4 patch.

Regards,
Yugo Nagata

--
Yugo NAGATA

>From 40f96b4ce8e93e8920e840e52b6a78f60a319efd Mon Sep 17 00:00:00 2001
From: Tom Lane
Date: Mon, 30 Dec 2024 15:59:32 -0500
Subject: [PATCH v4] Disallow NAMEDTUPLESTORE RTEs in stored views, rules, etc.

A named tuplestore is necessarily a transient object, so it makes no
sense to reference one in a persistent object such as a view. We didn't
previously prevent that, with the result that if you tried you would
get some weird failure about how the executor couldn't find the
tuplestore.

We can mechanize a check for this case cheaply by making dependency
extraction complain if it comes across such an RTE. This is a plausible
way of dealing with it since part of the problem is that we have no way
to make a pg_depend representation of a named tuplestore.

Report and fix by Yugo Nagata.
Although this is an old problem, it's a very weird corner case and
there have been no reports from end users. So it seems sufficient to
fix it in master.

Discussion: https://postgr.es/m/20240726160714.e74d0db579f2c017e1ca0...@sraoss.co.jp
---
 src/backend/catalog/dependency.c       | 15 +++
 src/test/regress/expected/triggers.out | 20
 src/test/regress/sql/triggers.sql      | 19 +++
 3 files changed, 54 insertions(+)

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 096b68c7f3..18316a3968 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -2191,7 +2191,22 @@ find_expr_references_walker(Node *node,
 			}
 			context->rtables = list_delete_first(context->rtables);
 			break;
+		case RTE_NAMEDTUPLESTORE:
+
+			/*
+			 * Cataloged objects cannot depend on tuplestores, because
+			 * those have no cataloged representation. For now we can
+			 * call the tuplestore a "transition table" because that's
+			 * the only kind exposed to SQL, but someday we might have
+			 * to work harder.
+			 */
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("transition table \"%s\" cannot be referenced in a persistent object",
+							rte->eref->aliasname)));
+			break;
 		default:
+			/* Other RTE types can be ignored here */
 			break;
 	}
 }
diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/ex
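For reference, a quick illustration of the kind of statement the new check rejects — my own sketch, not the patch's regression test (the table, function, and trigger names are invented):

    CREATE TABLE t (a int);
    CREATE FUNCTION make_view() RETURNS trigger LANGUAGE plpgsql AS $$
    BEGIN
        -- tries to store a persistent view over the transient transition table
        EXECUTE 'CREATE VIEW bad_view AS SELECT * FROM newtab';
        RETURN NULL;
    END $$;
    CREATE TRIGGER trg AFTER INSERT ON t
        REFERENCING NEW TABLE AS newtab
        FOR EACH STATEMENT EXECUTE FUNCTION make_view();
    INSERT INTO t VALUES (1);
    ERROR:  transition table "newtab" cannot be referenced in a persistent object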
Re: Conflict detection for update_deleted in logical replication
On Wed, Jan 8, 2025 at 2:15 PM Masahiko Sawada wrote:
>
> On Tue, Jan 7, 2025 at 2:49 AM Amit Kapila wrote:
> >
> > > We thought of another approach, which is to create/drop this slot first as
> > > soon as one enables/disables detect_update_deleted (e.g. create/drop the
> > > slot during DDL). But it seems complicated to control concurrent slot
> > > create/drop. For example, if one backend A enables detect_update_deleted,
> > > it will create a slot. But if another backend B is disabling
> > > detect_update_deleted at the same time, then the newly created slot may be
> > > dropped by backend B. I thought about checking the number of subscriptions
> > > that enable detect_update_deleted before dropping the slot in backend B,
> > > but the subscription changes caused by backend A may not be visible yet
> > > (e.g. not committed yet).
> >
> > This means that for the transaction whose changes are not yet visible,
> > we may have already created the slot and the backend B would end up
> > dropping it. Is it possible that during the change of this new option
> > via DDL, we take AccessExclusiveLock on pg_subscription as we do in
> > DropSubscription() to ensure that concurrent transactions can't drop
> > the slot? Will that help in solving the above scenario?
>
> If we create/stop the slot during DDL, how do we support rollback DDLs?

We will prevent changing this setting in a transaction block, as we
already do for slot-related cases. See the use of
PreventInTransactionBlock() in subscriptioncmds.c.

> Thank you for considering some ideas. As I mentioned above, we might
> need to consider a case like where 'CREATE SUBSCRIPTION ..
> (retain_conflict_info = true)' is rolled back. Having said that, this
> comment is just for simplifying the logic. If using TransactionId
> instead makes other parts complex, it would not make sense. I'm okay
> with leaving this part and improving the comment for
> oldest_nonremovable_xid, say, by mentioning that there is a window for
> XID wraparound happening between workers computing their
> oldest_nonremovable_xid and the pg_conflict_detection slot being created.

Fair enough. Let us see what you think about my above response first.

--
With Regards,
Amit Kapila.
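For context, a minimal sketch of the existing guard Amit refers to, modeled on CreateSubscription() in subscriptioncmds.c (the exact option tested there is illustrative; PreventInTransactionBlock() is the real API and raises an ERROR if called inside a transaction block):

    /*
     * Slot creation cannot be rolled back, so refuse to do it inside a
     * transaction block.
     */
    if (opts.create_slot)
        PreventInTransactionBlock(isTopLevel,
                                  "CREATE SUBSCRIPTION ... WITH (create_slot = true)");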
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Wed, Jan 8, 2025 at 3:21 AM Jacob Champion <jacob.champ...@enterprisedb.com> wrote:
> On Fri, Dec 20, 2024 at 2:21 PM Daniel Gustafsson wrote:
> >
> > > On 20 Dec 2024, at 02:00, Jacob Champion <jacob.champ...@enterprisedb.com> wrote:
> >
> > Thanks for the new version, I was doing a v39 review but I'll roll that
> > over into a v40 review now.
>
> (Sorry for the rug pull!)
>
> > As I was reading I was trying to identify parts that can be broken out
> > and committed ahead of time. This is not only to trim down the size, but
> > mostly to shape the final commit into a coherent single commit that
> > brings a single functionality utilizing existing APIs. Basically I think
> > we should keep generic functionality out of the final commit and keep
> > that focused on OAuth and the required APIs and infra.
>
> Sounds good.
>
> > The async auth support seemed like a candidate to go in before the rest.
> > While there won't be any consumers of it, it's also not limited to OAuth.
> > What do you think about slicing that off and getting it in ahead of time?
> > I took a small stab at separating out the generic bits (it includes the
> > PG_MAX_AUTH_TOKEN_LENGTH move as well, which is unrelated, but could also
> > be committed ahead of time) along with some small tweaks on it.
>
> +1 to separating the PG_MAX_... macro move. I will take a closer look
> at the async patch in isolation; there's some work I'm doing to fix a
> bug Kashif (cc'd) found recently, and it has me a bit unsure about my
> chosen order of operations in the async part of fe-connect.c. That
> deserves its own email, but I need to investigate more.

Thanks Jacob. Most of the testing with psql is done, and I am working on
the remaining test cases.

> Thanks!
> --Jacob
Re: Support for NO INHERIT to INHERIT state change with named NOT NULL constraints
On 2024-Nov-25, Suraj Kharage wrote:

> Another case which needs a conclusion is -
> When changing from INHERIT to NO INHERIT, we need to walk all children and
> decrement coninhcount for the corresponding constraint. If a constraint in
> one child reaches zero, should we drop it? Not sure. If we do, make sure
> to reset the corresponding attnotnull bit too. We could decide not to drop
> the constraint, in which case you don't need to reset attnotnull.

I think it's more useful if we keep such a constraint (but of course
change its conislocal to true, if it isn't that already). There are
arguments for doing both things (drop it or leave it); but if you drop it,
there's no way to put it back without scanning the table again. If you
keep it, it's easy to drop it afterwards.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Postgres is bloatware by design: it was built to house PhD theses."
(Joey Hellerstein, SIGMOD annual conference 2002)
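As a point of reference, the bookkeeping being discussed is visible in pg_constraint, so one can watch coninhcount/conislocal while experimenting. A generic catalog query (the table name is a placeholder, not tied to this patch):

    SELECT conname, contype, conislocal, coninhcount
    FROM pg_constraint
    WHERE conrelid = 'child_table'::regclass;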
Re: Conflict detection for update_deleted in logical replication
On Thu, Dec 19, 2024 at 11:11 PM Nisha Moond wrote:
>
> Here is further performance test analysis with the v16 patch-set.
>
> In the test scenarios already shared on -hackers [1], where pgbench was
> run only on the publisher node in a pub-sub setup, no performance
> degradation was observed on either node.
>
> In contrast, when pgbench was run only on the subscriber side with
> detect_update_deleted=on [2], the TPS performance was reduced due to dead
> tuple accumulation. This performance drop depended on the
> wal_receiver_status_interval -- larger intervals resulted in more dead
> tuple accumulation on the subscriber node. However, after the improvement
> in patch v16-0002, which dynamically tunes the status request, the
> default TPS reduction was limited to only 1%.
>
> We performed more benchmarks with the v16 patches where pgbench was run
> on both the publisher and subscriber, focusing on TPS performance. To
> summarize the key observations:
>
> - No performance impact on the publisher, as dead tuple accumulation does
>   not occur on the publisher.

Nice. It means that frequently getting in-commit-phase transactions by
the subscriber didn't have a negative impact on the publisher's
performance.

> - The performance is reduced on the subscriber side (TPS reduction (~50%)
>   [3]) due to dead tuple retention for the conflict detection when
>   detect_update_deleted=on.
>
> - Performance reduction happens only on the subscriber side, as the
>   workload on the publisher is pretty high and the apply workers must
>   wait for the amount of transactions with earlier timestamps to be
>   applied and flushed before advancing the non-removable XID to remove
>   dead tuples.

Assuming that the performance dip happened due to dead tuple retention
for the conflict detection, would TPS on other databases also be
affected?

> [3] Test with pgbench run on both publisher and subscriber.
>
> Test setup:
> - Tests performed on pgHead + v16 patches
> - Created a pub-sub replication system.
> - Parameters for both instances were:
>
>       shared_buffers = 30GB
>       min_wal_size = 10GB
>       max_wal_size = 20GB
>       autovacuum = false

Since you disabled autovacuum on the subscriber, dead tuples created by
non-HOT updates are accumulated anyway regardless of the
detect_update_deleted setting, is that right?

> Test Run:
> - Ran pgbench (read-write) on both the publisher and the subscriber with
>   30 clients for a duration of 120 seconds, collecting data over 5 runs.
> - Note that pgbench was running on different tables on pub and sub.
>   (The scripts used for the test, "case1-2_measure.sh" and
>   "case1-2_setup.sh", are attached.)
>
> Results:
>
>       Run#        pub TPS     sub TPS
>       1           32209       13704
>       2           32378       13684
>       3           32720       13680
>       4           31483       13681
>       5           31773       13813
>       median      32209       13684
>       regression  7%          -53%

What was the TPS on the subscriber when detect_update_deleted = false?
And how much were the tables bloated compared to when
detect_update_deleted = false?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Re: Re: proposal: schema variables
hi.

You forgot to change acldefault: it should add 'V' for SESSION VARIABLE in
doc/src/sgml/ddl.sgml.

Maybe some examples of session variables being shadowed would be great,
because doc/src/sgml/ref/create_variable.sgml says "Session variables can
be shadowed by other identifiers. For details, see ." If I click the link,
then in ddl.sgml there is also a bunch of text saying that a variable will
be shadowed in some situations, but there are no simple examples to
demonstrate that.

"Create an date session variable var1:" maybe it should be rephrased as
"Create a session variable var1 as date data type"? (I am not a native
English speaker.)

---

    --- a/src/include/nodes/primnodes.h
    +++ b/src/include/nodes/primnodes.h
    @@ -379,7 +379,8 @@ typedef struct Param
         Expr        xpr;
         ParamKind   paramkind;      /* kind of parameter. See above */
         int         paramid;        /* numeric ID for parameter */
    -    Oid         paramtype;      /* pg_type OID of parameter's datatype */
    +    /* pg_type OID of parameter's datatype */
    +    Oid         paramtype pg_node_attr(query_jumble_ignore);

I think we need the above to make "select v2;" and "select v1;" be
normalized as one query.

---

When we create a new table, we do something like this:

    DefineRelation
        heap_create_with_catalog
            GetNewRelFileNumber
                GetNewOidWithIndex

Relation OID uniqueness and variable OID uniqueness are the same thing. If
the variable OID uniqueness problem were ever reached, at that moment we
should care more about the relation OID uniqueness problem. And in
GetNewRelFileNumber, we have this comment:

    * As with GetNewOidWithIndex(), there is some theoretical risk of a race
    * condition, but it doesn't seem worth worrying about.

There are also comments in GetNewOidWithIndex:

    * Note that we are effectively assuming that the table has a relatively small
    * number of entries (much less than 2^32) and there aren't very long runs of
    * consecutive existing OIDs. This is a mostly reasonable assumption for
    * system catalogs.

Does that mean pg_catalog.pg_variable.varcreate_lsn is not really
necessary?

---

I think the latest patch for LET self-assign ACL SELECT is not correct;
previously I also tried the same idea. Test demo:

    CREATE TYPE vartest_t1 AS (a int, b int);
    CREATE VARIABLE var1 AS vartest_t1;
    CREATE ROLE regress_var_test_role;
    GRANT UPDATE ON VARIABLE var1 TO regress_var_test_role;
    GRANT SELECT ON TABLE tenk1 TO regress_var_test_role;
    SET ROLE TO regress_var_test_role;
    -- both should fail
    LET var1.a = var1.a + 10;
    LET var1.a = (select * from (select count(*) from tenk1 where unique1 = var1.a + 10));
Re: pg_settings.unit and DefineCustomXXXVariable
On Wed, 8 Jan 2025 at 11:13, Luca Ferrari wrote: > > Hi all, > I need to define a few GUCs, and for that purpose I'm using > DefineCustomXXXVariable functions that provide hooks for assignment, > show and check. However it is not clear to me if it is possible to > populate the unit column in pg_settings for the custom defined > variables, and if so, how. DefineCustomXXXVariable has a 'flags' argument, into which GUC_UNIT_* can be or-ed. AFAIK, the options provided in guc.h are the only kinds of units supported and available, and I think only integer variables actually support the unit conversion implied by the unit options. Kind regards, Matthias van de Meent
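To make that concrete, here is a sketch of defining a custom integer GUC with a unit; the extension name "myext" and the variable are hypothetical, while DefineCustomIntVariable() and the GUC_UNIT_* flags are the real APIs:

    #include "postgres.h"
    #include <limits.h>
    #include "fmgr.h"
    #include "utils/guc.h"

    PG_MODULE_MAGIC;

    static int myext_timeout_ms = 1000;

    void
    _PG_init(void)
    {
        DefineCustomIntVariable("myext.timeout",
                                "Timeout for myext operations.",
                                NULL,               /* no long description */
                                &myext_timeout_ms,
                                1000,               /* boot value */
                                0, INT_MAX,         /* min, max */
                                PGC_USERSET,
                                GUC_UNIT_MS,        /* pg_settings.unit shows "ms" */
                                NULL, NULL, NULL);  /* check/assign/show hooks */
    }

With GUC_UNIT_MS or-ed into the flags, pg_settings reports the unit and inputs like '5s' are converted automatically.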
Re: Conflict detection for update_deleted in logical replication
On Wed, Jan 8, 2025 at 1:53 AM Amit Kapila wrote:
>
> On Wed, Jan 8, 2025 at 3:02 PM Masahiko Sawada wrote:
> >
> > On Thu, Dec 19, 2024 at 11:11 PM Nisha Moond wrote:
> > >
> > > Here is further performance test analysis with the v16 patch-set.
> > >
> > > In the test scenarios already shared on -hackers [1], where pgbench was
> > > run only on the publisher node in a pub-sub setup, no performance
> > > degradation was observed on either node.
> > >
> > > In contrast, when pgbench was run only on the subscriber side with
> > > detect_update_deleted=on [2], the TPS performance was reduced due to
> > > dead tuple accumulation. This performance drop depended on the
> > > wal_receiver_status_interval -- larger intervals resulted in more dead
> > > tuple accumulation on the subscriber node. However, after the
> > > improvement in patch v16-0002, which dynamically tunes the status
> > > request, the default TPS reduction was limited to only 1%.
> > >
> > > We performed more benchmarks with the v16 patches where pgbench was run
> > > on both the publisher and subscriber, focusing on TPS performance. To
> > > summarize the key observations:
> > >
> > > - No performance impact on the publisher, as dead tuple accumulation
> > >   does not occur on the publisher.
> >
> > Nice. It means that frequently getting in-commit-phase transactions by
> > the subscriber didn't have a negative impact on the publisher's
> > performance.
> >
> > > - The performance is reduced on the subscriber side (TPS reduction
> > >   (~50%) [3]) due to dead tuple retention for the conflict detection
> > >   when detect_update_deleted=on.
> > >
> > > - Performance reduction happens only on the subscriber side, as the
> > >   workload on the publisher is pretty high and the apply workers must
> > >   wait for the amount of transactions with earlier timestamps to be
> > >   applied and flushed before advancing the non-removable XID to remove
> > >   dead tuples.
> >
> > Assuming that the performance dip happened due to dead tuple retention
> > for the conflict detection, would TPS on other databases also be
> > affected?
>
> As we use slot->xmin to retain dead tuples, shouldn't the impact be
> global (means on all databases)?

I think so too.

> > > [3] Test with pgbench run on both publisher and subscriber.
> > >
> > > Test setup:
> > > - Tests performed on pgHead + v16 patches
> > > - Created a pub-sub replication system.
> > > - Parameters for both instances were:
> > >
> > >       shared_buffers = 30GB
> > >       min_wal_size = 10GB
> > >       max_wal_size = 20GB
> > >       autovacuum = false
> >
> > Since you disabled autovacuum on the subscriber, dead tuples created by
> > non-HOT updates are accumulated anyway regardless of the
> > detect_update_deleted setting, is that right?
>
> I think the hot-pruning mechanism during the update operation will remove
> dead tuples even when autovacuum is disabled.

True, but why did it disable autovacuum? It seems that case1-2_setup.sh
doesn't specify fillfactor, which makes HOT updates less likely to happen.
I understand that a certain performance dip happens due to dead tuple
retention, which is fine, but I'm surprised that the TPS decreased by 50%
within 120 seconds. Does the TPS get even worse for a longer test? I did a
quick benchmark where I completely disabled removing dead tuples (by
autovacuum=off and a logical slot) and ran pgbench, but I didn't see such
a precipitous dip.
Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Coccinelle for PostgreSQL development [1/N]: coccicheck.py
On 2025-01-07 Tu 2:44 PM, Mats Kindahl wrote:

> I got some time over during the holidays, so I spent some of it doing
> something I've been thinking about for a while.
>
> For those of you that are not aware of it: Coccinelle is a tool for
> pattern matching and text transformation for C code and can be used for
> detection of problematic programming patterns and to make complex,
> tree-wide patches easy. It is aware of the structure of C code and is
> better suited to making complicated changes than what is possible using
> normal text substitution tools like Sed and Perl.
>
> Coccinelle has been used successfully in the Linux project since 2008 and
> is now an established tool for Linux development, and a large number of
> semantic patches have been added to the source tree to capture everything
> from generic issues (like eliminating the redundant A in expressions like
> "!A || (A && B)") to more Linux-specific problems like adding a missing
> call to kfree().
>
> Although PostgreSQL is nowhere near the size of the Linux kernel, it is
> nevertheless of a significant size and would benefit from incorporating
> Coccinelle into the development. I noticed it's been used in a few cases
> way back (like 10 years back) to fix issues in the PostgreSQL code, but I
> thought it might be useful to make it part of normal development practice
> to, among other things:
>
> - Identify and correct bugs in the source code both during development
>   and review.
> - Make large-scale changes to the source tree to improve the code based
>   on new insights.
> - Encode and enforce APIs by ensuring that function calls are used
>   correctly.
> - Use improved coding patterns for more efficient code.
> - Allow extensions to automatically update code for later PostgreSQL
>   versions.
>
> To that end, I created a series of patches to show how it could be used
> in the PostgreSQL tree. It is a lot easier to discuss concrete code, and
> I split it up into separate messages since that makes it easier to
> discuss each individual patch. The series contains code to make it easy
> to work with Coccinelle during development and reviews, as well as
> examples of semantic patches that capture problems, demonstrate how to
> make large-scale changes, how to enforce APIs, and also improve some
> coding patterns.
>
> This first patch contains the coccicheck.py script, which is a
> re-implementation of the coccicheck script that the Linux kernel uses. We
> cannot immediately use the coccicheck script since it is quite closely
> tied to the Linux source code tree, and we need to have something that
> supports both autoconf and Meson. Since Python seems to be used more and
> more in the tree, it seems to be the most natural choice. (I have no
> strong opinion on what language to use, but think it would be good to
> have something that is as platform-independent as possible.)
>
> The intention is that we should be able to use the Linux semantic patches
> directly, so it supports the "Requires" and "Options" keywords, which can
> be used to require a specific version of spatch(1) and add options to the
> execution of that semantic patch, respectively.

Please don't start multiple threads like this. If you want to submit a set
of patches for a single feature, send them all as attachments in a single
email. Otherwise this just makes life hard for threading email readers.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
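For readers unfamiliar with the tool, here is a small semantic patch of the kind Mats describes, written in SmPL (Coccinelle's patch language). It rewrites the redundant-A pattern mentioned above; the file name in the invocation is an assumption:

    @@
    expression A, B;
    @@
    - !A || (A && B)
    + !A || B

    // run with, e.g.: spatch --sp-file excluded_middle.cocci --dir src/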
Re: per backend WAL statistics
Hi,

On Wed, Jan 08, 2025 at 03:21:26PM +0900, Michael Paquier wrote:
> On Tue, Jan 07, 2025 at 08:48:51AM +0000, Bertrand Drouvot wrote:
> > Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
> > per backend IO statistics), we can more easily add per backend statistics.
> >
> > Please find attached a patch to implement $SUBJECT.
>
> I've looked at v1-0002 and v1-0001.

Thanks for looking at it!

> +static void
> +pgstat_flush_io_entry(PgStat_EntryRef *entry_ref, bool nowait, bool need_lock)
>  {
> -	PgStatShared_Backend *shbackendioent;
> -	PgStat_BackendPendingIO *pendingent;
> +	PgStatShared_Backend *shbackendent;
> +	PgStat_BackendPending *pendingent;
>  	PgStat_BktypeIO *bktype_shstats;
> +	PgStat_BackendPendingIO *pending_io;
>
> -	if (!pgstat_lock_entry(entry_ref, nowait))
> -		return false;
> +	if (need_lock && !pgstat_lock_entry(entry_ref, nowait))
> +		return;
>
> The addition of need_lock at this level leads to a result that seems a
> bit confusing, where pgstat_backend_flush_cb() passes "false" because
> it locks the entry by itself as an effect of v1-0003 with the new area
> for WAL. Wouldn't it be cleaner to do an extra pgstat_[un]lock_entry
> dance in pgstat_backend_flush_io() instead? Another approach I can
> think of that would be slightly cleaner to me is to pass a bits32 to a
> single routine that would control if WAL stats, I/O stats or both
> should be flushed, keeping pgstat_flush_backend() as name with an
> extra argument to decide which parts of the stats should be flushed.

Yeah, that's more elegant, as it also means that the main callback will
not change (should we add even more stats in the future). Done that way
in the attached v2.

> -PgStat_BackendPendingIO *
> +PgStat_BackendPending *
>
> This rename makes sense.
>
> #define PG_STAT_GET_WAL_COLS 9
> 	TupleDesc	tupdesc;
> 	Datum		values[PG_STAT_GET_WAL_COLS] = {0};
> 	bool		nulls[PG_STAT_GET_WAL_COLS] = {0};
>
> It feels unnatural to have a PG_STAT_GET_WAL_COLS while it would not
> only relate to this function anymore.

Has been renamed to PG_STAT_WAL_COLS in the attached.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

>From 190422763353e87f5208e8f6d1aee823eb7860e5 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot
Date: Mon, 6 Jan 2025 07:51:27 +0000
Subject: [PATCH v2 1/3] Extract logic filling pg_stat_get_wal()'s tuple into
 its own routine

This commit adds pg_stat_wal_build_tuple(), a helper routine for
pg_stat_get_wal(), that fills its tuple based on the contents of
PgStat_WalStats.

This will be used in a follow-up commit that uses the same structures as
pg_stat_wal for reporting, but for the PGSTAT_KIND_BACKEND statistics
kind.
---
 src/backend/utils/adt/pgstatfuncs.c | 56 ++++++++++++++++-----------
 1 file changed, 36 insertions(+), 20 deletions(-)
 100.0% src/backend/utils/adt/

diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 3245f3a8d8..7309f06993 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1560,20 +1560,22 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
 }
 
 /*
- * Returns statistics of WAL activity
+ * pg_stat_wal_build_tuple
+ *
+ * Helper routine for pg_stat_get_wal() returning one tuple based on the
+ * contents of wal_stats.
 */
-Datum
-pg_stat_get_wal(PG_FUNCTION_ARGS)
+static Datum
+pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
 {
-#define PG_STAT_GET_WAL_COLS	9
+#define PG_STAT_WAL_COLS	9
 	TupleDesc	tupdesc;
-	Datum		values[PG_STAT_GET_WAL_COLS] = {0};
-	bool		nulls[PG_STAT_GET_WAL_COLS] = {0};
+	Datum		values[PG_STAT_WAL_COLS] = {0};
+	bool		nulls[PG_STAT_WAL_COLS] = {0};
 	char		buf[256];
-	PgStat_WalStats *wal_stats;
 
 	/* Initialise attributes information in the tuple descriptor */
-	tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_WAL_COLS);
+	tupdesc = CreateTemplateTupleDesc(PG_STAT_WAL_COLS);
 	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "wal_records",
 					   INT8OID, -1, 0);
 	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "wal_fpi",
@@ -1595,34 +1597,48 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
 
 	BlessTupleDesc(tupdesc);
 
-	/* Get statistics about WAL activity */
-	wal_stats = pgstat_fetch_stat_wal();
-
 	/* Fill values and NULLs */
-	values[0] = Int64GetDatum(wal_stats->wal_records);
-	values[1] = Int64GetDatum(wal_stats->wal_fpi);
+	values[0] = Int64GetDatum(wal_stats.wal_records);
+	values[1] = Int64GetDatum(wal_stats.wal_fpi);
 
 	/* Convert to numeric. */
-	snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats->wal_bytes);
+	snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats.wal_bytes);
 	values[2] = DirectFunctionCall3(numeric_in,
 									CStringGetDatum(buf),
 									ObjectIdGetDatum(0),
 									Int32GetDatum(-1));
 
-	values[3] = Int64GetDatum(wal_stats->wal_buffers_full);
-	values[4]
Re: Enhancing Memory Context Statistics Reporting
Hi Fujii-san,

Thank you for testing the feature.

> Issue 1: Error with pg_get_process_memory_contexts()
> When I used pg_get_process_memory_contexts() on the PID of a backend
> process that had just caused an error but hadn't rolled back yet,
> the following error occurred:
>
>    Session 1 (PID=70011):
>    =# begin;
>    =# select 1/0;
>    ERROR: division by zero
>
>    Session 2:
>    =# select * from pg_get_process_memory_contexts(70011, false);
>
>    Session 1 terminated with:
>    ERROR: ResourceOwnerEnlarge called after release started
>    FATAL: terminating connection because protocol synchronization was lost

In this scenario, a DSM segment descriptor is created and associated with
the CurrentResourceOwner, which is set to the aborting transaction's
resource owner. This occurs when ProcessGetMemoryContextInterrupts is
called by the backend while a transaction is still open and about to be
rolled back.

I believe this issue needs to be addressed in the DSA and DSM code by
adding a check to ensure that the CurrentResourceOwner is not about to be
released before creating a DSM under the CurrentResourceOwner. The
attached fix resolves this issue. However, for a more comprehensive
solution, I believe the same change should be extended to other parts of
the DSA and DSM code where CurrentResourceOwner is referenced.

> Issue 2: Segmentation Fault
> When I ran pg_get_process_memory_contexts() every 0.1 seconds using the
> \watch command while running "make -j 4 installcheck-world",
> I encountered a segmentation fault:
>
>    LOG: client backend (PID 97975) was terminated by signal 11: Segmentation fault: 11
>    DETAIL: Failed process was running: select infinite_recurse();
>    LOG: terminating any other active server processes

I have not been able to reproduce this issue. Could you please clarify
which process you ran pg_get_process_memory_contexts() on, with the
interval of 0.1? Was it a backend process created by "make
installcheck-world", or some other process?

Thank you,
Rahila Syed

fix_for_resource_owner_error.patch
Re: Enhancing Memory Context Statistics Reporting
On 2025/01/08 21:03, Rahila Syed wrote:
> I have not been able to reproduce this issue. Could you please clarify
> which process you ran pg_get_process_memory_contexts() on, with the
> interval of 0.1?

I used the following query for testing:

    =# SELECT count(*) FROM pg_stat_activity,
           pg_get_process_memory_contexts(pid, false)
       WHERE pid <> pg_backend_pid();
    =# \watch 0.1

> Was it a backend process created by "make installcheck-world", or some
> other process?

Yes, the target backends were from make installcheck-world. No other
workloads were running.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Tue, Jan 7, 2025 at 2:24 PM Jacob Champion wrote:
> Along those lines, though, Michael Paquier suggested that maybe I
> could pull the require_auth prefactoring up to the front of the
> patchset. That might look a bit odd until OAuth support lands, since
> it won't be adding any new useful value, but I will give it a shot.

While I take a look at the async patch from upthread, here is my attempt
at pulling the require_auth change out.

Note that there's a dead branch that cannot be exercised until OAuth
lands. We're not going to process the SASL mechanism name at all if no
mechanisms are allowed to begin with, and right now SASL is synonymous
with SCRAM. I can change that by always allowing AuthenticationSASL
messages -- even if none of the allowed authentication types use SASL --
but that approach didn't seem to generate excitement on- or off-list the
last time I proposed it [1].

Thanks,
--Jacob

[1] https://postgr.es/m/CAAWbhmg%2BGzNMK5Li182BKSbzoFVaKk_dDJ628NnuV80GqYgFFg%40mail.gmail.com

commit f8bbb0f2a2e0f4840bdeb5c9a7b9a35797280aaf
Author: Jacob Champion
Date:   Mon Dec 16 13:57:14 2024 -0800

    require_auth: prepare for multiple SASL mechanisms

    Prior to this patch, the require_auth implementation assumed that the
    AuthenticationSASL protocol message was synonymous with SCRAM-SHA-256.
    In preparation for the OAUTHBEARER SASL mechanism, split the
    implementation into two tiers: the first checks the acceptable
    AUTH_REQ_* codes, and the second checks acceptable mechanisms if
    AUTH_REQ_SASL et al are permitted.

    conn->allowed_sasl_mechs is the list of pointers to acceptable
    mechanisms. (Since we'll support only a small number of mechanisms,
    this is an array of static length to minimize bookkeeping.)
    pg_SASL_init() will bail if the selected mechanism isn't contained in
    this array.

    Since there's only one mechanism supported right now, one branch of
    the second tier cannot be exercised yet (it's marked with
    Assert(false)). This assertion will need to be removed when the next
    mechanism is added.

diff --git a/src/interfaces/libpq/fe-auth.c b/src/interfaces/libpq/fe-auth.c
index 14a9a862f51..722bb47ee14 100644
--- a/src/interfaces/libpq/fe-auth.c
+++ b/src/interfaces/libpq/fe-auth.c
@@ -543,6 +543,35 @@ pg_SASL_init(PGconn *conn, int payloadlen)
 		goto error;
 	}
 
+	/* Make sure require_auth is satisfied. */
+	if (conn->require_auth)
+	{
+		bool		allowed = false;
+
+		for (int i = 0; i < lengthof(conn->allowed_sasl_mechs); i++)
+		{
+			if (conn->sasl == conn->allowed_sasl_mechs[i])
+			{
+				allowed = true;
+				break;
+			}
+		}
+
+		if (!allowed)
+		{
+			/*
+			 * TODO: this is dead code until a second SASL mechanism is added;
+			 * the connection can't have proceeded past check_expected_areq()
+			 * if no SASL methods are allowed.
+			 */
+			Assert(false);
+
+			libpq_append_conn_error(conn, "authentication method requirement \"%s\" failed: server requested %s authentication",
+									conn->require_auth, selected_mechanism);
+			goto error;
+		}
+	}
+
 	if (conn->channel_binding[0] == 'r' &&	/* require */
 		strcmp(selected_mechanism, SCRAM_SHA_256_PLUS_NAME) != 0)
 	{
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 8f211821eb2..6f262706b0a 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -1110,6 +1110,56 @@ libpq_prng_init(PGconn *conn)
 	pg_prng_seed(&conn->prng_state, rseed);
 }
 
+/*
+ * Fills the connection's allowed_sasl_mechs list with all supported SASL
+ * mechanisms.
+ */
+static inline void
+fill_allowed_sasl_mechs(PGconn *conn)
+{
+	/*---
+	 * We only support one mechanism at the moment, so rather than deal with a
+	 * linked list, conn->allowed_sasl_mechs is an array of static length. We
+	 * rely on the compile-time assertion here to keep us honest.
+	 *
+	 * To add a new mechanism to require_auth,
+	 * - update the length of conn->allowed_sasl_mechs,
+	 * - add the new pg_fe_sasl_mech pointer to this function, and
+	 * - handle the new mechanism name in the require_auth portion of
+	 *   pqConnectOptions2(), below.
+	 */
+	StaticAssertDecl(lengthof(conn->allowed_sasl_mechs) == 1,
+					 "fill_allowed_sasl_mechs() must be updated when resizing conn->allowed_sasl_mechs[]");
+
+	conn->allowed_sasl_mechs[0] = &pg_sc
Re: New GUC autovacuum_max_threshold ?
On 1/7/25 23:57, Nathan Bossart wrote:
> Here is a rebased patch for cfbot. AFAICT we are still pretty far from
> consensus on which approach to take, unfortunately.

For what it's worth, although I would have preferred the sub-linear growth
thing, I'd much rather have this than nothing. And I have to admit that
the proposed formulas were either too convoluted or wrong. This very patch
is more straightforward.

Please let me know if I can help and how.
Re: psql: Option to use expanded mode for various meta-commands
On Mon, 30 Dec 2024 at 15:48, Greg Sabino Mullane wrote:
>
> I like this, very useful. It's a shame about the conflict with \dx
> (lesson for the future: think extra carefully about option namings!). I
> am impressed that \dx \d \d+ \d+x and even \dxx all work as one might
> intuit with this patch.

Thanks for looking. Attached is a more complete patch, now with

- trivial bug-fix for previous \d code, per cfbot
- expanded mode support for \l and \z
- updated psql help
- a few representative regression test cases

The majority of the patch is doc updates, which are somewhat tedious.
Initially, I resisted documenting the 'x' option in the description of
every command affected, but given the length of the list of commands in
the HTML page, it's a long way from any given command to the top or
bottom where 'x' is described in more detail. So in the end, I decided to
just add a sentence to each command's description, keeping it as short as
possible.

Regards,
Dean

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
new file mode 100644
index 72f3347..c48bcfb
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -865,6 +865,13 @@ testdb=>
+    Many of the meta-commands also allow x to be appended
+    as an option. This will cause the results to be displayed in expanded
+    mode, as if expanded=on were included in the list of
+    \pset options. See also \x.
+
     The following meta-commands are defined:

@@ -1272,7 +1279,7 @@ SELECT $1 \parse stmt1
-        \d[S+] [ pattern ]
+        \d[Sx+] [ pattern ]

@@ -1321,12 +1328,24 @@ SELECT $1 \parse stmt1
         foreign tables. This is purely a convenience measure.
+
+        As with many other commands, if x is appended to
+        the command name, the results are displayed in expanded mode, but note
+        that this only applies when \d is used without a
+        pattern argument, and
+        the x modifier cannot appear immediately after the
+        \d (because \dx is a different
+        command); the x modifier may only appear after an
+        S or + modifier. For example,
+        \d+x is equivalent to \dtvmsE+x
+        and will show a list of all relations in expanded mode.
+

@@ -1337,19 +1356,23 @@ SELECT $1 \parse stmt1
-        \da[S] [ pattern ]
+        \da[Sx] [ pattern ]

         By default, only user-created objects are shown; supply a
         pattern or the S modifier to include system objects.
+        If x is appended to the command name, the results
+        are displayed in expanded mode.

@@ -1357,7 +1380,7 @@ SELECT $1 \parse stmt1
-        \dA[+] [ pattern ]
+        \dA[x+] [ pattern ]

         Lists access methods. If pattern is specified, only access
-        methods whose names match the pattern are shown. If
-        + is appended to the command name, each access
+        methods whose names match the pattern are shown.
+        If x is appended to the command name, the results
+        are displayed in expanded mode.
+        If + is appended to the command name, each access
         method is listed with its associated handler function and
         description.

-        \dAc[+]
+        \dAc[x+]
         [access-method-pattern
         [input-type-pattern]]

@@ -1372,6 +1395,8 @@ SELECT $1 \parse stmt1
         If input-type-pattern
         is specified, only operator classes associated with input types whose
         names match that pattern are listed.
+        If x is appended to the command name, the results
+        are displayed in expanded mode.
         If + is appended to the command name, each operator
         class is listed with its associated operator family and owner.
@@ -1380,7 +1405,7 @@ SELECT $1 \parse stmt1
-        \dAf[+]
+        \dAf[x+]
         [access-method-pattern
         [input-type-pattern]]

@@ -1395,6 +1420,8 @@ SELECT $1 \parse stmt1
         If input-type-pattern
         is specified, only operator families associated with input types whose
         names match that pattern are listed.
+        If x is appended to the command name, the results
+        are displayed in expanded mode.
         If + is appended to the command name, each operator
         family is listed with its owner.

@@ -1403,7 +1430,7 @@ SELECT $1 \parse stmt1
-        \dAo[+]
+        \dAo[x+]
         [access-method-pattern
         [operator-family-pattern]]

@@ -1419,6 +1446,8 @@ SELECT $1 \parse stmt1
         If operator-family-pattern
         is specified, only members of operator families whose names match that
         pattern are listed.
+        If x is
Logical replication - proposal for a custom conflict resolution function
Hello hackers,

I'd like some feedback on a logical replication feature I would like to
write a patch for. The feature would allow users to register a custom
conflict handler function to be executed for each conflicting row, after
the built-in conflict resolver has run. Running after the built-in
conflict resolver allows the function to use the value provided by the
resolver, for instance for custom logging.

The proposed programming model for the function follows the model for
trigger functions: a set of read-only variables is populated in the
top-level block, and the function returns a record or NULL. The conflict
handler function would have more variables. At minimum, three records -
the local version of the new value, the remote version of the new value,
and the value returned by the built-in resolver function - rather than the
two for a trigger function, plus the timestamp information for the local
and remote updates, the conflict type detected, and the resolver used.

The new syntax proposed may look something like this:

    CREATE SUBSCRIPTION subscription_name
        CONNECTION 'conninfo'
        PUBLICATION publication_name
        CONFLICT RESOLVER (conflict_type1 = resolver1 [, conflict_typeN = resolverN, ...])
        [conflict_handler = my_custom_function]  -- allow users to pass a custom conflict resolution function

Please let me know your thoughts, thanks!

Diego
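To make the programming model concrete, here is a purely hypothetical sketch of what such a handler could look like. The special variables (CONFLICT_TYPE, RESOLVER_USED, LOCAL_ROW, REMOTE_ROW, RESOLVED_ROW) are invented names by analogy with the TG_* variables of trigger functions, not an existing API:

    CREATE FUNCTION my_custom_function() RETURNS record AS $$
    BEGIN
        -- custom logging using the value chosen by the built-in resolver
        RAISE LOG 'conflict % resolved by %: local=%, remote=%, resolved=%',
                  CONFLICT_TYPE, RESOLVER_USED, LOCAL_ROW, REMOTE_ROW, RESOLVED_ROW;

        -- returning the resolver's value accepts it; returning NULL would skip
        RETURN RESOLVED_ROW;
    END;
    $$ LANGUAGE plpgsql;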
Re: Virtual generated columns
On Wed, Jan 8, 2025 at 16:23, Vik Fearing wrote:

> This is lying to the planner, and you get to enjoy whatever breaks
> because of it. A function that accesses external data is not immutable;
> it is stable at best.

I understand that, but it's not documented, so users can think that doing
it that way is fine. So it would be good to explain why this approach
could break this or that.

regards
Marcos
Re: using PGOPTIONS in meson
Hi,

On 2025-01-08 23:27:42 +0800, jian he wrote:
> hi.
> I am wondering about ways to do
>   make check PGOPTIONS="-c debug_parallel_query=regress -c work_mem=50MB"
> the meson way, especially the PGOPTIONS part.
>
> I have looked at [1] and [2].
> [1] https://www.postgresql.org/docs/current/regress-run.html#REGRESS-RUN-CUSTOM-SETTINGS
> [2] https://wiki.postgresql.org/wiki/Meson

This isn't really a make-specific thing, i.e. there's no make-specific
logic for PGOPTIONS. Options passed to make just end up as environment
variables, and libpq looks at environment variables. Which means you just
need to set an environment variable, which can be done without make:

    PGOPTIONS="..." meson test ...

or

    export PGOPTIONS="..."; meson test ...

Greetings,

Andres Freund
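A concrete invocation, assuming a configured build directory named "build" (the suite names are whatever "meson test -C build --list" shows for your tree):

    PGOPTIONS="-c debug_parallel_query=regress -c work_mem=50MB" \
        meson test -C build --suite setup --suite regress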
Re: Fwd: Re: A new look at old NFS readdir() problems?
Bruce Momjian writes: > Will people now always get a clear error on failure? Crazy idea, but > could we have initdb or postmaster start test this? It'd require adding quite a lot of cycles: I'd guess you'd need to create/drop a hundred or a thousand files to be sure of seeing the problem. Another problem is that even if the primary data directory is okay, a tablespace could be created on a filesystem that's not okay. I doubt that it's worth it. regards, tom lane
Re: Fwd: Re: A new look at old NFS readdir() problems?
On Wed, Jan 8, 2025 at 03:21:27PM -0500, Tom Lane wrote: > Bruce Momjian writes: > > Will people now always get a clear error on failure? Crazy idea, but > > could we have initdb or postmaster start test this? > > It'd require adding quite a lot of cycles: I'd guess you'd need to > create/drop a hundred or a thousand files to be sure of seeing the > problem. > > Another problem is that even if the primary data directory is okay, > a tablespace could be created on a filesystem that's not okay. > > I doubt that it's worth it. Oh, I thought if it could be recreated using just two files, it would be worth it --- obviously not. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: improve DEBUG1 logging of parallel workers for CREATE INDEX?
Hi,

On Wed, Jan 8, 2025 at 18:37, Sami Imseih wrote:
> > s/parallel vacuum workers for index vacuuming/parallel workers for index
> > creation/?
>
> oops, that's my oversight from copying the message from vacuum. fixed.
>
> > I'd add "in parallel" to match its counterpart "serially" above. That
> > would make it more clear in case one just looks for "building index" in
> > the log.
>
> good point.
>
> Below is the new output with the attached v2.
>
>     postgres=# create index foo_idx1 on foo(id);
>     DEBUG:  building index "foo_idx1" on table "foo" with parallel workers
>     DEBUG:  launched 1 parallel workers for index creation (planned: 1)
>     DEBUG:  index "foo_idx1" can safely use deduplication
>     CREATE INDEX

I tend to agree it might be better than Benoit's patch on the index
messages, though I'm afraid that the DEBUG1 level won't work for many
users. DEBUGx levels are for developers, not users. A better log level for
me would be LOG. For client messages, LOG messages won't be displayed by
default, so there's still a need for a "SET client_min_messages TO log",
but it won't be necessary to go all the way to DEBUG1.

Regards.

--
Guillaume.
Re: use a non-locking initial test in TAS_SPIN on AArch64
Nathan Bossart writes: > Are there any objections to proceeding with this change? So far, it's been > tested on a c8g.24xlarge and an Apple M3 (which seems to be too small to > show any effect). If anyone has access to a larger ARM machine, additional > testing would be greatly appreciated. I think it would be unfortunate if > this slipped to v19. I just acquired an M4 Pro, which may also be too small to show any effect, but perhaps running the test there would at least give us more confidence that there's not a bad effect. Which test case(s) would you recommend trying? regards, tom lane
Re: AIO v2.0
Hi,

On 2025-01-08 15:04:39 +0100, Jakub Wartak wrote:
> On Mon, Jan 6, 2025 at 5:28 PM Andres Freund wrote:
> > I didn't think that pg_stat_* was quite the right namespace, given that
> > it shows not stats, but the currently ongoing IOs. I am going with
> > pg_aios for now, but I don't particularly like that.
>
> If you are looking for other proposals:
> * pg_aios_progress ? (to follow the pattern of pg_stat_copy|vacuum_progress?)
> * pg_debug_aios ?
> * pg_debug_io ?

I think pg_aios is better than those, if not by much. Seems others are ok
with that name too. And we can easily evolve it later.

> > I think we'll want a pg_stat_aio as well, tracking things like:
> >
> > - how often the queue to IO workers was full
> > - how many times we submitted IO to the kernel (<= #ios with io_uring)
> > - how many times we asked the kernel for events (<= #ios with io_uring)
> > - how many times we had to wait for in-flight IOs before issuing more IOs
>
> If I could dream of one thing, it would be the 99.9th percentile of IO
> response times in milliseconds for different classes of I/O traffic
> (read/write/flush). But it sounds like it would be very similar to
> pg_stat_io and potentially would have to be
> per-tablespace/IO-traffic(subject)-type too.

Yea, that's a significant project on its own. It's not that cheap to
compute reasonably accurate percentiles and we have no infrastructure for
doing so right now.

> AFAIU pg_stat_io has an improper structure to have that there.

Hm, not obvious to me why? It might make the view a bit wide to add it as
an additional column, but otherwise I don't see a problem?

> BTW: before trying to even start to compile that AIO v2.2* and responding
> to the previous review, what are you most interested in hearing about, so
> that the feedback adds some value?

Due to the rather limited "users" of AIO in the patchset, I think most
benchmarks aren't expected to show any meaningful gains. However, they
shouldn't show any significant regressions either (when not using direct
IO). I think trying to find regressions would be a rather valuable thing.

I'm tempted to collect a few of the reasonably-ready read stream
conversions into the patchset, to make the potential gains more visible.
But I am not sure it's a good investment of time right now.

One small regression I do know about, namely scans of large relations that
are bigger than shared buffers but do fit in the kernel page cache. The
increase of BAS_BULKREAD does cause a small slowdown - but without it we
can never do sufficient asynchronous IO. I think the slowdown is small
enough to just accept that, but it's worth qualifying that on a few
machines.

> Any workload-specific measurements? Just general feedback, functionality
> gaps?

To see the benefits it'd be interesting to compare:

1) sequential scan performance with data not in shared buffers, using
   buffered IO
2) same, but using direct IO when testing the patch
3) checkpoint performance

In my experiments 1) gains a decent amount of performance in many cases,
but nothing overwhelming - sequential scans are easy for the kernel to
read ahead.
I do see very significant gains for 2) - on a system with 10 striped NVMe
SSDs that each can do ~3.5 GB/s I measured very parallel sequential scans
(I had to use ALTER TABLE to get sufficient numbers of workers):

    master:                  ~18 GB/s
    patch, buffered:         ~20 GB/s
    patch, direct, worker:   ~28 GB/s
    patch, direct, uring:    ~35 GB/s

This was with io_workers=32, io_max_concurrency=128,
effective_io_concurrency=1000 (it doesn't need to be that high, but it's
what I still have the numbers for). This was without data checksums
enabled, as otherwise the checksum code becomes a *huge* bottleneck.

I also see significant gains with 3). Bigger when using direct IO. One
complicating factor in measuring 3) is that the first write to a block
will often be slower than subsequent writes, because the filesystem will
need to update some journaled metadata, presenting a bottleneck.
Checkpoint performance is also severely limited by data checksum
computation if enabled - independent of this patchset.

One annoying thing when testing DIO is that right now VACUUM will be
rather slow if the data isn't already in s_b, as it isn't yet
read-stream-ified.

> Integrity/data testing with stuff like dm-dust, dm-flakey, dm-delay
> to try the error handling routines?

Hm. I don't think that's going to work very well even on master. If the
filesystem fails there's not much that PG can do...

> Some kind of AIO <-> standby/recovery interactions?

I wouldn't expect anything there. I think Thomas somewhere has a patch
that read-stream-ifies recovery prefetching; once that's done it would be
more interesting.

> * - btw, Date: 2025-01-01 04:03:33 - I saw what you did there! So let's
> officially recognize 2025 as the year of AIO in PG, as it was the 1st
> message :D

Hah, that was actually the opposite of what I intended :). I'd hoped to
post earlier,
Re: improve DEBUG1 logging of parallel workers for CREATE INDEX?
Sami Imseih writes: > I am curious what are the thoughts on introducing a > CREATE INDEX VERBOSE which can provide this info? > similar to users scripting VACUUM VERBOSE to log > more details about the vacuum operation including parallel > usage. What I can recall being discussed in the past is to extend EXPLAIN and/or EXPLAIN ANALYZE to cover utility statements that have nontrivial execution complexity --- for example, ALTER TABLE has a lot of machinery underneath, and people often wish to know things like whether a particular ALTER will cause a table rewrite or not. Of course, a patch for that would be a few orders of magnitude larger than what you've got here :-(. But if you're looking for a framework for reporting these sorts of details, I'd much rather go in that direction than follow the model of VACUUM VERBOSE. VACUUM VERBOSE is a kluge with little to recommend it other than having been easy to implement. regards, tom lane
Re: Non-text mode for pg_dumpall
Hi,

Le mer. 8 janv. 2025 à 17:41, Mahendra Singh Thalor a écrit :
> On Wed, 8 Jan 2025 at 20:07, Mahendra Singh Thalor wrote:
> >
> > Hi all,
> >
> > On Wed, 8 Jan 2025 at 00:34, Mahendra Singh Thalor wrote:
> > >
> > > On Mon, 6 Jan 2025 at 23:05, Nathan Bossart wrote:
> > > >
> > > > On Thu, Jan 02, 2025 at 02:05:13AM +0530, Mahendra Singh Thalor wrote:
> > > > > Here, I am attaching an updated patch. I fixed some bugs of the v01 patch and also did some code cleanup.
> > > >
> > > > Thank you for picking this up! I started to review it, but the documentation changes didn't build, and a few tests in check-world are failing. Would you mind resolving those issues? Also, if you haven't already, please add an entry to the next commitfest [0] to ensure that 1) this feature is tracked and 2) the automated tests will run.
> > >
> > > Thanks Nathan for the quick response.
> > >
> > > I fixed the documentation and check-world failures in the latest patch. Now the docs build and check-world passes.
> > >
> > > I added an entry to the commitfest for this patch. [0]
> > >
> > > > + if (dbfile)
> > > > + {
> > > > + printfPQExpBuffer(&cmd, "\"%s\" %s %s", pg_dump_bin,
> > > > +   dbfile, create_opts);
> > > > + appendPQExpBufferStr(&cmd, " -F d ");
> > > > + }
> > > >
> > > > Have you given any thought to allowing a directory of custom format files, as discussed upthread [1]? Perhaps that is better handled as a follow-up patch, but it'd be good to understand the plan, anyway.
> > >
> > > I will make these changes and will test. I will update my findings after doing some testing.
> >
> > In the latest patch, I added dump and restore support for the directory/custom/tar/plain formats. Please consider this patch for review and testing.
> >
> > Design: When --format=d|c|t is given, we dump all global SQL commands into global.dat in plain SQL format, and we create a map.dat file with dbname and dboid. For each database, we create a separate subdirectory named after the dboid under the databases directory and dump it in the chosen archive format (d|c|t). While restoring, we first restore all global SQL commands from global.dat and then restore the databases one by one. As we support --exclude-database with pg_dumpall, we support the same with pg_restore, to skip restoring databases matching the specified patterns. If we want to restore a single database, we can specify the particular subdirectory from the databases folder. To get the file name, we look up the dbname in the map file.
> >
> > TODO: Now I will work on test cases for these newly added options to pg_dumpall and pg_restore.
> >
> > Here, I am attaching the v04 patch for testing and review.
>
> Sorry. My mistake. v04 was the delta patch on top of v03.
>
> Here, I am attaching the v05 patch for testing and review.

Just FWIW, I did a quick test tonight. It applies cleanly and compiles OK. I did a dump:

$ pg_dumpall -Fd -f dir

and then a restore (after dropping the databases I had):

$ pg_restore -Cd postgres -v dir

It worked really well. That's great. Quick thing to fix: you've got this error message:

pg_restore: error: -d/--dbanme should be given when using archive dump of pg_dumpall

I guess it is --dbname, rather than --dbanme. Of course, it needs much more testing, but this feature would be great to have. Thanks for working on this!

-- Guillaume.
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On Mon, Jan 6, 2025 at 1:07 PM Jim Nasby wrote:
>
> I’ve been saying “workload management” for lack of a better term, but my initial suggestion upthread was to simply stop allowing new transactions to start if global work_mem consumption exceeded some threshold. That’s simplistic enough that I wouldn’t really consider it “workload management”. Maybe “deferred execution” would be a better name. The only other thing it’d need is a timeout on how long a new transaction could sit in limbo.

Yes, this seems like a good thing to do, but we need to handle "work_mem", by itself, first. The problem is that it's just too easy for a query to blow up work_mem consumption, almost instantaneously. By the time we notice that we're low on working memory, pausing new transactions may not be sufficient. We could already be in the middle of a giant Hash Join, for example, and the hash table is just going to continue to grow... Before we can solve the problem you describe, we need to be able to limit the work_mem consumption of an in-progress query. James
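To make the gap concrete: any limit would have to reach into allocation-heavy execution paths mid-query, not just gate transaction starts. A purely hypothetical sketch of that shape follows: GlobalMemBudget and budget_try_reserve() do not exist in PostgreSQL; the pg_atomic_* calls are the real atomics API, and ExecHashIncreaseNumBatches() is an existing (file-local) routine in nodeHash.c, cited only to show where a spill decision could hook in.

#include "postgres.h"
#include "port/atomics.h"

/* Hypothetical: a budget shared by all backends, kept in shared memory. */
typedef struct GlobalMemBudget
{
    pg_atomic_uint64 reserved_bytes;    /* currently reserved, all backends */
    uint64      limit_bytes;            /* global cap */
} GlobalMemBudget;

/*
 * Try to reserve nbytes against the global budget; on failure the caller
 * is expected to spill to disk rather than keep growing in memory.
 */
static bool
budget_try_reserve(GlobalMemBudget *budget, uint64 nbytes)
{
    uint64      old = pg_atomic_fetch_add_u64(&budget->reserved_bytes,
                                              (int64) nbytes);

    if (old + nbytes > budget->limit_bytes)
    {
        /* over budget: back out the reservation */
        pg_atomic_fetch_sub_u64(&budget->reserved_bytes, (int64) nbytes);
        return false;
    }
    return true;
}

/*
 * A hash join growth path could then react mid-query, e.g.:
 *
 *    if (!budget_try_reserve(budget, chunk_size))
 *        ExecHashIncreaseNumBatches(hashtable);   // spill instead of grow
 */

Even then, reservations would have to be granular enough that a single failed reservation doesn't abort the query outright, which is the hard part James alludes to.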
Re: New GUC autovacuum_max_threshold ?
On Wed, Jan 8, 2025 at 3:01 PM Nathan Bossart wrote:
>
> On Wed, Jan 08, 2025 at 02:48:10PM +0100, Frédéric Yhuel wrote:
> > For what it's worth, although I would have preferred the sub-linear growth thing, I'd much rather have this than nothing.
>
> +1, this is how I feel, too. But I also don't want to add something that folks won't find useful.
>
> > And I have to admit that the proposed formulas were either too convoluted or wrong.
> >
> > This very patch is more straightforward. Please let me know if I can help and how.
>
> I read through the thread from the top, and it does seem like there is reasonably strong support for the hard cap. Upon a closer review of the patch, I noticed that the relopt was defined such that you couldn't disable autovacuum_max_threshold on a per-table basis, so I fixed that in v4.

To be frank, this patch feels like a solution in search of a problem, and as I read back through the thread, it isn't clear what problem this is intended to fix. There is some talk of "simplifying" autovacuum configuration, but some noted that we already have a rather complex set of GUCs to deal with, and adding another one, along with more math, into the equation doesn't seem simpler to me. I'd like to think the bar should be that the problem is clear. So what is the problem?

Is the patch supposed to help with wraparound prevention? autovacuum_freeze_max_age already covers that, and when it doesn't, vacuum_failsafe_age helps out.

A couple of people mentioned issues around hitting the index wall when vacuuming large tables, but we believe that problem is mostly resolved due to radix-based TID storage, so this doesn't solve that. (To the degree you think v17 hasn't baked into enough production workloads to be sure, I'd agree, but that's also an argument against doing more work that might not be needed.)

Maybe the hope is that this setting will cause vacuum to run more often, to help ameliorate the I/O work from freeze vacuums kicking in, but I suspect that Melanie's nearby work on eager vacuuming is a smarter solution to that problem (warning: it also may want to add more GUCs), so I think we're not solving that, and in fact might be undercutting it.

I guess that means this is supposed to help with bloat management? But only on large tables? I guess because you run vacuums more often? Except that the adage of running vacuums more often doesn't apply as cleanly to large tables, because those tables typically come with large indexes, and while we have a lot of machinery in place to help with repeated scans of the heap, that same machinery doesn't exist for scanning the indexes, which gives you sort of an exponential curve in vacuum times as table size (but actually index size) grows larger. On the upside, this does mean we're less likely to see the 50x boost in vacuums on large tables that some seemed concerned about, but on the downside that's because we're probably going to increase the probability of vacuum-worker starvation.

But getting back to goals: if your goal is to help with bloat management, trying to tie that to a number that doesn't cleanly map to the meta-information of the table in question is a poor way to do it. Meaning, to the degree that you are skeptical that vacuuming based on 20% of the rows of a table might not really be 20% of the size of the table, it's certainly going to be a closer map than 100 million rows across tables with unknown (but presumably greater than 500 million?) row counts and unknown sizes. (The threshold math under discussion is sketched at the end of this message.)

And again, we have a means to tackle these bloat cases already: lowering vacuum_scale_factor. This isn't to say the system is perfect; I do think there are some fundamental issues that need addressing, but adding this GUC just feels a little less baked than usual.

Robert Treat
https://xzilla.net
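For reference, a sketch of the trigger math being debated, modeled on the shape of relation_needs_vacanalyze() in autovacuum.c (float4 as used there); the cap parameter reflects the proposal, not committed behavior:

static float4
vacuum_threshold(float4 reltuples,
                 int vac_base_thresh,       /* autovacuum_vacuum_threshold */
                 float4 vac_scale_factor,   /* autovacuum_vacuum_scale_factor */
                 int64 vac_max_thresh)      /* proposed cap; -1 disables it */
{
    /* today's rule: vacuum once dead tuples exceed base + factor * size */
    float4      vacthresh = (float4) vac_base_thresh +
        vac_scale_factor * reltuples;

    /* proposed: clamp, so huge tables don't wait for 20% dead rows */
    if (vac_max_thresh >= 0 && vacthresh > (float4) vac_max_thresh)
        vacthresh = (float4) vac_max_thresh;

    return vacthresh;
}

With default settings (scale factor 0.2), a 1-billion-row table is vacuumed only after roughly 200 million dead rows; a cap of, say, 100 million would halve that wait, which is the entire effect being argued over above.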
Re: Conflict detection for update_deleted in logical replication
On Wed, Jan 8, 2025 at 3:00 AM Zhijie Hou (Fujitsu) wrote:
>
> On Wednesday, January 8, 2025 6:33 PM Masahiko Sawada wrote:
>
> Hi,
>
> > On Wed, Jan 8, 2025 at 1:53 AM Amit Kapila wrote:
> > > On Wed, Jan 8, 2025 at 3:02 PM Masahiko Sawada wrote:
> > > >
> > > > On Thu, Dec 19, 2024 at 11:11 PM Nisha Moond wrote:
> > > > > > > > > [3] Test with pgbench run on both publisher and subscriber.
> > > > > > > > >
> > > > > > > > > Test setup:
> > > > > > > > > - Tests performed on pgHead + v16 patches
> > > > > > > > > - Created a pub-sub replication system.
> > > > > > > > > - Parameters for both instances were:
> > > > > > > > >   shared_buffers = 30GB
> > > > > > > > >   min_wal_size = 10GB
> > > > > > > > >   max_wal_size = 20GB
> > > > > > > > >   autovacuum = false
> > > >
> > > > Since you disabled autovacuum on the subscriber, dead tuples created by non-hot updates are accumulated anyway regardless of the detect_update_deleted setting, is that right?
> > >
> > > I think the hot-pruning mechanism during the update operation will remove dead tuples even when autovacuum is disabled.
> >
> > True, but why did it disable autovacuum? It seems that case1-2_setup.sh doesn't specify fillfactor, which makes hot-updates less likely to happen.
>
> IIUC, we disable autovacuum as a general practice in read-write tests for stable TPS numbers.

Okay. TBH I'm not sure what we can say with these results. At a glance, in a typical bi-directional-like setup, we can interpret these results as meaning that if users turn retain_conflict_info on, TPS drops by 50%. But I'm not sure this 50% dip is the worst case users could possibly face. It could be better in practice thanks to autovacuum, or it could get even worse due to further bloat if we run the test longer.

Suppose that users had a 50% performance dip due to dead tuple retention for update_deleted detection, is there any way for users to improve the situation? For example, trying to advance slot.xmin more frequently might help to reduce dead tuple accumulation. I think it would be good if we could have a way to balance the publisher's performance against the subscriber's.

In test case 3, we observed a -53% performance dip, which is worse than the results of test case 5 with wal_receiver_status_interval = 100s. Given that in test case 5 with wal_receiver_status_interval = 100s we cannot remove dead tuples for most of the whole 120s test time, we probably could not remove dead tuples for a long time in test case 3 either. I expected that the apply worker gets remote transaction XIDs and tries to advance slot.xmin more frequently, so this performance dip surprised me. I would like to know how many times the apply worker gets remote transaction XIDs and succeeds in advancing slot.xmin during the test.

> > I understand that a certain performance dip happens due to dead tuple retention, which is fine, but I'm surprised that the TPS decreased by 50% within 120 seconds. Does the TPS get even worse for a longer test?
>
> We will try to increase the time and run the test again.
>
> > I did a quick benchmark where I completely disabled removing dead tuples (by autovacuum=off and a logical slot) and ran pgbench, but I didn't see such a precipitous dip.
>
> I think a logical slot only retains the dead tuples of the system catalogs, so the TPS on user tables would not be affected that much.

You're right, I missed it.
Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Moving the vacuum GUCs' docs out of the Client Connection Defaults section
On Wed, Jan 8, 2025 at 8:39 AM Peter Eisentraut wrote: > > On 07.01.25 18:31, Melanie Plageman wrote: > > > > Oh, one thing I forgot to say. Though I increased the indentation of > > some of the subsections that I moved, I didn't rewrap the lines > > because they were already not wrapped to 78. I can do this, but I > > can't tell from the existing docs what text width the paragraphs of > > text are supposed to be wrapped to. > > For a patch that moves things around like this, I would leave everything > as is and not rewrap. That makes it easier to follow what's going on > with "git diff -w", "git show -w" etc. > > In .dir-locals.el, there is this configuration for Emacs: > > (nxml-mode . ((fill-column . 78) > (indent-tabs-mode . nil))) > > So that's one data point about what the line length should be. Well, in this case, the diff won't look different with git show/diff -w because moving them to another part of the file is more than a whitespace change. For clarity, I added a note to the commit message that the actual GUCs' docs are just lifted and shifted. I decided in the end not to wrap it anyway, though. I tried it and didn't like the result. If I wrap it to 78, that actually makes a good number of the GUCs' docs wider (i.e. they were wrapped to less than 78). Which means future additions are more likely to need to wrap (as Tom mentioned upthread). Attached is v2 (required a rebase). - Melanie v2-0001-Consolidate-docs-for-vacuum-related-GUCs-in-new-s.patch
Re: Fix a wrong errmsg in AlterRole()
Hi, I modified the patch according to Tom's suggestion. -- Regards, ChangAo Chen v2-0001-Fix-a-wrong-errdetail-in-AlterRole.patch
Re: pgindent exit status if a file encounters an error
On Wed, Jan 8, 2025 at 9:35 PM Andrew Dunstan wrote: > > > > I forget now what I was intending here, so I have committed this with minor > tweaks. > > Thanks Andrew for taking care of this. -- Best Wishes, Ashutosh Bapat
Re: Make pg_stat_io view count IOs as bytes instead of blocks
Hi,

Thanks for the review!

On Thu, 9 Jan 2025 at 05:59, Michael Paquier wrote:
>
> On Thu, Dec 26, 2024 at 02:41:26PM +0300, Nazir Bilal Yavuz wrote:
> > Thanks! v4 is attached. I quickly tested the pg_stat_get_backend_io() function and it seems it is working.
>
> Thanks a lot for the rebased version. This looks pretty solid. Here are some comments.
>
>  void
> -pgstat_count_io_op(IOObject io_object, IOContext io_context, IOOp io_op)
> +pgstat_count_io_op(IOObject io_object, IOContext io_context, IOOp io_op, uint64 bytes)
>  {
> -pgstat_count_io_op_n(io_object, io_context, io_op, 1);
> +pgstat_count_io_op_n(io_object, io_context, io_op, 1, bytes);
>
> pgstat_count_io_op_n() is only used locally in pgstat_io.c. I'm OK to keep it as it is used with the time calculations, but wouldn't it be better to make it static in pgstat_io.c instead and not declare it in pgstat.h? Do we really have a need for pgstat_count_io_op() at all in the end, or would it be better to change it so that it can handle a number of operations given by the caller?

I am a bit confused; are you suggesting one of these two alternatives (both sketched after this message)?
1- Making pgstat_count_io_op_n() static and continuing to use pgstat_count_io_op() as it is.
2- Removing pgstat_count_io_op() and instead using pgstat_count_io_op_n() everywhere.

> typedef struct PgStat_BktypeIO
> {
> +uint64 bytes[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
>
> This wastes a bit of memory, while keeping the code simpler to understand. That's better than more element number manipulations, so I'm OK with what you have here.
>
> +static inline bool
> +is_ioop_tracked_in_bytes(IOOp io_op)
> +{
> +Assert((unsigned int) io_op < IOOP_NUM_TYPES);
> +return io_op >= IOOP_EXTEND;
> +}
>
> This is only used in an assertion of pgstat_count_io_op_n() in pgstat_io.c. Let's also keep this routine local to the file. The assert to make sure that the callers don't assign bytes to the operations that don't support the counters is a good idea.

Makes sense, done.

> + * If an IOOp does not tracked in bytes, IO_COL_INVALID is returned.
>
> s/does not tracked/is not tracked/.

Done.

> +/*
> + * Get the number of the column containing IO bytes for the specified IOOp.
> + * If an IOOp does not tracked in bytes, IO_COL_INVALID is returned.
> + */
> +static io_stat_col
> +pgstat_get_io_byte_index(IOOp io_op)
> +{
>
> Makes sense to me, and that's consistent with how the time attributes are handled.
>
> -/*
> - * Hard-code this to the value of BLCKSZ for now. Future values
> - * could include XLOG_BLCKSZ, once WAL IO is tracked, and constant
> - * multipliers, once non-block-oriented IO (e.g. temporary file
> - * IO) is tracked.
> - */
> -values[IO_COL_CONVERSION] = Int64GetDatum(BLCKSZ);
>
> Glad to see that gone.
>
> +write_bytes bigint
> +extend_bytes bigint
> +read_bytes bigint
>
> These additions in the documentation are incorrect. All these attributes are of type numeric, not bigint.

Done.

> +Number of units of size BLCKSZ which the process
>
> It seems to me that this had better mention 8192 as the default.

I agree, done.

v5 is attached.

--
Regards,
Nazir Bilal Yavuz
Microsoft

From b8123b9447117bdf4c9f34ddf57fd7acbd61745d Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz
Date: Thu, 9 Jan 2025 09:39:05 +0300
Subject: [PATCH v5] Make pg_stat_io count IOs as bytes instead of blocks

Currently in the pg_stat_io view, IOs are counted as blocks.
There are two problems with this approach:

1- The actual number of I/O requests sent to the kernel is lower, because I/O requests may be merged before being sent. Additionally, it gives the impression that all I/Os are done in block size, which shadows the benefits of merging I/O requests.

2- In the future, there may be some IOs that are not done in block size. For example, WAL read IOs are done in variable bytes, and it is not possible to show these IOs correctly in the pg_stat_io view.

Because of these problems, now show the total number of IO requests to the kernel (as smgr function calls) and the total number of bytes in the IO. Also, the op_bytes column is removed from the pg_stat_io view.
---
 src/include/catalog/pg_proc.dat             | 12 +--
 src/include/pgstat.h                        | 31 ++--
 src/backend/catalog/system_views.sql        |  4 +-
 src/backend/storage/buffer/bufmgr.c         | 14 ++--
 src/backend/storage/buffer/localbuf.c       |  6 +-
 src/backend/storage/smgr/md.c               |  4 +-
 src/backend/utils/activity/pgstat_backend.c |  2 +
 src/backend/utils/activity/pgstat_io.c      | 23 --
 src/backend/utils/adt/pgstatfuncs.c         | 86 -
 src/test/regress/expected/rules.out         |  6 +-
 doc/src/sgml/monitoring.sgml                | 63 +--
 11 files changed, 172 insertions(+), 79 deletions(-)
diff --
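For readers following the exchange above, the two alternatives look roughly like this; the signatures are adapted from the quoted patch and should be read as a sketch, not the final API:

/* Option 1: keep the single-op wrapper public, hide the _n variant. */

/* pgstat.h */
extern void pgstat_count_io_op(IOObject io_object, IOContext io_context,
                               IOOp io_op, uint64 bytes);

/* pgstat_io.c */
static void pgstat_count_io_op_n(IOObject io_object, IOContext io_context,
                                 IOOp io_op, uint32 cnt, uint64 bytes);

/* Option 2: drop the wrapper; callers always pass an operation count. */
extern void pgstat_count_io_op_n(IOObject io_object, IOContext io_context,
                                 IOOp io_op, uint32 cnt, uint64 bytes);

/* under option 2, a single block-sized read would be counted as, e.g.: */
/* pgstat_count_io_op_n(IOOBJECT_RELATION, io_context, IOOP_READ, 1, BLCKSZ); */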
Re: per backend WAL statistics
Hi, On Thu, Jan 09, 2025 at 01:03:15PM +0900, Michael Paquier wrote: > On Wed, Jan 08, 2025 at 11:11:59AM +, Bertrand Drouvot wrote: > > Yeah, that's more elegant as it also means that the main callback will not > > change > > (should we add even more stats in the future). Done that way in v2 attached. > > I've put my hands on v2-0002 to begin with something. > > +/* flag bits for different types of statistics to flush */ > +#define PGSTAT_FLUSH_IO(1 << 0) /* Flush I/O statistics */ > +#define PGSTAT_FLUSH_ALL (PGSTAT_FLUSH_IO) > > These are located and used only in pgstat_backend.c. It seems to me > that we'd better declare them in pgstat_internal.h and extend the > existing pgstat_flush_backend() with an argument so as callers can do > what they want. > > + /* Get our own entry_ref if not provided */ > + if (!entry_ref) > + entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_BACKEND, > InvalidOid, > + > MyProcNumber, false, NULL); > > This relates to the previous remark, actually, where I think that it > is cleaner to have pgstat_flush_backend() do pgstat_get_entry_ref(), > same way as HEAD, and just pass down the flags. I see, so you keep pgstat_flush_backend() calls (with an extra arg) and remove the new "pgstat_backend_flush_io()" function. > This comes at the cost of pgstat_flush_backend_entry() > requiring an extra pgstat_tracks_backend_bktype(), which is not a big > issue, and the patch gets a bit shorter. Yeah, all of the above is fine by me. PFA v3 which is v2 refactoring with your proposed above changes. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From bcdf4eb91f7ff4beeba50433a173ac2650afe432 Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Mon, 6 Jan 2025 07:51:27 + Subject: [PATCH v3 1/3] Extract logic filling pg_stat_get_wal()'s tuple into its own routine This commit adds pg_stat_wal_build_tuple(), a helper routine for pg_stat_get_wal(), that fills its tuple based on the contents of PgStat_WalStats. This will be used in a follow-up commit that uses the same structures as pg_stat_wal for reporting, but for the PGSTAT_KIND_BACKEND statistics kind. --- src/backend/utils/adt/pgstatfuncs.c | 56 ++--- 1 file changed, 36 insertions(+), 20 deletions(-) 100.0% src/backend/utils/adt/ diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c index 3245f3a8d8..7309f06993 100644 --- a/src/backend/utils/adt/pgstatfuncs.c +++ b/src/backend/utils/adt/pgstatfuncs.c @@ -1560,20 +1560,22 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS) } /* - * Returns statistics of WAL activity + * pg_stat_wal_build_tuple + * + * Helper routine for pg_stat_get_wal() returning one tuple based on the contents + * of wal_stats. 
*/ -Datum -pg_stat_get_wal(PG_FUNCTION_ARGS) +static Datum +pg_stat_wal_build_tuple(PgStat_WalStats wal_stats) { -#define PG_STAT_GET_WAL_COLS 9 +#define PG_STAT_WAL_COLS 9 TupleDesc tupdesc; - Datum values[PG_STAT_GET_WAL_COLS] = {0}; - bool nulls[PG_STAT_GET_WAL_COLS] = {0}; + Datum values[PG_STAT_WAL_COLS] = {0}; + bool nulls[PG_STAT_WAL_COLS] = {0}; char buf[256]; - PgStat_WalStats *wal_stats; /* Initialise attributes information in the tuple descriptor */ - tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_WAL_COLS); + tupdesc = CreateTemplateTupleDesc(PG_STAT_WAL_COLS); TupleDescInitEntry(tupdesc, (AttrNumber) 1, "wal_records", INT8OID, -1, 0); TupleDescInitEntry(tupdesc, (AttrNumber) 2, "wal_fpi", @@ -1595,34 +1597,48 @@ pg_stat_get_wal(PG_FUNCTION_ARGS) BlessTupleDesc(tupdesc); - /* Get statistics about WAL activity */ - wal_stats = pgstat_fetch_stat_wal(); - /* Fill values and NULLs */ - values[0] = Int64GetDatum(wal_stats->wal_records); - values[1] = Int64GetDatum(wal_stats->wal_fpi); + values[0] = Int64GetDatum(wal_stats.wal_records); + values[1] = Int64GetDatum(wal_stats.wal_fpi); /* Convert to numeric. */ - snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats->wal_bytes); + snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats.wal_bytes); values[2] = DirectFunctionCall3(numeric_in, CStringGetDatum(buf), ObjectIdGetDatum(0), Int32GetDatum(-1)); - values[3] = Int64GetDatum(wal_stats->wal_buffers_full); - values[4] = Int64GetDatum(wal_stats->wal_write); - values[5] = Int64GetDatum(wal_stats->wal_sync); + values[3] = Int64GetDatum(wal_stats.wal_buffers_full); + values[4] = Int64GetDatum(wal_stats.wal_write); + values[5] = Int64GetDatum(wal_stats.wal_sync); /* Convert counters from microsec to millisec for display */ - values[6] = Float8GetDatum(((double) wal_stats->wal_write_time) / 1000.0); - values[7] = Float8GetDatum(((double) wal_stats->wal_sync_time) / 1000.0); + values[6] = Float8GetDatum(((double) wal_stats.wal_write_time) / 1000.0); + values[7] = Float8Ge
RE: Log a warning in pg_createsubscriber for max_slot_wal_keep_size
Dear Shubham,

Thanks for updating the patch. A few comments.

```
+#include "common/int.h"    /* include for strtoi64 */
```

I could build the source code without this inclusion. Can you remove it?

```
+    max_slot_wal_keep_size = strtoi64(PQgetvalue(res, 0, 6), NULL, 0);
+
+    /* use strtoi64 to convert the string result to a 64-bit integer */
+    if (max_slot_wal_keep_size == 0 && errno != 0)
+    {
+        /* handle conversion error */
+        pg_log_error("Failed to convert max_slot_wal_keep_size to int64");
+    }
```

I'm not sure the error handling is really needed. pg_dump also uses this function and it does not have such handling.

```
+    pg_log_debug("publisher: max_slot_wal_keep_size: %ld",
+                 max_slot_wal_keep_size);
```

IIUC, "%ld" does not always represent the int64 format. Since it is a debug message, isn't it enough to use the INT64_FORMAT macro?

```
+# Configure 'max_slot_wal_keep_size = 1' on the publisher and
+# reload configuration
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = 1');
+$node_p->reload;
```

Can you use 'max_slot_wal_keep_size = 10MB' or something, to test that the handling is correct?

Best regards,
Hayato Kuroda
FUJITSU LIMITED
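Putting those review points together, the fetch could look roughly like this. This is a sketch only, assuming an established PGconn *conn and initialized frontend logging; error handling is abbreviated:

PGresult   *res;
int64       max_slot_wal_keep_size;

/* let the server do the unit conversion, so '10MB' and '-1' both work */
res = PQexec(conn,
             "SELECT pg_catalog.pg_size_bytes("
             "pg_catalog.current_setting('max_slot_wal_keep_size'))");
if (PQresultStatus(res) != PGRES_TUPLES_OK)
    pg_fatal("could not obtain max_slot_wal_keep_size: %s",
             PQresultErrorMessage(res));

max_slot_wal_keep_size = strtoi64(PQgetvalue(res, 0, 0), NULL, 10);
PQclear(res);

/* INT64_FORMAT expands to the right conversion spec on every platform */
pg_log_debug("publisher: max_slot_wal_keep_size: " INT64_FORMAT,
             max_slot_wal_keep_size);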
Re: Several buildfarm animals fail tests because of shared memory error
On Thu, 9 Jan 2025 at 15:30, Alexander Lakhin wrote:
>
> Maybe you could try to reproduce such failures without the buildfarm client, just by running select_parallel, for example, with the attached patch applied. I mean running `make check` with a parallel_schedule like:
> ...
> Or
> TESTS="test_setup copy create_misc create_index $(printf "select_parallel %.0s" {1..100})" make check-tests

Thanks Alexander for pointing to the test steps. I'll try to run these on leafhopper over the next couple of days and come back if I see anything interesting.

- robins
Re: per backend WAL statistics
On Wed, Jan 08, 2025 at 11:11:59AM +, Bertrand Drouvot wrote: > Yeah, that's more elegant as it also means that the main callback will not > change > (should we add even more stats in the future). Done that way in v2 attached. I've put my hands on v2-0002 to begin with something. +/* flag bits for different types of statistics to flush */ +#define PGSTAT_FLUSH_IO(1 << 0) /* Flush I/O statistics */ +#define PGSTAT_FLUSH_ALL (PGSTAT_FLUSH_IO) These are located and used only in pgstat_backend.c. It seems to me that we'd better declare them in pgstat_internal.h and extend the existing pgstat_flush_backend() with an argument so as callers can do what they want. + /* Get our own entry_ref if not provided */ + if (!entry_ref) + entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_BACKEND, InvalidOid, + MyProcNumber, false, NULL); This relates to the previous remark, actually, where I think that it is cleaner to have pgstat_flush_backend() do pgstat_get_entry_ref(), same way as HEAD, and just pass down the flags. pgstat_flush_backend cannot call directly pgstat_backend_flush_cb(), of course, so I've settled down to a new pgstat_flush_backend_entry() that handles the entry locking. This comes at the cost of pgstat_flush_backend_entry() requiring an extra pgstat_tracks_backend_bktype(), which is not a big issue, and the patch gets a bit shorter. -- Michael From eba357e4e2712cd73fdad585f6d8088de0dbaccc Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Mon, 6 Jan 2025 08:44:29 + Subject: [PATCH v3] PGSTAT_KIND_BACKEND code refactoring This commit refactors some come related to per backend statistics. It makes the code more generic or more IO statistics focused as it will be used in a follow-up commit that will introduce per backend WAL statistics. --- src/include/pgstat.h | 6 +- src/include/utils/pgstat_internal.h | 7 +- src/backend/utils/activity/pgstat.c | 2 +- src/backend/utils/activity/pgstat_backend.c | 69 ++-- src/backend/utils/activity/pgstat_io.c | 8 +-- src/backend/utils/activity/pgstat_relation.c | 4 +- src/backend/utils/adt/pgstatfuncs.c | 2 +- src/tools/pgindent/typedefs.list | 1 + 8 files changed, 68 insertions(+), 31 deletions(-) diff --git a/src/include/pgstat.h b/src/include/pgstat.h index 0d8427f27d1..6631bd2d730 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -381,7 +381,7 @@ typedef PgStat_PendingIO PgStat_BackendPendingIO; typedef struct PgStat_Backend { TimestampTz stat_reset_timestamp; - PgStat_BktypeIO stats; + PgStat_BktypeIO io_stats; } PgStat_Backend; typedef struct PgStat_StatDBEntry @@ -523,6 +523,10 @@ typedef struct PgStat_PendingWalStats instr_time wal_sync_time; } PgStat_PendingWalStats; +typedef struct PgStat_BackendPending +{ + PgStat_BackendPendingIO pending_io; +} PgStat_BackendPending; /* * Functions in pgstat.c diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h index 52eb008710f..4bb8e5c53ab 100644 --- a/src/include/utils/pgstat_internal.h +++ b/src/include/utils/pgstat_internal.h @@ -613,9 +613,12 @@ extern void pgstat_archiver_snapshot_cb(void); * Functions in pgstat_backend.c */ -extern void pgstat_flush_backend(bool nowait); +/* flags for pgstat_flush_backend() */ +#define PGSTAT_BACKEND_FLUSH_IO (1 << 0) /* Flush I/O statistics */ +#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO) -extern PgStat_BackendPendingIO *pgstat_prep_backend_pending(ProcNumber procnum); +extern void pgstat_flush_backend(bool nowait, bits32 flags); +extern PgStat_BackendPending *pgstat_prep_backend_pending(ProcNumber procnum); 
extern bool pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait); extern void pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts); diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c index 16a03b8ce15..34520535d54 100644 --- a/src/backend/utils/activity/pgstat.c +++ b/src/backend/utils/activity/pgstat.c @@ -370,7 +370,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE] .shared_size = sizeof(PgStatShared_Backend), .shared_data_off = offsetof(PgStatShared_Backend, stats), .shared_data_len = sizeof(((PgStatShared_Backend *) 0)->stats), - .pending_size = sizeof(PgStat_BackendPendingIO), + .pending_size = sizeof(PgStat_BackendPending), .flush_pending_cb = pgstat_backend_flush_cb, .reset_timestamp_cb = pgstat_backend_reset_timestamp_cb, diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c index 1f91bfef0a3..3972bcf9456 100644 --- a/src/backend/utils/activity/pgstat_backend.c +++ b/src/backend/utils/activity/pgstat_backend.c @@ -39,23 +39,21 @@ pgstat_fetch_stat_backend(ProcNumber procNumber) }
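To see how the flag-based flush API above is meant to be used from a caller's perspective, a short sketch follows; the IO flag and the pgstat_flush_backend() signature are from the patch, while the WAL flag is hypothetical, anticipating the per-backend WAL statistics follow-up:

/* from the patch: extern void pgstat_flush_backend(bool nowait, bits32 flags); */

#define PGSTAT_BACKEND_FLUSH_IO   (1 << 0)    /* as defined in the patch */
#define PGSTAT_BACKEND_FLUSH_WAL  (1 << 1)    /* hypothetical follow-up */
#define PGSTAT_BACKEND_FLUSH_ALL  (PGSTAT_BACKEND_FLUSH_IO | PGSTAT_BACKEND_FLUSH_WAL)

/* flush only pending I/O stats, giving up rather than waiting on the lock */
pgstat_flush_backend(true, PGSTAT_BACKEND_FLUSH_IO);

/* flush everything the backend tracks, waiting if necessary */
pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_ALL);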
Re: WAL-logging facility for pgstats kinds
On Thu, Jan 02, 2025 at 08:08:42PM -0500, Andres Freund wrote:
> I can't think of a real case where we would want to WAL log the stats themselves, rather than re-emitting stats during replay based on the WAL record of the "underlying object".
>
> Do you have counter-examples?

I'm not sure that a rebuild based on the WAL records is simpler than logging a snapshot of the stats that the startup process could digest. Anyway, I've also wanted to be able to replicate stats for historical tracking of stats logged through hooks, and a second case is injection point stats. All of these would require registering a custom rmgr on top of a custom stats kind, and centralizing the logic greatly eases some of the sanity checks they'd require, as the redo callback of a stats kind can just be attached to its PgStat_KindInfo.

I know that Lucas has been playing a bit with this area, and perhaps he has more cases in mind where the replication of stats data could be relevant. So I am adding him in CC.

Perhaps I could be wrong, of course.
--
Michael
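To illustrate the shape of that idea: PgStat_KindInfo is a real structure, but everything below - the record layout, the callback type, and the redo_cb member - is invented purely to show what "attaching a redo callback to a stats kind" could look like:

/* a minimal WAL record carrying a serialized stats snapshot (hypothetical) */
typedef struct xl_pgstat_snapshot
{
    Oid         dboid;          /* stats entry key: database */
    uint64      objid;          /* stats entry key: object */
    /* serialized per-kind payload follows */
} xl_pgstat_snapshot;

/* hypothetical per-kind replay hook */
typedef void (*pgstat_redo_cb) (const xl_pgstat_snapshot *rec,
                                const char *payload, Size len);

/*
 * A custom stats kind (say, injection point stats) would then opt in
 * through its PgStat_KindInfo, e.g. with a hypothetical field:
 *
 *    .redo_cb = injection_stats_redo,
 */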
Re: Proposal: add new API to stringinfo
> If it's not too much work to coax the compiler to do so, then I don't see a > strong reason to avoid it. Granted, there are probably much more expensive > parts of the StringInfo support functions (e.g., palloc()), but these are > hot enough code paths that IMHO it's worth saving whatever cycles we can. Ok, I have created v3 patch to do more inlining as you suggested. With the patch I confirmed that there's no call to functions except palloc in makeStringInfo, makeStringInfoExt, initStringInfo and initStringInfoExt in the asm codes (see attached stringinfo.s). Best reagards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp v3-0001-Add-new-StringInfo-APIs-to-allow-callers-to-speci.patch Description: Binary data .file "stringinfo.c" .text .p2align 4 .globl makeStringInfo .type makeStringInfo, @function makeStringInfo: .LFB100: .cfi_startproc endbr64 pushq %r12 .cfi_def_cfa_offset 16 .cfi_offset 12, -16 movl$24, %edi callpalloc@PLT movl$1024, %edi movq%rax, %r12 callpalloc@PLT movl$1024, 12(%r12) movq%rax, (%r12) movb$0, (%rax) movq%r12, %rax movl$0, 8(%r12) movl$0, 16(%r12) popq%r12 .cfi_def_cfa_offset 8 ret .cfi_endproc .LFE100: .size makeStringInfo, .-makeStringInfo .section.rodata.str1.1,"aMS",@progbits,1 .LC0: .string "src/common/stringinfo.c" .LC1: .string "initsize > 0" .text .p2align 4 .globl makeStringInfoExt .type makeStringInfoExt, @function makeStringInfoExt: .LFB101: .cfi_startproc endbr64 pushq %r12 .cfi_def_cfa_offset 16 .cfi_offset 12, -16 pushq %rbx .cfi_def_cfa_offset 24 .cfi_offset 3, -24 movl%edi, %ebx movl$24, %edi subq$8, %rsp .cfi_def_cfa_offset 32 callpalloc@PLT testl %ebx, %ebx jle .L7 movslq %ebx, %rdi movq%rax, %r12 callpalloc@PLT movl%ebx, 12(%r12) movq%rax, (%r12) movb$0, (%rax) movq%r12, %rax movl$0, 8(%r12) movl$0, 16(%r12) addq$8, %rsp .cfi_remember_state .cfi_def_cfa_offset 24 popq%rbx .cfi_def_cfa_offset 16 popq%r12 .cfi_def_cfa_offset 8 ret .L7: .cfi_restore_state movl$43, %edx leaq.LC0(%rip), %rsi leaq.LC1(%rip), %rdi callExceptionalCondition@PLT .cfi_endproc .LFE101: .size makeStringInfoExt, .-makeStringInfoExt .p2align 4 .globl initStringInfo .type initStringInfo, @function initStringInfo: .LFB102: .cfi_startproc endbr64 pushq %rbx .cfi_def_cfa_offset 16 .cfi_offset 3, -16 movq%rdi, %rbx movl$1024, %edi callpalloc@PLT movl$1024, 12(%rbx) movq%rax, (%rbx) movb$0, (%rax) movl$0, 8(%rbx) movl$0, 16(%rbx) popq%rbx .cfi_def_cfa_offset 8 ret .cfi_endproc .LFE102: .size initStringInfo, .-initStringInfo .p2align 4 .globl initStringInfoExt .type initStringInfoExt, @function initStringInfoExt: .LFB103: .cfi_startproc endbr64 pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset 6, -16 pushq %rbx .cfi_def_cfa_offset 24 .cfi_offset 3, -24 subq$8, %rsp .cfi_def_cfa_offset 32 testl %esi, %esi jle .L13 movq%rdi, %rbx movslq %esi, %rdi movl%esi, %ebp callpalloc@PLT movl%ebp, 12(%rbx) movq%rax, (%rbx) movb$0, (%rax) movl$0, 8(%rbx) movl$0, 16(%rbx) addq$8, %rsp .cfi_remember_state .cfi_def_cfa_offset 24 popq%rbx .cfi_def_cfa_offset 16 popq%rbp .cfi_def_cfa_offset 8 ret .L13: .cfi_restore_state movl$43, %edx leaq.LC0(%rip), %rsi leaq.LC1(%rip), %rdi callExceptionalCondition@PLT .cfi_endproc .LFE103: .size initStringInfoExt, .-initStringInfoExt .section.rodata.str1.1 .LC2: .string "str->maxlen != 0" .text .p2align 4 .globl resetStringInfo .type resetStringInfo, @function resetStringInfo: .LFB104: .cfi_startproc endbr64 movl12(%rdi), %edx testl %edx, %edx je .L19 movq(%rdi), %rax movb$0, (%rax)
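As a usage sketch of the API proposed in this thread - the names and signatures are taken from the v3 patch and the attached assembly, so exact semantics may differ from what is eventually committed:

StringInfoData small;
StringInfoData big;

/* proposed: caller-chosen initial allocation instead of the fixed 1024 */
initStringInfoExt(&small, 64);      /* result known to be short */
initStringInfoExt(&big, 8192);      /* avoid early enlargeStringInfo() calls */

/* heap-allocated variant with a custom initial size */
StringInfo  si = makeStringInfoExt(256);

appendStringInfoString(si, "hello");

The point of the patch, as the assembly output shows, is that these initializers inline down to a single palloc() call plus a few stores, so the extra flexibility costs nothing on the hot path.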
Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING
On Sat, Jan 4, 2025 at 4:23 AM Robert Treat wrote:
>
> On Wed, Dec 18, 2024 at 5:56 AM Amit Kapila wrote:
> >
> > On Thu, Feb 8, 2024 at 9:57 AM Laurenz Albe wrote:
> > >
> > > On Thu, 2024-02-08 at 13:40 +1100, Peter Smith wrote:
> > > > - how to set the replica identity. If a table without a replica identity is
> > > > + how to set the replica identity. If a table without a replica identity
> > > > + (or with replica identity behavior the same as NOTHING) is
> > > >   added to a publication that replicates UPDATE or DELETE operations then
> > > >   subsequent UPDATE or DELETE
> > >
> > > I had the impression that the root of the confusion was the perceived difference between "REPLICA IDENTITY NOTHING" and "no replica identity", and that change doesn't improve that.
> > >
> > > How about:
> > >
> > > If a table without a replica identity (explicitly set to NOTHING, or set to a primary key or index that doesn't exist) is added ...
> >
> > Is it correct to say "set to a primary key or index that doesn't exist"? Because when it is set to the primary key then it should work.
> >
> > I think Peter's proposal along with Ashutosh's proposal is the simpler approach to clarify things in this area, but I am fine if others find some other way of updating the docs better.
>
> After reading this thread, I think part of the confusion here is that technically speaking there is no such thing as having "no replica identity"; when a table is created, by default its "replica identity" is set to DEFAULT, and how it behaves at that point will be dependent on the structure of the table (PK or no PK), not whether it is being replicated/published. To that end, I've taken the above suggestions and re-worked them to remove that language, as well as add some additional clarity.
>
> Patch attached for this. I am going to update the commitfest with myself as author and will work this further if needed. Thanks all.
>
> Robert Treat
> https://xzilla.net

Hi Robert. Thanks for picking up this patch.

Some review comments for the 0001 patch.

==
doc/src/sgml/logical-replication.sgml

1.
    fallback if no other solution is possible. If a replica identity other
    than FULL is set on the publisher side, a replica identity
    comprising the same or fewer columns must also be set on the subscriber
-   side. See for details on
-   how to set the replica identity. If a table without a replica identity is
+   side. If a table with replica identity set to NOTHING
+   (or set to use a primary key or index that doesn't exist) is
    added to a publication that replicates UPDATE
    or DELETE operations then
    subsequent UPDATE or DELETE
    operations will cause an error on the publisher.
    INSERT operations can proceed regardless of any replica identity.
+   See for details on
+   how to set the replica identity.

1a.
That part "If a table with replica identity set to NOTHING (or set to use a primary key or index that doesn't exist) is added ..." is not very clear to me.

IIUC, there are 3 ways for this to be a problem:
i) RI is set to NOTHING
ii) RI is set to DEFAULT but there is no valid PK ==> same as NOTHING
iii) RI is set USING INDEX but then that index gets dropped ==> same as NOTHING

To avoid any misunderstandings, why don't we just spell that out in full? So, ...

SUGGESTION
A replica identity behaves the same as NOTHING when it is set to DEFAULT and there is no primary key, or when it is set USING INDEX but the index no longer exists.
If a table with replica identity set to NOTHING (or behaving the same as NOTHING) is added to a publication that replicates UPDATE or DELETE operations, then subsequent UPDATE or DELETE operations will cause an error on the publisher.

~

1b.
I've always thought that, for this important topic of logical replication, it is unusual that there is not even a heading for it anywhere. This makes it unnecessarily difficult to find this information. IMO it would greatly help just to have a "Replica Identity" subsection for all of this paragraph, so it will appear nicely in the Chapter 29 table of contents, making it much easier to find.

~

1c.
That great big slab of paragraph text is hard to read. I suspect it was done that way simply to separate the RI topic visually from the other (paragraph) topics of the sect1 heading. But now, after we introduce the sect2 heading (my suggestion #1b above), we don't have to do that anymore, so more blank lines can be added, and that improves the readability a lot.

==
doc/src/sgml/ref/alter_table.sgml

2.
-   Records the old values of the columns of the primary key, if any.
+   Records the old values of the columns of the primary key. When there
+   is no primary key, the behavior is the same as NOTHING.
    This is the default for non-system tables.
RE: ecpg command does not warn COPY ... FROM STDIN;
Dear Tom, Fujii-san,

> > ISTM that ecpg supports COPY TO STDOUT and includes the regression test "copystdout" for it. No?
>
> Oh right. (Pokes at it...) It looks like the backend accepts "FROM STDOUT" as a synonym for "FROM STDIN", so that's why this is checking for both spellings. But I agree it'd be better for the error message to only use the standard spelling.
>
> Also I tried
>
> regression=# copy int4_tbl from program stdin;
> ERROR: STDIN/STDOUT not allowed with PROGRAM
>
> So we probably don't need to bother with adjusting the check to allow that.

To confirm - I have no objections to the decision. I also think it is not the usual grammar. I checked the docs [1] and could not find "COPY FROM STDOUT"; i.e., the first version is enough.

[1]: https://www.postgresql.org/docs/devel/sql-copy.html

Best regards,
Hayato Kuroda
FUJITSU LIMITED
Re: pure parsers and reentrant scanners
On 20.12.24 16:23, Tom Lane wrote:

Ok, we can fix that, but maybe this is also a good moment to think about whether that is useful. I could not reproduce the issue with flex 2.5.39. I could find no download of flex 2.5.35. The github site only offers back to 2.5.39, the sourceforge site back to 2.5.36. lapwing says it's Debian 7.0, which went out of support in 2016 and out of super-duper-extended support in 2020. It also doesn't have a supported OpenSSL version anymore, and IIRC, it has a weird old compiler that occasionally gives bogus warnings. I think it's time to stop supporting this.

OK, that's fair. I do see lapwing called out a lot in the commit log, though it's not clear how much of that is about 32-bitness and how much about old tools. It's surely still valuable to have i386 machines in the buildfarm, but I agree that supporting unobtainable tool versions is a bit much. Could we get that animal updated to some newer OS version?

Presumably, we should also rip out the existing yyget_column and yyset_column kluges in
src/backend/parser/scan.l: extern int core_yyget_column(yyscan_t yyscanner);
src/bin/psql/psqlscanslash.l: extern int slash_yyget_column(yyscan_t yyscanner);
src/bin/pgbench/exprscan.l: extern int expr_yyget_column(yyscan_t yyscanner);
src/fe_utils/psqlscan.l: extern int psql_yyget_column(yyscan_t yyscanner);

All my flex-related patches are in now. Here is a patch that removes the workarounds for compiler warnings with flex 2.5.35. This ended up being quite a lot, including the whole fix-old-flex-code.pl script.

The second patch contemplates raising the minimum required flex version, but to what? The most recent increase was exactly because 2.5.35 was the oldest in the buildfarm. The previous increases were apparently because certain features were required or some bugs had to be avoided. Options:
- Leave at 2.5.35 as long as it's present in the buildfarm.
- Set to 2.5.36 because it's the oldest that compiles without warnings. Also, the oldest you can still download from the flex sourceforge site.
- Set to 2.5.37 because that's the next oldest in the buildfarm (for CentOS/RHEL 7, so it will stay around for a while).
- Set to 2.5.34 because that's the oldest we actually require as of commit b1ef48980dd.
- Remove the version check, because these are all so old that no one cares anymore.

From fc1e8aedfe7371ae563ec51dd04346c0efd69fff Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Thu, 9 Jan 2025 08:36:16 +0100
Subject: [PATCH 1/2] Drop warning-free support for flex 2.5.35

This removes all the various workarounds for avoiding compiler warnings with flex 2.5.35. Several recent patches have added additional warnings that would either need to be fixed along the lines of the existing workarounds, or we decide to no longer care about this, which we do here. Flex 2.5.35 is extremely outdated, and you can't even download it anymore from any of the Flex project sites, so it's nearly impossible to support. After this, using flex 2.5.35 will still work, but the generated code will produce numerous compiler warnings.
Discussion: https://www.postgresql.org/message-id/1a204ccd-7ae6-478c-a431-407b5c48c...@eisentraut.org --- src/Makefile.global.in | 1 - src/backend/parser/Makefile| 1 - src/backend/parser/meson.build | 2 +- src/backend/parser/scan.l | 9 - src/bin/pgbench/exprscan.l | 9 - src/bin/psql/Makefile | 1 - src/bin/psql/meson.build | 2 +- src/bin/psql/psqlscanslash.l | 9 - src/fe_utils/Makefile | 1 - src/fe_utils/meson.build | 2 +- src/fe_utils/psqlscan.l| 9 - src/tools/fix-old-flex-code.pl | 66 -- src/tools/pgflex | 13 --- 13 files changed, 3 insertions(+), 122 deletions(-) delete mode 100644 src/tools/fix-old-flex-code.pl diff --git a/src/Makefile.global.in b/src/Makefile.global.in index eac3d001211..1278b7744f4 100644 --- a/src/Makefile.global.in +++ b/src/Makefile.global.in @@ -788,7 +788,6 @@ TAS = @TAS@ %.c: %.l $(FLEX) $(if $(FLEX_NO_BACKUP),-b) $(FLEXFLAGS) -o'$@' $< @$(if $(FLEX_NO_BACKUP),if [ `wc -l &2; exit 1; fi) - $(if $(FLEX_FIX_WARNING),$(PERL) $(top_srcdir)/src/tools/fix-old-flex-code.pl '$@') %.c: %.y $(if $(BISON_CHECK_CMD),$(BISON_CHECK_CMD)) diff --git a/src/backend/parser/Makefile b/src/backend/parser/Makefile index 3162a01f302..8c0fe28d63f 100644 --- a/src/backend/parser/Makefile +++ b/src/backend/parser/Makefile @@ -59,7 +59,6 @@ gram.c: BISON_CHECK_CMD = $(PERL) $(srcdir)/check_keywords.pl $< $(top_srcdir)/s scan.c: FLEXFLAGS = -CF -p -p scan.c: FLEX_NO_BACKUP=yes -scan.c: FLEX_FIX_WARNING=yes # Force these dependencies to be known even without dependency info built: diff --git a/src/backend/parser/meson.build b/src/backend/parser/meson.build index 4c3ca25dd49..874aa749aa6 100644 --- a/src/b
Re: Coccinelle for PostgreSQL development [4/N]: correcting palloc() use
On Wed, Jan 8, 2025 at 10:02 PM Peter Eisentraut wrote:
> On 07.01.25 20:49, Mats Kindahl wrote:
> > This is the first example semantic patch and shows how to capture and fix a common problem.
> >
> > If you use palloc() to allocate memory for an object (or an array of objects) and by mistake type something like:
> >
> > StringInfoData *info = palloc(sizeof(StringInfoData*));
> >
> > you will not allocate enough memory for storing the object. This semantic patch catches any cases where you are either allocating an array of objects or a single object that do not have correct types in this sense; more precisely, it captures assignments to a variable of type T* where palloc() uses sizeof(T) either alone or with a single expression (assuming this is an array count).
> >
> > The semantic patch is overzealous in the sense that allocation to a "const char **" expects a "sizeof(const char *)", and it cannot deal with typedefs that introduce aliases (these two can be seen in the patch). Although the sizes of these are the same, and Coccinelle does not have a good system for comparing types, it might be better to just follow the convention of always using the type "T*" for any "palloc(sizeof(T))", since it makes automated checking easier and is a small inconvenience; especially considering that coccicheck can easily fix this for you. It also simplifies other automated checking to follow this convention.
>
> I think this kind of thing is better addressed with a static analyzer or some heavier compiler annotations and warnings or something like that. That kind of tool would have the right level of information to decide whether some code is correct. Having a text matching tool do it is just never going to be complete, and the problems you mention show that this tool fundamentally doesn't understand the code, and just knows how to parse the code a little better than grep or sed.

Thanks for the comments Peter. On one hand, I agree with you. Coccinelle understands the *structure* of the code (e.g., what is a type and what is an identifier, where functions start and end, etc.), but not the *semantics* of the code (e.g., whether two types are different). On the other hand, however, development practice is also about following common coding patterns to avoid problems and improve readability. In this sense, having strange situations in the code that just happen to work makes it more difficult for developers as well as for automation tools. For example, allocating memory for elements of sizeof(T**) and assigning it to a T** variable will not be caught by something like ASAN (because the sizes match), and cannot trigger a fault (again, because the sizes match), but correcting this would still make life easier for both developers and static analysis tools.

That said, if your only concern is the "const int *" vs. "int *" thing (but not the T** vs. T* thing), it is possible to deal with this by either adding Python code to the Coccinelle script to just remove all "const" (since it does not affect the size), or by detecting that one of the types has "const" in it and just ignoring that situation rather than suggesting a patch.

> I do like the idea of exploring this tool for facilitating code rewriting or semantic patching. But I'm suspicious about using it for enforcing coding styles or patterns.
> The case we're discussing here is one such case where I think that enforcing a coding style might not be a good idea, but I think it was valuable to raise the issue and get opinions. (This is why I mentioned the problem with the patch being overzealous.) The real value here is not specific patches though. It is having a tool like Coccinelle for detecting and correcting problematic code as part of the development practice since I think it would speed up development and review and also improve the quality of the code. -- Best wishes, Mats Kindahl, Timescale
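To make the bug class concrete, here is the pattern in question together with one convention that sidesteps it entirely (plain C, drawn from the example in the thread; n is assumed to hold the intended element count):

/* the mistake the semantic patch flags: sizing by the pointer type */
StringInfoData *info = palloc(sizeof(StringInfoData *));    /* WRONG: pointer-sized */
StringInfoData *ok = palloc(sizeof(StringInfoData));        /* right */

/* sizing the allocation off the variable itself makes the mistake
 * structurally impossible and is easy for tools to verify */
StringInfoData *best = palloc(sizeof(*best));
StringInfoData *many = palloc(n * sizeof(*many));           /* array of n */

Note that the sizeof(*var) spelling is a common C idiom rather than something the posted semantic patch mandates; the patch instead enforces the "T* gets sizeof(T)" convention discussed above.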
Re: [PATCH] Add get_bytes() and set_bytes() functions
On Fri, Oct 18, 2024 at 05:20:42PM +0300, Aleksander Alekseev wrote:
> Rebased, v5.

v5-0001 includes the following output:

--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -126,9 +126,12 @@ WHERE p1.oid < p2.oid AND
      p1.proretset != p2.proretset OR
      p1.provolatile != p2.provolatile OR
      p1.pronargs != p2.pronargs);
- oid | proname | oid | proname
------+---------+-----+---------
-(0 rows)
+ oid  | proname  | oid  | proname
+------+----------+------+---------
+ 2405 | int2send | 8577 | bytea
+ 2407 | int4send | 8578 | bytea
+ 2409 | int8send | 8579 | bytea
+(3 rows)

This should not happen, as you are using the same function multiple times.
--
Michael
RE: Conflict detection for update_deleted in logical replication
On Thursday, January 9, 2025 9:48 AM Masahiko Sawada wrote:

Hi,

> > On Wed, Jan 8, 2025 at 3:00 AM Zhijie Hou (Fujitsu) wrote:
> > > On Wednesday, January 8, 2025 6:33 PM Masahiko Sawada wrote:
> > > > Hi,
> > > > > On Wed, Jan 8, 2025 at 1:53 AM Amit Kapila wrote:
> > > > > > On Wed, Jan 8, 2025 at 3:02 PM Masahiko Sawada wrote:
> > > > > > > On Thu, Dec 19, 2024 at 11:11 PM Nisha Moond wrote:
> > > > > > > > [3] Test with pgbench run on both publisher and subscriber.
> > > > > > > > Test setup:
> > > > > > > > - Tests performed on pgHead + v16 patches
> > > > > > > > - Created a pub-sub replication system.
> > > > > > > > - Parameters for both instances were:
> > > > > > > >   shared_buffers = 30GB
> > > > > > > >   min_wal_size = 10GB
> > > > > > > >   max_wal_size = 20GB
> > > > > > > >   autovacuum = false
> > > > > > > Since you disabled autovacuum on the subscriber, dead tuples created by non-hot updates are accumulated anyway regardless of the detect_update_deleted setting, is that right?
> > > > > > I think the hot-pruning mechanism during the update operation will remove dead tuples even when autovacuum is disabled.
> > > > > True, but why did it disable autovacuum? It seems that case1-2_setup.sh doesn't specify fillfactor, which makes hot-updates less likely to happen.
> > > IIUC, we disable autovacuum as a general practice in read-write tests for stable TPS numbers.
> ...
> In test case 3, we observed a -53% performance dip, which is worse than the results of test case 5 with wal_receiver_status_interval = 100s. Given that in test case 5 with wal_receiver_status_interval = 100s we cannot remove dead tuples for most of the whole 120s test time, we probably could not remove dead tuples for a long time also in test case 3. I expected that the apply worker gets remote transaction XIDs and tries to advance slot.xmin more frequently, so this performance dip surprised me.

As noted in my previous email[1], the delay primarily occurs during the final phase (RCI_WAIT_FOR_LOCAL_FLUSH), where we wait for concurrent transactions from the publisher to be applied and flushed locally (e.g., last_flushpos < data->remote_lsn). I think that while the interval between individual transaction ID advancements is brief, the duration of each advancement itself is significant.

> I would like to know how many times the apply worker gets remote transaction XIDs and succeeds in advancing slot.xmin during the test.

My colleague will collect and share the data soon.

[1] https://www.postgresql.org/message-id/OS0PR01MB57164C9A65F29875AE63F0BD94132%40OS0PR01MB5716.jpnprd01.prod.outlook.com

Best Regards,
Hou zj
Re: Conflict detection for update_deleted in logical replication
On Wed, Jan 8, 2025 at 2:24 PM Amit Kapila wrote: > > On Wed, Jan 8, 2025 at 2:15 PM Masahiko Sawada wrote: > > > > On Tue, Jan 7, 2025 at 2:49 AM Amit Kapila wrote: > > > > > > > > We thought of another approach, which is to create/drop this slot first as > > > > soon as one enables/disables detect_update_deleted (E.g. create/drop slot > > > > during DDL). But it seems complicated to control the concurrent slot > > > > create/drop. For example, if one backend A enables detect_update_deleted, it > > > > will create a slot. But if another backend B is disabling > > > > detect_update_deleted at the same time, then the newly created slot may be > > > > dropped by backend B. I thought about checking the number of subscriptions that > > > > enable detect_update_deleted before dropping the slot in backend B, but the > > > > subscription changes caused by backend A may not be visible yet (e.g., not > > > > committed yet). > > > > > > > > > > This means that for the transaction whose changes are not yet visible, > > > we may have already created the slot and the backend B would end up > > > dropping it. Is it possible that during the change of this new option > > > via DDL, we take AccessExclusiveLock on pg_subscription as we do in > > > DropSubscription() to ensure that concurrent transactions can't drop > > > the slot? Will that help in solving the above scenario? > > > > If we create/stop the slot during DDL, how do we support rollback of DDLs? > > > > We will prevent changing this setting in a transaction block as we > already do for slot related cases. See the use of > PreventInTransactionBlock() in subscriptioncmds.c. > On further thinking, even if we prevent this command in a transaction block, there is still a small chance of rollback. Say, we created the slot as the last operation after making database changes, but still, the transaction can fail in the commit code path. So, it is still not bulletproof. However, we already create a remote_slot at the end of CREATE SUBSCRIPTION, so, if by any chance the transaction fails in the commit code path, we will end up having a dangling slot on the remote node. The same can happen in the DROP SUBSCRIPTION code path as well. We can follow that, or the other option is to allow creation of the slot by the backend and let the drop be handled by the launcher, which can even take care of dangling slots. However, I feel it will be better to give the responsibility to the launcher for creating and dropping the slot, as the patch is doing, and use the FullTransactionId for each worker. What do you think? -- With Regards, Amit Kapila.
Re: Psql meta-command conninfo+
Hi, After looking at this ever more today, I think "Server Parameter Settings" > is confusing as well. I think "Connection Status" instead of > "Current Status" as is defined in v36 will work better. > This way we will have "Connection Info" and "Connection Status". > Connection Status will reflect the values of specific parameters > that the server reports. > Noted. > Including all the parameters in [1] under > "Server Parameter Settings" (or "Connection Status") > seems like the easy choice here. Some may not be as useful as > others, but I don't think we should pick and choose either. > Maybe someone else has other thoughts about this? > Sure, let's wait for others' opinions. > We can include role by marking the "role" guc with > the GUC_REPORT flag in guc_tables.c. I really think > without it, the is_superuser field will be incomplete. > This is because either "role" or "session authorization" > will change the is_superuser. > Thanks! I will take a look. > A thought also, that if we do choose to report all the parameters > in [1], it should be coded in a more dynamic way. Maybe loop > through the conn->pstatus list? For example I see "search_path" > will be added to the list in the next release. > If we loop through conn->pstatus, we will be bypassing the official API. The list is of type pgParameterStatus, which is an internal struct defined in libpq-int.h. As the file's header warns, including this file can lead to issues. Or am I missing something? Regards, Hunaid Sohail
Re: Fix bank selection logic in SLRU
On 2024-Dec-27, Andrey Borodin wrote: > > On 19 Dec 2024, at 20:48, Yura Sokolov wrote: > > > > Here's version with type change bits16 -> uint16 > > Thanks! This version looks good to me. I’ll mark the CF entry as RfC. Thank you, I have pushed this. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Once again, thank you and all of the developers for your hard work on PostgreSQL. This is by far the most pleasant management experience of any database I've worked on." (Dan Harris) http://archives.postgresql.org/pgsql-performance/2006-04/msg00247.php
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size
On Mon, Jan 6, 2025 at 3:22 AM Peter Smith wrote: > > Hi Shubham. > > The patch v6-0001 LGTM. > > OTOH, if you want to be picky, the docs wording could be slightly > modified to be more consistent with the coded warning message. > > CURRENT > Replication failures can occur if required WAL files are prematurely > deleted. To prevent this, the source server must set <xref linkend="guc-max-slot-wal-keep-size"/> to -1, > ensuring WAL files are not automatically removed. > > SUGGESTION > Replication failures can occur if required WAL files are missing. To > prevent this, the source server must set <xref linkend="guc-max-slot-wal-keep-size"/> to -1 to > ensure that required WAL files are not prematurely removed. > > ~~~ > > See the attached NITPICKS diff if you want to make this change. > I have used your suggestion and updated the 'pg_createsubscriber' documentation accordingly. The attached patch contains the suggested change. Thanks and regards, Shubham Khanna. v7-0001-Validate-max_slot_wal_keep_size-in-pg_createsubsc.patch Description: Binary data
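For reference, the setting recommended by the doc text above can be applied on the source server without a restart, since max_slot_wal_keep_size is reloadable; a minimal sketch:

-- on the source server, before running pg_createsubscriber
ALTER SYSTEM SET max_slot_wal_keep_size = -1;  -- never remove WAL still needed by replication slots
SELECT pg_reload_conf();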
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size
On Mon, Jan 6, 2025 at 7:59 AM Hayato Kuroda (Fujitsu) wrote: > > Dear Shubham, > > Thanks for creating a patch. I have one comment about it. > > check_publisher() assumed that the SQL function > `pg_catalog.current_setting('max_slot_wal_keep_size')` > will return the numeric, but it just return the text representation. I.e., if > the parameter is > set to 10MB, the function returns like below: > > ``` > postgres=# SELECT * FROM pg_catalog.current_setting('max_slot_wal_keep_size'); > current_setting > - > 10MB > (1 row) > ``` > > Your patch can work well because atoi() ignores the latter part of the string, > e.g., "10MB" is converted to "10", but this is not clean. I suggest either of > 1) accepting the value as the string, or 2) using an SQL function > pg_size_bytes() > to get max_slot_wal_keep_size. > I have fixed this as suggested. The v7 patch attached at [1] includes the change. [1] - https://www.postgresql.org/message-id/CAHv8Rj%2BSxmBgz-vw5f7meRzEh6pgp9YhAjL5BPzTpevu31rY7Q%40mail.gmail.com Thanks and regards, Shubham Khanna.
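To illustrate the difference Kuroda-san points out: current_setting() returns the human-readable text form, while pg_size_bytes() converts that text to a plain number of bytes. A quick sketch, assuming the parameter has been set to '10MB':

SELECT current_setting('max_slot_wal_keep_size');                 -- returns the text '10MB'
SELECT pg_size_bytes(current_setting('max_slot_wal_keep_size'));  -- returns 10485760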
Orphaned users in PG16 and above can only be managed by Superusers
Hi All, Starting from PG16, it seems that orphaned users can only be managed by superusers. For example, if userA creates userB, and userB creates userC, then both userB (the parent of userC) and userA (the grandparent of userC) would typically have the ability to manage/administer userC. However, if userB is dropped, userA (the grandparent of userC) loses the ability to administer userC as well. This leads to a situation where only superusers can manage userC. Shouldn't userA retain the permission to manage userC even if userB is removed? Otherwise, only superusers would have the authority to administer userC (the orphaned user in this case), which may not be feasible for cloud environments where superuser access is restricted. -- With Regards, Ashutosh Sharma.
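For anyone wanting to reproduce this, a minimal sketch of the scenario (role names are made up, and the exact implicit grants involved depend on PG16's role-grant semantics and the createrole_self_grant setting):

-- as a superuser
CREATE ROLE usera LOGIN CREATEROLE;
SET ROLE usera;
CREATE ROLE userb LOGIN CREATEROLE;   -- usera is implicitly granted ADMIN OPTION on userb
RESET ROLE;
SET ROLE userb;
CREATE ROLE userc LOGIN;              -- userb is implicitly granted ADMIN OPTION on userc
RESET ROLE;
DROP ROLE userb;                      -- userb's grants on userc disappear with it
SET ROLE usera;
ALTER ROLE userc NOLOGIN;             -- fails: usera no longer has ADMIN OPTION on userc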
Re: Several buildfarm animals fail tests because of shared memory error
Hello Robins, 22.12.2024 09:27, Robins Tharakan wrote: - The only info about leafhopper may be relevant is that it's one of the newest machines (Graviton4) so it comes with a recent hardware / kernel / stock gcc 11.4.1. Could you please take a look at leafhopper, which is producing weird test failures rather often? For example,
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=leafhopper&dt=2024-12-16%2023%3A43%3A03 - REL_15_STABLE
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=leafhopper&dt=2024-12-21%2022%3A18%3A04 - REL_16_STABLE
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=leafhopper&dt=2025-01-02%2009%3A21%3A04 - REL_17_STABLE
--- /home/bf/proj/bf/build-farm-17/REL_16_STABLE/pgsql.build/src/test/regress/expected/select_parallel.out 2024-12-21 22:18:03.844773742 +
+++ /home/bf/proj/bf/build-farm-17/REL_16_STABLE/pgsql.build/src/test/recovery/tmp_check/results/select_parallel.out 2024-12-21 22:23:28.264849796 +
@@ -551,7 +551,7 @@
          ->  Nested Loop (actual rows=98000 loops=1)
                ->  Seq Scan on tenk2 (actual rows=10 loops=1)
                      Filter: (thousand = 0)
-                     Rows Removed by Filter: 9990
+                     Rows Removed by Filter: 9395
Or:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=leafhopper&dt=2024-12-18%2023%3A35%3A04 - master
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=leafhopper&dt=2025-01-02%2009%3A22%3A04 - master
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=leafhopper&dt=2025-01-08%2007%3A38%3A03 - master
# Failed test 'regression tests pass'
# at t/027_stream_regress.pl line 95.
# got: '256'
# expected: '0'
# Looks like you failed 1 test of 9.
[23:42:59] t/027_stream_regress.pl ...
...
diff -U3 /home/bf/proj/bf/build-farm-17/HEAD/pgsql.build/src/test/regress/expected/memoize.out /home/bf/proj/bf/build-farm-17/HEAD/pgsql.build/src/test/recovery/tmp_check/results/memoize.out
--- /home/bf/proj/bf/build-farm-17/HEAD/pgsql.build/src/test/regress/expected/memoize.out 2024-12-18 23:35:04.318987642 +
+++ /home/bf/proj/bf/build-farm-17/HEAD/pgsql.build/src/test/recovery/tmp_check/results/memoize.out 2024-12-18 23:42:24.806028990 +
@@ -179,7 +179,7 @@
    Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
    ->  Seq Scan on tenk1 t2 (actual rows=1 loops=N)
          Filter: ((t1.twenty = unique1) AND (t1.two = two))
-          Rows Removed by Filter:
+          Rows Removed by Filter: 9775
 (12 rows)
Maybe you could try to reproduce such failures without the buildfarm client, just by running select_parallel, for example, with the attached patch applied. I mean running `make check` with parallel_schedule like:
...
# --
# Run these alone so they don't run out of parallel workers
# select_parallel depends on create_misc
# --
test: select_parallel
test: select_parallel
test: select_parallel
(e.g.
with 100 repetitions) Or TESTS="test_setup copy create_misc create_index $(printf "select_parallel %.0s" {1..100})" make check-tests Best regards, Alexander Lakhin Neon (https://neon.tech)

diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index a809036453..4398d3b17e 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -1,3 +1,6 @@
+SET client_min_messages TO 'warning';
+DROP FUNCTION IF EXISTS sp_parallel_restricted CASCADE;
+RESET client_min_messages;
 --
 -- PARALLEL
 --
diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql
index 71a75bc86e..4e5a7471e1 100644
--- a/src/test/regress/sql/select_parallel.sql
+++ b/src/test/regress/sql/select_parallel.sql
@@ -1,3 +1,7 @@
+SET client_min_messages TO 'warning';
+DROP FUNCTION IF EXISTS sp_parallel_restricted CASCADE;
+RESET client_min_messages;
+
 --
 -- PARALLEL
 --
Re: ecpg command does not warn COPY ... FROM STDIN;
On 2025/01/09 0:42, Tom Lane wrote: Fujii Masao writes: On 2025/01/08 23:04, Ryo Kanbayashi wrote: But it is not working. ecpg command fails to notice though code like above exits on pgc code. This issue seems to have been introduced in commit 3d009e45bd. Indeed :-( As for COPY FROM STDOUT, while it's possible, mentioning it might be confusing since it's not official syntax. So I'm not sure if this is good idea. There's another problem: the correct syntax is COPY TO STDOUT, and that won't trigger this warning either. ISTM that ecpg supports COPY TO STDOUT and includes the regression test "copystdout" for it. No? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: pgindent exit status if a file encounters an error
On 2024-06-28 Fr 8:35 AM, Ashutosh Bapat wrote: On Wed, Jun 26, 2024 at 8:54 PM Tom Lane wrote: Ashutosh Bapat writes: > The usage help mentions exit code 2 specifically while describing --check > option but it doesn't mention exit code 1. Neither does the README. So I > don't think we need to document exit code 3 anywhere. Please let me know if > you think otherwise. I think we should have at least a code comment summarizing the possible exit codes, along the lines of # Exit codes: # 0 -- all OK # 1 -- could not invoke pgindent, nothing done # 2 -- --check mode and at least one file requires changes # 3 -- pgindent failed on at least one file Thanks. Here's a patch with these lines. In an offline chat Andrew mentioned that he expects more changes in the patch and he would take care of those. Will review and test his patch. I forget now what I was intending here, so I have committed this with minor tweaks. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: Virtual generated columns
On 04.12.24 05:55, jian he wrote: On Fri, Nov 29, 2024 at 6:13 PM Peter Eisentraut wrote: - Added support for virtual columns in trigger column lists. (For that, I renamed ExecInitStoredGenerated() to ExecInitGenerated(), which handles the computation of ri_extraUpdatedCols.) why not duplicate some code from ExecInitStoredGenerated to ExecGetExtraUpdatedCols? This answers itself: I'd rather not duplicate code. I don't see that as an improvement. * Now the expression state is something that gets initialized for the virtual generated column, which may not be necessary for virtual columns. Let's make ResultRelInfo->ri_GeneratedExprsI, ResultRelInfo->ri_GeneratedExprsU be NULL for virtual columns. Currently it may look like this: (gdb) p resultRelInfo->ri_GeneratedExprsU $20 = (ExprState **) 0x34f9638 (gdb) p resultRelInfo->ri_GeneratedExprsU[0] $21 = (ExprState *) 0x0 (gdb) p resultRelInfo->ri_GeneratedExprsU[1] $22 = (ExprState *) 0x0 (gdb) p resultRelInfo->ri_GeneratedExprsU[2] $23 = (ExprState *) 0x40 I have fixed that in v11. * ExecInitStoredGenerated is mainly used in ExecComputeStoredGenerated. * We also need to slightly change ExecInitGenerated's comments. Also fixed. * In InitResultRelInfo, do we need to explicitly set ri_Generated_valid to false? Doesn't seem necessary. The struct is initialized to zero at the beginning.
Re: Virtual generated columns
On 11.12.24 07:49, jian he wrote: On Fri, Nov 29, 2024 at 6:01 PM Peter Eisentraut wrote: The purpose of check_modified_virtual_generated() is for trigger functions written in C: to prevent someone from inserting real values into the trigger tuples, because they would then be processed by the rest of the system, which would be incorrect. Higher-level languages such as plpgsql should handle that themselves, by preventing setting generated columns in trigger functions. The presence of check_modified_virtual_generated() is still a backstop for those, but shouldn't really be necessary. Please check the attached patch. * Remove check_modified_virtual_generated. * Use heap_modify_tuple_by_cols in ExecBRInsertTriggers, ExecBRUpdateTriggers to overwrite virtual generated columns' values with null. It's not complicated, and the trigger behavior for stored and virtual will be more aligned. I have integrated that into v11. I agree it's not complicated and it's better to keep the behavior aligned. I kept the function check_modified_virtual_generated() but now it just modifies the tuple, using your code, instead of erroring. That avoids having to write the same code twice. I don't understand the purpose of the change in pl_exec.c.
Re: Virtual generated columns
On 16.12.24 15:34, jian he wrote: hi. some minor issues... SET EXPRESSION AS This form replaces the expression of a generated column. Existing data in the column is rewritten and all the future changes will apply the new generation expression. The second sentence seems not to apply to a virtual generated column? Tweaked the wording in v11. doc/src/sgml/ref/alter_table.sgml does not seem to explicitly mention the difference between ALTER TABLE tp ALTER COLUMN b SET EXPRESSION AS (a * 3); and ALTER TABLE ONLY tp ALTER COLUMN b SET EXPRESSION AS (a * 3); ? The first one will recurse to the child tables and replace any generated expression in the child table for the to-be-altered column; the latter won't. This is implied by the general meaning of the ONLY clause in ALTER TABLE. This applies to all ALTER TABLE actions. Is there anything we need to explain specifically for this action? CheckAttributeType, we can change it to <<< else if (att_typtype == TYPTYPE_DOMAIN) { if ((flags & CHKATYPE_IS_VIRTUAL) && DomainHasConstraints(atttypid)) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("virtual generated column \"%s\" cannot have a domain type", attname))); } <<< so we can support domains without any constraints for now. (I don't have a huge opinion though). I don't think this would be correct, since constraints could be added to the domain later. ALTER COLUMN SET NOT NULL, if the column is already not-null, becomes a no-op. Similarly, if the old and new generated expressions are the same, ATExecSetExpression can return InvalidObjectAddress, making it a no-op. For example, in ATExecSetExpression, can we make the following ALTER TABLE a no-op? CREATE TABLE gtest20 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 3) VIRTUAL ); ALTER TABLE gtest20 ALTER COLUMN b SET EXPRESSION AS (a * 3); Since ATExecSetExpression is not recursive, each input argument (AlteredTableInfo *tab) is separate for partitioned tables and partitions, as are the AlteredTableInfo->newvals and AlteredTableInfo->rewrite information. So returning InvalidObjectAddress from ATExecSetExpression for the no-op case will also work for partitioned tables and inheritance. I don't know why we would need to make this a no-op. I mean, we also don't make UPDATE ... SET x = x a no-op. The attached file tries to do that. While testing it, I found out there is no test case for ALTER COLUMN SET EXPRESSION for inheritance cases, even though it works. In src/test/regress/sql/generated_virtual.sql after line 161, we can add the following tests: <<< ALTER TABLE ONLY gtest1 ALTER COLUMN b SET EXPRESSION AS (a * 10); select tableoid::regclass, * from gtest1; ALTER TABLE gtest1 ALTER COLUMN b SET EXPRESSION AS (a * 100); select tableoid::regclass, * from gtest1; <<< There was already a test for this: +-- alter only parent's and one child's generation expression +ALTER TABLE ONLY gtest_parent ALTER COLUMN f3 SET EXPRESSION AS (f2 * 4); +ALTER TABLE gtest_child ALTER COLUMN f3 SET EXPRESSION AS (f2 * 10); Is there anything you think this doesn't cover?
Re: Virtual generated columns
On 03.12.24 15:15, jian he wrote: -- check constraints CREATE TABLE gtest20 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) VIRTUAL CHECK (b < 50)); INSERT INTO gtest20 (a) VALUES (10); -- ok INSERT INTO gtest20 (a) VALUES (30); -- violates constraint ALTER TABLE gtest20 ALTER COLUMN b SET EXPRESSION AS (a * 100); -- violates constraint ALTER TABLE gtest20 ALTER COLUMN b SET EXPRESSION AS (a * 3); -- ok The above content is in src/test/regress/sql/generated_virtual.sql; the last two query comments seem to conflict with the error message for now. Fixed the comment in the test in patch v11. I added some regress tests for your v10 changes in src/backend/commands/statscmds.c. Please check the attachment. Added to patch v11. The sql tests under "sanity check of system catalog" could be placed at the end of the sql file for a better chance of catching errors. For virtual, we can also check the attnotnull and atthasdef values, like: SELECT attrelid, attname, attgenerated FROM pg_attribute WHERE attgenerated IN ('v') and (attnotnull or not atthasdef); I moved the existing check to the bottom, as you suggest. I don't understand what the purpose of testing attnotnull is. That is independent of attgenerated, I think.
Re: ecpg command does not warn COPY ... FROM STDIN;
Fujii Masao writes: > On 2025/01/09 0:42, Tom Lane wrote: >> There's another problem: the correct syntax is COPY TO STDOUT, >> and that won't trigger this warning either. > ISTM that ecpg supports COPY TO STDOUT and includes the regression test > "copystdout" for it. No? Oh right. (Pokes at it...) It looks like the backend accepts "FROM STDOUT" as a synonym for "FROM STDIN", so that's why this is checking for both spellings. But I agree it'd be better for the error message to only use the standard spelling. Also I tried regression=# copy int4_tbl from program stdin; ERROR: STDIN/STDOUT not allowed with PROGRAM So we probably don't need to bother with adjusting the check to allow that. regards, tom lane
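For the archives, the two spellings at issue behave like this (a quick sketch, per the behavior described above):

COPY int4_tbl TO STDOUT;    -- the standard spelling
COPY int4_tbl FROM STDOUT;  -- accepted by the backend, behaves the same as COPY ... FROM STDIN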
Re: use a non-locking initial test in TAS_SPIN on AArch64
On Tue, Oct 22, 2024 at 02:54:57PM -0500, Nathan Bossart wrote: > My colleague Salvatore Dipietro (CC'd) sent me a couple of profiles that > showed an enormous amount of s_lock() time going to the > __sync_lock_test_and_set() call in the AArch64 implementation of tas(). > Upon closer inspection, I noticed that we don't implement a custom > TAS_SPIN() for this architecture, so I quickly hacked together the attached > patch and ran a couple of benchmarks that stressed the spinlock code. I > found no discussion about TAS_SPIN() on ARM in the archives, but I did > notice that the initial AArch64 support was added [0] before x86_64 started > using a non-locking test [1]. > > These benchmarks are for a c8g.24xlarge running a select-only pgbench with > 256 clients and pg_stat_statements.track_planning enabled. > > without the patch: > > [...] > > tps = 74135.100891 (without initial connection time) > > with the patch: > > [...] > > tps = 549462.785554 (without initial connection time) Are there any objections to proceeding with this change? So far, it's been tested on a c8g.24xlarge and an Apple M3 (which seems to be too small to show any effect). If anyone has access to a larger ARM machine, additional testing would be greatly appreciated. I think it would be unfortunate if this slipped to v19. -- nathan
Re: doc: Mention clock synchronization recommendation for hot_standby_feedback
On Wed, Dec 18, 2024 at 10:33 AM Amit Kapila wrote: Hi Amit! > On Thu, Dec 5, 2024 at 3:14 PM Jakub Wartak > wrote: > > > > One of our customers ran into a very odd case, where hot standby feedback > > backend_xmin propagation stopped working due to major (hours/days) clock > > time shifts on hypervisor-managed VMs. This happens (and is fully > > reproducible) e.g. in scenarios where the standby connects while its own VM > > has time from the future (relative to the primary) and then that time goes > > back to "normal". In such a situation the "sending hot standby feedback xmin" > > timestamp messages stop being transferred, e.g.: > > > > 2024-12-05 02:02:35 UTC [6002]: db=,user=,app=,client= DEBUG: sending hot > > standby feedback xmin 1614031 epoch 0 catalog_xmin 0 catalog_xmin_epoch 0 > > 2024-12-05 02:02:45 UTC [6002]: db=,user=,app=,client= DEBUG: sending > > write 6/E9015230 flush 6/E9015230 apply 6/E9015230 > > 2024-12-05 02:02:45 UTC [6002]: db=,user=,app=,client= DEBUG: sending hot > > standby feedback xmin 1614031 epoch 0 catalog_xmin 0 catalog_xmin_epoch 0 > > <-- clock readjustment and no further "sending hot standby feedback" > ... > > > > I can share reproduction steps if anyone is interested. This basically > > happens due to usage of TimestampDifferenceExceeds() in > > XLogWalRcvSendHSFeedback(), but I bet there are other similar scenarios. > > > > We started to use a different mechanism in HEAD. See > XLogWalRcvSendHSFeedback(). Yes, you are somewhat correct, because I was looking at REL13_STABLE, but I've taken a fresh quick look at 05a7be93558 and tested it too. Sadly, PG17 still maintains the same behavior of lacking proper backend_xmin propagation (it stops sending hot standby feedback once the time on the standby jumps forward). I believe this is the case because walreceiver schedules the next wakeup far in the future (when the clock / now() is way ahead, see WalRcvComputeNextWakeup()), so when the clock is back to normal (reset back -X hours/days), the next wakeup seems to be +X hours/days ahead. > > What I was kind of surprised about was the lack of recommendation for > > having primary/standby to have clocks synced when using > > hot_standby_feedback, but such a thing is mentioned for > > recovery_min_apply_delay. So I would like to add at least one sentence to > > hot_standby_feedback to warn about this too, patch attached. > > > > IIUC, this issue doesn't occur because the primary and standby clocks > are not synchronized. It happened because the clock on standby moved > backward. In PG17 it would be because the clock moved too far forward on the standby. I don't know how it happened to that customer, but it was probably done somehow by the hypervisor in that scenario (so time wasn't slewed improperly by ntpd AFAIR, edge case, I know...) > This is quite unlike the 'recovery_min_apply_delay' where > non-synchronization of clocks between primary and standby can lead to > unexpected results. This is because we don't compare any time on the > primary with the time on standby. If this understanding is correct > then the wording proposed by your patch should be changed accordingly. .. if my understanding is correct, it is both, depending on the version :^) I was thinking about backpatching the docs (what is the recommended policy here? to just update new-release docs?), so I'm proposing something more generic than earlier, but it takes your point into account - would something like the below be good enough?
-
-Using this option requires the primary and standby(s) to have system
-clocks synchronized, otherwise it may lead to prolonged risk of not
-removing dead rows on primary for extended periods of time as the
-feedback mechanism is based on timestamps exchanged between primary
-and standby(s).
-
+
+Using this option requires the primary and standby(s) to have system
+clocks synchronized (without big time jumps), otherwise it may lead to
+prolonged risk of not removing dead rows on primary for extended periods
+of time as the feedback mechanism implementation is timestamp based.
+

-J.
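For anyone trying to reproduce this, the symptom is visible from the primary side: with hot_standby_feedback enabled, pg_stat_replication.backend_xmin normally keeps advancing under load, and it freezes once the standby stops sending feedback. A quick check (sketch):

-- on the primary, run repeatedly over time
SELECT application_name, backend_xmin FROM pg_stat_replication;
-- backend_xmin stops advancing once the standby's clock jumps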
Re: Reorder shutdown sequence, to flush pgstats later
On 08/01/2025 04:10, Andres Freund wrote: I instead opted to somewhat rearrange the shutdown sequence: This commit changes the shutdown sequence so checkpointer is signalled to trigger writing the shutdown checkpoint without terminating it. Once checkpointer wrote the checkpoint it will wait for a termination signal. Postmaster now triggers the shutdown checkpoint where we previously did so by terminating checkpointer. Checkpointer is terminated after all other children have been terminated, tracked using the new PM_SHUTDOWN_CHECKPOINTER PMState. In addition, the existing PM_SHUTDOWN_2 state is renamed to PM_XLOG_IS_SHUTDOWN, that seems a lot more descriptive. Sounds good. This patch is not fully polished, there are a few details I'm not sure about: - Previously the shutdown checkpoint and process exit were triggered from HandleCheckpointerInterrupts(). I could have continued doing so in the patch, but it seemed quite weird to have a wait loop in a function called HandleCheckpointerInterrupts(). Instead I made the main loop in CheckpointerMain() terminate if checkpointer was asked to write the shutdown checkpoint or exit - In process_pm_pmsignal() the code now triggers a PostmasterStateMachine() if checkpointer signalled ShutdownXLOG having completed. We didn't really have a need for that before. process_pm_child_exit() does call PostmasterStateMachine(), but somehow calling it in process_pm_pmsignal() feels slightly off Looks good to me. - I don't love the naming of the various PMState values (existing and new), but a larger renaming should probably be done separately? We currently have PM_WAIT_BACKENDS and PM_WAIT_DEAD_END. Maybe follow that example and name all the shutdown states after what you're waiting for: PM_WAIT_BACKENDS, /* waiting for live backends to exit */ PM_WAIT_XLOG_SHUTDOWN, /* waiting for checkpointer to do shutdown ckpt */ PM_WAIT_XLOG_ARCHIVAL, /* waiting for archiver and walsenders to finish */ PM_WAIT_DEAD_END, /* waiting for dead-end children to exit */ PM_WAIT_CHECKPOINTER, /* waiting for checkpointer to shut down */ PM_NO_CHILDREN,/* all important children have exited */ - There's logging in ShutdownXLOG() that's only triggered if called from checkpointer. Seems kinda odd - shouldn't we just move that to checkpointer.c? What logging do you mean? The "LOG: shutting down" message? No objections to moving it, although it doesn't bother me either way. The first two patches are just logging improvements that I found helpful to write this patch: 0001: Instead of modifying pmState directly, do so via a new function UpdatePMState(), which logs the state transition at DEBUG2. Requires a helper to stringify PMState. I've written one-off versions of this patch quite a few times. Without knowing in what state postmaster is in, it's quite hard to debug the state machine. +1 elog(DEBUG1, "updating PMState from %d/%s to %d/%s", pmState, pmstate_name(pmState), newState, pmstate_name(newState)); I think the state names are enough, no need to print the numerical values. 0002: Make logging of postmaster signalling child processes more consistent I found it somewhat hard to understand what's happening during state changes without being able to see the signals being sent. While we did have logging in SignalChildren(), we didn't in signal_child(), and most signals that are important for the shutdown sequence are sent via signal_child(). +1 I don't think the cases where DEBUG2 or DEBUG4 are used currently are very well thought out. 
To save a few lines of code, I'd be happy with merging signal_child_ext() and signal_child() and just always use DEBUG2 or DEBUG4. Or DEBUG3 as a compromise :-). The next is a small prerequisite: 0003: postmaster: Introduce variadic btmask_all_except() I proposed this in another thread, where I also needed btmask_all_except3() but then removed the only call to btmask_all_except2(). Instead of adding/removing code, it seems better to just use a variadic function. Nice. A variadic btmask_add() might be handy too. And then 0004, the reason for this thread. Overall, looks good to me. -- Heikki Linnakangas Neon (https://neon.tech)
Re: Moving the vacuum GUCs' docs out of the Client Connection Defaults section
On 07.01.25 18:31, Melanie Plageman wrote: On Tue, Jan 7, 2025 at 12:15 PM Melanie Plageman wrote: Cool, I've attached a patch to do this. I left a few of the GUCs under Resource Consumption (like autovacuum_work_mem and vacuum_buffer_usage_limit) where they are because it seemed appropriate. This is my first docs patch that introduces new sections and such, so I'm not sure I got the indentation 100% correct (I, of course, tried to follow conventions). Oh, one thing I forgot to say. Though I increased the indentation of some of the subsections that I moved, I didn't rewrap the lines because they were already not wrapped to 78. I can do this, but I can't tell from the existing docs what text width the paragraphs of text are supposed to be wrapped to. For a patch that moves things around like this, I would leave everything as is and not rewrap. That makes it easier to follow what's going on with "git diff -w", "git show -w" etc. In .dir-locals.el, there is this configuration for Emacs: (nxml-mode . ((fill-column . 78) (indent-tabs-mode . nil))) So that's one data point about what the line length should be.
Re: Non-text mode for pg_dumpall
Hi all, On Wed, 8 Jan 2025 at 00:34, Mahendra Singh Thalor wrote: > > On Mon, 6 Jan 2025 at 23:05, Nathan Bossart wrote: > > > > On Thu, Jan 02, 2025 at 02:05:13AM +0530, Mahendra Singh Thalor wrote: > > > Here, I am attaching an updated patch. I fixed some bugs of v01 patch and > > > did some code cleanup also. > > > > Thank you for picking this up! I started to review it, but the > > documentation changes didn't build, and a few tests in check-world are > > failing. Would you mind resolving those issues? Also, if you haven't > > already, please add an entry to the next commitfest [0] to ensure that 1) > > this feature is tracked and 2) the automated tests will run. > > Thanks Nathan for the quick response. > > I fixed bugs of documentation changes and check-world in the latest patch. Now docs are building and check-world is passing. > > I added entry into commitfest for this patch.[0] > > > > > + if (dbfile) > > + { > > + printfPQExpBuffer(&cmd, "\"%s\" %s %s", pg_dump_bin, > > + dbfile, create_opts); > > + appendPQExpBufferStr(&cmd, " -F d "); > > + } > > > > Have you given any thought to allowing a directory of custom format files, > > as discussed upthread [1]? Perhaps that is better handled as a follow-up > > patch, but it'd be good to understand the plan, anyway. > > I will make these changes and will test. I will update my findings after doing some testing. In the latest patch, I added dump and restore support for the directory/custom/tar/plain formats. Please consider this patch for review and testing. *Design*: When we give --format=d|c|t then we dump all global sql commands into global.dat in plain sql format and we make a map.dat file with dbname and dboid. For each database, we make a separate subdirectory named by dboid under the databases directory and dump it as per the archive format (d|c|t). While restoring, we first restore all global sql commands from global.dat and then restore all databases one by one. As we support --exclude-database with pg_dumpall, we support the same with pg_restore also, to skip restoring databases matching the specified patterns. If we want to restore a single database, then we can specify the particular subdirectory from the databases folder. To get the file name, we look up the dbname in the map.dat file. *TODO*: Now I will work on test cases for these newly added options to pg_dumpall and pg_restore. Here, I am attaching the v04 patch for testing and review. > > Apart from these bugs, I added code to handle --exclude-database= PATTERN. Earlier I was using NAME only to skip databases for restore. > > TODO: .pl test cases for new added options. > > Here, I am attaching an updated patch for review and feedback. > > > > [0] https://commitfest.postgresql.org > [1] https://postgr.es/m/CABUevExoQ26jo%2BaQ9QZq%2BUMA1aD6gfpm9xBnh_t5e0DhaCeRYA%40mail.gmail.com > > -- > > nathan > > -- > Thanks and Regards > Mahendra Singh Thalor > EnterpriseDB: http://www.enterprisedb.com -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com v04-pg_dumpall-with-directory-format-and-restore-08_jan.patch Description: Binary data
Re: improve DEBUG1 logging of parallel workers for CREATE INDEX?
Hi, On Fri, Jan 03, 2025 at 12:30:14PM -0600, Sami Imseih wrote: > While reviewing patch [1], I realized that the DEBUG1 message > for CREATE INDEX could do better in providing information > about parallel workers launched. Currently, the message just > shows how many workers are planned, but a user may want > to ensure that they have the appropriate number of workers > launched as well when they enable DEBUG1 logging. Yeah, one could see how many workers are currently running but would not be able to figure it out once the index is created. So adding the info in the log makes sense to me. > I prepared a simple patch, attached, for this. The log message matches > the format > used in VACUUM VERBOSE ( for consistency sake ). A few random comments: === 1 s/parallel vacuum workers for index vacuuming/parallel workers for index creation/? (2 times) === 2 - (errmsg_internal("building index \"%s\" on table \"%s\" with request for %d parallel workers", + (errmsg_internal("building index \"%s\" on table \"%s\"", I'd add "in parallel" to match its counterpart "serially" above. That would make it more clear in case one just looks for "building index" in the log. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: ecpg command does not warn COPY ... FROM STDIN;
Kuroda-san, thanks for your comment. > > I found a code validation bug in master branch. > > > > Now, ecpg does not support 'EXEC SQL COPY ... FROM STDIN ... ;' and > > code for warning it exits. > > > > https://github.com/postgres/postgres/blob/7b27f5fd36cb3270e8ac25aefd73b55 > > 2663d1392/src/interfaces/ecpg/preproc/ecpg.addons#L242-L245 > > --- > > ECPG: addon CopyStmt COPY opt_binary qualified_name opt_column_list > > copy_from opt_program copy_file_name copy_delimiter opt_with > > copy_options where_clause > > if (strcmp(@6, "from") == 0 && > > (strcmp(@7, "stdin") == 0 || strcmp(@7, "stdout") == 0)) > > mmerror(PARSE_ERROR, ET_WARNING, "COPY FROM STDIN is not > > implemented"); > > --- > > > > But it is not working. > > ecpg command fails to notice though code like above exits on pgc code. > > Good catch. I have a comment about the fix. > > The parser accepts a statement like "COPY ... FROM STDOUT", and ISTM it is > not either > implemented yet. However, the warning message only mentions STDIN case even > STDOUT > is specified. > > EXEC SQL COPY foo FROM STDOUT; > -> > WARNING: COPY FROM STDIN is not implemented > > I feel we can change like "COPY FROM STDIN/STDOUT...", or prepare two > messages. > Thought? I think your proposed fix is better too. So, I modified the patch. With the new patch, the warning message is changed like below :) ryo@DESKTOP-IOASPN6:~/work/postgres/src$ ../master/bin/ecpg copy_from_should_be_warned.pgc copy_from_should_be_warned.pgc:24: WARNING: COPY FROM STDIN/STDOUT is not implemented ryo@DESKTOP-IOASPN6:~/work/postgres/src$ -- Best regards, Ryo Kanbayashi https://github.com/ryogrid copy_from_stdin_no_warning2.patch Description: Binary data
Re: AIO v2.0
On Mon, Jan 6, 2025 at 5:28 PM Andres Freund wrote: > > Hi, > > On 2024-12-19 17:29:12 -0500, Andres Freund wrote: > > > Not about patch itself, but questions about related stack functionality: > > > > > > > > > > > > 7. Is pg_stat_aios still on the table or not ? (AIO 2021 had it). Any > > > hints > > > on how to inspect real I/O calls requested to review if the code is > > > issuing > > > sensible calls: there's no strace for uring, or do you stick to DEBUG3 or > > > perhaps using some bpftrace / xfsslower is the best way to go ? > > > > I think we still want something like it, but I don't think it needs to be in > > the initial commits. > > After I got this question from Thomas as well, I started hacking one up. > > What information would you like to see? > > Here's what I currently have: ..
> ├─[ RECORD 2 ]───┼────────────────────────────────────────────────┤
> │ pid            │ 358212                                         │
> │ io_id          │ 2051                                           │
> │ io_generation  │ 4199                                           │
> │ state          │ IN_FLIGHT                                      │
> │ operation      │ read                                           │
> │ offset         │ 511967232                                      │
> │ length         │ 262144                                         │
> │ subject        │ smgr                                           │
> │ iovec_data_len │ 32                                             │
> │ raw_result     │ (null)                                         │
> │ result         │ UNKNOWN                                        │
> │ error_desc     │ (null)                                         │
> │ subject_desc   │ blocks 1373216..1373247 in file "base/5/16388" │
> │ flag_sync      │ f                                              │
> │ flag_localmem  │ f                                              │
> │ flag_buffered  │ t                                              │
Cool! It's more than enough for me in future, thanks! > I didn't think that pg_stat_* was quite the right namespace, given that it > shows not stats, but the currently ongoing IOs. I am going with pg_aios for > now, but I don't particularly like that. If you are looking for other proposals: * pg_aios_progress ? (to follow the pattern of pg_stat_copy|vacuum_progress?) * pg_debug_aios ? * pg_debug_io ? > I think we'll want a pg_stat_aio as well, tracking things like: > > - how often the queue to IO workers was full > - how many times we submitted IO to the kernel (<= #ios with io_uring) > - how many times we asked the kernel for events (<= #ios with io_uring) > - how many times we had to wait for in-flight IOs before issuing more IOs If I could dream of one thing, it would be the 99.9th percentile of IO response times in milliseconds for different classes of I/O traffic (read/write/flush). But it sounds like it would be very similar to pg_stat_io and potentially would have to be per-tablespace/IO-traffic(subject)-type too. AFAIU pg_stat_io's structure is not suitable for that. BTW: before trying to even start to compile that AIO v2.2* and responding to the previous review, what are you most interested in hearing about, so that it adds some value? Any workload-specific measurements? Just general feedback, functionality gaps? Integrity/data testing with stuff like dm-dust, dm-flakey, dm-delay to try the error handling routines? Some kind of AIO <-> standby/recovery interactions? * - btw, Date: 2025-01-01 04:03:33 - I saw what you did there! so let's officially recognize 2025 as the year of AIO in PG, as it was the 1st message :D -J.
Re: Reorder shutdown sequence, to flush pgstats later
Hi, On Wed, Jan 08, 2025 at 03:10:03PM +0200, Heikki Linnakangas wrote: > On 08/01/2025 04:10, Andres Freund wrote: > > > elog(DEBUG1, "updating PMState from %d/%s to %d/%s", > > pmState, pmstate_name(pmState), newState, > > pmstate_name(newState)); > > I think the state names are enough, no need to print the numerical values. +1 > > 0002: Make logging of postmaster signalling child processes more consistent > > > >I found it somewhat hard to understand what's happening during state > >changes without being able to see the signals being sent. While we > > did > >have logging in SignalChildren(), we didn't in signal_child(), and > > most > >signals that are important for the shutdown sequence are sent via > >signal_child(). > > +1 Random comments: === 1 +static const char * +pm_signame(int signal) +{ +#define PM_CASE(state) case state: return #state + switch (signal) What about? " #define PM_SIGNAL_CASE(sig) case sig: return #sig " to better reflect its context. === 2 + default: + /* all signals sent by postmaster should be listed here */ + Assert(false); + return "(unknown)"; + } +#undef PM_CASE + pg_unreachable(); Shouldn't we remove the return "(unknown)"? (If not, the pg_unreachable() looks like dead code.) > I don't think the cases where DEBUG2 or DEBUG4 are used currently are very > well thought out. To save a few lines of code, I'd be happy with merging > signal_child_ext() and signal_child() +1 > > The next is a small prerequisite: > > > > 0003: postmaster: Introduce variadic btmask_all_except() > > > >I proposed this in another thread, where I also needed > >btmask_all_except3() but then removed the only call to > >btmask_all_except2(). Instead of adding/removing code, it seems > > better > >to just use a variadic function. LGTM. > Nice. A variadic btmask_add() might be handy too. > > > And then 0004, the reason for this thread. Did not give it a detailed review, but IIUC it now provides this flow: PM_SHUTDOWN->PM_XLOG_IS_SHUTDOWN->PM_WAIT_DEAD_END->PM_SHUTDOWN_CHECKPOINTER->PM_NO_CHILDREN and that looks good to me to fix the issue. A few random comments: === 3 + postmaster.c will only + * signal checkpointer to exit after all processes that could emit stats + * have been shut down. s/postmaster.c/PostmasterStateMachine()/? === 4 + * too. That allows checkpointer to perform some last bits of of Typo "of of" I'll give 0004 a closer look once it leaves the WIP state but want to share that it looks like a good way to fix the issue. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: pg_settings.unit and DefineCustomXXXVariable
Matthias van de Meent writes: > On Wed, 8 Jan 2025 at 11:13, Luca Ferrari wrote: >> I need to define a few GUCs, and for that purpose I'm using >> DefineCustomXXXVariable functions that provide hooks for assignment, >> show and check. However it is not clear to me if it is possible to >> populate the unit column in pg_settings for the custom defined >> variables, and if so, how. > DefineCustomXXXVariable has a 'flags' argument, into which GUC_UNIT_* > can be or-ed. Correct. contrib/auto_explain has some examples. > AFAIK, the options provided in guc.h are the only kinds of units > supported and available, and I think only integer variables actually > support the unit conversion implied by the unit options. Nowadays we allow units for float GUCs as well. (Originally it was indeed just for integers. But at some point we decided an existing integer value with units needed to become float, and then we had to back-fill the implementation for compatibility's sake.) regards, tom lane
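As a concrete illustration of the flags approach, contrib/auto_explain defines auto_explain.log_min_duration with the GUC_UNIT_MS flag, and that is what populates the unit column:

LOAD 'auto_explain';
SELECT name, unit FROM pg_settings WHERE name = 'auto_explain.log_min_duration';
-- name: auto_explain.log_min_duration, unit: ms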
Re: ecpg command does not warn COPY ... FROM STDIN;
On 2025/01/08 23:04, Ryo Kanbayashi wrote: But it is not working. ecpg command fails to notice though code like above exits on pgc code. This issue seems to have been introduced in commit 3d009e45bd. Before that, as per my testing, ecpg successfully detected COPY FROM STDIN and reported a warning. It seems the fix will need to be back-patched to all supported versions. I feel we can change like "COPY FROM STDIN/STDOUT...", or prepare two messages. Thought? I think your proposed fix is better too. So, I modified the patch. As for COPY FROM STDOUT, while it's possible, mentioning it might be confusing since it's not official syntax. So I'm not sure if this is good idea. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: CREATE TABLE NOT VALID for check and foreign key
Hello, On 2025-Jan-07, Yasuo Honda wrote: > I'd like PostgreSQL to raise errors and/or warnings for the NOT VALID > check constraint for CREATE TABLE. > Ruby on Rails supports creating check constraints with the NOT VALID > option and I was not aware that it is just ignored until > https://github.com/rails/rails/issues/53732 issue is reported. Thanks. I left a comment there. I think raising a WARNING might work. > Rails has implemented a kind of workaround by not dumping the NOT > VALID option, but it does not help for the first execution. > https://github.com/rails/rails/pull/53735 Yeah, that's probably not ideal. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "You're _really_ hosed if the person doing the hiring doesn't understand relational systems: you end up with a whole raft of programmers, none of whom has had a Date with the clue stick." (Andrew Sullivan) https://postgr.es/m/20050809113420.gd2...@phlogiston.dyndns.org
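To make the current behavior concrete (the table and constraint names here are made up): the NOT VALID clause is accepted by CREATE TABLE but silently dropped, so the constraint ends up marked as validated:

CREATE TABLE t (a int, CONSTRAINT a_positive CHECK (a > 0) NOT VALID);
SELECT conname, convalidated FROM pg_constraint WHERE conname = 'a_positive';
-- convalidated is true, i.e. the NOT VALID was ignored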
RE: ecpg command does not warn COPY ... FROM STDIN;
Dear Kanbayashi-san, > I found a code validation bug in master branch. > > Now, ecpg does not support 'EXEC SQL COPY ... FROM STDIN ... ;' and > code for warning it exits. > > https://github.com/postgres/postgres/blob/7b27f5fd36cb3270e8ac25aefd73b55 > 2663d1392/src/interfaces/ecpg/preproc/ecpg.addons#L242-L245 > --- > ECPG: addon CopyStmt COPY opt_binary qualified_name opt_column_list > copy_from opt_program copy_file_name copy_delimiter opt_with > copy_options where_clause > if (strcmp(@6, "from") == 0 && > (strcmp(@7, "stdin") == 0 || strcmp(@7, "stdout") == 0)) > mmerror(PARSE_ERROR, ET_WARNING, "COPY FROM STDIN is not > implemented"); > --- > > But it is not working. > ecpg command fails to notice though code like above exits on pgc code. Good catch. I have a comment about the fix. The parser accepts a statement like "COPY ... FROM STDOUT", and ISTM it is not implemented either. However, the warning message only mentions the STDIN case even when STDOUT is specified. EXEC SQL COPY foo FROM STDOUT; -> WARNING: COPY FROM STDIN is not implemented I feel we can change it like "COPY FROM STDIN/STDOUT...", or prepare two messages. Thoughts? Best regards, Hayato Kuroda FUJITSU LIMITED
Re: Virtual generated columns
On 08.01.25 09:22, Richard Guo wrote: - Added support for ALTER TABLE ... SET EXPRESSION. When using ALTER TABLE to set expression for virtual generated columns, we don't enforce a rewrite, which means we don't have the opportunity to check whether the new values for these columns could cause an underflow or overflow. For instance, create table t (a int, b int generated always as (a) virtual); insert into t values (2147483647); # alter table t alter column b set expression as (a * 2); ALTER TABLE # select * from t; ERROR: integer out of range The same thing could occur with INSERT. As we don't compute virtual generated columns on write, we may end up inserting values that cause underflow or overflow for these columns. create table t1 (a int, b int generated always as (a * 2) virtual); insert into t1 values (2147483647); # select * from t1; ERROR: integer out of range I'm not sure if this is expected or not, so I just wanted to point it out. Yes, this is expected behavior. This also happens with a view. So it is consistent for compute-on-read objects.
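For reference, the view analog can be seen directly; a quick sketch:

create table t2 (a int);
create view v2 as select a, a * 2 as b from t2;
insert into t2 values (2147483647);  -- succeeds, b is only computed on read
select * from v2;
ERROR:  integer out of range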
Re: Separate GUC for replication origins
On Thu, Dec 19, 2024, at 10:31 AM, Peter Eisentraut wrote: > On 10.12.24 19:41, Euler Taveira wrote: > > I'm attaching a patch that adds max_replication_origins. It basically > > replaces > > all of the points that refers to max_replication_slots on the subscriber. It > > uses the same default value as max_replication_slots (10). I did nothing to > > keep the backward compatibility. I mean has a special value (like -1) that > > means same value as max_replication_slots. Is it worth it? (If not, it > > should > > be mentioned in the commit message.) > > I think some backward compatibility tweak like that would be useful. > Otherwise, the net effect of this is that most people will have to do > one more thing than before to keep their systems working. Also, all > deployment and automation scripts would have to be updated for this. > This new patch includes the special value (-1) that uses max_replication_slots instead. -- Euler Taveira EDB https://www.enterprisedb.com/ From 8a80eb6a8aa3cbfb2c0eb09ee3799e857729e205 Mon Sep 17 00:00:00 2001 From: Euler Taveira Date: Tue, 3 Sep 2024 12:10:20 -0300 Subject: [PATCH v2] Separate GUC for replication origins This feature already exists but it is provided by an existing GUC: max_replication_slots. The new GUC (max_replication_origins) defines the maximum number of replication origins that can be tracked simultaneously. The max_replication_slots was used for this purpose but it is confusing (when you are learning about logical replication) and introduces a limitation (you cannot have a small number of replication slots and a high number of subscriptions). For backward compatibility, the default is -1, indicating that the value of max_replication_slots is used instead. --- doc/src/sgml/config.sgml | 27 ++ doc/src/sgml/logical-replication.sgml | 6 +- src/backend/replication/logical/launcher.c| 4 +- src/backend/replication/logical/origin.c | 91 --- src/backend/utils/misc/guc_tables.c | 12 +++ src/backend/utils/misc/postgresql.conf.sample | 3 + src/bin/pg_basebackup/pg_createsubscriber.c | 18 ++-- .../t/040_pg_createsubscriber.pl | 4 +- src/bin/pg_upgrade/check.c| 14 +-- src/bin/pg_upgrade/t/004_subscription.pl | 14 +-- src/include/replication/origin.h | 3 + 11 files changed, 111 insertions(+), 85 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 8683f0bdf5..1f15d97055 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -4493,13 +4493,6 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows to replica or higher to allow replication slots to be used. - - - Note that this parameter also applies on the subscriber side, but with - a different meaning. See - in for more - details. - @@ -5202,10 +5195,10 @@ ANY num_sync ( - max_replication_slots (integer) + + max_replication_origins (integer) -max_replication_slots configuration parameter +max_replication_origins configuration parameter in a subscriber @@ -5217,18 +5210,14 @@ ANY num_sync ( pg_replication_origin_status) -will prevent the server from starting. -max_replication_slots must be set to at least the +will prevent the server from starting. It defaults to -1, indicating +that the value of should be +used instead. This parameter can only be set at server start. + +max_replication_origins must be set to at least the number of subscriptions that will be added to the subscriber, plus some reserve for table synchronization. - - -Note that this parameter also applies on a sending server, but with -a different meaning. 
See -in for more -details. - diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml index 8290cd1a08..1c4b6586db 100644 --- a/doc/src/sgml/logical-replication.sgml +++ b/doc/src/sgml/logical-replication.sgml @@ -2144,9 +2144,7 @@ CONTEXT: processing remote data for replication origin "pg_16395" during "INSER Logical replication requires several configuration options to be set. Most - options are relevant only on one side of the replication. However, - max_replication_slots is used on both the publisher and - the subscriber, but it has a different meaning for each. + options are relevant only on one side of the replication. @@ -2181,7 +2179,7 @@ CONTEXT: processing remote data for replication origin "pg_16395" during "INSER Subscribers -max_replication_slots +max_replication_origins must be set to at least the number of subscriptions that will be added to the subscriber, plus some reserve for tabl
Re: ecpg command does not warn COPY ... FROM STDIN;
Fujii Masao writes: > On 2025/01/08 23:04, Ryo Kanbayashi wrote: > But it is not working. > ecpg command fails to notice though code like above exits on pgc code. > This issue seems to have been introduced in commit 3d009e45bd. Indeed :-( > As for COPY FROM STDOUT, while it's possible, mentioning it might be > confusing since it's not official syntax. So I'm not sure if this is > good idea. There's another problem: the correct syntax is COPY TO STDOUT, and that won't trigger this warning either. I'm inclined to drop the test on @5 altogether, and just check for "stdin" or "stdout" in @7. There is no variant of those that will work. (But maybe we should allow it if opt_program isn't empty?) The warning message could perhaps be written "COPY FROM STDIN/TO STDOUT is not implemented". regards, tom lane
Re: improve DEBUG1 logging of parallel workers for CREATE INDEX?
> s/parallel vacuum workers for index vacuuming/parallel workers for index > creation/? oops, that's my oversight from copying the message from vacuum. fixed. > I'd add "in parallel" to match its counterpart "serially" above. That would > make it more clear in case one just look for "building index" in the log. good point. Below is the new output with the attached v2. postgres=# create index foo_idx1 on foo(id); DEBUG: building index "foo_idx1" on table "foo" with parallel workers DEBUG: launched 1 parallel workers for index creation (planned: 1) DEBUG: index "foo_idx1" can safely use deduplication CREATE INDEX Regards, Sami v2-0001-improve-DEBUG1-logging-of-parallel-workers-for-CR.patch Description: Binary data
Re: Fwd: Re: A new look at old NFS readdir() problems?
On Tue, Jan 7, 2025 at 01:03:05AM +, David Steele wrote: > > > I'm more concerned about the report we saw on SUSE/NFS [1]. If that > > > report is accurate it indicates this may not be something we can just > > > document and move on from -- unless we are willing to entirely drop > > > support for NFS. > > > [1] https://github.com/pgbackrest/pgbackrest/issues/1423 > > > > I installed an up-to-date OpenSUSE image (Leap 15.6) and it passes > > my "rmtree" test just fine with my NAS. The report you cite > > doesn't have any details on what the NFS server was, but I'd be > > inclined to guess that that server's filesystem lacked support > > for stable NFS cookies. > > The internal report we received might have had a similar cause. Sure seems > like a minefield for any user trying to figure out if their setup is > compliant, though. In many setups (especially production) a drop database is > rare. Will people now always get a clear error on failure? Crazy idea, but could we have initdb or postmaster start test this? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: Fix a wrong errmsg in AlterRole()
On Wed, Jan 08, 2025 at 10:29:12AM -0500, Tom Lane wrote: > "=?ISO-8859-1?B?Y2NhNTUwNw==?=" writes: >> postgres=> alter group g1 drop user r1; >> ERROR: permission denied to alter role >> DETAIL: Only roles with the ADMIN option on role "g1" may add members. > >> The errmsg is "add members", but we are doing a drop. >> Here is a small patch to fix it. > > Hmm. Seems simpler to just make the errdetail say "... may add or > drop members." +1. It looks like this goes back to commit ce6b672 (v16). The error message was changed in commit de4d456, which is also a v16 change, so fortunately this has a good chance of back-patching cleanly. -- nathan
Re: jsonb_strip_nulls with arrays?
On 2024-09-17 Tu 4:53 PM, Florents Tselai wrote: We could, if we're going to do anything at all in this area. Another possibility would be to provide a second optional parameter for json{b}_strip_nulls. That's probably a better way to go. Here's a patch that adds that argument (only for jsonb; no json implementation yet) I think it looks sane. We're not stripping a top level null, which is one thing I looked out for. I think we need a json implementation as well, though. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
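For reference, the current one-argument behavior strips nulls only from object fields, keeps null array elements, and leaves a top-level null alone:

SELECT jsonb_strip_nulls('{"a": null, "b": [null, 1]}');
-- {"b": [null, 1]}
SELECT jsonb_strip_nulls('null');
-- null

With the second argument proposed in the patch, passing true would additionally strip null elements from arrays.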
Re: Virtual generated columns
Peter Eisentraut writes: > On 03.12.24 15:15, jian he wrote: >> SELECT attrelid, attname, attgenerated FROM pg_attribute WHERE >> attgenerated IN ('v') and (attnotnull or not atthasdef); > I don't understand what the purpose of testing attnotnull is. That is > independent of attgenerated, I think. Does it make any sense to set NOT NULL on a generated column (virtual or otherwise, but especially virtual)? What is the system supposed to do if the expression evaluates to null? That concern generalizes to any constraint really. Even if we checked it at row storage time, there's no real guarantee that the expression is immutable enough to pass the constraint later. regards, tom lane
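For the stored flavor, at least, the current answer is that constraints are checked against the computed value at write time; a quick sketch:

CREATE TABLE g (a int, b int GENERATED ALWAYS AS (nullif(a, 0)) STORED NOT NULL);
INSERT INTO g VALUES (1);  -- ok, b = 1
INSERT INTO g VALUES (0);  -- ERROR: null value in column "b" violates not-null constraint

For a virtual column there is no write-time evaluation, which is what makes the question harder there.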
Re: Non-text mode for pg_dumpall
On Wed, 8 Jan 2025 at 20:07, Mahendra Singh Thalor wrote: > > Hi all, > > On Wed, 8 Jan 2025 at 00:34, Mahendra Singh Thalor wrote: > > > > On Mon, 6 Jan 2025 at 23:05, Nathan Bossart wrote: > > > > > > On Thu, Jan 02, 2025 at 02:05:13AM +0530, Mahendra Singh Thalor wrote: > > > > Here, I am attaching an updated patch. I fixed some bugs of the v01 patch and did some code cleanup also. > > > > > > Thank you for picking this up! I started to review it, but the documentation changes didn't build, and a few tests in check-world are failing. Would you mind resolving those issues? Also, if you haven't already, please add an entry to the next commitfest [0] to ensure that 1) this feature is tracked and 2) the automated tests will run. > > > > Thanks Nathan for the quick response. > > > > I fixed the bugs in the documentation changes and check-world in the latest patch. Now the docs are building and check-world is passing. > > > > I added an entry into the commitfest for this patch. [0] > > > > > > > > + if (dbfile) > > > + { > > > + printfPQExpBuffer(&cmd, "\"%s\" %s %s", pg_dump_bin, > > > + dbfile, create_opts); > > > + appendPQExpBufferStr(&cmd, " -F d "); > > > + } > > > > > > Have you given any thought to allowing a directory of custom format files, as discussed upthread [1]? Perhaps that is better handled as a follow-up patch, but it'd be good to understand the plan, anyway. > > > > I will make these changes and will test. I will update my findings after doing some testing. > > In the latest patch, I added dump and restore support for directory/custom/tar/plain formats. Please consider this patch for review and testing. > > Design: > When we give --format=d|c|t, we dump all global SQL commands into global.dat in plain SQL format, and we write a map.dat file mapping each dbname to its dboid. For each database, we create a separate subdirectory named with the dboid under the databases directory and dump it in the chosen archive format (d|c|t). > While restoring, we first restore all global SQL commands from global.dat and then restore the databases one by one. As we support --exclude-database with pg_dumpall, we support the same with pg_restore as well, to skip restoring databases matching the specified patterns. > If we want to restore a single database, we can specify the particular subdirectory from the databases folder. To get the file name, we look up the dbname in the map.dat file. > > TODO: Now I will work on test cases for these newly added options to pg_dumpall and pg_restore. > > Here, I am attaching the v04 patch for testing and review. Sorry, my mistake: v04 was a delta patch on top of v03. Here, I am attaching the v05 patch for testing and review. > > > > > Apart from these bugs, I added code to handle --exclude-database=PATTERN. > > Earlier I was using NAME only to skip databases for restore. > > > > TODO: .pl test cases for the newly added options. > > > > Here, I am attaching an updated patch for review and feedback.
> > > > > > > > [0] https://commitfest.postgresql.org > > > [1] > > > https://postgr.es/m/CABUevExoQ26jo%2BaQ9QZq%2BUMA1aD6gfpm9xBnh_t5e0DhaCeRYA%40mail.gmail.com > > > > > > -- > > > nathan > > > > -- > > Thanks and Regards > > Mahendra Singh Thalor > > EnterpriseDB: http://www.enterprisedb.com > > > > -- > Thanks and Regards > Mahendra Singh Thalor > EnterpriseDB: http://www.enterprisedb.com -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com v05_pg_dumpall-with-directory-tar-custom-format-08-jan.patch Description: Binary data
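For illustration, a rough usage sketch of the workflow described above; the directory layout and option spellings follow my reading of the design notes, not the patch itself, so treat them as assumptions:

    $ pg_dumpall --format=d --file=alldump
    $ ls alldump
    global.dat  map.dat  databases/
    $ pg_restore -C -d postgres alldump
    $ pg_restore -C -d postgres --exclude-database='test*' alldump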
Re: Proposal: add new API to stringinfo
On Tue, Jan 07, 2025 at 03:57:57PM +0900, Tatsuo Ishii wrote: > So the v2 patch version is 1.3787% slower than master. Do you think we > need further inlining? If it's not too much work to coax the compiler to do so, then I don't see a strong reason to avoid it. Granted, there are probably much more expensive parts of the StringInfo support functions (e.g., palloc()), but these are hot enough code paths that IMHO it's worth saving whatever cycles we can. -- nathan
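For readers following along, the kind of inlining being discussed looks roughly like the following — a generic sketch only, assuming nothing about the actual v2 patch (the function name here is invented):

    /* hypothetical inlined fast path in stringinfo.h */
    static inline void
    appendStringInfoCharInline(StringInfo str, char ch)
    {
        /* common case: room available, skip the out-of-line call */
        if (str->len + 1 < str->maxlen)
        {
            str->data[str->len++] = ch;
            str->data[str->len] = '\0';
        }
        else
            appendStringInfoChar(str, ch);  /* slow path enlarges the buffer */
    }

The point is to let the compiler eliminate call overhead on the hot path while keeping the rarely taken enlargement path out of line.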
Re: Fix a wrong errmsg in AlterRole()
"=?ISO-8859-1?B?Y2NhNTUwNw==?=" writes: > postgres=> alter group g1 drop user r1; > ERROR: permission denied to alter role > DETAIL: Only roles with the ADMIN option on role "g1" may add members. > The errmsg is "add members", but we are doing a drop. > Here is a small patch to fix it. Hmm. Seems simpler to just make the errdetail say "... may add or drop members." regards, tom lane
using PGOPTIONS in meson
hi. I am wondering how to do ``make check PGOPTIONS="-c debug_parallel_query=regress -c work_mem=50MB"`` the meson way, especially the PGOPTIONS part. I have looked at [1] and [2]. [1] https://www.postgresql.org/docs/current/regress-run.html#REGRESS-RUN-CUSTOM-SETTINGS [2] https://wiki.postgresql.org/wiki/Meson
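For example, is setting it as a plain environment variable the supported equivalent? This is just a guess, not something verified:

    PGOPTIONS="-c debug_parallel_query=regress -c work_mem=50MB" \
        meson test -C build --suite regress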
Re: Fwd: Re: A new look at old NFS readdir() problems?
On 1/8/25 12:40, Bruce Momjian wrote: On Tue, Jan 7, 2025 at 01:03:05AM +, David Steele wrote: I'm more concerned about the report we saw on SUSE/NFS [1]. If that report is accurate it indicates this may not be something we can just document and move on from -- unless we are willing to entirely drop support for NFS. [1] https://github.com/pgbackrest/pgbackrest/issues/1423 I installed an up-to-date OpenSUSE image (Leap 15.6) and it passes my "rmtree" test just fine with my NAS. The report you cite doesn't have any details on what the NFS server was, but I'd be inclined to guess that that server's filesystem lacked support for stable NFS cookies. The internal report we received might have had a similar cause. Sure seems like a minefield for any user trying to figure out if their setup is compliant, though. In many setups (especially production) a drop database is rare. Will people now always get a clear error on failure? The error will be something like "directory is not empty" when trying to drop a database. So not very clear at all. Crazy idea, but could we have initdb or postmaster start test this? I think this test would be too expensive for postmaster start as it would require creating and deleting a large number of files (maybe multiple times). I suppose it could be a special setting but I doubt that would be popular. initdb might be more promising but clusters are frequently copied/restored to different storage so I'm not sure if it would be too helpful in the long run. Regards, -David
Re: Virtual generated columns
On Wed, Jan 8, 2025 at 13:14, Peter Eisentraut wrote: > Here is a new patch version where I have gathered various pieces of > feedback and improvement suggestions that are scattered over this > thread. I hope I got them all. I will respond to the respective > messages directly to give my response to each item. > In this new version, subqueries are still not accepted, as in previous versions. But we can create an immutable SQL function that does the same thing. Wouldn't it be better to explain that in the docs?

create table Orders (
    Order_ID integer not null primary key,
    Customer_ID integer references Customer
);

create function lkCustomer(integer) returns text
language sql immutable as
$function$select Name from Customer where Customer_ID = $1;$function$;

alter table Orders add lkCustomer text
    generated always as (lkCustomer(Customer_ID)) stored;

regards Marcos
Re: Virtual generated columns
On 08.01.25 17:38, Tom Lane wrote: Peter Eisentraut writes: On 03.12.24 15:15, jian he wrote: SELECT attrelid, attname, attgenerated FROM pg_attribute WHERE attgenerated IN ('v') and (attnotnull or not atthasdef); I don't understand what the purpose of testing attnotnull is. That is independent of attgenerated, I think. Does it make any sense to set NOT NULL on a generated column (virtual or otherwise, but especially virtual)? What is the system supposed to do if the expression evaluates to null? That concern generalizes to any constraint really. Even if we checked it at row storage time, there's no real guarantee that the expression is immutable enough to pass the constraint later. The generation expression is required to be immutable. So a table definition like

    a int, b int generated always as (a * 2) virtual, check (b > 0)

is not very different from

    a int, check (a * 2 > 0)

in terms of the constraint execution. The current patch does not support not-null constraints, but that's mostly because it's not implemented yet. Maybe that's what Jian was thinking about.
Re: use a non-locking initial test in TAS_SPIN on AArch64
Hi, On 2025-01-08 12:12:19 -0600, Nathan Bossart wrote: > On Tue, Oct 22, 2024 at 02:54:57PM -0500, Nathan Bossart wrote: > > My colleague Salvatore Dipietro (CC'd) sent me a couple of profiles that > > showed an enormous amount of s_lock() time going to the > > __sync_lock_test_and_set() call in the AArch64 implementation of tas(). > > Upon closer inspection, I noticed that we don't implement a custom > > TAS_SPIN() for this architecture, so I quickly hacked together the attached > > patch and ran a couple of benchmarks that stressed the spinlock code. I > > found no discussion about TAS_SPIN() on ARM in the archives, but I did > > notice that the initial AArch64 support was added [0] before x86_64 started > > using a non-locking test [1]. > > > > These benchmarks are for a c8g.24xlarge running a select-only pgbench with > > 256 clients and pg_stat_statements.track_planning enabled. > > > > without the patch: > > > > [...] > > > > tps = 74135.100891 (without initial connection time) > > > > with the patch: > > > > [...] > > > > tps = 549462.785554 (without initial connection time) > > Are there any objections to proceeding with this change? So far, it's been > tested on a c8g.24xlarge and an Apple M3 (which seems to be too small to > show any effect). If anyone has access to a larger ARM machine, additional > testing would be greatly appreciated. I think it would be unfortunate if > this slipped to v19. Seems reasonable. It's not surprising that spinning with an atomic operation doesn't scale very well once you have more than a handful of cores. Of course the whole way locking works in pg_stat_statements is a scalability disaster, but that's a bigger project to fix. Out of curiosity, have you measured whether this has a positive effect without pg_stat_statements? I think it'll e.g. also affect lwlocks, as they also use perform_spin_delay(). Greetings, Andres
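For context, the non-locking initial test presumably follows the existing x86_64 pattern in s_lock.h (an assumption about the AArch64 patch, not its actual text):

    /* read the lock with a plain load first; only attempt the
     * atomic test-and-set if it currently appears to be free */
    #define TAS_SPIN(lock)    (*(lock) ? 1 : TAS(lock))

This keeps waiters spinning on a cache line in shared state instead of hammering it with exclusive-mode atomic operations, which is what destroys scalability at high core counts.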
Re: Fwd: Re: A new look at old NFS readdir() problems?
David Steele writes: > On 1/8/25 12:40, Bruce Momjian wrote: >> Will people now always get a clear error on failure? > The error will be something like "directory is not empty" when trying to > drop a database. So not very clear at all. Conceivably we could adjust rmtree() to print a HINT mentioning that unsafe network storage is a possible cause of this error. But I'm a little worried that that'd be crying wolf, because on most setups a much more likely explanation is somebody concurrently creating a file in the targeted directory. regards, tom lane
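A sketch of what such a hint might look like — the message wording, error-handling mechanism, and placement are all invented for illustration, and the real rmtree() code in src/common may report errors differently:

    /* hypothetical: after rmdir() fails with ENOTEMPTY */
    ereport(WARNING,
            (errcode_for_file_access(),
             errmsg("could not remove directory \"%s\": %m", path),
             errhint("Network file systems with unstable directory cookies can cause this error.")));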
Re: Parameter NOT NULL to CREATE DOMAIN not the same as CHECK (VALUE IS NOT NULL)
Bruce Momjian writes: > I think this needs some serious research. We've discussed this topic before. The spec's definition of IS [NOT] NULL for composite values is bizarre to say the least. I think there's been an intentional choice to keep most NOT NULL checks "simple", that is we look at the overall value's isnull bit and don't probe any deeper than that. If the optimizations added in v17 changed existing behavior, I agree that's bad. We should probably fix it so that those are only applied when argisrow is false. regards, tom lane
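For anyone not steeped in the spec's rules, the composite-value behavior in question:

    select row(1, null) is null;        -- false
    select row(null, null) is null;     -- true: all fields are null
    select row(1, null) is not null;    -- also false: not all fields are non-null

So a composite value with one null field is neither NULL nor NOT NULL, which is why probing deeper than the overall isnull bit gets bizarre quickly.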
Re: Sample rate added to pg_stat_statements
On 08.01.2025 22:39, Ilia Evdokimov wrote: After the changes in v9-patch, won’t all the necessary queries now be normalized? Since there are no longer any restrictions in the parser hook, queries will be normalized as usual, and pgss_planner, pgss_ExecutorStart, and ExecutorEnd will simply fetch them from 'pgss_query_texts.stat' file. For now, this seems like the simplest approach instead of making JumbleState available to other hooks. Moreover, if we do proceed with that, we might be able to remove the 'pgss_query_texts.stat' file altogether and more other improvements. In my view, if no other options arise, we should address making JumbleState available to other hooks in a separate thread. If you have any suggestions, I'm always open to feedback. I attached v9 patch with changes and test above. Unfortunately, these changes do not achieve the intended sampling goal. I looked into this more deeply: while the sampled-out queries do not appear in pg_stat_statements, an entry is still allocated in the hash table after normalization, which, in my view, should not happen when sampling is in effect. Therefore, patch v9 is unlikely to meet our needs. -- Best regards, Ilia Evdokimov, Tantor Labs LLC.
Re: New GUC autovacuum_max_threshold ?
On Wed, Jan 8, 2025 at 8:01 PM Nathan Bossart wrote: > On Wed, Jan 08, 2025 at 02:48:10PM +0100, Frédéric Yhuel wrote: > > For what it's worth, although I would have preferred the sub-linear growth thing, I'd much rather have this than nothing. > > +1, this is how I feel, too. But I also don't want to add something that folks won't find useful. > > > And I have to admit that the proposed formulas were either too convoluted or wrong. > > > > This very patch is more straightforward. Please let me know if I can help and how. > > I read through the thread from the top, and it does seem like there is reasonably strong support for the hard cap. Upon a closer review of the patch, I noticed that the relopt was defined such that you couldn't disable autovacuum_max_threshold on a per-table basis, so I fixed that in v4. > > -- > nathan Nathan, please also provide tests for the new parameter you want to introduce. Best, vini
Re: EphemeralNamedRelation and materialized view
Yugo NAGATA writes: > Thank you for your reviewing and editing the patch! > I agree with your proposal on the error message handling. Cool, pushed v4 then. regards, tom lane
Re: Non-text mode for pg_dumpall
On Thu, 9 Jan 2025 at 02:30, Guillaume Lelarge wrote: > > Hi, > > Le mer. 8 janv. 2025 à 17:41, Mahendra Singh Thalor a > écrit : >> >> On Wed, 8 Jan 2025 at 20:07, Mahendra Singh Thalor >> wrote: >> > >> > Hi all, >> > >> > On Wed, 8 Jan 2025 at 00:34, Mahendra Singh Thalor >> > wrote: >> > > >> > > On Mon, 6 Jan 2025 at 23:05, Nathan Bossart >> > > wrote: >> > > > >> > > > On Thu, Jan 02, 2025 at 02:05:13AM +0530, Mahendra Singh Thalor wrote: >> > > > > Here, I am attaching an updated patch. I fixed some bugs of v01 >> > > > > patch and >> > > > > did some code cleanup also. >> > > > >> > > > Thank you for picking this up! I started to review it, but the >> > > > documentation changes didn't build, and a few tests in check-world are >> > > > failing. Would you mind resolving those issues? Also, if you haven't >> > > > already, please add an entry to the next commitfest [0] to ensure that >> > > > 1) >> > > > this feature is tracked and 2) the automated tests will run. >> > > >> > > Thanks Nathan for the quick response. >> > > >> > > I fixed bugs of documentation changes and check-world in the latest >> > > patch. Now docs are building and check-world is passing. >> > > >> > > I added entry into commitfest for this patch.[0] >> > > >> > > > >> > > > + if (dbfile) >> > > > + { >> > > > + printfPQExpBuffer(&cmd, "\"%s\" %s %s", pg_dump_bin, >> > > > + dbfile, create_opts); >> > > > + appendPQExpBufferStr(&cmd, " -F d "); >> > > > + } >> > > > >> > > > Have you given any thought to allowing a directory of custom format >> > > > files, >> > > > as discussed upthread [1]? Perhaps that is better handled as a >> > > > follow-up >> > > > patch, but it'd be good to understand the plan, anyway. >> > > >> > > I will make these changes and will test. I will update my findings after >> > > doing some testing. >> > >> > In the latest patch, I added dump and restoring for >> > directory/custom/tar/plain formats. Please consider this patch for review >> > and testing. >> > >> > Design: >> > When we give --format=d|c|t then we are dumping all global sql commands in >> > global.dat in plain sql format and we are making a map.dat file with >> > dbname and dboid. For each database, we are making separate subdirectory >> > with dboid under databases directory and dumping as per archive >> > format(d|c|t). >> > While restoring, first we are restoring all global sql commands from >> > global.dat and then we are restoring one by one all databases. As we are >> > supporting --exclude-database with pg_dumpall, the same we are supporting >> > with pg_restore also to skip restoring on some specified database patterns. >> > If we want to restore a single database, then we can specided particular >> > subdirectory from the databases folder. To get file name, we refer dbname >> > into map.file. >> > >> > TODO: Now I will work on test cases for these new added options to the >> > pg_dumpall and pg_restore. >> > >> > Here, I am attaching the v04 patch for testing and review. >> >> Sorry. My mistake. >> v04 was the delta patch on the top of v03. >> >> Here, I am attaching the v05 patch for testing and review. >> > > Just FWIW, I did a quick test tonight. It applies cleanly, compiles OK. I did > a dump: Thanks for testing and review. > > $ pg_dumpall -Fd -f dir > > and then a restore (after dropping the databases I had): > > $ pg_restore -Cd postgres -v dir > > It worked really well. That's great. 
> > Quick thing to fix: you've got this error message: pg_restore: error: -d/--dbanme should be given when using archive dump of pg_dumpall > > I guess it is --dbname, rather than --dbanme. Fixed. > > Of course, it needs much more testing, but this feature would be great to have. Thanks for working on this! > > Apart from the above typo, I fixed some review comments that I received from Andrew in an offline discussion. Thanks Andrew for the quick review. Here, I am attaching an updated patch for review and testing. -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com v06_pg_dumpall-with-directory-tar-custom-format-08-jan.patch Description: Binary data
Re: Make pg_stat_io view count IOs as bytes instead of blocks
On Thu, Dec 26, 2024 at 02:41:26PM +0300, Nazir Bilal Yavuz wrote: > Thanks! v4 is attached. I quickly tested the pg_stat_get_backend_io() > function and it seems it is working. Thanks a lot for the rebased version. This looks pretty solid. Here are some comments. void -pgstat_count_io_op(IOObject io_object, IOContext io_context, IOOp io_op) +pgstat_count_io_op(IOObject io_object, IOContext io_context, IOOp io_op, uint64 bytes) { -pgstat_count_io_op_n(io_object, io_context, io_op, 1); +pgstat_count_io_op_n(io_object, io_context, io_op, 1, bytes); pgstat_count_io_op_n() is only used locally in pgstat_io.c. I'm OK to keep it as it is used with the time calculations, but wouldn't it be better to make it static in pgstat_io.c instead and not declare it in pgstat.h? Do we really have a need for pgstat_count_io_op() at all at the end or would it be better to change it so as it can handle a number of operations given by the caller? typedef struct PgStat_BktypeIO { +uint64 bytes[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES]; This wastes a bit of memory, while keeping the code simpler to understand. That's better than more element number manipulations, so I'm OK with what you have here. +static inline bool +is_ioop_tracked_in_bytes(IOOp io_op) +{ +Assert((unsigned int) io_op < IOOP_NUM_TYPES); +return io_op >= IOOP_EXTEND; +} This is only used in an assertion of pgstat_count_io_op_n() in pgstat_io.c. Let's also keep it this routine local to the file. The assert to make sure that the callers don't assign bytes to the operations that don't support the counters is a good idea. + * If an IOOp does not tracked in bytes, IO_COL_INVALID is returned. s/does not tracked/is not tracked/. +/* + * Get the number of the column containing IO bytes for the specified IOOp. + * If an IOOp does not tracked in bytes, IO_COL_INVALID is returned. + */ +static io_stat_col +pgstat_get_io_byte_index(IOOp io_op) +{ Makes sense to me, and that's consistent with how the time attributes are handled. -/* - * Hard-code this to the value of BLCKSZ for now. Future values - * could include XLOG_BLCKSZ, once WAL IO is tracked, and constant - * multipliers, once non-block-oriented IO (e.g. temporary file - * IO) is tracked. - */ -values[IO_COL_CONVERSION] = Int64GetDatum(BLCKSZ); Glad to see that gone. +write_bytes bigint +extend_bytes bigint +read_bytes bigint These additions in the documentation are incorrect. All these attributes are of type numeric, not bigint. +Number of units of size BLCKSZ which the process It seems to me that this had better mention 8192 as the default. -- Michael signature.asc Description: PGP signature
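Once the byte counters land, querying them would presumably look something like this; the *_bytes column names are taken from the documentation hunks quoted above, so treat the rest as a sketch:

    select backend_type, object, context,
           reads, read_bytes, writes, write_bytes, extends, extend_bytes
    from pg_stat_io
    where reads > 0 or writes > 0;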
Re: Windows pg_basebackup unable to create >2GB pg_wal.tar tarballs ("could not close file: Invalid argument" when creating pg_wal.tar of size ~ 2^31 bytes)
On Tue, Jan 7, 2025 at 9:54 AM Thomas Munro wrote: > On Tue, Jan 7, 2025 at 5:23 AM Andrew Dunstan wrote: > > Do you have a plan for moving ahead with this? > > I think that all looks good, and I'll go ahead and commit it in the > next day or two. Sorry for the delay. I changed some remaining off_t, because I don't want to worry about which ones can and can't overflow. Might as well get 'em all. I also noticed a warning due to the new lseek() macro, which just required adding it to plperl_system.h. And pushed.
Re: Eager aggregation, take 3
hi. In create_grouping_expr_infos:

    tce = lookup_type_cache(exprType((Node *) tle->expr),
                            TYPECACHE_BTREE_OPFAMILY);
    if (!OidIsValid(tce->btree_opf) ||
        !OidIsValid(tce->btree_opintype))
        return;

    /*
     * Get the operator in the btree's opfamily.
     */
    eq_op = get_opfamily_member(tce->btree_opf,
                                tce->btree_opintype,
                                tce->btree_opintype,
                                BTEqualStrategyNumber);
    if (!OidIsValid(eq_op))
        return;

    eq_opfamilies = get_mergejoin_opfamilies(eq_op);
    if (!eq_opfamilies)
        return;
    btree_opfamily = linitial_oid(eq_opfamilies);

If eq_op is valid, do we really need to call get_mergejoin_opfamilies? Its output should be the same as tce->btree_opf, which we have already checked is valid. In other words, I think a valid eq_op implies that tce->btree_opf is the btree opfamily we need.
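If that reasoning holds, the last three statements could presumably collapse to the following (a sketch of the suggestion, not a tested change):

    /* eq_op came from tce->btree_opf, so that opfamily is the one we want */
    btree_opfamily = tce->btree_opf;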
Re: psql: Option to use expanded mode for various meta-commands
On Wed, Jan 8, 2025 at 8:44 AM Dean Rasheed wrote: > Attached is a more complete patch +1, looks good > So in the end, I decided to just add a sentence to each command's description, keeping it as short as possible. Yes, that makes sense. Cheers, Greg