Re: Virtual generated columns
On Fri, Nov 29, 2024 at 7:14 PM Peter Eisentraut wrote:
> Here is a new patch version, with several updates.
> - Added support for ALTER TABLE ... SET EXPRESSION.

When using ALTER TABLE to set the expression for virtual generated columns, we don't force a rewrite, which means we don't have the opportunity to check whether the new values for these columns could cause an underflow or overflow. For instance,

    create table t (a int, b int generated always as (a) virtual);
    insert into t values (2147483647);

    # alter table t alter column b set expression as (a * 2);
    ALTER TABLE
    # select * from t;
    ERROR:  integer out of range

The same thing can occur with INSERT. Since we don't compute virtual generated columns on write, we may end up inserting values that cause underflow or overflow for these columns.

    create table t1 (a int, b int generated always as (a * 2) virtual);
    insert into t1 values (2147483647);

    # select * from t1;
    ERROR:  integer out of range

I'm not sure if this is expected or not, so I just wanted to point it out.

Thanks
Richard
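The contrast with STORED is what makes this surprising: a stored generated column evaluates its expression at write time, so the same overflow surfaces immediately on INSERT. A minimal sketch of that case (standard PostgreSQL behavior, not part of the patch under discussion):

    create table t2 (a int, b int generated always as (a * 2) stored);
    insert into t2 values (2147483647);
    ERROR:  integer out of range

With the virtual variant, the expression is evaluated on read, which is why the error above only appears at SELECT time (or after ALTER ... SET EXPRESSION, which does not rewrite the table).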
Re: Conflict detection for update_deleted in logical replication
On Tue, Jan 7, 2025 at 2:49 AM Amit Kapila wrote:
>
> On Mon, Jan 6, 2025 at 4:52 PM Zhijie Hou (Fujitsu) wrote:
> >
> > On Friday, January 3, 2025 2:36 PM Masahiko Sawada wrote:
> > >
> > > + /*
> > > +  * The changes made by this and later transactions are still non-removable
> > > +  * to allow for the detection of update_deleted conflicts when applying
> > > +  * changes in this logical replication worker.
> > > +  *
> > > +  * Note that this info cannot directly protect dead tuples from being
> > > +  * prematurely frozen or removed. The logical replication launcher
> > > +  * asynchronously collects this info to determine whether to advance the
> > > +  * xmin value of the replication slot.
> > > +  *
> > > +  * Therefore, FullTransactionId that includes both the transaction ID and
> > > +  * its epoch is used here instead of a single Transaction ID. This is
> > > +  * critical because without considering the epoch, the transaction ID
> > > +  * alone may appear as if it is in the future due to transaction ID
> > > +  * wraparound.
> > > +  */
> > > + FullTransactionId oldest_nonremovable_xid;
> > >
> > > The last paragraph of the comment mentions that we need to use
> > > FullTransactionId to properly compare XIDs even after the XID wraparound
> > > happens. But once we set the oldest-nonremovable-xid it prevents XIDs from
> > > being wraparound, no? I mean that workers' oldest-nonremovable-xid values
> > > and the slot's non-removal-xid (i.e., its xmin) are never more than 2^31
> > > XIDs apart.
> >
> > I think the issue is that the launcher may create the replication slot after
> > the apply worker has already set the 'oldest_nonremovable_xid', because the
> > launcher does that asynchronously. So, before the slot is created, there's
> > a window where transaction IDs might wrap around. If initially the apply
> > worker has computed a candidate_xid (755) and the XID wraps around before
> > the launcher creates the slot, causing the new current xid to be (740), then
> > the old candidate_xid (755) looks like an xid in the future, and the launcher
> > could advance the xmin to 755, which would cause dead tuples to be removed
> > prematurely. (We are trying to reproduce this to ensure that it's a real
> > issue and will share after finishing.)
> >
> > We thought of another approach, which is to create/drop this slot first as
> > soon as one enables/disables detect_update_deleted (e.g. create/drop the
> > slot during DDL). But it seems complicated to control concurrent slot
> > create/drop. For example, if one backend A enables detect_update_deleted, it
> > will create a slot. But if another backend B is disabling
> > detect_update_deleted at the same time, then the newly created slot may be
> > dropped by backend B. I thought about checking the number of subscriptions
> > that enable detect_update_deleted before dropping the slot in backend B, but
> > the subscription changes caused by backend A may not be visible yet (e.g.
> > not committed yet).
>
> This means that for the transaction whose changes are not yet visible,
> we may have already created the slot and the backend B would end up
> dropping it. Is it possible that during the change of this new option
> via DDL, we take AccessExclusiveLock on pg_subscription as we do in
> DropSubscription() to ensure that concurrent transactions can't drop
> the slot? Will that help in solving the above scenario?

If we create/stop the slot during DDL, how do we support rollback DDLs?
>
> The second idea could be that each worker first checks whether a slot
> exists along with a subscription flag (new option). Checking the
> existence of a slot each time would be costly, so we somehow need to
> cache it. But if we do that then we need to invent some cache
> invalidation mechanism for the slot. I am not sure if we can design a
> race-free mechanism for that. I mean we need to think of a solution
> for race conditions between the launcher and apply workers to ensure
> that after dropping the slot, the launcher doesn't recreate the slot (say
> if some subscription enables this option) before all the workers can
> clear their existing values of oldest_nonremovable_xid.
>
> The third idea to avoid the race condition could be that in the
> function InitializeLogRepWorker(), after CommitTransactionCommand(), we
> check if the retain_dead_tuples flag is true for MySubscription; then
> we check whether the system slot exists. If it exists then go ahead;
> otherwise, wait till the slot is created. It could be some additional
> cycles during worker startup, but it is a one-time effort and that too
> only when the flag is set. In addition to this, we anyway need to
> create the slot in the launcher before launching the workers, and
> after re-reading the subscription, the change in
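For reference, a minimal sketch of why the epoch matters in the scenario above, using the comparison helpers from access/transam.h (the wrapper function is illustrative only):

    #include "postgres.h"
    #include "access/transam.h"

    /* Illustrative only: why the launcher's check needs the epoch. */
    static void
    wraparound_demo(void)
    {
        /*
         * TransactionIdPrecedes() compares modulo 2^32: any two XIDs more
         * than 2^31 apart compare the "wrong way around", so a stale
         * candidate_xid of 755 from before a wraparound looks newer than
         * the post-wraparound current xid 740.
         */
        Assert(TransactionIdPrecedes(740, 755));    /* 755 "appears" newer */

        /*
         * FullTransactionId carries the epoch, so the comparison is plain
         * 64-bit arithmetic and the pre-wraparound XID sorts correctly.
         */
        Assert(FullTransactionIdPrecedes(
                   FullTransactionIdFromEpochAndXid(1, 755),
                   FullTransactionIdFromEpochAndXid(2, 740)));
    }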
Fix a wrong errmsg in AlterRole()
Hi,

```
postgres=# create group g1;
CREATE ROLE
postgres=# create role r1 in group g1;
CREATE ROLE
postgres=# set session authorization r1;
SET
postgres=> alter group g1 drop user r1;
ERROR: permission denied to alter role
DETAIL: Only roles with the ADMIN option on role "g1" may add members.
```

The errmsg is "add members", but we are doing a drop. Here is a small patch to fix it.

--
Regards,
ChangAo Chen

0001-Fix-a-wrong-errmsg-in-AlterRole.patch
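The attached patch is binary data, but a plausible shape for the fix is to make the errdetail depend on whether members are being added or dropped. A hypothetical sketch (the "drop" flag and "rolename" variable are invented for illustration):

    /* "drop" would be true for ALTER GROUP ... DROP USER (hypothetical) */
    ereport(ERROR,
            (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
             errmsg("permission denied to alter role"),
             drop ? errdetail("Only roles with the ADMIN option on role \"%s\" may remove members.",
                              rolename)
                  : errdetail("Only roles with the ADMIN option on role \"%s\" may add members.",
                              rolename)));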
Re: EphemeralNamedRelation and materialized view
On Mon, 30 Dec 2024 16:06:06 -0500 Tom Lane wrote:
> Yugo NAGATA writes:
> > On Wed, 20 Nov 2024 12:43:16 -0500
> > Tom Lane wrote:
> >> ... It seems to me that we should
> >> think about this, for MVs as well as those other object types,
> >> as fundamentally a dependency problem. That is, the reason
> >> we can't allow a reference to an ENR in a long-lived object
> >> is that we have no catalog representation for the reference.
> >> So that leads to thinking that the issue ought to be handled
> >> in recordDependencyOnExpr and friends. If we see an ENR while
> >> scanning a rangetable to extract dependencies, then complain.
> >> This might be a bit messy to produce good error messages for,
> >> though.
>
> > I've attached an updated patch. Uses of ENRs are now checked in
> > find_expr_references_walker() called from recordDependencyOnExpr().
>
> This looks pretty good to me, except that I question the use of
> getObjectTypeDescription() in the error message. There are a
> few things not to like about that:
>
> 1. This is kind of an off-label use of getObjectTypeDescription,
> in that we can't expect the object to be visible yet in the catalogs.
> Yeah, we can hack around that by passing missing_ok = true, but it
> still seems like a kluge.
>
> 2. The grammar isn't great, and translatability of the message
> would be poor I think.
>
> 3. As your test case demonstrates, the message is going to complain
> about a "rule" if the problem is with a view or matview, because
> we represent the dependency as being from the view's ON SELECT rule.
> This seems quite confusing for anyone not deeply versed in PG's inner
> workings.
>
> After some thought I propose that we just complain that a "persistent
> object" can't depend on a transition table, and not try to identify
> the depender any more closely than that. We can still add some
> context to the message by showing the transition table's name,
> since that's readily available from the RTE. See attached v3,
> where I also did a bit of editing of the comments and test case.

Thank you for reviewing and editing the patch! I agree with your proposal
on the error message handling.

> BTW, I'm not entirely convinced that the first addition (in Var
> processing) is necessary. Such a Var must refer to an RTE
> somewhere, and I'm having a hard time coming up with a case
> where the RTE wouldn't also be part of what we scan for
> dependencies. It's harmless enough to have the extra check,
> but can you think of a case where it's actually needed?

On second thought, I could not think of such a case. This part can be
removed. I attached the v4 patch.

Regards,
Yugo Nagata

--
Yugo NAGATA

>From 40f96b4ce8e93e8920e840e52b6a78f60a319efd Mon Sep 17 00:00:00 2001
From: Tom Lane
Date: Mon, 30 Dec 2024 15:59:32 -0500
Subject: [PATCH v4] Disallow NAMEDTUPLESTORE RTEs in stored views, rules, etc.

A named tuplestore is necessarily a transient object, so it makes no
sense to reference one in a persistent object such as a view. We didn't
previously prevent that, with the result that if you tried you would
get some weird failure about how the executor couldn't find the
tuplestore.

We can mechanize a check for this case cheaply by making dependency
extraction complain if it comes across such an RTE. This is a plausible
way of dealing with it since part of the problem is that we have no way
to make a pg_depend representation of a named tuplestore.

Report and fix by Yugo Nagata.
Although this is an old problem, it's a very weird corner case and
there have been no reports from end users. So it seems sufficient to
fix it in master.

Discussion: https://postgr.es/m/20240726160714.e74d0db579f2c017e1ca0...@sraoss.co.jp
---
 src/backend/catalog/dependency.c       | 15 +++
 src/test/regress/expected/triggers.out | 20
 src/test/regress/sql/triggers.sql      | 19 +++
 3 files changed, 54 insertions(+)

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 096b68c7f3..18316a3968 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -2191,7 +2191,22 @@ find_expr_references_walker(Node *node,
 			}
 			context->rtables = list_delete_first(context->rtables);
 			break;
+		case RTE_NAMEDTUPLESTORE:
+
+			/*
+			 * Cataloged objects cannot depend on tuplestores, because
+			 * those have no cataloged representation. For now we can
+			 * call the tuplestore a "transition table" because that's
+			 * the only kind exposed to SQL, but someday we might have
+			 * to work harder.
+			 */
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("transition table \"%s\" cannot be referenced in a persistent object",
+							rte->eref->aliasname)));
+			break;
 		default:
+			/* Other RTE types can be ignored here */
 			break;
 	}
 }
diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/ex
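For reference, a quick illustration of the kind of statement the new check rejects — my own sketch, not the patch's regression test (the table, function, and trigger names are invented):

    CREATE TABLE t (a int);
    CREATE FUNCTION make_view() RETURNS trigger LANGUAGE plpgsql AS $$
    BEGIN
        -- tries to store a persistent view over the transient transition table
        EXECUTE 'CREATE VIEW bad_view AS SELECT * FROM newtab';
        RETURN NULL;
    END $$;
    CREATE TRIGGER trg AFTER INSERT ON t
        REFERENCING NEW TABLE AS newtab
        FOR EACH STATEMENT EXECUTE FUNCTION make_view();
    INSERT INTO t VALUES (1);
    ERROR:  transition table "newtab" cannot be referenced in a persistent object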
Re: Conflict detection for update_deleted in logical replication
On Wed, Jan 8, 2025 at 2:15 PM Masahiko Sawada wrote:
>
> On Tue, Jan 7, 2025 at 2:49 AM Amit Kapila wrote:
> >
> > > We thought of another approach, which is to create/drop this slot first as
> > > soon as one enables/disables detect_update_deleted (e.g. create/drop the
> > > slot during DDL). But it seems complicated to control concurrent slot
> > > create/drop. For example, if one backend A enables detect_update_deleted,
> > > it will create a slot. But if another backend B is disabling
> > > detect_update_deleted at the same time, then the newly created slot may be
> > > dropped by backend B. I thought about checking the number of subscriptions
> > > that enable detect_update_deleted before dropping the slot in backend B,
> > > but the subscription changes caused by backend A may not be visible yet
> > > (e.g. not committed yet).
> >
> > This means that for the transaction whose changes are not yet visible,
> > we may have already created the slot and the backend B would end up
> > dropping it. Is it possible that during the change of this new option
> > via DDL, we take AccessExclusiveLock on pg_subscription as we do in
> > DropSubscription() to ensure that concurrent transactions can't drop
> > the slot? Will that help in solving the above scenario?
>
> If we create/stop the slot during DDL, how do we support rollback DDLs?

We will prevent changing this setting in a transaction block, as we
already do for slot-related cases. See the use of
PreventInTransactionBlock() in subscriptioncmds.c.

> Thank you for considering some ideas. As I mentioned above, we might
> need to consider a case like where 'CREATE SUBSCRIPTION ..
> (retain_conflict_info = true)' is rolled back. Having said that, this
> comment is just for simplifying the logic. If using TransactionId
> instead makes other parts complex, it would not make sense. I'm okay
> with leaving this part and improving the comment for
> oldest_nonremovable_xid, say, by mentioning that there is a window for
> XID wraparound happening between workers computing their
> oldest_nonremovable_xid and the pg_conflict_detection slot being created.

Fair enough. Let us see what you think about my above response first.

--
With Regards,
Amit Kapila.
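For context, a minimal sketch of the existing guard Amit refers to, modeled on CreateSubscription() in subscriptioncmds.c (the exact option tested there is illustrative; PreventInTransactionBlock() is the real API and raises an ERROR if called inside a transaction block):

    /*
     * Slot creation cannot be rolled back, so refuse to do it inside a
     * transaction block.
     */
    if (opts.create_slot)
        PreventInTransactionBlock(isTopLevel,
                                  "CREATE SUBSCRIPTION ... WITH (create_slot = true)");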
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Wed, Jan 8, 2025 at 3:21 AM Jacob Champion <jacob.champ...@enterprisedb.com> wrote:
> On Fri, Dec 20, 2024 at 2:21 PM Daniel Gustafsson wrote:
> >
> > > On 20 Dec 2024, at 02:00, Jacob Champion <jacob.champ...@enterprisedb.com> wrote:
> >
> > Thanks for the new version, I was doing a v39 review but I'll roll that
> > over into a v40 review now.
>
> (Sorry for the rug pull!)
>
> > As I was reading I was trying to identify parts that can be broken out
> > and committed ahead of time. This is not only to trim down the size, but
> > mostly to shape the final commit into a coherent single commit that
> > brings a single functionality utilizing existing APIs. Basically I think
> > we should keep generic functionality out of the final commit and keep
> > that focused on OAuth and the required APIs and infra.
>
> Sounds good.
>
> > The async auth support seemed like a candidate to go in before the rest.
> > While there won't be any consumers of it, it's also not limited to OAuth.
> > What do you think about slicing that off and getting it in ahead of time?
> > I took a small stab at separating out the generic bits (it includes the
> > PG_MAX_AUTH_TOKEN_LENGTH move as well, which is unrelated, but could also
> > be committed ahead of time) along with some small tweaks on it.
>
> +1 to separating the PG_MAX_... macro move. I will take a closer look
> at the async patch in isolation; there's some work I'm doing to fix a
> bug Kashif (cc'd) found recently, and it has me a bit unsure about my
> chosen order of operations in the async part of fe-connect.c. That
> deserves its own email, but I need to investigate more.

Thanks Jacob. Most of the testing with psql is done, and I am working on
the remaining test cases.

> Thanks!
> --Jacob
Re: Support for NO INHERIT to INHERIT state change with named NOT NULL constraints
On 2024-Nov-25, Suraj Kharage wrote:

> Another case which needs a conclusion is -
> When changing from INHERIT to NO INHERIT, we need to walk all children and
> decrement coninhcount for the corresponding constraint. If a constraint in
> one child reaches zero, should we drop it? Not sure. If we do, make sure
> to reset the corresponding attnotnull bit too. We could decide not to drop
> the constraint, in which case you don't need to reset attnotnull.

I think it's more useful if we keep such a constraint (but of course
change its conislocal to true, if it isn't that already). There are
arguments for doing both things (drop it or leave it); but if you drop it,
there's no way to put it back without scanning the table again. If you
keep it, it's easy to drop it afterwards.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Postgres is bloatware by design: it was built to house PhD theses."
(Joey Hellerstein, SIGMOD annual conference 2002)
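As a point of reference, the bookkeeping being discussed is visible in pg_constraint, so one can watch coninhcount/conislocal while experimenting. A generic catalog query (the table name is a placeholder, not tied to this patch):

    SELECT conname, contype, conislocal, coninhcount
    FROM pg_constraint
    WHERE conrelid = 'child_table'::regclass;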
Re: Conflict detection for update_deleted in logical replication
On Thu, Dec 19, 2024 at 11:11 PM Nisha Moond wrote:
>
> Here is further performance test analysis with the v16 patch-set.
>
> In the test scenarios already shared on -hackers [1], where pgbench was
> run only on the publisher node in a pub-sub setup, no performance
> degradation was observed on either node.
>
> In contrast, when pgbench was run only on the subscriber side with
> detect_update_deleted=on [2], the TPS performance was reduced due to dead
> tuple accumulation. This performance drop depended on the
> wal_receiver_status_interval -- larger intervals resulted in more dead
> tuple accumulation on the subscriber node. However, after the improvement
> in patch v16-0002, which dynamically tunes the status request, the
> default TPS reduction was limited to only 1%.
>
> We performed more benchmarks with the v16 patches where pgbench was run
> on both the publisher and subscriber, focusing on TPS performance. To
> summarize the key observations:
>
> - No performance impact on the publisher, as dead tuple accumulation does
>   not occur on the publisher.

Nice. It means that frequently getting in-commit-phase transactions by
the subscriber didn't have a negative impact on the publisher's
performance.

> - The performance is reduced on the subscriber side (TPS reduction (~50%)
>   [3]) due to dead tuple retention for the conflict detection when
>   detect_update_deleted=on.
>
> - Performance reduction happens only on the subscriber side, as the
>   workload on the publisher is pretty high and the apply workers must
>   wait for the amount of transactions with earlier timestamps to be
>   applied and flushed before advancing the non-removable XID to remove
>   dead tuples.

Assuming that the performance dip happened due to dead tuple retention
for the conflict detection, would TPS on other databases also be
affected?

> [3] Test with pgbench run on both publisher and subscriber.
>
> Test setup:
> - Tests performed on pgHead + v16 patches
> - Created a pub-sub replication system.
> - Parameters for both instances were:
>
>       shared_buffers = 30GB
>       min_wal_size = 10GB
>       max_wal_size = 20GB
>       autovacuum = false

Since you disabled autovacuum on the subscriber, dead tuples created by
non-HOT updates are accumulated anyway regardless of the
detect_update_deleted setting, is that right?

> Test Run:
> - Ran pgbench (read-write) on both the publisher and the subscriber with
>   30 clients for a duration of 120 seconds, collecting data over 5 runs.
> - Note that pgbench was running on different tables on pub and sub.
>   (The scripts used for the test, "case1-2_measure.sh" and
>   "case1-2_setup.sh", are attached.)
>
> Results:
>
>       Run#        pub TPS     sub TPS
>       1           32209       13704
>       2           32378       13684
>       3           32720       13680
>       4           31483       13681
>       5           31773       13813
>       median      32209       13684
>       regression  7%          -53%

What was the TPS on the subscriber when detect_update_deleted = false?
And how much were the tables bloated compared to when
detect_update_deleted = false?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Re: Re: proposal: schema variables
hi.

You forgot to change acldefault: it should add 'V' for SESSION VARIABLE in
doc/src/sgml/ddl.sgml.

Maybe some examples of session variables being shadowed would be great,
because doc/src/sgml/ref/create_variable.sgml says "Session variables can
be shadowed by other identifiers. For details, see ." If I click the link,
then in ddl.sgml there is also a bunch of text saying that a variable will
be shadowed in some situations, but there are no simple examples to
demonstrate that.

"Create an date session variable var1:" maybe it should be rephrased as
"Create a session variable var1 as date data type"? (I am not a native
English speaker.)

---

    --- a/src/include/nodes/primnodes.h
    +++ b/src/include/nodes/primnodes.h
    @@ -379,7 +379,8 @@ typedef struct Param
         Expr        xpr;
         ParamKind   paramkind;      /* kind of parameter. See above */
         int         paramid;        /* numeric ID for parameter */
    -    Oid         paramtype;      /* pg_type OID of parameter's datatype */
    +    /* pg_type OID of parameter's datatype */
    +    Oid         paramtype pg_node_attr(query_jumble_ignore);

I think we need the above to make "select v2;" and "select v1;" be
normalized as one query.

---

When we create a new table, we do something like this:

    DefineRelation
        heap_create_with_catalog
            GetNewRelFileNumber
                GetNewOidWithIndex

Relation OID uniqueness and variable OID uniqueness are the same thing. If
the variable OID uniqueness problem were ever reached, at that moment we
should care more about the relation OID uniqueness problem. And in
GetNewRelFileNumber, we have this comment:

    * As with GetNewOidWithIndex(), there is some theoretical risk of a race
    * condition, but it doesn't seem worth worrying about.

There are also comments in GetNewOidWithIndex:

    * Note that we are effectively assuming that the table has a relatively small
    * number of entries (much less than 2^32) and there aren't very long runs of
    * consecutive existing OIDs. This is a mostly reasonable assumption for
    * system catalogs.

Does that mean pg_catalog.pg_variable.varcreate_lsn is not really
necessary?

---

I think the latest patch for LET self-assign ACL SELECT is not correct;
previously I also tried the same idea. Test demo:

    CREATE TYPE vartest_t1 AS (a int, b int);
    CREATE VARIABLE var1 AS vartest_t1;
    CREATE ROLE regress_var_test_role;
    GRANT UPDATE ON VARIABLE var1 TO regress_var_test_role;
    GRANT SELECT ON TABLE tenk1 TO regress_var_test_role;
    SET ROLE TO regress_var_test_role;
    -- both should fail
    LET var1.a = var1.a + 10;
    LET var1.a = (select * from (select count(*) from tenk1 where unique1 = var1.a + 10));
Re: pg_settings.unit and DefineCustomXXXVariable
On Wed, 8 Jan 2025 at 11:13, Luca Ferrari wrote: > > Hi all, > I need to define a few GUCs, and for that purpose I'm using > DefineCustomXXXVariable functions that provide hooks for assignment, > show and check. However it is not clear to me if it is possible to > populate the unit column in pg_settings for the custom defined > variables, and if so, how. DefineCustomXXXVariable has a 'flags' argument, into which GUC_UNIT_* can be or-ed. AFAIK, the options provided in guc.h are the only kinds of units supported and available, and I think only integer variables actually support the unit conversion implied by the unit options. Kind regards, Matthias van de Meent
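To make that concrete, here is a sketch of defining a custom integer GUC with a unit; the extension name "myext" and the variable are hypothetical, while DefineCustomIntVariable() and the GUC_UNIT_* flags are the real APIs:

    #include "postgres.h"
    #include <limits.h>
    #include "fmgr.h"
    #include "utils/guc.h"

    PG_MODULE_MAGIC;

    static int myext_timeout_ms = 1000;

    void
    _PG_init(void)
    {
        DefineCustomIntVariable("myext.timeout",
                                "Timeout for myext operations.",
                                NULL,               /* no long description */
                                &myext_timeout_ms,
                                1000,               /* boot value */
                                0, INT_MAX,         /* min, max */
                                PGC_USERSET,
                                GUC_UNIT_MS,        /* pg_settings.unit shows "ms" */
                                NULL, NULL, NULL);  /* check/assign/show hooks */
    }

With GUC_UNIT_MS or-ed into the flags, pg_settings reports the unit and inputs like '5s' are converted automatically.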
Re: Conflict detection for update_deleted in logical replication
On Wed, Jan 8, 2025 at 1:53 AM Amit Kapila wrote:
>
> On Wed, Jan 8, 2025 at 3:02 PM Masahiko Sawada wrote:
> >
> > On Thu, Dec 19, 2024 at 11:11 PM Nisha Moond wrote:
> > >
> > > Here is further performance test analysis with the v16 patch-set.
> > >
> > > In the test scenarios already shared on -hackers [1], where pgbench was
> > > run only on the publisher node in a pub-sub setup, no performance
> > > degradation was observed on either node.
> > >
> > > In contrast, when pgbench was run only on the subscriber side with
> > > detect_update_deleted=on [2], the TPS performance was reduced due to
> > > dead tuple accumulation. This performance drop depended on the
> > > wal_receiver_status_interval -- larger intervals resulted in more dead
> > > tuple accumulation on the subscriber node. However, after the
> > > improvement in patch v16-0002, which dynamically tunes the status
> > > request, the default TPS reduction was limited to only 1%.
> > >
> > > We performed more benchmarks with the v16 patches where pgbench was run
> > > on both the publisher and subscriber, focusing on TPS performance. To
> > > summarize the key observations:
> > >
> > > - No performance impact on the publisher, as dead tuple accumulation
> > >   does not occur on the publisher.
> >
> > Nice. It means that frequently getting in-commit-phase transactions by
> > the subscriber didn't have a negative impact on the publisher's
> > performance.
> >
> > > - The performance is reduced on the subscriber side (TPS reduction
> > >   (~50%) [3]) due to dead tuple retention for the conflict detection
> > >   when detect_update_deleted=on.
> > >
> > > - Performance reduction happens only on the subscriber side, as the
> > >   workload on the publisher is pretty high and the apply workers must
> > >   wait for the amount of transactions with earlier timestamps to be
> > >   applied and flushed before advancing the non-removable XID to remove
> > >   dead tuples.
> >
> > Assuming that the performance dip happened due to dead tuple retention
> > for the conflict detection, would TPS on other databases also be
> > affected?
>
> As we use slot->xmin to retain dead tuples, shouldn't the impact be
> global (means on all databases)?

I think so too.

> > > [3] Test with pgbench run on both publisher and subscriber.
> > >
> > > Test setup:
> > > - Tests performed on pgHead + v16 patches
> > > - Created a pub-sub replication system.
> > > - Parameters for both instances were:
> > >
> > >       shared_buffers = 30GB
> > >       min_wal_size = 10GB
> > >       max_wal_size = 20GB
> > >       autovacuum = false
> >
> > Since you disabled autovacuum on the subscriber, dead tuples created by
> > non-HOT updates are accumulated anyway regardless of the
> > detect_update_deleted setting, is that right?
>
> I think the hot-pruning mechanism during the update operation will remove
> dead tuples even when autovacuum is disabled.

True, but why did it disable autovacuum? It seems that case1-2_setup.sh
doesn't specify fillfactor, which makes HOT updates less likely to happen.
I understand that a certain performance dip happens due to dead tuple
retention, which is fine, but I'm surprised that the TPS decreased by 50%
within 120 seconds. Does the TPS get even worse for a longer test? I did a
quick benchmark where I completely disabled removing dead tuples (by
autovacuum=off and a logical slot) and ran pgbench, but I didn't see such
a precipitous dip.
Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Coccinelle for PostgreSQL development [1/N]: coccicheck.py
On 2025-01-07 Tu 2:44 PM, Mats Kindahl wrote:

> I got some time over during the holidays, so I spent some of it doing
> something I've been thinking about for a while.
>
> For those of you that are not aware of it: Coccinelle is a tool for
> pattern matching and text transformation for C code and can be used for
> detection of problematic programming patterns and to make complex,
> tree-wide patches easy. It is aware of the structure of C code and is
> better suited to making complicated changes than what is possible using
> normal text substitution tools like Sed and Perl.
>
> Coccinelle has been used successfully in the Linux project since 2008 and
> is now an established tool for Linux development, and a large number of
> semantic patches have been added to the source tree to capture everything
> from generic issues (like eliminating the redundant A in expressions like
> "!A || (A && B)") to more Linux-specific problems like adding a missing
> call to kfree().
>
> Although PostgreSQL is nowhere near the size of the Linux kernel, it is
> nevertheless of a significant size and would benefit from incorporating
> Coccinelle into the development. I noticed it's been used in a few cases
> way back (like 10 years back) to fix issues in the PostgreSQL code, but I
> thought it might be useful to make it part of normal development practice
> to, among other things:
>
> - Identify and correct bugs in the source code both during development
>   and review.
> - Make large-scale changes to the source tree to improve the code based
>   on new insights.
> - Encode and enforce APIs by ensuring that function calls are used
>   correctly.
> - Use improved coding patterns for more efficient code.
> - Allow extensions to automatically update code for later PostgreSQL
>   versions.
>
> To that end, I created a series of patches to show how it could be used
> in the PostgreSQL tree. It is a lot easier to discuss concrete code, and
> I split it up into separate messages since that makes it easier to
> discuss each individual patch. The series contains code to make it easy
> to work with Coccinelle during development and reviews, as well as
> examples of semantic patches that capture problems, demonstrate how to
> make large-scale changes, how to enforce APIs, and also improve some
> coding patterns.
>
> This first patch contains the coccicheck.py script, which is a
> re-implementation of the coccicheck script that the Linux kernel uses. We
> cannot immediately use the coccicheck script since it is quite closely
> tied to the Linux source code tree, and we need to have something that
> supports both autoconf and Meson. Since Python seems to be used more and
> more in the tree, it seems to be the most natural choice. (I have no
> strong opinion on what language to use, but think it would be good to
> have something that is as platform-independent as possible.)
>
> The intention is that we should be able to use the Linux semantic patches
> directly, so it supports the "Requires" and "Options" keywords, which can
> be used to require a specific version of spatch(1) and add options to the
> execution of that semantic patch, respectively.

Please don't start multiple threads like this. If you want to submit a set
of patches for a single feature, send them all as attachments in a single
email. Otherwise this just makes life hard for threading email readers.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
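For readers unfamiliar with the tool, here is a small semantic patch of the kind Mats describes, written in SmPL (Coccinelle's patch language). It rewrites the redundant-A pattern mentioned above; the file name in the invocation is an assumption:

    @@
    expression A, B;
    @@
    - !A || (A && B)
    + !A || B

    // run with, e.g.: spatch --sp-file excluded_middle.cocci --dir src/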
Re: per backend WAL statistics
Hi,

On Wed, Jan 08, 2025 at 03:21:26PM +0900, Michael Paquier wrote:
> On Tue, Jan 07, 2025 at 08:48:51AM +0000, Bertrand Drouvot wrote:
> > Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
> > per backend IO statistics), we can more easily add per backend statistics.
> >
> > Please find attached a patch to implement $SUBJECT.
>
> I've looked at v1-0002 and v1-0001.

Thanks for looking at it!

> +static void
> +pgstat_flush_io_entry(PgStat_EntryRef *entry_ref, bool nowait, bool need_lock)
>  {
> -	PgStatShared_Backend *shbackendioent;
> -	PgStat_BackendPendingIO *pendingent;
> +	PgStatShared_Backend *shbackendent;
> +	PgStat_BackendPending *pendingent;
>  	PgStat_BktypeIO *bktype_shstats;
> +	PgStat_BackendPendingIO *pending_io;
>
> -	if (!pgstat_lock_entry(entry_ref, nowait))
> -		return false;
> +	if (need_lock && !pgstat_lock_entry(entry_ref, nowait))
> +		return;
>
> The addition of need_lock at this level leads to a result that seems a
> bit confusing, where pgstat_backend_flush_cb() passes "false" because
> it locks the entry by itself as an effect of v1-0003 with the new area
> for WAL. Wouldn't it be cleaner to do an extra pgstat_[un]lock_entry
> dance in pgstat_backend_flush_io() instead? Another approach I can
> think of that would be slightly cleaner to me is to pass a bits32 to a
> single routine that would control if WAL stats, I/O stats or both
> should be flushed, keeping pgstat_flush_backend() as name with an
> extra argument to decide which parts of the stats should be flushed.

Yeah, that's more elegant, as it also means that the main callback will
not change (should we add even more stats in the future). Done that way
in the attached v2.

> -PgStat_BackendPendingIO *
> +PgStat_BackendPending *
>
> This rename makes sense.
>
> #define PG_STAT_GET_WAL_COLS 9
> 	TupleDesc	tupdesc;
> 	Datum		values[PG_STAT_GET_WAL_COLS] = {0};
> 	bool		nulls[PG_STAT_GET_WAL_COLS] = {0};
>
> It feels unnatural to have a PG_STAT_GET_WAL_COLS while it would not
> only relate to this function anymore.

Has been renamed to PG_STAT_WAL_COLS in the attached.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

>From 190422763353e87f5208e8f6d1aee823eb7860e5 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot
Date: Mon, 6 Jan 2025 07:51:27 +0000
Subject: [PATCH v2 1/3] Extract logic filling pg_stat_get_wal()'s tuple into
 its own routine

This commit adds pg_stat_wal_build_tuple(), a helper routine for
pg_stat_get_wal(), that fills its tuple based on the contents of
PgStat_WalStats.

This will be used in a follow-up commit that uses the same structures as
pg_stat_wal for reporting, but for the PGSTAT_KIND_BACKEND statistics
kind.
---
 src/backend/utils/adt/pgstatfuncs.c | 56 ++++++++++++++++-----------
 1 file changed, 36 insertions(+), 20 deletions(-)
 100.0% src/backend/utils/adt/

diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 3245f3a8d8..7309f06993 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1560,20 +1560,22 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
 }
 
 /*
- * Returns statistics of WAL activity
+ * pg_stat_wal_build_tuple
+ *
+ * Helper routine for pg_stat_get_wal() returning one tuple based on the
+ * contents of wal_stats.
 */
-Datum
-pg_stat_get_wal(PG_FUNCTION_ARGS)
+static Datum
+pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
 {
-#define PG_STAT_GET_WAL_COLS	9
+#define PG_STAT_WAL_COLS	9
 	TupleDesc	tupdesc;
-	Datum		values[PG_STAT_GET_WAL_COLS] = {0};
-	bool		nulls[PG_STAT_GET_WAL_COLS] = {0};
+	Datum		values[PG_STAT_WAL_COLS] = {0};
+	bool		nulls[PG_STAT_WAL_COLS] = {0};
 	char		buf[256];
-	PgStat_WalStats *wal_stats;
 
 	/* Initialise attributes information in the tuple descriptor */
-	tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_WAL_COLS);
+	tupdesc = CreateTemplateTupleDesc(PG_STAT_WAL_COLS);
 	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "wal_records",
 					   INT8OID, -1, 0);
 	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "wal_fpi",
@@ -1595,34 +1597,48 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
 
 	BlessTupleDesc(tupdesc);
 
-	/* Get statistics about WAL activity */
-	wal_stats = pgstat_fetch_stat_wal();
-
 	/* Fill values and NULLs */
-	values[0] = Int64GetDatum(wal_stats->wal_records);
-	values[1] = Int64GetDatum(wal_stats->wal_fpi);
+	values[0] = Int64GetDatum(wal_stats.wal_records);
+	values[1] = Int64GetDatum(wal_stats.wal_fpi);
 
 	/* Convert to numeric. */
-	snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats->wal_bytes);
+	snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats.wal_bytes);
 	values[2] = DirectFunctionCall3(numeric_in,
 									CStringGetDatum(buf),
 									ObjectIdGetDatum(0),
 									Int32GetDatum(-1));
 
-	values[3] = Int64GetDatum(wal_stats->wal_buffers_full);
-	values[4]
Re: Enhancing Memory Context Statistics Reporting
Hi Fujii-san,

Thank you for testing the feature.

> Issue 1: Error with pg_get_process_memory_contexts()
> When I used pg_get_process_memory_contexts() on the PID of a backend
> process that had just caused an error but hadn't rolled back yet,
> the following error occurred:
>
>    Session 1 (PID=70011):
>    =# begin;
>    =# select 1/0;
>    ERROR: division by zero
>
>    Session 2:
>    =# select * from pg_get_process_memory_contexts(70011, false);
>
>    Session 1 terminated with:
>    ERROR: ResourceOwnerEnlarge called after release started
>    FATAL: terminating connection because protocol synchronization was lost

In this scenario, a DSM segment descriptor is created and associated with
the CurrentResourceOwner, which is set to the aborting transaction's
resource owner. This occurs when ProcessGetMemoryContextInterrupts is
called by the backend while a transaction is still open and about to be
rolled back.

I believe this issue needs to be addressed in the DSA and DSM code by
adding a check to ensure that the CurrentResourceOwner is not about to be
released before creating a DSM under the CurrentResourceOwner. The
attached fix resolves this issue. However, for a more comprehensive
solution, I believe the same change should be extended to other parts of
the DSA and DSM code where CurrentResourceOwner is referenced.

> Issue 2: Segmentation Fault
> When I ran pg_get_process_memory_contexts() every 0.1 seconds using the
> \watch command while running "make -j 4 installcheck-world",
> I encountered a segmentation fault:
>
>    LOG: client backend (PID 97975) was terminated by signal 11: Segmentation fault: 11
>    DETAIL: Failed process was running: select infinite_recurse();
>    LOG: terminating any other active server processes

I have not been able to reproduce this issue. Could you please clarify
which process you ran pg_get_process_memory_contexts() on, with the
interval of 0.1? Was it a backend process created by "make
installcheck-world", or some other process?

Thank you,
Rahila Syed

fix_for_resource_owner_error.patch
Re: Enhancing Memory Context Statistics Reporting
On 2025/01/08 21:03, Rahila Syed wrote:
> I have not been able to reproduce this issue. Could you please clarify
> which process you ran pg_get_process_memory_contexts() on, with the
> interval of 0.1?

I used the following query for testing:

    =# SELECT count(*) FROM pg_stat_activity,
           pg_get_process_memory_contexts(pid, false)
       WHERE pid <> pg_backend_pid();
    =# \watch 0.1

> Was it a backend process created by "make installcheck-world", or some
> other process?

Yes, the target backends were from make installcheck-world. No other
workloads were running.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Tue, Jan 7, 2025 at 2:24 PM Jacob Champion wrote:
> Along those lines, though, Michael Paquier suggested that maybe I
> could pull the require_auth prefactoring up to the front of the
> patchset. That might look a bit odd until OAuth support lands, since
> it won't be adding any new useful value, but I will give it a shot.

While I take a look at the async patch from upthread, here is my attempt
at pulling the require_auth change out.

Note that there's a dead branch that cannot be exercised until OAuth
lands. We're not going to process the SASL mechanism name at all if no
mechanisms are allowed to begin with, and right now SASL is synonymous
with SCRAM. I can change that by always allowing AuthenticationSASL
messages -- even if none of the allowed authentication types use SASL --
but that approach didn't seem to generate excitement on- or off-list the
last time I proposed it [1].

Thanks,
--Jacob

[1] https://postgr.es/m/CAAWbhmg%2BGzNMK5Li182BKSbzoFVaKk_dDJ628NnuV80GqYgFFg%40mail.gmail.com

commit f8bbb0f2a2e0f4840bdeb5c9a7b9a35797280aaf
Author: Jacob Champion
Date:   Mon Dec 16 13:57:14 2024 -0800

    require_auth: prepare for multiple SASL mechanisms

    Prior to this patch, the require_auth implementation assumed that the
    AuthenticationSASL protocol message was synonymous with SCRAM-SHA-256.
    In preparation for the OAUTHBEARER SASL mechanism, split the
    implementation into two tiers: the first checks the acceptable
    AUTH_REQ_* codes, and the second checks acceptable mechanisms if
    AUTH_REQ_SASL et al are permitted.

    conn->allowed_sasl_mechs is the list of pointers to acceptable
    mechanisms. (Since we'll support only a small number of mechanisms,
    this is an array of static length to minimize bookkeeping.)
    pg_SASL_init() will bail if the selected mechanism isn't contained in
    this array.

    Since there's only one mechanism supported right now, one branch of
    the second tier cannot be exercised yet (it's marked with
    Assert(false)). This assertion will need to be removed when the next
    mechanism is added.

diff --git a/src/interfaces/libpq/fe-auth.c b/src/interfaces/libpq/fe-auth.c
index 14a9a862f51..722bb47ee14 100644
--- a/src/interfaces/libpq/fe-auth.c
+++ b/src/interfaces/libpq/fe-auth.c
@@ -543,6 +543,35 @@ pg_SASL_init(PGconn *conn, int payloadlen)
 		goto error;
 	}
 
+	/* Make sure require_auth is satisfied. */
+	if (conn->require_auth)
+	{
+		bool		allowed = false;
+
+		for (int i = 0; i < lengthof(conn->allowed_sasl_mechs); i++)
+		{
+			if (conn->sasl == conn->allowed_sasl_mechs[i])
+			{
+				allowed = true;
+				break;
+			}
+		}
+
+		if (!allowed)
+		{
+			/*
+			 * TODO: this is dead code until a second SASL mechanism is added;
+			 * the connection can't have proceeded past check_expected_areq()
+			 * if no SASL methods are allowed.
+			 */
+			Assert(false);
+
+			libpq_append_conn_error(conn, "authentication method requirement \"%s\" failed: server requested %s authentication",
+									conn->require_auth, selected_mechanism);
+			goto error;
+		}
+	}
+
 	if (conn->channel_binding[0] == 'r' &&	/* require */
 		strcmp(selected_mechanism, SCRAM_SHA_256_PLUS_NAME) != 0)
 	{
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 8f211821eb2..6f262706b0a 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -1110,6 +1110,56 @@ libpq_prng_init(PGconn *conn)
 	pg_prng_seed(&conn->prng_state, rseed);
 }
 
+/*
+ * Fills the connection's allowed_sasl_mechs list with all supported SASL
+ * mechanisms.
+ */
+static inline void
+fill_allowed_sasl_mechs(PGconn *conn)
+{
+	/*---
+	 * We only support one mechanism at the moment, so rather than deal with a
+	 * linked list, conn->allowed_sasl_mechs is an array of static length. We
+	 * rely on the compile-time assertion here to keep us honest.
+	 *
+	 * To add a new mechanism to require_auth,
+	 * - update the length of conn->allowed_sasl_mechs,
+	 * - add the new pg_fe_sasl_mech pointer to this function, and
+	 * - handle the new mechanism name in the require_auth portion of
+	 *   pqConnectOptions2(), below.
+	 */
+	StaticAssertDecl(lengthof(conn->allowed_sasl_mechs) == 1,
+					 "fill_allowed_sasl_mechs() must be updated when resizing conn->allowed_sasl_mechs[]");
+
+	conn->allowed_sasl_mechs[0] = &pg_sc
Re: New GUC autovacuum_max_threshold ?
On 1/7/25 23:57, Nathan Bossart wrote:
> Here is a rebased patch for cfbot. AFAICT we are still pretty far from
> consensus on which approach to take, unfortunately.

For what it's worth, although I would have preferred the sub-linear growth
thing, I'd much rather have this than nothing. And I have to admit that
the proposed formulas were either too convoluted or wrong. This very patch
is more straightforward.

Please let me know if I can help and how.
Re: psql: Option to use expanded mode for various meta-commands
On Mon, 30 Dec 2024 at 15:48, Greg Sabino Mullane wrote:
>
> I like this, very useful. It's a shame about the conflict with \dx
> (lesson for the future: think extra carefully about option namings!). I
> am impressed that \dx \d \d+ \d+x and even \dxx all work as one might
> intuit with this patch.

Thanks for looking. Attached is a more complete patch, now with

- trivial bug-fix for previous \d code, per cfbot
- expanded mode support for \l and \z
- updated psql help
- a few representative regression test cases

The majority of the patch is doc updates, which are somewhat tedious.
Initially, I resisted documenting the 'x' option in the description of
every command affected, but given the length of the list of commands in
the HTML page, it's a long way from any given command to the top or
bottom where 'x' is described in more detail. So in the end, I decided to
just add a sentence to each command's description, keeping it as short as
possible.

Regards,
Dean

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
new file mode 100644
index 72f3347..c48bcfb
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -865,6 +865,13 @@ testdb=>
+    Many of the meta-commands also allow x to be appended
+    as an option. This will cause the results to be displayed in expanded
+    mode, as if expanded=on were included in the list of
+    \pset options. See also \x.
+
     The following meta-commands are defined:

@@ -1272,7 +1279,7 @@ SELECT $1 \parse stmt1
-        \d[S+] [ pattern ]
+        \d[Sx+] [ pattern ]

@@ -1321,12 +1328,24 @@ SELECT $1 \parse stmt1
         foreign tables. This is purely a convenience measure.
+
+        As with many other commands, if x is appended to
+        the command name, the results are displayed in expanded mode, but note
+        that this only applies when \d is used without a
+        pattern argument, and
+        the x modifier cannot appear immediately after the
+        \d (because \dx is a different
+        command); the x modifier may only appear after an
+        S or + modifier. For example,
+        \d+x is equivalent to \dtvmsE+x
+        and will show a list of all relations in expanded mode.
+

@@ -1337,19 +1356,23 @@ SELECT $1 \parse stmt1
-        \da[S] [ pattern ]
+        \da[Sx] [ pattern ]

         By default, only user-created objects are shown; supply a
         pattern or the S modifier to include system objects.
+        If x is appended to the command name, the results
+        are displayed in expanded mode.

@@ -1357,7 +1380,7 @@ SELECT $1 \parse stmt1
-        \dA[+] [ pattern ]
+        \dA[x+] [ pattern ]

         Lists access methods. If pattern is specified, only access
-        methods whose names match the pattern are shown. If
-        + is appended to the command name, each access
+        methods whose names match the pattern are shown.
+        If x is appended to the command name, the results
+        are displayed in expanded mode.
+        If + is appended to the command name, each access
         method is listed with its associated handler function and
         description.

-        \dAc[+]
+        \dAc[x+]
         [access-method-pattern
         [input-type-pattern]]

@@ -1372,6 +1395,8 @@ SELECT $1 \parse stmt1
         If input-type-pattern
         is specified, only operator classes associated with input types whose
         names match that pattern are listed.
+        If x is appended to the command name, the results
+        are displayed in expanded mode.
         If + is appended to the command name, each operator
         class is listed with its associated operator family and owner.
@@ -1380,7 +1405,7 @@ SELECT $1 \parse stmt1
-        \dAf[+]
+        \dAf[x+]
         [access-method-pattern
         [input-type-pattern]]

@@ -1395,6 +1420,8 @@ SELECT $1 \parse stmt1
         If input-type-pattern
         is specified, only operator families associated with input types whose
         names match that pattern are listed.
+        If x is appended to the command name, the results
+        are displayed in expanded mode.
         If + is appended to the command name, each operator
         family is listed with its owner.

@@ -1403,7 +1430,7 @@ SELECT $1 \parse stmt1
-        \dAo[+]
+        \dAo[x+]
         [access-method-pattern
         [operator-family-pattern]]

@@ -1419,6 +1446,8 @@ SELECT $1 \parse stmt1
         If operator-family-pattern
         is specified, only members of operator families whose names match that
         pattern are listed.
+        If x is
Logical replication - proposal for a custom conflict resolution function
Hello hackers,

I'd like some feedback on a logical replication feature I would like to
write a patch for. The feature would allow users to register a custom
conflict handler function to be executed for each conflicting row, after
the built-in conflict resolver has run. Running after the built-in
conflict resolver allows the function to use the value provided by the
resolver, for instance for custom logging.

The proposed programming model for the function follows the model for
trigger functions: a set of read-only variables is populated in the
top-level block, and the function returns a record or NULL. The conflict
handler function would have more variables. At minimum, three records -
the local version of the new value, the remote version of the new value,
and the value returned by the built-in resolver function - rather than the
two for a trigger function, plus the timestamp information for the local
and remote updates, the conflict type detected, and the resolver used.

The new syntax proposed may look something like this:

    CREATE SUBSCRIPTION subscription_name
        CONNECTION 'conninfo'
        PUBLICATION publication_name
        CONFLICT RESOLVER (conflict_type1 = resolver1 [, conflict_typeN = resolverN, ...])
        [conflict_handler = my_custom_function]  -- allow users to pass a custom conflict resolution function

Please let me know your thoughts, thanks!

Diego
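To make the programming model concrete, here is a purely hypothetical sketch of what such a handler could look like. The special variables (CONFLICT_TYPE, RESOLVER_USED, LOCAL_ROW, REMOTE_ROW, RESOLVED_ROW) are invented names by analogy with the TG_* variables of trigger functions, not an existing API:

    CREATE FUNCTION my_custom_function() RETURNS record AS $$
    BEGIN
        -- custom logging using the value chosen by the built-in resolver
        RAISE LOG 'conflict % resolved by %: local=%, remote=%, resolved=%',
                  CONFLICT_TYPE, RESOLVER_USED, LOCAL_ROW, REMOTE_ROW, RESOLVED_ROW;

        -- returning the resolver's value accepts it; returning NULL would skip
        RETURN RESOLVED_ROW;
    END;
    $$ LANGUAGE plpgsql;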
Re: Virtual generated columns
On Wed, Jan 8, 2025 at 16:23, Vik Fearing wrote:

> This is lying to the planner, and you get to enjoy whatever breaks
> because of it. A function that accesses external data is not immutable;
> it is stable at best.

I understand that, but it's not documented, so users can think that doing
it that way is fine. So it would be good to explain why this approach
could break this or that.

regards
Marcos
Re: using PGOPTIONS in meson
Hi,

On 2025-01-08 23:27:42 +0800, jian he wrote:
> hi.
> I am wondering about ways to do
>   make check PGOPTIONS="-c debug_parallel_query=regress -c work_mem=50MB"
> the meson way, especially the PGOPTIONS part.
>
> I have looked at [1] and [2].
> [1] https://www.postgresql.org/docs/current/regress-run.html#REGRESS-RUN-CUSTOM-SETTINGS
> [2] https://wiki.postgresql.org/wiki/Meson

This isn't really a make-specific thing, i.e. there's no make-specific
logic for PGOPTIONS. Options passed to make just end up as environment
variables, and libpq looks at environment variables. Which means you just
need to set an environment variable, which can be done without make:

    PGOPTIONS="..." meson test ...

or

    export PGOPTIONS="..."; meson test ...

Greetings,

Andres Freund
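A concrete invocation, assuming a configured build directory named "build" (the suite names are whatever "meson test -C build --list" shows for your tree):

    PGOPTIONS="-c debug_parallel_query=regress -c work_mem=50MB" \
        meson test -C build --suite setup --suite regress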
Re: Fwd: Re: A new look at old NFS readdir() problems?
Bruce Momjian writes: > Will people now always get a clear error on failure? Crazy idea, but > could we have initdb or postmaster start test this? It'd require adding quite a lot of cycles: I'd guess you'd need to create/drop a hundred or a thousand files to be sure of seeing the problem. Another problem is that even if the primary data directory is okay, a tablespace could be created on a filesystem that's not okay. I doubt that it's worth it. regards, tom lane
Re: Fwd: Re: A new look at old NFS readdir() problems?
On Wed, Jan 8, 2025 at 03:21:27PM -0500, Tom Lane wrote: > Bruce Momjian writes: > > Will people now always get a clear error on failure? Crazy idea, but > > could we have initdb or postmaster start test this? > > It'd require adding quite a lot of cycles: I'd guess you'd need to > create/drop a hundred or a thousand files to be sure of seeing the > problem. > > Another problem is that even if the primary data directory is okay, > a tablespace could be created on a filesystem that's not okay. > > I doubt that it's worth it. Oh, I thought if it could be recreated using just two files, it would be worth it --- obviously not. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: improve DEBUG1 logging of parallel workers for CREATE INDEX?
Hi,

On Wed, Jan 8, 2025 at 18:37, Sami Imseih wrote:
> > s/parallel vacuum workers for index vacuuming/parallel workers for index
> > creation/?
>
> oops, that's my oversight from copying the message from vacuum. fixed.
>
> > I'd add "in parallel" to match its counterpart "serially" above. That
> > would make it more clear in case one just looks for "building index" in
> > the log.
>
> good point.
>
> Below is the new output with the attached v2.
>
>     postgres=# create index foo_idx1 on foo(id);
>     DEBUG:  building index "foo_idx1" on table "foo" with parallel workers
>     DEBUG:  launched 1 parallel workers for index creation (planned: 1)
>     DEBUG:  index "foo_idx1" can safely use deduplication
>     CREATE INDEX

I tend to agree it might be better than Benoit's patch on the index
messages, though I'm afraid that the DEBUG1 level won't work for many
users. DEBUGx levels are for developers, not users. A better log level for
me would be LOG. For client messages, LOG messages won't be displayed by
default, so there's still a need for a "SET client_min_messages TO log",
but it won't be necessary to go all the way to DEBUG1.

Regards.

--
Guillaume.
Re: use a non-locking initial test in TAS_SPIN on AArch64
Nathan Bossart writes: > Are there any objections to proceeding with this change? So far, it's been > tested on a c8g.24xlarge and an Apple M3 (which seems to be too small to > show any effect). If anyone has access to a larger ARM machine, additional > testing would be greatly appreciated. I think it would be unfortunate if > this slipped to v19. I just acquired an M4 Pro, which may also be too small to show any effect, but perhaps running the test there would at least give us more confidence that there's not a bad effect. Which test case(s) would you recommend trying? regards, tom lane
Re: AIO v2.0
Hi,

On 2025-01-08 15:04:39 +0100, Jakub Wartak wrote:
> On Mon, Jan 6, 2025 at 5:28 PM Andres Freund wrote:
> > I didn't think that pg_stat_* was quite the right namespace, given that
> > it shows not stats, but the currently ongoing IOs. I am going with
> > pg_aios for now, but I don't particularly like that.
>
> If you are looking for other proposals:
> * pg_aios_progress ? (to follow the pattern of pg_stat_copy|vacuum_progress?)
> * pg_debug_aios ?
> * pg_debug_io ?

I think pg_aios is better than those, if not by much. Seems others are ok
with that name too. And we can easily evolve it later.

> > I think we'll want a pg_stat_aio as well, tracking things like:
> >
> > - how often the queue to IO workers was full
> > - how many times we submitted IO to the kernel (<= #ios with io_uring)
> > - how many times we asked the kernel for events (<= #ios with io_uring)
> > - how many times we had to wait for in-flight IOs before issuing more IOs
>
> If I could dream of one thing, it would be the 99.9th percentile of IO
> response times in milliseconds for different classes of I/O traffic
> (read/write/flush). But it sounds like it would be very similar to
> pg_stat_io and potentially would have to be
> per-tablespace/IO-traffic(subject)-type too.

Yea, that's a significant project on its own. It's not that cheap to
compute reasonably accurate percentiles and we have no infrastructure for
doing so right now.

> AFAIU pg_stat_io has an improper structure to have that there.

Hm, not obvious to me why? It might make the view a bit wide to add it as
an additional column, but otherwise I don't see a problem?

> BTW: before trying to even start to compile that AIO v2.2* and responding
> to the previous review, what are you most interested in hearing about, so
> that the feedback adds some value?

Due to the rather limited "users" of AIO in the patchset, I think most
benchmarks aren't expected to show any meaningful gains. However, they
shouldn't show any significant regressions either (when not using direct
IO). I think trying to find regressions would be a rather valuable thing.

I'm tempted to collect a few of the reasonably-ready read stream
conversions into the patchset, to make the potential gains more visible.
But I am not sure it's a good investment of time right now.

One small regression I do know about, namely scans of large relations that
are bigger than shared buffers but do fit in the kernel page cache. The
increase of BAS_BULKREAD does cause a small slowdown - but without it we
can never do sufficient asynchronous IO. I think the slowdown is small
enough to just accept that, but it's worth qualifying that on a few
machines.

> Any workload-specific measurements? Just general feedback, functionality
> gaps?

To see the benefits it'd be interesting to compare:

1) sequential scan performance with data not in shared buffers, using
   buffered IO
2) same, but using direct IO when testing the patch
3) checkpoint performance

In my experiments 1) gains a decent amount of performance in many cases,
but nothing overwhelming - sequential scans are easy for the kernel to
read ahead.
I do see very significant gains for 2) - on a system with 10 striped NVMe
SSDs that each can do ~3.5 GB/s I measured very parallel sequential scans
(I had to use ALTER TABLE to get sufficient numbers of workers):

    master:                  ~18 GB/s
    patch, buffered:         ~20 GB/s
    patch, direct, worker:   ~28 GB/s
    patch, direct, uring:    ~35 GB/s

This was with io_workers=32, io_max_concurrency=128,
effective_io_concurrency=1000 (it doesn't need to be that high, but it's
what I still have the numbers for). This was without data checksums
enabled, as otherwise the checksum code becomes a *huge* bottleneck.

I also see significant gains with 3). Bigger when using direct IO. One
complicating factor in measuring 3) is that the first write to a block
will often be slower than subsequent writes, because the filesystem will
need to update some journaled metadata, presenting a bottleneck.
Checkpoint performance is also severely limited by data checksum
computation if enabled - independent of this patchset.

One annoying thing when testing DIO is that right now VACUUM will be
rather slow if the data isn't already in s_b, as it isn't yet
read-stream-ified.

> Integrity/data testing with stuff like dm-dust, dm-flakey, dm-delay
> to try the error handling routines?

Hm. I don't think that's going to work very well even on master. If the
filesystem fails there's not much that PG can do...

> Some kind of AIO <-> standby/recovery interactions?

I wouldn't expect anything there. I think Thomas somewhere has a patch
that read-stream-ifies recovery prefetching; once that's done it would be
more interesting.

> * - btw, Date: 2025-01-01 04:03:33 - I saw what you did there! So let's
> officially recognize 2025 as the year of AIO in PG, as it was the 1st
> message :D

Hah, that was actually the opposite of what I intended :). I'd hoped to
post earlier,
Re: improve DEBUG1 logging of parallel workers for CREATE INDEX?
Sami Imseih writes: > I am curious what are the thoughts on introducing a > CREATE INDEX VERBOSE which can provide this info? > similar to users scripting VACUUM VERBOSE to log > more details about the vacuum operation including parallel > usage. What I can recall being discussed in the past is to extend EXPLAIN and/or EXPLAIN ANALYZE to cover utility statements that have nontrivial execution complexity --- for example, ALTER TABLE has a lot of machinery underneath, and people often wish to know things like whether a particular ALTER will cause a table rewrite or not. Of course, a patch for that would be a few orders of magnitude larger than what you've got here :-(. But if you're looking for a framework for reporting these sorts of details, I'd much rather go in that direction than follow the model of VACUUM VERBOSE. VACUUM VERBOSE is a kluge with little to recommend it other than having been easy to implement. regards, tom lane
Re: Non-text mode for pg_dumpall
Hi,

Le mer. 8 janv. 2025 à 17:41, Mahendra Singh Thalor a écrit :
> On Wed, 8 Jan 2025 at 20:07, Mahendra Singh Thalor wrote:
> >
> > Hi all,
> >
> > On Wed, 8 Jan 2025 at 00:34, Mahendra Singh Thalor wrote:
> > >
> > > On Mon, 6 Jan 2025 at 23:05, Nathan Bossart wrote:
> > > >
> > > > On Thu, Jan 02, 2025 at 02:05:13AM +0530, Mahendra Singh Thalor wrote:
> > > > > Here, I am attaching an updated patch. I fixed some bugs of the v01 patch and also did some code cleanup.
> > > >
> > > > Thank you for picking this up! I started to review it, but the documentation changes didn't build, and a few tests in check-world are failing. Would you mind resolving those issues? Also, if you haven't already, please add an entry to the next commitfest [0] to ensure that 1) this feature is tracked and 2) the automated tests will run.
> > >
> > > Thanks Nathan for the quick response.
> > >
> > > I fixed the documentation and check-world failures in the latest patch. Now the docs build and check-world passes.
> > >
> > > I added an entry to the commitfest for this patch. [0]
> > >
> > > > + if (dbfile)
> > > > + {
> > > > + printfPQExpBuffer(&cmd, "\"%s\" %s %s", pg_dump_bin,
> > > > +   dbfile, create_opts);
> > > > + appendPQExpBufferStr(&cmd, " -F d ");
> > > > + }
> > > >
> > > > Have you given any thought to allowing a directory of custom format files, as discussed upthread [1]? Perhaps that is better handled as a follow-up patch, but it'd be good to understand the plan, anyway.
> > >
> > > I will make these changes and will test. I will update my findings after doing some testing.
> >
> > In the latest patch, I added dump and restore support for the directory/custom/tar/plain formats. Please consider this patch for review and testing.
> >
> > Design: When --format=d|c|t is given, we dump all global SQL commands into global.dat in plain SQL format, and we create a map.dat file with dbname and dboid. For each database, we create a separate subdirectory named after the dboid under the databases directory and dump it in the chosen archive format (d|c|t). While restoring, we first restore all global SQL commands from global.dat and then restore the databases one by one. As we support --exclude-database with pg_dumpall, we support the same with pg_restore, to skip restoring databases matching the specified patterns. If we want to restore a single database, we can specify the particular subdirectory from the databases folder. To get the file name, we look up the dbname in the map file.
> >
> > TODO: Now I will work on test cases for these newly added options to pg_dumpall and pg_restore.
> >
> > Here, I am attaching the v04 patch for testing and review.
>
> Sorry. My mistake. v04 was the delta patch on top of v03.
>
> Here, I am attaching the v05 patch for testing and review.

Just FWIW, I did a quick test tonight. It applies cleanly and compiles OK. I did a dump:

$ pg_dumpall -Fd -f dir

and then a restore (after dropping the databases I had):

$ pg_restore -Cd postgres -v dir

It worked really well. That's great. Quick thing to fix: you've got this error message:

pg_restore: error: -d/--dbanme should be given when using archive dump of pg_dumpall

I guess it is --dbname, rather than --dbanme. Of course, it needs much more testing, but this feature would be great to have. Thanks for working on this!

-- Guillaume.
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On Mon, Jan 6, 2025 at 1:07 PM Jim Nasby wrote:
>
> I’ve been saying “workload management” for lack of a better term, but my initial suggestion upthread was to simply stop allowing new transactions to start if global work_mem consumption exceeded some threshold. That’s simplistic enough that I wouldn’t really consider it “workload management”. Maybe “deferred execution” would be a better name. The only other thing it’d need is a timeout on how long a new transaction could sit in limbo.

Yes, this seems like a good thing to do, but we need to handle "work_mem", by itself, first. The problem is that it's just too easy for a query to blow up work_mem consumption, almost instantaneously. By the time we notice that we're low on working memory, pausing new transactions may not be sufficient. We could already be in the middle of a giant Hash Join, for example, and the hash table is just going to continue to grow... Before we can solve the problem you describe, we need to be able to limit the work_mem consumption of an in-progress query. James
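To make the gap concrete: any limit would have to reach into allocation-heavy execution paths mid-query, not just gate transaction starts. A purely hypothetical sketch of that shape follows: GlobalMemBudget and budget_try_reserve() do not exist in PostgreSQL; the pg_atomic_* calls are the real atomics API, and ExecHashIncreaseNumBatches() is an existing (file-local) routine in nodeHash.c, cited only to show where a spill decision could hook in.

#include "postgres.h"
#include "port/atomics.h"

/* Hypothetical: a budget shared by all backends, kept in shared memory. */
typedef struct GlobalMemBudget
{
    pg_atomic_uint64 reserved_bytes;    /* currently reserved, all backends */
    uint64      limit_bytes;            /* global cap */
} GlobalMemBudget;

/*
 * Try to reserve nbytes against the global budget; on failure the caller
 * is expected to spill to disk rather than keep growing in memory.
 */
static bool
budget_try_reserve(GlobalMemBudget *budget, uint64 nbytes)
{
    uint64      old = pg_atomic_fetch_add_u64(&budget->reserved_bytes,
                                              (int64) nbytes);

    if (old + nbytes > budget->limit_bytes)
    {
        /* over budget: back out the reservation */
        pg_atomic_fetch_sub_u64(&budget->reserved_bytes, (int64) nbytes);
        return false;
    }
    return true;
}

/*
 * A hash join growth path could then react mid-query, e.g.:
 *
 *    if (!budget_try_reserve(budget, chunk_size))
 *        ExecHashIncreaseNumBatches(hashtable);   // spill instead of grow
 */

Even then, reservations would have to be granular enough that a single failed reservation doesn't abort the query outright, which is the hard part James alludes to.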
Re: New GUC autovacuum_max_threshold ?
On Wed, Jan 8, 2025 at 3:01 PM Nathan Bossart wrote:
>
> On Wed, Jan 08, 2025 at 02:48:10PM +0100, Frédéric Yhuel wrote:
> > For what it's worth, although I would have preferred the sub-linear growth thing, I'd much rather have this than nothing.
>
> +1, this is how I feel, too. But I also don't want to add something that folks won't find useful.
>
> > And I have to admit that the proposed formulas were either too convoluted or wrong.
> >
> > This very patch is more straightforward. Please let me know if I can help and how.
>
> I read through the thread from the top, and it does seem like there is reasonably strong support for the hard cap. Upon a closer review of the patch, I noticed that the relopt was defined such that you couldn't disable autovacuum_max_threshold on a per-table basis, so I fixed that in v4.

To be frank, this patch feels like a solution in search of a problem, and as I read back through the thread, it isn't clear what problem this is intended to fix. There is some talk of "simplifying" autovacuum configuration, but some noted that we already have a rather complex set of GUCs to deal with, and adding another one, along with more math, into the equation doesn't seem simpler to me. I'd like to think the bar should be that the problem is clear. So what is the problem?

Is the patch supposed to help with wraparound prevention? autovacuum_freeze_max_age already covers that, and when it doesn't, vacuum_failsafe_age helps out.

A couple of people mentioned issues around hitting the index wall when vacuuming large tables, but we believe that problem is mostly resolved due to radix-based TID storage, so this doesn't solve that. (To the degree you think v17 hasn't baked into enough production workloads to be sure, I'd agree, but that's also an argument against doing more work that might not be needed.)

Maybe the hope is that this setting will cause vacuum to run more often, to help ameliorate the I/O work from freeze vacuums kicking in, but I suspect that Melanie's nearby work on eager vacuuming is a smarter solution to that problem (warning: it also may want to add more GUCs), so I think we're not solving that, and in fact might be undercutting it.

I guess that means this is supposed to help with bloat management? But only on large tables? I guess because you run vacuums more often? Except that the adage of running vacuums more often doesn't apply as cleanly to large tables, because those tables typically come with large indexes, and while we have a lot of machinery in place to help with repeated scans of the heap, that same machinery doesn't exist for scanning the indexes, which gives you sort of an exponential curve in vacuum times as table size (but actually index size) grows larger. On the upside, this does mean we're less likely to see the 50x boost in vacuums on large tables that some seemed concerned about, but on the downside that's because we're probably going to increase the probability of vacuum-worker starvation.

But getting back to goals: if your goal is to help with bloat management, trying to tie that to a number that doesn't cleanly map to the meta-information of the table in question is a poor way to do it. Meaning, to the degree that you are skeptical that vacuuming based on 20% of the rows of a table might not really be 20% of the size of the table, it's certainly going to be a closer map than 100 million rows across tables with unknown (but presumably greater than 500 million?) row counts and unknown sizes. (The threshold math under discussion is sketched at the end of this message.)

And again, we have a means to tackle these bloat cases already: lowering vacuum_scale_factor. This isn't to say the system is perfect; I do think there are some fundamental issues that need addressing, but adding this GUC just feels a little less baked than usual.

Robert Treat
https://xzilla.net
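For reference, a sketch of the trigger math being debated, modeled on the shape of relation_needs_vacanalyze() in autovacuum.c (float4 as used there); the cap parameter reflects the proposal, not committed behavior:

static float4
vacuum_threshold(float4 reltuples,
                 int vac_base_thresh,       /* autovacuum_vacuum_threshold */
                 float4 vac_scale_factor,   /* autovacuum_vacuum_scale_factor */
                 int64 vac_max_thresh)      /* proposed cap; -1 disables it */
{
    /* today's rule: vacuum once dead tuples exceed base + factor * size */
    float4      vacthresh = (float4) vac_base_thresh +
        vac_scale_factor * reltuples;

    /* proposed: clamp, so huge tables don't wait for 20% dead rows */
    if (vac_max_thresh >= 0 && vacthresh > (float4) vac_max_thresh)
        vacthresh = (float4) vac_max_thresh;

    return vacthresh;
}

With default settings (scale factor 0.2), a 1-billion-row table is vacuumed only after roughly 200 million dead rows; a cap of, say, 100 million would halve that wait, which is the entire effect being argued over above.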
Re: Conflict detection for update_deleted in logical replication
On Wed, Jan 8, 2025 at 3:00 AM Zhijie Hou (Fujitsu) wrote:
>
> On Wednesday, January 8, 2025 6:33 PM Masahiko Sawada wrote:
>
> Hi,
>
> > On Wed, Jan 8, 2025 at 1:53 AM Amit Kapila wrote:
> > > On Wed, Jan 8, 2025 at 3:02 PM Masahiko Sawada wrote:
> > > >
> > > > On Thu, Dec 19, 2024 at 11:11 PM Nisha Moond wrote:
> > > > > > > > > [3] Test with pgbench run on both publisher and subscriber.
> > > > > > > > >
> > > > > > > > > Test setup:
> > > > > > > > > - Tests performed on pgHead + v16 patches
> > > > > > > > > - Created a pub-sub replication system.
> > > > > > > > > - Parameters for both instances were:
> > > > > > > > >   shared_buffers = 30GB
> > > > > > > > >   min_wal_size = 10GB
> > > > > > > > >   max_wal_size = 20GB
> > > > > > > > >   autovacuum = false
> > > >
> > > > Since you disabled autovacuum on the subscriber, dead tuples created by non-hot updates are accumulated anyway regardless of the detect_update_deleted setting, is that right?
> > >
> > > I think the hot-pruning mechanism during the update operation will remove dead tuples even when autovacuum is disabled.
> >
> > True, but why did it disable autovacuum? It seems that case1-2_setup.sh doesn't specify fillfactor, which makes hot-updates less likely to happen.
>
> IIUC, we disable autovacuum as a general practice in read-write tests for stable TPS numbers.

Okay. TBH I'm not sure what we can say with these results. At a glance, in a typical bi-directional-like setup, we can interpret these results as meaning that if users turn retain_conflict_info on, TPS drops by 50%. But I'm not sure this 50% dip is the worst case users could possibly face. It could be better in practice thanks to autovacuum, or it could get even worse due to further bloat if we run the test longer.

Suppose that users had a 50% performance dip due to dead tuple retention for update_deleted detection, is there any way for users to improve the situation? For example, trying to advance slot.xmin more frequently might help to reduce dead tuple accumulation. I think it would be good if we could have a way to balance the publisher's performance against the subscriber's.

In test case 3, we observed a -53% performance dip, which is worse than the results of test case 5 with wal_receiver_status_interval = 100s. Given that in test case 5 with wal_receiver_status_interval = 100s we cannot remove dead tuples for most of the whole 120s test time, we probably could not remove dead tuples for a long time in test case 3 either. I expected that the apply worker gets remote transaction XIDs and tries to advance slot.xmin more frequently, so this performance dip surprised me. I would like to know how many times the apply worker gets remote transaction XIDs and succeeds in advancing slot.xmin during the test.

> > I understand that a certain performance dip happens due to dead tuple retention, which is fine, but I'm surprised that the TPS decreased by 50% within 120 seconds. Does the TPS get even worse for a longer test?
>
> We will try to increase the time and run the test again.
>
> > I did a quick benchmark where I completely disabled removing dead tuples (by autovacuum=off and a logical slot) and ran pgbench, but I didn't see such a precipitous dip.
>
> I think a logical slot only retains the dead tuples of the system catalogs, so the TPS on user tables would not be affected that much.

You're right, I missed it.
Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Moving the vacuum GUCs' docs out of the Client Connection Defaults section
On Wed, Jan 8, 2025 at 8:39 AM Peter Eisentraut wrote: > > On 07.01.25 18:31, Melanie Plageman wrote: > > > > Oh, one thing I forgot to say. Though I increased the indentation of > > some of the subsections that I moved, I didn't rewrap the lines > > because they were already not wrapped to 78. I can do this, but I > > can't tell from the existing docs what text width the paragraphs of > > text are supposed to be wrapped to. > > For a patch that moves things around like this, I would leave everything > as is and not rewrap. That makes it easier to follow what's going on > with "git diff -w", "git show -w" etc. > > In .dir-locals.el, there is this configuration for Emacs: > > (nxml-mode . ((fill-column . 78) > (indent-tabs-mode . nil))) > > So that's one data point about what the line length should be. Well, in this case, the diff won't look different with git show/diff -w because moving them to another part of the file is more than a whitespace change. For clarity, I added a note to the commit message that the actual GUCs' docs are just lifted and shifted. I decided in the end not to wrap it anyway, though. I tried it and didn't like the result. If I wrap it to 78, that actually makes a good number of the GUCs' docs wider (i.e. they were wrapped to less than 78). Which means future additions are more likely to need to wrap (as Tom mentioned upthread). Attached is v2 (required a rebase). - Melanie v2-0001-Consolidate-docs-for-vacuum-related-GUCs-in-new-s.patch
Re: Fix a wrong errmsg in AlterRole()
Hi, I modified the patch according to Tom's suggestion. -- Regards, ChangAo Chen v2-0001-Fix-a-wrong-errdetail-in-AlterRole.patch
Re: pgindent exit status if a file encounters an error
On Wed, Jan 8, 2025 at 9:35 PM Andrew Dunstan wrote: > > > > I forget now what I was intending here, so I have committed this with minor > tweaks. > > Thanks Andrew for taking care of this. -- Best Wishes, Ashutosh Bapat
Re: Make pg_stat_io view count IOs as bytes instead of blocks
Hi,

Thanks for the review!

On Thu, 9 Jan 2025 at 05:59, Michael Paquier wrote:
>
> On Thu, Dec 26, 2024 at 02:41:26PM +0300, Nazir Bilal Yavuz wrote:
> > Thanks! v4 is attached. I quickly tested the pg_stat_get_backend_io() function and it seems it is working.
>
> Thanks a lot for the rebased version. This looks pretty solid. Here are some comments.
>
>  void
> -pgstat_count_io_op(IOObject io_object, IOContext io_context, IOOp io_op)
> +pgstat_count_io_op(IOObject io_object, IOContext io_context, IOOp io_op, uint64 bytes)
>  {
> -pgstat_count_io_op_n(io_object, io_context, io_op, 1);
> +pgstat_count_io_op_n(io_object, io_context, io_op, 1, bytes);
>
> pgstat_count_io_op_n() is only used locally in pgstat_io.c. I'm OK to keep it as it is used with the time calculations, but wouldn't it be better to make it static in pgstat_io.c instead and not declare it in pgstat.h? Do we really have a need for pgstat_count_io_op() at all in the end, or would it be better to change it so that it can handle a number of operations given by the caller?

I am a bit confused; are you suggesting one of these two alternatives (both sketched after this message)?
1- Making pgstat_count_io_op_n() static and continuing to use pgstat_count_io_op() as it is.
2- Removing pgstat_count_io_op() and instead using pgstat_count_io_op_n() everywhere.

> typedef struct PgStat_BktypeIO
> {
> +uint64 bytes[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
>
> This wastes a bit of memory, while keeping the code simpler to understand. That's better than more element number manipulations, so I'm OK with what you have here.
>
> +static inline bool
> +is_ioop_tracked_in_bytes(IOOp io_op)
> +{
> +Assert((unsigned int) io_op < IOOP_NUM_TYPES);
> +return io_op >= IOOP_EXTEND;
> +}
>
> This is only used in an assertion of pgstat_count_io_op_n() in pgstat_io.c. Let's also keep this routine local to the file. The assert to make sure that the callers don't assign bytes to the operations that don't support the counters is a good idea.

Makes sense, done.

> + * If an IOOp does not tracked in bytes, IO_COL_INVALID is returned.
>
> s/does not tracked/is not tracked/.

Done.

> +/*
> + * Get the number of the column containing IO bytes for the specified IOOp.
> + * If an IOOp does not tracked in bytes, IO_COL_INVALID is returned.
> + */
> +static io_stat_col
> +pgstat_get_io_byte_index(IOOp io_op)
> +{
>
> Makes sense to me, and that's consistent with how the time attributes are handled.
>
> -/*
> - * Hard-code this to the value of BLCKSZ for now. Future values
> - * could include XLOG_BLCKSZ, once WAL IO is tracked, and constant
> - * multipliers, once non-block-oriented IO (e.g. temporary file
> - * IO) is tracked.
> - */
> -values[IO_COL_CONVERSION] = Int64GetDatum(BLCKSZ);
>
> Glad to see that gone.
>
> +write_bytes bigint
> +extend_bytes bigint
> +read_bytes bigint
>
> These additions in the documentation are incorrect. All these attributes are of type numeric, not bigint.

Done.

> +Number of units of size BLCKSZ which the process
>
> It seems to me that this had better mention 8192 as the default.

I agree, done.

v5 is attached.

--
Regards,
Nazir Bilal Yavuz
Microsoft

From b8123b9447117bdf4c9f34ddf57fd7acbd61745d Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz
Date: Thu, 9 Jan 2025 09:39:05 +0300
Subject: [PATCH v5] Make pg_stat_io count IOs as bytes instead of blocks

Currently in the pg_stat_io view, IOs are counted as blocks.
There are two problems with this approach:

1- The actual number of I/O requests sent to the kernel is lower, because I/O requests may be merged before being sent. Additionally, it gives the impression that all I/Os are done in block size, which shadows the benefits of merging I/O requests.

2- In the future, there may be some IOs that are not done in block size. For example, WAL read IOs are done in variable bytes, and it is not possible to show these IOs correctly in the pg_stat_io view.

Because of these problems, now show the total number of IO requests to the kernel (as smgr function calls) and the total number of bytes in the IO. Also, the op_bytes column is removed from the pg_stat_io view.
---
 src/include/catalog/pg_proc.dat             | 12 +--
 src/include/pgstat.h                        | 31 ++--
 src/backend/catalog/system_views.sql        |  4 +-
 src/backend/storage/buffer/bufmgr.c         | 14 ++--
 src/backend/storage/buffer/localbuf.c       |  6 +-
 src/backend/storage/smgr/md.c               |  4 +-
 src/backend/utils/activity/pgstat_backend.c |  2 +
 src/backend/utils/activity/pgstat_io.c      | 23 --
 src/backend/utils/adt/pgstatfuncs.c         | 86 -
 src/test/regress/expected/rules.out         |  6 +-
 doc/src/sgml/monitoring.sgml                | 63 +--
 11 files changed, 172 insertions(+), 79 deletions(-)
diff --
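For readers following the exchange above, the two alternatives look roughly like this; the signatures are adapted from the quoted patch and should be read as a sketch, not the final API:

/* Option 1: keep the single-op wrapper public, hide the _n variant. */

/* pgstat.h */
extern void pgstat_count_io_op(IOObject io_object, IOContext io_context,
                               IOOp io_op, uint64 bytes);

/* pgstat_io.c */
static void pgstat_count_io_op_n(IOObject io_object, IOContext io_context,
                                 IOOp io_op, uint32 cnt, uint64 bytes);

/* Option 2: drop the wrapper; callers always pass an operation count. */
extern void pgstat_count_io_op_n(IOObject io_object, IOContext io_context,
                                 IOOp io_op, uint32 cnt, uint64 bytes);

/* under option 2, a single block-sized read would be counted as, e.g.: */
/* pgstat_count_io_op_n(IOOBJECT_RELATION, io_context, IOOP_READ, 1, BLCKSZ); */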
Re: per backend WAL statistics
Hi, On Thu, Jan 09, 2025 at 01:03:15PM +0900, Michael Paquier wrote: > On Wed, Jan 08, 2025 at 11:11:59AM +, Bertrand Drouvot wrote: > > Yeah, that's more elegant as it also means that the main callback will not > > change > > (should we add even more stats in the future). Done that way in v2 attached. > > I've put my hands on v2-0002 to begin with something. > > +/* flag bits for different types of statistics to flush */ > +#define PGSTAT_FLUSH_IO(1 << 0) /* Flush I/O statistics */ > +#define PGSTAT_FLUSH_ALL (PGSTAT_FLUSH_IO) > > These are located and used only in pgstat_backend.c. It seems to me > that we'd better declare them in pgstat_internal.h and extend the > existing pgstat_flush_backend() with an argument so as callers can do > what they want. > > + /* Get our own entry_ref if not provided */ > + if (!entry_ref) > + entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_BACKEND, > InvalidOid, > + > MyProcNumber, false, NULL); > > This relates to the previous remark, actually, where I think that it > is cleaner to have pgstat_flush_backend() do pgstat_get_entry_ref(), > same way as HEAD, and just pass down the flags. I see, so you keep pgstat_flush_backend() calls (with an extra arg) and remove the new "pgstat_backend_flush_io()" function. > This comes at the cost of pgstat_flush_backend_entry() > requiring an extra pgstat_tracks_backend_bktype(), which is not a big > issue, and the patch gets a bit shorter. Yeah, all of the above is fine by me. PFA v3 which is v2 refactoring with your proposed above changes. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From bcdf4eb91f7ff4beeba50433a173ac2650afe432 Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Mon, 6 Jan 2025 07:51:27 + Subject: [PATCH v3 1/3] Extract logic filling pg_stat_get_wal()'s tuple into its own routine This commit adds pg_stat_wal_build_tuple(), a helper routine for pg_stat_get_wal(), that fills its tuple based on the contents of PgStat_WalStats. This will be used in a follow-up commit that uses the same structures as pg_stat_wal for reporting, but for the PGSTAT_KIND_BACKEND statistics kind. --- src/backend/utils/adt/pgstatfuncs.c | 56 ++--- 1 file changed, 36 insertions(+), 20 deletions(-) 100.0% src/backend/utils/adt/ diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c index 3245f3a8d8..7309f06993 100644 --- a/src/backend/utils/adt/pgstatfuncs.c +++ b/src/backend/utils/adt/pgstatfuncs.c @@ -1560,20 +1560,22 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS) } /* - * Returns statistics of WAL activity + * pg_stat_wal_build_tuple + * + * Helper routine for pg_stat_get_wal() returning one tuple based on the contents + * of wal_stats. 
*/ -Datum -pg_stat_get_wal(PG_FUNCTION_ARGS) +static Datum +pg_stat_wal_build_tuple(PgStat_WalStats wal_stats) { -#define PG_STAT_GET_WAL_COLS 9 +#define PG_STAT_WAL_COLS 9 TupleDesc tupdesc; - Datum values[PG_STAT_GET_WAL_COLS] = {0}; - bool nulls[PG_STAT_GET_WAL_COLS] = {0}; + Datum values[PG_STAT_WAL_COLS] = {0}; + bool nulls[PG_STAT_WAL_COLS] = {0}; char buf[256]; - PgStat_WalStats *wal_stats; /* Initialise attributes information in the tuple descriptor */ - tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_WAL_COLS); + tupdesc = CreateTemplateTupleDesc(PG_STAT_WAL_COLS); TupleDescInitEntry(tupdesc, (AttrNumber) 1, "wal_records", INT8OID, -1, 0); TupleDescInitEntry(tupdesc, (AttrNumber) 2, "wal_fpi", @@ -1595,34 +1597,48 @@ pg_stat_get_wal(PG_FUNCTION_ARGS) BlessTupleDesc(tupdesc); - /* Get statistics about WAL activity */ - wal_stats = pgstat_fetch_stat_wal(); - /* Fill values and NULLs */ - values[0] = Int64GetDatum(wal_stats->wal_records); - values[1] = Int64GetDatum(wal_stats->wal_fpi); + values[0] = Int64GetDatum(wal_stats.wal_records); + values[1] = Int64GetDatum(wal_stats.wal_fpi); /* Convert to numeric. */ - snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats->wal_bytes); + snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats.wal_bytes); values[2] = DirectFunctionCall3(numeric_in, CStringGetDatum(buf), ObjectIdGetDatum(0), Int32GetDatum(-1)); - values[3] = Int64GetDatum(wal_stats->wal_buffers_full); - values[4] = Int64GetDatum(wal_stats->wal_write); - values[5] = Int64GetDatum(wal_stats->wal_sync); + values[3] = Int64GetDatum(wal_stats.wal_buffers_full); + values[4] = Int64GetDatum(wal_stats.wal_write); + values[5] = Int64GetDatum(wal_stats.wal_sync); /* Convert counters from microsec to millisec for display */ - values[6] = Float8GetDatum(((double) wal_stats->wal_write_time) / 1000.0); - values[7] = Float8GetDatum(((double) wal_stats->wal_sync_time) / 1000.0); + values[6] = Float8GetDatum(((double) wal_stats.wal_write_time) / 1000.0); + values[7] = Float8Ge
RE: Log a warning in pg_createsubscriber for max_slot_wal_keep_size
Dear Shubham,

Thanks for updating the patch. A few comments.

```
+#include "common/int.h"    /* include for strtoi64 */
```

I could build the source code without this inclusion. Can you remove it?

```
+    max_slot_wal_keep_size = strtoi64(PQgetvalue(res, 0, 6), NULL, 0);
+
+    /* use strtoi64 to convert the string result to a 64-bit integer */
+    if (max_slot_wal_keep_size == 0 && errno != 0)
+    {
+        /* handle conversion error */
+        pg_log_error("Failed to convert max_slot_wal_keep_size to int64");
+    }
```

I'm not sure the error handling is really needed. pg_dump also uses this function and it does not have such handling.

```
+    pg_log_debug("publisher: max_slot_wal_keep_size: %ld",
+                 max_slot_wal_keep_size);
```

IIUC, "%ld" does not always represent the int64 format. Since it is a debug message, isn't it enough to use the INT64_FORMAT macro?

```
+# Configure 'max_slot_wal_keep_size = 1' on the publisher and
+# reload configuration
+$node_p->append_conf('postgresql.conf', 'max_slot_wal_keep_size = 1');
+$node_p->reload;
```

Can you use 'max_slot_wal_keep_size = 10MB' or something, to test that the handling is correct?

Best regards,
Hayato Kuroda
FUJITSU LIMITED
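Putting those review points together, the fetch could look roughly like this. This is a sketch only, assuming an established PGconn *conn and initialized frontend logging; error handling is abbreviated:

PGresult   *res;
int64       max_slot_wal_keep_size;

/* let the server do the unit conversion, so '10MB' and '-1' both work */
res = PQexec(conn,
             "SELECT pg_catalog.pg_size_bytes("
             "pg_catalog.current_setting('max_slot_wal_keep_size'))");
if (PQresultStatus(res) != PGRES_TUPLES_OK)
    pg_fatal("could not obtain max_slot_wal_keep_size: %s",
             PQresultErrorMessage(res));

max_slot_wal_keep_size = strtoi64(PQgetvalue(res, 0, 0), NULL, 10);
PQclear(res);

/* INT64_FORMAT expands to the right conversion spec on every platform */
pg_log_debug("publisher: max_slot_wal_keep_size: " INT64_FORMAT,
             max_slot_wal_keep_size);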
Re: Several buildfarm animals fail tests because of shared memory error
On Thu, 9 Jan 2025 at 15:30, Alexander Lakhin wrote:
>
> Maybe you could try to reproduce such failures without the buildfarm client, just by running select_parallel, for example, with the attached patch applied. I mean running `make check` with a parallel_schedule like:
> ...
> Or
> TESTS="test_setup copy create_misc create_index $(printf "select_parallel %.0s" {1..100})" make check-tests

Thanks Alexander for pointing to the test steps. I'll try to run these on leafhopper over the next couple of days and come back if I see anything interesting.

- robins
Re: per backend WAL statistics
On Wed, Jan 08, 2025 at 11:11:59AM +, Bertrand Drouvot wrote: > Yeah, that's more elegant as it also means that the main callback will not > change > (should we add even more stats in the future). Done that way in v2 attached. I've put my hands on v2-0002 to begin with something. +/* flag bits for different types of statistics to flush */ +#define PGSTAT_FLUSH_IO(1 << 0) /* Flush I/O statistics */ +#define PGSTAT_FLUSH_ALL (PGSTAT_FLUSH_IO) These are located and used only in pgstat_backend.c. It seems to me that we'd better declare them in pgstat_internal.h and extend the existing pgstat_flush_backend() with an argument so as callers can do what they want. + /* Get our own entry_ref if not provided */ + if (!entry_ref) + entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_BACKEND, InvalidOid, + MyProcNumber, false, NULL); This relates to the previous remark, actually, where I think that it is cleaner to have pgstat_flush_backend() do pgstat_get_entry_ref(), same way as HEAD, and just pass down the flags. pgstat_flush_backend cannot call directly pgstat_backend_flush_cb(), of course, so I've settled down to a new pgstat_flush_backend_entry() that handles the entry locking. This comes at the cost of pgstat_flush_backend_entry() requiring an extra pgstat_tracks_backend_bktype(), which is not a big issue, and the patch gets a bit shorter. -- Michael From eba357e4e2712cd73fdad585f6d8088de0dbaccc Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Mon, 6 Jan 2025 08:44:29 + Subject: [PATCH v3] PGSTAT_KIND_BACKEND code refactoring This commit refactors some come related to per backend statistics. It makes the code more generic or more IO statistics focused as it will be used in a follow-up commit that will introduce per backend WAL statistics. --- src/include/pgstat.h | 6 +- src/include/utils/pgstat_internal.h | 7 +- src/backend/utils/activity/pgstat.c | 2 +- src/backend/utils/activity/pgstat_backend.c | 69 ++-- src/backend/utils/activity/pgstat_io.c | 8 +-- src/backend/utils/activity/pgstat_relation.c | 4 +- src/backend/utils/adt/pgstatfuncs.c | 2 +- src/tools/pgindent/typedefs.list | 1 + 8 files changed, 68 insertions(+), 31 deletions(-) diff --git a/src/include/pgstat.h b/src/include/pgstat.h index 0d8427f27d1..6631bd2d730 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -381,7 +381,7 @@ typedef PgStat_PendingIO PgStat_BackendPendingIO; typedef struct PgStat_Backend { TimestampTz stat_reset_timestamp; - PgStat_BktypeIO stats; + PgStat_BktypeIO io_stats; } PgStat_Backend; typedef struct PgStat_StatDBEntry @@ -523,6 +523,10 @@ typedef struct PgStat_PendingWalStats instr_time wal_sync_time; } PgStat_PendingWalStats; +typedef struct PgStat_BackendPending +{ + PgStat_BackendPendingIO pending_io; +} PgStat_BackendPending; /* * Functions in pgstat.c diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h index 52eb008710f..4bb8e5c53ab 100644 --- a/src/include/utils/pgstat_internal.h +++ b/src/include/utils/pgstat_internal.h @@ -613,9 +613,12 @@ extern void pgstat_archiver_snapshot_cb(void); * Functions in pgstat_backend.c */ -extern void pgstat_flush_backend(bool nowait); +/* flags for pgstat_flush_backend() */ +#define PGSTAT_BACKEND_FLUSH_IO (1 << 0) /* Flush I/O statistics */ +#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO) -extern PgStat_BackendPendingIO *pgstat_prep_backend_pending(ProcNumber procnum); +extern void pgstat_flush_backend(bool nowait, bits32 flags); +extern PgStat_BackendPending *pgstat_prep_backend_pending(ProcNumber procnum); 
extern bool pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait); extern void pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts); diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c index 16a03b8ce15..34520535d54 100644 --- a/src/backend/utils/activity/pgstat.c +++ b/src/backend/utils/activity/pgstat.c @@ -370,7 +370,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE] .shared_size = sizeof(PgStatShared_Backend), .shared_data_off = offsetof(PgStatShared_Backend, stats), .shared_data_len = sizeof(((PgStatShared_Backend *) 0)->stats), - .pending_size = sizeof(PgStat_BackendPendingIO), + .pending_size = sizeof(PgStat_BackendPending), .flush_pending_cb = pgstat_backend_flush_cb, .reset_timestamp_cb = pgstat_backend_reset_timestamp_cb, diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c index 1f91bfef0a3..3972bcf9456 100644 --- a/src/backend/utils/activity/pgstat_backend.c +++ b/src/backend/utils/activity/pgstat_backend.c @@ -39,23 +39,21 @@ pgstat_fetch_stat_backend(ProcNumber procNumber) }
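To see how the flag-based flush API above is meant to be used from a caller's perspective, a short sketch follows; the IO flag and the pgstat_flush_backend() signature are from the patch, while the WAL flag is hypothetical, anticipating the per-backend WAL statistics follow-up:

/* from the patch: extern void pgstat_flush_backend(bool nowait, bits32 flags); */

#define PGSTAT_BACKEND_FLUSH_IO   (1 << 0)    /* as defined in the patch */
#define PGSTAT_BACKEND_FLUSH_WAL  (1 << 1)    /* hypothetical follow-up */
#define PGSTAT_BACKEND_FLUSH_ALL  (PGSTAT_BACKEND_FLUSH_IO | PGSTAT_BACKEND_FLUSH_WAL)

/* flush only pending I/O stats, giving up rather than waiting on the lock */
pgstat_flush_backend(true, PGSTAT_BACKEND_FLUSH_IO);

/* flush everything the backend tracks, waiting if necessary */
pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_ALL);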
Re: WAL-logging facility for pgstats kinds
On Thu, Jan 02, 2025 at 08:08:42PM -0500, Andres Freund wrote:
> I can't think of a real case where we would want to WAL log the stats themselves, rather than re-emitting stats during replay based on the WAL record of the "underlying object".
>
> Do you have counter-examples?

I'm not sure that a rebuild based on the WAL records is simpler than logging a snapshot of the stats that the startup process could digest. Anyway, I've also wanted to be able to replicate stats for historical tracking of stats logged through hooks, and a second case is injection point stats. All of these would require registering a custom rmgr on top of a custom stats kind, and centralizing the logic greatly eases some of the sanity checks they'd require, as the redo callback of a stats kind can just be attached to its PgStat_KindInfo.

I know that Lucas has been playing a bit with this area, and perhaps he has more cases in mind where the replication of stats data could be relevant. So I am adding him in CC.

Perhaps I could be wrong, of course.
--
Michael
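To illustrate the shape of that idea: PgStat_KindInfo is a real structure, but everything below - the record layout, the callback type, and the redo_cb member - is invented purely to show what "attaching a redo callback to a stats kind" could look like:

/* a minimal WAL record carrying a serialized stats snapshot (hypothetical) */
typedef struct xl_pgstat_snapshot
{
    Oid         dboid;          /* stats entry key: database */
    uint64      objid;          /* stats entry key: object */
    /* serialized per-kind payload follows */
} xl_pgstat_snapshot;

/* hypothetical per-kind replay hook */
typedef void (*pgstat_redo_cb) (const xl_pgstat_snapshot *rec,
                                const char *payload, Size len);

/*
 * A custom stats kind (say, injection point stats) would then opt in
 * through its PgStat_KindInfo, e.g. with a hypothetical field:
 *
 *    .redo_cb = injection_stats_redo,
 */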
Re: Proposal: add new API to stringinfo
> If it's not too much work to coax the compiler to do so, then I don't see a > strong reason to avoid it. Granted, there are probably much more expensive > parts of the StringInfo support functions (e.g., palloc()), but these are > hot enough code paths that IMHO it's worth saving whatever cycles we can. Ok, I have created v3 patch to do more inlining as you suggested. With the patch I confirmed that there's no call to functions except palloc in makeStringInfo, makeStringInfoExt, initStringInfo and initStringInfoExt in the asm codes (see attached stringinfo.s). Best reagards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp v3-0001-Add-new-StringInfo-APIs-to-allow-callers-to-speci.patch Description: Binary data .file "stringinfo.c" .text .p2align 4 .globl makeStringInfo .type makeStringInfo, @function makeStringInfo: .LFB100: .cfi_startproc endbr64 pushq %r12 .cfi_def_cfa_offset 16 .cfi_offset 12, -16 movl$24, %edi callpalloc@PLT movl$1024, %edi movq%rax, %r12 callpalloc@PLT movl$1024, 12(%r12) movq%rax, (%r12) movb$0, (%rax) movq%r12, %rax movl$0, 8(%r12) movl$0, 16(%r12) popq%r12 .cfi_def_cfa_offset 8 ret .cfi_endproc .LFE100: .size makeStringInfo, .-makeStringInfo .section.rodata.str1.1,"aMS",@progbits,1 .LC0: .string "src/common/stringinfo.c" .LC1: .string "initsize > 0" .text .p2align 4 .globl makeStringInfoExt .type makeStringInfoExt, @function makeStringInfoExt: .LFB101: .cfi_startproc endbr64 pushq %r12 .cfi_def_cfa_offset 16 .cfi_offset 12, -16 pushq %rbx .cfi_def_cfa_offset 24 .cfi_offset 3, -24 movl%edi, %ebx movl$24, %edi subq$8, %rsp .cfi_def_cfa_offset 32 callpalloc@PLT testl %ebx, %ebx jle .L7 movslq %ebx, %rdi movq%rax, %r12 callpalloc@PLT movl%ebx, 12(%r12) movq%rax, (%r12) movb$0, (%rax) movq%r12, %rax movl$0, 8(%r12) movl$0, 16(%r12) addq$8, %rsp .cfi_remember_state .cfi_def_cfa_offset 24 popq%rbx .cfi_def_cfa_offset 16 popq%r12 .cfi_def_cfa_offset 8 ret .L7: .cfi_restore_state movl$43, %edx leaq.LC0(%rip), %rsi leaq.LC1(%rip), %rdi callExceptionalCondition@PLT .cfi_endproc .LFE101: .size makeStringInfoExt, .-makeStringInfoExt .p2align 4 .globl initStringInfo .type initStringInfo, @function initStringInfo: .LFB102: .cfi_startproc endbr64 pushq %rbx .cfi_def_cfa_offset 16 .cfi_offset 3, -16 movq%rdi, %rbx movl$1024, %edi callpalloc@PLT movl$1024, 12(%rbx) movq%rax, (%rbx) movb$0, (%rax) movl$0, 8(%rbx) movl$0, 16(%rbx) popq%rbx .cfi_def_cfa_offset 8 ret .cfi_endproc .LFE102: .size initStringInfo, .-initStringInfo .p2align 4 .globl initStringInfoExt .type initStringInfoExt, @function initStringInfoExt: .LFB103: .cfi_startproc endbr64 pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset 6, -16 pushq %rbx .cfi_def_cfa_offset 24 .cfi_offset 3, -24 subq$8, %rsp .cfi_def_cfa_offset 32 testl %esi, %esi jle .L13 movq%rdi, %rbx movslq %esi, %rdi movl%esi, %ebp callpalloc@PLT movl%ebp, 12(%rbx) movq%rax, (%rbx) movb$0, (%rax) movl$0, 8(%rbx) movl$0, 16(%rbx) addq$8, %rsp .cfi_remember_state .cfi_def_cfa_offset 24 popq%rbx .cfi_def_cfa_offset 16 popq%rbp .cfi_def_cfa_offset 8 ret .L13: .cfi_restore_state movl$43, %edx leaq.LC0(%rip), %rsi leaq.LC1(%rip), %rdi callExceptionalCondition@PLT .cfi_endproc .LFE103: .size initStringInfoExt, .-initStringInfoExt .section.rodata.str1.1 .LC2: .string "str->maxlen != 0" .text .p2align 4 .globl resetStringInfo .type resetStringInfo, @function resetStringInfo: .LFB104: .cfi_startproc endbr64 movl12(%rdi), %edx testl %edx, %edx je .L19 movq(%rdi), %rax movb$0, (%rax)
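As a usage sketch of the API proposed in this thread - the names and signatures are taken from the v3 patch and the attached assembly, so exact semantics may differ from what is eventually committed:

StringInfoData small;
StringInfoData big;

/* proposed: caller-chosen initial allocation instead of the fixed 1024 */
initStringInfoExt(&small, 64);      /* result known to be short */
initStringInfoExt(&big, 8192);      /* avoid early enlargeStringInfo() calls */

/* heap-allocated variant with a custom initial size */
StringInfo  si = makeStringInfoExt(256);

appendStringInfoString(si, "hello");

The point of the patch, as the assembly output shows, is that these initializers inline down to a single palloc() call plus a few stores, so the extra flexibility costs nothing on the hot path.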
Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING
On Sat, Jan 4, 2025 at 4:23 AM Robert Treat wrote:
>
> On Wed, Dec 18, 2024 at 5:56 AM Amit Kapila wrote:
> >
> > On Thu, Feb 8, 2024 at 9:57 AM Laurenz Albe wrote:
> > >
> > > On Thu, 2024-02-08 at 13:40 +1100, Peter Smith wrote:
> > > > - how to set the replica identity. If a table without a replica identity is
> > > > + how to set the replica identity. If a table without a replica identity
> > > > + (or with replica identity behavior the same as NOTHING) is
> > > >   added to a publication that replicates UPDATE or DELETE operations then
> > > >   subsequent UPDATE or DELETE
> > >
> > > I had the impression that the root of the confusion was the perceived difference between "REPLICA IDENTITY NOTHING" and "no replica identity", and that change doesn't improve that.
> > >
> > > How about:
> > >
> > > If a table without a replica identity (explicitly set to NOTHING, or set to a primary key or index that doesn't exist) is added ...
> >
> > Is it correct to say "set to a primary key or index that doesn't exist"? Because when it is set to the primary key then it should work.
> >
> > I think Peter's proposal along with Ashutosh's proposal is the simpler approach to clarify things in this area, but I am fine if others find some other way of updating the docs better.
>
> After reading this thread, I think part of the confusion here is that technically speaking there is no such thing as having "no replica identity"; when a table is created, by default its "replica identity" is set to DEFAULT, and how it behaves at that point will be dependent on the structure of the table (PK or no PK), not whether it is being replicated/published. To that end, I've taken the above suggestions and re-worked them to remove that language, as well as add some additional clarity.
>
> Patch attached for this. I am going to update the commitfest with myself as author and will work this further if needed. Thanks all.
>
> Robert Treat
> https://xzilla.net

Hi Robert. Thanks for picking up this patch.

Some review comments for the 0001 patch.

==
doc/src/sgml/logical-replication.sgml

1.
    fallback if no other solution is possible. If a replica identity other
    than FULL is set on the publisher side, a replica identity
    comprising the same or fewer columns must also be set on the subscriber
-   side. See for details on
-   how to set the replica identity. If a table without a replica identity is
+   side. If a table with replica identity set to NOTHING
+   (or set to use a primary key or index that doesn't exist) is
    added to a publication that replicates UPDATE
    or DELETE operations then
    subsequent UPDATE or DELETE
    operations will cause an error on the publisher.
    INSERT operations can proceed regardless of any replica identity.
+   See for details on
+   how to set the replica identity.

1a.
That part "If a table with replica identity set to NOTHING (or set to use a primary key or index that doesn't exist) is added ..." is not very clear to me.

IIUC, there are 3 ways for this to be a problem:
i) RI is set to NOTHING
ii) RI is set to DEFAULT but there is no valid PK ==> same as NOTHING
iii) RI is set USING INDEX but then that index gets dropped ==> same as NOTHING

To avoid any misunderstandings, why don't we just spell that out in full? So, ...

SUGGESTION
A replica identity behaves the same as NOTHING when it is set to DEFAULT and there is no primary key, or when it is set USING INDEX but the index no longer exists.
If a table with replica identity set to NOTHING (or behaving the same as NOTHING) is added to a publication that replicates UPDATE or DELETE operations, then subsequent UPDATE or DELETE operations will cause an error on the publisher.

~

1b.
I've always thought that, for this important topic of logical replication, it is unusual that there is not even a heading for it anywhere. This makes it unnecessarily difficult to find this information. IMO it would greatly help just to have a "Replica Identity" subsection for all of this paragraph, so it will appear nicely in the Chapter 29 table of contents, making it much easier to find.

~

1c.
That great big slab of paragraph text is hard to read. I suspect it was done that way simply to separate the RI topic visually from the other (paragraph) topics of the sect1 heading. But now, after we introduce the sect2 heading (my suggestion #1b above), we don't have to do that anymore, so more blank lines can be added, and that improves the readability a lot.

==
doc/src/sgml/ref/alter_table.sgml

2.
-   Records the old values of the columns of the primary key, if any.
+   Records the old values of the columns of the primary key. When there
+   is no primary key, the behavior is the same as NOTHING.
    This is the default for non-system tables.
RE: ecpg command does not warn COPY ... FROM STDIN;
Dear Tom, Fujii-san,

> > ISTM that ecpg supports COPY TO STDOUT and includes the regression test "copystdout" for it. No?
>
> Oh right. (Pokes at it...) It looks like the backend accepts "FROM STDOUT" as a synonym for "FROM STDIN", so that's why this is checking for both spellings. But I agree it'd be better for the error message to only use the standard spelling.
>
> Also I tried
>
> regression=# copy int4_tbl from program stdin;
> ERROR: STDIN/STDOUT not allowed with PROGRAM
>
> So we probably don't need to bother with adjusting the check to allow that.

To confirm - I have no objections to the decision. I also think it is not the usual grammar. I checked the docs [1] and could not find "COPY FROM STDOUT"; i.e., the first version is enough.

[1]: https://www.postgresql.org/docs/devel/sql-copy.html

Best regards,
Hayato Kuroda
FUJITSU LIMITED
Re: pure parsers and reentrant scanners
On 20.12.24 16:23, Tom Lane wrote:

Ok, we can fix that, but maybe this is also a good moment to think about whether that is useful. I could not reproduce the issue with flex 2.5.39. I could find no download of flex 2.5.35. The github site only offers back to 2.5.39, the sourceforge site back to 2.5.36. lapwing says it's Debian 7.0, which went out of support in 2016 and out of super-duper-extended support in 2020. It also doesn't have a supported OpenSSL version anymore, and IIRC, it has a weird old compiler that occasionally gives bogus warnings. I think it's time to stop supporting this.

OK, that's fair. I do see lapwing called out a lot in the commit log, though it's not clear how much of that is about 32-bitness and how much about old tools. It's surely still valuable to have i386 machines in the buildfarm, but I agree that supporting unobtainable tool versions is a bit much. Could we get that animal updated to some newer OS version?

Presumably, we should also rip out the existing yyget_column and yyset_column kluges in
src/backend/parser/scan.l: extern int core_yyget_column(yyscan_t yyscanner);
src/bin/psql/psqlscanslash.l: extern int slash_yyget_column(yyscan_t yyscanner);
src/bin/pgbench/exprscan.l: extern int expr_yyget_column(yyscan_t yyscanner);
src/fe_utils/psqlscan.l: extern int psql_yyget_column(yyscan_t yyscanner);

All my flex-related patches are in now. Here is a patch that removes the workarounds for compiler warnings with flex 2.5.35. This ended up being quite a lot, including the whole fix-old-flex-code.pl script.

The second patch contemplates raising the minimum required flex version, but to what? The most recent increase was exactly because 2.5.35 was the oldest in the buildfarm. The previous increases were apparently because certain features were required or some bugs had to be avoided. Options:
- Leave at 2.5.35 as long as it's present in the buildfarm.
- Set to 2.5.36 because it's the oldest that compiles without warnings. Also, the oldest you can still download from the flex sourceforge site.
- Set to 2.5.37 because that's the next oldest in the buildfarm (for CentOS/RHEL 7, so it will stay around for a while).
- Set to 2.5.34 because that's the oldest we actually require as of commit b1ef48980dd.
- Remove the version check, because these are all so old that no one cares anymore.

From fc1e8aedfe7371ae563ec51dd04346c0efd69fff Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Thu, 9 Jan 2025 08:36:16 +0100
Subject: [PATCH 1/2] Drop warning-free support for flex 2.5.35

This removes all the various workarounds for avoiding compiler warnings with flex 2.5.35. Several recent patches have added additional warnings that would either need to be fixed along the lines of the existing workarounds, or we decide to no longer care about this, which we do here. Flex 2.5.35 is extremely outdated, and you can't even download it anymore from any of the Flex project sites, so it's nearly impossible to support. After this, using flex 2.5.35 will still work, but the generated code will produce numerous compiler warnings.
Discussion: https://www.postgresql.org/message-id/1a204ccd-7ae6-478c-a431-407b5c48c...@eisentraut.org --- src/Makefile.global.in | 1 - src/backend/parser/Makefile| 1 - src/backend/parser/meson.build | 2 +- src/backend/parser/scan.l | 9 - src/bin/pgbench/exprscan.l | 9 - src/bin/psql/Makefile | 1 - src/bin/psql/meson.build | 2 +- src/bin/psql/psqlscanslash.l | 9 - src/fe_utils/Makefile | 1 - src/fe_utils/meson.build | 2 +- src/fe_utils/psqlscan.l| 9 - src/tools/fix-old-flex-code.pl | 66 -- src/tools/pgflex | 13 --- 13 files changed, 3 insertions(+), 122 deletions(-) delete mode 100644 src/tools/fix-old-flex-code.pl diff --git a/src/Makefile.global.in b/src/Makefile.global.in index eac3d001211..1278b7744f4 100644 --- a/src/Makefile.global.in +++ b/src/Makefile.global.in @@ -788,7 +788,6 @@ TAS = @TAS@ %.c: %.l $(FLEX) $(if $(FLEX_NO_BACKUP),-b) $(FLEXFLAGS) -o'$@' $< @$(if $(FLEX_NO_BACKUP),if [ `wc -l &2; exit 1; fi) - $(if $(FLEX_FIX_WARNING),$(PERL) $(top_srcdir)/src/tools/fix-old-flex-code.pl '$@') %.c: %.y $(if $(BISON_CHECK_CMD),$(BISON_CHECK_CMD)) diff --git a/src/backend/parser/Makefile b/src/backend/parser/Makefile index 3162a01f302..8c0fe28d63f 100644 --- a/src/backend/parser/Makefile +++ b/src/backend/parser/Makefile @@ -59,7 +59,6 @@ gram.c: BISON_CHECK_CMD = $(PERL) $(srcdir)/check_keywords.pl $< $(top_srcdir)/s scan.c: FLEXFLAGS = -CF -p -p scan.c: FLEX_NO_BACKUP=yes -scan.c: FLEX_FIX_WARNING=yes # Force these dependencies to be known even without dependency info built: diff --git a/src/backend/parser/meson.build b/src/backend/parser/meson.build index 4c3ca25dd49..874aa749aa6 100644 --- a/src/b
Re: Coccinelle for PostgreSQL development [4/N]: correcting palloc() use
On Wed, Jan 8, 2025 at 10:02 PM Peter Eisentraut wrote:
> On 07.01.25 20:49, Mats Kindahl wrote:
> > This is the first example semantic patch and shows how to capture and fix a common problem.
> >
> > If you use palloc() to allocate memory for an object (or an array of objects) and by mistake type something like:
> >
> > StringInfoData *info = palloc(sizeof(StringInfoData*));
> >
> > you will not allocate enough memory for storing the object. This semantic patch catches any cases where you are either allocating an array of objects or a single object that do not have correct types in this sense; more precisely, it captures assignments to a variable of type T* where palloc() uses sizeof(T) either alone or with a single expression (assuming this is an array count).
> >
> > The semantic patch is overzealous in the sense that allocation to a "const char **" expects a "sizeof(const char *)", and it cannot deal with typedefs that introduce aliases (these two can be seen in the patch). Although the sizes of these are the same, and Coccinelle does not have a good system for comparing types, it might be better to just follow the convention of always using the type "T*" for any "palloc(sizeof(T))", since it makes automated checking easier and is a small inconvenience; especially considering that coccicheck can easily fix this for you. It also simplifies other automated checking to follow this convention.
>
> I think this kind of thing is better addressed with a static analyzer or some heavier compiler annotations and warnings or something like that. That kind of tool would have the right level of information to decide whether some code is correct. Having a text matching tool do it is just never going to be complete, and the problems you mention show that this tool fundamentally doesn't understand the code, and just knows how to parse the code a little better than grep or sed.

Thanks for the comments Peter. On one hand, I agree with you. Coccinelle understands the *structure* of the code (e.g., what is a type and what is an identifier, where functions start and end, etc.), but not the *semantics* of the code (e.g., whether two types are different). On the other hand, however, development practice is also about following common coding patterns to avoid problems and improve readability. In this sense, having strange situations in the code that just happen to work makes it more difficult for developers as well as for automation tools. For example, allocating memory for elements of sizeof(T**) and assigning it to a T** variable will not be caught by something like ASAN (because the sizes match), and cannot trigger a fault (again, because the sizes match), but correcting this would still make life easier for both developers and static analysis tools.

That said, if your only concern is the "const int *" vs. "int *" thing (but not the T** vs. T* thing), it is possible to deal with this by either adding Python code to the Coccinelle script to just remove all "const" (since it does not affect the size), or by detecting that one of the types has "const" in it and just ignoring that situation rather than suggesting a patch.

> I do like the idea of exploring this tool for facilitating code rewriting or semantic patching. But I'm suspicious about using it for enforcing coding styles or patterns.
> The case we're discussing here is one such case where I think that enforcing a coding style might not be a good idea, but I think it was valuable to raise the issue and get opinions. (This is why I mentioned the problem with the patch being overzealous.) The real value here is not specific patches though. It is having a tool like Coccinelle for detecting and correcting problematic code as part of the development practice since I think it would speed up development and review and also improve the quality of the code. -- Best wishes, Mats Kindahl, Timescale
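To make the bug class concrete, here is the pattern in question together with one convention that sidesteps it entirely (plain C, drawn from the example in the thread; n is assumed to hold the intended element count):

/* the mistake the semantic patch flags: sizing by the pointer type */
StringInfoData *info = palloc(sizeof(StringInfoData *));    /* WRONG: pointer-sized */
StringInfoData *ok = palloc(sizeof(StringInfoData));        /* right */

/* sizing the allocation off the variable itself makes the mistake
 * structurally impossible and is easy for tools to verify */
StringInfoData *best = palloc(sizeof(*best));
StringInfoData *many = palloc(n * sizeof(*many));           /* array of n */

Note that the sizeof(*var) spelling is a common C idiom rather than something the posted semantic patch mandates; the patch instead enforces the "T* gets sizeof(T)" convention discussed above.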
Re: [PATCH] Add get_bytes() and set_bytes() functions
On Fri, Oct 18, 2024 at 05:20:42PM +0300, Aleksander Alekseev wrote:
> Rebased, v5.

v5-0001 includes the following output:

--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -126,9 +126,12 @@ WHERE p1.oid < p2.oid AND
      p1.proretset != p2.proretset OR
      p1.provolatile != p2.provolatile OR
      p1.pronargs != p2.pronargs);
- oid | proname | oid | proname
------+---------+-----+---------
-(0 rows)
+ oid  | proname  | oid  | proname
+------+----------+------+---------
+ 2405 | int2send | 8577 | bytea
+ 2407 | int4send | 8578 | bytea
+ 2409 | int8send | 8579 | bytea
+(3 rows)

This should not happen, as you are using the same function multiple times.
--
Michael
RE: Conflict detection for update_deleted in logical replication
On Thursday, January 9, 2025 9:48 AM Masahiko Sawada wrote:

Hi,

> > On Wed, Jan 8, 2025 at 3:00 AM Zhijie Hou (Fujitsu) wrote:
> > > On Wednesday, January 8, 2025 6:33 PM Masahiko Sawada wrote:
> > > > Hi,
> > > > > On Wed, Jan 8, 2025 at 1:53 AM Amit Kapila wrote:
> > > > > > On Wed, Jan 8, 2025 at 3:02 PM Masahiko Sawada wrote:
> > > > > > > On Thu, Dec 19, 2024 at 11:11 PM Nisha Moond wrote:
> > > > > > > > [3] Test with pgbench run on both publisher and subscriber.
> > > > > > > > Test setup:
> > > > > > > > - Tests performed on pgHead + v16 patches
> > > > > > > > - Created a pub-sub replication system.
> > > > > > > > - Parameters for both instances were:
> > > > > > > >   shared_buffers = 30GB
> > > > > > > >   min_wal_size = 10GB
> > > > > > > >   max_wal_size = 20GB
> > > > > > > >   autovacuum = false
> > > > > > > Since you disabled autovacuum on the subscriber, dead tuples created by non-hot updates are accumulated anyway regardless of the detect_update_deleted setting, is that right?
> > > > > > I think the hot-pruning mechanism during the update operation will remove dead tuples even when autovacuum is disabled.
> > > > > True, but why did it disable autovacuum? It seems that case1-2_setup.sh doesn't specify fillfactor, which makes hot-updates less likely to happen.
> > > IIUC, we disable autovacuum as a general practice in read-write tests for stable TPS numbers.
> ...
> In test case 3, we observed a -53% performance dip, which is worse than the results of test case 5 with wal_receiver_status_interval = 100s. Given that in test case 5 with wal_receiver_status_interval = 100s we cannot remove dead tuples for most of the whole 120s test time, we probably could not remove dead tuples for a long time also in test case 3. I expected that the apply worker gets remote transaction XIDs and tries to advance slot.xmin more frequently, so this performance dip surprised me.

As noted in my previous email[1], the delay primarily occurs during the final phase (RCI_WAIT_FOR_LOCAL_FLUSH), where we wait for concurrent transactions from the publisher to be applied and flushed locally (e.g., last_flushpos < data->remote_lsn). I think that while the interval between individual transaction ID advancements is brief, the duration of each advancement itself is significant.

> I would like to know how many times the apply worker gets remote transaction XIDs and succeeds in advancing slot.xmin during the test.

My colleague will collect and share the data soon.

[1] https://www.postgresql.org/message-id/OS0PR01MB57164C9A65F29875AE63F0BD94132%40OS0PR01MB5716.jpnprd01.prod.outlook.com

Best Regards,
Hou zj
Re: Conflict detection for update_deleted in logical replication
On Wed, Jan 8, 2025 at 2:24 PM Amit Kapila wrote: > > On Wed, Jan 8, 2025 at 2:15 PM Masahiko Sawada wrote: > > > > On Tue, Jan 7, 2025 at 2:49 AM Amit Kapila wrote: > > > > > > > > We thought of another approach, which is to create/drop this slot first as > > > > soon as one enables/disables detect_update_deleted (E.g. create/drop slot > > > > during DDL). But it seems complicated to control the concurrent slot > > > > create/drop. For example, if one backend A enables detect_update_deleted, it > > > > will create a slot. But if another backend B is disabling > > > > detect_update_deleted at the same time, then the newly created slot may be > > > > dropped by backend B. I thought about checking the number of subscriptions that > > > > enable detect_update_deleted before dropping the slot in backend B, but the > > > > subscription changes caused by backend A may not be visible yet (e.g., not > > > > committed yet). > > > > > > > > > > This means that for the transaction whose changes are not yet visible, > > > we may have already created the slot and the backend B would end up > > > dropping it. Is it possible that during the change of this new option > > > via DDL, we take AccessExclusiveLock on pg_subscription as we do in > > > DropSubscription() to ensure that concurrent transactions can't drop > > > the slot? Will that help in solving the above scenario? > > > > If we create/stop the slot during DDL, how do we support rollback of DDLs? > > > > We will prevent changing this setting in a transaction block as we > already do for slot related cases. See the use of > PreventInTransactionBlock() in subscriptioncmds.c. > On further thinking, even if we prevent this command in a transaction block, there is still a small chance of rollback. Say, we created the slot as the last operation after making database changes, but still, the transaction can fail in the commit code path. So, it is still not bulletproof. However, we already create a remote_slot at the end of CREATE SUBSCRIPTION, so, if by any chance the transaction fails in the commit code path, we will end up having a dangling slot on the remote node. The same can happen in the DROP SUBSCRIPTION code path as well. We can follow that, or the other option is to allow creation of the slot by the backend and let the drop be handled by the launcher, which can even take care of dangling slots. However, I feel it will be better to give the responsibility to the launcher for creating and dropping the slot, as the patch is doing, and use the FullTransactionId for each worker. What do you think? -- With Regards, Amit Kapila.
Re: Psql meta-command conninfo+
Hi, After looking at this ever more today, I think "Server Parameter Settings" > is confusing as well. I think "Connection Status" instead of > "Current Status" as is defined in v36 will work better. > This way we will have "Connection Info" and "Connection Status". > Connection Status will reflect the values of specific parameters > that the server reports. > Noted. > Including all the parameters in [1] under > "Server Parameter Settings" (or "Connection Status") > seems like the easy choice here. Some may not be as useful as > others, but I don't think we should pick and choose either. > Maybe someone else has other thoughts about this? > Sure, let's wait for others' opinions. > We can include role by marking the "role" guc with > the GUC_REPORT flag in guc_tables.c. I really think > without it, the is_superuser field will be incomplete. > This is because either "role" or "session authorization" > will change the is_superuser. > Thanks! I will take a look. > A thought also, that if we do choose to report all the parameters > in [1], it should be coded in a more dynamic way. Maybe loop > through the conn->pstatus list? For example I see "search_path" > will be added to the list in the next release. > If we loop through conn->pstatus, we will be bypassing the official API. The list is of type pgParameterStatus, which is an internal struct defined in libpq-int.h. As the file's header warns, including this file can lead to issues. Or am I missing something? Regards, Hunaid Sohail
Re: Fix bank selection logic in SLRU
On 2024-Dec-27, Andrey Borodin wrote: > > On 19 Dec 2024, at 20:48, Yura Sokolov wrote: > > > > Here's version with type change bits16 -> uint16 > > Thanks! This version looks good to me. I’ll mark the CF entry as RfC. Thank you, I have pushed this. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Once again, thank you and all of the developers for your hard work on PostgreSQL. This is by far the most pleasant management experience of any database I've worked on." (Dan Harris) http://archives.postgresql.org/pgsql-performance/2006-04/msg00247.php
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size
On Mon, Jan 6, 2025 at 3:22 AM Peter Smith wrote: > > Hi Shubham. > > The patch v6-0001 LGTM. > > OTOH, if you want to be picky, the docs wording could be slightly > modified to be more consistent with the coded warning message. > > CURRENT > Replication failures can occur if required WAL files are prematurely > deleted. To prevent this, the source server must set <xref linkend="guc-max-slot-wal-keep-size"/> to -1, > ensuring WAL files are not automatically removed. > > SUGGESTION > Replication failures can occur if required WAL files are missing. To > prevent this, the source server must set <xref linkend="guc-max-slot-wal-keep-size"/> to -1 to > ensure that required WAL files are not prematurely removed. > > ~~~ > > See the attached NITPICKS diff if you want to make this change. > I have used your suggestion and updated the 'pg_createsubscriber' documentation accordingly. The attached patch contains the suggested change. Thanks and regards, Shubham Khanna. v7-0001-Validate-max_slot_wal_keep_size-in-pg_createsubsc.patch Description: Binary data
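For reference, the setting recommended by the doc text above can be applied on the source server without a restart, since max_slot_wal_keep_size is reloadable; a minimal sketch:

-- on the source server, before running pg_createsubscriber
ALTER SYSTEM SET max_slot_wal_keep_size = -1;  -- never remove WAL still needed by replication slots
SELECT pg_reload_conf();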
Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size
On Mon, Jan 6, 2025 at 7:59 AM Hayato Kuroda (Fujitsu) wrote: > > Dear Shubham, > > Thanks for creating a patch. I have one comment about it. > > check_publisher() assumed that the SQL function > `pg_catalog.current_setting('max_slot_wal_keep_size')` > will return the numeric, but it just return the text representation. I.e., if > the parameter is > set to 10MB, the function returns like below: > > ``` > postgres=# SELECT * FROM pg_catalog.current_setting('max_slot_wal_keep_size'); > current_setting > - > 10MB > (1 row) > ``` > > Your patch can work well because atoi() ignores the latter part of the string, > e.g., "10MB" is converted to "10", but this is not clean. I suggest either of > 1) accepting the value as the string, or 2) using an SQL function > pg_size_bytes() > to get max_slot_wal_keep_size. > I have fixed this as suggested. The v7 patch attached at [1] includes the change. [1] - https://www.postgresql.org/message-id/CAHv8Rj%2BSxmBgz-vw5f7meRzEh6pgp9YhAjL5BPzTpevu31rY7Q%40mail.gmail.com Thanks and regards, Shubham Khanna.
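To illustrate the difference Kuroda-san points out: current_setting() returns the human-readable text form, while pg_size_bytes() converts that text to a plain number of bytes. A quick sketch, assuming the parameter has been set to '10MB':

SELECT current_setting('max_slot_wal_keep_size');                 -- returns the text '10MB'
SELECT pg_size_bytes(current_setting('max_slot_wal_keep_size'));  -- returns 10485760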
Orphaned users in PG16 and above can only be managed by Superusers
Hi All, Starting from PG16, it seems that orphaned users can only be managed by superusers. For example, if userA creates userB, and userB creates userC, then both userB (the parent of userC) and userA (the grandparent of userC) would typically have the ability to manage/administer userC. However, if userB is dropped, userA (the grandparent of userC) loses the ability to administer userC as well. This leads to a situation where only superusers can manage userC. Shouldn't userA retain the permission to manage userC even if userB is removed? Otherwise, only superusers would have the authority to administer userC (the orphaned user in this case), which may not be feasible for cloud environments where superuser access is restricted. -- With Regards, Ashutosh Sharma.
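For anyone wanting to reproduce this, a minimal sketch of the scenario (role names are made up, and the exact implicit grants involved depend on PG16's role-grant semantics and the createrole_self_grant setting):

-- as a superuser
CREATE ROLE usera LOGIN CREATEROLE;
SET ROLE usera;
CREATE ROLE userb LOGIN CREATEROLE;   -- usera is implicitly granted ADMIN OPTION on userb
RESET ROLE;
SET ROLE userb;
CREATE ROLE userc LOGIN;              -- userb is implicitly granted ADMIN OPTION on userc
RESET ROLE;
DROP ROLE userb;                      -- userb's grants on userc disappear with it
SET ROLE usera;
ALTER ROLE userc NOLOGIN;             -- fails: usera no longer has ADMIN OPTION on userc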
Re: Several buildfarm animals fail tests because of shared memory error
Hello Robins, 22.12.2024 09:27, Robins Tharakan wrote: - The only info about leafhopper may be relevant is that it's one of the newest machines (Graviton4) so it comes with a recent hardware / kernel / stock gcc 11.4.1. Could you please take a look at leafhopper, which is producing weird test failures rather often? For example,
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=leafhopper&dt=2024-12-16%2023%3A43%3A03 - REL_15_STABLE
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=leafhopper&dt=2024-12-21%2022%3A18%3A04 - REL_16_STABLE
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=leafhopper&dt=2025-01-02%2009%3A21%3A04 - REL_17_STABLE
--- /home/bf/proj/bf/build-farm-17/REL_16_STABLE/pgsql.build/src/test/regress/expected/select_parallel.out 2024-12-21 22:18:03.844773742 +
+++ /home/bf/proj/bf/build-farm-17/REL_16_STABLE/pgsql.build/src/test/recovery/tmp_check/results/select_parallel.out 2024-12-21 22:23:28.264849796 +
@@ -551,7 +551,7 @@
          ->  Nested Loop (actual rows=98000 loops=1)
                ->  Seq Scan on tenk2 (actual rows=10 loops=1)
                      Filter: (thousand = 0)
-                     Rows Removed by Filter: 9990
+                     Rows Removed by Filter: 9395
Or:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=leafhopper&dt=2024-12-18%2023%3A35%3A04 - master
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=leafhopper&dt=2025-01-02%2009%3A22%3A04 - master
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=leafhopper&dt=2025-01-08%2007%3A38%3A03 - master
# Failed test 'regression tests pass'
# at t/027_stream_regress.pl line 95.
# got: '256'
# expected: '0'
# Looks like you failed 1 test of 9.
[23:42:59] t/027_stream_regress.pl ...
...
diff -U3 /home/bf/proj/bf/build-farm-17/HEAD/pgsql.build/src/test/regress/expected/memoize.out /home/bf/proj/bf/build-farm-17/HEAD/pgsql.build/src/test/recovery/tmp_check/results/memoize.out
--- /home/bf/proj/bf/build-farm-17/HEAD/pgsql.build/src/test/regress/expected/memoize.out 2024-12-18 23:35:04.318987642 +
+++ /home/bf/proj/bf/build-farm-17/HEAD/pgsql.build/src/test/recovery/tmp_check/results/memoize.out 2024-12-18 23:42:24.806028990 +
@@ -179,7 +179,7 @@
    Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
    ->  Seq Scan on tenk1 t2 (actual rows=1 loops=N)
          Filter: ((t1.twenty = unique1) AND (t1.two = two))
-          Rows Removed by Filter:
+          Rows Removed by Filter: 9775
 (12 rows)
Maybe you could try to reproduce such failures without the buildfarm client, just by running select_parallel, for example, with the attached patch applied. I mean running `make check` with parallel_schedule like:
...
# --
# Run these alone so they don't run out of parallel workers
# select_parallel depends on create_misc
# --
test: select_parallel
test: select_parallel
test: select_parallel
(e.g.
with 100 repetitions) Or TESTS="test_setup copy create_misc create_index $(printf "select_parallel %.0s" {1..100})" make check-tests Best regards, Alexander Lakhin Neon (https://neon.tech)

diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index a809036453..4398d3b17e 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -1,3 +1,6 @@
+SET client_min_messages TO 'warning';
+DROP FUNCTION IF EXISTS sp_parallel_restricted CASCADE;
+RESET client_min_messages;
 --
 -- PARALLEL
 --
diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql
index 71a75bc86e..4e5a7471e1 100644
--- a/src/test/regress/sql/select_parallel.sql
+++ b/src/test/regress/sql/select_parallel.sql
@@ -1,3 +1,7 @@
+SET client_min_messages TO 'warning';
+DROP FUNCTION IF EXISTS sp_parallel_restricted CASCADE;
+RESET client_min_messages;
+
 --
 -- PARALLEL
 --
Re: ecpg command does not warn COPY ... FROM STDIN;
On 2025/01/09 0:42, Tom Lane wrote: Fujii Masao writes: On 2025/01/08 23:04, Ryo Kanbayashi wrote: But it is not working. ecpg command fails to notice though code like above exits on pgc code. This issue seems to have been introduced in commit 3d009e45bd. Indeed :-( As for COPY FROM STDOUT, while it's possible, mentioning it might be confusing since it's not official syntax. So I'm not sure if this is good idea. There's another problem: the correct syntax is COPY TO STDOUT, and that won't trigger this warning either. ISTM that ecpg supports COPY TO STDOUT and includes the regression test "copystdout" for it. No? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: pgindent exit status if a file encounters an error
On 2024-06-28 Fr 8:35 AM, Ashutosh Bapat wrote: On Wed, Jun 26, 2024 at 8:54 PM Tom Lane wrote: Ashutosh Bapat writes: > The usage help mentions exit code 2 specifically while describing --check > option but it doesn't mention exit code 1. Neither does the README. So I > don't think we need to document exit code 3 anywhere. Please let me know if > you think otherwise. I think we should have at least a code comment summarizing the possible exit codes, along the lines of # Exit codes: # 0 -- all OK # 1 -- could not invoke pgindent, nothing done # 2 -- --check mode and at least one file requires changes # 3 -- pgindent failed on at least one file Thanks. Here's a patch with these lines. In an offline chat Andrew mentioned that he expects more changes in the patch and he would take care of those. Will review and test his patch. I forget now what I was intending here, so I have committed this with minor tweaks. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: Virtual generated columns
On 04.12.24 05:55, jian he wrote: On Fri, Nov 29, 2024 at 6:13 PM Peter Eisentraut wrote: - Added support for virtual columns in trigger column lists. (For that, I renamed ExecInitStoredGenerated() to ExecInitGenerated(), which handles the computation of ri_extraUpdatedCols.) why not duplicate some code from ExecInitStoredGenerated to ExecGetExtraUpdatedCols? This answers itself: I'd rather not duplicate code. I don't see that as an improvement. * Now the expression state is something that gets initialized for the virtual generated column, which may not be necessary for virtual columns. Let's make ResultRelInfo->ri_GeneratedExprsI, ResultRelInfo->ri_GeneratedExprsU be NULL for virtual columns. Currently it may look like this: (gdb) p resultRelInfo->ri_GeneratedExprsU $20 = (ExprState **) 0x34f9638 (gdb) p resultRelInfo->ri_GeneratedExprsU[0] $21 = (ExprState *) 0x0 (gdb) p resultRelInfo->ri_GeneratedExprsU[1] $22 = (ExprState *) 0x0 (gdb) p resultRelInfo->ri_GeneratedExprsU[2] $23 = (ExprState *) 0x40 I have fixed that in v11. * ExecInitStoredGenerated is mainly used in ExecComputeStoredGenerated. * We also need to slightly change ExecInitGenerated's comments. Also fixed. * In InitResultRelInfo, do we need to explicitly set ri_Generated_valid to false? Doesn't seem necessary. The struct is initialized to zero at the beginning.
Re: Virtual generated columns
On 11.12.24 07:49, jian he wrote: On Fri, Nov 29, 2024 at 6:01 PM Peter Eisentraut wrote: The purpose of check_modified_virtual_generated() is for trigger functions written in C: to prevent someone from inserting real values into the trigger tuples, because they would then be processed by the rest of the system, which would be incorrect. Higher-level languages such as plpgsql should handle that themselves, by preventing setting generated columns in trigger functions. The presence of check_modified_virtual_generated() is still a backstop for those, but shouldn't really be necessary. Please check the attached patch. * Remove check_modified_virtual_generated. * Use heap_modify_tuple_by_cols in ExecBRInsertTriggers, ExecBRUpdateTriggers to overwrite virtual generated columns' values with null. It's not complicated, and the trigger behavior for stored and virtual will be more aligned. I have integrated that into v11. I agree it's not complicated and it's better to keep the behavior aligned. I kept the function check_modified_virtual_generated() but now it just modifies the tuple, using your code, instead of erroring. That avoids having to write the same code twice. I don't understand the purpose of the change in pl_exec.c.
Re: Virtual generated columns
On 16.12.24 15:34, jian he wrote: hi. some minor issues... SET EXPRESSION AS This form replaces the expression of a generated column. Existing data in the column is rewritten and all the future changes will apply the new generation expression. The second sentence seems not to apply to a virtual generated column? Tweaked the wording in v11. doc/src/sgml/ref/alter_table.sgml does not seem to explicitly mention the difference between ALTER TABLE tp ALTER COLUMN b SET EXPRESSION AS (a * 3); and ALTER TABLE ONLY tp ALTER COLUMN b SET EXPRESSION AS (a * 3); ? The first one will recurse to the child tables and replace any generated expression in the child table for the to-be-altered column; the latter won't. This is implied by the general meaning of the ONLY clause in ALTER TABLE. This applies to all ALTER TABLE actions. Is there anything we need to explain specifically for this action? CheckAttributeType, we can change it to <<< else if (att_typtype == TYPTYPE_DOMAIN) { if ((flags & CHKATYPE_IS_VIRTUAL) && DomainHasConstraints(atttypid)) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("virtual generated column \"%s\" cannot have a domain type", attname))); } <<< so we can support domains without any constraints for now. (I don't have a huge opinion though). I don't think this would be correct, since constraints could be added to the domain later. ALTER COLUMN SET NOT NULL, if the column is already not-null, becomes a no-op. Similarly, if the old and new generated expressions are the same, ATExecSetExpression can return InvalidObjectAddress, making it a no-op. For example, in ATExecSetExpression, can we make the following ALTER TABLE a no-op? CREATE TABLE gtest20 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 3) VIRTUAL ); ALTER TABLE gtest20 ALTER COLUMN b SET EXPRESSION AS (a * 3); Since ATExecSetExpression is not recursive, each input argument (AlteredTableInfo *tab) is separate for partitioned tables and partitions, as are the AlteredTableInfo->newvals and AlteredTableInfo->rewrite information. So returning InvalidObjectAddress from ATExecSetExpression for the no-op case will also work for partitioned tables and inheritance. I don't know why we would need to make this a no-op. I mean, we also don't make UPDATE ... SET x = x a no-op. The attached file tries to do that. While testing it, I found out there is no test case for ALTER COLUMN SET EXPRESSION for inheritance cases, even though it works. In src/test/regress/sql/generated_virtual.sql after line 161, we can add the following tests: <<< ALTER TABLE ONLY gtest1 ALTER COLUMN b SET EXPRESSION AS (a * 10); select tableoid::regclass, * from gtest1; ALTER TABLE gtest1 ALTER COLUMN b SET EXPRESSION AS (a * 100); select tableoid::regclass, * from gtest1; <<< There was already a test for this: +-- alter only parent's and one child's generation expression +ALTER TABLE ONLY gtest_parent ALTER COLUMN f3 SET EXPRESSION AS (f2 * 4); +ALTER TABLE gtest_child ALTER COLUMN f3 SET EXPRESSION AS (f2 * 10); Is there anything you think this doesn't cover?
Re: Virtual generated columns
On 03.12.24 15:15, jian he wrote: -- check constraints CREATE TABLE gtest20 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) VIRTUAL CHECK (b < 50)); INSERT INTO gtest20 (a) VALUES (10); -- ok INSERT INTO gtest20 (a) VALUES (30); -- violates constraint ALTER TABLE gtest20 ALTER COLUMN b SET EXPRESSION AS (a * 100); -- violates constraint ALTER TABLE gtest20 ALTER COLUMN b SET EXPRESSION AS (a * 3); -- ok The above content is in src/test/regress/sql/generated_virtual.sql; the last two query comments seem to conflict with the error message for now. Fixed the comment in the test in patch v11. I added some regress tests for your v10 changes in src/backend/commands/statscmds.c. Please check the attachment. Added to patch v11. The sql tests under "sanity check of system catalog" could be placed at the end of the sql file for a better chance of catching errors. For virtual, we can also check the attnotnull and atthasdef values, like: SELECT attrelid, attname, attgenerated FROM pg_attribute WHERE attgenerated IN ('v') and (attnotnull or not atthasdef); I moved the existing check to the bottom, as you suggest. I don't understand what the purpose of testing attnotnull is. That is independent of attgenerated, I think.
Re: ecpg command does not warn COPY ... FROM STDIN;
Fujii Masao writes: > On 2025/01/09 0:42, Tom Lane wrote: >> There's another problem: the correct syntax is COPY TO STDOUT, >> and that won't trigger this warning either. > ISTM that ecpg supports COPY TO STDOUT and includes the regression test > "copystdout" for it. No? Oh right. (Pokes at it...) It looks like the backend accepts "FROM STDOUT" as a synonym for "FROM STDIN", so that's why this is checking for both spellings. But I agree it'd be better for the error message to only use the standard spelling. Also I tried regression=# copy int4_tbl from program stdin; ERROR: STDIN/STDOUT not allowed with PROGRAM So we probably don't need to bother with adjusting the check to allow that. regards, tom lane
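For the archives, the two spellings at issue behave like this (a quick sketch, per the behavior described above):

COPY int4_tbl TO STDOUT;    -- the standard spelling
COPY int4_tbl FROM STDOUT;  -- accepted by the backend, behaves the same as COPY ... FROM STDIN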
Re: use a non-locking initial test in TAS_SPIN on AArch64
On Tue, Oct 22, 2024 at 02:54:57PM -0500, Nathan Bossart wrote: > My colleague Salvatore Dipietro (CC'd) sent me a couple of profiles that > showed an enormous amount of s_lock() time going to the > __sync_lock_test_and_set() call in the AArch64 implementation of tas(). > Upon closer inspection, I noticed that we don't implement a custom > TAS_SPIN() for this architecture, so I quickly hacked together the attached > patch and ran a couple of benchmarks that stressed the spinlock code. I > found no discussion about TAS_SPIN() on ARM in the archives, but I did > notice that the initial AArch64 support was added [0] before x86_64 started > using a non-locking test [1]. > > These benchmarks are for a c8g.24xlarge running a select-only pgbench with > 256 clients and pg_stat_statements.track_planning enabled. > > without the patch: > > [...] > > tps = 74135.100891 (without initial connection time) > > with the patch: > > [...] > > tps = 549462.785554 (without initial connection time) Are there any objections to proceeding with this change? So far, it's been tested on a c8g.24xlarge and an Apple M3 (which seems to be too small to show any effect). If anyone has access to a larger ARM machine, additional testing would be greatly appreciated. I think it would be unfortunate if this slipped to v19. -- nathan
Re: doc: Mention clock synchronization recommendation for hot_standby_feedback
On Wed, Dec 18, 2024 at 10:33 AM Amit Kapila wrote: Hi Amit! > On Thu, Dec 5, 2024 at 3:14 PM Jakub Wartak > wrote: > > > > One of our customers ran into a very odd case, where hot standby feedback > > backend_xmin propagation stopped working due to major (hours/days) clock > > time shifts on hypervisor-managed VMs. This happens (and is fully > > reproducible) e.g. in scenarios where the standby connects while its own VM > > has time from the future (relative to the primary) and then that time goes > > back to "normal". In such a situation the "sending hot standby feedback xmin" > > timestamp messages stop being transferred, e.g.: > > > > 2024-12-05 02:02:35 UTC [6002]: db=,user=,app=,client= DEBUG: sending hot > > standby feedback xmin 1614031 epoch 0 catalog_xmin 0 catalog_xmin_epoch 0 > > 2024-12-05 02:02:45 UTC [6002]: db=,user=,app=,client= DEBUG: sending > > write 6/E9015230 flush 6/E9015230 apply 6/E9015230 > > 2024-12-05 02:02:45 UTC [6002]: db=,user=,app=,client= DEBUG: sending hot > > standby feedback xmin 1614031 epoch 0 catalog_xmin 0 catalog_xmin_epoch 0 > > <-- clock readjustment and no further "sending hot standby feedback" > ... > > > > I can share reproduction steps if anyone is interested. This basically > > happens due to usage of TimestampDifferenceExceeds() in > > XLogWalRcvSendHSFeedback(), but I bet there are other similar scenarios. > > > > We started to use a different mechanism in HEAD. See > XLogWalRcvSendHSFeedback(). Yes, you are somewhat correct, because I was looking at REL13_STABLE, but I've taken a fresh quick look at 05a7be93558 and tested it too. Sadly, PG17 still maintains the same behavior of lacking proper backend_xmin propagation (it stops sending hot standby feedback once the time on the standby jumps forward). I believe this is the case because walreceiver schedules the next wakeup far in the future (when the clock / now() is way ahead, see WalRcvComputeNextWakeup()), so when the clock is back to normal (reset back -X hours/days), the next wakeup seems to be +X hours/days ahead. > > What I was kind of surprised about was the lack of recommendation for > > having primary/standby to have clocks synced when using > > hot_standby_feedback, but such a thing is mentioned for > > recovery_min_apply_delay. So I would like to add at least one sentence to > > hot_standby_feedback to warn about this too, patch attached. > > > > IIUC, this issue doesn't occur because the primary and standby clocks > are not synchronized. It happened because the clock on standby moved > backward. In PG17 it would be because the clock moved too far forward on the standby. I don't know how it happened to that customer, but it was probably done somehow by the hypervisor in that scenario (so time wasn't slewed improperly by ntpd AFAIR, edge case, I know...) > This is quite unlike the 'recovery_min_apply_delay' where > non-synchronization of clocks between primary and standby can lead to > unexpected results. This is because we don't compare any time on the > primary with the time on standby. If this understanding is correct > then the wording proposed by your patch should be changed accordingly. .. if my understanding is correct, it is both, depending on the version :^) I was thinking about backpatching the docs (what is the recommended policy here? to just update new-release docs?), so I'm proposing something more generic than earlier, but it takes your point into account - would something like the below be good enough?
-
-Using this option requires the primary and standby(s) to have system
-clocks synchronized, otherwise it may lead to prolonged risk of not
-removing dead rows on primary for extended periods of time as the
-feedback mechanism is based on timestamps exchanged between primary
-and standby(s).
-
+
+Using this option requires the primary and standby(s) to have system
+clocks synchronized (without big time jumps), otherwise it may lead to
+prolonged risk of not removing dead rows on primary for extended periods
+of time as the feedback mechanism implementation is timestamp based.
+

-J.
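For anyone trying to reproduce this, the symptom is visible from the primary side: with hot_standby_feedback enabled, pg_stat_replication.backend_xmin normally keeps advancing under load, and it freezes once the standby stops sending feedback. A quick check (sketch):

-- on the primary, run repeatedly over time
SELECT application_name, backend_xmin FROM pg_stat_replication;
-- backend_xmin stops advancing once the standby's clock jumps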
Re: Reorder shutdown sequence, to flush pgstats later
On 08/01/2025 04:10, Andres Freund wrote: I instead opted to somewhat rearrange the shutdown sequence: This commit changes the shutdown sequence so checkpointer is signalled to trigger writing the shutdown checkpoint without terminating it. Once checkpointer wrote the checkpoint it will wait for a termination signal. Postmaster now triggers the shutdown checkpoint where we previously did so by terminating checkpointer. Checkpointer is terminated after all other children have been terminated, tracked using the new PM_SHUTDOWN_CHECKPOINTER PMState. In addition, the existing PM_SHUTDOWN_2 state is renamed to PM_XLOG_IS_SHUTDOWN, that seems a lot more descriptive. Sounds good. This patch is not fully polished, there are a few details I'm not sure about: - Previously the shutdown checkpoint and process exit were triggered from HandleCheckpointerInterrupts(). I could have continued doing so in the patch, but it seemed quite weird to have a wait loop in a function called HandleCheckpointerInterrupts(). Instead I made the main loop in CheckpointerMain() terminate if checkpointer was asked to write the shutdown checkpoint or exit - In process_pm_pmsignal() the code now triggers a PostmasterStateMachine() if checkpointer signalled ShutdownXLOG having completed. We didn't really have a need for that before. process_pm_child_exit() does call PostmasterStateMachine(), but somehow calling it in process_pm_pmsignal() feels slightly off Looks good to me. - I don't love the naming of the various PMState values (existing and new), but a larger renaming should probably be done separately? We currently have PM_WAIT_BACKENDS and PM_WAIT_DEAD_END. Maybe follow that example and name all the shutdown states after what you're waiting for: PM_WAIT_BACKENDS, /* waiting for live backends to exit */ PM_WAIT_XLOG_SHUTDOWN, /* waiting for checkpointer to do shutdown ckpt */ PM_WAIT_XLOG_ARCHIVAL, /* waiting for archiver and walsenders to finish */ PM_WAIT_DEAD_END, /* waiting for dead-end children to exit */ PM_WAIT_CHECKPOINTER, /* waiting for checkpointer to shut down */ PM_NO_CHILDREN,/* all important children have exited */ - There's logging in ShutdownXLOG() that's only triggered if called from checkpointer. Seems kinda odd - shouldn't we just move that to checkpointer.c? What logging do you mean? The "LOG: shutting down" message? No objections to moving it, although it doesn't bother me either way. The first two patches are just logging improvements that I found helpful to write this patch: 0001: Instead of modifying pmState directly, do so via a new function UpdatePMState(), which logs the state transition at DEBUG2. Requires a helper to stringify PMState. I've written one-off versions of this patch quite a few times. Without knowing in what state postmaster is in, it's quite hard to debug the state machine. +1 elog(DEBUG1, "updating PMState from %d/%s to %d/%s", pmState, pmstate_name(pmState), newState, pmstate_name(newState)); I think the state names are enough, no need to print the numerical values. 0002: Make logging of postmaster signalling child processes more consistent I found it somewhat hard to understand what's happening during state changes without being able to see the signals being sent. While we did have logging in SignalChildren(), we didn't in signal_child(), and most signals that are important for the shutdown sequence are sent via signal_child(). +1 I don't think the cases where DEBUG2 or DEBUG4 are used currently are very well thought out. 
To save a few lines of code, I'd be happy with merging signal_child_ext() and signal_child() and just always use DEBUG2 or DEBUG4. Or DEBUG3 as a compromise :-). The next is a small prerequisite: 0003: postmaster: Introduce variadic btmask_all_except() I proposed this in another thread, where I also needed btmask_all_except3() but then removed the only call to btmask_all_except2(). Instead of adding/removing code, it seems better to just use a variadic function. Nice. A variadic btmask_add() might be handy too. And then 0004, the reason for this thread. Overall, looks good to me. -- Heikki Linnakangas Neon (https://neon.tech)
Re: Moving the vacuum GUCs' docs out of the Client Connection Defaults section
On 07.01.25 18:31, Melanie Plageman wrote: On Tue, Jan 7, 2025 at 12:15 PM Melanie Plageman wrote: Cool, I've attached a patch to do this. I left a few of the GUCs under Resource Consumption (like autovacuum_work_mem and vacuum_buffer_usage_limit) where they are because it seemed appropriate. This is my first docs patch that introduces new sections and such, so I'm not sure I got the indentation 100% correct (I, of course, tried to follow conventions). Oh, one thing I forgot to say. Though I increased the indentation of some of the subsections that I moved, I didn't rewrap the lines because they were already not wrapped to 78. I can do this, but I can't tell from the existing docs what text width the paragraphs of text are supposed to be wrapped to. For a patch that moves things around like this, I would leave everything as is and not rewrap. That makes it easier to follow what's going on with "git diff -w", "git show -w" etc. In .dir-locals.el, there is this configuration for Emacs: (nxml-mode . ((fill-column . 78) (indent-tabs-mode . nil))) So that's one data point about what the line length should be.
Re: Non-text mode for pg_dumpall
Hi all, On Wed, 8 Jan 2025 at 00:34, Mahendra Singh Thalor wrote: > > On Mon, 6 Jan 2025 at 23:05, Nathan Bossart wrote: > > > > On Thu, Jan 02, 2025 at 02:05:13AM +0530, Mahendra Singh Thalor wrote: > > > Here, I am attaching an updated patch. I fixed some bugs of v01 patch and > > > did some code cleanup also. > > > > Thank you for picking this up! I started to review it, but the > > documentation changes didn't build, and a few tests in check-world are > > failing. Would you mind resolving those issues? Also, if you haven't > > already, please add an entry to the next commitfest [0] to ensure that 1) > > this feature is tracked and 2) the automated tests will run. > > Thanks Nathan for the quick response. > > I fixed bugs of documentation changes and check-world in the latest patch. Now docs are building and check-world is passing. > > I added entry into commitfest for this patch.[0] > > > > > + if (dbfile) > > + { > > + printfPQExpBuffer(&cmd, "\"%s\" %s %s", pg_dump_bin, > > + dbfile, create_opts); > > + appendPQExpBufferStr(&cmd, " -F d "); > > + } > > > > Have you given any thought to allowing a directory of custom format files, > > as discussed upthread [1]? Perhaps that is better handled as a follow-up > > patch, but it'd be good to understand the plan, anyway. > > I will make these changes and will test. I will update my findings after doing some testing. In the latest patch, I added dump and restore support for the directory/custom/tar/plain formats. Please consider this patch for review and testing. *Design*: When we give --format=d|c|t then we dump all global sql commands into global.dat in plain sql format and we make a map.dat file with dbname and dboid. For each database, we make a separate subdirectory named by dboid under the databases directory and dump it as per the archive format (d|c|t). While restoring, we first restore all global sql commands from global.dat and then restore all databases one by one. As we support --exclude-database with pg_dumpall, we support the same with pg_restore also, to skip restoring databases matching the specified patterns. If we want to restore a single database, then we can specify the particular subdirectory from the databases folder. To get the file name, we look up the dbname in the map.dat file. *TODO*: Now I will work on test cases for these newly added options to pg_dumpall and pg_restore. Here, I am attaching the v04 patch for testing and review. > > Apart from these bugs, I added code to handle --exclude-database= PATTERN. Earlier I was using NAME only to skip databases for restore. > > TODO: .pl test cases for new added options. > > Here, I am attaching an updated patch for review and feedback. > > > > [0] https://commitfest.postgresql.org > [1] https://postgr.es/m/CABUevExoQ26jo%2BaQ9QZq%2BUMA1aD6gfpm9xBnh_t5e0DhaCeRYA%40mail.gmail.com > > -- > > nathan > > -- > Thanks and Regards > Mahendra Singh Thalor > EnterpriseDB: http://www.enterprisedb.com -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com v04-pg_dumpall-with-directory-format-and-restore-08_jan.patch Description: Binary data
Re: improve DEBUG1 logging of parallel workers for CREATE INDEX?
Hi, On Fri, Jan 03, 2025 at 12:30:14PM -0600, Sami Imseih wrote: > While reviewing patch [1], I realized that the DEBUG1 message > for CREATE INDEX could do better in providing information > about parallel workers launched. Currently, the message just > shows how many workers are planned, but a user may want > to ensure that they have the appropriate number of workers > launched as well when they enable DEBUG1 logging. Yeah, one could see how many workers are currently running but would not be able to figure it out once the index is created. So adding the info in the log makes sense to me. > I prepared a simple patch, attached, for this. The log message matches > the format > used in VACUUM VERBOSE ( for consistency sake ). A few random comments: === 1 s/parallel vacuum workers for index vacuuming/parallel workers for index creation/? (2 times) === 2 - (errmsg_internal("building index \"%s\" on table \"%s\" with request for %d parallel workers", + (errmsg_internal("building index \"%s\" on table \"%s\"", I'd add "in parallel" to match its counterpart "serially" above. That would make it more clear in case one just looks for "building index" in the log. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: ecpg command does not warn COPY ... FROM STDIN;
Kuroda-san, thanks for your comment. > > I found a code validation bug in master branch. > > > > Now, ecpg does not support 'EXEC SQL COPY ... FROM STDIN ... ;' and > > code for warning it exits. > > > > https://github.com/postgres/postgres/blob/7b27f5fd36cb3270e8ac25aefd73b55 > > 2663d1392/src/interfaces/ecpg/preproc/ecpg.addons#L242-L245 > > --- > > ECPG: addon CopyStmt COPY opt_binary qualified_name opt_column_list > > copy_from opt_program copy_file_name copy_delimiter opt_with > > copy_options where_clause > > if (strcmp(@6, "from") == 0 && > > (strcmp(@7, "stdin") == 0 || strcmp(@7, "stdout") == 0)) > > mmerror(PARSE_ERROR, ET_WARNING, "COPY FROM STDIN is not > > implemented"); > > --- > > > > But it is not working. > > ecpg command fails to notice though code like above exits on pgc code. > > Good catch. I have a comment about the fix. > > The parser accepts a statement like "COPY ... FROM STDOUT", and ISTM it is > not either > implemented yet. However, the warning message only mentions STDIN case even > STDOUT > is specified. > > EXEC SQL COPY foo FROM STDOUT; > -> > WARNING: COPY FROM STDIN is not implemented > > I feel we can change like "COPY FROM STDIN/STDOUT...", or prepare two > messages. > Thought? I think your proposed fix is better too. So, I modified the patch. With the new patch, the warning message is changed like below :) ryo@DESKTOP-IOASPN6:~/work/postgres/src$ ../master/bin/ecpg copy_from_should_be_warned.pgc copy_from_should_be_warned.pgc:24: WARNING: COPY FROM STDIN/STDOUT is not implemented ryo@DESKTOP-IOASPN6:~/work/postgres/src$ -- Best regards, Ryo Kanbayashi https://github.com/ryogrid copy_from_stdin_no_warning2.patch Description: Binary data
Re: AIO v2.0
On Mon, Jan 6, 2025 at 5:28 PM Andres Freund wrote: > > Hi, > > On 2024-12-19 17:29:12 -0500, Andres Freund wrote: > > > Not about patch itself, but questions about related stack functionality: > > > > > > > > > > > > 7. Is pg_stat_aios still on the table or not ? (AIO 2021 had it). Any > > > hints > > > on how to inspect real I/O calls requested to review if the code is > > > issuing > > > sensible calls: there's no strace for uring, or do you stick to DEBUG3 or > > > perhaps using some bpftrace / xfsslower is the best way to go ? > > > > I think we still want something like it, but I don't think it needs to be in > > the initial commits. > > After I got this question from Thomas as well, I started hacking one up. > > What information would you like to see? > > Here's what I currently have: ..
> ├─[ RECORD 2 ]───┼────────────────────────────────────────────────┤
> │ pid            │ 358212                                         │
> │ io_id          │ 2051                                           │
> │ io_generation  │ 4199                                           │
> │ state          │ IN_FLIGHT                                      │
> │ operation      │ read                                           │
> │ offset         │ 511967232                                      │
> │ length         │ 262144                                         │
> │ subject        │ smgr                                           │
> │ iovec_data_len │ 32                                             │
> │ raw_result     │ (null)                                         │
> │ result         │ UNKNOWN                                        │
> │ error_desc     │ (null)                                         │
> │ subject_desc   │ blocks 1373216..1373247 in file "base/5/16388" │
> │ flag_sync      │ f                                              │
> │ flag_localmem  │ f                                              │
> │ flag_buffered  │ t                                              │
Cool! It's more than enough for me in future, thanks! > I didn't think that pg_stat_* was quite the right namespace, given that it > shows not stats, but the currently ongoing IOs. I am going with pg_aios for > now, but I don't particularly like that. If you are looking for other proposals: * pg_aios_progress ? (to follow the pattern of pg_stat_copy|vacuum_progress?) * pg_debug_aios ? * pg_debug_io ? > I think we'll want a pg_stat_aio as well, tracking things like: > > - how often the queue to IO workers was full > - how many times we submitted IO to the kernel (<= #ios with io_uring) > - how many times we asked the kernel for events (<= #ios with io_uring) > - how many times we had to wait for in-flight IOs before issuing more IOs If I could dream of one thing, it would be the 99.9th percentile of IO response times in milliseconds for different classes of I/O traffic (read/write/flush). But it sounds like it would be very similar to pg_stat_io and potentially would have to be per-tablespace/IO-traffic(subject)-type too. AFAIU pg_stat_io's structure is not suitable for that. BTW: before trying to even start to compile that AIO v2.2* and responding to the previous review, what are you most interested in hearing about, so that it adds some value? Any workload-specific measurements? Just general feedback, functionality gaps? Integrity/data testing with stuff like dm-dust, dm-flakey, dm-delay to try the error handling routines? Some kind of AIO <-> standby/recovery interactions? * - btw, Date: 2025-01-01 04:03:33 - I saw what you did there! so let's officially recognize 2025 as the year of AIO in PG, as it was the 1st message :D -J.
Re: Reorder shutdown sequence, to flush pgstats later
Hi, On Wed, Jan 08, 2025 at 03:10:03PM +0200, Heikki Linnakangas wrote: > On 08/01/2025 04:10, Andres Freund wrote: > > > elog(DEBUG1, "updating PMState from %d/%s to %d/%s", > > pmState, pmstate_name(pmState), newState, > > pmstate_name(newState)); > > I think the state names are enough, no need to print the numerical values. +1 > > 0002: Make logging of postmaster signalling child processes more consistent > > > >I found it somewhat hard to understand what's happening during state > >changes without being able to see the signals being sent. While we > > did > >have logging in SignalChildren(), we didn't in signal_child(), and > > most > >signals that are important for the shutdown sequence are sent via > >signal_child(). > > +1 Random comments: === 1 +static const char * +pm_signame(int signal) +{ +#define PM_CASE(state) case state: return #state + switch (signal) What about? " #define PM_SIGNAL_CASE(sig) case sig: return #sig " to better reflect its context. === 2 + default: + /* all signals sent by postmaster should be listed here */ + Assert(false); + return "(unknown)"; + } +#undef PM_CASE + pg_unreachable(); Shouldn't we remove the return "(unknown)"? (If not, the pg_unreachable() looks like dead code.) > I don't think the cases where DEBUG2 or DEBUG4 are used currently are very > well thought out. To save a few lines of code, I'd be happy with merging > signal_child_ext() and signal_child() +1 > > The next is a small prerequisite: > > > > 0003: postmaster: Introduce variadic btmask_all_except() > > > >I proposed this in another thread, where I also needed > >btmask_all_except3() but then removed the only call to > >btmask_all_except2(). Instead of adding/removing code, it seems > > better > >to just use a variadic function. LGTM. > Nice. A variadic btmask_add() might be handy too. > > > And then 0004, the reason for this thread. Did not give it a detailed review, but IIUC it now provides this flow: PM_SHUTDOWN->PM_XLOG_IS_SHUTDOWN->PM_WAIT_DEAD_END->PM_SHUTDOWN_CHECKPOINTER->PM_NO_CHILDREN and that looks good to me to fix the issue. A few random comments: === 3 + postmaster.c will only + * signal checkpointer to exit after all processes that could emit stats + * have been shut down. s/postmaster.c/PostmasterStateMachine()/? === 4 + * too. That allows checkpointer to perform some last bits of of Typo "of of" I'll give 0004 a closer look once it leaves the WIP state but want to share that it looks like a good way to fix the issue. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: pg_settings.unit and DefineCustomXXXVariable
Matthias van de Meent writes: > On Wed, 8 Jan 2025 at 11:13, Luca Ferrari wrote: >> I need to define a few GUCs, and for that purpose I'm using >> DefineCustomXXXVariable functions that provide hooks for assignment, >> show and check. However it is not clear to me if it is possible to >> populate the unit column in pg_settings for the custom defined >> variables, and if so, how. > DefineCustomXXXVariable has a 'flags' argument, into which GUC_UNIT_* > can be or-ed. Correct. contrib/auto_explain has some examples. > AFAIK, the options provided in guc.h are the only kinds of units > supported and available, and I think only integer variables actually > support the unit conversion implied by the unit options. Nowadays we allow units for float GUCs as well. (Originally it was indeed just for integers. But at some point we decided an existing integer value with units needed to become float, and then we had to back-fill the implementation for compatibility's sake.) regards, tom lane
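As a concrete illustration of the flags approach, contrib/auto_explain defines auto_explain.log_min_duration with the GUC_UNIT_MS flag, and that is what populates the unit column:

LOAD 'auto_explain';
SELECT name, unit FROM pg_settings WHERE name = 'auto_explain.log_min_duration';
-- name: auto_explain.log_min_duration, unit: ms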
Re: ecpg command does not warn COPY ... FROM STDIN;
On 2025/01/08 23:04, Ryo Kanbayashi wrote: But it is not working. ecpg command fails to notice though code like above exits on pgc code. This issue seems to have been introduced in commit 3d009e45bd. Before that, as per my testing, ecpg successfully detected COPY FROM STDIN and reported a warning. It seems the fix will need to be back-patched to all supported versions. I feel we can change like "COPY FROM STDIN/STDOUT...", or prepare two messages. Thought? I think your proposed fix is better too. So, I modified the patch. As for COPY FROM STDOUT, while it's possible, mentioning it might be confusing since it's not official syntax. So I'm not sure if this is good idea. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: CREATE TABLE NOT VALID for check and foreign key
Hello, On 2025-Jan-07, Yasuo Honda wrote: > I'd like PostgreSQL to raise errors and/or warnings for the NOT VALID > check constraint for CREATE TABLE. > Ruby on Rails supports creating check constraints with the NOT VALID > option and I was not aware that it is just ignored until > https://github.com/rails/rails/issues/53732 issue is reported. Thanks. I left a comment there. I think raising a WARNING might work. > Rails has implemented a kind of workaround by not dumping the NOT > VALID option, but it does not help for the first execution. > https://github.com/rails/rails/pull/53735 Yeah, that's probably not ideal. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "You're _really_ hosed if the person doing the hiring doesn't understand relational systems: you end up with a whole raft of programmers, none of whom has had a Date with the clue stick." (Andrew Sullivan) https://postgr.es/m/20050809113420.gd2...@phlogiston.dyndns.org
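To make the current behavior concrete (the table and constraint names here are made up): the NOT VALID clause is accepted by CREATE TABLE but silently dropped, so the constraint ends up marked as validated:

CREATE TABLE t (a int, CONSTRAINT a_positive CHECK (a > 0) NOT VALID);
SELECT conname, convalidated FROM pg_constraint WHERE conname = 'a_positive';
-- convalidated is true, i.e. the NOT VALID was ignored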
RE: ecpg command does not warn COPY ... FROM STDIN;
Dear Kanbayashi-san, > I found a code validation bug in master branch. > > Now, ecpg does not support 'EXEC SQL COPY ... FROM STDIN ... ;' and > code for warning it exits. > > https://github.com/postgres/postgres/blob/7b27f5fd36cb3270e8ac25aefd73b55 > 2663d1392/src/interfaces/ecpg/preproc/ecpg.addons#L242-L245 > --- > ECPG: addon CopyStmt COPY opt_binary qualified_name opt_column_list > copy_from opt_program copy_file_name copy_delimiter opt_with > copy_options where_clause > if (strcmp(@6, "from") == 0 && > (strcmp(@7, "stdin") == 0 || strcmp(@7, "stdout") == 0)) > mmerror(PARSE_ERROR, ET_WARNING, "COPY FROM STDIN is not > implemented"); > --- > > But it is not working. > ecpg command fails to notice though code like above exits on pgc code. Good catch. I have a comment about the fix. The parser accepts a statement like "COPY ... FROM STDOUT", and ISTM it is not implemented either. However, the warning message only mentions the STDIN case even when STDOUT is specified. EXEC SQL COPY foo FROM STDOUT; -> WARNING: COPY FROM STDIN is not implemented I feel we can change it like "COPY FROM STDIN/STDOUT...", or prepare two messages. Thoughts? Best regards, Hayato Kuroda FUJITSU LIMITED
Re: Virtual generated columns
On 08.01.25 09:22, Richard Guo wrote: - Added support for ALTER TABLE ... SET EXPRESSION. When using ALTER TABLE to set expression for virtual generated columns, we don't enforce a rewrite, which means we don't have the opportunity to check whether the new values for these columns could cause an underflow or overflow. For instance, create table t (a int, b int generated always as (a) virtual); insert into t values (2147483647); # alter table t alter column b set expression as (a * 2); ALTER TABLE # select * from t; ERROR: integer out of range The same thing could occur with INSERT. As we don't compute virtual generated columns on write, we may end up inserting values that cause underflow or overflow for these columns. create table t1 (a int, b int generated always as (a * 2) virtual); insert into t1 values (2147483647); # select * from t1; ERROR: integer out of range I'm not sure if this is expected or not, so I just wanted to point it out. Yes, this is expected behavior. This also happens with a view. So it is consistent for compute-on-read objects.
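For reference, the view analog can be seen directly; a quick sketch:

create table t2 (a int);
create view v2 as select a, a * 2 as b from t2;
insert into t2 values (2147483647);  -- succeeds, b is only computed on read
select * from v2;
ERROR:  integer out of range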
Re: Separate GUC for replication origins
On Thu, Dec 19, 2024, at 10:31 AM, Peter Eisentraut wrote: > On 10.12.24 19:41, Euler Taveira wrote: > > I'm attaching a patch that adds max_replication_origins. It basically > > replaces > > all of the points that refers to max_replication_slots on the subscriber. It > > uses the same default value as max_replication_slots (10). I did nothing to > > keep the backward compatibility. I mean has a special value (like -1) that > > means same value as max_replication_slots. Is it worth it? (If not, it > > should > > be mentioned in the commit message.) > > I think some backward compatibility tweak like that would be useful. > Otherwise, the net effect of this is that most people will have to do > one more thing than before to keep their systems working. Also, all > deployment and automation scripts would have to be updated for this. > This new patch includes the special value (-1) that uses max_replication_slots instead. -- Euler Taveira EDB https://www.enterprisedb.com/ From 8a80eb6a8aa3cbfb2c0eb09ee3799e857729e205 Mon Sep 17 00:00:00 2001 From: Euler Taveira Date: Tue, 3 Sep 2024 12:10:20 -0300 Subject: [PATCH v2] Separate GUC for replication origins This feature already exists but it is provided by an existing GUC: max_replication_slots. The new GUC (max_replication_origins) defines the maximum number of replication origins that can be tracked simultaneously. The max_replication_slots was used for this purpose but it is confusing (when you are learning about logical replication) and introduces a limitation (you cannot have a small number of replication slots and a high number of subscriptions). For backward compatibility, the default is -1, indicating that the value of max_replication_slots is used instead. --- doc/src/sgml/config.sgml | 27 ++ doc/src/sgml/logical-replication.sgml | 6 +- src/backend/replication/logical/launcher.c| 4 +- src/backend/replication/logical/origin.c | 91 --- src/backend/utils/misc/guc_tables.c | 12 +++ src/backend/utils/misc/postgresql.conf.sample | 3 + src/bin/pg_basebackup/pg_createsubscriber.c | 18 ++-- .../t/040_pg_createsubscriber.pl | 4 +- src/bin/pg_upgrade/check.c| 14 +-- src/bin/pg_upgrade/t/004_subscription.pl | 14 +-- src/include/replication/origin.h | 3 + 11 files changed, 111 insertions(+), 85 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 8683f0bdf5..1f15d97055 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -4493,13 +4493,6 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows to replica or higher to allow replication slots to be used. - - - Note that this parameter also applies on the subscriber side, but with - a different meaning. See - in for more - details. - @@ -5202,10 +5195,10 @@ ANY num_sync ( - max_replication_slots (integer) + + max_replication_origins (integer) -max_replication_slots configuration parameter +max_replication_origins configuration parameter in a subscriber @@ -5217,18 +5210,14 @@ ANY num_sync ( pg_replication_origin_status) -will prevent the server from starting. -max_replication_slots must be set to at least the +will prevent the server from starting. It defaults to -1, indicating +that the value of should be +used instead. This parameter can only be set at server start. + +max_replication_origins must be set to at least the number of subscriptions that will be added to the subscriber, plus some reserve for table synchronization. - - -Note that this parameter also applies on a sending server, but with -a different meaning. 
See -in for more -details. - diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml index 8290cd1a08..1c4b6586db 100644 --- a/doc/src/sgml/logical-replication.sgml +++ b/doc/src/sgml/logical-replication.sgml @@ -2144,9 +2144,7 @@ CONTEXT: processing remote data for replication origin "pg_16395" during "INSER Logical replication requires several configuration options to be set. Most - options are relevant only on one side of the replication. However, - max_replication_slots is used on both the publisher and - the subscriber, but it has a different meaning for each. + options are relevant only on one side of the replication. @@ -2181,7 +2179,7 @@ CONTEXT: processing remote data for replication origin "pg_16395" during "INSER Subscribers -max_replication_slots +max_replication_origins must be set to at least the number of subscriptions that will be added to the subscriber, plus some reserve for tabl
Re: ecpg command does not warn COPY ... FROM STDIN;
Fujii Masao writes: > On 2025/01/08 23:04, Ryo Kanbayashi wrote: > But it is not working. > ecpg command fails to notice though code like above exits on pgc code. > This issue seems to have been introduced in commit 3d009e45bd. Indeed :-( > As for COPY FROM STDOUT, while it's possible, mentioning it might be > confusing since it's not official syntax. So I'm not sure if this is > good idea. There's another problem: the correct syntax is COPY TO STDOUT, and that won't trigger this warning either. I'm inclined to drop the test on @5 altogether, and just check for "stdin" or "stdout" in @7. There is no variant of those that will work. (But maybe we should allow it if opt_program isn't empty?) The warning message could perhaps be written "COPY FROM STDIN/TO STDOUT is not implemented". regards, tom lane
Re: improve DEBUG1 logging of parallel workers for CREATE INDEX?
> s/parallel vacuum workers for index vacuuming/parallel workers for index > creation/? oops, that's my oversight from copying the message from vacuum. fixed. > I'd add "in parallel" to match its counterpart "serially" above. That would > make it more clear in case one just look for "building index" in the log. good point. Below is the new output with the attached v2. postgres=# create index foo_idx1 on foo(id); DEBUG: building index "foo_idx1" on table "foo" with parallel workers DEBUG: launched 1 parallel workers for index creation (planned: 1) DEBUG: index "foo_idx1" can safely use deduplication CREATE INDEX Regards, Sami v2-0001-improve-DEBUG1-logging-of-parallel-workers-for-CR.patch Description: Binary data
Re: Fwd: Re: A new look at old NFS readdir() problems?
On Tue, Jan 7, 2025 at 01:03:05AM +, David Steele wrote: > > > I'm more concerned about the report we saw on SUSE/NFS [1]. If that > > > report is accurate it indicates this may not be something we can just > > > document and move on from -- unless we are willing to entirely drop > > > support for NFS. > > > [1] https://github.com/pgbackrest/pgbackrest/issues/1423 > > > > I installed an up-to-date OpenSUSE image (Leap 15.6) and it passes > > my "rmtree" test just fine with my NAS. The report you cite > > doesn't have any details on what the NFS server was, but I'd be > > inclined to guess that that server's filesystem lacked support > > for stable NFS cookies. > > The internal report we received might have had a similar cause. Sure seems > like a minefield for any user trying to figure out if their setup is > compliant, though. In many setups (especially production) a drop database is > rare. Will people now always get a clear error on failure? Crazy idea, but could we have initdb or postmaster start test this? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: Fix a wrong errmsg in AlterRole()
On Wed, Jan 08, 2025 at 10:29:12AM -0500, Tom Lane wrote: > "=?ISO-8859-1?B?Y2NhNTUwNw==?=" writes: >> postgres=> alter group g1 drop user r1; >> ERROR: permission denied to alter role >> DETAIL: Only roles with the ADMIN option on role "g1" may add members. > >> The errmsg is "add members", but we are doing a drop. >> Here is a small patch to fix it. > > Hmm. Seems simpler to just make the errdetail say "... may add or > drop members." +1. It looks like this goes back to commit ce6b672 (v16). The error message was changed in commit de4d456, which is also a v16 change, so fortunately this has a good chance of back-patching cleanly. -- nathan
Re: jsonb_strip_nulls with arrays?
On 2024-09-17 Tu 4:53 PM, Florents Tselai wrote: We could, if we're going to do anything at all in this area. Another possibility would be to provide a second optional parameter for json{b}_strip_nulls. That's probably a better way to go. Here's a patch that adds that argument (only for jsonb; no json implementation yet) I think it looks sane. We're not stripping a top level null, which is one thing I looked out for. I think we need a json implementation as well, though. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
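For reference, the current one-argument behavior strips nulls only from object fields, keeps null array elements, and leaves a top-level null alone:

SELECT jsonb_strip_nulls('{"a": null, "b": [null, 1]}');
-- {"b": [null, 1]}
SELECT jsonb_strip_nulls('null');
-- null

With the second argument proposed in the patch, passing true would additionally strip null elements from arrays.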
Re: Virtual generated columns
Peter Eisentraut writes: > On 03.12.24 15:15, jian he wrote: >> SELECT attrelid, attname, attgenerated FROM pg_attribute WHERE >> attgenerated IN ('v') and (attnotnull or not atthasdef); > I don't understand what the purpose of testing attnotnull is. That is > independent of attgenerated, I think. Does it make any sense to set NOT NULL on a generated column (virtual or otherwise, but especially virtual)? What is the system supposed to do if the expression evaluates to null? That concern generalizes to any constraint really. Even if we checked it at row storage time, there's no real guarantee that the expression is immutable enough to pass the constraint later. regards, tom lane
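For the stored flavor, at least, the current answer is that constraints are checked against the computed value at write time; a quick sketch:

CREATE TABLE g (a int, b int GENERATED ALWAYS AS (nullif(a, 0)) STORED NOT NULL);
INSERT INTO g VALUES (1);  -- ok, b = 1
INSERT INTO g VALUES (0);  -- ERROR: null value in column "b" violates not-null constraint

For a virtual column there is no write-time evaluation, which is what makes the question harder there.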
Re: Non-text mode for pg_dumpall
On Wed, 8 Jan 2025 at 20:07, Mahendra Singh Thalor wrote: > > Hi all, > > On Wed, 8 Jan 2025 at 00:34, Mahendra Singh Thalor wrote: > > > > On Mon, 6 Jan 2025 at 23:05, Nathan Bossart wrote: > > > > > > On Thu, Jan 02, 2025 at 02:05:13AM +0530, Mahendra Singh Thalor wrote: > > > > Here, I am attaching an updated patch. I fixed some bugs of the v01 patch and did some code cleanup also. > > > > > > Thank you for picking this up! I started to review it, but the documentation changes didn't build, and a few tests in check-world are failing. Would you mind resolving those issues? Also, if you haven't already, please add an entry to the next commitfest [0] to ensure that 1) this feature is tracked and 2) the automated tests will run. > > > > Thanks Nathan for the quick response. > > > > I fixed the bugs in the documentation changes and check-world in the latest patch. Now the docs are building and check-world is passing. > > > > I added an entry into the commitfest for this patch. [0] > > > > > > > > + if (dbfile) > > > + { > > > + printfPQExpBuffer(&cmd, "\"%s\" %s %s", pg_dump_bin, > > > + dbfile, create_opts); > > > + appendPQExpBufferStr(&cmd, " -F d "); > > > + } > > > > > > Have you given any thought to allowing a directory of custom format files, as discussed upthread [1]? Perhaps that is better handled as a follow-up patch, but it'd be good to understand the plan, anyway. > > > > I will make these changes and will test. I will update my findings after doing some testing. > > In the latest patch, I added dump and restore support for directory/custom/tar/plain formats. Please consider this patch for review and testing. > > Design: > When we give --format=d|c|t, we dump all global SQL commands into global.dat in plain SQL format, and we write a map.dat file mapping each dbname to its dboid. For each database, we create a separate subdirectory named with the dboid under the databases directory and dump it in the chosen archive format (d|c|t). > While restoring, we first restore all global SQL commands from global.dat and then restore the databases one by one. As we support --exclude-database with pg_dumpall, we support the same with pg_restore as well, to skip restoring databases matching the specified patterns. > If we want to restore a single database, we can specify the particular subdirectory from the databases folder. To get the file name, we look up the dbname in the map.dat file. > > TODO: Now I will work on test cases for these newly added options to pg_dumpall and pg_restore. > > Here, I am attaching the v04 patch for testing and review. Sorry, my mistake: v04 was a delta patch on top of v03. Here, I am attaching the v05 patch for testing and review. > > > > > Apart from these bugs, I added code to handle --exclude-database=PATTERN. > > Earlier I was using NAME only to skip databases for restore. > > > > TODO: .pl test cases for the newly added options. > > > > Here, I am attaching an updated patch for review and feedback.
> > > > > > > > [0] https://commitfest.postgresql.org > > > [1] > > > https://postgr.es/m/CABUevExoQ26jo%2BaQ9QZq%2BUMA1aD6gfpm9xBnh_t5e0DhaCeRYA%40mail.gmail.com > > > > > > -- > > > nathan > > > > -- > > Thanks and Regards > > Mahendra Singh Thalor > > EnterpriseDB: http://www.enterprisedb.com > > > > -- > Thanks and Regards > Mahendra Singh Thalor > EnterpriseDB: http://www.enterprisedb.com -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com v05_pg_dumpall-with-directory-tar-custom-format-08-jan.patch Description: Binary data
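For illustration, a rough usage sketch of the workflow described above; the directory layout and option spellings follow my reading of the design notes, not the patch itself, so treat them as assumptions:

    $ pg_dumpall --format=d --file=alldump
    $ ls alldump
    global.dat  map.dat  databases/
    $ pg_restore -C -d postgres alldump
    $ pg_restore -C -d postgres --exclude-database='test*' alldump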
Re: Proposal: add new API to stringinfo
On Tue, Jan 07, 2025 at 03:57:57PM +0900, Tatsuo Ishii wrote: > So the v2 patch version is 1.3787% slower than master. Do you think we > need further inlining? If it's not too much work to coax the compiler to do so, then I don't see a strong reason to avoid it. Granted, there are probably much more expensive parts of the StringInfo support functions (e.g., palloc()), but these are hot enough code paths that IMHO it's worth saving whatever cycles we can. -- nathan
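For readers following along, the kind of inlining being discussed looks roughly like the following — a generic sketch only, assuming nothing about the actual v2 patch (the function name here is invented):

    /* hypothetical inlined fast path in stringinfo.h */
    static inline void
    appendStringInfoCharInline(StringInfo str, char ch)
    {
        /* common case: room available, skip the out-of-line call */
        if (str->len + 1 < str->maxlen)
        {
            str->data[str->len++] = ch;
            str->data[str->len] = '\0';
        }
        else
            appendStringInfoChar(str, ch);  /* slow path enlarges the buffer */
    }

The point is to let the compiler eliminate call overhead on the hot path while keeping the rarely taken enlargement path out of line.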
Re: Fix a wrong errmsg in AlterRole()
"=?ISO-8859-1?B?Y2NhNTUwNw==?=" writes: > postgres=> alter group g1 drop user r1; > ERROR: permission denied to alter role > DETAIL: Only roles with the ADMIN option on role "g1" may add members. > The errmsg is "add members", but we are doing a drop. > Here is a small patch to fix it. Hmm. Seems simpler to just make the errdetail say "... may add or drop members." regards, tom lane
using PGOPTIONS in meson
hi. I am wondering how to do ``make check PGOPTIONS="-c debug_parallel_query=regress -c work_mem=50MB"`` the meson way, especially the PGOPTIONS part. I have looked at [1] and [2]. [1] https://www.postgresql.org/docs/current/regress-run.html#REGRESS-RUN-CUSTOM-SETTINGS [2] https://wiki.postgresql.org/wiki/Meson
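For example, is setting it as a plain environment variable the supported equivalent? This is just a guess, not something verified:

    PGOPTIONS="-c debug_parallel_query=regress -c work_mem=50MB" \
        meson test -C build --suite regress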
Re: Fwd: Re: A new look at old NFS readdir() problems?
On 1/8/25 12:40, Bruce Momjian wrote: On Tue, Jan 7, 2025 at 01:03:05AM +, David Steele wrote: I'm more concerned about the report we saw on SUSE/NFS [1]. If that report is accurate it indicates this may not be something we can just document and move on from -- unless we are willing to entirely drop support for NFS. [1] https://github.com/pgbackrest/pgbackrest/issues/1423 I installed an up-to-date OpenSUSE image (Leap 15.6) and it passes my "rmtree" test just fine with my NAS. The report you cite doesn't have any details on what the NFS server was, but I'd be inclined to guess that that server's filesystem lacked support for stable NFS cookies. The internal report we received might have had a similar cause. Sure seems like a minefield for any user trying to figure out if their setup is compliant, though. In many setups (especially production) a drop database is rare. Will people now always get a clear error on failure? The error will be something like "directory is not empty" when trying to drop a database. So not very clear at all. Crazy idea, but could we have initdb or postmaster start test this? I think this test would be too expensive for postmaster start as it would require creating and deleting a large number of files (maybe multiple times). I suppose it could be a special setting but I doubt that would be popular. initdb might be more promising but clusters are frequently copied/restored to different storage so I'm not sure if it would be too helpful in the long run. Regards, -David
Re: Virtual generated columns
On Wed, Jan 8, 2025 at 13:14, Peter Eisentraut wrote: > Here is a new patch version where I have gathered various pieces of > feedback and improvement suggestions that are scattered over this > thread. I hope I got them all. I will respond to the respective > messages directly to give my response to each item. > In this new version, subqueries are still not accepted, as in previous versions. But we can create an immutable SQL function that does the same thing. Wouldn't it be better to explain that in the docs?

create table Orders (
    Order_ID integer not null primary key,
    Customer_ID integer references Customer
);

create function lkCustomer(integer) returns text
language sql immutable as
$function$select Name from Customer where Customer_ID = $1;$function$;

alter table Orders add lkCustomer text
    generated always as (lkCustomer(Customer_ID)) stored;

regards Marcos
Re: Virtual generated columns
On 08.01.25 17:38, Tom Lane wrote: Peter Eisentraut writes: On 03.12.24 15:15, jian he wrote: SELECT attrelid, attname, attgenerated FROM pg_attribute WHERE attgenerated IN ('v') and (attnotnull or not atthasdef); I don't understand what the purpose of testing attnotnull is. That is independent of attgenerated, I think. Does it make any sense to set NOT NULL on a generated column (virtual or otherwise, but especially virtual)? What is the system supposed to do if the expression evaluates to null? That concern generalizes to any constraint really. Even if we checked it at row storage time, there's no real guarantee that the expression is immutable enough to pass the constraint later. The generation expression is required to be immutable. So a table definition like

    a int, b int generated always as (a * 2) virtual, check (b > 0)

is not very different from

    a int, check (a * 2 > 0)

in terms of the constraint execution. The current patch does not support not-null constraints, but that's mostly because it's not implemented yet. Maybe that's what Jian was thinking about.
Re: use a non-locking initial test in TAS_SPIN on AArch64
Hi, On 2025-01-08 12:12:19 -0600, Nathan Bossart wrote: > On Tue, Oct 22, 2024 at 02:54:57PM -0500, Nathan Bossart wrote: > > My colleague Salvatore Dipietro (CC'd) sent me a couple of profiles that > > showed an enormous amount of s_lock() time going to the > > __sync_lock_test_and_set() call in the AArch64 implementation of tas(). > > Upon closer inspection, I noticed that we don't implement a custom > > TAS_SPIN() for this architecture, so I quickly hacked together the attached > > patch and ran a couple of benchmarks that stressed the spinlock code. I > > found no discussion about TAS_SPIN() on ARM in the archives, but I did > > notice that the initial AArch64 support was added [0] before x86_64 started > > using a non-locking test [1]. > > > > These benchmarks are for a c8g.24xlarge running a select-only pgbench with > > 256 clients and pg_stat_statements.track_planning enabled. > > > > without the patch: > > > > [...] > > > > tps = 74135.100891 (without initial connection time) > > > > with the patch: > > > > [...] > > > > tps = 549462.785554 (without initial connection time) > > Are there any objections to proceeding with this change? So far, it's been > tested on a c8g.24xlarge and an Apple M3 (which seems to be too small to > show any effect). If anyone has access to a larger ARM machine, additional > testing would be greatly appreciated. I think it would be unfortunate if > this slipped to v19. Seems reasonable. It's not surprising that spinning with an atomic operation doesn't scale very well once you have more than a handful of cores. Of course the whole way locking works in pg_stat_statements is a scalability disaster, but that's a bigger project to fix. Out of curiosity, have you measured whether this has a positive effect without pg_stat_statements? I think it'll e.g. also affect lwlocks, as they also use perform_spin_delay(). Greetings, Andres
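For context, the non-locking initial test presumably follows the existing x86_64 pattern in s_lock.h (an assumption about the AArch64 patch, not its actual text):

    /* read the lock with a plain load first; only attempt the
     * atomic test-and-set if it currently appears to be free */
    #define TAS_SPIN(lock)    (*(lock) ? 1 : TAS(lock))

This keeps waiters spinning on a cache line in shared state instead of hammering it with exclusive-mode atomic operations, which is what destroys scalability at high core counts.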
Re: Fwd: Re: A new look at old NFS readdir() problems?
David Steele writes: > On 1/8/25 12:40, Bruce Momjian wrote: >> Will people now always get a clear error on failure? > The error will be something like "directory is not empty" when trying to > drop a database. So not very clear at all. Conceivably we could adjust rmtree() to print a HINT mentioning that unsafe network storage is a possible cause of this error. But I'm a little worried that that'd be crying wolf, because on most setups a much more likely explanation is somebody concurrently creating a file in the targeted directory. regards, tom lane
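A sketch of what such a hint might look like — the message wording, error-handling mechanism, and placement are all invented for illustration, and the real rmtree() code in src/common may report errors differently:

    /* hypothetical: after rmdir() fails with ENOTEMPTY */
    ereport(WARNING,
            (errcode_for_file_access(),
             errmsg("could not remove directory \"%s\": %m", path),
             errhint("Network file systems with unstable directory cookies can cause this error.")));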
Re: Parameter NOT NULL to CREATE DOMAIN not the same as CHECK (VALUE IS NOT NULL)
Bruce Momjian writes: > I think this needs some serious research. We've discussed this topic before. The spec's definition of IS [NOT] NULL for composite values is bizarre to say the least. I think there's been an intentional choice to keep most NOT NULL checks "simple", that is we look at the overall value's isnull bit and don't probe any deeper than that. If the optimizations added in v17 changed existing behavior, I agree that's bad. We should probably fix it so that those are only applied when argisrow is false. regards, tom lane
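For anyone not steeped in the spec's rules, the composite-value behavior in question:

    select row(1, null) is null;        -- false
    select row(null, null) is null;     -- true: all fields are null
    select row(1, null) is not null;    -- also false: not all fields are non-null

So a composite value with one null field is neither NULL nor NOT NULL, which is why probing deeper than the overall isnull bit gets bizarre quickly.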
Re: Sample rate added to pg_stat_statements
On 08.01.2025 22:39, Ilia Evdokimov wrote: After the changes in v9-patch, won’t all the necessary queries now be normalized? Since there are no longer any restrictions in the parser hook, queries will be normalized as usual, and pgss_planner, pgss_ExecutorStart, and ExecutorEnd will simply fetch them from 'pgss_query_texts.stat' file. For now, this seems like the simplest approach instead of making JumbleState available to other hooks. Moreover, if we do proceed with that, we might be able to remove the 'pgss_query_texts.stat' file altogether and more other improvements. In my view, if no other options arise, we should address making JumbleState available to other hooks in a separate thread. If you have any suggestions, I'm always open to feedback. I attached v9 patch with changes and test above. Unfortunately, these changes do not achieve the intended sampling goal. I looked into this more deeply: while the sampled-out queries do not appear in pg_stat_statements, an entry is still allocated in the hash table after normalization, which, in my view, should not happen when sampling is in effect. Therefore, patch v9 is unlikely to meet our needs. -- Best regards, Ilia Evdokimov, Tantor Labs LLC.
Re: New GUC autovacuum_max_threshold ?
On Wed, Jan 8, 2025 at 8:01 PM Nathan Bossart wrote: > On Wed, Jan 08, 2025 at 02:48:10PM +0100, Frédéric Yhuel wrote: > > For what it's worth, although I would have preferred the sub-linear growth thing, I'd much rather have this than nothing. > > +1, this is how I feel, too. But I also don't want to add something that folks won't find useful. > > > And I have to admit that the proposed formulas were either too convoluted or wrong. > > > > This very patch is more straightforward. Please let me know if I can help and how. > > I read through the thread from the top, and it does seem like there is reasonably strong support for the hard cap. Upon a closer review of the patch, I noticed that the relopt was defined such that you couldn't disable autovacuum_max_threshold on a per-table basis, so I fixed that in v4. > > -- > nathan Nathan, please also provide tests for the new parameter you want to introduce. Best, vini
Re: EphemeralNamedRelation and materialized view
Yugo NAGATA writes: > Thank you for your reviewing and editing the patch! > I agree with your proposal on the error message handling. Cool, pushed v4 then. regards, tom lane
Re: Non-text mode for pg_dumpall
On Thu, 9 Jan 2025 at 02:30, Guillaume Lelarge wrote: > > Hi, > > Le mer. 8 janv. 2025 à 17:41, Mahendra Singh Thalor a > écrit : >> >> On Wed, 8 Jan 2025 at 20:07, Mahendra Singh Thalor >> wrote: >> > >> > Hi all, >> > >> > On Wed, 8 Jan 2025 at 00:34, Mahendra Singh Thalor >> > wrote: >> > > >> > > On Mon, 6 Jan 2025 at 23:05, Nathan Bossart >> > > wrote: >> > > > >> > > > On Thu, Jan 02, 2025 at 02:05:13AM +0530, Mahendra Singh Thalor wrote: >> > > > > Here, I am attaching an updated patch. I fixed some bugs of v01 >> > > > > patch and >> > > > > did some code cleanup also. >> > > > >> > > > Thank you for picking this up! I started to review it, but the >> > > > documentation changes didn't build, and a few tests in check-world are >> > > > failing. Would you mind resolving those issues? Also, if you haven't >> > > > already, please add an entry to the next commitfest [0] to ensure that >> > > > 1) >> > > > this feature is tracked and 2) the automated tests will run. >> > > >> > > Thanks Nathan for the quick response. >> > > >> > > I fixed bugs of documentation changes and check-world in the latest >> > > patch. Now docs are building and check-world is passing. >> > > >> > > I added entry into commitfest for this patch.[0] >> > > >> > > > >> > > > + if (dbfile) >> > > > + { >> > > > + printfPQExpBuffer(&cmd, "\"%s\" %s %s", pg_dump_bin, >> > > > + dbfile, create_opts); >> > > > + appendPQExpBufferStr(&cmd, " -F d "); >> > > > + } >> > > > >> > > > Have you given any thought to allowing a directory of custom format >> > > > files, >> > > > as discussed upthread [1]? Perhaps that is better handled as a >> > > > follow-up >> > > > patch, but it'd be good to understand the plan, anyway. >> > > >> > > I will make these changes and will test. I will update my findings after >> > > doing some testing. >> > >> > In the latest patch, I added dump and restoring for >> > directory/custom/tar/plain formats. Please consider this patch for review >> > and testing. >> > >> > Design: >> > When we give --format=d|c|t then we are dumping all global sql commands in >> > global.dat in plain sql format and we are making a map.dat file with >> > dbname and dboid. For each database, we are making separate subdirectory >> > with dboid under databases directory and dumping as per archive >> > format(d|c|t). >> > While restoring, first we are restoring all global sql commands from >> > global.dat and then we are restoring one by one all databases. As we are >> > supporting --exclude-database with pg_dumpall, the same we are supporting >> > with pg_restore also to skip restoring on some specified database patterns. >> > If we want to restore a single database, then we can specided particular >> > subdirectory from the databases folder. To get file name, we refer dbname >> > into map.file. >> > >> > TODO: Now I will work on test cases for these new added options to the >> > pg_dumpall and pg_restore. >> > >> > Here, I am attaching the v04 patch for testing and review. >> >> Sorry. My mistake. >> v04 was the delta patch on the top of v03. >> >> Here, I am attaching the v05 patch for testing and review. >> > > Just FWIW, I did a quick test tonight. It applies cleanly, compiles OK. I did > a dump: Thanks for testing and review. > > $ pg_dumpall -Fd -f dir > > and then a restore (after dropping the databases I had): > > $ pg_restore -Cd postgres -v dir > > It worked really well. That's great. 
> > Quick thing to fix: you've got this error message: pg_restore: error: -d/--dbanme should be given when using archive dump of pg_dumpall > > I guess it is --dbname, rather than --dbanme. Fixed. > > Of course, it needs much more testing, but this feature would be great to have. Thanks for working on this! > > Apart from the above typo, I fixed some review comments that I received from Andrew in an offline discussion. Thanks Andrew for the quick review. Here, I am attaching an updated patch for review and testing. -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com v06_pg_dumpall-with-directory-tar-custom-format-08-jan.patch Description: Binary data
Re: Make pg_stat_io view count IOs as bytes instead of blocks
On Thu, Dec 26, 2024 at 02:41:26PM +0300, Nazir Bilal Yavuz wrote: > Thanks! v4 is attached. I quickly tested the pg_stat_get_backend_io() > function and it seems it is working. Thanks a lot for the rebased version. This looks pretty solid. Here are some comments. void -pgstat_count_io_op(IOObject io_object, IOContext io_context, IOOp io_op) +pgstat_count_io_op(IOObject io_object, IOContext io_context, IOOp io_op, uint64 bytes) { -pgstat_count_io_op_n(io_object, io_context, io_op, 1); +pgstat_count_io_op_n(io_object, io_context, io_op, 1, bytes); pgstat_count_io_op_n() is only used locally in pgstat_io.c. I'm OK to keep it as it is used with the time calculations, but wouldn't it be better to make it static in pgstat_io.c instead and not declare it in pgstat.h? Do we really have a need for pgstat_count_io_op() at all at the end or would it be better to change it so as it can handle a number of operations given by the caller? typedef struct PgStat_BktypeIO { +uint64 bytes[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES]; This wastes a bit of memory, while keeping the code simpler to understand. That's better than more element number manipulations, so I'm OK with what you have here. +static inline bool +is_ioop_tracked_in_bytes(IOOp io_op) +{ +Assert((unsigned int) io_op < IOOP_NUM_TYPES); +return io_op >= IOOP_EXTEND; +} This is only used in an assertion of pgstat_count_io_op_n() in pgstat_io.c. Let's also keep it this routine local to the file. The assert to make sure that the callers don't assign bytes to the operations that don't support the counters is a good idea. + * If an IOOp does not tracked in bytes, IO_COL_INVALID is returned. s/does not tracked/is not tracked/. +/* + * Get the number of the column containing IO bytes for the specified IOOp. + * If an IOOp does not tracked in bytes, IO_COL_INVALID is returned. + */ +static io_stat_col +pgstat_get_io_byte_index(IOOp io_op) +{ Makes sense to me, and that's consistent with how the time attributes are handled. -/* - * Hard-code this to the value of BLCKSZ for now. Future values - * could include XLOG_BLCKSZ, once WAL IO is tracked, and constant - * multipliers, once non-block-oriented IO (e.g. temporary file - * IO) is tracked. - */ -values[IO_COL_CONVERSION] = Int64GetDatum(BLCKSZ); Glad to see that gone. +write_bytes bigint +extend_bytes bigint +read_bytes bigint These additions in the documentation are incorrect. All these attributes are of type numeric, not bigint. +Number of units of size BLCKSZ which the process It seems to me that this had better mention 8192 as the default. -- Michael signature.asc Description: PGP signature
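Once the byte counters land, querying them would presumably look something like this; the *_bytes column names are taken from the documentation hunks quoted above, so treat the rest as a sketch:

    select backend_type, object, context,
           reads, read_bytes, writes, write_bytes, extends, extend_bytes
    from pg_stat_io
    where reads > 0 or writes > 0;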
Re: Windows pg_basebackup unable to create >2GB pg_wal.tar tarballs ("could not close file: Invalid argument" when creating pg_wal.tar of size ~ 2^31 bytes)
On Tue, Jan 7, 2025 at 9:54 AM Thomas Munro wrote: > On Tue, Jan 7, 2025 at 5:23 AM Andrew Dunstan wrote: > > Do you have a plan for moving ahead with this? > > I think that all looks good, and I'll go ahead and commit it in the > next day or two. Sorry for the delay. I changed some remaining off_t, because I don't want to worry about which ones can and can't overflow. Might as well get 'em all. I also noticed a warning due to the new lseek() macro, which just required adding it to plperl_system.h. And pushed.
Re: Eager aggregation, take 3
hi. In create_grouping_expr_infos:

    tce = lookup_type_cache(exprType((Node *) tle->expr),
                            TYPECACHE_BTREE_OPFAMILY);
    if (!OidIsValid(tce->btree_opf) ||
        !OidIsValid(tce->btree_opintype))
        return;

    /*
     * Get the operator in the btree's opfamily.
     */
    eq_op = get_opfamily_member(tce->btree_opf,
                                tce->btree_opintype,
                                tce->btree_opintype,
                                BTEqualStrategyNumber);
    if (!OidIsValid(eq_op))
        return;

    eq_opfamilies = get_mergejoin_opfamilies(eq_op);
    if (!eq_opfamilies)
        return;
    btree_opfamily = linitial_oid(eq_opfamilies);

If eq_op is valid, do we really need to call get_mergejoin_opfamilies? Its output should be the same as tce->btree_opf, which we have already checked is valid. In other words, I think a valid eq_op implies that tce->btree_opf is the btree opfamily we need.
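If that reasoning holds, the last three statements could presumably collapse to the following (a sketch of the suggestion, not a tested change):

    /* eq_op came from tce->btree_opf, so that opfamily is the one we want */
    btree_opfamily = tce->btree_opf;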
Re: psql: Option to use expanded mode for various meta-commands
On Wed, Jan 8, 2025 at 8:44 AM Dean Rasheed wrote: > Attached is a more complete patch +1, looks good > So in the end, I decided to just add a sentence to each command's description, keeping it as short as possible. Yes, that makes sense. Cheers, Greg