Re: System username in pg_stat_activity
Hi, On Wed, Jan 10, 2024 at 02:08:03PM +0100, Magnus Hagander wrote: > On Wed, Jan 10, 2024 at 1:44 PM Aleksander Alekseev > wrote: > > > > Hi, > > > > Thanks for the patch. +1 > > > This overlaps with for example the values in pg_stat_gss, but it will > > > include values for authentication methods that don't have their own > > > view such as peer/ident. gss/ssl info will of course still be shown, > > > it is just in more than one place. Yeah, I think that's a good idea. > > It hurts my sense of beauty that usename and authname are of different > > types. But if I'm the only one, maybe we can close our eyes on this. > > Also I suspect that placing usename and authname in a close proximity > > can be somewhat confusing. Perhaps adding authname as the last column > > of the view will solve both nitpicks? > > But it should probably actually be name, given that's the underlying > datatype. I kept changing it around and ended up half way in > between... > > > > ``` > > +/* Information about the authenticated user */ > > +charst_authuser[NAMEDATALEN]; > > ``` > > > > Well, here it's called "authuser" and it looks like the intention was > > to use `name` datatype... I suggest using "authname" everywhere for > > consistency. I think it depends what we want the new field to reflect. If it is the exact same thing as the SYSTEM_USER then I think it has to be text (as the SYSTEM_USER is made of "auth_method:identity"). Now if we want it to be "only" the identity part of it, then the `name` datatype would be fine. I'd vote for the exact same thing as the SYSTEM_USER (means auth_method:identity). > + > + > + authname name > + > + > + The authentication method and identity (if any) that the user > + used to log in. It contains the same value as > +returns in the backend. > + > + I'm fine with auth_method:identity. > + S.authname, What about using system_user as the field name? (because if we keep auth_method:identity it's not really the authname anyway). Also, what about adding a test in say 003_peer.pl to check that the value matches the SYSTEM_USER one? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
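For reference, SYSTEM_USER is built as "auth_method:identity", which is why the text datatype is suggested above for a column carrying the same value. A minimal illustration (the output below is only an example, assuming a peer-authenticated session):

SELECT current_user, system_user;
--  current_user | system_user
-- --------------+---------------
--  postgres     | peer:postgres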
Re: System username in pg_stat_activity
Hi, On Wed, Jan 10, 2024 at 02:59:42PM +0100, Magnus Hagander wrote: > On Wed, Jan 10, 2024 at 2:56 PM Bertrand Drouvot > I definitely think it should be the same. If it's not exactly the > same, then it should be *two* columns, one with auth method and one > with the name. > > And thinking more about it maybe that's cleaner, because that makes it > easier to do things like filter based on auth method? Yeah, that sounds even better. > > > > + > > > + > > > + authname name > > > + > > > + > > > + The authentication method and identity (if any) that the user > > > + used to log in. It contains the same value as > > > +returns in the backend. > > > + > > > + > > > > I'm fine with auth_method:identity. > > > > > +S.authname, > > > > What about using system_user as the field name? (because if we keep > > auth_method:identity it's not really the authname anyway). > > I was worried system_user or sysuser would both be confusing with the > fact that we have usesysid -- which would reference a *different* > sys... If we go the 2 fields way, then what about auth_identity and auth_method? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
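A sketch of the kind of query the two-column variant would enable (auth_method and auth_identity are only candidate column names at this point, not final):

-- Hypothetical: filter sessions by the authentication method they used.
SELECT pid, usename, auth_identity
FROM pg_stat_activity
WHERE auth_method = 'scram-sha-256';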
Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed
Hi, On Wed, Jan 10, 2024 at 05:00:01PM +0300, Alexander Lakhin wrote: > 10.01.2024 12:46, Bertrand Drouvot wrote: > > > Would it be possible to also send the standby logs? > > Yes, please look at the attached logs. This time I've build postgres with > -DWAL_DEBUG and ran tests with TEMP_CONFIG as below: > wal_keep_size=1GB > wal_debug = on > log_autovacuum_min_duration = 0 > log_statement = 'all' > log_min_messages = INFO > > The archive attached contains logs from four runs: > recovery-1-ok -- an example of successful run for reference > recovery-7-pruning and recovery-19-pruning -- failures with a failed > subtest 'activeslot slot invalidation is logged with on-access pruning' > recovery-15-vacuum_pg_class -- a failure with a failed > subtest 'activeslot slot invalidation is logged with vacuum on pg_class' Thanks a lot for the testing! > is an absent message 'obsolete replication slot "row_removal_activeslot"' Looking at recovery-15-vacuum_pg_class/i035_standby_logical_decoding_standby.log here: Yeah, the missing message has to come from InvalidatePossiblyObsoleteSlot(). In case of an active slot we first call ReportSlotInvalidation with the second parameter set to true (to emit the "terminating" message), then SIGTERM the active process and then (later) we should make the other ReportSlotInvalidation() call with the second parameter set to false (to emit the message that we don't see here). So it means InvalidatePossiblyObsoleteSlot() did not trigger the second ReportSlotInvalidation() call. The thing is that it looks like we exited the loop in InvalidatePossiblyObsoleteSlot() because there are more messages from the startup process (789618) after the: " 2024-01-10 11:00:29.109 UTC [789618] LOG: terminating process 790377 to release replication slot "row_removal_activeslot" " one. Do you think you could try to add more debug messages in InvalidatePossiblyObsoleteSlot() to understand why the second call to ReportSlotInvalidation() is not done and IIUC why/how we exited the loop? FWIW, I did try to reproduce by launching pg_recvlogical and then kill -SIGSTOP it. Then producing a conflict, I'm able to get the first message and not the second one (which is expected). But the startup process does not exit the loop, which is expected here. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
Hi, On Wed, Jan 10, 2024 at 12:23:14PM +, Zhijie Hou (Fujitsu) wrote: > On Wednesday, January 10, 2024 2:26 PM Dilip Kumar > wrote: > > > > + LogicalConfirmReceivedLocation(remote_slot->confirmed_lsn); > > + LogicalIncreaseXminForSlot(remote_slot->confirmed_lsn, > > +remote_slot->catalog_xmin); > > + LogicalIncreaseRestartDecodingForSlot(remote_slot->confirmed_lsn, > > + remote_slot->restart_lsn); > > +} > > > > IIUC on the standby we just want to overwrite what we get from primary no? > > If > > so why we are using those APIs that are meant for the actual decoding slots > > where it needs to take certain logical decisions instead of mere > > overwriting? > > I think we don't have a strong reason to use these APIs, but it was > convenient to > use these APIs as they can take care of updating the slots info and will call > functions like, ReplicationSlotsComputeRequiredXmin, > ReplicationSlotsComputeRequiredLSN internally. Or do you prefer directly > overwriting > the fields and call these manually ? I'd vote for using the APIs as I think it will be harder to maintain if we are not using them (means ensure the "direct" overwriting still makes sense over time). FWIW, pg_failover_slots also rely on those APIs from what I can see. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed
Hi, On Thu, Jan 11, 2024 at 01:00:00PM +0300, Alexander Lakhin wrote: > Hi Bertrand, > > 10.01.2024 19:32, Bertrand Drouvot wrote: > > > > > is an absent message 'obsolete replication slot "row_removal_activeslot"' > > Looking at > > recovery-15-vacuum_pg_class/i035_standby_logical_decoding_standby.log here: > > > > Yeah, the missing message has to come from InvalidatePossiblyObsoleteSlot(). > > > > In case of an active slot we first call ReportSlotInvalidation with the > > second > > parameter set to true (to emit the "terminating" message), then SIGTERM the > > active > > process and then (later) we should call the other ReportSlotInvalidation() > > call with the second parameter set to false (to emit the message that we > > don't > > see here). > > > > So it means InvalidatePossiblyObsoleteSlot() did not trigger the second > > ReportSlotInvalidation() > > call. > > I've found a way to reproduce the issue without slowing down a machine or > running multiple tests in parallel. It's enough for this to add a delay to > allow BackgroundWriterMain() to execute LogStandbySnapshot(): > @@ -692,2 +690,3 @@ > $node_primary->safe_psql('testdb', qq[UPDATE prun SET s = 'D';]); > +$node_primary->safe_psql('testdb', qq[SELECT pg_sleep(15);]); > $node_primary->safe_psql('testdb', qq[UPDATE prun SET s = 'E';]); > > With this delay, I get the failure immediately: > $ PROVE_TESTS="t/035*" TEMP_CONFIG=/tmp/extra.config make check -s -C > src/test/recovery > # +++ tap check in src/test/recovery +++ > t/035_standby_logical_decoding.pl .. 47/? > # Failed test 'activeslot slot invalidation is logged with on-access > pruning' > # at t/035_standby_logical_decoding.pl line 227. Thanks a lot for the testing! So I think we have 2 issues here: 1) The one you're mentioning above related to the on-access pruning test: I think the engine behavior is right here and that the test is racy. I'm proposing to bypass the active slot invalidation check for this particular test (as I don't see any "easy" way to take care of this race condition). The active slot invalidation is already well covered in the other tests anyway. I'm proposing the attached v4-0001-Fix-035_standby_logical_decoding.pl-race-conditio.patch for it. 2) The fact that sometimes we're getting a termination message which is not followed by an obsolete one (as discussed in [1]). For this one, I think that InvalidatePossiblyObsoleteSlot() is racy: In case of an active slot we proceed in 2 steps: - terminate the backend holding the slot - report the slot as obsolete This is racy because between the two we release the mutex on the slot, which means the effective_xmin and effective_catalog_xmin could advance during that time. I'm proposing the attached v1-0001-Fix-race-condition-in-InvalidatePossiblyObsoleteS.patch for it. Would it be possible to re-launch your repro (the slow one, not the pg_sleep() one) with both patches applied and see how it goes? (Please note that v4 replaces v3 that you're already using in your tests). If it helps, I'll propose v1-0001-Fix-race-condition-in-InvalidatePossiblyObsoleteS.patch into a dedicated hackers thread. 
[1]: https://www.postgresql.org/message-id/ZZ7GpII4bAYN%2BjT5%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From 8347851b3cc42655cfd81914f0c2f5cc1d22e2b8 Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Thu, 11 Jan 2024 13:36:57 + Subject: [PATCH v4] Fix 035_standby_logical_decoding.pl race conditions We want to ensure that vacuum was able to remove dead rows (aka no other transactions holding back global xmin) before testing for slots invalidation on the standby. Also, get rid of the active slot invalidation check during the on-access pruning check. This test is racy for active slots and active slots invalidations are well covered in other tests. --- .../t/035_standby_logical_decoding.pl | 114 +- 1 file changed, 59 insertions(+), 55 deletions(-) 100.0% src/test/recovery/t/ diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl index 8bc39a5f03..7a50187326 100644 --- a/src/test/recovery/t/035_standby_logical_decoding.pl +++ b/src/test/recovery/t/035_standby_logical_decoding.pl @@ -191,9 +191,9 @@ sub check_slots_conflict_reason # Drop the slots, re-create them, change hot_standby_feedback, # check xmin and catalog_xmin values, make slot active and reset stat. -sub reactive_slots_change_hfs_and_wait_f
Re: Synchronizing slots from primary to standby
Hi, On Thu, Jan 11, 2024 at 04:22:56PM +0530, Amit Kapila wrote: > On Tue, Jan 9, 2024 at 6:39 PM Amit Kapila wrote: > > > > +static bool > > +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot) > > { > > ... > > + /* Slot ready for sync, so sync it. */ > > + else > > + { > > + /* > > + * Sanity check: With hot_standby_feedback enabled and > > + * invalidations handled appropriately as above, this should never > > + * happen. > > + */ > > + if (remote_slot->restart_lsn < slot->data.restart_lsn) > > + elog(ERROR, > > + "cannot synchronize local slot \"%s\" LSN(%X/%X)" > > + " to remote slot's LSN(%X/%X) as synchronization" > > + " would move it backwards", remote_slot->name, > > + LSN_FORMAT_ARGS(slot->data.restart_lsn), > > + LSN_FORMAT_ARGS(remote_slot->restart_lsn)); > > ... > > } > > > > I was thinking about the above code in the patch and as far as I can > > think this can only occur if the same name slot is re-created with > > prior restart_lsn after the existing slot is dropped. Normally, the > > newly created slot (with the same name) will have higher restart_lsn > > but one can mimic it by copying some older slot by using > > pg_copy_logical_replication_slot(). > > > > I don't think as mentioned in comments even if hot_standby_feedback is > > temporarily set to off, the above shouldn't happen. It can only lead > > to invalidated slots on standby. I also think so. > > > > To close the above race, I could think of the following ways: > > 1. Drop and re-create the slot. > > 2. Emit LOG/WARNING in this case and once remote_slot's LSN moves > > ahead of local_slot's LSN then we can update it; but as mentioned in > > your previous comment, we need to update all other fields as well. If > > we follow this then we probably need to have a check for catalog_xmin > > as well. IIUC, this would be a sync slot (so not usable until promotion) that could not be used anyway (invalidated), so I'll vote for drop / re-create then. > > Now, related to this the other case which needs some handling is what > > if the remote_slot's restart_lsn is greater than local_slot's > > restart_lsn but it is a re-created slot with the same name. In that > > case, I think the other properties like 'two_phase', 'plugin' could be > > different. So, is simply copying those sufficient or do we need to do > > something else as well? > > > I'm not sure to follow here. If the remote slot is re-created then it would be also dropped / re-created locally, or am I missing something? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: System username in pg_stat_activity
Hi, On Thu, Jan 11, 2024 at 02:24:58PM +0100, Magnus Hagander wrote: > On Wed, Jan 10, 2024 at 3:12 PM Bertrand Drouvot > wrote: > > > > If we go the 2 fields way, then what about auth_identity and auth_method > > then? > > > Here is an updated patch based on this idea. Thanks! + + + auth_method text + + + The authentication method used for authenticating the connection, or + NULL for background processes. + I'm wondering if it would make sense to populate it for parallel workers too. I think it's doable thanks to d951052, but I'm not sure it's worth it (one could join based on the leader_pid though). OTOH that would be consistent with how the SYSTEM_USER behaves with parallel workers (it's populated). + + auth_identity text + + + The identity (if any) that the user presented during the authentication + cycle before they were assigned a database role. Contains the same + value as Same remark regarding the parallel workers case +: - Would it be better to use the `name` datatype for auth_identity? - what about "Contains the same value as the identity part in "? + /* +* Trust doesn't set_authn_id(), but we still need to store the +* auth_method +*/ + MyClientConnectionInfo.auth_method = uaTrust; +1, I think it is useful here to provide "trust" and not a NULL value in the context of this patch. +# pg_stat_activity shold contain trust and empty string for trust auth typo: s/shold/should/ +# Users with md5 auth should show both auth method and name in pg_stat_activity what about "show both auth method and identity"? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
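Regarding the "join based on the leader_pid" remark above, a minimal sketch of what such a query could look like if the new columns were only populated for the leader process (auth_method and auth_identity are the columns proposed in this thread, not existing ones):

-- Fetch a parallel worker's authentication info from its leader.
SELECT w.pid AS worker_pid, l.auth_method, l.auth_identity
FROM pg_stat_activity w
JOIN pg_stat_activity l ON l.pid = w.leader_pid
WHERE w.backend_type = 'parallel worker';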
Re: Synchronizing slots from primary to standby
Hi, On Fri, Jan 12, 2024 at 08:42:39AM +0530, Amit Kapila wrote: > On Thu, Jan 11, 2024 at 9:11 PM Bertrand Drouvot > wrote: > > > > On Thu, Jan 11, 2024 at 04:22:56PM +0530, Amit Kapila wrote: > > > > > > > > To close the above race, I could think of the following ways: > > > > 1. Drop and re-create the slot. > > > > 2. Emit LOG/WARNING in this case and once remote_slot's LSN moves > > > > ahead of local_slot's LSN then we can update it; but as mentioned in > > > > your previous comment, we need to update all other fields as well. If > > > > we follow this then we probably need to have a check for catalog_xmin > > > > as well. > > > > IIUC, this would be a sync slot (so not usable until promotion) that could > > not be used anyway (invalidated), so I'll vote for drop / re-create then. > > > > No, it can happen for non-sync slots as well. Yeah, I meant that we could decide to drop/re-create only for sync slots. > > > > > Now, related to this the other case which needs some handling is what > > > > if the remote_slot's restart_lsn is greater than local_slot's > > > > restart_lsn but it is a re-created slot with the same name. In that > > > > case, I think the other properties like 'two_phase', 'plugin' could be > > > > different. So, is simply copying those sufficient or do we need to do > > > > something else as well? > > > > > > > > > > > I'm not sure to follow here. If the remote slot is re-created then it would > > be also dropped / re-created locally, or am I missing something? > > > > As our slot-syncing mechanism is asynchronous (from time to time we > check the slot information on primary), isn't it possible that the > same name slot is dropped and recreated between slot-sync worker's > checks? > Yeah, I should have thought harder ;-) So for this case: if we had an easy way to detect that a remote slot has been dropped/re-created, then I think we would also drop and re-create it on the standby too. If so, I think we should then update all the fields (that we're currently updating in the "create locally" case) when we detect that (at least) one of the following differs: - dboid - plugin - two_phase Maybe the "best" approach would be to have a way to detect that a slot has been re-created on the primary (but that would mean relying on more than the slot name to "identify" a slot and probably add a new member to the struct to do so). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
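For context, the slot properties mentioned above are all exposed in pg_replication_slots, so a comparison between the primary's slot and the synced copy could in principle look at something like the query below (an illustration of which fields are involved, not the mechanism the patch actually uses):

-- Fields that could help decide whether a same-named slot still looks like
-- the "same" slot or has been re-created with different options.
SELECT slot_name, database, plugin, two_phase, restart_lsn, catalog_xmin
FROM pg_replication_slots
WHERE slot_type = 'logical';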
Re: Synchronizing slots from primary to standby
Hi, On Fri, Jan 12, 2024 at 03:46:00AM +, Zhijie Hou (Fujitsu) wrote: > On Thursday, January 11, 2024 11:42 PM Bertrand Drouvot > wrote: > > Hi, > > > On Thu, Jan 11, 2024 at 04:22:56PM +0530, Amit Kapila wrote: > > IIUC, this would be a sync slot (so not usable until promotion) that could > > not be > > used anyway (invalidated), so I'll vote for drop / re-create then. > > Such race can happen when user drop and re-create the same failover slot on > primary as well. For example, user dropped one failover slot and them > immediately created a new one by copying from an old slot(using > pg_copy_logical_replication_slot). Then the slotsync worker will find the > restart_lsn of this slot go backwards. > > The steps: > > SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', > 'pgoutput', false, false, true); > SELECT 'init' FROM pg_create_logical_replication_slot('test', 'pgoutput', > false, false, true); > > - Advance the restart_lsn of 'test' slot > CREATE TABLE test2(a int); > INSERT INTO test2 SELECT generate_series(1,1,1); > SELECT slot_name FROM pg_replication_slot_advance('test', > pg_current_wal_lsn()); > > - re-create the test slot but based on the old isolation_slot. > SELECT pg_drop_replication_slot('test'); > SELECT 'copy' FROM pg_copy_logical_replication_slot('isolation_slot', 'test'); > > Then the restart_lsn of 'test' slot will go backwards. Yeah, that's right. BTW, I think it's worth to add those "corner cases" in the TAP tests related to the sync slot feature (the more coverage the better). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed
Hi, On Fri, Jan 12, 2024 at 07:01:55AM +0900, Michael Paquier wrote: > On Thu, Jan 11, 2024 at 11:00:01PM +0300, Alexander Lakhin wrote: > > Bertrand, I've relaunched tests in the same slowed down VM with both > > patches applied (but with no other modifications) and got a failure > > with pg_class, similar to what we had seen before: > > 9 # Failed test 'activeslot slot invalidation is logged with vacuum > > on pg_class' > > 9 # at t/035_standby_logical_decoding.pl line 230. > > > > Please look at the logs attached (I see there Standby/RUNNING_XACTS near > > 'invalidating obsolete replication slot "row_removal_inactiveslot"'). Thanks! For this one, the "good" news is that it looks like that we don’t see the "terminating" message not followed by an "obsolete" message (so the engine behaves correctly) anymore. There is simply nothing related to the row_removal_activeslot at all (the catalog_xmin advanced and there is no conflict). And I agree that this is due to the Standby/RUNNING_XACTS that is "advancing" the catalog_xmin of the active slot. > Standby/RUNNING_XACTS is exactly why 039_end_of_wal.pl uses wal_level > = minimal, because these lead to unpredictible records inserted, > impacting the reliability of the tests. We cannot do that here, > obviously. That may be a long shot, but could it be possible to tweak > the test with a retry logic, retrying things if such a standby > snapshot is found because we know that the invalidation is not going > to work anyway? I think it all depends what the xl_running_xacts does contain (means does it "advance" or not the catalog_xmin in our case). In our case it does advance it (should it occurs) due to the "select txid_current()" that is done in wait_until_vacuum_can_remove() in 035_standby_logical_decoding.pl. I suggest to make use of txid_current_snapshot() instead (that does not produce a Transaction/COMMIT wal record, as opposed to txid_current()). I think that it could be "enough" for our case here, and it's what v5 attached is now doing. Let's give v5 a try? (please apply v1-0001-Fix-race-condition-in-InvalidatePossiblyObsoleteS.patch too). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From b1c316a85a2dd43a5ab33adbda1ee82020d84bb1 Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Thu, 11 Jan 2024 13:36:57 + Subject: [PATCH v5] Fix 035_standby_logical_decoding.pl race conditions We want to ensure that vacuum was able to remove dead rows (aka no other transactions holding back global xmin) before testing for slots invalidation on the standby. Also, get rid of the active slot invalidation check during the on-access pruning check. This test is racy for active slots and active slots invalidations are well covered in other tests. --- .../t/035_standby_logical_decoding.pl | 114 +- 1 file changed, 59 insertions(+), 55 deletions(-) 100.0% src/test/recovery/t/ diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl index 8bc39a5f03..dd4149c8bc 100644 --- a/src/test/recovery/t/035_standby_logical_decoding.pl +++ b/src/test/recovery/t/035_standby_logical_decoding.pl @@ -191,9 +191,9 @@ sub check_slots_conflict_reason # Drop the slots, re-create them, change hot_standby_feedback, # check xmin and catalog_xmin values, make slot active and reset stat. 
-sub reactive_slots_change_hfs_and_wait_for_xmins +sub recreate_slots_change_hfs_and_wait_for_xmins { - my ($previous_slot_prefix, $slot_prefix, $hsf, $invalidated) = @_; + my ($previous_slot_prefix, $slot_prefix, $hsf, $invalidated, $activate) = @_; # drop the logical slots drop_logical_slots($previous_slot_prefix); @@ -203,8 +203,11 @@ sub reactive_slots_change_hfs_and_wait_for_xmins change_hot_standby_feedback_and_wait_for_xmins($hsf, $invalidated); - $handle = - make_slot_active($node_standby, $slot_prefix, 1, \$stdout, \$stderr); + if ($activate) + { + $handle = + make_slot_active($node_standby, $slot_prefix, 1, \$stdout, \$stderr); + } # reset stat: easier to check for confl_active_logicalslot in pg_stat_database_conflicts $node_standby->psql('testdb', q[select pg_stat_reset();]); @@ -213,7 +216,7 @@ sub reactive_slots_change_hfs_and_wait_for_xmins # Check invalidation in the logfile and in pg_stat_database_conflicts sub check_for_invalidation { - my ($slot_prefix, $log_start, $test_name) = @_; + my ($slot_prefix, $log_start, $test_name, $activated) = @_; my $active_slot = $slot_prefix . 'activeslot'; my $inactive_slot = $slot_prefix . 'inactiveslot'; @@ -230,14 +233,33 @@ sub check_for_invalidation "activesl
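To make the txid_current() versus snapshot-based approach concrete: txid_current() assigns an XID, so its transaction ends up writing a commit record to WAL, while just reading the current snapshot does not generate any WAL. The horizon check used by v5 therefore boils down to something like the following (a simplified sketch of what the Perl helper wait_until_vacuum_can_remove() polls for):

-- Capture the current xmin horizon without generating WAL.
SELECT pg_snapshot_xmin(pg_current_snapshot());
-- Then poll until pg_snapshot_xmin(pg_current_snapshot()) has advanced past
-- the captured value before running the VACUUM under test.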
Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed
Hi, On Fri, Jan 12, 2024 at 02:00:01PM +0300, Alexander Lakhin wrote: > Hi, > > 12.01.2024 10:15, Bertrand Drouvot wrote: > > > > For this one, the "good" news is that it looks like that we don’t see the > > "terminating" message not followed by an "obsolete" message (so the engine > > behaves correctly) anymore. > > > > There is simply nothing related to the row_removal_activeslot at all (the > > catalog_xmin > > advanced and there is no conflict). > > Yes, judging from all the failures that we see now, it looks like the > 0001-Fix-race-condition...patch works as expected. Yeah, thanks for confirming, I'll create a dedicated hackers thread for that one. > > Let's give v5 a try? (please apply > > v1-0001-Fix-race-condition-in-InvalidatePossiblyObsoleteS.patch > > too). > > Unfortunately, I've got the failure again (please see logs attached). > (_primary.log can confirm that I have used exactly v5 — I see no > txid_current() calls there...) Okay ;-( Thanks for the testing. Then I can think of: 1) Michael's proposal up-thread (means tweak the test with a retry logic, retrying things if such a standby snapshot is found). 2) Don't report a test error for active slots in case their catalog_xmin advanced. I'd vote for 2) as: - this is a corner case and the vast majority of the animals don't report any issues (means the active slot conflict detection is already well covered). - even on the same animal it should be pretty rare for the active slot conflict detection to not be covered at all (and the "failing" one would probably be moving over time). - It may be possible that 1) ends up failing (as we'd need to put a limit on the retry logic anyhow). What do you think? And BTW, looking closely at wait_until_vacuum_can_remove(), I'm not sure it's fully correct, so I'll give it another look. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Fix race condition in InvalidatePossiblyObsoleteSlot()
Hi hackers, While working on [1], we discovered (thanks Alexander for the testing) that a conflicting active logical slot on a standby could be "terminated" without leading to an "obsolete" message (see [2]). Indeed, in case of an active slot we proceed in 2 steps in InvalidatePossiblyObsoleteSlot(): - terminate the backend holding the slot - report the slot as obsolete This is racy because between the two we release the mutex on the slot, which means that the slot's effective_xmin and effective_catalog_xmin could advance during that time (leading to exiting the loop). I think that holding the mutex longer is not an option (given what we'd have to do while holding it), so the attached proposal is to instead record the effective_xmin and effective_catalog_xmin that were used during the backend termination. [1]: https://www.postgresql.org/message-id/flat/bf67e076-b163-9ba3-4ade-b9fc51a3a8f6%40gmail.com [2]: https://www.postgresql.org/message-id/7c025095-5763-17a6-8c80-899b35bd0459%40gmail.com Looking forward to your feedback, Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From 2e4d580af4a1e6b2cb000d4e9db2c42e40aa4cd2 Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Thu, 11 Jan 2024 13:57:53 + Subject: [PATCH v1] Fix race condition in InvalidatePossiblyObsoleteSlot() In case of an active slot we proceed in 2 steps: - terminate the backend holding the slot - report the slot as obsolete This is racy because between the two we release the mutex on the slot, which means the effective_xmin and effective_catalog_xmin could advance during that time. Holding the mutex longer is not an option (given what we'd to do while holding it) so record the effective_xmin and effective_catalog_xmin instead that was used during the backend termination.
--- src/backend/replication/slot.c | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) 100.0% src/backend/replication/ diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index 52da694c79..2e34cca0e8 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -1352,6 +1352,9 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause, { int last_signaled_pid = 0; bool released_lock = false; + bool terminated = false; + XLogRecPtr initial_effective_xmin; + XLogRecPtr initial_catalog_effective_xmin; for (;;) { @@ -1396,15 +1399,20 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause, case RS_INVAL_HORIZON: if (!SlotIsLogical(s)) break; + if (!terminated) + { + initial_effective_xmin = s->effective_xmin; + initial_catalog_effective_xmin = s->effective_catalog_xmin; + } /* invalid DB oid signals a shared relation */ if (dboid != InvalidOid && dboid != s->data.database) break; - if (TransactionIdIsValid(s->effective_xmin) && - TransactionIdPrecedesOrEquals(s->effective_xmin, + if (TransactionIdIsValid(initial_effective_xmin) && + TransactionIdPrecedesOrEquals(initial_effective_xmin, snapshotConflictHorizon)) conflict = cause; - else if (TransactionIdIsValid(s->effective_catalog_xmin) && - TransactionIdPrecedesOrEquals(s->effective_catalog_xmin, + else if (TransactionIdIsValid(initial_catalog_effective_xmin) && + TransactionIdPrecedesOrEquals(initial_catalog_effective_xmin, snapshotConflictHorizon)) conflict = cause; break; @@ -1499,6 +1507,7 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause, (void) kill(active_pid, SIGTERM); last_signaled_pid = active_pid; +terminated = true; } /* Wait until the slot is released. */ -- 2.34.1
Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed
Hi, On Mon, Jan 15, 2024 at 01:11:26PM +0900, Michael Paquier wrote: > On Sun, Jan 14, 2024 at 11:08:39PM -0500, Tom Lane wrote: > > Michael Paquier writes: > >> While thinking about that, a second idea came into my mind: a > >> superuser-settable developer GUC to disable such WAL records to be > >> generated within certain areas of the test. This requires a small > >> implementation, but nothing really huge, while being portable > >> everywhere. And it is not the first time I've been annoyed with these > >> records when wanting a predictible set of WAL records for some test > >> case. > > > > Hmm ... I see what you are after, but to what extent would this mean > > that what we are testing is not our real-world behavior? > > Don't think so. We don't care much about these records when it comes > to checking slot invalidation scenarios with a predictible XID > horizon, AFAIK. Yeah, we want to test slot invalidation behavior so we need to ensure that such an invalidation occurs (which is not the case if we get a xl_running_xacts in the middle) in the first place. OTOH I also see Tom's point: for example I think we'd not have discovered [1] (outside of the field) with such a developer GUC in place. We did a few things in this thread, so to sum up what we've discovered: - a race condition in InvalidatePossiblyObsoleteSlot() (see [1]) - we need to launch the vacuum(s) only if we are sure we got a newer XID horizon (proposal in v6 attached) - we need a way to control how frequently xl_running_xacts are emitted (to ensure they are not triggered in the middle of an active slot invalidation test). I'm not sure it's possible to address Tom's concern and keep the test "predictable". So, I think I'd vote for Michael's proposal to implement a superuser-settable developer GUC (as sending a SIGSTOP on the bgwriter (and bypass $windows_os) would still not address Tom's concern anyway). Another option would be to "sacrifice" the full predictability of the test (in favor of real-world behavior testing)? [1]: https://www.postgresql.org/message-id/ZaTjW2Xh%2BTQUCOH0%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From a35a308626b6b61c3994531cbf89fe835f4842c2 Mon Sep 17 00:00:00 2001 From: bdrouvot Date: Tue, 9 Jan 2024 05:08:35 + Subject: [PATCH v6] Fix 035_standby_logical_decoding.pl race condition We want to ensure that vacuum was able to remove dead rows (aka no other transactions holding back global xmin) before testing for slots invalidation on the standby. --- .../t/035_standby_logical_decoding.pl | 59 ++- 1 file changed, 30 insertions(+), 29 deletions(-) 100.0% src/test/recovery/t/ diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl index 8bc39a5f03..9bfa8833b5 100644 --- a/src/test/recovery/t/035_standby_logical_decoding.pl +++ b/src/test/recovery/t/035_standby_logical_decoding.pl @@ -238,6 +238,25 @@ sub check_for_invalidation ) or die "Timed out waiting confl_active_logicalslot to be updated"; } +# Launch $sql and wait for a new snapshot that has a newer horizon before +# doing the vacuum with $vac_option on $to_vac.
+sub wait_until_vacuum_can_remove +{ + my ($vac_option, $sql, $to_vac) = @_; + + # get the current xid horizon + my $xid_horizon = $node_primary->safe_psql('testdb', qq[select pg_snapshot_xmin(pg_current_snapshot());]); + + # launch our sql + $node_primary->safe_psql('testdb', qq[$sql]); + + $node_primary->poll_query_until('testdb', + "SELECT (select pg_snapshot_xmin(pg_current_snapshot())::text::int - $xid_horizon) > 0") + or die "new snapshot does not have a newer horizon"; + + $node_primary->safe_psql('testdb', qq[VACUUM $vac_option verbose $to_vac; + INSERT INTO flush_wal DEFAULT VALUES; -- see create table flush_wal;]); +} # Initialize primary node @@ -248,6 +267,7 @@ $node_primary->append_conf( wal_level = 'logical' max_replication_slots = 4 max_wal_senders = 4 +autovacuum = off }); $node_primary->dump_info; $node_primary->start; @@ -468,13 +488,8 @@ reactive_slots_change_hfs_and_wait_for_xmins('behaves_ok_', 'vacuum_full_', 0, 1); # This should trigger the conflict -$node_primary->safe_psql( - 'testdb', qq[ - CREATE TABLE conflict_test(x integer, y text); - DROP TABLE conflict_test; - VACUUM full pg_class; - INSERT INTO flush_wal DEFAULT VALUES; -- see create table flush_wal -]); +wait_until_vacuum_can_remove('full', 'CREATE TABLE confl
Re: Synchronizing slots from primary to standby
Hi, On Sat, Jan 13, 2024 at 10:05:52AM +0530, Amit Kapila wrote: > On Fri, Jan 12, 2024 at 12:07 PM Bertrand Drouvot > wrote: > > Maybe the "best" approach would be to have a way to detect that a slot has > > been > > re-created on the primary (but that would mean rely on more than the slot > > name > > to "identify" a slot and probably add a new member to the struct to do so). > > > > Right, I also thought so but not sure further complicating the slot > machinery is worth detecting this case explicitly. If we see any > problem with the idea discussed then we may need to think something > along those lines. Yeah, let's see. On one side that would require extra work but on the other side that would also probably simplify (and less bug prone in the mid-long term?) other parts of the code. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: System username in pg_stat_activity
Hi, On Fri, Jan 12, 2024 at 05:16:53PM +0100, Magnus Hagander wrote: > On Thu, Jan 11, 2024 at 5:55 PM Bertrand Drouvot > wrote: > > > > I'm wondering if it would make sense to populate it for parallel workers > > too. > > I think it's doable thanks to d951052, but I'm not sure it's worth it (one > > could > > join based on the leader_pid though). OTOH that would be consistent with > > how the SYSTEM_USER behaves with parallel workers (it's populated). > > I guess one could conceptually argue that "authentication happens int > he leader". But we do populate it with the other user records, and > it'd be weird if this one was excluded. > > The tricky thing is that pgstat_bestart() is called long before we > deserialize the data. But from what I can tell it should be safe to > change it per the attached? That should be AFAICT an extremely short > window of time longer before we report it, not enough to matter. Thanks! Yeah, that seems reasonable to me. Also, I think we should remove the "MyProcPort" test here then (looking at v3): + if (MyProcPort && MyClientConnectionInfo.authn_id) + strlcpy(lbeentry.st_auth_identity, MyClientConnectionInfo.authn_id, NAMEDATALEN); + else + MemSet(&lbeentry.st_auth_identity, 0, sizeof(lbeentry.st_auth_identity)); to get the st_auth_identity propagated to the parallel workers. > > > > Same remark regarding the parallel workers case +: > > > > - Would it be better to use the `name` datatype for auth_identity? > > I've been going back and forth. And I think my conclusion is that it's > not a postgres identifier, so it shouldn't be. See the earlier > discussion, and for example that that's what we do for cert names when > SSL is used. Yeah, Okay let's keep text then. > > > - what about "Contains the same value as the identity part in > linkend="system-user" />"? Not sure, but looks like you missed this comment? > > > > + /* > > +* Trust doesn't set_authn_id(), but we still need > > to store the > > +* auth_method > > +*/ > > + MyClientConnectionInfo.auth_method = uaTrust; > > > > +1, I think it is useful here to provide "trust" and not a NULL value in the > > context of this patch. > > Yeah, that's probably "independently correct", but actually useful here. +1 > > +# Users with md5 auth should show both auth method and name in > > pg_stat_activity > > > > what about "show both auth method and identity"? > > Good spot, yeah, I changed it over to identity everywhere else so it > should be here as well. Did you forget to share the new revision (aka v4)? I can only see the "reorder_parallel_worker_bestart.patch" attached. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
Hi, On Sat, Jan 13, 2024 at 12:53:50PM +0530, Amit Kapila wrote: > On Fri, Jan 12, 2024 at 5:50 PM shveta malik wrote: > > > > There are multiple approaches discussed and tried when it comes to > > starting a slot-sync worker. I am summarizing all here: > > > > 1) Make slotsync worker as an Auxiliary Process (like checkpointer, > > walwriter, walreceiver etc). The benefit this approach provides is, it > > can control begin and stop in a more flexible way as each auxiliary > > process could have different checks before starting and can have > > different stop conditions. But it needs code duplication for process > > management(start, stop, crash handling, signals etc) and currently it > > does not support db-connection smoothly (none of the auxiliary process > > has one so far) > > > > As slotsync worker needs to perform transactions and access syscache, > we can't make it an auxiliary process as that doesn't initialize the > required stuff like syscache. Also, see the comment "Auxiliary > processes don't run transactions ..." in AuxiliaryProcessMain() which > means this is not an option. > > > > > 2) Make slotsync worker as a 'special' process like AutoVacLauncher > > which is neither an Auxiliary process nor a bgworker one. It allows > > db-connection and also provides flexibility to have start and stop > > conditions for a process. > > > > Yeah, due to these reasons, I think this option is worth considering > and another plus point is that this allows us to make enable_syncslot > a PGC_SIGHUP GUC rather than a PGC_POSTMASTER. > > > > > 3) Make slotysnc worker a bgworker. Here we just need to register our > > process as a bgworker (RegisterBackgroundWorker()) by providing a > > relevant start_time and restart_time and then the process management > > is well taken care of. It does not need any code-duplication and > > allows db-connection smoothly in registered process. The only thing it > > lacks is that it does not provide flexibility of having > > start-condition which then makes us to have 'enable_syncslot' as > > PGC_POSTMASTER parameter rather than PGC_SIGHUP. Having said this, I > > feel enable_syncslot is something which will not be changed frequently > > and with the benefits provided by bgworker infra, it seems a > > reasonably good option to choose this approach. > > > > I agree but it may be better to make it a PGC_SIGHUP parameter. > > > 4) Another option is to have Logical Replication Launcher(or a new > > process) to launch slot-sync worker. But going by the current design > > where we have only 1 slotsync worker, it may be an overhead to have an > > additional manager process maintained. > > > > I don't see any good reason to have an additional launcher process here. > > > > > Thus weighing pros and cons of all these options, we have currently > > implemented the bgworker approach (approach 3). Any feedback is > > welcome. > > > > I vote to go for (2) unless we face difficulties in doing so but (3) > is also okay especially if others also think so. > Yeah, I think that (2) would be the "ideal" one but (3) is fine too. I think that if we think/see that (2) is too "complicated"/long to implement maybe we could do (3) initially and switch to (2) later. What I mean by that is that I don't think that not doing (2) should be a blocker. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
Hi, On Tue, Jan 16, 2024 at 05:27:05PM +0530, shveta malik wrote: > PFA v62. Details: Thanks! > v62-003: > It is a new patch which attempts to implement slot-sync worker as a > special process which is neither a bgworker nor an Auxiliary process. > Here we get the benefit of converting enable_syncslot to a PGC_SIGHUP > Guc rather than PGC_POSTMASTER. We launch the slot-sync worker only if > it is hot-standby and 'enable_syncslot' is ON. The implementation looks reasonable to me (from what I can see, some parts are copy/paste from an already existing "special" process and some parts are "sync slot" specific), which fully makes sense. A few remarks: 1 === +* Was it the slot sycn worker? Typo: sycn 2 === +* ones), and no walwriter, autovac launcher or bgwriter or slot sync Instead? "* ones), and no walwriter, autovac launcher, bgwriter or slot sync" 3 === + * restarting slot slyc worker. If stopSignaled is set, the worker will Typo: slyc 4 === +/* Flag to tell if we are in an slot sync worker process */ s/an/a/ ? 5 === (coming from v62-0002) + Assert(tuplestore_tuple_count(res->tuplestore) == 1); Is it even possible for the related query to not return only one row? (I think the "count" ensures it). 6 === if (conninfo_changed || primary_slotname_changed || + old_enable_syncslot != enable_syncslot || (old_hot_standby_feedback != hot_standby_feedback)) { ereport(LOG, errmsg("slot sync worker will restart because of" " a parameter change")); I don't think "slot sync worker will restart" is true if one changes enable_syncslot from on to off. IMHO, v62-003 is in good shape and could be merged into v62-002 (that would ease the review). But let's wait to see if others think differently. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Fix race condition in InvalidatePossiblyObsoleteSlot()
Hi, On Thu, Jan 18, 2024 at 04:59:39PM +0530, Bharath Rupireddy wrote: > IIUC, the issue is that we terminate the process holding the > replication slot, and the conflict cause that we recorded before > terminating the process changes in the next iteration due to the > advancement in effective_xmin and/or effective_catalog_xmin. Thanks for looking at it! Yeah, and that could lead to no conflict detected anymore (like in the case [2] reported up-thread). > FWIW, a test code something like [1], can help detect above race issues, > right? I think so and I've added it in v2 attached (except that it uses the new "terminated" variable, see below), thanks! > Some comments on the patch: > > 1. > last_signaled_pid = active_pid; > +terminated = true; > } > > Why is a separate variable needed? Can't last_signaled_pid != 0 enough > to determine that a process was terminated earlier? Yeah probably, I thought about it but preferred to add a new variable for this purpose for clarity and avoid race conditions (in case futur patches "touch" the last_signaled_pid anyhow). I was thinking that this part of the code is already not that easy. > 2. If my understanding of the racy behavior is right, can the same > issue happen due to the advancement in restart_lsn? I'm not sure as I never saw it but it should not hurt to also consider this "potential" case so it's done in v2 attached. > I'm not sure if it > can happen at all, but I think we can rely on previous conflict reason > instead of previous effective_xmin/effective_catalog_xmin/restart_lsn. I'm not sure what you mean here. I think we should still keep the "initial" LSN so that the next check done with it still makes sense. The previous conflict reason as you're proposing also makes sense to me but for another reason: PANIC in case the issue still happen (for cases we did not think about, means not covered by what the added previous LSNs are covering). > 3. Is there a way to reproduce this racy issue, may be by adding some > sleeps or something? If yes, can you share it here, just for the > records and verify the whatever fix provided actually works? Alexander was able to reproduce it on a slow machine and the issue was not there anymore with v1 in place. I think it's tricky to reproduce as it would need the slot to advance between the 2 checks. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From 4665945cba163bcb4beadc6391ee65c755e566d8 Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Thu, 11 Jan 2024 13:57:53 + Subject: [PATCH v2] Fix race condition in InvalidatePossiblyObsoleteSlot() In case of an active slot we proceed in 2 steps: - terminate the backend holding the slot - report the slot as obsolete This is racy because between the two we release the mutex on the slot, which means the effective_xmin and effective_catalog_xmin could advance during that time. Holding the mutex longer is not an option (given what we'd to do while holding it) so record the previous LSNs instead that were used during the backend termination. 
--- src/backend/replication/slot.c | 37 -- 1 file changed, 31 insertions(+), 6 deletions(-) 100.0% src/backend/replication/ diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index 52da694c79..f136916285 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -1352,6 +1352,11 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause, { int last_signaled_pid = 0; bool released_lock = false; + bool terminated = false; + XLogRecPtr initial_effective_xmin = InvalidXLogRecPtr; + XLogRecPtr initial_catalog_effective_xmin = InvalidXLogRecPtr; + XLogRecPtr initial_restart_lsn = InvalidXLogRecPtr; + ReplicationSlotInvalidationCause conflict_prev = RS_INVAL_NONE; for (;;) { @@ -1386,11 +1391,18 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause, */ if (s->data.invalidated == RS_INVAL_NONE) { + if (!terminated) + { +initial_restart_lsn = s->data.restart_lsn; +initial_effective_xmin = s->effective_xmin; +initial_catalog_effective_xmin = s->effective_catalog_xmin; + } + switch (cause) { case RS_INVAL_WAL_REMOVED: - if (s->data.restart_lsn != InvalidXLogRecPtr && - s->data.restart_lsn < oldestLSN) + if (initial_restart_lsn != InvalidXLogRecPtr && + initial_restart_lsn < oldestLSN) conflict = cause; break; case RS_INVAL_HORIZON: @@ -1399,12 +1411,12 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause, /* invalid DB oid signals a shared relation */ if (dboid
Re: System username in pg_stat_activity
Hi, On Thu, Jan 18, 2024 at 04:01:33PM +0100, Magnus Hagander wrote: > On Mon, Jan 15, 2024 at 11:17 AM Bertrand Drouvot > > Did you forget to share the new revision (aka v4)? I can only see the > > "reorder_parallel_worker_bestart.patch" attached. > > I did. Here it is, and also including that suggested docs fix as well > as a rebase on current master. > Thanks! Just a few comments: 1 === + The authentication method used for authenticating the connection, or + NULL for background processes. What about? "NULL for background processes (except for parallel workers which inherit this information from their leader process)" 2 === + cycle before they were assigned a database role. Contains the same + value as the identity part in , or NULL + for background processes. Same comment about parallel workers. 3 === +# pg_stat_activity should contain trust and empty string for trust auth +$res = $node->safe_psql( + 'postgres', + "SELECT auth_method, auth_identity='' FROM pg_stat_activity WHERE pid=pg_backend_pid()", + connstr => "user=scram_role"); +is($res, 'trust|t', 'Users with trust authentication should show correctly in pg_stat_activity'); + +# pg_stat_activity should contain NULL for auth of background processes +# (test is a bit out of place here, but this is where the other pg_stat_activity.auth* tests are) +$res = $node->safe_psql( + 'postgres', + "SELECT auth_method IS NULL, auth_identity IS NULL FROM pg_stat_activity WHERE backend_type='checkpointer'", +); +is($res, 't|t', 'Background processes should show NULL for auth in pg_stat_activity'); What do you think about testing the parallel workers cases too? (I'm not 100% sure it's worth the extra complexity though). Apart from those 3, it looks good to me. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
Hi, On Fri, Jan 19, 2024 at 11:46:51AM +0530, shveta malik wrote: > On Fri, Jan 19, 2024 at 11:23 AM Amit Kapila wrote: > > > > > > 5 === (coming from v62-0002) > > > > + Assert(tuplestore_tuple_count(res->tuplestore) == 1); > > > > > > > > Is it even possible for the related query to not return only one row? > > > > (I think the > > > > "count" ensures it). > > > > > > I think you are right. This assertion was added sometime back on the > > > basis of feedback on hackers. Let me review that again. I can consider > > > this comment in the next version. > > > > > > > OTOH, can't we keep the assert as it is but remove "= 1" from > > "count(*) = 1" in the query. There shouldn't be more than one slot > > with same name on the primary. Or, am I missing something? > > There will be 1 record max and 0 record if the primary_slot_name is > invalid. I think we'd have exactly one record in all the cases (due to the count): postgres=# SELECT pg_is_in_recovery(), count(*) FROM pg_replication_slots WHERE 1 = 2; pg_is_in_recovery | count ---+--- f | 0 (1 row) postgres=# SELECT pg_is_in_recovery(), count(*) FROM pg_replication_slots WHERE 1 = 1; pg_is_in_recovery | count ---+--- f | 1 (1 row) > Keeping 'count(*)=1' gives the benefit that it will straight > away give us true/false indicating if we are good or not wrt > primary_slot_name. I feel Assert can be removed and we can simply > have: > > if (!tuplestore_gettupleslot(res->tuplestore, true, false, tupslot)) > elog(ERROR, "failed to fetch primary_slot_name tuple"); > I'd also vote for keeping it as it is and remove the Assert. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
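For clarity, the check under discussion has roughly the following shape (simplified; the real query runs against the primary and uses the configured primary_slot_name): the aggregate guarantees exactly one returned row, and the boolean indicates whether the slot exists.

-- Always returns exactly one row; true means the slot was found.
SELECT count(*) = 1 AS primary_slot_valid
FROM pg_replication_slots
WHERE slot_name = 'my_primary_slot';  -- example value of primary_slot_name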
Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed
Hi, On Fri, Jan 19, 2024 at 09:00:01AM +0300, Alexander Lakhin wrote: > Hello, > > 15.01.2024 12:00, Alexander Lakhin wrote: > > If this approach looks promising to you, maybe we could add a submodule to > > perl/PostgreSQL/Test/ and use this functionality in other tests (e.g., in > > 019_replslot_limit) as well. > > > > Personally I think that having such a functionality for using in tests > > might be useful not only to avoid some "problematic" behaviour but also to > > test the opposite cases. > > After spending a few days on it, I've discovered two more issues: > https://www.postgresql.org/message-id/16d6d9cc-f97d-0b34-be65-425183ed3721%40gmail.com > https://www.postgresql.org/message-id/b0102688-6d6c-c86a-db79-e0e91d245b1a%40gmail.com > > (The latter is not related to bgwriter directly, but it was discovered > thanks to the RUNNING_XACTS record flew in WAL in a lucky moment.) > > So it becomes clear that the 035 test is not the only one, which might > suffer from bgwriter's activity, Yeah... thanks for sharing! > and inventing a way to stop bgwriter/ > control it is a different subject, getting out of scope of the current > issue. Agree. > 15.01.2024 11:49, Bertrand Drouvot wrote: > > We did a few things in this thread, so to sum up what we've discovered: > > > > - a race condition in InvalidatePossiblyObsoleteSlot() (see [1]) > > - we need to launch the vacuum(s) only if we are sure we got a newer XID > > horizon > > ( proposal in in v6 attached) > > - we need a way to control how frequent xl_running_xacts are emmitted (to > > ensure > > they are not triggered in a middle of an active slot invalidation test). > > > > I'm not sure it's possible to address Tom's concern and keep the test > > "predictable". > > > > So, I think I'd vote for Michael's proposal to implement a > > superuser-settable > > developer GUC (as sending a SIGSTOP on the bgwriter (and bypass > > $windows_os) would > > still not address Tom's concern anyway). > > > > Another option would be to "sacrifice" the full predictablity of the test > > (in > > favor of real-world behavior testing)? > > > > [1]: > > https://www.postgresql.org/message-id/ZaTjW2Xh%2BTQUCOH0%40ip-10-97-1-34.eu-west-3.compute.internal > > So, now we have the test 035 failing due to nondeterministic vacuum > activity in the first place, and due to bgwriter's activity in the second. Yeah, that's also my understanding. > Maybe it would be a right move to commit the fix, and then think about > more rare failures. +1 > Though I have a couple of question regarding the fix left, if you don't > mind: > 1) The test has minor defects in the comments, that I noted before [1]; > would you like to fix them in passing? > > > BTW, it looks like the comment: > > # One way to produce recovery conflict is to create/drop a relation and > > # launch a vacuum on pg_class with hot_standby_feedback turned off on the > > standby. > > in scenario 3 is a copy-paste error. Nice catch, thanks! Fixed in v7 attached. > > Also, there are two "Scenario 4" in this test. D'oh! Fixed in v7. > > > > 2) Shall we align the 035_standby_logical_decoding with > 031_recovery_conflict in regard to improving stability of vacuum? Yeah, I think that could make sense. > I see the following options for this: > a) use wait_until_vacuum_can_remove() and autovacuum = off in both tests; > b) use FREEZE and autovacuum = off in both tests; > c) use wait_until_vacuum_can_remove() in 035, FREEZE in 031, and > autovacuum = off in both. 
> I'd vote for a) as I've the feeling it's "easier" to understand (and I'm not sure using FREEZE would give full "stabilization predictability", at least for 035_standby_logical_decoding.pl). That said I did not test what the outcome would be for 031_recovery_conflict.pl by making use of a). > I've re-tested the v6 patch today and confirmed that it makes the test > more stable. I ran (with bgwriter_delay = 1 in temp.config) 20 tests in > parallel and got failures ('inactiveslot slot invalidation is logged with > vacuum on pg_authid') on iterations 2, 6, 6 with no patch applied. > (With unlimited CPU, when average test duration is around 70 seconds.) > > But with v6 applied, 60 iterations succeeded. Nice! Thanks for the testing! Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From b45f0c55d17cddfac7c99d11000b819c8b09fa56 Mon Sep 17 00:00:00 20
Re: Synchronizing slots from primary to standby
Hi, On Fri, Jan 19, 2024 at 05:23:53PM +0530, Amit Kapila wrote: > On Wed, Jan 17, 2024 at 4:00 PM shveta malik wrote: > > > > Now, the concerns related to this could be that users would probably > need to change existing mechanisms/tools to update priamry_conninfo Yeah, for the ones that want the sync slot feature. > and one of the alternatives proposed is to have an additional GUC like > slot_sync_dbname. Users won't be able to drop the database this worker > is connected to aka whatever is specified in slot_sync_dbname but as > the user herself sets up the configuration it shouldn't be a big deal. Same point of view here. > Then we also discussed whether extending libpqwalreceiver's connect > API is a good idea and whether we need to further extend it in the > future. As far as I can see, slotsync worker's primary requirement is > to execute SQL queries which the current API is sufficient, and don't > see something that needs any drastic change in this API. Note that > tablesync worker that executes SQL also uses these APIs, so we may > need something in the future for either of those. Then finally we need > a slotsync worker to also connect to a database to use SQL and fetch > results. > On my side the nits concerns about using the libpqrcv_connect / walrcv_connect are: - cosmetic: the "rcv" do not really align with the sync slot worker - we're using a WalReceiverConn, while a PGconn should suffice. From what I can see the "overhead" is (1 byte + 7 bytes hole + 8 bytes). I don't think that's a big deal even if we switch to a multi sync slot worker design later on. Those have already been discussed in [1] and I'm fine with them. > Now, let us consider if we extend the replication commands like > READ_REPLICATION_SLOT and or introduce a new set of replication > commands to fetch the required information then we don't need a DB > connection with primary or a connection in slotsync worker. As per my > current understanding, it is quite doable but I think we will slowly > go in the direction of making replication commands something like SQL > because today we need to extend it to fetch all slots info that have > failover marked as true, the existence of a particular replication, > etc. Then tomorrow, if we want to extend this work to have multiple > slotsync workers say workers perdb then we have to extend the > replication command to fetch per-database failover marked slots. To > me, it sounds more like we are slowly adding SQL-like features to > replication commands. Agree. Also it seems to me that extending the replication commands is more like a one-way door change. > Apart from this when we are reading per-db replication slots without > connecting to a database, we probably need some additional protection > mechanism so that the database won't get dropped. > > Considering all this it seems that for now probably extending > replication commands can simplify a few things like mentioned above > but using SQL's with db-connection is more extendable. I'd vote for using a SQL db-connection (like we are doing currently). It seems more extendable and more a two-way door (as compared to extending the replication commands): I think it still gives us the flexibility to switch to extending the replication commands if we want to in the future. [1]: https://www.postgresql.org/message-id/ZZe6sok7IWmhKReU%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
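To make the comparison above concrete: with a plain SQL/database connection, the slot sync worker (or any external tooling) can fetch the slots to synchronize with a simple catalog query. A minimal sketch follows; the failover column is the one added by this patch series, and the exact column list the worker uses may differ:

```sql
-- On the primary: failover-enabled logical slots that a standby would sync
SELECT slot_name, plugin, database, confirmed_flush_lsn, catalog_xmin
FROM pg_replication_slots
WHERE slot_type = 'logical' AND failover AND NOT temporary;
```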
Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed
Hi, On Mon, Jan 22, 2024 at 03:54:44PM +0900, Michael Paquier wrote: > On Fri, Jan 19, 2024 at 09:03:01AM +0000, Bertrand Drouvot wrote: > > On Fri, Jan 19, 2024 at 09:00:01AM +0300, Alexander Lakhin wrote: > +# Launch $sql and wait for a new snapshot that has a newer horizon before > +# doing the vacuum with $vac_option on $to_vac. > +sub wait_until_vacuum_can_remove > > This had better document what the arguments of this routine are, > because that's really unclear. $to_vac is the relation that will be > vacuumed, for one. Agree, done that way in v8 attached. > Also, wouldn't it be better to document in the test why > txid_current_snapshot() is chosen? Contrary to txid_current(), it > does not generate a Transaction/COMMIT to make the test more > predictible, something you have mentioned upthread, and implied in the > test. Good point, added more comments in v8 (but not mentioning txid_current() as after giving more thought about it then I think it was not the right approach in any case). > > - INSERT INTO flush_wal DEFAULT VALUES; -- see create table flush_wal > > This removes two INSERTs on flush_wal and refactors the code to do the > INSERT in wait_until_vacuum_can_remove(), using a SQL comment to > document a reference about the reason why an INSERT is used. Couldn't > you just use a normal comment here? Agree, done in v8. > >> I've re-tested the v6 patch today and confirmed that it makes the test > >> more stable. I ran (with bgwriter_delay = 1 in temp.config) 20 tests in > >> parallel and got failures ('inactiveslot slot invalidation is logged with > >> vacuum on pg_authid') on iterations 2, 6, 6 with no patch applied. > >> (With unlimited CPU, when average test duration is around 70 seconds.) > >> > >> But with v6 applied, 60 iterations succeeded. > > > > Nice! Thanks for the testing! > > I need to review what you have more thoroughly, but is it OK to assume > that both of you are happy with the latest version of the patch in > terms of stability gained? That's still not the full picture, still a > good step towards that. Yeah, I can clearly see how this patch helps from a theoritical perspective but rely on Alexander's testing to see how it actually stabilizes the test. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From 1380408965d8c7cb1a2e63043af7d1b8033a7e89 Mon Sep 17 00:00:00 2001 From: bdrouvot Date: Tue, 9 Jan 2024 05:08:35 + Subject: [PATCH v8] Fix 035_standby_logical_decoding.pl race condition We want to ensure that vacuum was able to remove dead rows (aka no other transactions holding back global xmin) before testing for slots invalidation on the standby. While at it, also fixing some typos/bad test description in it. --- .../t/035_standby_logical_decoding.pl | 78 +++ 1 file changed, 45 insertions(+), 33 deletions(-) 100.0% src/test/recovery/t/ diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl index 8bc39a5f03..6664405772 100644 --- a/src/test/recovery/t/035_standby_logical_decoding.pl +++ b/src/test/recovery/t/035_standby_logical_decoding.pl @@ -238,6 +238,35 @@ sub check_for_invalidation ) or die "Timed out waiting confl_active_logicalslot to be updated"; } +# Launch $sql and wait for a new snapshot that has a newer horizon before +# doing the vacuum. 
Arguments are: $vac_option (the option to be passed to the +# vacuum command), $sql (the sql to launch before triggering the vacuum) and +# $to_vac (the relation to vacuum). +# Note that to get the horizon we're using pg_current_snapshot() (it does not +# generate a Transaction/COMMIT wal record, so we don't increase the risk to see +# a xl_running_xacts that would advance active replication slot's catalog_xmin). +# Indeed advancing the active replication slot's catalog_xmin would break some +# tests that expect the active slot to conflict with the catalog xmin horizon. +sub wait_until_vacuum_can_remove +{ + my ($vac_option, $sql, $to_vac) = @_; + + # Get the current xid horizon + my $xid_horizon = $node_primary->safe_psql('testdb', qq[select pg_snapshot_xmin(pg_current_snapshot());]); + + # Launch our sql + $node_primary->safe_psql('testdb', qq[$sql]); + + # Wait until we get a newer horizon + $node_primary->poll_query_until('testdb', + "SELECT (select pg_snapshot_xmin(pg_current_snapshot())::text::int - $xid_horizon) > 0") + or die "new snapshot does not have a newer horizon"; + + # Launch the vacuum command and insert into flush_wal (see create table + # flush_wal for the reason why). + $node_primary->safe_psql('tes
Re: Move walreceiver state assignment (to WALRCV_STREAMING) in WalReceiverMain()
Hi, On Mon, Jan 22, 2024 at 04:14:46PM +1100, Peter Smith wrote: > 2024-01 Commitfest. > > Hi, This patch has a CF status of "Needs Review" [1], but it seems > there were CFbot test failures last time it was run [2]. Please have a > look and post an updated version if necessary. Thanks for the warning! I don't think the current code is causing any issues so given the feedback I've had so far I think I'll withdraw the patch. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed
Hi, On Tue, Jan 23, 2024 at 02:50:06PM +0900, Michael Paquier wrote: > On Mon, Jan 22, 2024 at 09:07:45AM +0000, Bertrand Drouvot wrote: > > I've rewritten some of that, and applied the patch down to v16. Thanks! > > Yeah, I can clearly see how this patch helps from a theoritical perspective > > but > > rely on Alexander's testing to see how it actually stabilizes the test. > > Anyway, that's not the end of it. What should we do for snapshot > snapshot records coming from the bgwriter? I have mixed feelings about it. On one hand, we'll decrease even more the risk of seeing a xl_running_xacts in the middle of a test, but on the other hand, I agree with Tom's concern [1]: I think that we could miss "corner cases/bug" detection (like the one reported in [2]). What about the following? 1) Apply "wait_until_vacuum_can_remove() + autovacuum disabled" where it makes sense and for tests that suffer from random xl_running_xacts. I can look at 031_recovery_conflict.pl, do you have others in mind? 2) Fix [2] 3) Depending on how stabilized this test (and others that suffer from "random" xl_running_xacts) ends up being, think about the bgwriter. [1]: https://www.postgresql.org/message-id/1375923.1705291719%40sss.pgh.pa.us [2]: https://www.postgresql.org/message-id/flat/zatjw2xh+tquc...@ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
Hi, On Wed, Jan 24, 2024 at 01:51:54PM +0530, Amit Kapila wrote: > On Wed, Jan 24, 2024 at 11:24 AM Masahiko Sawada > wrote: > > > > On Wed, Jan 24, 2024 at 2:43 PM Amit Kapila wrote: > > > > > > > > > > > +/* GUC variable */ > > > > +bool enable_syncslot = false; > > > > > > > > Is enable_syncslot a really good name? We use "enable" prefix only for > > > > planner parameters such as enable_seqscan, and it seems to me that > > > > "slot" is not specific. Other candidates are: > > > > > > > > * synchronize_replication_slots = on|off > > > > * synchronize_failover_slots = on|off > > > > > > > > > > I would prefer the second one. Would it be better to just say > > > sync_failover_slots? > > > > Works for me. But if we want to extend this option for non-failover > > slots as well in the future, synchronize_replication_slots (or > > sync_replication_slots) seems better. We can extend it by having an > > enum later. For example, the values can be on, off, or failover etc. > > > > I see your point. Let us see if others have any suggestions on this. I also see Sawada-San's point and I'd vote for "sync_replication_slots". Then for the current feature I think "failover" and "on" should be the values to turn the feature on (assuming "on" would mean "all kind of supported slots"). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
Hi, On Wed, Jan 24, 2024 at 04:09:15PM +0530, shveta malik wrote: > On Wed, Jan 24, 2024 at 2:38 PM Bertrand Drouvot > wrote: > > > > I also see Sawada-San's point and I'd vote for "sync_replication_slots". > > Then for > > the current feature I think "failover" and "on" should be the values to > > turn the > > feature on (assuming "on" would mean "all kind of supported slots"). > > Even if others agree and we change this GUC name to > "sync_replication_slots", I feel we should keep the values as "on" and > "off" currently, where "on" would mean 'sync failover slots' (docs can > state that clearly). I gave it more thought and I think the values should only be "failover" or "off". The reason is that if we allow "on" and change the "on" behavior in future versions (to support more than failover slots) then that would change the behavior for the ones that used "on". It's true that we can mention it in the docs, but there is still the risk of users not reading the doc (that's why I think that it would be good if we could put this extra "safety" in the code too). > I do not think we should support sync of "all > kinds of supported slots" in the first version. Maybe we can think > about it for future versions. Yeah, I think the same (I was mentioning the future "on" behavior up-thread). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
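For illustration, with the values discussed above the setting would always be spelled out explicitly rather than as a bare boolean; both the GUC name and its values are still under discussion at this point, so this is only a sketch:

```sql
-- Hypothetical: enable syncing of failover slots only
ALTER SYSTEM SET sync_replication_slots = 'failover';
SELECT pg_reload_conf();  -- assuming the GUC is reloadable
```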
Re: Synchronizing slots from primary to standby
Hi, On Thu, Jan 25, 2024 at 02:57:30AM +, Zhijie Hou (Fujitsu) wrote: > On Wednesday, January 24, 2024 6:31 PM Amit Kapila > wrote: > > > > On Tue, Jan 23, 2024 at 5:13 PM shveta malik wrote: > > > > > > Thanks Ajin for testing the patch. PFA v66 which fixes this issue. > > > > > > > I think we should try to commit the patch as all of the design concerns are > > resolved now. To achieve that, can we split the failover setting patch into > > the > > following: (a) setting failover property via SQL commands and display it in > > pg_replication_slots (b) replication protocol command (c) failover property > > via > > subscription commands? > > > > It will make each patch smaller and it would be easier to detect any > > problem in > > the same after commit. > > Agreed. I split the original 0001 patch into 3 patches as suggested. > Here is the V68 patch set. Thanks! Some comments. Looking at 0002: 1 === + The following options are supported: What about "The following option is supported"? (as currently only the "FAILOVER" is) 2 === What about adding some TAP tests too? (I can see that ALTER_REPLICATION_SLOT test is added in v68-0004 but I think having some in 0002 would make sense too). Looking at 0003: 1 === + parameter specified in the subscription. When creating the slot, + ensure the slot failover property matches the + failover + parameter value of the subscription. What about explaining what would be the consequence of not doing so? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Split index and table statistics into different types of stats
Hi, On Tue, Nov 14, 2023 at 09:04:03AM +0100, Drouvot, Bertrand wrote: > On 11/13/23 9:44 PM, Andres Freund wrote: > > Hi, > > > > It's not nice from a layering POV that we need this level of awareness in > > bufmgr.c. I wonder if this is an argument for first splitting out stats > > like > > blocks_hit, blocks_fetched into something like "relfilenode stats" - they're > > agnostic of the relkind. > > Thanks for looking at it! Yeah I think that would make a lot of sense > to track some stats per relfilenode. > > > There aren't that many such stats right now, > > admittedly, but I think we'll want to also track dirtied, written blocks on > > a > > per relation basis once we can (i.e. we key the relevant stats by > > relfilenode > > instead of oid, so we can associate stats when writing out buffers). > > > > > > Agree. Then, I think that would make sense to start this effort before the > split index/table one. I can work on a per relfilenode stat patch first. > > Does this patch ordering make sense to you? > > 1) Introduce per relfilenode stats > 2) Split index and table stats Just a quick update on this: I had a chat with Andres at pgconf.eu and we agreed on the above ordering so that: 1) I started working on relfilenode stats (I hope to be able to provide a POC patch soon). 2) The CF entry [1] status related to this thread has been changed to "Waiting on Author". [1]: https://commitfest.postgresql.org/47/4792/ Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
Hi, On Thu, Jan 25, 2024 at 03:54:45PM +0530, Amit Kapila wrote: > On Thu, Jan 25, 2024 at 1:25 PM Bertrand Drouvot > wrote: > > > > On Thu, Jan 25, 2024 at 02:57:30AM +0000, Zhijie Hou (Fujitsu) wrote: > > > > > > Agreed. I split the original 0001 patch into 3 patches as suggested. > > > Here is the V68 patch set. > > Thanks, I have pushed 0001. > > > > > Thanks! > > > > Some comments. > > > > Looking at 0002: > > > > 1 === > > > > + The following options are supported: > > > > What about "The following option is supported"? (as currently only the > > "FAILOVER" > > is) > > > > 2 === > > > > What about adding some TAP tests too? (I can see that > > ALTER_REPLICATION_SLOT test > > is added in v68-0004 but I think having some in 0002 would make sense too). > > > > The subscription tests in v68-0003 will test this functionality. The > one advantage of adding separate tests for this is that if in the > future we extend this replication command, it could be convenient to > test various options. However, the same could be said about existing > replication commands as well. I initially did check for "START_REPLICATION" and I saw it's part of 006_logical_decoding.pl (but did not check all the "REPLICATION" commands). That said, it's more of a nit, and I think it's fine to have the test in v68-0004 (as it is currently done) plus the ones in v68-0003. > But is it worth having extra tests which > will be anyway covered in the next commit in a few days? > > I understand that it is a good idea and makes one comfortable to have > tests for each separate commit but OTOH, in the longer term it will > just be adding more test time without achieving much benefit. I think > we can tell explicitly in the commit message of this patch that the > subsequent commit will cover the tests for this functionality Yeah, I think that's enough (at least someone reading the commit message and the diff changes without following this dedicated thread closely would know the lack of tests is not an oversight). > One minor comment on 0002: > + so that logical replication can be resumed after failover. > + > > Can we move this and similar comments or doc changes to the later 0004 patch where we are syncing the slots? Sure. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
Hi, On Thu, Jan 25, 2024 at 01:11:50PM +0000, Zhijie Hou (Fujitsu) wrote: > Here is the V69 patch set which includes the following changes. > > V69-0001, V69-0002 > 1) Addressed Bertrand's comments[1]. Thanks! V69-0001 LGTM. As for V69-0002, I just have one last remark: +*/ + if (sub->enabled) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), +errmsg("cannot set %s for enabled subscription", + "failover"))); Worth adding a test for it in 050_standby_failover_slots_sync.pl? (I had a quick look and it does not seem to be covered). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
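To make the suggested test concrete, the SQL it would exercise looks roughly like this (a sketch: the subscription name is made up, and the failover option is the one added by this patch set):

```sql
ALTER SUBSCRIPTION regress_mysub SET (failover = true);  -- should fail while the subscription is enabled
ALTER SUBSCRIPTION regress_mysub DISABLE;
ALTER SUBSCRIPTION regress_mysub SET (failover = true);  -- should now be accepted
ALTER SUBSCRIPTION regress_mysub ENABLE;
```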
Re: Synchronizing slots from primary to standby
Hi, On Mon, Jan 29, 2024 at 10:24:11AM +0530, shveta malik wrote: > On Sat, Jan 27, 2024 at 12:02 PM Zhijie Hou (Fujitsu) > wrote: > > > > Attach the V70 patch set which addressed above comments and Bertrand's > > comments in [1] > > > > Since v70-0001 is pushed, rebased and attached v70_2 patches. There > are no new changes. Thanks! Looking at 0001: + When altering the + slot_name, + the failover property value of the named slot may differ from the + failover + parameter specified in the subscription. When creating the slot, + ensure the slot failover property matches the + failover + parameter value of the subscription. Otherwise, the slot on the publisher may + not be enabled to be synced to standbys. Not related to this patch series, but while at it, shouldn't we also add a few words about two_phase too? (I mean ensure the slot property matches the subscription one). Or would it be better to create a dedicated patch (outside of this thread) for the "two_phase" remark? (If so I can take care of it). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
Hi, On Mon, Jan 29, 2024 at 02:35:52PM +0530, Amit Kapila wrote: > On Mon, Jan 29, 2024 at 2:22 PM Bertrand Drouvot > wrote: > > Looking at 0001: > > > > + When altering the > > + > linkend="sql-createsubscription-params-with-slot-name">slot_name, > > + the failover property value of the named slot may > > differ from the > > + > linkend="sql-createsubscription-params-with-failover">failover > > + parameter specified in the subscription. When creating the slot, > > + ensure the slot failover property matches the > > + > linkend="sql-createsubscription-params-with-failover">failover > > + parameter value of the subscription. Otherwise, the slot on the > > publisher may > > + not be enabled to be synced to standbys. > > > > Not related to this patch series but while at it shouldn't we also add a few > > words about two_phase too? (I mean ensure the slot property matchs the > > subscription one). > > > > Or would it be better to create a dedicated patch (outside of this thread) > > for > > the "two_phase" remark? (If so I can take care of it). > > > > I think it is better to create a separate patch for two_phase after > this patch gets committed. Yeah, makes sense, will do, thanks! Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Documentation: warn about two_phase when altering a subscription
Hi hackers, 776621a5e4 added a "warning" in the documentation to alter a subscription (to ensure the slot's failover property matches the subscription's one). The same remark could be done for the two_phase option. This patch is an attempt to do so. Looking forward to your feedback, Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From 756886e59afddd09fa6f87ab95af7292ebca3e76 Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Tue, 30 Jan 2024 14:48:16 + Subject: [PATCH v1] Documentation: warn about two_phase when altering a subscription's slot name. 776621a5e4 added a warning about the newly failover option. Doing the same for the already existing two_phase one. --- doc/src/sgml/ref/alter_subscription.sgml | 16 +--- 1 file changed, 9 insertions(+), 7 deletions(-) 100.0% doc/src/sgml/ref/ diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml index e9e6d9d74a..cd553f6312 100644 --- a/doc/src/sgml/ref/alter_subscription.sgml +++ b/doc/src/sgml/ref/alter_subscription.sgml @@ -235,15 +235,17 @@ ALTER SUBSCRIPTION name RENAME TO < When altering the slot_name, - the failover property value of the named slot may differ from the + the failover and two_phase properties + values of the named slot may differ from their counterparts failover - parameter specified in the subscription. When creating the slot, - ensure the slot failover property matches the + and two_phase + parameters specified in the subscription. When creating the slot, ensure + the slot failover and two_phase + properties match their counterparts parameters values of the subscription. + Otherwise, the slot on the publisher may behave differently from what subscription's failover - parameter value of the subscription. Otherwise, the slot on the - publisher may behave differently from what subscription's - failover - option says. The slot on the publisher could either be + and two_phase + options say: for example, the slot on the publisher could either be synced to the standbys even when the subscription's failover option is disabled or could be disabled for sync -- 2.34.1
Re: Synchronizing slots from primary to standby
Hi, On Mon, Jan 29, 2024 at 09:15:57AM +, Bertrand Drouvot wrote: > Hi, > > On Mon, Jan 29, 2024 at 02:35:52PM +0530, Amit Kapila wrote: > > I think it is better to create a separate patch for two_phase after > > this patch gets committed. > > Yeah, makes sense, will do, thanks! It's done in [1]. [1]: https://www.postgresql.org/message-id/ZbkYrLPhH%2BRxpZlW%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Documentation: warn about two_phase when altering a subscription
Hi, On Wed, Jan 31, 2024 at 01:47:16PM +1100, Peter Smith wrote: > Hi, thanks for the patch. Here are some review comments for v1 Thanks for the review! > > == > > (below is not showing the links and other sgml rendering -- all those LGTM) > > BEFORE > When altering the slot_name, the failover and two_phase properties > values of the named slot may differ from their counterparts failover > and two_phase parameters specified in the subscription. When creating > the slot, ensure the slot failover and two_phase properties match > their counterparts parameters values of the subscription. > > SUGGESTION > When altering the slot_name, the failover and two_phase property > values of the named slot may differ from the counterpart failover and > two_phase parameters specified by the subscription. When creating the > slot, ensure the slot properties failover and two_phase match their > counterpart parameters of the subscription. > > ~ > > BEFORE > Otherwise, the slot on the publisher may behave differently from what > subscription's failover and two_phase options say: for example, the > slot on the publisher could ... > > SUGGESTION: > Otherwise, the slot on the publisher may behave differently from what > these subscription options say: for example, the slot on the publisher > could ... > As a non native English speaker somehow I have to rely on you for those suggestions ;-) They make sense to me so applied both in v2 attached. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From 29ef47f6a201a81557ce1b4b37b414118a623634 Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Tue, 30 Jan 2024 14:48:16 + Subject: [PATCH v2] Documentation: warn about two_phase when altering a subscription's slot name. 776621a5e4 added a warning about the newly failover option. Doing the same for the already existing two_phase one. --- doc/src/sgml/ref/alter_subscription.sgml | 16 1 file changed, 8 insertions(+), 8 deletions(-) 100.0% doc/src/sgml/ref/ diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml index e9e6d9d74a..11f69f330d 100644 --- a/doc/src/sgml/ref/alter_subscription.sgml +++ b/doc/src/sgml/ref/alter_subscription.sgml @@ -235,15 +235,15 @@ ALTER SUBSCRIPTION name RENAME TO < When altering the slot_name, - the failover property value of the named slot may differ from the + the failover and two_phase property + values of the named slot may differ from the counterpart failover - parameter specified in the subscription. When creating the slot, - ensure the slot failover property matches the - failover - parameter value of the subscription. Otherwise, the slot on the - publisher may behave differently from what subscription's - failover - option says. The slot on the publisher could either be + and two_phase + parameters specified by the subscription. When creating the slot, ensure + the slot properties failover and two_phase + match their counterpart parameters of the subscription. + Otherwise, the slot on the publisher may behave differently from what these + subscription options say: for example, the slot on the publisher could either be synced to the standbys even when the subscription's failover option is disabled or could be disabled for sync -- 2.34.1
Re: Synchronizing slots from primary to standby
Hi, On Thu, Feb 01, 2024 at 04:12:43PM +0530, Amit Kapila wrote: > On Thu, Jan 25, 2024 at 11:26 AM Bertrand Drouvot > wrote: > > > > On Wed, Jan 24, 2024 at 04:09:15PM +0530, shveta malik wrote: > > > On Wed, Jan 24, 2024 at 2:38 PM Bertrand Drouvot > > > wrote: > > > > > > > > I also see Sawada-San's point and I'd vote for > > > > "sync_replication_slots". Then for > > > > the current feature I think "failover" and "on" should be the values to > > > > turn the > > > > feature on (assuming "on" would mean "all kind of supported slots"). > > > > > > Even if others agree and we change this GUC name to > > > "sync_replication_slots", I feel we should keep the values as "on" and > > > "off" currently, where "on" would mean 'sync failover slots' (docs can > > > state that clearly). > > > > I gave more thoughts on it and I think the values should only be "failover" > > or > > "off". > > > > The reason is that if we allow "on" and change the "on" behavior in future > > versions (to support more than failover slots) then that would change the > > behavior > > for the ones that used "on". > > > > I again thought on this point and feel that even if we start to sync > say physical slots their purpose would also be to allow > failover/switchover, otherwise, there is no use of syncing the slots. Yeah, I think this is a good point. > So, by that theory, we can just go for naming it as > sync_failover_slots or simply sync_slots with values 'off' and 'on'. > Now, if these are used for switchover then there is an argument that > adding 'failover' in the GUC name could be confusing but I feel > 'failover' is used widely enough that it shouldn't be a problem for > users to understand, otherwise, we can go with simple name like > sync_slots as well. > I agree, and "on"/"off" looks enough to me now. As for the GUC name, I have the feeling that "replication" should be part of it, and I think that sync_replication_slots is fine. The reason is that "sync_slots" could be confusing if other kinds of "slot" (other than replication ones) are added to the engine in the future. Thoughts? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
Hi, On Thu, Feb 01, 2024 at 05:29:15PM +0530, shveta malik wrote: > Attached v75 patch-set. Changes are: > > 1) Re-arranged the patches: > 1.1) 'libpqrc' related changes (from v74-001 and v74-004) are > separated out in v75-001 as those are independent changes. > 1.2) 'Add logical slot sync capability', 'Slot sync worker as special > process' and 'App-name changes' are now merged to single patch which > makes v75-002. > 1.3) 'Wait for physical Standby confirmation' and 'Failover Validation > Document' patches are maintained as is (v75-003 and v75-004 now). Thanks! I only looked at the commit message for v75-0002 and see that it has changed since the comment made in [1], but it still does not look correct to me. " If a logical slot on the primary is valid but is invalidated on the standby, then that slot is dropped and recreated on the standby in next sync-cycle provided the slot still exists on the primary server. It is okay to recreate such slots as long as these are not consumable on the standby (which is the case currently). This situation may occur due to the following reasons: - The max_slot_wal_keep_size on the standby is insufficient to retain WAL records from the restart_lsn of the slot. - primary_slot_name is temporarily reset to null and the physical slot is removed. - The primary changes wal_level to a level lower than logical. " If a logical decoding slot "still exists on the primary server" then the primary cannot change the wal_level to a level lower than logical; one would get something like: "FATAL: logical replication slot "logical_slot" exists, but wal_level < logical" and then slots won't get invalidated on the standby. I have the feeling that the wal_level conflict part may need to be explained separately. (I think it's not possible for them to end up being re-created on the standby for this conflict; they will simply be removed, as it would mean the counterpart slot on the primary does not exist anymore). [1]: https://www.postgresql.org/message-id/ZYWdSIeAMQQcLmVT%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
Hi, On Fri, Feb 02, 2024 at 12:25:30PM +0530, Amit Kapila wrote: > On Thu, Feb 1, 2024 at 5:29 PM shveta malik wrote: > > > > On Thu, Feb 1, 2024 at 2:35 PM Amit Kapila wrote: > > > > > > Agreed, and I am fine with merging 0001, 0002, and 0004 as suggested > > > by you though I have a few minor comments on 0002 and 0004. I was > > > thinking about what will be a logical way to split the slot sync > > > worker patch (combined result of 0001, 0002, and 0004), and one idea > > > occurred to me is that we can have the first patch as > > > synchronize_solts() API and the functionality required to implement > > > that API then the second patch would be a slot sync worker which uses > > > that API to synchronize slots and does all the required validations. > > > Any thoughts? > > > > If we shift 'synchronize_slots()' to the first patch but there is no > > caller of it, we may have a compiler warning for the same. The only > > way it can be done is if we temporarily add SQL function on standby > > which uses 'synchronize_slots()'. This SQL function can then be > > removed in later patches where we actually have a caller for > > 'synchronize_slots'. > > > > Can such a SQL function say pg_synchronize_slots() which can sync all > slots that have a failover flag set be useful in general apart from > just writing tests for this new API? I am thinking maybe users want > more control over when to sync the slots and write their bgworker or > simply do it just before shutdown once (sort of planned switchover) or > at some other pre-defined times. Big +1 for having this kind of function in users' hands (as the standby's slots may be lagging behind during a switchover for example). As for the name, I think it would make sense to add "replication" or "repl", something like pg_sync_replication_slots()? (that would be aligned with pg_create_logical_replication_slot() and friends). > BTW, we also have > pg_log_standby_snapshot() which otherwise would be done periodically > by background processes. > > > > > 1) Re-arranged the patches: > > 1.1) 'libpqrc' related changes (from v74-001 and v74-004) are > > separated out in v75-001 as those are independent changes. > > Bertrand, Sawada-San, and others, do you see a problem with such a > split? Can we go ahead with v75_0001 separately after fixing the open > comments? I think that makes sense, especially if we're also creating a user-callable function to sync the slot(s) on demand. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
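For illustration, the usage described above would boil down to a single call on the standby, for example just before a planned switchover (the function name is only a suggestion at this point in the thread):

```sql
-- Trigger one synchronization cycle of the failover slots from the primary
SELECT pg_sync_replication_slots();
```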
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Thu, Jan 11, 2024 at 10:48:13AM +0530, Bharath Rupireddy wrote: > Hi, > > Therefore, it is often easy for developers to do the following: > a) set an XID age (age of slot's xmin or catalog_xmin) of say 1 or 1.5 > billion, after which the slots get invalidated. > b) set a timeout of say 1 or 2 or 3 days, after which the inactive > slots get invalidated. > > To implement (a), postgres needs a new GUC called max_slot_xid_age. > The checkpointer then invalidates all the slots whose xmin (the oldest > transaction that this slot needs the database to retain) or > catalog_xmin (the oldest transaction affecting the system catalogs > that this slot needs the database to retain) has reached the age > specified by this setting. > > To implement (b), first postgres needs to track the replication slot > metrics like the time at which the slot became inactive (inactive_at > timestamptz) and the total number of times the slot became inactive in > its lifetime (inactive_count numeric) in ReplicationSlotPersistentData > structure. And, then it needs a new timeout GUC called > inactive_replication_slot_timeout. Whenever a slot becomes inactive, > the current timestamp and inactive count are stored in > ReplicationSlotPersistentData structure and persisted to disk. The > checkpointer then invalidates all the slots that are lying inactive > for about inactive_replication_slot_timeout duration starting from > inactive_at. > > In addition to implementing (b), these two new metrics enable > developers to improve their monitoring tools as the metrics are > exposed via pg_replication_slots system view. For instance, one can > build a monitoring tool that signals when replication slots are lying > inactive for a day or so using inactive_at metric, and/or when a > replication slot is becoming inactive too frequently using inactive_at > metric. Thanks for the patch and +1 for the idea, I think adding those new "invalidation reasons" make sense. > > I’m attaching the v1 patch set as described below: > 0001 - Tracks invalidation_reason in pg_replication_slots. This is > needed because slots now have multiple reasons for slot invalidation. > 0002 - Tracks inactive replication slot information inactive_at and > inactive_timeout. > 0003 - Adds inactive_timeout based replication slot invalidation. > 0004 - Adds XID based replication slot invalidation. > I think it's better to have the XID one being discussed/implemented before the inactive_timeout one: what about changing the 0002, 0003 and 0004 ordering? 0004 -> 0002 0002 -> 0003 0003 -> 0004 As far 0001: " This commit renames conflict_reason to invalidation_reason, and adds the support to show invalidation reasons for both physical and logical slots. " I'm not sure I like the fact that "invalidations" and "conflicts" are merged into a single field. I'd vote to keep conflict_reason as it is and add a new invalidation_reason (and put "conflict" as value when it is the case). The reason is that I think they are 2 different concepts (could be linked though) and that it would be easier to check for conflicts (means conflict_reason is not NULL). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
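To make the two-column proposal concrete, here is the kind of monitoring it would enable (a sketch: invalidation_reason is the new column proposed here, not an existing one):

```sql
-- Logical slots that conflict with recovery
SELECT slot_name, conflict_reason
FROM pg_replication_slots
WHERE conflict_reason IS NOT NULL;

-- Any slot (logical or physical) that has been invalidated, whatever the cause
SELECT slot_name, slot_type, invalidation_reason
FROM pg_replication_slots
WHERE invalidation_reason IS NOT NULL;
```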
Re: Synchronizing slots from primary to standby
Hi, On Tue, Feb 06, 2024 at 03:19:11AM +, Zhijie Hou (Fujitsu) wrote: > On Friday, February 2, 2024 2:03 PM Bertrand Drouvot > wrote: > > > > Hi, > > > > On Thu, Feb 01, 2024 at 05:29:15PM +0530, shveta malik wrote: > > > Attached v75 patch-set. Changes are: > > > > > > 1) Re-arranged the patches: > > > 1.1) 'libpqrc' related changes (from v74-001 and v74-004) are > > > separated out in v75-001 as those are independent changes. > > > 1.2) 'Add logical slot sync capability', 'Slot sync worker as special > > > process' and 'App-name changes' are now merged to single patch which > > > makes v75-002. > > > 1.3) 'Wait for physical Standby confirmation' and 'Failover Validation > > > Document' patches are maintained as is (v75-003 and v75-004 now). > > > > Thanks! > > > > I only looked at the commit message for v75-0002 and see that it has changed > > since the comment done in [1], but it still does not look correct to me. > > > > " > > If a logical slot on the primary is valid but is invalidated on the > > standby, then > > that slot is dropped and recreated on the standby in next sync-cycle > > provided > > the slot still exists on the primary server. It is okay to recreate such > > slots as long > > as these are not consumable on the standby (which is the case currently). > > This > > situation may occur due to the following reasons: > > - The max_slot_wal_keep_size on the standby is insufficient to retain WAL > > records from the restart_lsn of the slot. > > - primary_slot_name is temporarily reset to null and the physical slot is > > removed. > > - The primary changes wal_level to a level lower than logical. > > " > > > > If a logical decoding slot "still exists on the primary server" then the > > primary > > can not change the wal_level to lower than logical, one would get something > > like: > > > > "FATAL: logical replication slot "logical_slot" exists, but wal_level < > > logical" > > > > and then slots won't get invalidated on the standby. I've the feeling that > > the > > wal_level conflict part may need to be explained separately? (I think it's > > not > > possible that they end up being re-created on the standby for this conflict, > > they will be simply removed as it would mean the counterpart one on the > > primary does not exist anymore). > > This is possible in some extreme cases, because the slot is synced > asynchronously. > > For example: If on the primary the wal_level is changed to 'replica' It means that all the logical slots have been dropped on the primary (if not, it's not possible to change it to a level < logical). > and then > changed back to 'logical', the standby would receive two XLOG_PARAMETER_CHANGE > wals. And before the standby replay these wals, user can create a failover > slot And now it is re-created. So the slot has been dropped and recreated on the primary, to it's kind of expected it is also dropped and re-created on the standby (should it be invalidated or not). > Although I think it doesn't seem a real world case, so I am not sure is it > worth > separate explanation. Yeah, I don't think your example is worth a separate explanation also because it's expected to see the slot being dropped / re-created anyway (see above). That said, I still think the commit message needs some re-wording, what about? = If a logical slot on the primary is valid but is invalidated on the standby, then that slot is dropped and can be recreated on the standby in next pg_sync_replication_slots() call provided the slot still exists on the primary server. 
It is okay to recreate such slots as long as these are not consumable on the standby (which is the case currently). This situation may occur due to the following reasons: - The max_slot_wal_keep_size on the standby is insufficient to retain WAL records from the restart_lsn of the slot. - primary_slot_name is temporarily reset to null and the physical slot is removed. Changing the primary wal_level to a level lower than logical is only possible if the logical slots are removed on the primary, so it's expected to see the slots being removed on the standby too (and re-created if they are re-created on the primary). = Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Wed, Feb 07, 2024 at 12:22:07AM +0530, Bharath Rupireddy wrote: > On Mon, Feb 5, 2024 at 3:15 PM Bertrand Drouvot > wrote: > > > > I'm not sure I like the fact that "invalidations" and "conflicts" are merged > > into a single field. I'd vote to keep conflict_reason as it is and add a new > > invalidation_reason (and put "conflict" as value when it is the case). The > > reason > > is that I think they are 2 different concepts (could be linked though) and > > that > > it would be easier to check for conflicts (means conflict_reason is not > > NULL). > > So, do you want conflict_reason for only logical slots, and a separate > column for invalidation_reason for both logical and physical slots? Yes, with "conflict" as value in case of conflicts (and one would need to refer to conflict_reason to see the reason). > Is there any strong reason to have two properties "conflict" and > "invalidated" for slots? I think "conflict" is an important topic and does contain several reasons. The slot "first" conflicts and that then leads to slot "invalidation". > They both are the same internally, so why > confuse the users? I don't think that would confuse the users; I do think it would be easier to check for conflicting slots. I did not look closely at the code, just played a bit with the patch and was able to produce something like:

postgres=# select slot_name,slot_type,active,active_pid,wal_status,invalidation_reason from pg_replication_slots;
  slot_name  | slot_type | active | active_pid | wal_status | invalidation_reason
-------------+-----------+--------+------------+------------+---------------------
 rep1        | physical  | f      |            | reserved   |
 master_slot | physical  | t      |    1482441 | unreserved | wal_removed
(2 rows)

Does it make sense to have an "active/working" slot "invalidated"? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
Hi, On Thu, Mar 14, 2024 at 02:22:44AM +0000, Zhijie Hou (Fujitsu) wrote: > Hi, > > Since the standby_slot_names patch has been committed, I am attaching the last > doc patch for review. > Thanks! 1 === + continue subscribing to publications now on the new primary server without + any data loss. I think "without any data loss" should be re-worded in this context. Data loss in the sense "data committed on the primary and not visible on the subscriber in case of failover" can still occur (in case synchronous replication is not used). 2 === + If the result (failover_ready) of both above steps is + true, existing subscriptions will be able to continue without data loss. + I don't think that's true if synchronous replication is not used. Say, - synchronous replication is not used - primary is not able to reach the standby anymore and standby_slot_names is set - new data is inserted into the primary - then not replicated to subscriber (due to standby_slot_names) Then I think both of the above steps will return true, but data would be lost in case of failover. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
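To illustrate the synchronous replication caveat raised above, before trusting a "no data loss" claim one would also want to confirm on the primary that the standby to be promoted is a synchronous one; a quick check could be (a sketch, not part of the patch under review):

```sql
-- Which standbys are synchronous, and what does the primary require?
SELECT application_name, sync_state FROM pg_stat_replication;
SHOW synchronous_standby_names;
```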
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Thu, Mar 14, 2024 at 12:24:00PM +0530, Amit Kapila wrote: > On Wed, Mar 13, 2024 at 9:24 PM Bharath Rupireddy > wrote: > > > > On Wed, Mar 13, 2024 at 9:21 AM Amit Kapila wrote: > > > > > > > So, how about we turn conflict_reason to only report the reasons that > > > > actually cause conflict with recovery for logical slots, something > > > > like below, and then have invalidation_cause as a generic column for > > > > all sorts of invalidation reasons for both logical and physical slots? > > > > > > If our above understanding is correct then coflict_reason will be a > > > subset of invalidation_reason. If so, whatever way we arrange this > > > information, there will be some sort of duplicity unless we just have > > > one column 'invalidation_reason' and update the docs to interpret it > > > correctly for conflicts. > > > > Yes, there will be some sort of duplicity if we emit conflict_reason > > as a text field. However, I still think the better way is to turn > > conflict_reason text to conflict boolean and set it to true only on > > rows_removed and wal_level_insufficient invalidations. When conflict > > boolean is true, one (including all the tests that we've added > > recently) can look for invalidation_reason text field for the reason. > > This sounds reasonable to me as opposed to we just mentioning in the > > docs that "if invalidation_reason is rows_removed or > > wal_level_insufficient it's the reason for conflict with recovery". > > > > Fair point. I think we can go either way. Bertrand, Nathan, and > others, do you have an opinion on this matter? Sounds like a good approach to me, and one will be able to quickly identify if a conflict occurred. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
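For reference, with the approach agreed above (a conflicting boolean plus a single invalidation_reason text column; neither is committed at this point), spotting the conflicting slots would boil down to something like:

```sql
-- Slots invalidated because of a conflict with recovery, with the detailed cause
SELECT slot_name, conflicting, invalidation_reason
FROM pg_replication_slots
WHERE conflicting;
```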
Re: Autogenerate some wait events code and documentation
Hi, On Mon, Mar 18, 2024 at 08:02:24AM +0900, Michael Paquier wrote: > On Sun, Mar 17, 2024 at 11:31:14AM -0700, Noah Misch wrote: > > Adding an event > > will renumber others, which can make an extension report the wrong event > > until > > recompiled. Extensions citus, pg_bulkload, and vector refer to static > > events. > > If a back-patch added WAIT_EVENT_MESSAGE_QUEUE_SOMETHING_NEW, an old-build > > pg_bulkload report of WAIT_EVENT_PARALLEL_CREATE_INDEX_SCAN would show up in > > pg_stat_activity as WAIT_EVENT_PARALLEL_BITMAP_SCAN. (WAIT_EVENT_EXTENSION > > is > > not part of a generated enum, fortunately.) Nice catch, thanks! > I see an option (4), similar to your (2) without the per-line > annotation: add a new magic keyword like the existing "Section" that > is used in the first lines of generate-wait_event_types.pl where we > generate tab-separated lines with the section name as prefix of each > line. So I can think of something like: > Section: ClassName - WaitEventFoo > FOO_1 "Waiting in foo1" > FOO_2 "Waiting in foo2" > Backpatch: > BAR_1 "Waiting in bar1" > BAR_2 "Waiting in bar2" > > Then force the ordering for the docs and keep the elements in the > backpatch section at the end of the enums in the order in the txt. Yeah I think that's a good idea. > One thing that could make sense is to enforce that "Backpatch" is at > the end of a section, meaning that we would need a second keyword like > a "Section: EndBackpatch" or something like that. That's not strictly > necessary IMO as the format of the txt is easy to follow. I gave it a try in the POC patch attached. I did not use a "EndBackpatch" section to keep the perl script as simple a possible though (but documented the expectation instead). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From f93412201df9fa9156cee0b2492a25ac89ff0921 Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Mon, 18 Mar 2024 08:34:19 + Subject: [PATCH v1] Add a "Backpatch" section in wait_event_names.txt When backpatching, adding an event will renumber others, which can make an extension report the wrong event until recompiled. This is due to the fact that generate-wait_event_types.pl sorts events to position them. The "Backpatch" section added here ensures no ordering is done for the wait events found after this delimiter. --- .../activity/generate-wait_event_types.pl | 26 ++- .../utils/activity/wait_event_names.txt | 10 +-- 2 files changed, 33 insertions(+), 3 deletions(-) 100.0% src/backend/utils/activity/ diff --git a/src/backend/utils/activity/generate-wait_event_types.pl b/src/backend/utils/activity/generate-wait_event_types.pl index f1adf0e8e7..d129d94889 100644 --- a/src/backend/utils/activity/generate-wait_event_types.pl +++ b/src/backend/utils/activity/generate-wait_event_types.pl @@ -38,7 +38,9 @@ die "Not possible to specify --docs and --code simultaneously" open my $wait_event_names, '<', $ARGV[0] or die; +my @backpatch_lines; my @lines; +my $backpatch = 0; my $section_name; my $note; my $note_name; @@ -59,10 +61,26 @@ while (<$wait_event_names>) { $section_name = $_; $section_name =~ s/^.*- //; + $backpatch = 0; next; } - push(@lines, $section_name . "\t" . $_); + # Are we dealing with backpatch wait events? + if (/^Backpatch:$/) + { + $backpatch = 1; + next; + } + + # Backpatch wait events won't be sorted during code generation + if ($gen_code && $backpatch) + { + push(@backpatch_lines, $section_name . "\t" . 
$_); + } + else + { + push(@lines, $section_name . "\t" . $_); + } } # Sort the lines based on the second column. @@ -70,6 +88,12 @@ while (<$wait_event_names>) my @lines_sorted = sort { uc((split(/\t/, $a))[1]) cmp uc((split(/\t/, $b))[1]) } @lines; +# If we are generating code then concat @lines_sorted and @backpatch_lines. +if ($gen_code) +{ + push(@lines_sorted, @backpatch_lines); +} + # Read the sorted lines and populate the hash table foreach my $line (@lines_sorted) { diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt index c08e00d1d6..62c835df2e 100644 --- a/src/backend/utils/activity/wait_event_names.txt +++ b/src/backend/utils/activity/wait_event_names.txt @@ -24,7 +24,12 @@ # SGML tables of wait events for inclusion in the documentation. # # When adding a new wait event, make sure it is placed in the appropriate -# ClassName section. +# ClassName section. If the wait event is backpatched to a version < 17 then +# put it under a "Backpatch" delimiter at the end of the relat
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Sat, Mar 16, 2024 at 09:29:01AM +0530, Bharath Rupireddy wrote: > I've also responded to Bertrand's comments here. Thanks! > > On Wed, Mar 6, 2024 at 3:56 PM Bertrand Drouvot > wrote: > > > > 5 === > > > > -# Check conflict_reason is NULL for physical slot > > +# Check invalidation_reason is NULL for physical slot > > $res = $node_primary->safe_psql( > > 'postgres', qq[ > > -SELECT conflict_reason is null FROM pg_replication_slots > > where slot_name = '$primary_slotname';] > > +SELECT invalidation_reason is null FROM > > pg_replication_slots where slot_name = '$primary_slotname';] > > ); > > > > > > I don't think this test is needed anymore: it does not make that much sense > > since > > it's done after the primary database initialization and startup. > > It is now turned into a test verifying 'conflicting boolean' is null > for the physical slot. Isn't that okay? Yeah makes more sense now, thanks! Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Mon, Mar 18, 2024 at 08:50:56AM +0530, Amit Kapila wrote: > On Sun, Mar 17, 2024 at 2:03 PM Bharath Rupireddy > wrote: > > > > On Sat, Mar 16, 2024 at 3:55 PM Amit Kapila wrote: > > > > > > procArray->replication_slot_catalog_xmin) but then don't adjust it for > > > 'max_slot_xid_age'. I could be missing something in this but it is > > > better to keep discussing this and try to move with another parameter > > > 'inactive_replication_slot_timeout' which according to me can be kept > > > at slot level instead of a GUC but OTOH we need to see the arguments > > > on both side and then decide which makes more sense. > > > > Hm. Are you suggesting inactive_timeout to be a slot level parameter > > similar to 'failover' property added recently by > > c393308b69d229b664391ac583b9e07418d411b6 and > > 73292404370c9900a96e2bebdc7144f7010339cf? With this approach, one can > > set inactive_timeout while creating the slot either via > > pg_create_physical_replication_slot() or > > pg_create_logical_replication_slot() or CREATE_REPLICATION_SLOT or > > ALTER_REPLICATION_SLOT command, and postgres tracks the > > last_inactive_at for every slot based on which the slot gets > > invalidated. If this understanding is right, I can go ahead and work > > towards it. > > > > Yeah, I have something like that in mind. You can prepare the patch > but it would be good if others involved in this thread can also share > their opinion. I think it makes sense to put the inactive_timeout granularity at the slot level (as the activity could vary a lot, say, between one slot linked to a subscription and one linked to some plugins). As for max_slot_xid_age, I have the feeling that a new GUC is good enough. > > Alternatively, we can go the route of making GUC a list of key-value > > pairs of {slot_name, inactive_timeout}, but this kind of GUC for > > setting slot level parameters is going to be the first of its kind, so > > I'd prefer the above approach. > > > > I would prefer a slot-level parameter in this case rather than a GUC. Yeah, same here. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Sat, Mar 16, 2024 at 09:29:01AM +0530, Bharath Rupireddy wrote: > Please see the attached v11 patch set with all the above review > comments addressed. Thanks! Looking at 0001: 1 === + True if this logical slot conflicted with recovery (and so is now + invalidated). When this column is true, check Worth to add back the physical slot mention "Always NULL for physical slots."? 2 === @@ -1023,9 +1023,10 @@ CREATE VIEW pg_replication_slots AS L.wal_status, L.safe_wal_size, L.two_phase, -L.conflict_reason, +L.conflicting, L.failover, -L.synced +L.synced, +L.invalidation_reason What about making invalidation_reason close to conflict_reason? 3 === - * Maps a conflict reason for a replication slot to + * Maps a invalidation reason for a replication slot to s/a invalidation/an invalidation/? 4 === While at it, shouldn't we also rename "conflict" to say "invalidation_cause" in InvalidatePossiblyObsoleteSlot()? 5 === + * rows_removed and wal_level_insufficient are only two reasons s/are only two/are the only two/? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Thu, Mar 14, 2024 at 12:27:26PM +0530, Amit Kapila wrote: > On Wed, Mar 13, 2024 at 10:16 PM Bharath Rupireddy > wrote: > > > > On Wed, Mar 13, 2024 at 11:13 AM shveta malik > > wrote: > > > > > > > Thanks. v8-0001 is how it looks. Please see the v8 patch set with this > > > > change. > > > > > > JFYI, the patch does not apply to the head. There is a conflict in > > > multiple files. > > > > Thanks for looking into this. I noticed that the v8 patches needed > > rebase. Before I go do anything with the patches, I'm trying to gain > > consensus on the design. Following is the summary of design choices > > we've discussed so far: > > 1) conflict_reason vs invalidation_reason. > > 2) When to compute the XID age? > > > > I feel we should focus on two things (a) one is to introduce a new > column invalidation_reason, and (b) let's try to first complete > invalidation due to timeout. We can look into XID stuff if time > permits, remember, we don't have ample time left. Agree. While it makes sense to invalidate slots for WAL removal in CreateCheckPoint() (because this is the place where WAL is removed), I'm not sure this is the right place for the two new cases. Let's focus on the timeout one as proposed above (as it is probably the simplest one): since it is purely related to time and activity, what about invalidating such slots at the following points (a rough sketch follows this message)?: - when their usage resumes - in pg_get_replication_slots() The idea is to invalidate the slot when one resumes activity on it or wants to get information about it (and, among other things, wants to know whether the slot is valid or not). Thoughts? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
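To make the invalidation-on-access idea above concrete, here is a minimal C sketch of the kind of check that could be run from the two places mentioned (slot acquisition and pg_get_replication_slots()). The helper name SlotInactiveTimeoutExpired() is hypothetical, and the last_inactive_at / inactive_timeout fields are the ones proposed by this patch series, not existing core fields; the fragment assumes the usual slot.c headers (replication/slot.h, storage/spin.h, utils/timestamp.h).

/*
 * Hypothetical helper (sketch only): returns true if the slot has been
 * inactive for longer than its configured timeout.  Intended callers
 * would be the slot acquisition path and pg_get_replication_slots().
 */
static bool
SlotInactiveTimeoutExpired(ReplicationSlot *slot, TimestampTz now)
{
	TimestampTz last_inactive_at;
	int			timeout_s;

	/* read the shared fields under the slot's spinlock */
	SpinLockAcquire(&slot->mutex);
	last_inactive_at = slot->last_inactive_at;
	timeout_s = slot->data.inactive_timeout;
	SpinLockRelease(&slot->mutex);

	/* no timeout configured, or the slot is currently in use */
	if (timeout_s == 0 || last_inactive_at == 0)
		return false;

	return TimestampDifferenceExceeds(last_inactive_at, now,
									  timeout_s * 1000);
}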
Re: Autogenerate some wait events code and documentation
Hi, On Mon, Mar 18, 2024 at 08:49:34AM -0700, Noah Misch wrote: > On Mon, Mar 18, 2024 at 09:04:44AM +0000, Bertrand Drouvot wrote: > > --- a/src/backend/utils/activity/wait_event_names.txt > > +++ b/src/backend/utils/activity/wait_event_names.txt > > @@ -24,7 +24,12 @@ > > # SGML tables of wait events for inclusion in the documentation. > > # > > # When adding a new wait event, make sure it is placed in the appropriate > > -# ClassName section. > > +# ClassName section. If the wait event is backpatched to a version < 17 > > then > > +# put it under a "Backpatch" delimiter at the end of the related ClassName > > +# section. > > Back-patch from v17 to pre-v17 won't use this, because v16 has hand-maintained > enums. It's back-patch v18->v17 or v22->v17 where this will come up. Thanks for looking at it! Oh right, the comment is wrong, re-worded in v2 attached. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From 868d97a853c45639685198c9090cedd77dc4b9af Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Mon, 18 Mar 2024 08:34:19 + Subject: [PATCH v2] Add "Backpatch" regions in wait_event_names.txt When backpatching, adding an event will renumber others, which can make an extension report the wrong event until recompiled. This is due to the fact that generate-wait_event_types.pl sorts events to position them. The "Backpatch" region added here ensures no ordering is done for the wait events found after this delimiter. --- .../activity/generate-wait_event_types.pl | 26 ++- .../utils/activity/wait_event_names.txt | 19 +- 2 files changed, 43 insertions(+), 2 deletions(-) 100.0% src/backend/utils/activity/ diff --git a/src/backend/utils/activity/generate-wait_event_types.pl b/src/backend/utils/activity/generate-wait_event_types.pl index f1adf0e8e7..d129d94889 100644 --- a/src/backend/utils/activity/generate-wait_event_types.pl +++ b/src/backend/utils/activity/generate-wait_event_types.pl @@ -38,7 +38,9 @@ die "Not possible to specify --docs and --code simultaneously" open my $wait_event_names, '<', $ARGV[0] or die; +my @backpatch_lines; my @lines; +my $backpatch = 0; my $section_name; my $note; my $note_name; @@ -59,10 +61,26 @@ while (<$wait_event_names>) { $section_name = $_; $section_name =~ s/^.*- //; + $backpatch = 0; next; } - push(@lines, $section_name . "\t" . $_); + # Are we dealing with backpatch wait events? + if (/^Backpatch:$/) + { + $backpatch = 1; + next; + } + + # Backpatch wait events won't be sorted during code generation + if ($gen_code && $backpatch) + { + push(@backpatch_lines, $section_name . "\t" . $_); + } + else + { + push(@lines, $section_name . "\t" . $_); + } } # Sort the lines based on the second column. @@ -70,6 +88,12 @@ while (<$wait_event_names>) my @lines_sorted = sort { uc((split(/\t/, $a))[1]) cmp uc((split(/\t/, $b))[1]) } @lines; +# If we are generating code then concat @lines_sorted and @backpatch_lines. +if ($gen_code) +{ + push(@lines_sorted, @backpatch_lines); +} + # Read the sorted lines and populate the hash table foreach my $line (@lines_sorted) { diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt index c08e00d1d6..b6303a0fdb 100644 --- a/src/backend/utils/activity/wait_event_names.txt +++ b/src/backend/utils/activity/wait_event_names.txt @@ -24,7 +24,11 @@ # SGML tables of wait events for inclusion in the documentation. 
# # When adding a new wait event, make sure it is placed in the appropriate -# ClassName section. +# ClassName section. If the wait event is backpatched from master to a version +# >= 17 then put it under a "Backpatch:" delimiter at the end of the related +# ClassName section (on the non master branches) or at its natural position on +# the master branch. +# Ensure that the backpatch regions are always empty on the master branch. # # WaitEventLWLock and WaitEventLock have their own C code for their wait event # enums and function names. Hence, for these, only the SGML documentation is @@ -61,6 +65,7 @@ WAL_SENDER_MAIN "Waiting in main loop of WAL sender process." WAL_SUMMARIZER_WAL "Waiting in WAL summarizer for more WAL to be generated." WAL_WRITER_MAIN "Waiting in main loop of WAL writer process." +Backpatch: # # Wait Events - Client @@ -82,6 +87,7 @@ WAIT_FOR_STANDBY_CONFIRMATION "Waiting for WAL to be received and flushed by the WAL_SENDER_WAIT_FOR_WAL "Waiting for WAL to be flushed in WAL sender process." WAL_SENDER_WRITE_DATA "Waiting for any activity when processing replies from WAL receiver in WAL sender proces
Re: Autogenerate some wait events code and documentation
Hi, On Tue, Mar 19, 2024 at 09:59:35AM +0900, Michael Paquier wrote: > On Mon, Mar 18, 2024 at 05:57:02PM +0000, Bertrand Drouvot wrote: > > Thanks for looking at it! > > Oh right, the comment is wrong, re-worded in v2 attached. > > I've added a couple of fake events in my txt file, and this results in > an ordering of the wait events in the docs while the backpatched wait > events are added at the end of the enums, based on their order in the > txt file. Thanks for testing! > # When adding a new wait event, make sure it is placed in the appropriate > -# ClassName section. > +# ClassName section. If the wait event is backpatched from master to a > version > +# >= 17 then put it under a "Backpatch:" delimiter at the end of the related > +# ClassName section (on the non master branches) or at its natural position > on > +# the master branch. > +# Ensure that the backpatch regions are always empty on the master branch. > > I'd recommend to not mention a version number at all, as this would > need a manual refresh each time a new stable branch is forked. I'm not sure as v2 used the "version >= 17" wording which I think would not need manual refresh each time a new stable branch is forked. But to avoid any doubt, I'm following your recommendation in v3 attached (then only mentioning the "master branch" and "any other branch"). > Your solution is simpler than what I finished in mind when looking at > the code yesterday, with the addition of a second array that's pushed > to be at the end of the "sorted" lines ordered by the second column. > That does the job. Yeah. > (Note that I'll go silent for some time; I'll handle this thread when > I get back as this is not urgent.) Right and enjoy! Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From 8755162399faa070cbcb56bb4687f1ca1df6b4b1 Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Mon, 18 Mar 2024 08:34:19 + Subject: [PATCH v3] Add "Backpatch" regions in wait_event_names.txt When backpatching, adding an event will renumber others, which can make an extension report the wrong event until recompiled. This is due to the fact that generate-wait_event_types.pl sorts events to position them. The "Backpatch" region added here ensures no ordering is done for the wait events found after this delimiter. --- .../activity/generate-wait_event_types.pl | 26 ++- .../utils/activity/wait_event_names.txt | 18 - 2 files changed, 42 insertions(+), 2 deletions(-) 100.0% src/backend/utils/activity/ diff --git a/src/backend/utils/activity/generate-wait_event_types.pl b/src/backend/utils/activity/generate-wait_event_types.pl index f1adf0e8e7..d129d94889 100644 --- a/src/backend/utils/activity/generate-wait_event_types.pl +++ b/src/backend/utils/activity/generate-wait_event_types.pl @@ -38,7 +38,9 @@ die "Not possible to specify --docs and --code simultaneously" open my $wait_event_names, '<', $ARGV[0] or die; +my @backpatch_lines; my @lines; +my $backpatch = 0; my $section_name; my $note; my $note_name; @@ -59,10 +61,26 @@ while (<$wait_event_names>) { $section_name = $_; $section_name =~ s/^.*- //; + $backpatch = 0; next; } - push(@lines, $section_name . "\t" . $_); + # Are we dealing with backpatch wait events? + if (/^Backpatch:$/) + { + $backpatch = 1; + next; + } + + # Backpatch wait events won't be sorted during code generation + if ($gen_code && $backpatch) + { + push(@backpatch_lines, $section_name . "\t" . $_); + } + else + { + push(@lines, $section_name . "\t" . 
$_); + } } # Sort the lines based on the second column. @@ -70,6 +88,12 @@ while (<$wait_event_names>) my @lines_sorted = sort { uc((split(/\t/, $a))[1]) cmp uc((split(/\t/, $b))[1]) } @lines; +# If we are generating code then concat @lines_sorted and @backpatch_lines. +if ($gen_code) +{ + push(@lines_sorted, @backpatch_lines); +} + # Read the sorted lines and populate the hash table foreach my $line (@lines_sorted) { diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt index c08e00d1d6..0c4788fe77 100644 --- a/src/backend/utils/activity/wait_event_names.txt +++ b/src/backend/utils/activity/wait_event_names.txt @@ -24,7 +24,10 @@ # SGML tables of wait events for inclusion in the documentation. # # When adding a new wait event, make sure it is placed in the appropriate -# ClassName section. +# ClassName section. Put it in its natural position in the master branch, and +# then put it in the "Backpatch:" region for any other branch (should the wait +# event be backpatched). +# Ensure
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Tue, Mar 19, 2024 at 10:56:25AM +0530, Amit Kapila wrote: > On Mon, Mar 18, 2024 at 8:19 PM Bertrand Drouvot > wrote: > > Agree. While it makes sense to invalidate slots for wal removal in > > CreateCheckPoint() (because this is the place where wal is removed), I 'm > > not > > sure this is the right place for the 2 new cases. > > > > Let's focus on the timeout one as proposed above (as probably the simplest > > one): > > as this one is purely related to time and activity what about to invalidate > > them > > when?: > > > > - their usage resume > > - in pg_get_replication_slots() > > > > The idea is to invalidate the slot when one resumes activity on it or wants > > to > > get information about it (and among other things wants to know if the slot > > is > > valid or not). > > > > Trying to invalidate at those two places makes sense to me but we > still need to cover the cases where it takes very long to resume the > slot activity and the dangling slot cases where the activity is never > resumed. I understand it's better to have the slot reflecting its real status internally but it is a real issue if that's not the case until the activity on it is resumed? (just asking, not saying we should not) > How about apart from the above two places, trying to > invalidate in CheckPointReplicationSlots() where we are traversing all > the slots? I think that's a good place but there is still a window of time (that could also be "large" depending of the activity and the checkpoint frequency) during which the slot is not known as invalid internally. But yeah, at leat we know that we'll mark it as invalid at some point... BTW: if (am_walsender) { + if (slot->data.persistency == RS_PERSISTENT) + { + SpinLockAcquire(&slot->mutex); + slot->data.last_inactive_at = GetCurrentTimestamp(); + slot->data.inactive_count++; + SpinLockRelease(&slot->mutex); I'm also feeling the same concern as Shveta mentioned in [1]: that a "normal" backend using pg_logical_slot_get_changes() or friends would not set the last_inactive_at. [1]: https://www.postgresql.org/message-id/CAJpy0uD64X%3D2ENmbHaRiWTKeQawr-rbGoy_GdhQQLVXzUSKTMg%40mail.gmail.com Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Tue, Mar 19, 2024 at 04:20:35PM +0530, Amit Kapila wrote: > On Tue, Mar 19, 2024 at 3:11 PM Bertrand Drouvot > wrote: > > > > On Tue, Mar 19, 2024 at 10:56:25AM +0530, Amit Kapila wrote: > > > On Mon, Mar 18, 2024 at 8:19 PM Bertrand Drouvot > > > wrote: > > > > Agree. While it makes sense to invalidate slots for wal removal in > > > > CreateCheckPoint() (because this is the place where wal is removed), I > > > > 'm not > > > > sure this is the right place for the 2 new cases. > > > > > > > > Let's focus on the timeout one as proposed above (as probably the > > > > simplest one): > > > > as this one is purely related to time and activity what about to > > > > invalidate them > > > > when?: > > > > > > > > - their usage resume > > > > - in pg_get_replication_slots() > > > > > > > > The idea is to invalidate the slot when one resumes activity on it or > > > > wants to > > > > get information about it (and among other things wants to know if the > > > > slot is > > > > valid or not). > > > > > > > > > > Trying to invalidate at those two places makes sense to me but we > > > still need to cover the cases where it takes very long to resume the > > > slot activity and the dangling slot cases where the activity is never > > > resumed. > > > > I understand it's better to have the slot reflecting its real status > > internally > > but it is a real issue if that's not the case until the activity on it is > > resumed? > > (just asking, not saying we should not) > > > > Sorry, I didn't understand your point. Can you try to explain by example? Sorry if that was not clear, let me try to rephrase it first: what issue to you see if the invalidation of such a slot occurs only when its usage resume or when pg_get_replication_slots() is triggered? I understand that this could lead to the slot not being invalidated (maybe forever) but is that an issue for an inactive slot? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Wed, Mar 20, 2024 at 08:58:05AM +0530, Amit Kapila wrote: > On Wed, Mar 20, 2024 at 12:49 AM Bharath Rupireddy > wrote: > > > > > > Following are some open points: > > > > 1. Where to do inactive_timeout invalidation exactly if not the > > checkpointer. > > > > I have suggested to do it at the time of CheckpointReplicationSlots() > and Bertrand suggested to do it whenever we resume using the slot. I > think we should follow both the suggestions. Agree. I also think that pg_get_replication_slots() would be a good place, so that queries would return the right invalidation status. > > 4. New invalidation mechanisms interaction with slot sync feature. > > > > Yeah, this is important. My initial thoughts are that synced slots > shouldn't be invalidated on the standby due to timeout. +1 Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Wed, Mar 20, 2024 at 12:48:55AM +0530, Bharath Rupireddy wrote: > On Mon, Mar 18, 2024 at 3:02 PM Bertrand Drouvot > wrote: > > > > > > Hm. Are you suggesting inactive_timeout to be a slot level parameter > > > > similar to 'failover' property added recently by > > > > c393308b69d229b664391ac583b9e07418d411b6 and > > > > 73292404370c9900a96e2bebdc7144f7010339cf? > > > > > > Yeah, I have something like that in mind. You can prepare the patch > > > but it would be good if others involved in this thread can also share > > > their opinion. > > > > I think it makes sense to put the inactive_timeout granularity at the slot > > level (as the activity could vary a lot say between one slot linked to a > > subcription and one linked to some plugins). As far max_slot_xid_age I've > > the > > feeling that a new GUC is good enough. > > Well, here I'm implementing the above idea. Thanks! > The attached v12 patches > majorly have the following changes: > > 2. last_inactive_at and inactive_timeout are now tracked in on-disk > replication slot data structure. Should last_inactive_at be tracked on disk? Say the engine is down for a period of time longer than inactive_timeout; then the slot will be invalidated right after the engine restarts (if there is no activity before we invalidate the slot). Should the time the engine is down be counted as "inactive" time? I have the feeling it should not, and that we should only take into account inactive time while the engine is up. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
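A minimal sketch of the alternative suggested here, as a fragment of RestoreSlotFromDisk() in slot.c (the last_inactive_at field comes from the patch series; this is not what the current code does): do not persist the value at all, and simply stamp the slot with the current time when it is loaded at startup, so the time the server was down is never counted as inactive time.

	/* In RestoreSlotFromDisk(), instead of restoring a persisted value: */
	slot->in_use = true;
	slot->active_pid = 0;
	/* treat the slot as "inactive since startup", not since before shutdown */
	slot->last_inactive_at = GetCurrentTimestamp();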
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Wed, Mar 20, 2024 at 12:48:55AM +0530, Bharath Rupireddy wrote: > On Mon, Mar 18, 2024 at 3:02 PM Bertrand Drouvot > wrote: > > > > > > Hm. Are you suggesting inactive_timeout to be a slot level parameter > > > > similar to 'failover' property added recently by > > > > c393308b69d229b664391ac583b9e07418d411b6 and > > > > 73292404370c9900a96e2bebdc7144f7010339cf? > > > > > > Yeah, I have something like that in mind. You can prepare the patch > > > but it would be good if others involved in this thread can also share > > > their opinion. > > > > I think it makes sense to put the inactive_timeout granularity at the slot > > level (as the activity could vary a lot say between one slot linked to a > > subcription and one linked to some plugins). As far max_slot_xid_age I've > > the > > feeling that a new GUC is good enough. > > Well, here I'm implementing the above idea. The attached v12 patches > majorly have the following changes: > Regarding v12-0004: "Allow setting inactive_timeout in the replication command", shouldn't we also add a new SQL API, say pg_alter_replication_slot(), that would allow changing the timeout property? That would allow users to alter this property without the need to make a replication connection. But the issue is that it would make it inconsistent with the new inactivetimeout in the subscription that is added in "v12-0005". But do we need to display subinactivetimeout in pg_subscription (and even allow it at subscription creation / alter) after all? (I have the feeling there is less of a need for it as compared to subfailover or subtwophasestate, for example.) Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Thu, Mar 21, 2024 at 08:47:18AM +0530, Amit Kapila wrote: > On Wed, Mar 20, 2024 at 1:51 PM Bertrand Drouvot > wrote: > > > > On Wed, Mar 20, 2024 at 12:48:55AM +0530, Bharath Rupireddy wrote: > > > > > > 2. last_inactive_at and inactive_timeout are now tracked in on-disk > > > replication slot data structure. > > > > Should last_inactive_at be tracked on disk? Say the engine is down for a > > period > > of time > inactive_timeout then the slot will be invalidated after the > > engine > > re-start (if no activity before we invalidate the slot). Should the time the > > engine is down be counted as "inactive" time? I've the feeling it should > > not, and > > that we should only take into account inactive time while the engine is up. > > > > Good point. The question is how do we achieve this without persisting > the 'last_inactive_at'? Say, 'last_inactive_at' for a particular slot > had some valid value before we shut down but it still didn't cross the > configured 'inactive_timeout' value, so, we won't be able to > invalidate it. Now, after the restart, as we don't know the > last_inactive_at's value before the shutdown, we will initialize it > with 0 (this is what Bharath seems to have done in the latest > v13-0002* patch). After this, even if walsender or backend never > acquires the slot, we won't invalidate it. OTOH, if we track > 'last_inactive_at' on the disk, after, restart, we could initialize it > to the current time if the value is non-zero. Do you have any better > ideas? > I think that setting last_inactive_at when we restart makes sense if the slot has been active previously. I think the idea is because it's holding xmin/catalog_xmin and that we don't want to prevent rows removal longer that the timeout. So what about relying on xmin/catalog_xmin instead that way? - For physical slots if xmin is set then set last_inactive_at to the current time at restart (else zero). - For logical slot, it's not the same as the catalog_xmin is set at the slot creation time. So what about setting last_inactive_at at the current time at restart but also at creation time for logical slot? (Setting it to zero at creation time (as we do in v13) does not look right, given the fact that it's "already" holding a catalog_xmin). That way, we'd ensure that we are not holding rows for longer that the timeout and we don't need to persist last_inactive_at. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
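A sketch of the xmin-based initialization proposed above, again as a fragment of RestoreSlotFromDisk() (last_inactive_at is the field from the patch series; the exact condition is illustrative only): only stamp the slot as "inactive since startup" when it is already holding back an horizon.

	/*
	 * Sketch: stamp last_inactive_at at startup only if the slot is
	 * already holding an xmin or catalog_xmin horizon.
	 */
	if (TransactionIdIsValid(slot->data.xmin) ||
		TransactionIdIsValid(slot->data.catalog_xmin))
		slot->last_inactive_at = GetCurrentTimestamp();
	else
		slot->last_inactive_at = 0;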
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Thu, Mar 21, 2024 at 10:53:54AM +0530, Bharath Rupireddy wrote: > On Thu, Mar 21, 2024 at 9:07 AM Amit Kapila wrote: > > > > But the issue is that it would make it inconsistent with the new > > > > inactivetimeout > > > > in the subscription that is added in "v12-0005". > > > > > > Can you please elaborate what the inconsistency it causes with > > > inactivetimeout? > > > > > I think the inconsistency can arise from the fact that on publisher > > one can change the inactive_timeout for the slot corresponding to a > > subscription but the subscriber won't know, so it will still show the > > old value. Yeah, that was what I had in mind. > > If we want we can document this as a limitation and let > > users be aware of it. However, I feel at this stage, let's not even > > expose this from the subscription or maybe we can discuss it once/if > > we are done with other patches. I agree, it's important to expose it for things like "failover" but I think we can get rid of it for the timeout one. >> Anyway, if one wants to use this > > feature with a subscription, she can create a slot first on the > > publisher with inactive_timeout value and then associate such a slot > > with a required subscription. Right. > > If we are not exposing it via subscription (meaning, we don't consider > v13-0004 and v13-0005 patches), I feel we can have a new SQL API > pg_alter_replication_slot(int inactive_timeout) for now just altering > the inactive_timeout of a given slot. Agree, that seems more "natural" that going through a replication connection. > With this approach, one can do either of the following: > 1) Create a slot with SQL API with inactive_timeout set, and use it > for subscriptions or for streaming standbys. Yes. > 2) Create a slot with SQL API without inactive_timeout set, use it for > subscriptions or for streaming standbys, and set inactive_timeout > later via pg_alter_replication_slot() depending on how the slot is > consumed Yes. > 3) Create a subscription with create_slot=true, and set > inactive_timeout via pg_alter_replication_slot() depending on how the > slot is consumed. Yes. We could also do the above 3 and altering the timeout with a replication connection but the SQL API seems more natural to me. > > This approach seems consistent and minimal to start with. > > If we agree on this, I'll drop both 0004 and 0005 that are allowing > inactive_timeout to be set via replication commands and via > create/alter subscription respectively, and implement > pg_alter_replication_slot(). +1 on this. > FWIW, adding the new SQL API pg_alter_replication_slot() isn't that hard. Also I think we should ensure that one could "only" alter the timeout property for the time being (if not that could lead to the subscription inconsistency mentioned above). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
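For illustration, a rough sketch of what a timeout-only pg_alter_replication_slot() could look like at the C level. This is purely hypothetical: the function does not exist at this point, the inactive_timeout field comes from the patch series, and a real implementation would need more validation (for instance rejecting synced slots on a standby).

/*
 * Hypothetical SQL-callable function (sketch only): per the discussion
 * above, only the inactive_timeout property can be altered.
 */
Datum
pg_alter_replication_slot(PG_FUNCTION_ARGS)
{
	Name		slot_name = PG_GETARG_NAME(0);
	int32		inactive_timeout = PG_GETARG_INT32(1);

	CheckSlotPermissions();

	ReplicationSlotAcquire(NameStr(*slot_name), true);

	SpinLockAcquire(&MyReplicationSlot->mutex);
	MyReplicationSlot->data.inactive_timeout = inactive_timeout;
	SpinLockRelease(&MyReplicationSlot->mutex);

	/* persist the change and let go of the slot */
	ReplicationSlotMarkDirty();
	ReplicationSlotSave();
	ReplicationSlotRelease();

	PG_RETURN_VOID();
}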
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Thu, Mar 21, 2024 at 11:43:54AM +0530, Amit Kapila wrote: > On Thu, Mar 21, 2024 at 11:23 AM Bertrand Drouvot > wrote: > > > > On Thu, Mar 21, 2024 at 08:47:18AM +0530, Amit Kapila wrote: > > > On Wed, Mar 20, 2024 at 1:51 PM Bertrand Drouvot > > > wrote: > > > > > > > > On Wed, Mar 20, 2024 at 12:48:55AM +0530, Bharath Rupireddy wrote: > > > > > > > > > > 2. last_inactive_at and inactive_timeout are now tracked in on-disk > > > > > replication slot data structure. > > > > > > > > Should last_inactive_at be tracked on disk? Say the engine is down for > > > > a period > > > > of time > inactive_timeout then the slot will be invalidated after the > > > > engine > > > > re-start (if no activity before we invalidate the slot). Should the > > > > time the > > > > engine is down be counted as "inactive" time? I've the feeling it > > > > should not, and > > > > that we should only take into account inactive time while the engine is > > > > up. > > > > > > > > > > Good point. The question is how do we achieve this without persisting > > > the 'last_inactive_at'? Say, 'last_inactive_at' for a particular slot > > > had some valid value before we shut down but it still didn't cross the > > > configured 'inactive_timeout' value, so, we won't be able to > > > invalidate it. Now, after the restart, as we don't know the > > > last_inactive_at's value before the shutdown, we will initialize it > > > with 0 (this is what Bharath seems to have done in the latest > > > v13-0002* patch). After this, even if walsender or backend never > > > acquires the slot, we won't invalidate it. OTOH, if we track > > > 'last_inactive_at' on the disk, after, restart, we could initialize it > > > to the current time if the value is non-zero. Do you have any better > > > ideas? > > > > > > > I think that setting last_inactive_at when we restart makes sense if the > > slot > > has been active previously. I think the idea is because it's holding > > xmin/catalog_xmin > > and that we don't want to prevent rows removal longer that the timeout. > > > > So what about relying on xmin/catalog_xmin instead that way? > > > > That doesn't sound like a great idea because xmin/catalog_xmin values > won't tell us before restart whether it was active or not. It could > have been inactive for long time before restart but the xmin values > could still be valid. Right, the idea here was more like "don't hold xmin/catalog_xmin" for longer than timeout. My concern was that we set catalog_xmin at logical slot creation time. So if we set last_inactive_at to zero at creation time and the slot is not used for a long period of time > timeout, then I think it's not helping there. > What about we always set 'last_inactive_at' at > restart (if the slot's inactive_timeout has non-zero value) and reset > it as soon as someone acquires that slot? Now, if the slot doesn't get > acquired till 'inactive_timeout', checkpointer will invalidate the > slot. Yeah that sounds good to me, but I think we should set last_inactive_at at creation time too, if not: - physical slot could remain valid for long time after creation (which is fine) but the behavior would change at restart. - logical slot would have the "issue" reported above (holding catalog_xmin). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Thu, Mar 21, 2024 at 11:53:32AM +0530, Amit Kapila wrote: > On Thu, Mar 21, 2024 at 11:37 AM Bertrand Drouvot > wrote: > > > > We could also do the above 3 and altering the timeout with a replication > > connection but the SQL API seems more natural to me. > > > > If we want to go with this then I think we should at least ensure that > if one specified timeout via CREATE_REPLICATION_SLOT or > ALTER_REPLICATION_SLOT that should be honored. Yeah, agree. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Thu, Mar 21, 2024 at 05:05:46AM +0530, Bharath Rupireddy wrote: > On Wed, Mar 20, 2024 at 1:04 PM Bertrand Drouvot > wrote: > > > > On Wed, Mar 20, 2024 at 08:58:05AM +0530, Amit Kapila wrote: > > > On Wed, Mar 20, 2024 at 12:49 AM Bharath Rupireddy > > > wrote: > > > > > > > > Following are some open points: > > > > > > > > 1. Where to do inactive_timeout invalidation exactly if not the > > > > checkpointer. > > > > > > > I have suggested to do it at the time of CheckpointReplicationSlots() > > > and Bertrand suggested to do it whenever we resume using the slot. I > > > think we should follow both the suggestions. > > > > Agree. I also think that pg_get_replication_slots() would be a good place, > > so > > that queries would return the right invalidation status. > > I've addressed review comments and attaching the v13 patches with the > following changes: Thanks! v13-0001 looks good to me. The only Nit (that I've mentioned up-thread) is that in the pg_replication_slots view, the invalidation_reason is "far away" from the conflicting field. I understand that one could query the fields individually but when describing the view or reading the doc, it seems more appropriate to see them closer. Also as "failover" and "synced" are also new in version 17, there is no risk to break order by "17,18" kind of queries (which are the failover and sync positions). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Thu, Mar 21, 2024 at 04:13:31PM +0530, Bharath Rupireddy wrote: > On Thu, Mar 21, 2024 at 3:20 PM Amit Kapila wrote: > > > > > My concern was that we set catalog_xmin at logical slot creation time. So > > > if we > > > set last_inactive_at to zero at creation time and the slot is not used > > > for a long > > > period of time > timeout, then I think it's not helping there. > > > > But, we do call ReplicationSlotRelease() after slot creation. For > > example, see CreateReplicationSlot(). So wouldn't that take care of > > the case you are worried about? > > Right. That's true even for pg_create_physical_replication_slot and > pg_create_logical_replication_slot. AFAICS, setting it to the current > timestamp in ReplicationSlotRelease suffices unless I'm missing > something. Right, but we have: " if (set_last_inactive_at && slot->data.persistency == RS_PERSISTENT) { /* * There's no point in allowing failover slots to get invalidated * based on slot's inactive_timeout parameter on standby. The failover * slots simply get synced from the primary on the standby. */ if (!(RecoveryInProgress() && slot->data.failover)) { SpinLockAcquire(&slot->mutex); slot->last_inactive_at = GetCurrentTimestamp(); SpinLockRelease(&slot->mutex); } } " while we set set_last_inactive_at to false at creation time so that last_inactive_at is not set to GetCurrentTimestamp(). We should set set_last_inactive_at to true if a timeout is provided during the slot creation. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Fri, Mar 22, 2024 at 10:49:17AM +0530, Amit Kapila wrote: > On Thu, Mar 21, 2024 at 11:21 PM Bharath Rupireddy > wrote: > > > > > > Please find the v14-0001 patch for now. Thanks! > LGTM. Let's wait for Bertrand to see if he has more comments on 0001 > and then I'll push it. LGTM too. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Fri, Mar 22, 2024 at 01:45:01PM +0530, Bharath Rupireddy wrote: > On Fri, Mar 22, 2024 at 12:39 PM Bertrand Drouvot > wrote: > > > > > > Please find the v14-0001 patch for now. > > > > Thanks! > > > > > LGTM. Let's wait for Bertrand to see if he has more comments on 0001 > > > and then I'll push it. > > > > LGTM too. > > Thanks. Here I'm implementing the following: Thanks! > 0001 Track invalidation_reason in pg_replication_slots > 0002 Track last_inactive_at in pg_replication_slots > 0003 Allow setting inactive_timeout for replication slots via SQL API > 0004 Introduce new SQL funtion pg_alter_replication_slot > 0005 Allow setting inactive_timeout in the replication command > 0006 Add inactive_timeout based replication slot invalidation > > 1. Keep it last_inactive_at as a shared memory variable, but always > set it at restart if the slot's inactive_timeout has non-zero value > and reset it as soon as someone acquires that slot so that if the slot > doesn't get acquired till inactive_timeout, checkpointer will > invalidate the slot. > 4. last_inactive_at should also be set to the current time during slot > creation because if one creates a slot and does nothing with it then > it's the time it starts to be inactive. I did not look at the code yet but just tested the behavior. It works as you describe it but I think this behavior is weird because: - when we create a slot without a timeout then last_inactive_at is set. I think that's fine, but then: - when we restart the engine, then last_inactive_at is gone (as timeout is not set). I think last_inactive_at should be set also at engine restart even if there is no timeout. I don't think we should link both. Changing my mind here on this subject due to the testing. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Fri, Mar 22, 2024 at 01:45:01PM +0530, Bharath Rupireddy wrote: > On Fri, Mar 22, 2024 at 12:39 PM Bertrand Drouvot > wrote: > > > > > > Please find the v14-0001 patch for now. > > > > Thanks! > > > > > LGTM. Let's wait for Bertrand to see if he has more comments on 0001 > > > and then I'll push it. > > > > LGTM too. > > > Please see the attached v14 patch set. No change in the attached > v14-0001 from the previous patch. Looking at v14-0002: 1 === @@ -691,6 +699,13 @@ ReplicationSlotRelease(void) ConditionVariableBroadcast(&slot->active_cv); } + if (slot->data.persistency == RS_PERSISTENT) + { + SpinLockAcquire(&slot->mutex); + slot->last_inactive_at = GetCurrentTimestamp(); + SpinLockRelease(&slot->mutex); + } I'm not sure we should do system calls while we're holding a spinlock. Assign a variable before? 2 === Also, what about moving this here? " if (slot->data.persistency == RS_PERSISTENT) { /* * Mark persistent slot inactive. We're not freeing it, just * disconnecting, but wake up others that may be waiting for it. */ SpinLockAcquire(&slot->mutex); slot->active_pid = 0; SpinLockRelease(&slot->mutex); ConditionVariableBroadcast(&slot->active_cv); } " That would avoid testing twice "slot->data.persistency == RS_PERSISTENT". 3 === @@ -2341,6 +2356,7 @@ RestoreSlotFromDisk(const char *name) slot->in_use = true; slot->active_pid = 0; + slot->last_inactive_at = 0; I think we should put GetCurrentTimestamp() here. It's done in v14-0006 but I think it's better to do it in 0002 (and not taking care of inactive_timeout). 4 === Track last_inactive_at in pg_replication_slots doc/src/sgml/system-views.sgml | 11 +++ src/backend/catalog/system_views.sql | 3 ++- src/backend/replication/slot.c | 16 src/backend/replication/slotfuncs.c | 7 ++- src/include/catalog/pg_proc.dat | 6 +++--- src/include/replication/slot.h | 3 +++ src/test/regress/expected/rules.out | 5 +++-- 7 files changed, 44 insertions(+), 7 deletions(-) Worth to add some tests too (or we postpone them in future commits because we're confident enough they will follow soon)? 5 === Most of the fields that reflect a time (not duration) in the system views are _time, so I'm wondering if instead of "last_inactive_at" we should use something like "last_inactive_time"? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
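For points 1 and 2 above, here is a sketch of how the ReplicationSlotRelease() hunk could be reworked (last_inactive_at is the field added by the patch under review): take the timestamp before entering the spinlock and fold the new assignment into the existing RS_PERSISTENT block, so the persistency test is done only once.

	if (slot->data.persistency == RS_PERSISTENT)
	{
		/* no system call while holding the spinlock */
		TimestampTz now = GetCurrentTimestamp();

		/*
		 * Mark persistent slot inactive.  We're not freeing it, just
		 * disconnecting, but wake up others that may be waiting for it.
		 */
		SpinLockAcquire(&slot->mutex);
		slot->active_pid = 0;
		slot->last_inactive_at = now;
		SpinLockRelease(&slot->mutex);
		ConditionVariableBroadcast(&slot->active_cv);
	}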
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Fri, Mar 22, 2024 at 02:59:21PM +0530, Amit Kapila wrote: > On Fri, Mar 22, 2024 at 2:27 PM Bertrand Drouvot > wrote: > > > > On Fri, Mar 22, 2024 at 01:45:01PM +0530, Bharath Rupireddy wrote: > > > > > 0001 Track invalidation_reason in pg_replication_slots > > > 0002 Track last_inactive_at in pg_replication_slots > > > 0003 Allow setting inactive_timeout for replication slots via SQL API > > > 0004 Introduce new SQL funtion pg_alter_replication_slot > > > 0005 Allow setting inactive_timeout in the replication command > > > 0006 Add inactive_timeout based replication slot invalidation > > > > > > 1. Keep it last_inactive_at as a shared memory variable, but always > > > set it at restart if the slot's inactive_timeout has non-zero value > > > and reset it as soon as someone acquires that slot so that if the slot > > > doesn't get acquired till inactive_timeout, checkpointer will > > > invalidate the slot. > > > 4. last_inactive_at should also be set to the current time during slot > > > creation because if one creates a slot and does nothing with it then > > > it's the time it starts to be inactive. > > > > I did not look at the code yet but just tested the behavior. It works as you > > describe it but I think this behavior is weird because: > > > > - when we create a slot without a timeout then last_inactive_at is set. I > > think > > that's fine, but then: > > - when we restart the engine, then last_inactive_at is gone (as timeout is > > not > > set). > > > > I think last_inactive_at should be set also at engine restart even if there > > is > > no timeout. > > I think it is the opposite. Why do we need to set 'last_inactive_at' > when inactive_timeout is not set? I think those are unrelated, one could want to know when a slot has been inactive even if no timeout is set. I understand that for this patch series we have in mind to use them both to invalidate slots but I think that there is use case to not use both in correlation. Also not setting last_inactive_at could give the "false" impression that the slot is active. > BTW, haven't we discussed that we > don't need to set 'last_inactive_at' at the time of slot creation as > it is sufficient to set it at the time ReplicationSlotRelease()? Right. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Fri, Mar 22, 2024 at 03:56:23PM +0530, Amit Kapila wrote: > On Fri, Mar 22, 2024 at 3:15 PM Bertrand Drouvot > wrote: > > > > On Fri, Mar 22, 2024 at 01:45:01PM +0530, Bharath Rupireddy wrote: > > > > 1 === > > > > @@ -691,6 +699,13 @@ ReplicationSlotRelease(void) > > ConditionVariableBroadcast(&slot->active_cv); > > } > > > > + if (slot->data.persistency == RS_PERSISTENT) > > + { > > + SpinLockAcquire(&slot->mutex); > > + slot->last_inactive_at = GetCurrentTimestamp(); > > + SpinLockRelease(&slot->mutex); > > + } > > > > I'm not sure we should do system calls while we're holding a spinlock. > > Assign a variable before? > > > > 2 === > > > > Also, what about moving this here? > > > > " > > if (slot->data.persistency == RS_PERSISTENT) > > { > > /* > > * Mark persistent slot inactive. We're not freeing it, just > > * disconnecting, but wake up others that may be waiting for it. > > */ > > SpinLockAcquire(&slot->mutex); > > slot->active_pid = 0; > > SpinLockRelease(&slot->mutex); > > ConditionVariableBroadcast(&slot->active_cv); > > } > > " > > > > That would avoid testing twice "slot->data.persistency == RS_PERSISTENT". > > > > That sounds like a good idea. Also, don't we need to consider physical > slots where we don't reserve WAL during slot creation? I don't think > there is a need to set inactive_at for such slots. If the slot is not active, why shouldn't we set inactive_at? I can understand that such a slots do not present "any risks" but I think we should still set inactive_at (also to not give the false impression that the slot is active). > > 5 === > > > > Most of the fields that reflect a time (not duration) in the system views > > are > > xxxx_time, so I'm wondering if instead of "last_inactive_at" we should use > > something like "last_inactive_time"? > > > > How about naming it as last_active_time? This will indicate the time > at which the slot was last active. I thought about it too but I think it could be missleading as one could think that it should be updated each time WAL record decoding is happening. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Fri, Mar 22, 2024 at 04:16:19PM +0530, Amit Kapila wrote: > On Fri, Mar 22, 2024 at 3:23 PM Bertrand Drouvot > wrote: > > > > On Fri, Mar 22, 2024 at 02:59:21PM +0530, Amit Kapila wrote: > > > On Fri, Mar 22, 2024 at 2:27 PM Bertrand Drouvot > > > wrote: > > > > > > > > On Fri, Mar 22, 2024 at 01:45:01PM +0530, Bharath Rupireddy wrote: > > > > > > > > > 0001 Track invalidation_reason in pg_replication_slots > > > > > 0002 Track last_inactive_at in pg_replication_slots > > > > > 0003 Allow setting inactive_timeout for replication slots via SQL API > > > > > 0004 Introduce new SQL funtion pg_alter_replication_slot > > > > > 0005 Allow setting inactive_timeout in the replication command > > > > > 0006 Add inactive_timeout based replication slot invalidation > > > > > > > > > > 1. Keep it last_inactive_at as a shared memory variable, but always > > > > > set it at restart if the slot's inactive_timeout has non-zero value > > > > > and reset it as soon as someone acquires that slot so that if the slot > > > > > doesn't get acquired till inactive_timeout, checkpointer will > > > > > invalidate the slot. > > > > > 4. last_inactive_at should also be set to the current time during slot > > > > > creation because if one creates a slot and does nothing with it then > > > > > it's the time it starts to be inactive. > > > > > > > > I did not look at the code yet but just tested the behavior. It works > > > > as you > > > > describe it but I think this behavior is weird because: > > > > > > > > - when we create a slot without a timeout then last_inactive_at is set. > > > > I think > > > > that's fine, but then: > > > > - when we restart the engine, then last_inactive_at is gone (as timeout > > > > is not > > > > set). > > > > > > > > I think last_inactive_at should be set also at engine restart even if > > > > there is > > > > no timeout. > > > > > > I think it is the opposite. Why do we need to set 'last_inactive_at' > > > when inactive_timeout is not set? > > > > I think those are unrelated, one could want to know when a slot has been > > inactive > > even if no timeout is set. I understand that for this patch series we have > > in mind > > to use them both to invalidate slots but I think that there is use case to > > not > > use both in correlation. Also not setting last_inactive_at could give the > > "false" > > impression that the slot is active. > > > > I see your point and agree with this. I feel we can commit this part > first then, Agree that in this case the current ordering makes sense (as setting last_inactive_at would be completly unrelated to the timeout). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Fri, Mar 22, 2024 at 06:02:11PM +0530, Amit Kapila wrote: > On Fri, Mar 22, 2024 at 5:30 PM Bertrand Drouvot > wrote: > > On Fri, Mar 22, 2024 at 03:56:23PM +0530, Amit Kapila wrote: > > > > > > > > That would avoid testing twice "slot->data.persistency == > > > > RS_PERSISTENT". > > > > > > > > > > That sounds like a good idea. Also, don't we need to consider physical > > > slots where we don't reserve WAL during slot creation? I don't think > > > there is a need to set inactive_at for such slots. > > > > If the slot is not active, why shouldn't we set inactive_at? I can > > understand > > that such a slots do not present "any risks" but I think we should still set > > inactive_at (also to not give the false impression that the slot is active). > > > > But OTOH, there is a chance that we will invalidate such slots even > though they have never reserved WAL in the first place which doesn't > appear to be a good thing. That's right, but I don't think it is a bad thing. I think we should treat inactive_at as an independent field (as if the timeout one did not exist at all) and just focus on its meaning (the slot being inactive). If one sets a timeout (> 0) and gets an invalidation, then I think it works as designed (even if the slot does not present any "risk", as it does not hold back any rows or WAL). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Sat, Mar 23, 2024 at 01:11:50PM +0530, Bharath Rupireddy wrote: > On Sat, Mar 23, 2024 at 11:27 AM Amit Kapila wrote: > > > > How about adding the test in 019_replslot_limit? It is not a direct > > fit but I feel later we can even add 'invalid_timeout' related tests > > in this file which will use last_inactive_time feature. > > I'm thinking the other way. Now, the new TAP file 043_replslot_misc.pl > can have last_inactive_time tests, and later invalid_timeout ones too. > This way 019_replslot_limit.pl is not cluttered. I share Amit's opinion: I think 019_replslot_limit would be a better place, because I see the timeout as another kind of limit. > > > It is also > > possible that some of the tests added by the 'invalid_timeout' feature > > will obviate the need for some of these tests. > > Might be. But, I prefer to keep both these tests separate but in the > same file 043_replslot_misc.pl. Because we cover some corner cases the > last_inactive_time is set upon loading the slot from disk. Right, but I think that this test does not necessarily have to be in the same .pl as the one testing the timeout. It could be added to one of the existing .pl files, 001_stream_rep.pl for example. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Mon, Mar 25, 2024 at 12:25:21PM +0530, Bharath Rupireddy wrote: > On Mon, Mar 25, 2024 at 10:28 AM Amit Kapila wrote: > > > > On Mon, Mar 25, 2024 at 9:48 AM Amit Kapila wrote: > > > > > > > > > Such a test looks reasonable but shall we add equal to in the second > > > part of the test (like '$last_inactive_time'::timestamptz >= > > > > '$slot_creation_time'::timestamptz;). This is just to be sure that even > > > > if the test ran fast enough to give the same time, the test shouldn't > > > > fail. I think it won't matter for correctness as well. > > Agree. I added that in v19 patch. I was having that concern in my > mind. That's the reason I wasn't capturing current_time something like > below for the same worry that current_timestamp might be the same (or > nearly the same) as the slot creation time. That's why I ended up > capturing current_timestamp in a separate query than clubbing it up > with pg_create_physical_replication_slot. > > SELECT current_timestamp FROM pg_create_physical_replication_slot('foo'); > > > Apart from this, I have made minor changes in the comments. See and > > let me know what you think of attached. > Thanks! v19-0001 LGTM, just one nit for 019_replslot_limit.pl: the code for "Get last_inactive_time value after the slot's creation" and "Check that the captured time is sane" is somewhat duplicated: would it be worth factoring it into two helper functions? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Mon, Mar 25, 2024 at 12:59:52PM +0530, Amit Kapila wrote: > On Mon, Mar 25, 2024 at 12:43 PM shveta malik wrote: > > > > On Mon, Mar 25, 2024 at 11:53 AM shveta malik > > wrote: > > > > > > On Mon, Mar 25, 2024 at 10:33 AM shveta malik > > > wrote: > > > > > > > > On Sun, Mar 24, 2024 at 3:06 PM Bharath Rupireddy > > > > wrote: > > > > > > > > > > I've attached the v18 patch set here. > > > > I have one concern, for synced slots on standby, how do we disallow > > invalidation due to inactive-timeout immediately after promotion? > > > > For synced slots, last_inactive_time and inactive_timeout are both > > set. Yeah, and I can see last_inactive_time is moving on the standby (while not the case on the primary), probably due to the sync worker slot acquisition/release which does not seem right. > Let's say I bring down primary for promotion of standby and then > > promote standby, there are chances that it may end up invalidating > > synced slots (considering standby is not brought down during promotion > > and thus inactive_timeout may already be past 'last_inactive_time'). > > > > This raises the question of whether we need to set > 'last_inactive_time' synced slots on the standby? Yeah, I think that last_inactive_time should stay at 0 on synced slots on the standby because such slots are not usable anyway (until the standby gets promoted). So, I think that last_inactive_time does not make sense if the slot never had the chance to be active. OTOH I think the timeout invalidation (if any) should be synced from primary. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Mon, Mar 25, 2024 at 02:07:21PM +0530, shveta malik wrote: > On Mon, Mar 25, 2024 at 1:37 PM Bertrand Drouvot > wrote: > > > > Hi, > > > > On Mon, Mar 25, 2024 at 12:59:52PM +0530, Amit Kapila wrote: > > > On Mon, Mar 25, 2024 at 12:43 PM shveta malik > > > wrote: > > > > > > > > On Mon, Mar 25, 2024 at 11:53 AM shveta malik > > > > wrote: > > > > > > > > > > On Mon, Mar 25, 2024 at 10:33 AM shveta malik > > > > > wrote: > > > > > > > > > > > > On Sun, Mar 24, 2024 at 3:06 PM Bharath Rupireddy > > > > > > wrote: > > > > > > > > > > > > > > I've attached the v18 patch set here. > > > > > > > > I have one concern, for synced slots on standby, how do we disallow > > > > invalidation due to inactive-timeout immediately after promotion? > > > > > > > > For synced slots, last_inactive_time and inactive_timeout are both > > > > set. > > > > Yeah, and I can see last_inactive_time is moving on the standby (while not > > the > > case on the primary), probably due to the sync worker slot > > acquisition/release > > which does not seem right. > > > > > Let's say I bring down primary for promotion of standby and then > > > > promote standby, there are chances that it may end up invalidating > > > > synced slots (considering standby is not brought down during promotion > > > > and thus inactive_timeout may already be past 'last_inactive_time'). > > > > > > > > > > This raises the question of whether we need to set > > > 'last_inactive_time' synced slots on the standby? > > > > Yeah, I think that last_inactive_time should stay at 0 on synced slots on > > the > > standby because such slots are not usable anyway (until the standby gets > > promoted). > > > > So, I think that last_inactive_time does not make sense if the slot never > > had > > the chance to be active. > > > > OTOH I think the timeout invalidation (if any) should be synced from > > primary. > > Yes, even I feel that last_inactive_time makes sense only when the > slot is available to be used. Synced slots are not available to be > used until standby is promoted and thus last_inactive_time can be > skipped to be set for synced_slots. But once primay is invalidated due > to inactive-timeout, that invalidation should be synced to standby > (which is happening currently). > yeah, syncing the invalidation and always keeping last_inactive_time to zero for synced slots looks right to me. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
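A sketch of how the release path could skip synced slots while still letting the timeout invalidation itself be synced from the primary. The last_inactive_time field is the one under discussion, and the data.synced flag name is assumed from the slot-sync feature; this is illustrative only.

	/*
	 * In ReplicationSlotRelease(): never stamp last_inactive_time for a
	 * slot that is merely synced from the primary while in recovery;
	 * such a slot cannot be used until promotion.
	 */
	if (slot->data.persistency == RS_PERSISTENT &&
		!(RecoveryInProgress() && slot->data.synced))
	{
		TimestampTz now = GetCurrentTimestamp();

		SpinLockAcquire(&slot->mutex);
		slot->last_inactive_time = now;
		SpinLockRelease(&slot->mutex);
	}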
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Mon, Mar 25, 2024 at 02:39:50PM +0530, shveta malik wrote: > I am listing the concerns raised by me: > 3) alter replication slot to alter inactive_timout for synced slots on > standby, should this be allowed? I don't think it should be allowed. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: pgsql: Track last_inactive_time in pg_replication_slots.
Hi, On Mon, Mar 25, 2024 at 08:38:16PM +0530, Amit Kapila wrote: > On Mon, Mar 25, 2024 at 7:51 PM Robert Haas wrote: > > > > On Mon, Mar 25, 2024 at 10:02 AM Amit Kapila > > wrote: > > > We considered the other two names as last_inactive_at and > > > last_active_time. For the first (last_inactive_at), there was an > > > argument that most other fields that display time ends with _time. For > > > the second (last_active_time), there was an argument that it could be > > > misleading as one could think that it should be updated each time WAL > > > record decoding is happening [1]. The other possibility is to name it > > > last_used_time but I think it won't be much different from > > > last_active_time. > > > > I don't understand the bit about updating it each time WAL record > > decoding is happening. If it's the last active time, and the slot is > > currently active, then the answer is either "right now" or "currently > > undefined." I'd expect to see NULL in the system view in such a case. > > And if that's so, then there's nothing to update each time a record is > > decoded, because it's just still going to show NULL. > > > > IIUC, Bertrand's point was that users can interpret last_active_time > as a value that gets updated each time they decode a change which is > not what we are doing. So, this can confuse users. Your expectation of > answer (NULL) when the slot is active is correct and that is what will > happen. Yeah, and so would be the confusion: why is last_active_time NULL while one is using the slot? > > Why does this field get set to the current time when the slot is > > restored from disk? > > > > It is because we don't want to include the time the server is down in > the last_inactive_time. Say, if we are shutting down the server at > time X and the server remains down for another two hours, we don't > want to include those two hours as the slot inactive time. The related > theory is that this field will be used to invalidate inactive slots > based on a threshold (say inactive_timeout). Say, before the shutdown, > we release the slot and set the current_time for last_inactive_time > for each slot and persist that information as well. Now, if the server > is down for a long time, we may invalidate the slots as soon as the > server comes up. So, instead, we just set this field at the time we > read slots for disk and then reset it to 0/NULL as soon as the slot > became active. Right, and we also want to invalidate the slot if not used duration > timeout, so that setting the field to zero when the slot is restored from disk is also not an option. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: pgsql: Track last_inactive_time in pg_replication_slots.
Hi, On Mon, Mar 25, 2024 at 08:59:55PM +0530, Amit Kapila wrote: > On Mon, Mar 25, 2024 at 8:46 PM Bertrand Drouvot > wrote: > > > > On Mon, Mar 25, 2024 at 08:38:16PM +0530, Amit Kapila wrote: > > > On Mon, Mar 25, 2024 at 7:51 PM Robert Haas wrote: > > > > > > > > On Mon, Mar 25, 2024 at 10:02 AM Amit Kapila > > > > wrote: > > > > > We considered the other two names as last_inactive_at and > > > > > last_active_time. For the first (last_inactive_at), there was an > > > > > argument that most other fields that display time ends with _time. For > > > > > the second (last_active_time), there was an argument that it could be > > > > > misleading as one could think that it should be updated each time WAL > > > > > record decoding is happening [1]. The other possibility is to name it > > > > > last_used_time but I think it won't be much different from > > > > > last_active_time. > > > > > > > > I don't understand the bit about updating it each time WAL record > > > > decoding is happening. If it's the last active time, and the slot is > > > > currently active, then the answer is either "right now" or "currently > > > > undefined." I'd expect to see NULL in the system view in such a case. > > > > And if that's so, then there's nothing to update each time a record is > > > > decoded, because it's just still going to show NULL. > > > > > > > > > > IIUC, Bertrand's point was that users can interpret last_active_time > > > as a value that gets updated each time they decode a change which is > > > not what we are doing. So, this can confuse users. Your expectation of > > > answer (NULL) when the slot is active is correct and that is what will > > > happen. > > > > Yeah, and so would be the confusion: why is last_active_time NULL while one > > is > > using the slot? > > > > It is because we set it to zero when we acquire the slot and that > value will remain the same till the slot is active. I am not sure if I > understood your question so what I am saying might not make sense. There is no "real" question, I was just highlighting the confusion in case we name the field "last_active_time". Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: pgsql: Track last_inactive_time in pg_replication_slots.
Hi, On Mon, Mar 25, 2024 at 11:49:00AM -0400, Robert Haas wrote: > On Mon, Mar 25, 2024 at 11:16 AM Bertrand Drouvot > wrote: > > > IIUC, Bertrand's point was that users can interpret last_active_time > > > as a value that gets updated each time they decode a change which is > > > not what we are doing. So, this can confuse users. Your expectation of > > > answer (NULL) when the slot is active is correct and that is what will > > > happen. > > > > Yeah, and so would be the confusion: why is last_active_time NULL while one > > is > > using the slot? > > I agree that users could get confused here, but the solution to that > shouldn't be to give the field a name that is the opposite of what it > actually does. I expect a field called last_inactive_time to tell me > the last time that the slot was inactive. Here, it tells me the last > time that a currently-inactive slot previously *WAS* active. How can > you justify calling that the last *INACTIVE* time? > > AFAICS, the user who has the confusion that you mention here is simply > wrong. If they are looking at a field called "last active time" and > the slot is active, then the correct answer is "right now" or > "undefined" and that is what they will see. Sure, they might not > understand that. But flipping the name of the field on its head cannot > be the right way to help them. > > With the current naming, I expect to have the exact opposite confusion > as your hypothetical confused user. I'm going to be looking at a slot > that's currently inactive, and it's going to tell me that the > last_inactive_time was at some time in the past. And I'm going to say > "what the heck is going on here, the slot is inactive *right now*!" > > Half of me wonders whether we should avoid this whole problem by > renaming it to something like last_state_change or > last_state_change_time, or maybe just state_change like we do in > pg_stat_activity, and making it mean the last time the slot flipped > between active and inactive in either direction. I'm not sure if this > is better, but unless I'm misunderstanding something, the current > situation is terrible. > Now that I read your arguments, I think that both last_active_time and last_inactive_time could be misleading, because in the end they rely on users' expectations. Would "released_time" sound better? (In the end that is exactly what it represents, except for the case where the slot is restored from disk, for which the meaning would still make sense to me.) It seems to me that released_time does not lead to any expectation, thereby removing any confusion. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: pgsql: Track last_inactive_time in pg_replication_slots.
Hi, On Mon, Mar 25, 2024 at 07:32:11PM +0530, Amit Kapila wrote: > On Mon, Mar 25, 2024 at 6:57 PM Robert Haas wrote: > > And I'm suspicious that having an exception for slots being synced is > > a bad idea. That makes too much of a judgement about how the user will > > use this field. It's usually better to just expose the data, and if > > the user needs helps to make sense of that data, then give them that > > help separately. > > The reason we didn't set this for sync slots is that they won't be > usable (one can't use them to decode WAL) unless standby is promoted > [2]. But I see your point as well. So, I have copied the others > involved in this discussion to see what they think. Yeah I also see Robert's point. If we also sync the "last inactive time" field then we would need to take care of the corner case mentioned by Shveta in [1] during promotion. [1]: https://www.postgresql.org/message-id/CAJpy0uCLu%2BmqAwAMum%3DpXE9YYsy0BE7hOSw_Wno5vjwpFY%3D63g%40mail.gmail.com Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: pgsql: Track last_inactive_time in pg_replication_slots.
Hi, On Mon, Mar 25, 2024 at 12:25:37PM -0400, Robert Haas wrote: > On Mon, Mar 25, 2024 at 12:12 PM Bertrand Drouvot > wrote: > > Would "released_time" sounds better? (at the end this is exactly what it > > does > > represent unless for the case where it is restored from disk for which the > > meaning > > would still makes sense to me though). It seems to me that released_time > > does not > > lead to any expectation then removing any confusion. > > Yeah, that's not bad. I mean, I don't agree that released_time doesn't > lead to any expectation, > but what it leads me to expect is that you're > going to tell me the time at which the slot was released. So if it's > currently active, then I see NULL, because it's not released; but if > it's inactive, then I see the time at which it became so. > > In the same vein, I think deactivated_at or inactive_since might be > good names to consider. I think they get at the same thing as > released_time, but they avoid introducing a completely new word > (release, as opposed to active/inactive). > Yeah, I'd vote for inactive_since then. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Tue, Mar 26, 2024 at 09:30:32AM +0530, shveta malik wrote: > On Mon, Mar 25, 2024 at 12:43 PM shveta malik wrote: > > > > I have one concern, for synced slots on standby, how do we disallow > > invalidation due to inactive-timeout immediately after promotion? > > > > For synced slots, last_inactive_time and inactive_timeout are both > > set. Let's say I bring down primary for promotion of standby and then > > promote standby, there are chances that it may end up invalidating > > synced slots (considering standby is not brought down during promotion > > and thus inactive_timeout may already be past 'last_inactive_time'). > > > > On standby, if we decide to maintain valid last_inactive_time for > synced slots, then invalidation is correctly restricted in > InvalidateSlotForInactiveTimeout() for synced slots using the check: > > if (RecoveryInProgress() && slot->data.synced) > return false; Right. > But immediately after promotion, we can not rely on the above check > and thus possibility of synced slots invalidation is there. To > maintain consistent behavior regarding the setting of > last_inactive_time for synced slots, similar to user slots, one > potential solution to prevent this invalidation issue is to update the > last_inactive_time of all synced slots within the ShutDownSlotSync() > function during FinishWalRecovery(). This approach ensures that > promotion doesn't immediately invalidate slots, and henceforth, we > possess a correct last_inactive_time as a basis for invalidation going > forward. This will be equivalent to updating last_inactive_time during > restart (but without actual restart during promotion). > The plus point of maintaining last_inactive_time for synced slots > could be, this can provide data to the user on when last time the sync > was attempted on that particular slot by background slot sync worker > or SQl function. Thoughts? Yeah, another plus point is that if the primary is down then one could look at the synced "active_since" on the standby to get an idea of it (depends of the last sync though). The issue that I can see with your proposal is: what if one synced the slots manually (with pg_sync_replication_slots()) but does not use the sync worker? Then I think ShutDownSlotSync() is not going to help in that case. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Tue, Mar 26, 2024 at 05:55:11AM +, Bertrand Drouvot wrote: > Hi, > > On Tue, Mar 26, 2024 at 09:30:32AM +0530, shveta malik wrote: > > On Mon, Mar 25, 2024 at 12:43 PM shveta malik > > wrote: > > > > > > I have one concern, for synced slots on standby, how do we disallow > > > invalidation due to inactive-timeout immediately after promotion? > > > > > > For synced slots, last_inactive_time and inactive_timeout are both > > > set. Let's say I bring down primary for promotion of standby and then > > > promote standby, there are chances that it may end up invalidating > > > synced slots (considering standby is not brought down during promotion > > > and thus inactive_timeout may already be past 'last_inactive_time'). > > > > > > > On standby, if we decide to maintain valid last_inactive_time for > > synced slots, then invalidation is correctly restricted in > > InvalidateSlotForInactiveTimeout() for synced slots using the check: > > > > if (RecoveryInProgress() && slot->data.synced) > > return false; > > Right. > > > But immediately after promotion, we can not rely on the above check > > and thus possibility of synced slots invalidation is there. To > > maintain consistent behavior regarding the setting of > > last_inactive_time for synced slots, similar to user slots, one > > potential solution to prevent this invalidation issue is to update the > > last_inactive_time of all synced slots within the ShutDownSlotSync() > > function during FinishWalRecovery(). This approach ensures that > > promotion doesn't immediately invalidate slots, and henceforth, we > > possess a correct last_inactive_time as a basis for invalidation going > > forward. This will be equivalent to updating last_inactive_time during > > restart (but without actual restart during promotion). > > The plus point of maintaining last_inactive_time for synced slots > > could be, this can provide data to the user on when last time the sync > > was attempted on that particular slot by background slot sync worker > > or SQl function. Thoughts? > > Yeah, another plus point is that if the primary is down then one could look > at the synced "active_since" on the standby to get an idea of it (depends of > the > last sync though). > > The issue that I can see with your proposal is: what if one synced the slots > manually (with pg_sync_replication_slots()) but does not use the sync worker? > Then I think ShutDownSlotSync() is not going to help in that case. It looks like ShutDownSlotSync() is always called (even if sync_replication_slots = off), so that sounds ok to me (I should have checked the code, I was under the impression ShutDownSlotSync() was not called if sync_replication_slots = off). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Tue, Mar 26, 2024 at 11:07:51AM +0530, Bharath Rupireddy wrote: > On Tue, Mar 26, 2024 at 9:30 AM shveta malik wrote: > > But immediately after promotion, we can not rely on the above check > > and thus possibility of synced slots invalidation is there. To > > maintain consistent behavior regarding the setting of > > last_inactive_time for synced slots, similar to user slots, one > > potential solution to prevent this invalidation issue is to update the > > last_inactive_time of all synced slots within the ShutDownSlotSync() > > function during FinishWalRecovery(). This approach ensures that > > promotion doesn't immediately invalidate slots, and henceforth, we > > possess a correct last_inactive_time as a basis for invalidation going > > forward. This will be equivalent to updating last_inactive_time during > > restart (but without actual restart during promotion). > > The plus point of maintaining last_inactive_time for synced slots > > could be, this can provide data to the user on when last time the sync > > was attempted on that particular slot by background slot sync worker > > or SQl function. Thoughts? > > Please find the attached v21 patch implementing the above idea. It > also has changes for renaming last_inactive_time to inactive_since. Thanks! A few comments: 1 === One trailing whitespace: Applying: Fix review comments for slot's last_inactive_time property .git/rebase-apply/patch:433: trailing whitespace. # got a valid inactive_since value representing the last slot sync time. warning: 1 line adds whitespace errors. 2 === It looks like inactive_since is set to the current timestamp on the standby each time the sync worker does a cycle: primary: postgres=# select slot_name,inactive_since from pg_replication_slots where failover = 't'; slot_name |inactive_since -+--- lsub27_slot | 2024-03-26 07:39:19.745517+00 lsub28_slot | 2024-03-26 07:40:24.953826+00 standby: postgres=# select slot_name,inactive_since from pg_replication_slots where failover = 't'; slot_name |inactive_since -+--- lsub27_slot | 2024-03-26 07:43:56.387324+00 lsub28_slot | 2024-03-26 07:43:56.387338+00 I don't think that should be the case. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Tue, Mar 26, 2024 at 01:37:21PM +0530, Amit Kapila wrote: > On Tue, Mar 26, 2024 at 1:15 PM Bertrand Drouvot > wrote: > > > > 2 === > > > > It looks like inactive_since is set to the current timestamp on the standby > > each time the sync worker does a cycle: > > > > primary: > > > > postgres=# select slot_name,inactive_since from pg_replication_slots where > > failover = 't'; > > slot_name |inactive_since > > -+--- > > lsub27_slot | 2024-03-26 07:39:19.745517+00 > > lsub28_slot | 2024-03-26 07:40:24.953826+00 > > > > standby: > > > > postgres=# select slot_name,inactive_since from pg_replication_slots where > > failover = 't'; > > slot_name |inactive_since > > -+--- > > lsub27_slot | 2024-03-26 07:43:56.387324+00 > > lsub28_slot | 2024-03-26 07:43:56.387338+00 > > > > I don't think that should be the case. > > > > But why? This is exactly what we discussed in another thread where we > agreed to update inactive_since even for sync slots. Hum, I thought we agreed to "sync" it and to "update it to current time" only at promotion time. I don't think updating inactive_since to current time during each cycle makes sense (I mean I understand the use case: being able to say when slots have been sync, but if this is what we want then we should consider an extra view or an extra field but not relying on the inactive_since one). If the primary goes down, not updating inactive_since to the current time could also provide benefit such as knowing the inactive_since of the primary slots (from the standby) the last time it has been synced. If we update it to the current time then this information is lost. > In each sync > cycle, we acquire/release the slot, so the inactive_since gets > updated. See synchronize_one_slot(). Right, and I think we should put an extra condition if in recovery. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
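A rough sketch of the "extra condition if in recovery" suggested above (not the actual patch): the idea is simply to skip bumping inactive_since when a synced slot is released on a standby, so that the sync worker's acquire/release cycles do not overwrite the value coming from the primary. The helper name is made up for illustration.

```
#include "postgres.h"

#include "access/xlog.h"		/* RecoveryInProgress() */
#include "replication/slot.h"
#include "storage/spin.h"
#include "utils/timestamp.h"

/*
 * Sketch: set inactive_since at slot release time, except for slots that are
 * being synced on a standby, which keep the value synced from the primary
 * (represented here by storing 0).
 */
static void
set_inactive_since_at_release(ReplicationSlot *slot)
{
	TimestampTz now = 0;

	if (!(RecoveryInProgress() && slot->data.synced))
		now = GetCurrentTimestamp();

	SpinLockAcquire(&slot->mutex);
	slot->inactive_since = now;
	SpinLockRelease(&slot->mutex);
}
```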
Re: pgsql: Track last_inactive_time in pg_replication_slots.
Hi, On Tue, Mar 26, 2024 at 01:45:23PM +0530, Amit Kapila wrote: > On Tue, Mar 26, 2024 at 1:09 PM Alvaro Herrera > wrote: > > > > On 2024-Mar-26, Amit Kapila wrote: > > > > > We have a consensus on inactive_since, so I'll make that change. > > > > Sounds reasonable. So this is a timestamptz if the slot is inactive, > > NULL if active, right? > > > > Yes. > > > What value is it going to have for sync slots? > > > > The behavior will be the same for non-sync slots. In each sync cycle, > we acquire/release the sync slots. So at the time of release, > inactive_since will be updated. See email [1]. I don't think we should set inactive_since to the current time at each sync cycle, see [1] as to why. What do you think? [1]: https://www.postgresql.org/message-id/ZgKGIDC5lttWTdJH%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Tue, Mar 26, 2024 at 02:27:17PM +0530, Bharath Rupireddy wrote: > Please use the v22 patch set. Thanks! 1 === +reset_synced_slots_info(void) I'm not sure "reset" is the right word, what about slot_sync_shutdown_update()? 2 === + for (int i = 0; i < max_replication_slots; i++) + { + ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i]; + + /* Check if it is a synchronized slot */ + if (s->in_use && s->data.synced) + { + TimestampTz now; + + Assert(SlotIsLogical(s)); + Assert(s->active_pid == 0); + + /* +* Set the time since the slot has become inactive after shutting +* down slot sync machinery. This helps correctly interpret the +* time if the standby gets promoted without a restart. We get the +* current time beforehand to avoid a system call while holding +* the lock. +*/ + now = GetCurrentTimestamp(); What about moving "now = GetCurrentTimestamp()" outside of the for loop? (it would be less costly and probably good enough). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
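For illustration, here is roughly what the suggestion in 2 === amounts to (a sketch, not the actual patch): compute the timestamp once, outside the loop, and reuse it for every synced slot. The function name is made up; the field name follows the rename discussed in this thread.

```
#include "postgres.h"

#include "replication/slot.h"
#include "storage/lwlock.h"
#include "storage/spin.h"
#include "utils/timestamp.h"

/*
 * Sketch: stamp all synced slots with the same "now" value at slot sync
 * shutdown, calling GetCurrentTimestamp() only once for the whole loop.
 */
static void
stamp_synced_slots_at_shutdown(void)
{
	TimestampTz now = GetCurrentTimestamp();

	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);

	for (int i = 0; i < max_replication_slots; i++)
	{
		ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];

		if (s->in_use && s->data.synced)
		{
			SpinLockAcquire(&s->mutex);
			s->inactive_since = now;
			SpinLockRelease(&s->mutex);
		}
	}

	LWLockRelease(ReplicationSlotControlLock);
}
```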
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Tue, Mar 26, 2024 at 03:17:36PM +0530, shveta malik wrote: > On Tue, Mar 26, 2024 at 1:54 PM Bertrand Drouvot > wrote: > > > > Hi, > > > > On Tue, Mar 26, 2024 at 01:37:21PM +0530, Amit Kapila wrote: > > > On Tue, Mar 26, 2024 at 1:15 PM Bertrand Drouvot > > > wrote: > > > > > > > > 2 === > > > > > > > > It looks like inactive_since is set to the current timestamp on the > > > > standby > > > > each time the sync worker does a cycle: > > > > > > > > primary: > > > > > > > > postgres=# select slot_name,inactive_since from pg_replication_slots > > > > where failover = 't'; > > > > slot_name |inactive_since > > > > -+--- > > > > lsub27_slot | 2024-03-26 07:39:19.745517+00 > > > > lsub28_slot | 2024-03-26 07:40:24.953826+00 > > > > > > > > standby: > > > > > > > > postgres=# select slot_name,inactive_since from pg_replication_slots > > > > where failover = 't'; > > > > slot_name |inactive_since > > > > -+--- > > > > lsub27_slot | 2024-03-26 07:43:56.387324+00 > > > > lsub28_slot | 2024-03-26 07:43:56.387338+00 > > > > > > > > I don't think that should be the case. > > > > > > > > > > But why? This is exactly what we discussed in another thread where we > > > agreed to update inactive_since even for sync slots. > > > > Hum, I thought we agreed to "sync" it and to "update it to current time" > > only at promotion time. > > I think there may have been some misunderstanding here. Indeed ;-) > But now if I > rethink this, I am fine with 'inactive_since' getting synced from > primary to standby. But if we do that, we need to add docs stating > "inactive_since" represents primary's inactivity and not standby's > slots inactivity for synced slots. Yeah sure. > The reason for this clarification > is that the synced slot might be generated much later, yet > 'inactive_since' is synced from the primary, potentially indicating a > time considerably earlier than when the synced slot was actually > created. Right. > Another approach could be that "inactive_since" for synced slot > actually gives its own inactivity data rather than giving primary's > slot data. We update inactive_since on standby only at 3 occasions: > 1) at the time of creation of the synced slot. > 2) during standby restart. > 3) during promotion of standby. > > I have attached a sample patch for this idea as.txt file. Thanks! > I am fine with any of these approaches. One gives data synced from > primary for synced slots, while another gives actual inactivity data > of synced slots. What about another approach?: inactive_since gives data synced from primary for synced slots and another dedicated field (could be added later...) could represent what you suggest as the other option. Another cons of updating inactive_since at the current time during each slot sync cycle is that calling GetCurrentTimestamp() very frequently (during each sync cycle of very active slots) could be too costly. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Tue, Mar 26, 2024 at 04:49:18PM +0530, shveta malik wrote: > On Tue, Mar 26, 2024 at 4:35 PM Bharath Rupireddy > wrote: > > > > On Tue, Mar 26, 2024 at 4:18 PM shveta malik wrote: > > > > > > > What about another approach?: inactive_since gives data synced from > > > > primary for > > > > synced slots and another dedicated field (could be added later...) could > > > > represent what you suggest as the other option. > > > > > > Yes, okay with me. I think there is some confusion here as well. In my > > > second approach above, I have not suggested anything related to > > > sync-worker. We can think on that later if we really need another > > > field which give us sync time. In my second approach, I have tried to > > > avoid updating inactive_since for synced slots during sync process. We > > > update that field during creation of synced slot so that > > > inactive_since reflects correct info even for synced slots (rather > > > than copying from primary). Please have a look at my patch and let me > > > know your thoughts. I am fine with copying it from primary as well and > > > documenting this behaviour. > > > > I took a look at your patch. > > > > --- a/src/backend/replication/logical/slotsync.c > > +++ b/src/backend/replication/logical/slotsync.c > > @@ -628,6 +628,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid > > remote_dbid) > > SpinLockAcquire(&slot->mutex); > > slot->effective_catalog_xmin = xmin_horizon; > > slot->data.catalog_xmin = xmin_horizon; > > +slot->inactive_since = GetCurrentTimestamp(); > > SpinLockRelease(&slot->mutex); > > > > If we just sync inactive_since value for synced slots while in > > recovery from the primary, so be it. Why do we need to update it to > > the current time when the slot is being created? > > If we update inactive_since at synced slot's creation or during > restart (skipping setting it during sync), then this time reflects > actual 'inactive_since' for that particular synced slot. Isn't that a > clear info for the user and in alignment of what the name > 'inactive_since' actually suggests? > > > We don't expose slot > > creation time, no? > > No, we don't. But for synced slot, that is the time since that slot is > inactive (unless promoted), so we are exposing inactive_since and not > creation time. > > >Aren't we fine if we just sync the value from > > primary and document that fact? After the promotion, we can reset it > > to the current time so that it gets its own time. Do you see any > > issues with it? > > Yes, we can do that. But curious to know, do we see any additional > benefit of reflecting primary's inactive_since at standby which I > might be missing? In case the primary goes down, then one could use the value on the standby to get the value coming from the primary. I think that could be useful info to have. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Tue, Mar 26, 2024 at 04:17:53PM +0530, shveta malik wrote: > On Tue, Mar 26, 2024 at 3:50 PM Bertrand Drouvot > wrote: > > > > Hi, > > > > > I think there may have been some misunderstanding here. > > > > Indeed ;-) > > > > > But now if I > > > rethink this, I am fine with 'inactive_since' getting synced from > > > primary to standby. But if we do that, we need to add docs stating > > > "inactive_since" represents primary's inactivity and not standby's > > > slots inactivity for synced slots. > > > > Yeah sure. > > > > > The reason for this clarification > > > is that the synced slot might be generated much later, yet > > > 'inactive_since' is synced from the primary, potentially indicating a > > > time considerably earlier than when the synced slot was actually > > > created. > > > > Right. > > > > > Another approach could be that "inactive_since" for synced slot > > > actually gives its own inactivity data rather than giving primary's > > > slot data. We update inactive_since on standby only at 3 occasions: > > > 1) at the time of creation of the synced slot. > > > 2) during standby restart. > > > 3) during promotion of standby. > > > > > > I have attached a sample patch for this idea as.txt file. > > > > Thanks! > > > > > I am fine with any of these approaches. One gives data synced from > > > primary for synced slots, while another gives actual inactivity data > > > of synced slots. > > > > What about another approach?: inactive_since gives data synced from primary > > for > > synced slots and another dedicated field (could be added later...) could > > represent what you suggest as the other option. > > Yes, okay with me. I think there is some confusion here as well. In my > second approach above, I have not suggested anything related to > sync-worker. Yeah, no confusion, understood that way. > We can think on that later if we really need another > field which give us sync time. I think that calling GetCurrentTimestamp() so frequently could be too costly, so I'm not sure we should. > In my second approach, I have tried to > avoid updating inactive_since for synced slots during sync process. We > update that field during creation of synced slot so that > inactive_since reflects correct info even for synced slots (rather > than copying from primary). Yeah, and I think we could create a dedicated field with this information if we feel the need. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: pgsql: Track last_inactive_time in pg_replication_slots.
Hi, On Tue, Mar 26, 2024 at 04:39:55PM +0100, Alvaro Herrera wrote: > On 2024-Mar-26, Nathan Bossart wrote: > > I don't object to a > > time-based setting as well, but that won't always work as well for this > > particular use-case, especially if we are relying on users to set a > > slot-level parameter. > > I think slot-level parameters are mostly useless, because it takes just > one slot where you forget to set it for disaster to strike. I think that's a fair point. So maybe we should focus on having a GUC first, and revisit later whether to also add a slot-level parameter (in addition to the GUC). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
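To illustrate what "a GUC first" could look like, here is a sketch of a server-level setting as it might appear in guc_tables.c. The name replication_slot_inactive_timeout comes from this thread; the group, default value, and description are assumptions for illustration only, not a committed design.

```
/* assumed declaration, e.g. in replication/slot.h */
extern PGDLLIMPORT int replication_slot_inactive_timeout;

/* sketch of a guc_tables.c entry; 0 (the assumed default) would disable the feature */
{
	{"replication_slot_inactive_timeout", PGC_SIGHUP, REPLICATION_SENDING,
		gettext_noop("Sets the amount of time a replication slot can stay inactive before it is invalidated."),
		NULL,
		GUC_UNIT_S
	},
	&replication_slot_inactive_timeout,
	0, 0, INT_MAX,
	NULL, NULL, NULL
},
```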
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Tue, Mar 26, 2024 at 09:59:23PM +0530, Bharath Rupireddy wrote: > On Tue, Mar 26, 2024 at 4:35 PM Bharath Rupireddy > wrote: > > > > If we just sync inactive_since value for synced slots while in > > recovery from the primary, so be it. Why do we need to update it to > > the current time when the slot is being created? We don't expose slot > > creation time, no? Aren't we fine if we just sync the value from > > primary and document that fact? After the promotion, we can reset it > > to the current time so that it gets its own time. > > I'm attaching v24 patches. It implements the above idea proposed > upthread for synced slots. I've now separated > s/last_inactive_time/inactive_since and synced slots behaviour. Please > have a look. Thanks! v24-0001 It now contains purely mechanical changes and looks good to me. v24-0002 1 === This commit does two things: 1) Updates inactive_since for sync slots with the value received from the primary's slot. Tested it and it does that. 2 === 2) Ensures the value is set to current timestamp during the shutdown of slot sync machinery to help correctly interpret the time if the standby gets promoted without a restart. Tested it and it does that. 3 === +/* + * Reset the synced slots info such as inactive_since after shutting + * down the slot sync machinery. + */ +static void +update_synced_slots_inactive_time(void) It looks like the word "reset" in the comment does not match the name of the function and what it does. 4 === + /* +* We get the current time beforehand and only once to avoid +* system calls overhead while holding the lock. +*/ + if (now == 0) + now = GetCurrentTimestamp(); Also, +1 for having GetCurrentTimestamp() called just once within the loop. 5 === - if (!(RecoveryInProgress() && slot->data.synced)) + if (!(InRecovery && slot->data.synced)) slot->inactive_since = GetCurrentTimestamp(); else slot->inactive_since = 0; Not related to this change, but more about the way RestoreSlotFromDisk() behaves here: For a sync slot on standby it will be set to zero and then later will be synchronized with the one coming from the primary. I think it's fine for it to stay at zero for this window of time. Now, if the standby is down and one sets sync_replication_slots to off, then inactive_since will be set to zero on the standby at startup and not synchronized (unless one triggers a manual sync). I also think that's fine, but it might be worth documenting this behavior (that after a standby startup inactive_since is zero until the next sync...). 6 === + print "HI $slot_name $name $inactive_since $slot_creation_time\n"; garbage? 7 === +# Capture and validate inactive_since of a given slot. +sub capture_and_validate_slot_inactive_since +{ + my ($node, $slot_name, $slot_creation_time) = @_; + my $name = $node->name; We now have capture_and_validate_slot_inactive_since in 2 places: 040_standby_failover_slots_sync.pl and 019_replslot_limit.pl. Worth creating a sub in Cluster.pm? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Wed, Mar 27, 2024 at 10:08:33AM +0530, Bharath Rupireddy wrote: > On Tue, Mar 26, 2024 at 11:22 PM Bertrand Drouvot > wrote: > > > > - if (!(RecoveryInProgress() && slot->data.synced)) > > + if (!(InRecovery && slot->data.synced)) > > slot->inactive_since = GetCurrentTimestamp(); > > else > > slot->inactive_since = 0; > > > > Not related to this change but more the way RestoreSlotFromDisk() behaves > > here: > > > > For a sync slot on standby it will be set to zero and then later will be > > synchronized with the one coming from the primary. I think that's fine to > > have > > it to zero for this window of time. > > Right. > > > Now, if the standby is down and one sets sync_replication_slots to off, > > then inactive_since will be set to zero on the standby at startup and not > > synchronized (unless one triggers a manual sync). I also think that's fine > > but > > it might be worth to document this behavior (that after a standby startup > > inactive_since is zero until the next sync...). > > Isn't this behaviour applicable for other slot parameters that the > slot syncs from the remote slot on the primary? No they are persisted on disk. If not, we'd not know where to resume the decoding from on the standby in case primary is down and/or sync is off. > I've added the following note in the comments when we update > inactive_since in RestoreSlotFromDisk. > > * Note that for synced slots after the standby starts up (i.e. after > * the slots are loaded from the disk), the inactive_since will remain > * zero until the next slot sync cycle. > */ > if (!(InRecovery && slot->data.synced)) > slot->inactive_since = GetCurrentTimestamp(); > else > slot->inactive_since = 0; I think we should add some words in the doc too and also about what the meaning of inactive_since on the standby is (as suggested by Shveta in [1]). [1]: https://www.postgresql.org/message-id/CAJpy0uDkTW%2Bt1k3oPkaipFBzZePfFNB5DmiA%3D%3DpxRGcAdpF%3DPg%40mail.gmail.com > > 7 === > > > > +# Capture and validate inactive_since of a given slot. > > +sub capture_and_validate_slot_inactive_since > > +{ > > + my ($node, $slot_name, $slot_creation_time) = @_; > > + my $name = $node->name; > > > > We know have capture_and_validate_slot_inactive_since at 2 places: > > 040_standby_failover_slots_sync.pl and 019_replslot_limit.pl. > > > > Worth to create a sub in Cluster.pm? > > I'd second that thought for now. We might have to debate first if it's > useful for all the nodes even without replication, and if yes, the > naming stuff and all that. Historically, we've had such duplicated > functions until recently, for instance advance_wal and log_contains. > We > moved them over to a common perl library Cluster.pm very recently. I'm > sure we can come back later to move it to Cluster.pm. I thought that would be the right time not to introduce duplicated code. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: pgsql: Track last_inactive_time in pg_replication_slots.
Hi, On Wed, Mar 27, 2024 at 12:28:06PM +0530, Bharath Rupireddy wrote: > On Wed, Mar 27, 2024 at 10:10 AM Amit Kapila wrote: > > > > On Tue, Mar 26, 2024 at 9:10 PM Alvaro Herrera > > wrote: > > > > > > On 2024-Mar-26, Nathan Bossart wrote: > > > > > > > FWIW I'd really prefer to have something like max_slot_xid_age for > > > > this. A > > > > time-based parameter would likely help with most cases, but transaction > > > > ID > > > > usage will vary widely from server to server, so it'd be nice to have > > > > something to protect against wraparound more directly. > > > > > > Yeah, I tend to agree that an XID-based limit makes more sense than a > > > time-based one. > > > > So, do we want the time-based parameter or just max_slot_xid_age > > considering both will be GUC's? Yes, the xid_age based parameter > > sounds to be directly helpful for transaction ID wraparound or dead > > row cleanup, OTOH having a lot of inactive slots (which is possible in > > use cases where a tool creates a lot of slots and forgets to remove > > them, or the tool exits without cleaning up slots (say due to server > > shutdown)) also prohibit removing dead space which is not nice either? > > I've personally seen the leftover slots problem on production systems > where a timeout based invalidation mechanism could have been of more > help to save time and reduce manual intervention. Usually, most if not > all, migration/upgrade/other tools create slots, and if at all any > errors occur or the operation gets cancelled midway, there's a chance > that the slots can be leftover if such tools forgets to clean them up > either because there was a bug or for whatever reasons. These are > unintended/ghost slots for the database user unnecessarily holding up > resources such as XIDs, dead rows and WAL; which might lead to XID > wraparound or server crash if unnoticed. Although XID based GUC helps > a lot, why do these unintended and unnoticed slots need to hold up the > resources even before the XID age of say 1.5 or 2 billion is reached. > > With both GUCs max_slot_xid_age and replication_slot_inactive_timeout > in place, I can set max_slot_xid_age = 1.5 billion or so and also set > replication_slot_inactive_timeout = 2 days or so to make the database > foolproof. Yeah, I think that both make sense. The reason is that one (the xid age one) depends on both database activity and slot activity, while the other (the timeout one) depends only on slot activity. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
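As an illustration of how the two settings complement each other, here is a minimal sketch (not from any posted patch) of an invalidation check combining them. The GUC names are the ones used in this thread, the helper name is made up, and the XID-wraparound handling is deliberately simplified.

```
#include "postgres.h"

#include "access/transam.h"
#include "replication/slot.h"
#include "utils/timestamp.h"

/* proposed GUCs from this thread (declarations assumed for illustration) */
extern int	max_slot_xid_age;					/* in transactions, 0 = off */
extern int	replication_slot_inactive_timeout;	/* in seconds, 0 = off */

/*
 * Sketch: returns true if the slot should be invalidated either because its
 * xmin is too old (depends on database *and* slot activity) or because it has
 * been inactive for too long (depends on slot activity only).
 */
static bool
slot_exceeds_limits(ReplicationSlot *slot, TimestampTz now)
{
	if (max_slot_xid_age > 0 && TransactionIdIsNormal(slot->data.xmin))
	{
		TransactionId next_xid = ReadNextTransactionId();
		int32		age = (int32) (next_xid - slot->data.xmin); /* simplified */

		if (age > max_slot_xid_age)
			return true;
	}

	if (replication_slot_inactive_timeout > 0 &&
		slot->inactive_since > 0 &&
		TimestampDifferenceExceeds(slot->inactive_since, now,
								   replication_slot_inactive_timeout * 1000))
		return true;

	return false;
}
```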
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Wed, Mar 27, 2024 at 02:55:17PM +0530, Bharath Rupireddy wrote: > Please check the attached v27 patch which also has Bertrand's comment > on deduplicating the TAP function. I've now moved it to Cluster.pm. Thanks! 1 === +Note that the slots on the standbys that are being synced from a +primary server (whose synced field is +true), will get the +inactive_since value from the +corresponding remote slot on the primary. Also, note that for the +synced slots on the standby, after the standby starts up (i.e. after +the slots are loaded from the disk), the inactive_since will remain +zero until the next slot sync cycle. Not sure we should mention the "(i.e. after the slots are loaded from the disk)" and also "cycle" (as that does not sound right in case of manual sync). My proposal (in text) but feel free to reword it: Note that the slots on the standbys that are being synced from a primary server (whose synced field is true), will get the inactive_since value from the corresponding remote slot on the primary. Also, after the standby starts up, the inactive_since (for such synced slots) will remain zero until the next synchronization. 2 === +=item $node->create_logical_slot_on_standby(self, primary, slot_name, dbname) get_slot_inactive_since_value instead? 3 === +against given reference time. s/given reference/optional given reference/? Apart from the above, LGTM. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Wed, Mar 27, 2024 at 05:55:05PM +0530, Bharath Rupireddy wrote: > On Wed, Mar 27, 2024 at 3:42 PM Bertrand Drouvot > Please see the attached v28 patch. Thanks! 1 === sorry I missed it in the previous review if (!(RecoveryInProgress() && slot->data.synced)) + { now = GetCurrentTimestamp(); + update_inactive_since = true; + } + else + update_inactive_since = false; I think update_inactive_since is not needed, we could rely on (now > 0) instead. 2 === +=item $node->get_slot_inactive_since_value(self, primary, slot_name, dbname) + +Get inactive_since column value for a given replication slot validating it +against optional reference time. + +=cut + +sub get_slot_inactive_since_value +{ shouldn't be "=item $node->get_slot_inactive_since_value(self, slot_name, reference_time)" instead? Apart from the above, LGTM. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Wed, Mar 27, 2024 at 09:00:37PM +0530, Bharath Rupireddy wrote: > On Wed, Mar 27, 2024 at 6:54 PM Bertrand Drouvot > wrote: > > > > Hi, > > > > On Wed, Mar 27, 2024 at 05:55:05PM +0530, Bharath Rupireddy wrote: > > > On Wed, Mar 27, 2024 at 3:42 PM Bertrand Drouvot > > > Please see the attached v28 patch. > > > > Thanks! > > > > 1 === sorry I missed it in the previous review > > > > if (!(RecoveryInProgress() && slot->data.synced)) > > + { > > now = GetCurrentTimestamp(); > > + update_inactive_since = true; > > + } > > + else > > + update_inactive_since = false; > > > > I think update_inactive_since is not needed, we could rely on (now > 0) > > instead. > > Thought of using it, but, at the expense of readability. I prefer to > use a variable instead. That's fine too. > However, I changed the variable to be more meaningful to is_slot_being_synced. Yeah makes sense and even easier to read. v29-0001 LGTM. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
slot/%s", NameStr(slot->data.name)); + SaveSlotToPath(slot, path, ERROR); + } Maybe we could create a new function say GivenReplicationSlotSave() with a slot as parameter, that ReplicationSlotSave() could call too? CR9 === + if (check_for_invalidation) + { + /* The slot is ours by now */ + Assert(s->active_pid == MyProcPid); + + /* +* Well, the slot is not yet ours really unless we check for the +* invalidation below. +*/ + s->active_pid = 0; + if (InvalidateReplicationSlotForInactiveTimeout(s, true, true)) + { + /* +* If the slot has been invalidated, recalculate the resource +* limits. +*/ + ReplicationSlotsComputeRequiredXmin(false); + ReplicationSlotsComputeRequiredLSN(); + + /* Might need it for slot clean up on error, so restore it */ + s->active_pid = MyProcPid; + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), +errmsg("cannot acquire invalidated replication slot \"%s\"", + NameStr(MyReplicationSlot->data.name; + } + s->active_pid = MyProcPid; Are we not missing some SpinLockAcquire/Release on the slot's mutex here? (the places where we set the active_pid). CR10 === @@ -1628,6 +1674,10 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause, if (SlotIsLogical(s)) invalidation_cause = cause; break; + case RS_INVAL_INACTIVE_TIMEOUT: + if (InvalidateReplicationSlotForInactiveTimeout(s, false, false)) + invalidation_cause = cause; + break; InvalidatePossiblyObsoleteSlot() is not called with such a reason, better to use an Assert here and in the caller too? CR11 === +++ b/src/test/recovery/t/050_invalidate_slots.pl why not using 019_replslot_limit.pl? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
Hi, On Thu, Mar 28, 2024 at 04:38:19AM +, Zhijie Hou (Fujitsu) wrote: > Hi, > > When analyzing one BF error[1], we find an issue of slotsync: Since we don't > perform logical decoding for the synced slots when syncing the lsn/xmin of > slot, no logical snapshots will be serialized to disk. So, when user starts to > use these synced slots after promotion, it needs to re-build the consistent > snapshot from the restart_lsn if the WAL(xl_running_xacts) at restart_lsn > position indicates that there are running transactions. This however could > cause the data that before the consistent point to be missed[2]. I see, nice catch and explanation, thanks! > This issue doesn't exist on the primary because the snapshot at restart_lsn > should have been serialized to disk (SnapBuildProcessRunningXacts -> > SnapBuildSerialize), so even if the logical decoding restarts, it can find > consistent snapshot immediately at restart_lsn. Right. > To fix this, we could use the fast forward logical decoding to advance the > synced > slot's lsn/xmin when syncing these values instead of directly updating the > slot's info. This way, the snapshot will be serialized to disk when decoding. > If we could not reach to the consistent point at the remote restart_lsn, the > slot is marked as temp and will be persisted once it reaches the consistent > point. I am still analyzing the fix and will share once ready. Thanks! I'm wondering about the performance impact (even in fast_forward mode), might be worth to keep an eye on it. Should we create a 17 open item [1]? [1]: https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
Hi, On Thu, Mar 28, 2024 at 05:05:35PM +0530, Amit Kapila wrote: > On Thu, Mar 28, 2024 at 3:34 PM Bertrand Drouvot > wrote: > > > > On Thu, Mar 28, 2024 at 04:38:19AM +, Zhijie Hou (Fujitsu) wrote: > > > > > To fix this, we could use the fast forward logical decoding to advance > > > the synced > > > slot's lsn/xmin when syncing these values instead of directly updating the > > > slot's info. This way, the snapshot will be serialized to disk when > > > decoding. > > > If we could not reach to the consistent point at the remote restart_lsn, > > > the > > > slot is marked as temp and will be persisted once it reaches the > > > consistent > > > point. I am still analyzing the fix and will share once ready. > > > > Thanks! I'm wondering about the performance impact (even in fast_forward > > mode), > > might be worth to keep an eye on it. > > > > True, we can consider performance but correctness should be a > priority, Yeah of course. > and can we think of a better way to fix this issue? I'll keep you posted if there is one that I can think of. > > Should we create a 17 open item [1]? > > > > [1]: https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items > > > > Yes, we can do that. done. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Fri, Mar 29, 2024 at 09:39:31AM +0530, Amit Kapila wrote: > On Wed, Mar 27, 2024 at 9:00 PM Bharath Rupireddy > wrote: > > > > > > Thanks. I'm attaching v29 patches. 0001 managing inactive_since on the > > standby for sync slots. > > > > Commit message states: "why we can't just update inactive_since for > synced slots on the standby with the value received from remote slot > on the primary. This is consistent with any other slot parameter i.e. > all of them are synced from the primary." > > The inactive_since is not consistent with other slot parameters which > we copy. We don't perform anything related to those other parameters > like say two_phase phase which can change that property. However, we > do acquire the slot, advance the slot (as per recent discussion [1]), > and release it. Since these operations can impact inactive_since, it > seems to me that inactive_since is not the same as other parameters. > It can have a different value than the primary. Why would anyone want > to know the value of inactive_since from primary after the standby is > promoted? I think it can be useful "before" the standby is promoted, in case the primary is down. I agree that tracking the activity time of a synced slot can be useful, so why not create a dedicated field for that purpose (and keep inactive_since a perfect "copy" of the primary's value)? > Now, the other concern is that calling GetCurrentTimestamp() > could be costly when the values for the slot are not going to be > updated but if that happens we can optimize such that before acquiring > the slot we can have some minimal pre-checks to ensure whether we need > to update the slot or not. Right, but for a very active slot it is likely that we would call GetCurrentTimestamp() during almost every sync cycle. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
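To make the "minimal pre-checks" idea quoted above a bit more concrete, here is a sketch (not a patch) of a cheap test the sync worker could run before acquiring the slot. It assumes the RemoteSlot struct local to slotsync.c, and the helper name is made up for illustration.

```
/*
 * Sketch (within slotsync.c): return true if the remote slot has not moved
 * since the last sync, in which case the acquire/update/release cycle (and
 * its GetCurrentTimestamp() call) could be skipped for this slot.
 */
static bool
remote_slot_unchanged(RemoteSlot *remote_slot, ReplicationSlot *slot)
{
	return remote_slot->restart_lsn == slot->data.restart_lsn &&
		remote_slot->confirmed_lsn == slot->data.confirmed_flush &&
		remote_slot->catalog_xmin == slot->data.catalog_xmin;
}
```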
Re: Synchronizing slots from primary to standby
Hi, On Fri, Mar 29, 2024 at 01:06:15AM +, Zhijie Hou (Fujitsu) wrote: > Attach a new version patch which fixed an un-initialized variable issue and > added some comments. Also, temporarily enable DEBUG2 for the 040 tap-test so > that > we can analyze the possible CFbot failures easily. > Thanks! + if (remote_slot->confirmed_lsn != slot->data.confirmed_flush) + { + /* +* By advancing the restart_lsn, confirmed_lsn, and xmin using +* fast-forward logical decoding, we ensure that the required snapshots +* are saved to disk. This enables logical decoding to quickly reach a +* consistent point at the restart_lsn, eliminating the risk of missing +* data during snapshot creation. +*/ + pg_logical_replication_slot_advance(remote_slot->confirmed_lsn, + found_consistent_point); + ReplicationSlotsComputeRequiredLSN(); + updated_lsn = true; + } Instead of using pg_logical_replication_slot_advance() for each synced slot and during sync cycles, what about the following?: - keep sync slot synchronization as it is currently (not using pg_logical_replication_slot_advance()) - create "a hidden" logical slot if the sync slot feature is on - at the time of promotion use pg_logical_replication_slot_advance() on this hidden slot only, to advance to the max lsn of the synced slots I'm not sure that would be enough, just asking your thoughts on this (the benefit would be to avoid calling pg_logical_replication_slot_advance() on each synced slot and during the sync cycles). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com