Hi,
On 11/10/23 8:55 AM, Amit Kapila wrote:
On Fri, Nov 10, 2023 at 12:50 PM Drouvot, Bertrand
<bertranddrouvot...@gmail.com> wrote:
But even if we ERROR out instead of emitting a WARNING, the user would still
need to be notified/monitor such errors. I agree that then probably they will
come to know earlier because the slot sync mechanism would be stopped but still
it is not "guaranteed" (specially if there is no others "working" synced slots
around.)
And if they do not, then there is still a risk to use this slot after a
failover thinking this is a "synced" slot.
I think this is another reason that probably giving ERROR has better
chances for the user to notice before failover. IF knowing such errors
user still proceeds with the failover, the onus is on her.
Agree. My concern is more when they don't know about the error.
We can
probably document this hazard along with the failover feature so that
users are aware that they either need to be careful while creating
slots on standby or consult ERROR logs. I guess we can even make it
visible in the view also.
Yeah.
Giving more thoughts, what about using a dedicated/reserved naming convention
for
synced slot like synced_<primary_slot_name> or such and then:
- prevent user to create sync_<whatever> slots on standby
- sync <slot> on primary to sync_<slot> on standby
- during failover, rename sync_<slot> to <slot> and if <slot> exists then
emit a WARNING and keep sync_<slot> in place.
That way both slots are still in place (the manually created <slot> and
the sync_<slot<) and one could decide what to do with them.
Hmm, I think after failover, users need to rename all slots or we need
to provide a way to rename them so that they can be used by
subscribers which sounds like much more work.
Agree that's much more work for the subscriber case. Maybe that's not worth
the extra work.
Also, the current coding doesn't ensure
we will always give WARNING. If we see the below code that deals with
this WARNING,
+ /* User created slot with the same name exists, emit WARNING. */
+ else if (found && s->data.sync_state == SYNCSLOT_STATE_NONE)
+ {
+ ereport(WARNING,
+ errmsg("not synchronizing slot %s; it is a user created slot",
+ remote_slot->name));
+ }
+ /* Otherwise create the slot first. */
+ else
+ {
+ TransactionId xmin_horizon = InvalidTransactionId;
+ ReplicationSlot *slot;
+
+ ReplicationSlotCreate(remote_slot->name, true, RS_EPHEMERAL,
+ remote_slot->two_phase, false);
I think this is not a solid check to ensure that the slot existed
before. Because it could be created as soon as the slot sync worker
invokes ReplicationSlotCreate() here.
Agree.
So, having a concrete check to give WARNING would require some more
logic which I don't think is a good idea to handle this boundary case.
Yeah good point, agree to just error out in all the case then (if we discard
the sync_ reserved wording proposal, which seems to be the case as probably
not worth the extra work).
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com