Hi,
On 11/10/23 4:31 AM, shveta malik wrote:
On Thu, Nov 9, 2023 at 9:15 PM Drouvot, Bertrand
<bertranddrouvot...@gmail.com> wrote:
Yeah I think so, because there is a time window when one could "use" the slot
after the promotion and before it is removed. Producing things like:
"
2023-11-09 15:16:50.294 UTC [2580462] LOG: dropped replication slot
"logical_slot2" of dbid 5 as it was not sync-ready
2023-11-09 15:16:50.295 UTC [2580462] LOG: dropped replication slot
"logical_slot3" of dbid 5 as it was not sync-ready
2023-11-09 15:16:50.297 UTC [2580462] LOG: dropped replication slot
"logical_slot4" of dbid 5 as it was not sync-ready
2023-11-09 15:16:50.297 UTC [2580462] ERROR: replication slot "logical_slot5"
is active for PID 2594628
"
After the promotion one was able to use logical_slot5 and now we can now drop
it.
Yes, I was suspicious about this small window which may allow others
to use this slot, that is why I was thinking of putting it in the
promotion flow and thus asked that question earlier. But the slot-sync
worker may end up creating it again in case it has not exited.
Sorry, there is a typo up-thread, I meant "After the promotion one was able to
use logical_slot5 and now we can NOT drop it.". We can not drop it because it
is in use.
So we
need to carefully decide at what all places we need to put 'not-in
recovery' checks in slot-sync workers. In the previous version,
synchronize_one_slot() had that check and it was skipping sync if
'!RecoveryInProgress'. But I have removed that check in v32 thinking
that the slots which the worker has already fetched from the primary,
let them all get synced and exit after that instead of syncing half
and leaving rest. But now on rethinking, was the previous behaviour
correct i.e. skip sync at that point onward where we see it is no
longer in standby-mode while few of the slots have already been synced
in that sync-cycle. Thoughts?
I think we still need to discuss the promotion flow. I think we would need
to have the slot sync worker shut down during the promotion (as suggested by
Amit in [1]), but before that let the slot sync worker know it is now acting
during promotion.
Something like:
- let the sync worker know it is now acting under promotion
- do what needs to be done while acting under promotion
- shut down the sync worker
That way we would avoid any "risk" of having the sync worker doing something
we don't expect while not in recovery anymore.
Regarding "do what needs to be done while acting under promotion":
- ensure all slots in 'r' state are synced
- drop slots that are in 'i' state
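To make the ordering concrete, here is a rough standalone C sketch of the
flow above. It is a toy model and not PostgreSQL code: DemoSlot,
DemoSyncWorker and demo_promote() are made-up names, and a real patch would
need proper signaling and locking between the startup process and the worker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a synced slot on the standby (not PostgreSQL code). */
typedef struct
{
    const char *name;
    char        sync_state;     /* 'r' = sync-ready, 'i' = sync initiated */
    bool        synced;         /* final sync done during promotion */
    bool        dropped;        /* slot removed during promotion */
} DemoSlot;

/* Hypothetical model of the slot sync worker's state. */
typedef struct
{
    bool        promotion_in_progress;
    bool        running;
} DemoSyncWorker;

/*
 * Run the three proposed steps in order and return the number of slots
 * that survive the promotion.
 */
static int
demo_promote(DemoSyncWorker *worker, DemoSlot *slots, size_t nslots)
{
    int         kept = 0;

    assert(worker != NULL);
    assert(slots != NULL || nslots == 0);

    /* Step 1: let the sync worker know it is now acting under promotion. */
    worker->promotion_in_progress = true;

    /* Step 2: do what needs to be done while acting under promotion. */
    for (size_t i = 0; i < nslots; i++)
    {
        if (slots[i].sync_state == 'r')
        {
            slots[i].synced = true;     /* ensure 'r' slots are synced */
            kept++;
        }
        else if (slots[i].sync_state == 'i')
            slots[i].dropped = true;    /* drop slots still in 'i' state */
    }

    /* Step 3: shut down the sync worker so it cannot act outside recovery. */
    worker->running = false;

    return kept;
}
```

The point of the sketch is only the ordering: the 'i' slots are dropped while
the worker still knows it is acting under promotion, and the worker is gone
before anything else gets a chance to use them.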
Thoughts?
[1]:
https://www.postgresql.org/message-id/CAA4eK1J2Pc%3D5TOgty5u4bp--y7ZHaQx3_2eWPL%3DVPJ7A_0JF2g%40mail.gmail.com
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com