Re: Synchronizing slots from primary to standby

shveta malik Fri, 21 Jul 2023 04:47:01 -0700

On Fri, Jul 21, 2023 at 11:36 AM Bharath Rupireddy
<[email protected]> wrote:
>
> On Thu, Jul 20, 2023 at 5:05 PM shveta malik <[email protected]> wrote:
> >
> > On Fri, Jun 16, 2023 at 3:26 PM Amit Kapila <[email protected]> wrote:
> > >
> > > On Mon, Apr 17, 2023 at 7:37 PM Drouvot, Bertrand
> > > <[email protected]> wrote:
> > > >
> >
> > > 3. As mentioned in the initial email, I think it would be better to
> > > replace LIST_SLOTS command with a SELECT query.
> > >
> >
> > I had a look at this thread. I am interested to work on this and can
> > spend some time addressing the comments given here.
>
> Thanks for your interest. Coincidentally, I started to split the patch
> into 2 recently - 0001 making the specified logical wal senders wait
> for specified standbys to ack, 0002 synchronize logical slots. I think
> I'll have these patches ready by early next week. For 0002, I'll
> consider your latest changes having LIST_SLOTS removed.
>


Thanks Bharat for letting us know. It is okay to split the patch, it
may definitely help to understand the modules better but shall we take
a step back and try to reevaluate the design first before moving to
other tasks?

I analyzed more on the issues stated in [1] for replacing LIST_SLOTS
with SELECT query. On rethinking, it might not be a good idea to
replace this cmd with SELECT in Launcher code-path, because we do not
have any database-connection in launcher and 'select' query needs one
and thus we need to supply dbname to it. We may take the
primary-dbname info in a new GUC from users, but I feel retaining LIST
cmd is a better idea over adding a new GUC. But I do not see a reason
why we should get complete replication-slots info in LIST command. The
logic in Launcher is to get distinct database-ids info out of all the
slots and start worker per database-id. Since we are only interested
in database-id, I think we should change LIST_SLOTS to something like
LIST_DBID_FOR_LOGICAL_SLOTS. This new command may get only unique
database-ids for all the logical-slots (or the ones mentioned in
synchronize_slot_names) from primary. By doing so, we can avoid huge
network traffic in cases where the number of replication slots is
quite high considering that max_replication_slots can go upto
MAX_BACKENDS:2^18-1.  So I plan to make this change where we retain
LIST cmd over SELECT query but make this cmd's output restricted to
only database-Ids. Thoughts?

Secondly, I was thinking if the design proposed in the patch is the
best one. No doubt, it is the most simplistic design and thus may
prove very efficient for scenarios where we have a reasonable number
of workers starting and each one actively busy in
slots-synchronisation, handling almost equivalent load.  But since we
are starting one worker per database id, it may not be most efficient
for cases where not all the databases are actively being used. We may
have some workers (started for databases not in use) just waking up
and sending queries to primary and then going back to sleep and in the
process generating network traffic, while others may be heavily loaded
to deal with large numbers of active slots for a heavily loaded
database. I feel the design should be adaptable to load conditions
i.e. if we have more number of actively used slots, then we should
have more workers spawned to handle it and when the work is less then
the number of spawned workers should be less.
I have not thought it thoroughly yet and also not sure whether it will
actually come out as a better one, but it or more such designs should
be considered before we start fixing bugs in this patch. Kindly let me
know if there are already discussions around it which I might have
missed.  Any feedback is appreciated.

[1] : 
https://www.postgresql.org/message-id/CAJpy0uCMNz3XERP-Vzp-7rHFztJgc6d%2BxsmUVCqsxWPkZvQz0Q%40mail.gmail.com

thanks
Shveta

Re: Synchronizing slots from primary to standby

Reply via email to