On Fri, Jul 21, 2023 at 11:36 AM Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote: > > On Thu, Jul 20, 2023 at 5:05 PM shveta malik <shveta.ma...@gmail.com> wrote: > > > > On Fri, Jun 16, 2023 at 3:26 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > > > > > On Mon, Apr 17, 2023 at 7:37 PM Drouvot, Bertrand > > > <bertranddrouvot...@gmail.com> wrote: > > > > > > > > > 3. As mentioned in the initial email, I think it would be better to > > > replace LIST_SLOTS command with a SELECT query. > > > > > > > I had a look at this thread. I am interested to work on this and can > > spend some time addressing the comments given here. > > Thanks for your interest. Coincidentally, I started to split the patch > into 2 recently - 0001 making the specified logical wal senders wait > for specified standbys to ack, 0002 synchronize logical slots. I think > I'll have these patches ready by early next week. For 0002, I'll > consider your latest changes having LIST_SLOTS removed. >
Thanks Bharat for letting us know. It is okay to split the patch, it may definitely help to understand the modules better but shall we take a step back and try to reevaluate the design first before moving to other tasks? I analyzed more on the issues stated in [1] for replacing LIST_SLOTS with SELECT query. On rethinking, it might not be a good idea to replace this cmd with SELECT in Launcher code-path, because we do not have any database-connection in launcher and 'select' query needs one and thus we need to supply dbname to it. We may take the primary-dbname info in a new GUC from users, but I feel retaining LIST cmd is a better idea over adding a new GUC. But I do not see a reason why we should get complete replication-slots info in LIST command. The logic in Launcher is to get distinct database-ids info out of all the slots and start worker per database-id. Since we are only interested in database-id, I think we should change LIST_SLOTS to something like LIST_DBID_FOR_LOGICAL_SLOTS. This new command may get only unique database-ids for all the logical-slots (or the ones mentioned in synchronize_slot_names) from primary. By doing so, we can avoid huge network traffic in cases where the number of replication slots is quite high considering that max_replication_slots can go upto MAX_BACKENDS:2^18-1. So I plan to make this change where we retain LIST cmd over SELECT query but make this cmd's output restricted to only database-Ids. Thoughts? Secondly, I was thinking if the design proposed in the patch is the best one. No doubt, it is the most simplistic design and thus may prove very efficient for scenarios where we have a reasonable number of workers starting and each one actively busy in slots-synchronisation, handling almost equivalent load. But since we are starting one worker per database id, it may not be most efficient for cases where not all the databases are actively being used. We may have some workers (started for databases not in use) just waking up and sending queries to primary and then going back to sleep and in the process generating network traffic, while others may be heavily loaded to deal with large numbers of active slots for a heavily loaded database. I feel the design should be adaptable to load conditions i.e. if we have more number of actively used slots, then we should have more workers spawned to handle it and when the work is less then the number of spawned workers should be less. I have not thought it thoroughly yet and also not sure whether it will actually come out as a better one, but it or more such designs should be considered before we start fixing bugs in this patch. Kindly let me know if there are already discussions around it which I might have missed. Any feedback is appreciated. [1] : https://www.postgresql.org/message-id/CAJpy0uCMNz3XERP-Vzp-7rHFztJgc6d%2BxsmUVCqsxWPkZvQz0Q%40mail.gmail.com thanks Shveta