On Mon, Jul 24, 2023 at 9:00 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> > 2. All candidate standbys will start one slot sync worker per logical
> > slot which might not be scalable.
>
> Yeah, that doesn't sound like a good idea but IIRC, the proposed patch
> is using one worker per database (for all slots corresponding to a
> database).

Right. It's based on one worker for each database.

> > Is having one (or a few more - not
> > necessarily one for each logical slot) worker for all logical slots
> > enough?
>
> I guess for a large number of slots the is a possibility of a large
> gap in syncing the slots which probably means we need to retain
> corresponding WAL for a much longer time on the primary. If we can
> prove that the gap won't be large enough to matter then this would be
> probably worth considering otherwise, I think we should find a way to
> scale the number of workers to avoid the large gap.

I think the gap is largely determined by two things: the time taken to
advance each slot, and the amount of WAL by which each logical slot
moves ahead on the primary. I've measured the time
pg_logical_replication_slot_advance takes for different amounts of WAL
on my system. It took 2595ms/5091ms/31238ms to advance the slot by
3.7GB/7.3GB/13GB respectively.

To put things into perspective, imagine a single slot sync worker has 3
logical slots to sync, and they need to be advanced by 3.7GB, 7.3GB and
13GB of WAL respectively. The worker gets back to slot 1 only after
2595ms+5091ms+31238ms (~40sec). It gets back to slot 2 after it has
advanced slot 1 again, this time by the amount of WAL that slot moved
ahead on the primary during those ~40sec. It gets back to slot 3 after
advancing slot 1 and slot 2 in the same way, and so on. If WAL
generation on the primary is fast and the logical slots move ahead
quickly there, the time a single sync worker needs to get around to
each slot can keep increasing.

Now, let's think about what happens if there's a large gap, IOW, a
logical slot on the standby is X amount of WAL behind the corresponding
logical slot on the primary. The standby needs to retain more WAL for
sure. IIUC, the primary doesn't need to retain the WAL required for a
logical slot on the standby, no?

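To make the round-robin arithmetic above a bit more concrete, here is a
rough Python sketch. It is only a toy model, not anything from the patch:
it assumes the advance cost is linear in the backlog (ms_per_gb) and that
every slot moves ahead on the primary at the same constant rate
(wal_rate_gb_per_s). Both parameters and both function names are made up
for illustration; the measurements above are actually not linear at 13GB.

```python
# Advance times measured in this mail for 3.7GB / 7.3GB / 13GB of WAL.
MEASURED_ADVANCE_MS = [2595, 5091, 31238]


def cycle_time_ms(advance_times_ms):
    """Time for one worker to visit every slot once, one after another."""
    return sum(advance_times_ms)


def simulate_round_robin(backlog_gb, wal_rate_gb_per_s, ms_per_gb, rounds):
    """Toy model of a single sync worker looping over its slots.

    backlog_gb        -- initial WAL backlog of each slot, in GB
    wal_rate_gb_per_s -- ASSUMED constant rate at which every slot moves
                         ahead on the primary
    ms_per_gb         -- ASSUMED linear cost of advancing a slot
    Returns the duration of each full pass over all slots, in ms.
    """
    backlog = list(backlog_gb)
    cycle_times = []
    for _ in range(rounds):
        cycle_ms = 0.0
        for i in range(len(backlog)):
            spent_ms = backlog[i] * ms_per_gb
            cycle_ms += spent_ms
            backlog[i] = 0.0  # slot i is caught up at this instant
            # While the worker was busy, every slot fell further behind.
            for j in range(len(backlog)):
                backlog[j] += wal_rate_gb_per_s * spent_ms / 1000.0
        cycle_times.append(cycle_ms)
    return cycle_times


if __name__ == "__main__":
    print(cycle_time_ms(MEASURED_ADVANCE_MS), "ms for the first pass")
    print(simulate_round_robin([3.7, 7.3, 13.0], 0.1, 1000.0, 3))
```

In this model the passes get shorter only when the worker advances WAL
faster than the slots collectively move ahead on the primary; otherwise
each pass takes longer than the previous one, which is the growing gap
discussed above.
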
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com