Hi,
On 8/14/23 11:52 AM, shveta malik wrote:
We (myself and Ajin) performed the tests to compute the lag in standby
slots as compared to primary slots with different number of slot-sync
workers configured.
Thanks!
3 DBs were created, each with 30 tables and each table having one
logical-pub/sub configured. So this made a total of 90 logical
replication slots to be synced. Then the workload was run for aprox 10
mins. During this workload, at regular intervals, primary and standby
slots' lsns were captured (from pg_replication_slots) and compared. At
each capture, the intent was to know how much is each standby's slot
lagging behind corresponding primary's slot by taking the distance
between confirmed_flush_lsn of primary and standby slot. Then we took
the average (integer value) of this distance over the span of 10 min
workload
Thanks for the explanations, make sense to me.
and this is what we got:
With max_slot_sync_workers=1, average-lag = 42290.3563
With max_slot_sync_workers=2, average-lag = 24585.1421
With max_slot_sync_workers=3, average-lag = 14964.9215
This shows that more workers have better chances to keep logical
replication slots in sync for this case.
Agree.
Another statistics if it interests you is, we ran a frequency test as
well (this by changing code, unit test sort of) to figure out the
'total number of times synchronization done' with different number of
sync-slots workers configured. Same 3 DBs setup with each DB having 30
logical replication slots. With 'max_slot_sync_workers' set at 1, 2
and 3; total number of times synchronization done was 15874, 20205 and
23414 respectively. Note: this is not on the same machine where we
captured lsn-gap data, it is on a little less efficient machine but
gives almost the same picture
Next we are planning to capture this data for a lesser number of slots
like 10,30,50 etc. It may happen that the benefit of multi-workers
over single workers in such cases could be less, but let's have the
data to verify that.
Thanks a lot for those numbers and for the testing!
Do you think it would make sense to also get the number of using
the pg_failover_slots module? (and compare the pg_failover_slots numbers with
the
"one worker" case here). Idea is to check if the patch does introduce
some overhead as compare to pg_failover_slots.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com