On Mon, Jul 7, 2025 at 11:03 AM Zhijie Hou (Fujitsu) wrote:
>
> On Sun, Jul 6, 2025 at 10:51 PM Masahiko Sawada wrote:
> > > ========================================================
> > > The workload is mostly the same as [4].
> > >
> > > Workload:
> > > - Initially ran pgbench with 40 clients on *both sides*.
> > > - Set max_conflict_retention_duration = {60, 120}
> > > - When the slot is invalidated on the subscriber side, stop the
> > >   benchmark and wait until the subscriber has caught up. Then the
> > >   number of clients on the publisher is halved.
> > >   In this test the conflict slot could be invalidated as expected
> > >   when the workload on the publisher was high, and it would not get
> > >   invalidated anymore after reducing the workload. This shows that
> > >   even if the slot has been invalidated once, users can continue to
> > >   detect the update_deleted conflict by reducing the workload on
> > >   the publisher.
> > > - Total period of the test was 900s for each case.
> > >
> > > (max_conflixt.tar.gz can run the same workload)
> > >
> > > Observation:
> > > - Parallelism on the publisher side was reduced 15->7->3, and finally
> > >   the conflict slot was not invalidated.
> > > - TPS on the subscriber side improved when the concurrency was
> > >   reduced. This is because dead tuple accumulation on the subscriber
> > >   is reduced due to the reduced workload on the publisher.
> > > - When the publisher has Nclients=3, there is no regression in the
> > >   subscriber's TPS.
> >
> > I think that users typically cannot control the amount of workload in
> > production, meaning that once the performance regression starts to
> > happen, the subscriber could enter a loop of invalidating the slot,
> > recovering the performance, creating the slot, and having the
> > performance problem again.
>
> Yes, you are right. The test is designed to demonstrate that the slot can
> be invalidated under high workload conditions as expected, while
> remaining valid if the workload is reduced. In production systems where
> workload reduction may not be possible, it’s recommended to disable
> `retain_conflict_info` to enhance performance. This decision involves
> balancing the need for reliable conflict detection with optimal system
> performance.
>
> I think the hot standby feedback also has a similar impact on the
> performance of the primary, which is done to prevent the early removal of
> data necessary for the standby, ensuring that it remains accessible when
> needed.
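To make the quoted client-reduction procedure concrete, here is a minimal sketch of the halving schedule. It is not the actual test script: the function name `halving_schedule`, the `slot_invalidated` callback, and the `n > 3` threshold are assumptions chosen only so that the output matches the reported 15->7->3 sequence.

```python
def halving_schedule(start_clients, slot_invalidated):
    """Return the publisher client counts tried, in order.

    After each run, if the conflict slot was invalidated, the client
    count is halved (integer division) and the run is repeated.
    `slot_invalidated` is a hypothetical callback standing in for the
    real check on the subscriber.
    """
    schedule = []
    n = start_clients
    while True:
        schedule.append(n)
        # Stop once the slot survives, or when we cannot reduce further.
        if not slot_invalidated(n) or n <= 1:
            break
        n //= 2
    return schedule


# Assumed behaviour: the slot is invalidated whenever more than 3
# publisher clients are running (illustrative threshold only).
print(halving_schedule(15, lambda n: n > 3))  # [15, 7, 3]
```

In the real test the stopping point is determined by whether the slot is invalidated within the run, not by a fixed client threshold.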

For reference, we conducted a test [1] to evaluate the impact of enabling
hot standby feedback in a physical replication setup, observing
approximately a 50% regression in TPS on the primary as well.

[1] https://www.postgresql.org/message-id/CABdArM4OEwmh_31dQ8_F__VmHwk2ag_M%3DYDD4H%2ByYQBG%2BbHGzg%40mail.gmail.com

Best Regards,
Hou zj