On Thu, Jan 19, 2023 at 11:11 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Wed, Jan 18, 2023 at 12:09 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> >
> > On Fri, Jan 13, 2023 at 11:50 AM Peter Smith <smithpb2...@gmail.com> wrote:
> > >
> > > Here are some review comments for patch v79-0002.
> > >
> >
> > So, this is about the latest v84-0001-Stop-extra-worker-if-GUC-was-changed.
> >
> > >
> > > I feel this patch just adds more complexity for almost no gain:
> > > - reducing the 'max_apply_workers_per_subscription' seems not very
> > > common in the first place.
> > > - even when the GUC is reduced, at that point in time all the workers
> > > might be in use so there may be nothing that can be immediately done.
> > > - IIUC the excess workers (for a reduced GUC) are going to get freed
> > > naturally anyway over time as more transactions are completed so the
> > > pool size will reduce accordingly.
> > >
> >
> > I am still not sure if it is worth pursuing this patch because of the
> > above reasons. I don't think it would be difficult to add this even at
> > a later point in time if we really see a use case for this.
> > Sawada-San, IIRC, you raised this point. What do you think?
> >
> > The other point I am wondering is whether we can have a different way
> > to test partial serialization apart from introducing another developer
> > GUC (stream_serialize_threshold). One possibility could be that we can
> > have a subscription option (parallel_send_timeout or something like
> > that) with some default value (current_timeout used in the patch)
> > which will be used only when streaming = parallel. Users may want to
> > wait for more time before serialization starts depending on the
> > workload (say when resource usage is high on a subscriber-side
> > machine, or there are concurrent long-running transactions that can
> > block parallel apply for a bit longer time). I know with this as well
> > it may not be straightforward to test the functionality because we
> > can't be sure how many changes would be required for a timeout to
> > occur. This is just for brainstorming other options to test the
> > partial serialization functionality.
> >
> Apart from the above, we can also have a subscription option to
> specify parallel_shm_queue_size (the queue_size used to determine the
> queue between the leader and parallel worker) and that can be used for
> this purpose. Basically, configuring it to a smaller value can help in
> reducing the test time, but still, it will not eliminate the dependency
> on the timing we have to wait before switching to partial serialize
> mode. I think this can be used in production as well to tune the
> performance depending on the workload.
>
> Yet another way is to use the existing parameter logical_decoding_mode
> [1]. If the value of logical_decoding_mode is 'immediate', then we can
> immediately switch to partial serialize mode. This will eliminate the
> dependency on timing. The one argument against using this is that it
> won't be as clear as a separate parameter like
> 'stream_serialize_threshold' proposed by the patch, but OTOH we already
> have a few parameters that serve a different purpose when used on the
> subscriber. For example, 'max_replication_slots' is used to define the
> maximum number of replication slots on the publisher and the maximum
> number of origins on subscribers. Similarly,
> 'wal_retrieve_retry_interval' is used for different purposes on
> subscriber and standby nodes.
>
> [1] - https://www.postgresql.org/docs/devel/runtime-config-developer.html
>
> --
> With Regards,
> Amit Kapila.
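
For readers following the thread, here is a minimal sketch of how the
logical_decoding_mode idea quoted above might be exercised when testing
partial serialization. It assumes the streaming = parallel subscription
option from this patch series and the developer GUC documented in [1];
the subscription, publication, and connection values are placeholders,
and consulting this GUC on the subscriber is exactly what is being
proposed here, not existing behaviour:

    -- On the subscriber: the proposal is to reuse the existing developer
    -- GUC from [1] (values 'buffered' and 'immediate') so that 'immediate'
    -- makes the leader switch to partial serialize mode right away instead
    -- of waiting for a send timeout.
    ALTER SYSTEM SET logical_decoding_mode = 'immediate';
    SELECT pg_reload_conf();

    -- A subscription that applies large streamed transactions in parallel.
    CREATE SUBSCRIPTION test_sub
        CONNECTION 'host=publisher dbname=postgres'
        PUBLICATION test_pub
        WITH (streaming = parallel);

The alternative knobs mentioned above (parallel_send_timeout,
parallel_shm_queue_size) would presumably become additional entries in
the same WITH (...) list, but they are only proposals in this thread.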
Hi Amit,

On rethinking the complete model, I feel that the name
logical_decoding_mode does not really describe modes of logical
decoding; we, I think, picked it based on logical_decoding_work_mem.
As per the current implementation, the parameter
'logical_decoding_mode' tells what happens when the work-memory used by
logical decoding reaches its limit, so it is in fact a
'logicalrep_workmem_vacate_mode' or 'logicalrep_trans_eviction_mode'.
If it is set to immediate, that means vacate the work-memory
immediately, i.e. evict transactions immediately; and buffered means
the reverse (keep buffering transactions until we reach the limit).

Now coming to the subscriber, we can reuse the same parameter. On the
subscriber as well, the shared-memory queue could be considered as its
workmem, and thus the name 'logicalrep_workmem_vacate_mode' might look
more relevant.

On publisher: logicalrep_workmem_vacate_mode = immediate, buffered.
On subscriber: logicalrep_workmem_vacate_mode = stream_serialize (or if
we want to keep immediate here too, that will also be fine).

Thoughts?

And I am assuming it is possible to change the GUC name before the
coming release. If not, please let me know, and we can brainstorm other
ideas.

thanks
Shveta
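
To make the renaming proposal concrete, here is a rough sketch of how
the suggested parameter would be set on each side. Note that
'logicalrep_workmem_vacate_mode' and the 'stream_serialize' value are
only the names suggested above and do not exist in any released or
development version:

    -- Publisher (today's logical_decoding_mode): either evict/stream
    -- transactions immediately or keep buffering them until
    -- logical_decoding_work_mem is reached.
    ALTER SYSTEM SET logicalrep_workmem_vacate_mode = 'immediate';  -- or 'buffered'

    -- Subscriber: treat the leader/parallel-apply shared-memory queue as
    -- the "workmem" and switch to partial serialization when vacating it.
    ALTER SYSTEM SET logicalrep_workmem_vacate_mode = 'stream_serialize';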