On Thu, Jan 19, 2023 at 11:11 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Wed, Jan 18, 2023 at 12:09 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> >
> > On Fri, Jan 13, 2023 at 11:50 AM Peter Smith <smithpb2...@gmail.com> wrote:
> > >
> > > Here are some review comments for patch v79-0002.
> > >
> >
> > So, this is about the latest v84-0001-Stop-extra-worker-if-GUC-was-changed.
> >
> > >
> > > I feel this patch just adds more complexity for almost no gain:
> > > - reducing the 'max_apply_workers_per_subscription' seems not very
> > > common in the first place.
> > > - even when the GUC is reduced, at that point in time all the workers
> > > might be in use so there may be nothing that can be immediately done.
> > > - IIUC the excess workers (for a reduced GUC) are going to get freed
> > > naturally anyway over time as more transactions are completed so the
> > > pool size will reduce accordingly.
> > >
> >
> > I am still not sure if it is worth pursuing this patch because of the
> > above reasons. I don't think it would be difficult to add this even at
> > a later point in time if we really see a use case for this.
> > Sawada-San, IIRC, you raised this point. What do you think?
> >
> > The other point I am wondering is whether we can have a different way
> > to test partial serialization apart from introducing another developer
> > GUC (stream_serialize_threshold). One possibility could be that we can
> > have a subscription option (parallel_send_timeout or something like
> > that) with some default value (current_timeout used in the patch)
> > which will be used only when streaming = parallel. Users may want to
> > wait for more time before serialization starts depending on the
> > workload (say when resource usage is high on a subscriber-side
> > machine, or there are concurrent long-running transactions that can
> > block parallel apply for a bit longer time). I know with this as well
> > it may not be straightforward to test the functionality because we
> > can't be sure how many changes would be required for a timeout to
> > occur. This is just for brainstorming other options to test the
> > partial serialization functionality.
> >
> Apart from the above, we can also have a subscription option to
> specify parallel_shm_queue_size (the queue_size used to determine the
> queue between the leader and parallel worker) and that can be used for
> this purpose. Basically, configuring it to a smaller value can help in
> reducing the test time, but still, it will not eliminate the dependency
> on the timing we have to wait before switching to partial serialize
> mode. I think this can be used in production as well to tune the
> performance depending on the workload.
>
> Yet another way is to use the existing parameter logical_decoding_mode
> [1]. If the value of logical_decoding_mode is 'immediate', then we can
> immediately switch to partial serialize mode. This will eliminate the
> dependency on timing. The one argument against using this is that it
> won't be as clear as a separate parameter like
> 'stream_serialize_threshold' proposed by the patch, but OTOH we already
> have a few parameters that serve a different purpose when used on the
> subscriber. For example, 'max_replication_slots' is used to define the
> maximum number of replication slots on the publisher and the maximum
> number of origins on subscribers. Similarly,
> 'wal_retrieve_retry_interval' is used for different purposes on
> subscriber and standby nodes.
>
> [1] - https://www.postgresql.org/docs/devel/runtime-config-developer.html
>
> --
> With Regards,
> Amit Kapila.
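
For readers following the thread, here is a minimal sketch of how the
logical_decoding_mode idea quoted above might be exercised when testing
partial serialization. It assumes the streaming = parallel subscription
option from this patch series and the developer GUC documented in [1];
the subscription, publication, and connection values are placeholders,
and consulting this GUC on the subscriber is exactly what is being
proposed here, not existing behaviour:

    -- On the subscriber: the proposal is to reuse the existing developer
    -- GUC from [1] (values 'buffered' and 'immediate') so that 'immediate'
    -- makes the leader switch to partial serialize mode right away instead
    -- of waiting for a send timeout.
    ALTER SYSTEM SET logical_decoding_mode = 'immediate';
    SELECT pg_reload_conf();

    -- A subscription that applies large streamed transactions in parallel.
    CREATE SUBSCRIPTION test_sub
        CONNECTION 'host=publisher dbname=postgres'
        PUBLICATION test_pub
        WITH (streaming = parallel);

The alternative knobs mentioned above (parallel_send_timeout,
parallel_shm_queue_size) would presumably become additional entries in
the same WITH (...) list, but they are only proposals in this thread.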
Hi Amit,

On rethinking the complete model, I feel that the name
logical_decoding_mode does not really describe modes of logical
decoding; we, I think, picked it based on logical_decoding_work_mem.
As per the current implementation, the parameter
'logical_decoding_mode' tells what happens when the work-memory used by
logical decoding reaches its limit, so it is in fact a
'logicalrep_workmem_vacate_mode' or 'logicalrep_trans_eviction_mode'.
If it is set to immediate, that means vacate the work-memory
immediately, i.e. evict transactions immediately; and buffered means
the reverse (keep buffering transactions until we reach the limit).

Now coming to the subscriber, we can reuse the same parameter. On the
subscriber as well, the shared-memory queue could be considered as its
workmem, and thus the name 'logicalrep_workmem_vacate_mode' might look
more relevant.

On publisher: logicalrep_workmem_vacate_mode = immediate, buffered.
On subscriber: logicalrep_workmem_vacate_mode = stream_serialize (or if
we want to keep immediate here too, that will also be fine).

Thoughts?

And I am assuming it is possible to change the GUC name before the
coming release. If not, please let me know, and we can brainstorm other
ideas.

thanks
Shveta
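
To make the renaming proposal concrete, here is a rough sketch of how
the suggested parameter would be set on each side. Note that
'logicalrep_workmem_vacate_mode' and the 'stream_serialize' value are
only the names suggested above and do not exist in any released or
development version:

    -- Publisher (today's logical_decoding_mode): either evict/stream
    -- transactions immediately or keep buffering them until
    -- logical_decoding_work_mem is reached.
    ALTER SYSTEM SET logicalrep_workmem_vacate_mode = 'immediate';  -- or 'buffered'

    -- Subscriber: treat the leader/parallel-apply shared-memory queue as
    -- the "workmem" and switch to partial serialization when vacating it.
    ALTER SYSTEM SET logicalrep_workmem_vacate_mode = 'stream_serialize';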