On Thu, Oct 27, 2022 at 11:34 AM shiy.f...@fujitsu.com
<shiy.f...@fujitsu.com> wrote:
>
> On Wed, Oct 26, 2022 7:19 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> >
> > On Tue, Oct 25, 2022 at 8:38 AM Masahiko Sawada
> > <sawada.m...@gmail.com> wrote:
> > >
> > > On Fri, Oct 21, 2022 at 6:32 PM houzj.f...@fujitsu.com
> > > <houzj.f...@fujitsu.com> wrote:
> > >
> > > I've started to review this patch. I tested the v40-0001 patch and
> > > have one question:
> > >
> > > IIUC even when most of the changes in the transaction are filtered out
> > > in pgoutput (e.g., by relation filter or row filter), the walsender
> > > sends STREAM_START. This means that the subscriber could end up
> > > launching parallel apply workers also for almost empty (and streamed)
> > > transactions. For example, I created three subscriptions each of which
> > > subscribes to a different table. When I loaded a large amount of data
> > > into one table, all three (leader) apply workers received STREAM_START
> > > and launched their parallel apply workers.
> > >
> >
> > The apply workers will be launched just the first time; after that we
> > maintain a pool so that we don't need to restart them.
> >
> > > However, two of them finished without applying any data. I think this
> > > behaviour looks problematic since it wastes workers and rather
> > > decreases the apply performance if the changes are not large. Is it
> > > worth considering a way to delay launching a parallel apply worker
> > > until we find out that the amount of changes is actually large?
> > >
> >
> > I think even if the changes are small there may not be much difference,
> > because we have observed that the performance improvement comes from
> > not writing to a file.
> >
> > > For example, the leader worker writes the streamed changes to files as
> > > usual and launches a parallel worker if the amount of changes exceeds
> > > a threshold or the leader receives the second segment. After that, the
> > > leader worker switches to sending the streamed changes to parallel
> > > workers via shm_mq instead of files.
> > >
> >
> > I think writing to a file won't be a good idea, as that can hamper the
> > performance benefit in some cases, and I am not sure if it is worth it.
> >
>
> I tried to test some cases where only a small part of the transaction, or
> an empty transaction, is sent to the subscriber, to see if using parallel
> streaming will cause performance degradation.
>
> The test was performed ten times, and the average was taken.
> The results are as follows. The details and the script of the test are
> attached.
>
> 10% of rows are sent
> ----------------------------------
> HEAD      24.4595
> patched   18.4545
>
> 5% of rows are sent
> ----------------------------------
> HEAD      21.244
> patched   17.9655
>
> 0% of rows are sent
> ----------------------------------
> HEAD      18.0605
> patched   17.893
>
> It shows that when only 5% or 10% of the rows are sent to the subscriber,
> using parallel apply takes less time than HEAD, and even if all rows are
> filtered out there is no performance degradation.
Thank you for the testing! I think this performance improvement comes both
from applying changes in parallel with receiving them and from avoiding
writing the changes to a file. I'm happy to know that there is also a
benefit for small streaming transactions. I've also measured the overhead
of processing empty streamed transactions and confirmed that it is
negligible.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
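
P.S. To make the delayed-launch idea quoted above a bit more concrete, here
is a rough standalone sketch of the control flow that was being discussed:
spool streamed changes to a file until either a size threshold is crossed or
a second stream segment arrives, then launch (or reuse from the pool) a
parallel apply worker and hand subsequent changes to it directly. All names
and the threshold value are made up for illustration; this is not the actual
patch, and the real hand-off would go through shm_mq rather than the printf
stand-in below.

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SPOOL_THRESHOLD_BYTES (64 * 1024)	/* assumed tuning knob */

typedef struct ChangeSink
{
	bool		use_parallel_worker;	/* set once the threshold trips */
	size_t		spooled_bytes;			/* bytes written to the spool file */
	int			stream_segments;		/* STREAM_START..STREAM_STOP blocks seen */
	FILE	   *spool;					/* stand-in for the changes file */
} ChangeSink;

static void
launch_parallel_worker(ChangeSink *sink)
{
	/*
	 * The real system would start (or reuse from the pool) a worker and
	 * replay the already-spooled changes before switching over.
	 */
	printf("launching parallel apply worker after %zu spooled bytes\n",
		   sink->spooled_bytes);
	sink->use_parallel_worker = true;
}

static void
handle_stream_start(ChangeSink *sink)
{
	sink->stream_segments++;

	/* A second segment for the same transaction: treat it as "large". */
	if (!sink->use_parallel_worker && sink->stream_segments >= 2)
		launch_parallel_worker(sink);
}

static void
handle_change(ChangeSink *sink, const char *data, size_t len)
{
	if (sink->use_parallel_worker)
	{
		/* Real code would shm_mq_send() to the worker; just print here. */
		printf("queue -> worker: %zu bytes\n", len);
		return;
	}

	/* Worker not launched yet: keep spooling to the file. */
	fwrite(data, 1, len, sink->spool);
	sink->spooled_bytes += len;

	if (sink->spooled_bytes >= SPOOL_THRESHOLD_BYTES)
		launch_parallel_worker(sink);
}

int
main(void)
{
	ChangeSink	sink = {0};
	char		chunk[8192];

	memset(chunk, 'x', sizeof(chunk));
	sink.spool = tmpfile();

	handle_stream_start(&sink);
	for (int i = 0; i < 12; i++)	/* 12 x 8kB crosses the 64kB threshold */
		handle_change(&sink, chunk, sizeof(chunk));

	fclose(sink.spool);
	return 0;
}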