On Tue, Dec 27, 2022 at 10:36 AM Dilip Kumar <dilipbal...@gmail.com> wrote:
>
> On Tue, Dec 27, 2022 at 9:15 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
> >
> > On Mon, Dec 26, 2022 at 7:35 PM Dilip Kumar <dilipbal...@gmail.com> wrote:
> > >
> > > In the commit message, there is a statement like this:
> > >
> > > "However, if the leader apply worker times out while attempting to
> > > send a message to the parallel apply worker, it will switch to
> > > "partial serialize" mode - in this mode the leader serializes all
> > > remaining changes to a file and notifies the parallel apply
> > > workers to read and apply them at the end of the transaction."
> > >
> > > I think it is a good idea to serialize the changes to a file in
> > > this case to avoid deadlocks, but why does the parallel worker
> > > need to wait until the transaction commits to read the file? I
> > > mean, we can switch to the serialize state and make the parallel
> > > worker pull changes from the file, and once the parallel worker
> > > has caught up with the changes, it can change the state back to
> > > "shared memory" so that the apply worker can again start sending
> > > through shared memory.
> > >
> > > Streaming transactions are generally large, and it is possible
> > > that the shared memory queue gets full because of a lot of changes
> > > for one particular transaction; but later, when the load shifts to
> > > other transactions, it would be quite common for the worker to
> > > catch up with the changes, and then it would be better to again
> > > take advantage of the shared memory. Otherwise, in this case, we
> > > are just wasting resources (the worker and the shared memory
> > > queue) while still writing to the file.
> >
> > Note that there is a certain threshold timeout for which we wait
> > before switching to serialize mode, and normally it is reached only
> > when the PA starts waiting on some lock acquired by another backend.
> > Now, apart from that, even if we decide to switch modes, the current
> > BufFile mechanism doesn't have a good way to support it: it doesn't
> > allow two processes to open the same buffile at the same time, which
> > means we would need to maintain multiple files to achieve a mode
> > where we can switch back from serialize mode. We cannot let the LA
> > wait for the PA to close the file, as that could introduce another
> > kind of deadlock; for details, see the discussion in the email [1].
> > The other problem is that we have no way to deal with data partially
> > sent via the shared memory queue. Say we time out while sending the
> > data: we then have to resend the same message until it succeeds,
> > which is tricky because we can't keep retrying, as that can lead to
> > a deadlock. I think if we try to build this new mode, it will be a
> > lot of effort without equivalent returns. In common cases, we don't
> > see a timeout and a switch to serialize mode; it happens mostly when
> > the PA starts waiting for a lock acquired by another backend, or
> > when the machine is too slow to keep up with the number of parallel
> > apply workers. So, it doesn't seem worth adding more complexity to
> > the first version, but we don't rule out the possibility of doing
> > this in the future if we really see such cases are common.
> >
> > [1] - https://www.postgresql.org/message-id/CAD21AoDScLvLT8JBfu5WaGCPQs_qhxsybMT%2BsMXJ%3DQrDMTyr9w%40mail.gmail.com
>
> Okay, I see.
> And once we change to serialize mode, we can't release the worker
> either, because we have already applied partial changes of the
> transaction in the PA, so we cannot apply the remaining changes from
> the LA. I understand it might take a lot of complex design to change
> it back to parallel apply mode, but my only worry is that in such
> cases we will be holding on to the parallel worker just to wait until
> commit to read from the spool file. But, as you said, that should not
> be a very common case, so maybe this is fine.
Right, and as said previously, if required (which is not clear at this
stage) we can develop it in a later version as well.
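For reference, the leader-side send path in the patch is roughly of the
following shape. This is a simplified sketch, not the exact patch code:
the ParallelApplyWorkerInfo struct and the SHM_SEND_TIMEOUT_MS /
SHM_SEND_RETRY_INTERVAL_MS constants are assumed from the proposed
applyparallelworker.c and may differ in detail.

#include "postgres.h"
#include "miscadmin.h"
#include "storage/latch.h"
#include "storage/shm_mq.h"
#include "utils/timestamp.h"
#include "utils/wait_event.h"

/* Assumed values; the actual patch may use different numbers. */
#define SHM_SEND_RETRY_INTERVAL_MS	1000	/* sleep between retries */
#define SHM_SEND_TIMEOUT_MS			10000	/* threshold before giving up */

/*
 * Try to send a message to the parallel apply worker via its shared
 * memory queue.  Returns true on success; returns false if we timed
 * out, in which case the caller switches this transaction to partial
 * serialize mode and spools the remaining changes to a file.
 */
static bool
pa_send_data(ParallelApplyWorkerInfo *winfo, Size nbytes, const void *data)
{
	TimestampTz start_time = 0;

	for (;;)
	{
		shm_mq_result result;
		int			rc;

		/*
		 * Non-blocking send.  On SHM_MQ_WOULD_BLOCK the message may
		 * already be partially copied into the ring buffer, so we must
		 * retry with exactly the same message; we cannot abandon the
		 * queue mid-message, and retrying forever risks a deadlock if
		 * the parallel apply worker is itself blocked on a lock.
		 */
		result = shm_mq_send(winfo->mq_handle, nbytes, data,
							 true /* nowait */ , true /* force_flush */ );

		if (result == SHM_MQ_SUCCESS)
			return true;
		if (result == SHM_MQ_DETACHED)
			ereport(ERROR,
					(errcode(ERRCODE_CONNECTION_FAILURE),
					 errmsg("could not send data to shared-memory queue")));

		Assert(result == SHM_MQ_WOULD_BLOCK);

		/* Sleep briefly before retrying, and track the total wait. */
		rc = WaitLatch(MyLatch,
					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
					   SHM_SEND_RETRY_INTERVAL_MS,
					   WAIT_EVENT_MQ_SEND);

		if (rc & WL_TIMEOUT)
		{
			if (start_time == 0)
				start_time = GetCurrentTimestamp();
			else if (TimestampDifferenceExceeds(start_time,
												GetCurrentTimestamp(),
												SHM_SEND_TIMEOUT_MS))
				return false;	/* switch to partial serialize mode */
		}

		ResetLatch(MyLatch);
		CHECK_FOR_INTERRUPTS();
	}
}

The partial-send behaviour of shm_mq_send() with nowait = true is what
makes a mid-stream switch back from the file to the queue awkward: once
a message is half-written into the ring buffer, the only safe options
are to finish sending that same message or to give up on the queue for
the rest of the transaction.

--
With Regards,
Amit Kapila.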