On Mon, Dec 26, 2022 at 7:35 PM Dilip Kumar <dilipbal...@gmail.com> wrote:
>
> In the commit message, there is a statement like this
>
> "However, if the leader apply worker times out while attempting to
> send a message to the parallel apply worker, it will switch to
> "partial serialize" mode - in this mode the leader serializes all
> remaining changes to a file and notifies the parallel apply workers
> to read and apply them at the end of the transaction."
>
> I think it is a good idea to serialize the changes to the file in
> this case to avoid deadlocks, but why does the parallel worker need
> to wait till the transaction commits to read the file? I mean we can
> switch the serialize state and make a parallel worker pull changes
> from the file, and if the parallel worker has caught up with the
> changes then it can again change the state to "shared memory" and
> now the apply worker can again start sending through shared memory.
>
> I think generally streaming transactions are large and it is
> possible that the shared memory queue gets full because of a lot of
> changes for a particular transaction, but later when the load
> switches to the other transactions it would be quite common for the
> worker to catch up with the changes, and then it is better to again
> take advantage of using memory. Otherwise, in this case, we are just
> wasting resources (worker/shared memory queue) but still writing to
> the file.
>

Note that there is a certain threshold timeout for which we wait
before switching to serialize mode, and normally that happens only
when the parallel apply worker (PA) starts waiting on some lock
acquired by a backend.

Now, apart from that, even if we decide to switch modes, the current
BufFile mechanism doesn't have a good way to do that. It doesn't allow
two processes to open the same BufFile at the same time, which means
we would need to maintain multiple files to achieve a mode where we
can switch back from serialize mode. We cannot let the leader apply
worker (LA) wait for PA to close the file, as that could introduce
another kind of deadlock. For details, see the discussion in the
email [1].

The other problem is that we have no way to deal with partially sent
data via a shared memory queue. Say we time out while sending the
data: we have to resend the same message until it succeeds, which will
be tricky because we can't keep retrying, as that can lead to
deadlock.

I think if we try to build this new mode, it will be a lot of effort
without equivalent returns. In common cases, we have not seen the
leader time out and switch to serialize mode. It happens mostly in
cases where PA starts to wait for a lock acquired by another backend,
or where the machine is too slow to deal with the number of parallel
apply workers. So, it doesn't seem worth adding more complexity to the
first version, but we don't rule out adding it in the future if such
cases really turn out to be common.

[1] - https://www.postgresql.org/message-id/CAD21AoDScLvLT8JBfu5WaGCPQs_qhxsybMT%2BsMXJ%3DQrDMTyr9w%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
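
For readers following the thread, here is a rough toy model of the
one-way switch discussed above. It is only a sketch in Python, not
PostgreSQL code: LeaderSketch, worker_apply_all, the 10 ms timeout and
the use of queue.Queue plus a temporary file are all assumptions made
for illustration. The bounded queue stands in for the shared memory
queue and the temporary file for the BufFile. Nothing consumes the
queue while the leader is sending, which imitates a PA stuck waiting
on a lock.

```python
import queue
import tempfile


class LeaderSketch:
    """Hypothetical model of the leader apply worker (not PostgreSQL code)."""

    def __init__(self, shm_queue, send_timeout=0.01):
        self.shm_queue = shm_queue        # stand-in for the shared memory queue
        self.send_timeout = send_timeout  # assumed threshold, in seconds
        self.spill_file = None            # set once in "partial serialize" mode

    def send_change(self, change):
        if self.spill_file is None:
            try:
                # put() enqueues the whole item or raises, so this model
                # sidesteps the partially-sent-message problem entirely.
                self.shm_queue.put(change, timeout=self.send_timeout)
                return "shared memory"
            except queue.Full:
                # One-way switch: for the rest of this transaction every
                # change goes to the file, never back to the queue.
                self.spill_file = tempfile.TemporaryFile(mode="w+")
        self.spill_file.write(change + "\n")
        return "partial serialize"

    def finish_transaction(self):
        """Rewind the spill file (if any) so the worker can read it."""
        if self.spill_file is not None:
            self.spill_file.seek(0)
        return self.spill_file


def worker_apply_all(shm_queue, spill_file):
    """Drain the queue first, then the file, preserving change order."""
    applied = []
    while True:
        try:
            applied.append(shm_queue.get_nowait())
        except queue.Empty:
            break
    if spill_file is not None:
        applied.extend(line.rstrip("\n") for line in spill_file)
        spill_file.close()
    return applied


shm_queue = queue.Queue(maxsize=2)  # tiny queue so it fills up quickly
leader = LeaderSketch(shm_queue)
modes = [leader.send_change(f"change-{i}") for i in range(5)]
applied = worker_apply_all(shm_queue, leader.finish_transaction())
```

In this toy model the worker reads the file only after the leader has
finished writing it, so the two never need the file open at the same
time. Switching back to the queue mid-transaction would break that
property, which is the multiple-files problem described above.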