At Tue, 13 Oct 2020 11:56:51 +0900, Masahiko Sawada <masahiko.saw...@2ndquadrant.com> wrote in
> On Tue, 13 Oct 2020 at 10:00, Kyotaro Horiguchi <horikyota....@gmail.com> wrote:
> >
> > At Fri, 9 Oct 2020 21:45:57 +0900, Masahiko Sawada <masahiko.saw...@2ndquadrant.com> wrote in
> > > On Fri, 9 Oct 2020 at 14:55, Kyotaro Horiguchi <horikyota....@gmail.com> wrote:
> > > >
> > > > At Fri, 9 Oct 2020 02:33:37 +0000, "tsunakawa.ta...@fujitsu.com" <tsunakawa.ta...@fujitsu.com> wrote in
> > > > > From: Masahiko Sawada <masahiko.saw...@2ndquadrant.com>
> > > > > > What about temporary network failures? I think there are
> > > > > > users who don't want to give up resolving foreign
> > > > > > transactions that failed due to a temporary network
> > > > > > failure. Or they might even want to wait for transaction
> > > > > > completion until they send a cancel request. If we want to
> > > > > > call the commit routine only once and therefore want the
> > > > > > FDW to retry connecting to the foreign server within the
> > > > > > call, it means we require all FDW implementors to write
> > > > > > retry-loop code that is interruptible and never raises an
> > > > > > error, which increases difficulty.
> > > > > >
> > > > > > Yes, but if we don’t retry resolving foreign transactions
> > > > > > at all in an unreliable network environment, the user
> > > > > > might end up requiring every transaction to check the
> > > > > > status of the foreign transactions of the previous
> > > > > > distributed transaction before it starts. If we allow
> > > > > > retries, I guess we ease that somewhat.
> > > > > >
> > > > > OK. As I said, I'm not against trying to cope with temporary
> > > > > network failure. I just don't think it's mandatory. If the
> > > > > network failure is really temporary and thus recovers soon,
> > > > > then the resolver will be able to commit the transaction
> > > > > soon, too.
> > > >
> > > > I must be missing something, though...
> > > >
> > > > I don't understand why we hate ERRORs from the fdw-2pc-commit
> > > > routine so much. I think remote commits should be performed
> > > > before the local commit passes the point of no return, and
> > > > v26-0002 actually places AtEOXact_FdwXact() before the
> > > > critical section.
> > > >
> > >
> > > So you're thinking of the following sequence?
> > >
> > > 1. Prepare all foreign transactions.
> > > 2. Commit all the prepared foreign transactions.
> > > 3. Commit the local transaction.
> > >
> > > Suppose we have the backend process call the commit routine, what
> > > if one of the FDWs raises an ERROR while committing its foreign
> > > transaction after other foreign transactions have been committed?
> > > The transaction will end up with an abort but some foreign
> > > transactions are already committed.
> >
> > Ok, I understand what you are aiming at.
> >
> > It is apparently out of the focus of the two-phase commit
> > protocol. Each FDW server can try to keep the contract as far as its
> > ability reaches, but in the end that kind of failure is
> > inevitable. Even if we require FDW developers not to respond until a
> > 2pc-commit succeeds, that just makes the whole FDW cluster freeze
> > even in cases that are not extremely bad.
> >
> > We have no other choices than shutting the server down (then the
> > succeeding server start removes the garbage commits) or continuing
> > to work while leaving some information in system storage (or
> > reverting the garbage commits). What we can do in that case is to
> > provide an automated way to resolve the inconsistency.
> >
> > > Also, what if the backend process failed to commit the local
> > > transaction? Since it already committed all foreign transactions it
> > > cannot ensure the global atomicity in this case either. Therefore,
> > > I think we should commit the distributed transactions in the
> > > following sequence:
> >
> > Ditto. It's out of the range of 2pc.
> > Using 2pc for the local transaction could reduce that kind of
> > failure, but I'm not sure. 3pc, 4pc ... npc could reduce the
> > probability but can't eliminate the failure cases.
>
> IMO the problems I mentioned arise from the fact that the above
> sequence doesn't really follow the 2pc protocol in the first place.
>
> We can regard committing the local transaction without preparation,
> while preparing the foreign transactions, as using 2pc with the last
> resource transaction optimization (or last agent optimization)[1].
> That is, we prepare all foreign transactions first and the local node
> is always the last resource to process. At this time, the outcome of
> the distributed transaction completely depends on the fate of the
> last resource (i.e., the local transaction). If it fails, the
> distributed transaction must be aborted by rolling back the prepared
> foreign transactions. OTOH, if it succeeds, all prepared foreign
> transactions must be committed. Therefore, we don’t need to prepare
> the last resource and can commit it. In this way, if we want

A local transaction's commit can fail, for example because of too many
notifications or a serialization failure.

> to commit the local transaction without preparation, the local
> transaction must be committed last. But since the above sequence
> doesn’t follow this protocol, we will have such problems. I think if
> we follow the 2pc properly, such basic failures don't happen.

True. But I haven't suggested that sequence.

> > > 1. Prepare all foreign transactions.
> > > 2. Commit the local transaction.
> > > 3. Commit all the prepared foreign transactions.
> > >
> > > But this is still not a perfect solution. If we have the backend
> >
> > 2pc is not a perfect solution in the first place. Attaching a similar
> > phase to it cannot make it "perfect".
> >
> > > process call the commit routine and an error happens during executing
> > > the commit routine of an FDW (i.e., at step 3) it's too late to report
> > > an error to the client because we already committed the local
> > > transaction. So the current solution is to have a background process
> > > commit the foreign transactions so that the backend can just wait
> > > without the possibility of errors.
> >
> > Whatever process tries to complete a transaction, the client must wait
> > for the transaction to end and anyway that's just a freeze in the
> > client's view, unless you intended to respond to local commit before
> > all participants complete.
>
> Yes, but the point of using a separate process is that even if FDW
> code raises an error, the client waiting for transaction resolution
> doesn't get it and it's interruptible.
>
> [1] https://docs.oracle.com/cd/E13222_01/wls/docs91/jta/llr.html

I don't get the point. If FDW-commit is called in the same process, an
error from FDW-commit outright leads to the failure of the current
commit. Isn't "the client waiting for transaction resolution" the
client of the leader process of the 2pc-commit in the same-process
model?
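
To make sure we are talking about the same thing, the sequence quoted
above (prepare all foreign transactions, commit the local transaction,
then commit the prepared foreign transactions) can be modelled roughly
as below. This is only a toy Python sketch, not PostgreSQL or FDW code:
ForeignServer, TransientError, resolve() and commit_distributed() are
hypothetical names invented for the illustration, and the "network
failure" is just a counter. It assumes that a failure in step 3 is
retried by a resolver-like loop rather than raised to the client, and
that a local commit failure rolls back the prepared transactions.

```python
class TransientError(Exception):
    """Stands in for a temporary network failure (hypothetical)."""


class ForeignServer:
    """Toy 2pc participant; not a real FDW API."""

    def __init__(self, name, commit_failures=0):
        self.name = name
        self.state = "active"
        # Number of commit attempts that fail before one succeeds.
        self._commit_failures = commit_failures

    def prepare(self):
        self.state = "prepared"

    def commit_prepared(self):
        if self._commit_failures > 0:
            self._commit_failures -= 1
            raise TransientError(self.name)
        self.state = "committed"

    def rollback_prepared(self):
        self.state = "aborted"


def resolve(servers, max_attempts=5):
    """Retry committing prepared transactions without raising.

    Returns the servers still unresolved after max_attempts rounds.
    """
    pending = list(servers)
    for _ in range(max_attempts):
        still_pending = []
        for server in pending:
            try:
                server.commit_prepared()
            except TransientError:
                still_pending.append(server)  # retry in the next round
        pending = still_pending
        if not pending:
            break
    return pending


def commit_distributed(servers, commit_local):
    # Step 1: prepare all foreign transactions.
    for server in servers:
        server.prepare()
    # Step 2: commit the local transaction (the "last resource").
    try:
        commit_local()
    except Exception:
        # Local commit failed: roll back every prepared transaction.
        for server in servers:
            server.rollback_prepared()
        raise
    # Step 3: commit the prepared foreign transactions, with retries.
    return resolve(servers)
```

With two servers where the second fails its first two commit attempts,
commit_distributed() still returns an empty list once the retries
succeed. If commit_local raises, every participant ends up "aborted"
and the error reaches the caller before anything has been committed.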

I must be missing something, but postgres_fdw allows query cancellation
at commit time. (I think, though, that whether the remote commit ends
up completed or aborted depends on timing.) Perhaps the feature was
introduced after the project started?

> commit ae9bfc5d65123aaa0d1cca9988037489760bdeae
> Author: Robert Haas <rh...@postgresql.org>
> Date: Wed Jun 7 15:14:55 2017 -0400
>
>     postgres_fdw: Allow cancellation of transaction control commands.

I thought we were discussing FDW errors during the 2pc-commit phase.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center