On Wed, 14 Oct 2020 at 10:16, Kyotaro Horiguchi <horikyota....@gmail.com> wrote:
>
> At Tue, 13 Oct 2020 11:56:51 +0900, Masahiko Sawada <masahiko.saw...@2ndquadrant.com> wrote in
> > On Tue, 13 Oct 2020 at 10:00, Kyotaro Horiguchi <horikyota....@gmail.com> wrote:
> > >
> > > At Fri, 9 Oct 2020 21:45:57 +0900, Masahiko Sawada <masahiko.saw...@2ndquadrant.com> wrote in
> > > > On Fri, 9 Oct 2020 at 14:55, Kyotaro Horiguchi <horikyota....@gmail.com> wrote:
> > > > >
> > > > > At Fri, 9 Oct 2020 02:33:37 +0000, "tsunakawa.ta...@fujitsu.com" <tsunakawa.ta...@fujitsu.com> wrote in
> > > > > > From: Masahiko Sawada <masahiko.saw...@2ndquadrant.com>
> > > > > > > What about temporary network failures? I think there are users who
> > > > > > > don't want to give up resolving foreign transactions that failed due
> > > > > > > to a temporary network failure. They might even want to wait for
> > > > > > > transaction completion until they send a cancel request. If we want to
> > > > > > > call the commit routine only once and therefore want the FDW to retry
> > > > > > > connecting to the foreign server within the call, it means we require
> > > > > > > all FDW implementors to write a retry loop that is interruptible and
> > > > > > > never raises an error, which increases difficulty.
> > > > > > >
> > > > > > > Yes, but if we don’t retry resolving foreign transactions at all in
> > > > > > > an unreliable network environment, the user might end up requiring
> > > > > > > every transaction to check the status of the foreign transactions of
> > > > > > > the previous distributed transaction before it starts. If we allow
> > > > > > > retries, I guess we ease that somewhat.
> > > > > >
> > > > > > OK. As I said, I'm not against trying to cope with temporary
> > > > > > network failure. I just don't think it's mandatory.
> > > > > > If the network failure is really temporary and thus recovers soon, then
> > > > > > the resolver will be able to commit the transaction soon, too.
> > > > >
> > > > > I must be missing something, though...
> > > > >
> > > > > I don't understand why we hate ERRORs from fdw-2pc-commit routine so
> > > > > much. I think remote-commits should be performed before local commit
> > > > > passes the point-of-no-return and the v26-0002 actually places
> > > > > AtEOXact_FdwXact() before the critical section.
> > > > >
> > > >
> > > > So you're thinking the following sequence?
> > > >
> > > > 1. Prepare all foreign transactions.
> > > > 2. Commit all the prepared foreign transactions.
> > > > 3. Commit the local transaction.
> > > >
> > > > Suppose we have the backend process call the commit routine, what if
> > > > one of the FDWs raises an ERROR during committing the foreign transaction
> > > > after committing other foreign transactions? The transaction will end
> > > > up with an abort but some foreign transactions are already committed.
> > >
> > > Ok, I understand what you are aiming at.
> > >
> > > It is apparently out of the focus of the two-phase commit
> > > protocol. Each FDW server can try to keep the contract as far as its
> > > ability reaches, but in the end such kind of failure is
> > > inevitable. Even if we require FDW developers not to respond until a
> > > 2pc-commit succeeds, that just leads the whole FDW-cluster to freeze
> > > even in a case that is not extremely bad.
> > >
> > > We have no other choice than shutting the server down (then the
> > > succeeding server start removes the garbage commits) or continuing
> > > working leaving some information in a system storage (or reverting the
> > > garbage commits). What we can do in that case is to provide an
> > > automated way to resolve the inconsistency.
> > >
> > > > Also, what if the backend process failed to commit the local
> > > > transaction?
> > > > Since it already committed all foreign transactions it
> > > > cannot ensure the global atomicity in this case either. Therefore, I
> > > > think we should commit the distributed transactions in the following
> > > > sequence:
> > >
> > > Ditto. It's out of the range of 2pc. Using 2pc for local transaction
> > > could reduce that kind of failure but I'm not sure. 3pc, 4pc ...npc
> > > could reduce the probability but can't eliminate failure cases.
> >
> > IMO the problems I mentioned arise from the fact that the above
> > sequence doesn't really follow the 2pc protocol in the first place.
> >
> > We can think of the fact that we commit the local transaction without
> > preparation while preparing foreign transactions as using
> > the 2pc with last resource transaction optimization (or last agent
> > optimization)[1]. That is, we prepare all foreign transactions first
> > and the local node is always the last resource to process. At this
> > time, the outcome of the distributed transaction completely depends on
> > the fate of the last resource (i.e., the local transaction). If it
> > fails, the distributed transaction must be aborted by rolling back
> > prepared foreign transactions. OTOH, if it succeeds, all prepared
> > foreign transactions must be committed. Therefore, we don’t need to
> > prepare the last resource and can commit it. In this way, if we want
>
> There are cases of commit-failure of a local transaction caused by
> too-many notifications or by serialization failure.

Yes, even if that happens we are still able to roll back all foreign
transactions.

>
> > to commit the local transaction without preparation, the local
> > transaction must be committed last. But since the above sequence
> > doesn’t follow this protocol, we will have such problems. I think if
> > we follow the 2pc properly, such basic failures don't happen.
>
> True. But I haven't suggested that sequence.

Okay, I might have missed your point. Could you elaborate on the idea
you mentioned before, "I think remote-commits should be performed
before local commit passes the point-of-no-return"?

>
> > > > 1. Prepare all foreign transactions.
> > > > 2. Commit the local transaction.
> > > > 3. Commit all the prepared foreign transactions.
> > > >
> > > > But this is still not a perfect solution. If we have the backend
> > >
> > > 2pc is not a perfect solution in the first place. Attaching a similar
> > > phase to it cannot make it "perfect".
> > >
> > > > process call the commit routine and an error happens during executing
> > > > the commit routine of an FDW (i.e., at step 3) it's too late to report
> > > > an error to the client because we already committed the local
> > > > transaction. So the current solution is to have a background process
> > > > commit the foreign transactions so that the backend can just wait
> > > > without the possibility of errors.
> > >
> > > Whatever process tries to complete a transaction, the client must wait
> > > for the transaction to end and anyway that's just a freeze in the
> > > client's view, unless you intended to respond to local commit before
> > > all participants complete.
> >
> > Yes, but the point of using a separate process is that even if FDW
> > code raises an error, the client waiting for transaction resolution
> > doesn't get it and it's interruptible.
> >
> > [1] https://docs.oracle.com/cd/E13222_01/wls/docs91/jta/llr.html
>
> I don't get the point. If FDW-commit is called on the same process, an
> error from FDW-commit outright leads to the failure of the current
> commit. Isn't "the client waiting for transaction resolution" the
> client of the leader process of the 2pc-commit in the same-process
> model?
>
> I must be missing something, but postgres_fdw allows query cancelation
> at commit time. (But I think it depends on timing whether the
> remote commit is completed or aborted.) Perhaps the feature was
> introduced after the project started?
>
> > commit ae9bfc5d65123aaa0d1cca9988037489760bdeae
> > Author: Robert Haas <rh...@postgresql.org>
> > Date: Wed Jun 7 15:14:55 2017 -0400
> >
> > postgres_fdw: Allow cancellation of transaction control commands.
>
> I thought that we are discussing fdw-errors during the 2pc-commit
> phase.
>

Yes, I'm also discussing fdw-errors during the 2pc-commit phase
that happens after committing the local transaction.

Even if FDW-commit raises an error due to the user's cancel request or
whatever reason during committing the prepared foreign transactions,
it's too late. The client will get an error like "ERROR: canceling
statement due to user request" and would think the transaction is
aborted, but that's not true: the local transaction is already committed.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
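
To make the ordering argued for above concrete (prepare all foreign transactions, commit the local transaction, then commit the prepared foreign transactions), here is a minimal Python sketch. It is only a toy model, not code from the patch: `Participant`, `ForeignError`, `resolve` and `commit_distributed` are hypothetical names, and the resolver here is a plain function rather than a separate background process. The point it illustrates is that errors before the local commit can still abort everything, while errors afterwards are retried and never change the outcome reported to the client.

```python
class ForeignError(Exception):
    """Stands in for an ERROR raised by a (hypothetical) FDW routine."""


class Participant:
    """Toy model of one foreign server taking part in the transaction."""

    def __init__(self, name, commit_failures=0):
        self.name = name
        # Number of commit-prepared attempts that fail before one succeeds,
        # simulating a temporary network failure.
        self.commit_failures = commit_failures
        self.state = "active"

    def prepare(self):
        self.state = "prepared"

    def commit_prepared(self):
        if self.commit_failures > 0:
            self.commit_failures -= 1
            raise ForeignError(f"{self.name}: temporary network failure")
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def resolve(participants, max_rounds=10):
    """Resolver role: retry commit-prepared without propagating errors."""
    for _ in range(max_rounds):
        for p in participants:
            if p.state == "prepared":
                try:
                    p.commit_prepared()
                except ForeignError:
                    pass  # stay prepared; retry in the next round
        if all(p.state == "committed" for p in participants):
            return True
    return False  # some transactions are still prepared (in doubt)


def commit_distributed(participants, local_commit):
    try:
        # Step 1: prepare all foreign transactions.
        for p in participants:
            p.prepare()
        # Step 2: commit the local transaction (the "last resource").
        local_commit()
    except Exception:
        # Nothing is decided yet, so everything can still be rolled back.
        for p in participants:
            p.rollback()
        return "aborted"
    # Step 3: the outcome is already decided; failures here are only retried.
    resolve(participants)
    return "committed"
```

For example, `commit_distributed([Participant("a", commit_failures=2), Participant("b")], lambda: None)` reports "committed" even though the first two commit attempts on "a" fail, whereas a `local_commit` that raises (say, on a serialization failure) leaves every participant rolled back.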