Re: Transactions involving multiple postgres foreign servers, take 2

Masahiko Sawada Tue, 13 Oct 2020 20:11:01 -0700

On Wed, 14 Oct 2020 at 10:16, Kyotaro Horiguchi <horikyota....@gmail.com> wrote:
>
> At Tue, 13 Oct 2020 11:56:51 +0900, Masahiko Sawada 
> <masahiko.saw...@2ndquadrant.com> wrote in
> > On Tue, 13 Oct 2020 at 10:00, Kyotaro Horiguchi <horikyota....@gmail.com> wrote:
> > >
> > > At Fri, 9 Oct 2020 21:45:57 +0900, Masahiko Sawada 
> > > <masahiko.saw...@2ndquadrant.com> wrote in
> > > > On Fri, 9 Oct 2020 at 14:55, Kyotaro Horiguchi <horikyota....@gmail.com> wrote:
> > > > >
> > > > > At Fri, 9 Oct 2020 02:33:37 +0000, "tsunakawa.ta...@fujitsu.com" 
> > > > > <tsunakawa.ta...@fujitsu.com> wrote in
> > > > > > From: Masahiko Sawada <masahiko.saw...@2ndquadrant.com>
> > > > > > > What about temporary network failures? I think there are users who
> > > > > > > don't want to give up resolving foreign transactions that failed due
> > > > > > > to a temporary network failure. They might even want to keep waiting
> > > > > > > for the transaction to complete until they send a cancel request. If
> > > > > > > we want to call the commit routine only once, and therefore want the
> > > > > > > FDW to retry connecting to the foreign server within that call, we
> > > > > > > require every FDW implementor to write retry-loop code that is
> > > > > > > interruptible and guaranteed not to raise an error, which makes
> > > > > > > implementing an FDW harder.
> > > > > > >
> > > > > > > Yes, but if we don’t retry resolving foreign transactions at all in
> > > > > > > an unreliable network environment, the user might end up requiring
> > > > > > > every transaction to check the status of the foreign transactions of
> > > > > > > the previous distributed transaction before it starts. If we allow
> > > > > > > retries, I guess we ease that somewhat.
> > > > > >
> > > > > > OK.  As I said, I'm not against trying to cope with temporary 
> > > > > > network failure.  I just don't think it's mandatory.  If the 
> > > > > > network failure is really temporary and thus recovers soon, then 
> > > > > > the resolver will be able to commit the transaction soon, too.
> > > > >
> > > > > I must be missing something, though...
> > > > >
> > > > > I don't understand why we hate ERRORs from the fdw-2pc-commit routine
> > > > > so much. I think remote-commits should be performed before local commit
> > > > > passes the point-of-no-return, and v26-0002 actually places
> > > > > AtEOXact_FdwXact() before the critical section.
> > > > >
> > > >
> > > > So you're thinking of the following sequence?
> > > >
> > > > 1. Prepare all foreign transactions.
> > > > 2. Commit all the prepared foreign transactions.
> > > > 3. Commit the local transaction.
> > > >
> > > > Suppose we have the backend process call the commit routine. What if
> > > > one of the FDWs raises an ERROR while committing its foreign transaction
> > > > after other foreign transactions have already been committed? The
> > > > transaction will end up aborted, but some foreign transactions are
> > > > already committed.
> > >
> > > Ok, I understand what you are aiming at.
> > >
> > > That is clearly outside the scope of the two-phase commit
> > > protocol. Each FDW server can try to keep the contract as far as its
> > > ability reaches, but in the end that kind of failure is
> > > inevitable. Even if we require FDW developers not to return until a
> > > 2pc-commit succeeds, that just leads the whole FDW cluster to freeze,
> > > even in a case that is not extremely bad.
> > >
> > > We have no choice other than shutting the server down (the
> > > subsequent server start then removes the garbage commits) or continuing
> > > to work while leaving some information in system storage (or reverting
> > > the garbage commits). What we can do in that case is provide an
> > > automated way to resolve the inconsistency.
> > >
> > > > Also, what if the backend process fails to commit the local
> > > > transaction? Since it has already committed all foreign transactions,
> > > > it cannot ensure global atomicity in this case either. Therefore, I
> > > > think we should commit distributed transactions in the following
> > > > sequence:
> > >
> > > Ditto. It's outside the scope of 2pc. Using 2pc for the local transaction
> > > could reduce that kind of failure, but I'm not sure. 3pc, 4pc ... npc
> > > could reduce the probability but can't eliminate failure cases.
> >
> > IMO the problems I mentioned arise from the fact that the above
> > sequence doesn't really follow the 2pc protocol in the first place.
> >
> > Committing the local transaction without preparing it, while preparing
> > the foreign transactions, can be thought of as using 2pc with the last
> > resource transaction optimization (or last agent optimization)[1].
> > That is, we prepare all foreign transactions first, and the local node
> > is always the last resource to process. At that point, the outcome of
> > the distributed transaction depends entirely on the fate of the last
> > resource (i.e., the local transaction). If it fails, the distributed
> > transaction must be aborted by rolling back the prepared foreign
> > transactions. OTOH, if it succeeds, all prepared foreign transactions
> > must be committed. Therefore, we don’t need to prepare the last
> > resource and can simply commit it. In this way, if we want
>
> There are cases where the commit of a local transaction fails, caused
> by too many notifications or by a serialization failure.


Yes, but even if that happens we are still able to roll back all
foreign transactions.
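
A minimal sketch of the last-resource ordering described above, written in Python for brevity. `ForeignTxn` and `commit_with_last_resource` are hypothetical stand-ins, not the patch's API, and each foreign transaction only records its state. The sketch shows why a local commit failure such as a serialization failure is still recoverable: nothing has been committed remotely yet, so every prepared foreign transaction can be rolled back.

```python
class ForeignTxn:
    """Hypothetical stand-in for a transaction on one foreign server."""

    def __init__(self, name, fail_prepare=False):
        self.name = name
        self.fail_prepare = fail_prepare
        self.state = "active"

    def prepare(self):
        # Stands in for PREPARE TRANSACTION on the foreign server.
        if self.fail_prepare:
            raise RuntimeError(f"{self.name}: PREPARE TRANSACTION failed")
        self.state = "prepared"

    def commit_prepared(self):
        # Stands in for COMMIT PREPARED.
        self.state = "committed"

    def rollback(self):
        # Stands in for ROLLBACK PREPARED (or ROLLBACK if not yet prepared).
        self.state = "aborted"


def commit_with_last_resource(foreign, commit_local):
    """Prepare all foreign transactions, then let the local commit decide."""
    try:
        for f in foreign:
            f.prepare()
    except RuntimeError:
        for f in foreign:
            f.rollback()
        return "aborted"

    try:
        commit_local()  # the last resource: its outcome is the global decision
    except RuntimeError:
        # Still safe: no foreign transaction has been committed yet.
        for f in foreign:
            f.rollback()
        return "aborted"

    # The decision is "commit", so every prepared transaction must now commit.
    for f in foreign:
        f.commit_prepared()
    return "committed"


def failing_local_commit():
    # Simulates e.g. a serialization failure at local commit time.
    raise RuntimeError("could not serialize access")


servers = [ForeignTxn("fs1"), ForeignTxn("fs2")]
print(commit_with_last_resource(servers, failing_local_commit))  # aborted
print([s.state for s in servers])  # ['aborted', 'aborted']
```

With a local commit that succeeds (for example `lambda: None`), both servers end in "committed". Only the final loop runs after the decision point, and it is the step that must not fail.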

>
> > to commit the local transaction without preparation, the local
> > transaction must be committed last. But since the above sequence
> > doesn’t follow this protocol, we have such problems. I think if
> > we follow 2pc properly, such basic failures don't happen.
>
> True. But I haven't suggested that sequence.

Okay, I might have missed your point. Could you elaborate on the idea
you mentioned before, "I think remote-commits should be performed
before local commit passes the point-of-no-return"?
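
For comparison, here is a minimal sketch of the earlier sequence: prepare all foreign transactions, commit all of them, then commit the local transaction. It assumes the prepare step has already succeeded and that an FDW commit routine can raise. `PreparedRemote` and `remote_first_commit` are hypothetical names used only for illustration. An error from the second server aborts the local transaction even though the first server has already committed, which is the atomicity violation raised earlier in the thread.

```python
class PreparedRemote:
    """Hypothetical prepared foreign transaction (step 1 already done)."""

    def __init__(self, name, fail_commit=False):
        self.name = name
        self.fail_commit = fail_commit
        self.state = "prepared"

    def commit_prepared(self):
        # Stands in for an FDW commit routine that may raise an ERROR.
        if self.fail_commit:
            raise ConnectionError(f"{self.name}: COMMIT PREPARED failed")
        self.state = "committed"


def remote_first_commit(remotes):
    """Step 2: commit every remote; step 3: commit locally if all succeeded."""
    try:
        for r in remotes:
            r.commit_prepared()
    except ConnectionError:
        # The local transaction aborts, but earlier remotes remain committed.
        return "local aborted"
    return "local committed"


remotes = [PreparedRemote("fs1"), PreparedRemote("fs2", fail_commit=True),
           PreparedRemote("fs3")]
print(remote_first_commit(remotes))  # local aborted
print([r.state for r in remotes])    # ['committed', 'prepared', 'prepared']
```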

>
> > > > 1. Prepare all foreign transactions.
> > > > 2. Commit the local transaction.
> > > > 3. Commit all the prepared foreign transactions.
> > > >
> > > > But this is still not a perfect solution. If we have the backend
> > >
> > > 2pc is not a perfect solution in the first place. Attaching a similar
> > > phase to it cannot make it "perfect".
> > >
> > > > process call the commit routine and an error happens while executing
> > > > an FDW's commit routine (i.e., at step 3), it's too late to report
> > > > an error to the client because we have already committed the local
> > > > transaction. So the current solution is to have a background process
> > > > commit the foreign transactions so that the backend can just wait
> > > > without the possibility of errors.
> > >
> > > Whatever process tries to complete a transaction, the client must wait
> > > for the transaction to end, and either way that's just a freeze from
> > > the client's point of view, unless you intended to respond to the local
> > > commit before all participants complete.
> >
> > Yes, but the point of using a separate process is that even if FDW
> > code raises an error, the client waiting for transaction resolution
> > doesn't get it, and the wait is interruptible.
> >
> > [1] https://docs.oracle.com/cd/E13222_01/wls/docs91/jta/llr.html
>
> I don't get the point. If FDW-commit is called in the same process, an
> error from FDW-commit outright leads to the failure of the current
> commit.  Isn't "the client waiting for transaction resolution" the
> client of the leader process of the 2pc-commit in the same-process
> model?
>
> I must be missing something, but postgres_fdw allows query cancellation
> at commit time. (But I think whether the remote commit is completed or
> aborted depends on timing.)  Perhaps the feature was introduced after
> the project started?
>
> > commit ae9bfc5d65123aaa0d1cca9988037489760bdeae
> > Author: Robert Haas <rh...@postgresql.org>
> > Date:   Wed Jun 7 15:14:55 2017 -0400
> >
> >     postgres_fdw: Allow cancellation of transaction control commands.
>
> I thought that we were discussing fdw-errors during the 2pc-commit
> phase.
>

Yes, I'm also discussing fdw-errors during the 2pc-commit phase,
which happens after committing the local transaction.

If FDW-commit raises an error while committing the prepared foreign
transactions, whether due to the user's cancel request or any other
reason, it's too late. The client will get an error like "ERROR:
canceling statement due to user request" and would think the transaction
was aborted, but that's not true: the local transaction is already
committed.
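
Below is a minimal sketch of the alternative discussed in this thread. The backend commits locally, hands the prepared foreign transactions to a separate resolver, and then only waits. It rests on a few assumptions:

- A Python thread stands in for the background process.
- `FlakyForeignTxn`, `resolver` and `backend_commit` are hypothetical names, not the patch's functions.
- Transient errors are retried inside the resolver, so they never surface as an ERROR in the backend.
- A cancel request only ends the wait. It is reported as a warning rather than an error because the local commit cannot be undone.

```python
import threading
import time


class FlakyForeignTxn:
    """Hypothetical prepared foreign transaction whose COMMIT PREPARED fails
    `failures` times before succeeding (e.g. a temporary network outage)."""

    def __init__(self, name, failures=0):
        self.name = name
        self.failures = failures
        self.committed = False

    def commit_prepared(self):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError(f"{self.name}: connection lost")
        self.committed = True


def resolver(pending, done, retry_interval=0.01):
    """Background worker: retry until every prepared transaction commits.
    Connection errors are absorbed here instead of reaching the backend."""
    while pending:
        try:
            pending[0].commit_prepared()
            pending.pop(0)
        except ConnectionError:
            time.sleep(retry_interval)  # back off, then retry the same one
    done.set()


def backend_commit(foreign, cancel_requested=lambda: False):
    """Called after the local commit; waits for the resolver to finish."""
    done = threading.Event()
    worker = threading.Thread(target=resolver, args=(list(foreign), done),
                              daemon=True)
    worker.start()
    # Interruptible wait: check for a cancel request between short waits.
    while not done.wait(timeout=0.005):
        if cancel_requested():
            # Too late to abort, so report a warning rather than an error.
            return "WARNING: canceled wait; foreign transactions still resolving"
    return "COMMIT"


txns = [FlakyForeignTxn("fs1"), FlakyForeignTxn("fs2", failures=3)]
print(backend_commit(txns))            # COMMIT
print(all(t.committed for t in txns))  # True
```

The design choice illustrated is that the component that can fail (the FDW commit) is isolated from the component that talks to the client (the backend). The retry loop therefore lives in one place rather than in every FDW's commit routine.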

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
