On 2020-09-18 00:54, Bruce Momjian wrote:
On Tue, Sep 8, 2020 at 01:36:16PM +0300, Alexey Kondratov wrote:
Thank you for the link!
After a quick look at Sawada-san's patch set, I think that there are
two major differences:
1. There is a built-in foreign xacts resolver in [1], which should be
much more convenient from the end-user perspective. It involves huge
in-core changes and additional complexity, which are of course worth it.
However, it's still not clear to me that it is possible to resolve all
foreign prepared xacts on Postgres' own side with a 100% guarantee.
Imagine a situation where the coordinator node is actually an HA cluster
group (primary + sync + async replica) and it fails just after the
PREPARE stage or after the local COMMIT. In that case all foreign xacts
will be left in the prepared state. After the failover process
completes, the synchronous replica will become the new primary. Would it
have all the info required to properly resolve the orphaned prepared
xacts?
Probably this situation is handled properly in [1], but I have not yet
finished a thorough reading of the patch set, though it has great docs!
On the other hand, the previous 0003 and my proposed patch rely on
either manual resolution of hung prepared xacts or the use of an
external monitor/resolver. This approach is much simpler from the
in-core perspective, but it doesn't look as complete as [1].
Have we considered how someone would clean up foreign transactions if
the coordinating server dies? Could it be done manually? Would an
external resolver, rather than an internal one, make this easier?
Both Sawada-san's patch [1] and the patches in this thread (e.g. mine
[2]) use 2PC with a special gid format that includes an xid plus server
identification info. Thus, one can select from pg_prepared_xacts, get
the xid and coordinator info, then use txid_status() on the coordinator
(or ex-coordinator) to get the transaction status, and finally either
commit or abort these stale prepared xacts. Of course, this could be
wrapped into some user-level support routines, as is done in [1].
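To make the manual procedure above a bit more concrete, here is a
minimal Python sketch of the decision logic only. It is an illustration
under assumptions, not what either patch actually does: the gid layout
"fx_<xid>_<coordinator>" and the names parse_gid, decide and
resolve_once are hypothetical, and the real gid format is whatever the
patches define. The status strings are the ones txid_status() reports
('committed', 'aborted', 'in progress', or NULL). Database access is
left to the caller, so the sketch keeps no state of its own.

```python
from typing import Callable, Iterable, List, NamedTuple, Optional


class ForeignGid(NamedTuple):
    xid: int          # transaction id on the coordinator
    coordinator: str  # hypothetical coordinator identifier


def parse_gid(gid: str) -> Optional[ForeignGid]:
    """Parse a gid of the ASSUMED form 'fx_<xid>_<coordinator>'.

    Returns None for gids that do not match, e.g. prepared xacts
    created by hand that the resolver must not touch.
    """
    parts = gid.split("_", 2)
    if len(parts) != 3 or parts[0] != "fx" or not parts[1].isdecimal():
        return None
    return ForeignGid(int(parts[1]), parts[2])


def decide(status: Optional[str]) -> Optional[str]:
    """Map a txid_status() result from the coordinator to an action."""
    if status == "committed":
        return "COMMIT PREPARED"
    if status == "aborted":
        return "ROLLBACK PREPARED"
    # 'in progress' or NULL (None): outcome unknown, leave it alone.
    return None


def resolve_once(
    gids: Iterable[str],
    status_of: Callable[[ForeignGid], Optional[str]],
) -> List[str]:
    """One stateless pass: gids from pg_prepared_xacts in, SQL out.

    status_of is supplied by the caller and is expected to run
    txid_status() on the coordinator named in the gid.
    """
    statements = []
    for gid in gids:
        parsed = parse_gid(gid)
        if parsed is None:
            continue
        action = decide(status_of(parsed))
        if action is not None:
            quoted = gid.replace("'", "''")  # escape for a SQL literal
            statements.append(f"{action} '{quoted}'")
    return statements
```

A caller would feed it the gid column of pg_prepared_xacts from each
foreign server and execute the returned statements there. Anything the
coordinator reports as in progress, or cannot report on at all, is left
for a later pass or for manual inspection.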
As for the benefits of using an external resolver, I think there are
some from the whole-system perspective:
1) If one follows the logic above, then this resolver could be
stateless: it takes all the required info from the Postgres nodes
themselves.
2) Then you can easily put it into a container, which makes it easier
to deploy to 'cloud' platforms like Kubernetes.
3) Also, you can scale resolvers independently of the Postgres nodes.
I do not think that any of these points is a game changer, but we use
a very simple external resolver together with [2] in our sharding
prototype and it works just fine so far.
[1]
https://www.postgresql.org/message-id/CA%2Bfd4k4HOVqqC5QR4H984qvD0Ca9g%3D1oLYdrJT_18zP9t%2BUsJg%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/3ef7877bfed0582019eab3d462a43275%40postgrespro.ru
--
Alexey Kondratov
Postgres Professional https://www.postgrespro.com
Russian Postgres Company