On Fri, May 21, 2021 at 12:45 PM Masahiro Ikeda <ikeda...@oss.nttdata.com> wrote:
>
> On 2021/05/21 10:39, Masahiko Sawada wrote:
> > On Thu, May 20, 2021 at 1:26 PM Masahiro Ikeda <ikeda...@oss.nttdata.com> wrote:
> >>
> >> On 2021/05/11 13:37, Masahiko Sawada wrote:
> >>> I've attached the updated patches that incorporated comments from
> >>> Zhihong and Ikeda-san.
> >>
> >> Thanks for updating the patches!
> >>
> >> I have other comments including trivial things.
> >>
> >> a. about "foreign_transaction_resolver_timeout" parameter
> >>
> >> Now, the default value of "foreign_transaction_resolver_timeout" is 60 secs.
> >> Is there any reason? Although the following is minor case, it may confuse
> >> some users.
> >>
> >> Example case is that
> >>
> >> 1. a client executes transaction with 2PC when the resolver is processing
> >> FdwXactResolverProcessInDoubtXacts().
> >>
> >> 2. the resolution of 1st transaction must be waited until the other
> >> transactions for 2pc are executed or timeout.
> >>
> >> 3. if the client check the 1st result value, it should wait until resolution
> >> is finished for atomic visibility (although it depends on the way how to
> >> realize atomic visibility.) The clients may be waited
> >> "foreign_transaction_resolver_timeout". Users may think it's stale.
> >>
> >> Like this situation can be observed after testing with pgbench. Some
> >> unresolved transaction remains after benchmarking.
> >>
> >> I assume that this default value refers to wal_sender, archiver, and so on.
> >> But, I think this parameter is more like "commit_delay". If so, 60 seconds
> >> seems to be big value.
> >
> > IIUC this situation seems like the foreign transaction resolution is
> > bottle-neck and doesn’t catch up to incoming resolution requests. But
> > how foreign_transaction_resolver_timeout relates to this situation?
> > foreign_transaction_resolver_timeout controls when to terminate the
> > resolver process that doesn't have any foreign transactions to
> > resolve. So if we set it several milliseconds, resolver processes are
> > terminated immediately after each resolution, imposing the cost of
> > launching resolver processes on the next resolution.
>
> Thanks for your comments!
>
> No, this situation is not related to the foreign transaction resolution is
> bottle-neck or not. This issue may happen when the workload has very few
> foreign transactions.
>
> If new foreign transaction comes while the transaction resolver is processing
> resolutions via FdwXactResolverProcessInDoubtXacts(), the foreign transaction
> waits until starting next transaction resolution. If next foreign transaction
> doesn't come, the foreign transaction must wait starting resolution until
> timeout. I mentioned this situation.
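To make the scenario quoted above concrete, here is a minimal sketch in Python, not the patch's C code. It assumes the resolver sleeps on a latch with a timeout between rounds; `threading.Event` stands in for the latch, and the names `Resolver`, `prepare`, and `run_once` are hypothetical. A request queued without setting the latch is only picked up once the sleep times out, while a request that sets the latch is picked up without sleeping. The same loop also shows a resolver exiting after a full idle timeout.

```python
import threading


class Resolver:
    """Toy model of a resolver loop (hypothetical, not PostgreSQL code)."""

    def __init__(self, idle_timeout):
        self.idle_timeout = idle_timeout  # seconds; stands in for the GUC
        self.latch = threading.Event()    # stands in for the process latch
        self.pending = []                 # prepared, not yet resolved
        self.resolved = []                # resolved, in resolution order

    def prepare(self, xid, set_latch):
        """Queue a prepared foreign transaction, optionally waking the resolver."""
        self.pending.append(xid)
        if set_latch:
            self.latch.set()

    def run_once(self):
        """Sleep on the latch, then resolve whatever is pending.

        Returns False when the resolver was idle for the whole timeout
        and should terminate, True otherwise.
        """
        woken = self.latch.wait(self.idle_timeout)
        self.latch.clear()
        if not woken and not self.pending:
            return False  # idle for idle_timeout: exit
        while self.pending:
            self.resolved.append(self.pending.pop(0))
        return True
```

With `set_latch=True` the call to `run_once()` returns right away; with `set_latch=False` the same request is still resolved, but only after `idle_timeout` has elapsed.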
Thanks for your explanation. I think that in this case we should set
the latch of the resolver after preparing all foreign transactions so
that the resolver processes those transactions without sleeping.

>
> Thanks for letting me know the side effect if setting resolution timeout to
> several milliseconds. I agree. But, why termination is needed? Is there a
> possibility to stale like walsender?

The purpose of this timeout is to terminate resolvers that have been
idle for a long time. The resolver processes don't necessarily need to
keep running all the time for every database. On the other hand,
launching a resolver process per commit would be costly. So we keep
resolver processes running for at least
foreign_transaction_resolver_timeout.

> >
> >>
> >> b. about performance bottleneck (just share my simple benchmark results)
> >>
> >> The resolver process can be performance bottleneck easily although I think
> >> some users want this feature even if the performance is not so good.
> >>
> >> I tested with very simple workload in my laptop.
> >>
> >> The test condition is
> >> * two remote foreign partitions and one transaction inserts an entry in
> >> each partitions.
> >> * local connection only. If NW latency became higher, the performance
> >> became worse.
> >> * pgbench with 8 clients.
> >>
> >> The test results is the following. The performance of 2PC is only 10%
> >> performance of the one of without 2PC.
> >>
> >> * with foreign_twophase_commit = required
> >> -> If load with more than 10TPS, the number of unresolved foreign
> >> transactions is increasing and stop with the warning "Increase
> >> max_prepared_foreign_transactions".
> >
> > What was the value of max_prepared_foreign_transactions?
>
> Now, I tested with 200.
>
> If each resolution is finished very soon, I thought it's enough because
> 8clients x 2partitions = 16, though... But, it's difficult how to know the
> stable values.
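On the sizing question quoted just above, a back-of-the-envelope model may help. It assumes clients do not wait for resolution, so unresolved entries pile up whenever commits arrive faster than the resolver can drain them. The function name and the rates in the usage note are hypothetical; only the limit of 200 and the two foreign transactions per commit come from the test described above.

```python
def seconds_until_full(max_prepared, commit_tps, resolve_tps, foreign_per_xact=2):
    """Seconds until max_prepared unresolved entries accumulate.

    commit_tps:  distributed transactions committed per second (assumed)
    resolve_tps: distributed transactions the resolver finishes per second (assumed)
    Returns None when the resolver keeps up and the backlog does not grow.
    """
    growth = (commit_tps - resolve_tps) * foreign_per_xact  # entries per second
    if growth <= 0:
        return None
    return max_prepared / growth
```

For example, with a hypothetical 20 commits per second against a resolver that finishes 10 per second, `seconds_until_full(200, 20, 10)` gives 10 seconds before the limit of 200 is reached. In this model the limit is hit whenever the backlog grows at all, whatever the number of clients.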
While resolving one distributed transaction, the resolver needs both
one round trip and an fsync of a WAL record for each foreign
transaction. Since the client doesn’t wait for the distributed
transaction to be resolved, the resolver process can easily become the
bottleneck given there are 8 clients. If foreign transactions were
resolved synchronously, 16 would suffice.

> >
> > To speed up the foreign transaction resolution, some ideas have been
> > discussed. As another idea, how about launching resolvers for each
> > foreign server? That way, we resolve foreign transactions on each
> > foreign server in parallel. If foreign transactions are concentrated
> > on the particular server, we can have multiple resolvers for the one
> > foreign server. It doesn’t change the fact that all foreign
> > transaction resolutions are processed by resolver processes.
>
> Awesome! There seems to be another pros that even if a foreign server is
> temporarily busy or stopped due to fail over, other foreign server's
> transactions can be resolved.

Yes. We also might need to be careful about the order of foreign
transaction resolution. I think we need to resolve foreign
transactions in arrival order at least within a foreign server.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
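As a rough illustration of the per-server resolver idea and the ordering constraint above, here is a minimal Python sketch, not a proposal for the actual implementation. It assumes each foreign server gets one FIFO queue and one worker; `partition_by_server` and `resolve_all` are hypothetical names, and the "resolution" is just recording the xid.

```python
from collections import defaultdict, deque
from concurrent.futures import ThreadPoolExecutor


def partition_by_server(requests):
    """Split (xid, server) pairs, given in arrival order, into per-server FIFO queues."""
    queues = defaultdict(deque)
    for xid, server in requests:
        queues[server].append(xid)
    return queues


def resolve_all(requests):
    """Drain every server's queue in its own worker thread.

    Servers are handled in parallel; within one server, xids are
    resolved strictly in arrival order.
    """
    queues = partition_by_server(requests)

    def worker(server):
        done = []
        queue = queues[server]
        while queue:
            done.append(queue.popleft())  # a real resolver would commit or roll back here
        return server, done

    with ThreadPoolExecutor(max_workers=max(1, len(queues))) as pool:
        return dict(pool.map(worker, list(queues)))
```

Because each queue is drained by a single worker, a slow or unreachable server only delays its own queue. Running several resolvers against one server, as also suggested above, would need extra coordination to preserve arrival order, which this sketch does not attempt.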