Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-03 Thread ikeda...@oss.nttdata.com


> 2021/06/04 12:28、Masahiko Sawada のメール:
> 
> On Thu, Jun 3, 2021 at 1:56 PM Masahiro Ikeda  > wrote:
>> 
>> 
>> 
>> On 2021/05/25 21:59, Masahiko Sawada wrote:
>>> On Fri, May 21, 2021 at 5:48 PM Masahiro Ikeda  
>>> wrote:
 
 On 2021/05/21 13:45, Masahiko Sawada wrote:
> 
> Yes. We also might need to be careful about the order of foreign
> transaction resolution. I think we need to resolve foreign> transactions 
> in arrival order at least within a foreign server.
 
 I agree it's better.
 
 (Although this is my interest...)
 Is it necessary? Although this idea seems to be for atomic visibility,
 2PC can't realize that as you know. So, I wondered that.
>>> 
>>> I think it's for fairness. If a foreign transaction arrived earlier
>>> gets put off so often for other foreign transactions arrived later due
>>> to its index in FdwXactCtl->xacts, it’s not understandable for users
>>> and not fair. I think it’s better to handle foreign transactions in
>>> FIFO manner (although this problem exists even in the current code).
>> 
>> OK, thanks.
>> 
>> 
>> On 2021/05/21 12:45, Masahiro Ikeda wrote:
>>> On 2021/05/21 10:39, Masahiko Sawada wrote:
 On Thu, May 20, 2021 at 1:26 PM Masahiro Ikeda 
>> wrote:
 How much is the performance without those 2PC patches and with the
 same workload? i.e., how fast is the current postgres_fdw that uses
 XactCallback?
>>> 
>>> OK, I'll test.
>> 
>> The test results are followings. But, I couldn't confirm the performance
>> improvements of 2PC patches though I may need to be changed the test 
>> condition.
>> 
>> [condition]
>> * 1 coordinator and 3 foreign servers
>> * There are two custom scripts which access different two foreign servers per
>> transaction
>> 
>> ``` fxact_select.pgbench
>> BEGIN;
>> SELECT * FROM part:p1 WHERE id = :id;
>> SELECT * FROM part:p2 WHERE id = :id;
>> COMMIT;
>> ```
>> 
>> ``` fxact_update.pgbench
>> BEGIN;
>> UPDATE part:p1 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
>> UPDATE part:p2 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
>> COMMIT;
>> ```
>> 
>> [results]
>> 
>> I have tested three times.
>> Performance difference seems to be within the range of errors.
>> 
>> # 6d0eb38557 with 2pc patches(v36) and foreign_twophase_commit = disable
>> - fxact_update.pgbench
>> 72.3, 74.9, 77.5  TPS  => avg 74.9 TPS
>> 110.5, 106.8, 103.2  ms => avg 106.8 ms
>> 
>> - fxact_select.pgbench
>> 1767.6, 1737.1, 1717.4 TPS  => avg 1740.7 TPS
>> 4.5, 4.6, 4.7 ms => avg 4.6ms
>> 
>> # 6d0eb38557 without 2pc patches
>> - fxact_update.pgbench
>> 76.5, 70.6, 69.5 TPS => avg 72.2 TPS
>> 104.534 + 113.244 + 115.097 => avg 111.0 ms
>> 
>> -fxact_select.pgbench
>> 1810.2, 1748.3, 1737.2 TPS => avg 1765.2 TPS
>> 4.2, 4.6, 4.6 ms=>  4.5 ms
>> 
> 
> Thank you for testing!
> 
> I think the result shows that managing foreign transactions on the
> core side would not be a problem in terms of performance.
> 
>> 
>> 
>> 
>> 
>> # About the bottleneck of the resolver process
>> 
>> I investigated the performance bottleneck of the resolver process using perf.
>> The main bottleneck is the following functions.
>> 
>> 1st. 42.8% routine->CommitForeignTransaction()
>> 2nd. 31.5% remove_fdwxact()
>> 3rd. 10.16% CommitTransaction()
>> 
>> 1st and 3rd problems can be solved by parallelizing resolver processes per
>> remote servers. But, I wondered that the idea, which backends call also
>> "COMMIT/ABORT PREPARED" and the resolver process only takes changes of
>> resolving in-doubt foreign transactions, is better. In many cases, I think
>> that the number of connections is much greater than the number of remote
>> servers. If so, the parallelization is not enough.
>> 
>> So, I think the idea which backends execute "PREPARED COMMIT" synchronously 
>> is
>> better. The citus has the 2PC feature and backends send "PREPARED COMMIT" in
>> the extension. So, this idea is not bad.
> 
> Thank you for pointing it out. This idea has been proposed several
> times and there were discussions. I'd like to summarize the proposed
> ideas and those pros and cons before replying to your other comments.
> 
> There are 3 ideas. After backend both prepares all foreign transaction
> and commit the local transaction,
> 
> 1. the backend continues attempting to commit all prepared foreign
> transactions until all of them are committed.
> 2. the backend attempts to commit all prepared foreign transactions
> once. If an error happens, leave them for the resolver.
> 3. the backend asks the resolver that launched per foreign server to
> commit the prepared foreign transactions (and backend waits or doesn't
> wait for the commit completion depending on the setting).
> 
> With ideas 1 and 2, since the backend itself commits all foreign
> transactions the resolver process cannot be a bottleneck, and probably
> the code can get more simple as backends don't need to communicate
> with resolver processes.
>

Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-06 Thread ikeda...@oss.nttdata.com



> 2021/06/04 17:16、Masahiko Sawada のメール:
> 
> On Fri, Jun 4, 2021 at 3:58 PM ikeda...@oss.nttdata.com
>  wrote:
>> 
>> 
>> 
>> 2021/06/04 12:28、Masahiko Sawada のメール:
>> 
>> 
>> Thank you for pointing it out. This idea has been proposed several
>> times and there were discussions. I'd like to summarize the proposed
>> ideas and those pros and cons before replying to your other comments.
>> 
>> There are 3 ideas. After backend both prepares all foreign transaction
>> and commit the local transaction,
>> 
>> 1. the backend continues attempting to commit all prepared foreign
>> transactions until all of them are committed.
>> 2. the backend attempts to commit all prepared foreign transactions
>> once. If an error happens, leave them for the resolver.
>> 3. the backend asks the resolver that launched per foreign server to
>> commit the prepared foreign transactions (and backend waits or doesn't
>> wait for the commit completion depending on the setting).
>> 
>> With ideas 1 and 2, since the backend itself commits all foreign
>> transactions the resolver process cannot be a bottleneck, and probably
>> the code can get more simple as backends don't need to communicate
>> with resolver processes.
>> 
>> However, those have two problems we need to deal with:
>> 
>> 
>> Thanks for sharing the summarize. I understood there are problems related to
>> FDW implementation.
>> 
>> First, users could get an error if an error happens during the backend
>> committing prepared foreign transaction but the local transaction is
>> already committed and some foreign transactions could also be
>> committed, confusing users. There were two opinions to this problem:
>> FDW developers should be responsible for writing FDW code such that
>> any error doesn't happen during committing foreign transactions, and
>> users can accept that confusion since an error could happen after
>> writing the commit WAL even today without this 2PC feature. For the
>> former point, I'm not sure it's always doable since even palloc()
>> could raise an error and it seems hard to require all FDW developers
>> to understand all possible paths of raising an error. And for the
>> latter point, that's true but I think those cases are
>> should-not-happen cases (i.g., rare cases) whereas the likelihood of
>> an error during committing prepared transactions is not low (e.g., by
>> network connectivity problem). I think we need to assume that that is
>> not a rare case.
>> 
>> 
>> Hmm… Sorry, I don’t have any good ideas now.
>> 
>> If anything, I’m on second side which users accept the confusion though
>> let users know a error happens before local commit is done or not is 
>> necessary
>> because if the former case, users will execute the same query again.
> 
> Yeah, users will need to remember the XID of the last executed
> transaction and check if it has been committed by pg_xact_status().
> 
>> 
>> 
>> The second problem is whether we can cancel committing foreign
>> transactions by pg_cancel_backend() (or pressing Ctl-c). If the
>> backend process commits prepared foreign transactions, it's FDW
>> developers' responsibility to write code that is interruptible. I’m
>> not sure it’s feasible for drivers for other databases.
>> 
>> 
>> Sorry, my understanding is not clear.
>> 
>> After all prepares are done, the foreign transactions will be committed.
>> So, does this mean that FDW must leave the unresolved transaction to the 
>> transaction
>> resolver and show some messages like “Since the transaction is already 
>> committed,
>> the transaction will be resolved in background" ?
> 
> I think this would happen after the backend cancels COMMIT PREPARED.
> To be able to cancel an in-progress query the backend needs to accept
> the interruption and send the cancel request. postgres_fdw can do that
> since libpq supports sending a query and waiting for the result but
> I’m not sure about other drivers.

Thanks, I understood that handling this issue is not scope of the 2PC feature 
as Tsunakawa-san and you said, 

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION





Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-06 Thread ikeda...@oss.nttdata.com



> 2021/06/04 21:38、Masahiko Sawada のメール:
> 
> On Fri, Jun 4, 2021 at 5:16 PM Masahiko Sawada  wrote:
>> 
>> On Fri, Jun 4, 2021 at 3:58 PM ikeda...@oss.nttdata.com
>>  wrote:
>>> 
>>> 
>>> 
>>> 2021/06/04 12:28、Masahiko Sawada のメール:
>>> 
>>> 
>>> Thank you for pointing it out. This idea has been proposed several
>>> times and there were discussions. I'd like to summarize the proposed
>>> ideas and those pros and cons before replying to your other comments.
>>> 
>>> There are 3 ideas. After backend both prepares all foreign transaction
>>> and commit the local transaction,
>>> 
>>> 1. the backend continues attempting to commit all prepared foreign
>>> transactions until all of them are committed.
>>> 2. the backend attempts to commit all prepared foreign transactions
>>> once. If an error happens, leave them for the resolver.
>>> 3. the backend asks the resolver that launched per foreign server to
>>> commit the prepared foreign transactions (and backend waits or doesn't
>>> wait for the commit completion depending on the setting).
>>> 
>>> With ideas 1 and 2, since the backend itself commits all foreign
>>> transactions the resolver process cannot be a bottleneck, and probably
>>> the code can get more simple as backends don't need to communicate
>>> with resolver processes.
>>> 
>>> However, those have two problems we need to deal with:
>>> 
>>> 
>>> Thanks for sharing the summarize. I understood there are problems related to
>>> FDW implementation.
>>> 
>>> First, users could get an error if an error happens during the backend
>>> committing prepared foreign transaction but the local transaction is
>>> already committed and some foreign transactions could also be
>>> committed, confusing users. There were two opinions to this problem:
>>> FDW developers should be responsible for writing FDW code such that
>>> any error doesn't happen during committing foreign transactions, and
>>> users can accept that confusion since an error could happen after
>>> writing the commit WAL even today without this 2PC feature. For the
>>> former point, I'm not sure it's always doable since even palloc()
>>> could raise an error and it seems hard to require all FDW developers
>>> to understand all possible paths of raising an error. And for the
>>> latter point, that's true but I think those cases are
>>> should-not-happen cases (i.g., rare cases) whereas the likelihood of
>>> an error during committing prepared transactions is not low (e.g., by
>>> network connectivity problem). I think we need to assume that that is
>>> not a rare case.
>>> 
>>> 
>>> Hmm… Sorry, I don’t have any good ideas now.
>>> 
>>> If anything, I’m on second side which users accept the confusion though
>>> let users know a error happens before local commit is done or not is 
>>> necessary
>>> because if the former case, users will execute the same query again.
>> 
>> Yeah, users will need to remember the XID of the last executed
>> transaction and check if it has been committed by pg_xact_status().
> 
> As the second idea, can we send something like a hint along with the
> error (or send a new type of error) that indicates the error happened
> after the transaction commit so that the client can decide whether or
> not to ignore the error? That way, we can deal with the confusion led
> by an error raised after the local commit by the existing post-commit
> cleanup routines (and post-commit xact callbacks) as well as by FDW’s
> commit prepared routine.


I think your second idea is better because it’s easier for users to know what 
error happens and there is nothing users should do. Since the focus of "hint” 
is how to fix the problem, is it appropriate to use "context”?  

FWIF, I took a fast look to elog.c and I found there is “error_context_stack”. 
So, why don’t you add the context which shows like "the transaction fate is 
decided to COMMIT (or ROLLBACK). So, even if error happens, the transaction 
will be resolved in background” after the local commit?


Regards,

-- 
Masahiro Ikeda
NTT DATA CORPORATION