On Thu, Feb 22, 2024 at 6:59 PM Давыдов Виталий
<v.davy...@postgrespro.ru> wrote:
>
> I'd like to present and discuss a problem where 2PC transactions are
> applied quite slowly on a replica during logical replication. There is a
> master and a replica with logical replication established from the master to
> the replica with two_phase = true. At some load level on the master, the
> replica starts to lag behind the master, and the lag keeps increasing. We
> have to significantly decrease the load on the master to allow the replica to
> complete the catchup. Such a problem may cause significant difficulties in
> production. The problem appears at least on the REL_16_STABLE branch.
>
> To reproduce the problem:
>
> 1. Set up logical replication from master to replica with the subscription
>    parameter two_phase = true.
> 2. Create some moderate load on the master (use pgbench with a custom SQL
>    script doing prepare+commit).
> 3. Optionally switch off the replica for some time (keep the load on the
>    master).
> 4. Switch on the replica and wait until it catches up with the master.
>
> The replica never catches up with the master, even under a fairly low load
> on the master. If the load is removed, the replica does catch up, but it
> takes much longer than expected. I tried the same with regular transactions,
> but the problem doesn't appear even under a decent load.
>
> I think the main cause of the poor 2PC catchup performance is the lack of
> asynchronous commit support for 2PC. For regular transactions, asynchronous
> commit is used on the replica by default (subscription synchronous_commit =
> off). It allows the replication worker process on the replica to avoid fsync
> (XLogFlush) and to utilize 100% CPU (the background WAL writer or
> checkpointer will do the fsync). I agree that 2PC is mostly used in
> multimaster configurations with two or more nodes that operate
> synchronously, but when a node is in catchup (the node is not online in the
> multimaster cluster), asynchronous commit has to be used to speed up the
> catchup.
>

I don't see that we do anything specific for 2PC transactions to make
them behave differently than regular transactions with respect to the
synchronous_commit setting. What makes you think so? Can you pinpoint
the code you are referring to?

> There is another thing that contributes to the imbalance between master and
> replica performance. When the master executes requests from multiple
> clients, an fsync optimization takes place in XLogFlush. It reduces the
> number of fsyncs when several parallel backends write to the WAL
> simultaneously. The replica applies received transactions sequentially in a
> single worker, so this optimization does not apply.
>

Right, I think for this we need to implement parallel apply.
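
To make the effect concrete, here is a toy model in Python. It is only a
sketch and not PostgreSQL code: the function name count_fsyncs, the timings,
and the assumption that one fsync covers every commit record written before
it starts are all made up for illustration.

```python
def count_fsyncs(commit_times, fsync_duration):
    """Count fsyncs needed when one fsync covers all commit records
    already written at the moment the fsync starts (toy group commit)."""
    times = sorted(commit_times)
    fsyncs = 0
    i = 0
    now = 0.0  # time at which the previous fsync finished
    while i < len(times):
        # The next fsync starts once the previous one is done and at
        # least one unflushed commit record exists.
        start = max(now, times[i])
        # Every record written up to 'start' is flushed together.
        while i < len(times) and times[i] <= start:
            i += 1
        fsyncs += 1
        now = start + fsync_duration
    return fsyncs


# Hypothetical workload: 8 commits from parallel backends on the master,
# arriving close together while an fsync takes 1.0 time units.
concurrent = [0.0, 0.1, 0.2, 0.3, 1.0, 1.1, 1.2, 1.3]

# Hypothetical single apply worker: it writes the next commit record
# only after the previous fsync has finished (plus 0.1 of apply work).
sequential = [i * 1.1 for i in range(8)]

master_fsyncs = count_fsyncs(concurrent, 1.0)
replica_fsyncs = count_fsyncs(sequential, 1.0)
```

In this model the parallel backends share fsyncs, while the sequential
applier pays one fsync per commit, which is the gap parallel apply would
aim to close.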

> I see some possible solutions:
>
> 1. Implement asynchronous commit for 2PC transactions.
> 2. Do some hacking with enableFsync where possible.
>

Can you be a bit more specific about what exactly you have in mind to
achieve the above solutions?

-- 
With Regards,
Amit Kapila.

