Re: logical decoding and replication of sequences, take 2

Tomas Vondra Wed, 06 Dec 2023 06:18:46 -0800

On 12/6/23 09:56, Amit Kapila wrote:
> On Tue, Dec 5, 2023 at 10:23 PM Tomas Vondra
> <[email protected]> wrote:
>>
>> On 12/5/23 13:17, Amit Kapila wrote:
>>
>>> (b) for transactional
>>> cases, we see overhead due to traversing all the top-level txns and
>>> check the hash table for each one to find whether change is
>>> transactional.
>>>
>>
>> Not really, no. As I explained in my preceding e-mail, this check makes
>> almost no difference - I did expect it to matter, but it doesn't. And I
>> was a bit disappointed the global hash table didn't move the needle.
>>
>> Most of the time is spent in
>>
>>     78.81%     0.00%  postgres  postgres  [.] DecodeCommit (inlined)
>>       |
>>       ---DecodeCommit (inlined)
>>          |
>>          |--72.65%--SnapBuildCommitTxn
>>          |     |
>>          |      --72.61%--SnapBuildBuildSnapshot
>>          |            |
>>          |             --72.09%--pg_qsort
>>          |                    |
>>          |                    |--66.24%--pg_qsort
>>          |                    |          |
>>
>> And there's almost no difference between master and build with sequence
>> decoding - see the attached diff-alter-sequence.perf, comparing the two
>> branches (perf diff -c delta-abs).
>>
> 
> I think in this the commit time predominates which hides the overhead.
> We didn't investigate in detail if that can be improved but if we see
> a similar case of abort [1], it shows the overhead of
> ReorderBufferSequenceIsTransactional(). I understand that aborts won't
> be frequent and it is sort of unrealistic test but still helps to show
> that there is overhead in ReorderBufferSequenceIsTransactional(). Now,
> I am not sure if we can ignore that case because theoretically, the
> overhead can increase based on the number of top-level transactions.
> 
> [1]: 
> https://www.postgresql.org/message-id/TY3PR01MB9889D457278B254CA87D1325F581A%40TY3PR01MB9889.jpnprd01.prod.outlook.com
>


But those profiles were with the "old" patch, which used one hash table
per top-level transaction. I see nothing like that with the patch [1]
that replaces those with a single global hash table. With that patch,
ReorderBufferSequenceIsTransactional() took ~0.5% in all the tests I did.
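
To illustrate why the two designs differ in lookup cost, here is a toy
model in Python. It is only a sketch and not PostgreSQL code: the
function names, the dict/set layout and the probe counter are all
hypothetical, chosen to mirror "one hash table per top-level
transaction" versus "a single global hash table".

```python
# Hypothetical model, not PostgreSQL code. All names are illustrative.

def is_transactional_per_txn(toplevel_txns, relfilenode):
    """Old scheme: one table per top-level transaction, so every
    table may have to be probed before the answer is known."""
    probes = 0
    for txn_table in toplevel_txns.values():
        probes += 1
        if relfilenode in txn_table:
            return True, probes
    return False, probes


def is_transactional_global(global_table, relfilenode):
    """New scheme: a single table keyed by relfilenode, so one probe
    is enough regardless of how many transactions are in progress."""
    return relfilenode in global_table, 1


# 1000 in-progress top-level transactions, each owning one relfilenode.
txns = {xid: {xid + 100000} for xid in range(1000)}
global_table = {xid + 100000: xid for xid in range(1000)}

# Looking up a relfilenode that no transaction created is the worst
# case for the per-transaction scheme.
miss_old = is_transactional_per_txn(txns, -1)
miss_new = is_transactional_global(global_table, -1)
```

In this model the per-transaction variant needs one probe per top-level
transaction on a miss, while the global variant always needs one.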

What did have a bigger impact is this:

    46.12%   1.47%  postgres [.] pg_logical_slot_get_changes_guts
      |
      |--45.12%--pg_logical_slot_get_changes_guts
      |    |
      |    |--42.34%--LogicalDecodingProcessRecord
      |    |    |
      |    |    |--12.82%--xact_decode
      |    |    |    |
      |    |    |    |--9.46%--DecodeAbort (inlined)
      |    |    |    |   |
      |    |    |    |   |--8.44%--ReorderBufferCleanupTXN
      |    |    |    |   |   |
      |    |    |    |   |   |--3.25%--ReorderBufferSequenceCleanup (in)
      |    |    |    |   |   |   |
      |    |    |    |   |   |   |--1.59%--hash_seq_search
      |    |    |    |   |   |   |
      |    |    |    |   |   |   |--0.80%--hash_search_with_hash_value
      |    |    |    |   |   |   |
      |    |    |    |   |   |    --0.59%--hash_search
      |    |    |    |   |   |              hash_bytes

I guess that could be optimized, but it's also a direct consequence of
the huge number of aborts for transactions that create a relfilenode.
For any other workload this will be negligible.
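
A toy model of that cleanup cost, again in Python and again only a
sketch: it assumes the cleanup walks the whole global table on each
abort and drops the entries owned by the aborted transaction, which is
what the hash_seq_search frame in the profile suggests. The function
name and the relfilenode -> xid mapping are hypothetical.

```python
# Hypothetical model, not PostgreSQL code. All names are illustrative.

def cleanup_on_abort(global_table, aborted_xid):
    """Scan every entry of the global table and remove those created
    by the aborted transaction. Returns the number of entries visited,
    which is the cost paid for this one abort."""
    visited = 0
    for relfilenode in list(global_table):  # copy keys: we delete below
        visited += 1
        if global_table[relfilenode] == aborted_xid:
            del global_table[relfilenode]
    return visited


# relfilenode -> xid of the transaction that created it
table = {10: 1, 11: 2, 12: 1, 13: 3}
visited_first = cleanup_on_abort(table, 1)
```

Under that assumption the cost of each abort grows with the size of the
table, so it only becomes visible when many aborting transactions
create relfilenodes at the same time.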


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
