Hi,

> This is all true but note that in successful cases (where the table is
> published) all the work done by FilterByTable (accessing caches,
> transaction-related stuff) can add noticeable overhead as anyway we do
> that later in pgoutput_change(). I think I gave the same comment
> earlier as well but didn't see any satisfactory answer or performance
> data for successful cases to back this proposal.
I did some benchmarking yesterday at [1] and found it adds 20% CPU time.
That led me to a basic idea which I think is worth sharing.

The "transaction-related stuff" comes from the syscache/systable access
(everything except the HistoricSnapshot setup). The syscache is required
in the following situations:

1. relfilenode (from WAL) -> relid.
2. relid -> namespaceid (to check if the relid is a toast relation).
3. if it is a toast relation, get its original relid.
4. access the data from pg_publication_tables.
5. see if the relid is a partition; if yes, we may need its root
   relation.

Actually we already have a RelationSyncCache for #4, and it *only* needs
to access the syscache when replicate_valid is false. I think that case
should be rare, but the caller doesn't know that, so the caller must
prepare the transaction stuff in advance even though in most cases it is
not used. So I think we can get an optimization here, and the attached
patch does that:

    Author: yizhi.fzh <yizhi....@alibaba-inc.com>
    Date:   Wed Feb 21 18:40:03 2024 +0800

        Make get_rel_sync_entry less dependent on transaction state.

        get_rel_sync_entry needs a transaction only when a
        replicate_valid = false entry is found, which should be a rare
        case. However the caller can't know whether an entry is valid,
        so it has to prepare the transaction state before calling this
        function. Such preparation is expensive.

        This patch makes get_rel_sync_entry manage the transaction
        state itself, and only when necessary, so the callers don't
        need to prepare it blindly.

Then comes #1. Actually we have RelfilenumberMapHash as a cache; when
that cache is hit (suppose this is the usual case), no transaction stuff
is needed. I have two ideas then:

1. Optimize the cache-hit situation, like what we just did for
   get_rel_sync_entry, for all 5 kinds of data, and only pay the cost in
   the cache-miss case. For the data in #2, #3 and #5, all the keys are
   relid, so I think a single HTAB should be OK.

2. Add the content for #1, #2, #3 and #5 to the WAL when wal_level is
   set to logical.
In either case, the changes for get_rel_sync_entry should be needed.

> Note, users can configure to stream in-progress transactions, in which
> case they shouldn't see such a big problem.

People would see the changes spilled to disk, but the CPU cost for the
reorder buffer still has to be paid, I think.

[1] https://www.postgresql.org/message-id/87o7cadqj3.fsf%40163.com

-- 
Best Regards
Andy Fan
>From 90b8df330df049bd1d7e881dc6e9b108c17b0924 Mon Sep 17 00:00:00 2001
From: "yizhi.fzh" <yizhi....@alibaba-inc.com>
Date: Wed, 21 Feb 2024 18:40:03 +0800
Subject: [PATCH v1 1/1] Make get_rel_sync_entry less dependent on transaction
 state.

get_rel_sync_entry needs a transaction only when a replicate_valid =
false entry is found, which should be a rare case. However the caller
can't know whether an entry is valid, so it has to prepare the
transaction state before calling this function. Such preparation is
expensive.

This patch makes get_rel_sync_entry manage the transaction state
itself, and only when necessary, so the callers don't need to prepare
it blindly.
---
 src/backend/replication/pgoutput/pgoutput.c | 60 ++++++++++++++++++++-
 1 file changed, 59 insertions(+), 1 deletion(-)

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 998f92d671..25e55590a2 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -13,6 +13,7 @@
 #include "postgres.h"
 
 #include "access/tupconvert.h"
+#include "access/relation.h"
 #include "catalog/partition.h"
 #include "catalog/pg_publication.h"
 #include "catalog/pg_publication_rel.h"
@@ -214,6 +215,11 @@ static void init_rel_sync_cache(MemoryContext cachectx);
 static void cleanup_rel_sync_cache(TransactionId xid, bool is_commit);
 static RelationSyncEntry *get_rel_sync_entry(PGOutputData *data,
 											 Relation relation);
+static RelationSyncEntry *get_rel_sync_entry_by_relid(PGOutputData *data,
+													  Oid relid);
+static RelationSyncEntry *get_rel_sync_entry_internal(PGOutputData *data,
+													  Relation relation,
+													  Oid relid);
 static void rel_sync_cache_relation_cb(Datum arg, Oid relid);
 static void rel_sync_cache_publication_cb(Datum arg, int cacheid,
 										  uint32 hashvalue);
@@ -1962,11 +1968,29 @@ set_schema_sent_in_streamed_txn(RelationSyncEntry *entry, TransactionId xid)
  */
 static RelationSyncEntry *
 get_rel_sync_entry(PGOutputData *data, Relation relation)
+{
+	return get_rel_sync_entry_internal(data, relation, InvalidOid);
+}
+
+static RelationSyncEntry *
+__attribute__ ((unused))
+get_rel_sync_entry_by_relid(PGOutputData *data, Oid relid)
+{
+	return get_rel_sync_entry_internal(data, NULL, relid);
+}
+
+static RelationSyncEntry *
+get_rel_sync_entry_internal(PGOutputData *data, Relation relation, Oid oid)
 {
 	RelationSyncEntry *entry;
 	bool		found;
 	MemoryContext oldctx;
-	Oid			relid = RelationGetRelid(relation);
+	Oid			relid = OidIsValid(oid) ? oid : RelationGetRelid(relation);
+	bool		started_xact = false,
+				using_subtxn;
+
+	/* either oid or relation is provided. */
+	Assert(OidIsValid(oid) || RelationIsValid(relation));
+	Assert(!(OidIsValid(oid) && RelationIsValid(relation)));
 
 	Assert(RelationSyncCache != NULL);
 
@@ -1993,6 +2017,23 @@ get_rel_sync_entry(PGOutputData *data, Relation relation)
 		entry->attrmap = NULL;
 	}
 
+	if (!entry->replicate_valid && !IsTransactionOrTransactionBlock())
+	{
+		/*
+		 * Validating the entry needs to access the syscache, which must be
+		 * done in a transaction state; if that's not ready, start one.
+		 */
+		using_subtxn = IsTransactionOrTransactionBlock();
+
+		if (using_subtxn)
+			BeginInternalSubTransaction(__func__);
+		else
+			StartTransactionCommand();
+
+		started_xact = true;
+	}
+
 	/* Validate the entry */
 	if (!entry->replicate_valid)
 	{
@@ -2198,9 +2239,19 @@ get_rel_sync_entry(PGOutputData *data, Relation relation)
 		if (entry->pubactions.pubinsert || entry->pubactions.pubupdate ||
 			entry->pubactions.pubdelete)
 		{
+			bool		rel_opened = false;
+
+			if (!RelationIsValid(relation))
+			{
+				relation = relation_open(oid, AccessShareLock);
+				rel_opened = true;
+			}
+
 			/* Initialize the tuple slot and map */
 			init_tuple_slot(data, relation, entry);
 
+			if (rel_opened)
+				relation_close(relation, AccessShareLock);
+
 			/* Initialize the row filter */
 			pgoutput_row_filter_init(data, rel_publications, entry);
 
@@ -2215,6 +2266,13 @@ get_rel_sync_entry(PGOutputData *data, Relation relation)
 		entry->replicate_valid = true;
 	}
 
+	if (started_xact)
+	{
+		AbortCurrentTransaction();
+		if (using_subtxn)
+			RollbackAndReleaseCurrentSubTransaction();
+	}
+
 	return entry;
 }
 
-- 
2.34.1