On Mon, Apr 19, 2021 at 6:22 PM Peter Smith <smithpb2...@gmail.com> wrote:
> > Here are a some review comments: > > ------ > > 1. The patch v3 applied OK but with whitespace warnings > > [postgres@CentOS7-x64 oss_postgres_2PC]$ git apply > > ../patches_misc/v3-0001-Skip-empty-transactions-for-logical-replication.patch > > ../patches_misc/v3-0001-Skip-empty-transactions-for-logical-replication.patch:98: > indent with spaces. > /* output BEGIN if we haven't yet, avoid for streaming and > non-transactional messages */ > > ../patches_misc/v3-0001-Skip-empty-transactions-for-logical-replication.patch:99: > indent with spaces. > if (!data->xact_wrote_changes && !in_streaming && transactional) > warning: 2 lines add whitespace errors. > > ------ > Fixed. > > 2. Please create a CF entry in [1] for this patch. > > ------ > > 3. Patch comment > > The comment describes the problem and then suddenly just says > "Postpone the BEGIN message until the first change." > > I suggest changing it to say more like... "(blank line) This patch > addresses the above problem by postponing the BEGIN message until the > first change." > > ------ > > Updated. > 4. pgoutput.h > > Maybe for consistency with the context member, the comment for the new > member should be to the right instead of above it? > > @@ -20,6 +20,9 @@ typedef struct PGOutputData > MemoryContext context; /* private memory context for transient > * allocations */ > > + /* flag indicating whether messages have previously been sent */ > + bool xact_wrote_changes; > + > > ------ > > 5. pgoutput.h > > + /* flag indicating whether messages have previously been sent */ > > "previously been sent" --> "already been sent" ?? > > ------ > > 6. pgoutput.h - misleading member name > > Actually, now that I have read all the rest of the code and how this > member is used I feel that this name is very misleading. e.g. For > "streaming" case then you still are writing changes but are not > setting this member at all - therefore it does not always mean what it > says. > > I feel a better name for this would be something like > "sent_begin_txn". Then if you have sent BEGIN it is true. If you > haven't sent BEGIN it is false. It eliminates all ambiguity naming it > this way instead. > > (This makes my feedback #5 redundant because the comment will be a bit > different if you do this). > > ------ > Fixed above comments. > > 7. pgoutput.c - function pgoutput_begin_txn > > @@ -345,6 +345,23 @@ pgoutput_startup(LogicalDecodingContext *ctx, > OutputPluginOptions *opt, > static void > pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn) > { > > I guess that you still needed to pass the txn because that is how the > API is documented, right? > > But I am wondering if you ought to flag it as unused so you wont get > some BF machine giving warnings about it. > > e.g. Syntax like this? > > pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN * txn) { > (void)txn; > ... > Updated. > ------ > > 8. pgoutput.c - function pgoutput_begin_txn > > @@ -345,6 +345,23 @@ pgoutput_startup(LogicalDecodingContext *ctx, > OutputPluginOptions *opt, > static void > pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn) > { > + PGOutputData *data = ctx->output_plugin_private; > + > + /* > + * Don't send BEGIN message here. Instead, postpone it until the first > + * change. In logical replication, a common scenario is to replicate a set > + * of tables (instead of all tables) and transactions whose changes were > on > + * table(s) that are not published will produce empty transactions. These > + * empty transactions will send BEGIN and COMMIT messages to subscribers, > + * using bandwidth on something with little/no use for logical > replication. > + */ > + data->xact_wrote_changes = false; > + elog(LOG,"Holding of begin"); > +} > > Why is this loglevel LOG? Looks like leftover debugging. > Removed. > > ------ > > 9. pgoutput.c - function pgoutput_commit_txn > > @@ -384,8 +401,14 @@ static void > pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, > XLogRecPtr commit_lsn) > { > + PGOutputData *data = ctx->output_plugin_private; > + > OutputPluginUpdateProgress(ctx); > > + /* skip COMMIT message if nothing was sent */ > + if (!data->xact_wrote_changes) > + return; > + > > In the case where you decided to do nothing does it make sense that > you still called the function OutputPluginUpdateProgress(ctx); ? > I thought perhaps that your new check should come first so this call > would never happen. > Even though the empty transaction is not sent, the LSN is tracked as decoded, hence the progress needs to be updated. > ------ > > 10. pgoutput.c - variable declarations without casts > > + PGOutputData *data = ctx->output_plugin_private; > > I noticed the new stack variable you declare have no casts. > > This differs from the existing code which always looks like: > PGOutputData *data = (PGOutputData *) ctx->output_plugin_private; > > There are a couple of examples of this so please search new code to > find them all. > > ----- > Fixed. > 11. pgoutput.c - function pgoutput_change > > @@ -551,6 +574,13 @@ pgoutput_change(LogicalDecodingContext *ctx, > ReorderBufferTXN *txn, > Assert(false); > } > > + /* output BEGIN if we haven't yet */ > + if (!data->xact_wrote_changes && !in_streaming) > + { > + pgoutput_begin(ctx, txn); > + data->xact_wrote_changes = true; > + } > > If the variable is renamed as previously suggested then the assignment > data->sent_BEGIN_txn = true; can be assigned in just 1 common place > INSIDE the pgoutput_begin function. > > ------ > Updated. > > 12. pgoutput.c - pgoutput_truncate function > > @@ -693,6 +723,13 @@ pgoutput_truncate(LogicalDecodingContext *ctx, > ReorderBufferTXN *txn, > > if (nrelids > 0) > { > + /* output BEGIN if we haven't yet */ > + if (!data->xact_wrote_changes && !in_streaming) > + { > + pgoutput_begin(ctx, txn); > + data->xact_wrote_changes = true; > + } > > (same comment as above) > > If the variable is renamed as previously suggested then the assignment > data->sent_BEGIN_txn = true; can be assigned in just 1 common place > INSIDE the pgoutput_begin function. > > 13. pgoutput.c - pgoutput_message > > @@ -725,6 +762,13 @@ pgoutput_message(LogicalDecodingContext *ctx, > ReorderBufferTXN *txn, > if (in_streaming) > xid = txn->xid; > > + /* output BEGIN if we haven't yet, avoid for streaming and > non-transactional messages */ > + if (!data->xact_wrote_changes && !in_streaming && transactional) > + { > + pgoutput_begin(ctx, txn); > + data->xact_wrote_changes = true; > + } > > (same comment as above) > > If the variable is renamed as previously suggested then the assignment > data->sent_BEGIN_txn = true; can be assigned in just 1 common place > INSIDE the pgoutput_begin function. > > ------ > Fixed. > > 14. Test Code. > > I noticed that there is no test code specifically for seeing if empty > transactions get sent or not. Is it possible to write such a test or > is this traffic improvement only observable using the debugger? > > The 020_messages.pl actually has a test case for tracking empty messages even though it is part of the messages test. regards, Ajin Cherian Fujitsu Australia
v4-0001-Skip-empty-transactions-for-logical-replication.patch
Description: Binary data