On Tue, Nov 2, 2021 at 12:18 AM osumi.takami...@fujitsu.com <osumi.takami...@fujitsu.com> wrote: > > On Thursday, October 28, 2021 11:19 PM I wrote: > > I've created a new patch that extends pg_stat_subscription_workers to > > include > > other transaction statistics. > > > > Note that this patch depends on v18 patch-set in [1]... > Rebased based on the v19 in [1]. > Also, fixed documentation a little and made tests tidy. > FYI, the newly included TAP test(027_worker_xact_stats.pl) is stable > because I checked that 100 times of its execution in a tight loop all passed. >
I have done some basic testing of the patch and have some initial review comments: (1) I think this patch needs to be in "git format-patch" format, with a proper commit message that describes the purpose of the patch and the functionality it adds, and any high-level design points (something like the overview given in the initial post, and accounting for the subsequent discussion points and updated functionality). (2) doc/src/sgml/monitoring.sgml There are some grammatical issues in the current description. I suggest changing it to something like: BEFORE: + <entry>At least one row per subscription, showing about transaction statistics and error summary that AFTER: + <entry>At least one row per subscription, showing transaction statistics and information about errors that (2) doc/src/sgml/monitoring.sgml The current description seems a little confusing. Per subscription, it shows the transaction statistics and any last error info from tablesync/apply workers? If this is the case, I'd suggest the following change: BEFORE: + one row per subscription for transaction statistics and summary of the last + error reported by workers applying logical replication changes and workers + handling the initial data copy of the subscribed tables. AFTER: + one row per subscription, showing corresponding transaction statistics and + information about the last error reported by workers applying logical replication + changes or by workers handling the initial data copy of the subscribed tables. (3) xact_commit I think that the "xact_commit" column should be named "xact_commit_count" or "xact_commits". Similarly, I think "xact_error" should be named "xact_error_count" or "xact_errors", and "xact_aborts" should be named "xact_abort_count" or "xact_aborts". (4) xact_commit_bytes + Amount of transactions data successfully applied in this subscription. + Consumed memory for xact_commit is displayed. I find this description a bit confusing. "Consumed memory for xact_commit" seems different to "transactions data". Could the description be something like: Amount of data (in bytes) successfully applied in this subscription, across "xact_commit_count" transactions. (5) I'd suggest some minor rewording for the following: BEFORE: + Number of transactions failed to be applied and caught by table + sync worker or main apply worker in this subscription. AFTER: + Number of transactions that failed to be applied by the table + sync worker or main apply worker in this subscription. (6) xact_error_bytes Again, it's a little confusing referring to "consumed memory" here. How about rewording this, something like: BEFORE: + Amount of transactions data unsuccessfully applied in this subscription. + Consumed memory that past failed transaction used is displayed. AFTER: + Amount of data (in bytes) unsuccessfully applied in this subscription by the last failed transaction. (7) The additional information provided for "xact_abort_bytes" needs some rewording, something like: BEFORE: + Increase <literal>logical_decoding_work_mem</literal> on the publisher + so that it exceeds the size of whole streamed transaction + to suppress unnecessary consumed network bandwidth in addition to change + in memory of the subscriber, if unexpected amount of streamed transactions + are aborted. AFTER: + In order to suppress unnecessary consumed network bandwidth, increase + <literal>logical_decoding_work_mem</literal> on the publisher so that it + exceeds the size of the whole streamed transaction, and additionally increase + the available subscriber memory, if an unexpected amount of streamed transactions + are aborted. (8) Suggested update: BEFORE: + * Tell the collector that worker transaction has finished without problem. AFTER: + * Tell the collector that the worker transaction has successfully completed. (9) src/backend/postmaster/pgstat.c I think that the GID copying is unnecessarily copying the whole GID buffer or using an additional strlen(). It should be changed to use strlcpy() to match other code: BEFORE: + /* get the gid for this two phase operation */ + if (command == LOGICAL_REP_MSG_PREPARE || + command == LOGICAL_REP_MSG_STREAM_PREPARE) + memcpy(msg.m_gid, prepare_data->gid, GIDSIZE); + else if (command == LOGICAL_REP_MSG_COMMIT_PREPARED) + memcpy(msg.m_gid, commit_data->gid, GIDSIZE); + else /* rollback prepared */ + memcpy(msg.m_gid, rollback_data->gid, GIDSIZE); AFTER: + /* get the gid for this two phase operation */ + if (command == LOGICAL_REP_MSG_PREPARE || + command == LOGICAL_REP_MSG_STREAM_PREPARE) + strlcpy(msg.m_gid, prepare_data->gid, sizeof(msg.m_gid)); + else if (command == LOGICAL_REP_MSG_COMMIT_PREPARED) + strlcpy(msg.m_gid, commit_data->gid, sizeof(msg.m_gid)); + else /* rollback prepared */ + strlcpy(msg.m_gid, rollback_data->gid, sizeof(msg.m_gid)); BEFORE: + strlcpy(prepared_txn->gid, msg->m_gid, strlen(msg->m_gid) + 1); AFTER: + strlcpy(prepared_txn->gid, msg->m_gid, sizeof(prepared_txn->gid)); BEFORE: + memcpy(key.gid, msg->m_gid, strlen(msg->m_gid)); AFTER: + strlcpy(key.gid, msg->m_gid, sizeof(key.gid)); BEFORE: + memcpy(key.gid, gid, strlen(gid)); AFTER: + strlcpy(key.gid, gid, sizeof(key.gid)); (10) src/backend/replication/logical/worker.c Some suggested rewording: BEFORE: + * size of streaming transaction resources because it have used the AFTER: + * size of streaming transaction resources because it has used the BEFORE: + * tradeoff should not be good. Also, add multiple values + * at once in order to reduce the number of this function call. AFTER: + * tradeoff would not be good. Also, add multiple values + * at once in order to reduce the number of calls to this function. (11) update_apply_change_size() Shouldn't this function be declared static? (12) stream_write_change() + streamed_entry->xact_size = streamed_entry->xact_size + total_len; /* update */ could be simply written as: + streamed_entry->xact_size += total_len; /* update */ Regards, Greg Nancarrow Fujitsu Australia