I update the patch since there were my misunderstanding points. On 2021/03/26 16:20, Masahiro Ikeda wrote: > Thanks for many your suggestions! > I made the patch to handle the issues. > >> 1) What is the motivation to have both prevWalUsage and pgWalUsage, >> instead of just accumulating the stats since the last submission? >> There doesn't seem to be any comment explaining it? Computing >> differences to previous values, and copying to prev*, isn't free. I >> assume this is out of a fear that the stats could get reset before >> they're used for instrument.c purposes - but who knows? > > I removed the unnecessary code copying pgWalUsage and just reset the > pgWalUsage after reporting the stats in pgstat_report_wal().
I didn't change this. >> 2) Why is there both pgstat_send_wal() and pgstat_report_wal()? With the >> former being used by wal writer, the latter by most other processes? >> There again don't seem to be comments explaining this. > > I added the comments why two functions are separated. > (But is it better to merge them?) I updated the comments for following reasons. >> 3) Doing if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0) >> just to figure out if there's been any changes isn't all that >> cheap. This is regularly exercised in read-only workloads too. Seems >> adding a boolean WalStats.have_pending = true or such would be >> better. >> 4) For plain backends pgstat_report_wal() is called by >> pgstat_report_stat() - but it is not checked as part of the "Don't >> expend a clock check if nothing to do" check at the top. It's >> probably rare to have pending wal stats without also passing one of >> the other conditions, but ... > > I added the logic to check if the stats counters are updated or not in > pgstat_report_stat() using not only generated wal record but also write/sync > counters, and it can skip to call reporting function. I removed the checking code whether the wal stats counters were updated or not in pgstat_report_stat() since I couldn't understand the valid case yet. pgstat_report_stat() is called by backends when the transaction is finished. This means that the condition seems always pass. I thought to implement if the WAL stats counters were not updated, skip to send all statistics including the counters for databases and so on. But I think it's not good because it may take more time to be reflected the generated stats by read-only transaction. > Although I added the condition which the write/sync counters are updated or > not, I haven't understood the following case yet...Sorry. I checked related > code and tested to insert large object, but I couldn't. I'll investigate more > deeply, but if you already know the function which leads the following case, > please let me know. I understood the above case (Fujii-san, thanks for your advice in person). When to flush buffers, for example, to select buffer replacement victim, it requires a WAL flush if the buffer is dirty. So, to check the WAL stats counters are updated or not, I check the number of generated wal record and the counter of syncing in pgstat_report_wal(). The reason why not to check the counter of writing is that if to write is happened, to sync is happened too in the above case. I added the comments in the patch. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c index 237e13361b..095d8d0970 100644 --- a/src/backend/executor/instrument.c +++ b/src/backend/executor/instrument.c @@ -19,6 +19,17 @@ BufferUsage pgBufferUsage; static BufferUsage save_pgBufferUsage; + +/* + * generated WAL usage counters. + * + * Be careful that the counters are cleared after reporting them to + * the stats collector although you can use WalUsageAccumDiff() + * to computing differences to previous values. For backends, + * the counters may be reset after a transaction is finished and + * pgstat_report_wal() is invoked, so you can compute the difference + * in the same transaction only. + */ WalUsage pgWalUsage; static WalUsage save_pgWalUsage; diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index 60f45ccc4e..1ffedb2a24 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -144,14 +144,6 @@ char *pgstat_stat_tmpname = NULL; PgStat_MsgBgWriter BgWriterStats; PgStat_MsgWal WalStats; -/* - * WAL usage counters saved from pgWALUsage at the previous call to - * pgstat_report_wal(). This is used to calculate how much WAL usage - * happens between pgstat_report_wal() calls, by substracting - * the previous counters from the current ones. - */ -static WalUsage prevWalUsage; - /* * List of SLRU names that we keep stats for. There is no central registry of * SLRUs, so we use this fixed list instead. The "other" entry is used for @@ -3117,13 +3109,6 @@ pgstat_initialize(void) MyBEEntry = &BackendStatusArray[MaxBackends + MyAuxProcType]; } - /* - * Initialize prevWalUsage with pgWalUsage so that pgstat_report_wal() can - * calculate how much pgWalUsage counters are increased by substracting - * prevWalUsage from pgWalUsage. - */ - prevWalUsage = pgWalUsage; - /* Set up a process-exit hook to clean up */ on_shmem_exit(pgstat_beshutdown_hook, 0); } @@ -4674,8 +4659,7 @@ pgstat_send_bgwriter(void) /* ---------- * pgstat_report_wal() - * - * Calculate how much WAL usage counters are increased and send - * WAL statistics to the collector. + * Send WAL statistics to the collector and clear the counters. * * Must be called by processes that generate WAL. * ---------- @@ -4683,19 +4667,36 @@ pgstat_send_bgwriter(void) void pgstat_report_wal(void) { - WalUsage walusage; - /* - * Calculate how much WAL usage counters are increased by substracting the - * previous counters from the current ones. Fill the results in WAL stats - * message. + * Skip if the WAL statistics counters were not updated. + * + * We can know whether the counters were updated to check only two + * records. So, for performance, we don't allocate another memory spaces + * and check the all stats like pgstat_send_slru(). + * + * It's not enough to check only the generated wal records, because there + * is a case that a backend generates no wal records, but just writes and + * syncs the wal data. When to flush buffers, for example, to select + * buffer replacement victim, it requires a WAL flush if the buffer is + * dirty. This happens even if executing read-only transaction. + * + * In this case, if to write is happend, to sync is happend too. although + * there is a case to write is not happend but to sync is happened. For + * example, when to flush buffers just after another process only write + * and don't sync because the wal buffer is no space. + * + * So, it's ok to check the generated wal records and the counter of + * syncing only. */ - MemSet(&walusage, 0, sizeof(WalUsage)); - WalUsageAccumDiff(&walusage, &pgWalUsage, &prevWalUsage); + if (pgWalUsage.wal_records == 0 && WalStats.m_wal_sync == 0) + return; - WalStats.m_wal_records = walusage.wal_records; - WalStats.m_wal_fpi = walusage.wal_fpi; - WalStats.m_wal_bytes = walusage.wal_bytes; + /* + * set the counters related to generated WAL data. + */ + WalStats.m_wal_records = pgWalUsage.wal_records; + WalStats.m_wal_fpi = pgWalUsage.wal_fpi; + WalStats.m_wal_bytes = pgWalUsage.wal_bytes; /* * Send WAL stats message to the collector. @@ -4704,18 +4705,26 @@ pgstat_report_wal(void) return; /* - * Save the current counters for the subsequent calculation of WAL usage. + * Clear out the statistics buffer for generated WAL data, so it can be + * re-used. + * + * It's ok to clear out because no one takes difference crossing + * pgstat_report_wal() calls although these counters are used in another + * places, for example in pg_stat_statements.c. */ - prevWalUsage = pgWalUsage; + MemSet(&pgWalUsage, 0, sizeof(WalUsage)); } /* ---------- * pgstat_send_wal() - * - * Send WAL statistics to the collector. + * Send WAL statistics to the collector. * - * If 'force' is not set, WAL stats message is only sent if enough time has - * passed since last one was sent to reach PGSTAT_STAT_INTERVAL. + * If the processes that generate WAL data must call pgstat_report_wal() instead. + * + * If 'force' is not set, the WAL stats message is only sent if the statistics + * counters except for the counters related to generated WAL data were updated + * and enough time has passed since last one was sent to reach PGSTAT_STAT_INTERVAL. * * Return true if the message is sent, and false otherwise. * ---------- @@ -4723,26 +4732,32 @@ pgstat_report_wal(void) bool pgstat_send_wal(bool force) { - /* We assume this initializes to zeroes */ - static const PgStat_MsgWal all_zeroes; static TimestampTz sendTime = 0; - /* - * This function can be called even if nothing at all has happened. In - * this case, avoid sending a completely empty message to the stats - * collector. - */ - if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0) - return false; - if (!force) { - TimestampTz now = GetCurrentTimestamp(); + TimestampTz now; + + /* + * Don't expend a clock check if the WAL statistics counters were not + * updated. + * + * We don't consider the WAL stats related to the generated WAL data + * because pgstat_report_wal() is called instead in the case. + * + * We can know whether the counters were updated to check only two + * records. So, for performance, we don't allocate another memory + * spaces and check the all stats like pgstat_send_slru(). + */ + if (walStats.wal_write == 0 && walStats.wal_sync == 0) + return false; /* * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL - * msec since we last sent one. + * msec since we last sent one to avoid overloading the stats + * collector. */ + now = GetCurrentTimestamp(); if (!TimestampDifferenceExceeds(sendTime, now, PGSTAT_STAT_INTERVAL)) return false; sendTime = now;