Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021-03-19 16:30, Fujii Masao wrote: > On 2021/03/15 10:39, Masahiro Ikeda wrote: >> Thanks, I understood get_sync_bit() checks the sync flags and >> the write unit of generated wal data and replicated wal data is >> different. >> (It's interesting optimization whether to use kernel cache or not.) >> >> OK. Although I agree to separate the stats for the walrecever, >> I want to hear opinions from other people too. I didn't change the >> patch. >> >> Please feel to your comments. > > What about applying the patch for common WAL write function like > XLogWriteFile(), separately from the patch for walreceiver's stats? > Seems the former reaches the consensus, so we can commit it firstly. > Also even only the former change is useful because which allows > walreceiver to report WALWrite wait event. Agreed. I separated the patches. If only the former is committed, my trivial concern is that there may be a disadvantage, but no advantage for the standby server. It may lead to performance degradation to the wal receiver by calling INSTR_TIME_SET_CURRENT(), but the stats can't visible for users until the latter patch is committed. I think it's ok because this not happening in the case to disable the "track_wal_io_timing" in the standby server. Although some users may start the standby server using the backup which "track_wal_io_timing" is enabled in the primary server, they will say it's ok since the users already accept the performance degradation in the primary server. >> OK. I agree. >> >> I wonder to change the error check ways depending on who calls this >> function? >> Now, only the walreceiver checks (1)errno==0 and doesn't check >> (2)errno==ENITR. >> Other processes are the opposite. >> >> IIUC, it's appropriate that every process checks (1)(2). >> Please let me know my understanding is wrong. > > I'm thinking the same. Regarding (2), commit 79ce29c734 introduced > that code. According to the following commit log, it seems harmless > to retry on EINTR even walreceiver. > > Also retry on EINTR. All signals used in the backend are flagged > SA_RESTART > nowadays, so it shouldn't happen, but better to be defensive. Thanks, I understood. >>> BTW, currently XLogWrite() increments IO timing even when pg_pwrite() >>> reports an error. But this is useless. Probably IO timing should be >>> incremented after the return code of pg_pwrite() is checked, instead? >> >> Yes, I agree. I fixed it. >> (v18-0003-Makes-the-wal-receiver-report-WAL-statistics.patch) > > Thanks for the patch! > > nleft = nbytes; > do > { > - errno = 0; > + written = XLogWriteFile(openLogFile, from, > nleft, (off_t) > startoffset, > + > ThisTimeLineID, openLogSegNo, wal_segment_size); > > Can we merge this do-while loop in XLogWrite() into the loop > in XLogWriteFile()? > If we do that, ISTM that the following codes are not necessary in > XLogWrite(). > > nleft -= written; > from += written; OK, I fixed it. > + * 'segsize' is a segment size of WAL segment file. > > Since segsize is always wal_segment_size, segsize argument seems > not necessary in XLogWriteFile(). Right. I fixed it. > +XLogWriteFile(int fd, const void *buf, size_t nbyte, off_t offset, > + TimeLineID timelineid, XLogSegNo segno, int segsize) > > Why did you use "const void *" instead of "char *" for *buf? I followed the argument of pg_pwrite(). But, I think "char *" is better, so fixed it. > Regarding 0005 patch, I will review it later. Thanks. Regards, -- Masahiro Ikeda NTT DATA CORPORATION diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c index a7a94d2a83..df028c5039 100644 --- a/src/backend/replication/walreceiver.c +++ b/src/backend/replication/walreceiver.c @@ -771,6 +771,9 @@ WalRcvDie(int code, Datum arg) /* Ensure that all WAL records received are flushed to disk */ XLogWalRcvFlush(true); + /* Send WAL statistics to the stats collector before terminating */ + pgstat_send_wal(true); + /* Mark ourselves inactive in shared memory */ SpinLockAcquire(&walrcv->mutex); Assert(walrcv->walRcvState == WALRCV_STREAMING || @@ -910,6 +913,12 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr) XLogArchiveForceDone(xlogfname); else XLogArchiveNotify(xlogfname); + +/* + * Send WAL statistics to the stats collector when finishing + * the current WAL segment file to avoid overloading it. + */ +pgstat_send_wal(false); } recvFile = -1; diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index bd5e787e55..4c7d90f1b9 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -2536,61 +2536,14
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On 2021/03/22 16:50, Fujii Masao wrote: > > > On 2021/03/22 9:50, ikedamsh wrote: >> Agreed. I separated the patches. >> >> If only the former is committed, my trivial concern is that there may be >> a disadvantage, but no advantage for the standby server. It may lead to >> performance degradation to the wal receiver by calling >> INSTR_TIME_SET_CURRENT(), but the stats can't visible for users until the >> latter patch is committed. > > Your concern is valid, so let's polish and commit also the 0003 patch to v14. > I'm still thinking that it's better to separate wal_xxx columns into > walreceiver's and the others. But if we count even walreceiver activity on > the existing columns, regarding 0003 patch, we need to update the document? > For example, "Number of times WAL buffers were written out to disk via > XLogWrite request." should be "Number of times WAL buffers were written > out to disk via XLogWrite request and by WAL receiver process."? Maybe > we need to append some descriptions about this into "WAL configuration" > section? Agreed. Users can know whether the stats is for walreceiver or not. The pg_stat_wal view in standby server shows for the walreceiver, and in primary server it shows for the others. So, I updated the document. (v20-0003-Makes-the-wal-receiver-report-WAL-statistics.patch) >> I followed the argument of pg_pwrite(). >> But, I think "char *" is better, so fixed it. > > Thanks for updating the patch! > > +extern int XLogWriteFile(int fd, char *buf, > + size_t nbyte, off_t offset, > + TimeLineID timelineid, XLogSegNo segno, > + bool write_all); > > write_all seems not to be necessary. You added this flag for walreceiver, > I guess. But even without the argument, walreceiver seems to work expectedly. > So, what about the attached patch? I applied some cosmetic changes to the > patch. Thanks a lot. Yes, "write_all" is unnecessary. Your patch is looks good to me. Regards, -- Masahiro Ikeda NTT DATA CORPORATION diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index db4b4e460c..281b13b9fa 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -3493,7 +3493,8 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i Number of times WAL buffers were written out to disk via - XLogWrite request. + XLogWrite request and WAL data were written + out to disk by the WAL receiver process. See for more information about the internal WAL function XLogWrite. @@ -3521,7 +3522,8 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i Total amount of time spent writing WAL buffers to disk via - XLogWrite request, in milliseconds + XLogWrite request and WAL data to disk + by the WAL receiver process, in milliseconds (if is enabled, otherwise zero). This includes the sync time when wal_sync_method is either diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml index ae4a3c1293..39e7028c96 100644 --- a/doc/src/sgml/wal.sgml +++ b/doc/src/sgml/wal.sgml @@ -769,7 +769,7 @@ - There are two internal functions to write WAL data to disk: + There are two internal functions to write generated WAL data to disk: XLogWrite and issue_xlog_fsync. When is enabled, the total amounts of time XLogWrite writes and @@ -795,7 +795,19 @@ issue_xlog_fsync syncs WAL data to disk are also counted as wal_write and wal_sync in pg_stat_wal, respectively. - + To write replicated WAL data to disk by the WAL receiver is almost the same + as above except for some points. First, there is a dedicated code path for the + WAL receiver to write data although issue_xlog_fsync is + the same for syncing data. + Second, the WAL receiver writes replicated WAL data per bytes from the local + memory although the generated WAL data is written per WAL buffer pages. + The counters of wal_write, wal_sync, + wal_write_time, and wal_sync_time are + common statistics for writing/syncing both generated and replicated WAL data. + But, you can distinguish them because the generated WAL data is written/synced + in the primary server and the replicated WAL data is written/synced in + the standby server. + diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c index a7a94d2a83..df028c5039 100644 --- a/src/backend/replication/walreceiver.c +++ b/src/backend/replication/walreceiver.c @@ -771,6 +771,9 @@ WalRcvDie(int code, Datum arg) /* Ensure that all WAL records received are flushed to disk */ XLogWalRcvFlush(t
Fix comments of heap_prune_chain()
Hi,While I’m reading source codes related to vacuum, I found comments whichdon’t seem to fit the reality. I think the commit[1] just forgot to fix them.What do you think?[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=dc7420c2c9274a283779ec19718d2d16323640c0 Regards,-- Masahiro IkedaNTT DATA CORPORATION v1-0001-fix_heap_prune_chain.patch Description: Binary data
It is not documented that pg_promote can exit standby mode
Hi, The document(high-availability.sgml) says that there are only two ways to exit standby mode. 26.2.2. Standby Server Operation Standby mode is exited and the server switches to normal operation when pg_ctl promote is run or a trigger file is found (promote_trigger_file). But there is another way, by calling pg_promote function. I think we need to document it, doesn't it? I attached a patch. Please review and let me know your thoughts. Regards, Masahiro IkedaFrom 719b754c36f4537aaab7c6de1951d7b7ec4759f6 Mon Sep 17 00:00:00 2001 From: Masahiro Ikeda Date: Tue, 7 Apr 2020 08:34:55 +0900 Subject: [PATCH] fix doc about the way to exit standby mode --- doc/src/sgml/high-availability.sgml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml index 4659b9e..88bf960 100644 --- a/doc/src/sgml/high-availability.sgml +++ b/doc/src/sgml/high-availability.sgml @@ -644,8 +644,8 @@ protocol to make nodes agree on a serializable transactional order. Standby mode is exited and the server switches to normal operation -when pg_ctl promote is run or a trigger file is found -(promote_trigger_file). Before failover, +when pg_ctl promote is run, pg_promote is called, +or a trigger file is found(promote_trigger_file). Before failover, any WAL immediately available in the archive or in pg_wal will be restored, but no attempt is made to connect to the master. -- 1.8.3.1