On Mon, 25 Jan 2021 at 09:36, Masahiko Sawada <sawada.m...@gmail.com> wrote: > On Fri, Jan 22, 2021 at 10:05 PM Masahiro Ikeda > <ikeda...@oss.nttdata.com> wrote: >> >> On 2021-01-22 14:50, Masahiko Sawada wrote: >> > On Fri, Dec 25, 2020 at 6:46 PM Masahiro Ikeda >> > <ikeda...@oss.nttdata.com> wrote: >> >> >> >> Hi, >> >> >> >> I rebased the patch to the master branch. >> > >> > Thank you for working on this. I've read the latest patch. Here are >> > comments: >> > >> > --- >> > + if (track_wal_io_timing) >> > + { >> > + INSTR_TIME_SET_CURRENT(duration); >> > + INSTR_TIME_SUBTRACT(duration, start); >> > + WalStats.m_wal_write_time += >> > INSTR_TIME_GET_MILLISEC(duration); >> > + } >> > >> > * I think it should add the time in micro sec. >> > After running pgbench with track_wal_io_timing = on for 30 sec, >> > pg_stat_wal showed the following on my environment: >> > >> > postgres(1:61569)=# select * from pg_stat_wal; >> > -[ RECORD 1 ]----+----------------------------- >> > wal_records | 285947 >> > wal_fpi | 53285 >> > wal_bytes | 442008213 >> > wal_buffers_full | 0 >> > wal_write | 25516 >> > wal_write_time | 0 >> > wal_sync | 25437 >> > wal_sync_time | 14490 >> > stats_reset | 2021-01-22 10:56:13.29464+09 >> > >> > Since writes can complete less than a millisecond, wal_write_time >> > didn't increase. I think sync_time could also have the same problem. >> >> Thanks for your comments. I didn't notice that. >> I changed the unit from milliseconds to microseconds. >> >> > --- >> > + /* >> > + * Measure i/o timing to fsync WAL data. >> > + * >> > + * The wal receiver skip to collect it to avoid performance >> > degradation of standy servers. >> > + * If sync_method doesn't have its fsync method, to skip too. >> > + */ >> > + if (!AmWalReceiverProcess() && track_wal_io_timing && >> > fsyncMethodCalled()) >> > + INSTR_TIME_SET_CURRENT(start); >> > >> > * Why does only the wal receiver skip it even if track_wal_io_timinig >> > is true? I think the performance degradation is also true for backend >> > processes. If there is another reason for that, I think it's better to >> > mention in both the doc and comment. >> > * How about checking track_wal_io_timing first? >> > * s/standy/standby/ >> >> I fixed it. >> As kuroda-san mentioned too, the skip is no need to be considered. > > I think you also removed the code to have the wal receiver report the > stats. So with the latest patch, the wal receiver tracks those > statistics but doesn't report. > > And maybe XLogWalRcvWrite() also needs to track I/O? > >> >> > --- >> > + /* increment the i/o timing and the number of times to fsync WAL >> > data */ >> > + if (fsyncMethodCalled()) >> > + { >> > + if (!AmWalReceiverProcess() && track_wal_io_timing) >> > + { >> > + INSTR_TIME_SET_CURRENT(duration); >> > + INSTR_TIME_SUBTRACT(duration, start); >> > + WalStats.m_wal_sync_time += >> > INSTR_TIME_GET_MILLISEC(duration); >> > + } >> > + >> > + WalStats.m_wal_sync++; >> > + } >> > >> > * I'd avoid always calling fsyncMethodCalled() in this path. How about >> > incrementing m_wal_sync after each sync operation? >> >> I think if syncing the disk does not occur, m_wal_sync should not be >> incremented. >> It depends enableFsync and sync_method. >> >> enableFsync is checked in each fsync method like >> pg_fsync_no_writethrough(), >> so if incrementing m_wal_sync after each sync operation, it should be >> implemented >> in each fsync method. It leads to many duplicated codes. > > Right. I missed that each fsync function checks enableFsync. > >> So, why don't you change the function to a flag whether to >> sync data to the disk will be occurred or not in issue_xlog_fsync()? > > Looks better. Since we don't necessarily need to increment m_wal_sync > after doing fsync we can write the code without an additional variable > as follows: > > if (enableFsync) > { > switch (sync_method) > { > case SYNC_METHOD_FSYNC: > #ifdef HAVE_FSYNC_WRITETHROUGH > case SYNC_METHOD_FSYNC_WRITETHROUGH: > #endif > #ifdef HAVE_FDATASYNC > case SYNC_METHOD_FDATASYNC: > #endif > WalStats.m_wal_sync++; > if (track_wal_io_timing) > INSTR_TIME_SET_CURRENT(start); > break; > default: > break; > } > } > > (do fsync and error handling here) > > /* increment the i/o timing and the number of times to fsync WAL data */ > if (track_wal_io_timing) > { > INSTR_TIME_SET_CURRENT(duration); > INSTR_TIME_SUBTRACT(duration, start); > WalStats.m_wal_sync_time = INSTR_TIME_GET_MICROSEC(duration); > } > > I think we can change the first switch-case to an if statement. >
+1. We can also narrow the scope of "duration" into "if (track_wal_io_timing)" branch. -- Regrads, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.