On Mon, Jan 25, 2021 at 1:28 PM Masahiro Ikeda <ikeda...@oss.nttdata.com> wrote: > > On 2021-01-25 13:15, Masahiro Ikeda wrote: > > On 2021-01-25 10:36, Masahiko Sawada wrote: > >> On Fri, Jan 22, 2021 at 10:05 PM Masahiro Ikeda > >> <ikeda...@oss.nttdata.com> wrote: > >>> > >>> On 2021-01-22 14:50, Masahiko Sawada wrote: > >>> > On Fri, Dec 25, 2020 at 6:46 PM Masahiro Ikeda > >>> > <ikeda...@oss.nttdata.com> wrote: > >>> >> > >>> >> Hi, > >>> >> > >>> >> I rebased the patch to the master branch. > >>> > > >>> > Thank you for working on this. I've read the latest patch. Here are > >>> > comments: > >>> > > >>> > --- > >>> > + if (track_wal_io_timing) > >>> > + { > >>> > + INSTR_TIME_SET_CURRENT(duration); > >>> > + INSTR_TIME_SUBTRACT(duration, start); > >>> > + WalStats.m_wal_write_time += > >>> > INSTR_TIME_GET_MILLISEC(duration); > >>> > + } > >>> > > >>> > * I think it should add the time in micro sec. > >>> > After running pgbench with track_wal_io_timing = on for 30 sec, > >>> > pg_stat_wal showed the following on my environment: > >>> > > >>> > postgres(1:61569)=# select * from pg_stat_wal; > >>> > -[ RECORD 1 ]----+----------------------------- > >>> > wal_records | 285947 > >>> > wal_fpi | 53285 > >>> > wal_bytes | 442008213 > >>> > wal_buffers_full | 0 > >>> > wal_write | 25516 > >>> > wal_write_time | 0 > >>> > wal_sync | 25437 > >>> > wal_sync_time | 14490 > >>> > stats_reset | 2021-01-22 10:56:13.29464+09 > >>> > > >>> > Since writes can complete less than a millisecond, wal_write_time > >>> > didn't increase. I think sync_time could also have the same problem. > >>> > >>> Thanks for your comments. I didn't notice that. > >>> I changed the unit from milliseconds to microseconds. > >>> > >>> > --- > >>> > + /* > >>> > + * Measure i/o timing to fsync WAL data. > >>> > + * > >>> > + * The wal receiver skip to collect it to avoid performance > >>> > degradation of standy servers. > >>> > + * If sync_method doesn't have its fsync method, to skip too. > >>> > + */ > >>> > + if (!AmWalReceiverProcess() && track_wal_io_timing && > >>> > fsyncMethodCalled()) > >>> > + INSTR_TIME_SET_CURRENT(start); > >>> > > >>> > * Why does only the wal receiver skip it even if track_wal_io_timinig > >>> > is true? I think the performance degradation is also true for backend > >>> > processes. If there is another reason for that, I think it's better to > >>> > mention in both the doc and comment. > >>> > * How about checking track_wal_io_timing first? > >>> > * s/standy/standby/ > >>> > >>> I fixed it. > >>> As kuroda-san mentioned too, the skip is no need to be considered. > >> > >> I think you also removed the code to have the wal receiver report the > >> stats. So with the latest patch, the wal receiver tracks those > >> statistics but doesn't report. > >> And maybe XLogWalRcvWrite() also needs to track I/O? > > > > Thanks, I forgot to add them. > > I'll fix it. > > > > > >>> > >>> > --- > >>> > + /* increment the i/o timing and the number of times to fsync WAL > >>> > data */ > >>> > + if (fsyncMethodCalled()) > >>> > + { > >>> > + if (!AmWalReceiverProcess() && track_wal_io_timing) > >>> > + { > >>> > + INSTR_TIME_SET_CURRENT(duration); > >>> > + INSTR_TIME_SUBTRACT(duration, start); > >>> > + WalStats.m_wal_sync_time += > >>> > INSTR_TIME_GET_MILLISEC(duration); > >>> > + } > >>> > + > >>> > + WalStats.m_wal_sync++; > >>> > + } > >>> > > >>> > * I'd avoid always calling fsyncMethodCalled() in this path. How about > >>> > incrementing m_wal_sync after each sync operation? > >>> > >>> I think if syncing the disk does not occur, m_wal_sync should not be > >>> incremented. > >>> It depends enableFsync and sync_method. > >>> > >>> enableFsync is checked in each fsync method like > >>> pg_fsync_no_writethrough(), > >>> so if incrementing m_wal_sync after each sync operation, it should be > >>> implemented > >>> in each fsync method. It leads to many duplicated codes. > >> > >> Right. I missed that each fsync function checks enableFsync. > >> > >>> So, why don't you change the function to a flag whether to > >>> sync data to the disk will be occurred or not in issue_xlog_fsync()? > >> > >> Looks better. Since we don't necessarily need to increment m_wal_sync > >> after doing fsync we can write the code without an additional variable > >> as follows: > >> > >> if (enableFsync) > >> { > >> switch (sync_method) > >> { > >> case SYNC_METHOD_FSYNC: > >> #ifdef HAVE_FSYNC_WRITETHROUGH > >> case SYNC_METHOD_FSYNC_WRITETHROUGH: > >> #endif > >> #ifdef HAVE_FDATASYNC > >> case SYNC_METHOD_FDATASYNC: > >> #endif > >> WalStats.m_wal_sync++; > >> if (track_wal_io_timing) > >> INSTR_TIME_SET_CURRENT(start); > >> break; > >> default: > >> break; > >> } > >> } > >> > >> (do fsync and error handling here) > >> > >> /* increment the i/o timing and the number of times to fsync WAL > >> data */ > >> if (track_wal_io_timing) > >> { > >> INSTR_TIME_SET_CURRENT(duration); > >> INSTR_TIME_SUBTRACT(duration, start); > >> WalStats.m_wal_sync_time = INSTR_TIME_GET_MICROSEC(duration); > >> } > > > > IIUC, I think we can't handle the following case. > > > > When "sync_method" is SYNC_METHOD_OPEN or SYNC_METHOD_OPEN_DSYNC and > > "track_wal_io_timing" is enabled, "start" doesn't be initialized. > > > > My understanding is something wrong, isn't it?
You're right. We might want to initialize 'start' with 0 in those two cases and check if INSTR_TIME_IS_ZERO() later when accumulating the I/O time. > > I thought the following is better. > > > ``` > /* Measure i/o timing to sync WAL data.*/ > if (track_wal_io_timing) > INSTR_TIME_SET_CURRENT(start); > > (do fsync and error handling here) > > /* check whether to sync WAL data to the disk right now. */ > if (enableFsync) > { > if ((sync_method == SYNC_METHOD_FSYNC) || > (sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH) || > (sync_method == SYNC_METHOD_FDATASYNC)) > { > /* increment the i/o timing and the number of times > to fsync WAL data > */ > if (track_wal_io_timing) > { > instr_time duration; > > INSTR_TIME_SET_CURRENT(duration); > INSTR_TIME_SUBTRACT(duration, start); > WalStats.m_wal_sync_time = > INSTR_TIME_GET_MICROSEC(duration); > } > WalStats.m_wal_sync++; > } > } > ``` > > Although INSTR_TIME_SET_CURRENT(start) is called everytime regardless > of the "sync_method" and "enableFsync", we don't make additional > variables. > But it's ok because "track_wal_io_timing" leads already performance > degradation. > > What do you think? That also fine with me. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/