Hi, Thank you for looking into this! And, sorry for the late answer.
On Mon, 13 May 2024 at 17:12, Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote: > > On Fri, Apr 19, 2024 at 1:32 PM Nazir Bilal Yavuz <byavu...@gmail.com> wrote: > > > > > I wanted to inform you that the 73f0a13266 commit changed all WALRead > > > calls to read variable bytes, only the WAL receiver was reading > > > variable bytes before. > > > > I want to start working on this again if possible. I will try to > > summarize the current status: > > Thanks for working on this. > > > * With the 73f0a13266 commit, the WALRead() function started to read > > variable bytes in every case. Before, only the WAL receiver was > > reading variable bytes. > > > > * With the 91f2cae7a4 commit, WALReadFromBuffers() is merged. We were > > discussing what we have to do when this is merged. It is decided that > > WALReadFromBuffers() does not call pgstat_report_wait_start() because > > this function does not perform any IO [1]. We may follow the same > > logic by not including these to pg_stat_io? > > Right. WALReadFromBuffers doesn't do any I/O. > > Whoever reads WAL from disk (backends, walsenders, recovery process) > using pg_pread (XLogPageRead, WALRead) needs to be tracked in > pg_stat_io or some other view. If it were to be in pg_stat_io, > although we may not be able to distinguish WAL read stats at a backend > level (like how many times/bytes a walsender or recovery process or a > backend read WAL from disk), but it can help understand overall impact > of WAL read I/O at a cluster level. With this approach, the WAL I/O > stats are divided up - WAL read I/O and write I/O stats are in > pg_stat_io and pg_stat_wal respectively. > > This makes me think if we need to add WAL read I/O stats also to > pg_stat_wal. Then, we can also add WALReadFromBuffers stats > hits/misses there. With this approach, pg_stat_wal can be a one-stop > view for all the WAL related stats. If needed, we can join info from > pg_stat_wal to pg_stat_io in system_views.sql so that the I/O stats > are emitted to the end-user via pg_stat_io. I agree that the ultimate goal is seeing WAL I/O stats from one place. There is a reply to this from Amit: On Tue, 28 May 2024 at 03:48, Amit Kapila <amit.kapil...@gmail.com> wrote: > > If possible, let's have all the I/O stats (even for WAL) in > pg_stat_io. Can't we show the WAL data we get from buffers in the hits > column and then have read_bytes or something like that to know the > amount of data read? I think it is better to have all the I/O stats in pg_stat_io like Amit said. And, it makes sense to me to show 'WAL data we get from buffers' in the hits column. Since, basically instead of doing I/O from disk; we get data directly from WAL buffers. I think that fits the explanation of the hits column in pg_stat_io, which is 'The number of times a desired block was found in a shared buffer.' [1]. > > * With the b5a9b18cd0 commit, streaming I/O is merged but AFAIK this > > does not block anything related to putting WAL stats in pg_stat_io. > > > > If I am not missing any new changes, the only problem is reading > > variable bytes now. We have discussed a couple of solutions: > > > > 1- Change op_bytes to something like -1, 0, 1, NULL etc. and document > > that this means some variable byte I/O is happening. > > > > I kind of dislike this solution because if the *only* read I/O is > > happening in variable bytes, it will look like write and extend I/Os > > are happening in variable bytes as well. As a solution, it could be > > documented that only read I/Os could happen in variable bytes for now. > > Yes, read I/O for relation and WAL can happen in variable bytes. I > think this idea seems reasonable and simple yet useful to know the > cluster-wide read I/O. I agree. > However, another point here is how the total number of bytes read is > represented with existing pg_stat_io columns 'reads' and 'op_bytes'. > It is known now with 'reads' * 'op_bytes', but with variable bytes, > how is read bytes calculated? Maybe add new columns > read_bytes/write_bytes? > > > 2- Use op_bytes_[read | write | extend] columns instead of one > > op_bytes column, also use the first solution. > > > > This can solve the first solution's weakness but it introduces two > > more columns. This is more future proof compared to the first solution > > if there is a chance that some variable I/O could happen in write and > > extend calls as well in the future. > > -1 as more columns impact the readability and usability. I did not understand the overall difference between what you suggested (adding read_bytes/write_bytes columns) and my suggestion (adding op_bytes_[read | write | extend] columns). They both introduce new columns. Could you please explain what you suggested in more detail? > > 3- Create a new pg_stat_io_wal view to put WAL I/Os here instead of > > pg_stat_io. > > > > pg_stat_io could remain untouchable and we will have flexibility to > > edit this new view as much as we want. But the original aim of the > > pg_stat_io is evaluating all I/O from a single view and adding a new > > view breaks this aim. > > -1 as it defeats the very purpose of one-stop view pg_stat_io for all > kinds of I/O. PS: see my response above about adding both WAL write > I/O and read I/O stats to pg_stat_wal. I agree. [1] https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-IO-VIEW -- Regards, Nazir Bilal Yavuz Microsoft