Thanks for working on this! Like Lukas, I'm excited to see more visibility into important parts of the system like this.

On Mon, Oct 10, 2022 at 11:49 AM Melanie Plageman <melanieplage...@gmail.com> wrote:
>
> I've gone ahead and implemented option 1 (commented below).

No strong opinion on 1 versus 2, but I guess at least partly because I don't understand the implications (I do understand the difference, just not when it might be important in terms of stats). Can we think of a situation where combining stats about initial additions with pinned additions hides some behavior that might be good to understand and hard to pinpoint otherwise?

I took a look at the latest docs (as someone mostly familiar with internals at only a pretty high level, so probably somewhat close to the target audience) and have some feedback.

+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>backend_type</structfield> <type>text</type>
+ </para>
+ <para>
+ Type of backend (e.g. background worker, autovacuum worker).
+ </para></entry>
+ </row>

Not critical, but is there a list of backend types we could cross-reference elsewhere in the docs?

From the io_context column description:

+ The autovacuum daemon, explicit <command>VACUUM</command>, explicit
+ <command>ANALYZE</command>, many bulk reads, and many bulk writes use a
+ fixed amount of memory, acquiring the equivalent number of shared
+ buffers and reusing them circularly to avoid occupying an undue portion
+ of the main shared buffer pool.
+ </para></entry>

I don't understand how this is relevant to the io_context column. Could you expand on that, or am I just missing something obvious?

+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>extended</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Extends of relations done by this <varname>backend_type</varname> in
+ order to write data in this <varname>io_context</varname>.
+ </para></entry>
+ </row>

I understand what this is, but not why this is something I might want to know about.

And from your earlier e-mail:

On Thu, Oct 6, 2022 at 10:42 AM Melanie Plageman <melanieplage...@gmail.com> wrote:
>
> Because we want to add non-block-oriented IO in the future (like
> temporary file IO) to this view and want to use the same "read",
> "written", "extended" columns, I would prefer not to prefix the columns
> with "blks_". I have added a column "unit" which would contain the unit
> in which read, written, and extended are in. Unfortunately, fsyncs are
> not per block, so "unit" doesn't really work for this. I documented
> this.
>
> The most correct thing to do to accommodate block-oriented and
> non-block-oriented IO would be to specify all the values in bytes.
> However, I would like this view to be usable visually (as opposed to
> just in scripts and by tools). The only current value of unit is
> "block_size" which could potentially be combined with the value of the
> GUC to get bytes.
>
> I've hard-coded the string "block_size" into the view generation
> function pg_stat_get_io(), so, if this idea makes sense, perhaps I
> should do something better there.

That seems broadly reasonable, but pg_settings also has a 'unit' field, and in that view, unit is '8kB' on my system--i.e., it (presumably) reflects the block size. Is that something we should try to be consistent with (not sure if that's a good idea, but thought it was worth asking)?

> On Fri, Sep 30, 2022 at 7:18 PM Lukas Fittl <lu...@fittl.com> wrote:
> > - Overall it would be helpful if we had a dedicated documentation page on
> > I/O statistics that's linked from the pg_stat_io view description, and
> > explains how the I/O statistics tie into the various concepts of shared
> > buffers / buffer access strategies / etc (and what is not tracked today)
>
> I haven't done this yet. How specific were you thinking -- like
> interpretations of all the combinations and what to do with what you
> see? Like you should run pg_prewarm if you see X? Specific checkpointer
> or bgwriter GUCs to change? Or just links to other docs pages on
> recommended tunings?
>
> Were you imagining the other IO statistics views (like
> pg_statio_all_tables and pg_stat_database) also being included in this
> page? Like would it be a comprehensive guide to IO statistics and what
> their significance/purposes are?

I can't speak for Lukas here, but I encouraged him to suggest more thorough documentation in general, so I can speak to my concerns: in general, these stats should be usable for someone who does not know much about Postgres internals. It's pretty low-level information, sure, so I think you need some understanding of how the system broadly works to make sense of it. But ideally you should be able to find what you need to understand the concepts involved within the docs.

I think your updated docs are much clearer (with the caveats of my specific comments above). It would still probably be helpful to have a dedicated page on I/O stats (and yeah, something with a broad scope, along the lines of a comprehensive guide), but I think that can wait until a future patch.

Thanks,
Maciek