On Mon, Oct 10, 2022 at 7:43 PM Maciek Sakrejda <m.sakre...@gmail.com> wrote: > > Thanks for working on this! Like Lukas, I'm excited to see more > visibility into important parts of the system like this.
Thanks for taking another look! > > On Mon, Oct 10, 2022 at 11:49 AM Melanie Plageman > <melanieplage...@gmail.com> wrote: > > > > I've gone ahead and implemented option 1 (commented below). > > No strong opinion on 1 versus 2, but I guess at least partly because I > don't understand the implications (I do understand the difference, > just not when it might be important in terms of stats). Can we think > of a situation where combining stats about initial additions with > pinned additions hides some behavior that might be good to understand > and hard to pinpoint otherwise? I think that it makes sense to count both the initial buffers added to the ring and subsequent shared buffers added to the ring (either when the current strategy buffer is pinned or in use or when a bulkread rejects dirty strategy buffers in favor of new shared buffers) as strategy clocksweeps because of how the statistic would be used. Clocksweeps give you an idea of how much of your working set is cached (setting aside initially reading data into shared buffers when you are warming up the db). You may use clocksweeps to determine if you need to make shared buffers larger. Distinguishing strategy buffer clocksweeps from shared buffer clocksweeps allows us to avoid enlarging shared buffers if most of the clocksweeps are to bring in blocks for the strategy operation. However, I could see an argument that discounting strategy clocksweeps done because the current strategy buffer is pinned makes the number of shared buffer clocksweeps artificially low since those other queries using the buffer would have suffered a cache miss were it not for the strategy. And, in this case, you would take strategy clocksweeps together with shared clocksweeps to make your decision. And if we include buffers initially added to the strategy ring in the strategy clocksweep statistic, this number may be off because those blocks may not be needed in the main shared working set. But you won't know that until you try to reuse the buffer and it is pinned. So, I think we don't have a better option than counting initial buffers added to the ring as strategy clocksweeps (as opposed to as reuses). So, in answer to your question, no, I cannot think of a scenario like that. Sitting down and thinking about that for a long time did, however, help me realize that some of my code comments were misleading (and some incorrect). I will update these in the next version once we agree on updated docs. It also made me remember that I am incorrectly counting rejected buffers as reused. I'm not sure if it is a good idea to subtract from reuses when a buffer is rejected. Waiting until after it is rejected to count the reuse will take some other code changes. Perhaps we could also count rejections in the stats? > > I took a look at the latest docs (as someone mostly familiar with > internals at only a pretty high level, so probably somewhat close to > the target audience) and have some feedback. > > + <row> > + <entry role="catalog_table_entry"><para > role="column_definition"> > + <structfield>backend_type</structfield> <type>text</type> > + </para> > + <para> > + Type of backend (e.g. background worker, autovacuum worker). > + </para></entry> > + </row> > > Not critical, but is there a list of backend types we could > cross-reference elsewhere in the docs? The most I could find was this longer explanation (with exhaustive list of types) in pg_stat_activity docs [1]. I could duplicate what it says or I could link to the view and say "see pg_stat_activity" for a description of backend_type" or something like that (to keep them from getting out of sync as new backend_types are added. I suppose I could also add docs on backend_types, but I'm not sure where something like that would go. > > From the io_context column description: > > + The autovacuum daemon, explicit <command>VACUUM</command>, > explicit > + <command>ANALYZE</command>, many bulk reads, and many bulk > writes use a > + fixed amount of memory, acquiring the equivalent number of > shared > + buffers and reusing them circularly to avoid occupying an > undue portion > + of the main shared buffer pool. > + </para></entry> > > I don't understand how this is relevant to the io_context column. > Could you expand on that, or am I just missing something obvious? > I'm trying to explain why those other IO Contexts exist (bulkread, bulkwrite, vacuum) and why they are separate from shared buffers. Should I cut it altogether or preface it with something like: these are counted separate from shared buffers because...? > + <row> > + <entry role="catalog_table_entry"><para > role="column_definition"> > + <structfield>extended</structfield> <type>bigint</type> > + </para> > + <para> > + Extends of relations done by this > <varname>backend_type</varname> in > + order to write data in this <varname>io_context</varname>. > + </para></entry> > + </row> > > I understand what this is, but not why this is something I might want > to know about. Unlike writes, backends largely have to do their own extends, so separating this from writes lets us determine whether or not we need to change checkpointer/bgwriter to be more aggressive using the writes without the distraction of the extends. Should I mention this in the docs? The other stats views don't seems to editorialize at all, and I wasn't sure if this was an objective enough point to include in docs. > > And from your earlier e-mail: > > On Thu, Oct 6, 2022 at 10:42 AM Melanie Plageman > <melanieplage...@gmail.com> wrote: > > > > Because we want to add non-block-oriented IO in the future (like > > temporary file IO) to this view and want to use the same "read", > > "written", "extended" columns, I would prefer not to prefix the columns > > with "blks_". I have added a column "unit" which would contain the unit > > in which read, written, and extended are in. Unfortunately, fsyncs are > > not per block, so "unit" doesn't really work for this. I documented > > this. > > > > The most correct thing to do to accommodate block-oriented and > > non-block-oriented IO would be to specify all the values in bytes. > > However, I would like this view to be usable visually (as opposed to > > just in scripts and by tools). The only current value of unit is > > "block_size" which could potentially be combined with the value of the > > GUC to get bytes. > > > > I've hard-coded the string "block_size" into the view generation > > function pg_stat_get_io(), so, if this idea makes sense, perhaps I > > should do something better there. > > That seems broadly reasonable, but pg_settings also has a 'unit' > field, and in that view, unit is '8kB' on my system--i.e., it > (presumably) reflects the block size. Is that something we should try > to be consistent with (not sure if that's a good idea, but thought it > was worth asking)? > I think this idea is a good option. I am wondering if it would be clear when mixed with non-block-oriented IO. Block-oriented IO would say 8kB (or whatever the build-time value of a block was) and non-block-oriented IO would say B or kB. The math would work out. Looking at pg_settings now though, I am confused about how the units for wal_buffers is 8kB but then the value of wal_buffers when I show it in psql is "16MB"... Though the units for the pg_stat_io view for block-oriented IO would be the build-time values for block size, so it wouldn't line up exactly with pg_settings. However, I do like the idea of having a unit column that reflects the value and not the name of the GUC/setting which determined the unit. I can update this in the next version. - Melanie [1] https://www.postgresql.org/docs/15/monitoring-stats.html#MONITORING-PG-STAT-ACTIVITY-VIEW