On Fri, Jan 24, 2020 at 8:52 PM Andres Freund <and...@anarazel.de> wrote: > > Hi, > > Currently pg_stat_bgwriter.buffers_backend is pretty useless to gauge > whether backends are doing writes they shouldn't do. That's because it > counts things that are either unavoidably or unlikely doable by other > parts of the system (checkpointer, bgwriter). > In particular extending the file can not currently be done by any > another type of process, yet is counted. When using a buffer access > strategy it is also very likely that writes have to be done by the > 'dirtying' backend itself, as the buffer will be reused soon after (when > not previously in s_b that is).
Yeah. That's quite annoying. > Additionally pg_stat_bgwriter.buffers_backend also counts writes done by > autovacuum et al. > > > I think it'd make sense to at least split buffers_backend into > buffers_backend_extend, > buffers_backend_write, > buffers_backend_write_strat > > but it could also be worthwhile to expand it into > buffers_backend_extend, > buffers_{backend,checkpoint,bgwriter,autovacuum}_write > buffers_{backend,autovacuum}_write_stat Given that these are individual global counters, I don't really see any reason not to expand it to the bigger set of counters. It's easy enough to add them up together later if needed. > Possibly by internally, in contrast to SQL level, having just counter > arrays indexed by backend types. > > > It's also noteworthy that buffers_backend is accounted in an absurd > manner. One might think that writes are accounted from backend -> shared > memory or such. But instead it works like this: > > 1) backend flushes buffer in bufmgr.c, accounts for backend *write time* > 2) mdwrite writes and registers a sync request, which forwards the sync > request to checkpointer > 3) ForwardSyncRequest(), when not called by bgwriter, increments > CheckpointerShmem->num_backend_writes > 4) checkpointer, whenever doing AbsorbSyncRequests(), moves > CheckpointerShmem->num_backend_writes to > BgWriterStats.m_buf_written_backend (local memory!) > 5) Occasionally it calls pgstat_send_bgwriter(), which sends the data to > pgstat (which bgwriter also does) > 6) Which then updates the shared memory used by the display functions > > Worthwhile to note that backend buffer read/write *time* is accounted > differently. That's done via pgstat_send_tabstat(). > > > I think there's very little excuse for the indirection via checkpointer, > besides architectually being weird, it actually requires that we > continue to wake up checkpointer over and over instead of optimizing how > and when we submit fsync requests. > > As far as I can tell we're also simply not accounting at all for writes > done outside of shared buffers. All writes done directly through > smgrwrite()/extend() aren't accounted anywhere as far as I can tell. > > > I think we also count things as writes that aren't writes: mdtruncate() > is AFAICT counted as one backend write for each segment. Which seems > weird to me. It's at least slightly weird :) Might it be worth counting truncate events separately? > Lastly, I don't understand what the point of sending fixed size stats, > like the stuff underlying pg_stat_bgwriter, through pgstats IPC. While > I don't like it's architecture, we obviously need something like pgstat > to handle variable amounts of stats (database, table level etc > stats). But that doesn't at all apply to these types of global stats. That part has annoyed me as well a few times. +1 for just moving that into a global shared memory. Given that we don't really care about things being in sync between those different counters *or* if we loose a bit of data (which the stats collector is designed to do), we could even do that without a lock? -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/