On Thu, Dec 12, 2024 at 02:02:38PM +0000, Bertrand Drouvot wrote:
> On Thu, Dec 12, 2024 at 01:52:03PM +0900, Michael Paquier wrote:
> Yeah, but it's needed in pg_stat_get_backend_io() for the stats filtering (to
> display only those linked to this backend type), later in the function here:
>
> +	/*
> +	 * Some combinations of BackendType, IOObject, and IOContext are
> +	 * not valid for any type of IOOp.  In such cases, omit the entire
> +	 * row from the view.
> +	 */
> +	if (!pgstat_tracks_io_object(bktype, io_obj, io_context))
> +		continue;
>
> OTOH, now that we get rid of the AuxiliaryPidGetProc() call in
> pg_stat_reset_single_backend_io_counters() we can also remove the need to
> look for the backend type in this function.
Ah, you need that for the pgstat_tracks_io_object() filtering.  I'm
wondering if we should think about making that cheaper at some point.
As an O(N^2) when coupling a call of pg_stat_get_backend_io() with a
scan of pg_stat_activity, perhaps it does not matter much anyway..

Anyway, isn't it possible that this lookup loop finishes by finding
nothing, depending on concurrent updates of other beentries?  It
sounds to me that this warrants an early exit in the function.

> BTW, not related to this particular patch, but I realized that
> pgstat_flush_io() is called for the walwriter.  Indeed, it's coming
> from the pgstat_report_wal() call in WalWriterMain().  That cannot
> report any I/O stats activity (as the walwriter is not part of the
> I/O stats tracking, see pgstat_tracks_io_bktype()).
>
> So it looks like we could move pgstat_flush_io() outside of
> pgstat_report_wal() and add the pgstat_flush_io() calls only where
> they need to be made (and so, not in WalWriterMain()).

Perhaps, yes.  pgstat_tracks_io_bktype() has always discarded
walwriters since pgstat_io.c exists.

> But I'm not sure that's worth it until we have a need to add more
> pending stats per-backend, thoughts?

Okay with your argument here.

>> +{ oid => '8806', descr => 'statistics: per backend IO statistics',
>> +  proname => 'pg_stat_get_backend_io', prorows => '5', proretset => 't',
>>
>> Similarly, s/pg_stat_get_backend_io/pg_stat_get_backend_stats/?
>
> I think that's fine to keep pg_stat_get_backend_io().  If we add more
> per-backend stats in the future then we could add "dedicated" get
> functions too, and a generic one retrieving all of them.

Okay about this layer, discarding my remark.  You have a point about
that: other stats associated with a single backend may be OK if not
returned as a SRF, and they'll most likely return a different set of
attributes.
>> + descr => 'statistics: reset collected IO statistics for a single backend',
>> + proname => 'pg_stat_reset_single_backend_io_counters', provolatile => 'v',
>>
>> And here, pg_stat_reset_backend_stats?
>
> Same as above, we could imagine that in the future the backend would
> get multiple stats, and that one would want to reset only the I/O
> ones, for example.

Disagreed about this part.  It is slightly simpler to do a full reset
of the stats in a single entry.  If another subset of stats is added
to the backend-level entries, we could always introduce a new function
that has more control over what subset of a single backend entry is
reset.  And I'm pretty sure that we are going to need the function
that does the full reset anyway.

>> I would just add a note in the docs with a query showing how to use
>> it with pg_stat_activity.  An example with LATERAL, doing the same
>> work:
>>   select a.pid, s.* from pg_stat_activity as a,
>>     lateral pg_stat_get_backend_io(a.pid) as s
>>     where pid = pg_backend_pid();
>
> I'm not sure it's worth it.  I think it's clear that to get our own
> stats we need to provide our own backend pid.  For example,
> pg_stat_get_activity() does not provide such an example using
> pg_stat_activity or using something like
> pg_stat_get_activity(pg_backend_pid()).

Okay.

As far as I can see, the patch relies entirely on write_to_file to
prevent any entries from being flushed out.  It means that we leave in
the dshash entries that may sit idle for as long as the server is up,
once a pgproc slot is used at least once.  This scales depending on
max_connections.  It also means that we skip the sanity check about
dropped entries at shutdown, which may be a good thing to do because
we don't need to loop through them when writing the stats file.  Hmm.
Could it be better to be more aggressive with the handling of these
stats, marking them as dropped when their backend exits and cleaning
up the dshash, without relying on the write flag to make sure that all
the entries are discarded at shutdown?  The point is that we do
shutdown in a controlled manner, with all backends exiting before the
checkpointer writes the stats file after the shutdown checkpoint is
completed.

The patch handles things so that entries are reset when a procnum is
reused, leaving past stats around until that happens.  We should
perhaps aim for more consistency with the way beentry is refreshed,
and be more proactive with the backend entry drop or reset at backend
shutdown (pgstat_beshutdown_hook?), so that what is in the dshash
reflects exactly what's in shared memory for each PGPROC and beentry.

Not sure that the "_per_" added in the various references of the patch
is good to keep, as in pgstat_tracks_per_backend_bktype.  These could
be removed, I guess, doing also a PGSTAT_KIND_PER_BACKEND =>
PGSTAT_KIND_BACKEND?
--
Michael