Hi,

On Wed, 15 Jan 2025 at 12:27, Michael Paquier <mich...@paquier.xyz> wrote:
>
> On Wed, Jan 15, 2025 at 08:30:20AM +0000, Bertrand Drouvot wrote:
> > On Wed, Jan 15, 2025 at 11:03:54AM +0300, Nazir Bilal Yavuz wrote:
> >> With this commit it may not be possible to count IOs in the critical
> >> sections. I think the problem happens only if the local
> >> PgStat_BackendPending entry is being created for the first time for
> >> this backend in the critical section.
> >
> > Yeah, I encountered the exact same thing and mentioned it in [1] (see R1.).
> >
> > In [1] I did propose to use a new PendingBackendWalStats variable to
> > "bypass" the pgstat_prep_backend_pending() usage.
> >
> > Michael mentioned in [2] that is not really consistent with the rest (what I
> > agree with) and that "we should rethink a bit the way pending entries are
> > retrieved". I did not think about it yet but that might be the way to
> > go, thoughts?
> >
> > [1]: https://www.postgresql.org/message-id/Z3zqc4o09dM/Ezyz%40ip-10-97-1-34.eu-west-3.compute.internal
> > [2]: https://www.postgresql.org/message-id/Z4dRlNuhSQ3hPPv2%40paquier.xyz
Thanks! I must have missed these emails.

> My problem is that this is not only related to backend stats, but to
> all variable-numbered stats kinds that require this behavior. One
> other case where I've seen this as being an issue is injection point
> stats, for example, while enabling these stats for all callbacks and
> some of them are run in critical sections.
>
> A generic solution to the problem would be to allow
> pgStatPendingContext to have MemoryContextAllowInCriticalSection() set
> temporarily for the allocation of the pending data.
> Perhaps not for all stats kinds, just for these where we're OK with
> the potential memory footprint so we could have a flag in
> PgStat_KindInfo. Giving to all stats kinds the responsibility to
> allocate a pending entry beforehand outside a critical section is
> another option, but that means going through different tweaks for all
> stats kinds that require them.

I think allowing MemoryContextAllowInCriticalSection() only for
pgStatPendingContext is not enough. We need to allow it for at least
pgStatSharedRefContext as well, since allocations can happen there too:

'''
pgstat_prep_pending_entry()
-> pgstat_get_entry_ref()
-> pgstat_get_entry_ref_cached()
-> MemoryContextAlloc(pgStatSharedRefContext, sizeof(PgStat_EntryRef))
'''

Also, I encountered another problem. I did not mention it in my first
email because I thought I had fixed it, but it seems I had not. During
initdb, we restore stats with pgstat_restore_stats() and do not expect
any pending stats.
The problem goes like this:

'''
pgstat_restore_stats()
-> pgstat_read_statsfile()
-> pgstat_reset_after_failure()
-> pgstat_drop_all_entries()
-> pgstat_drop_entry_internal()
-> We have an assertion here which checks that there is no pending stat
   entry:

/* should already have released local reference */
if (pgStatEntryRefHash)
    Assert(!pgstat_entry_ref_hash_lookup(pgStatEntryRefHash, shent->key));
'''

And we have this at the same time:

'''
BootstrapModeMain()
-> InitPostgres()
-> StartupXLOG()
-> ReadCheckpointRecord()
-> InitWalRecovery()
-> ...
-> XLogReadAhead()
-> XLogDecodeNextRecord()
-> ReadPageInternal()
-> state->routine.page_read = XLogPageRead(), then the WAL read happens
'''

So this assertion fails because we have pending stats for
PGSTAT_KIND_BACKEND.

My fix [1] was simply not restoring stats while bootstrap is happening,
but it did not work. That is because we reset all fixed-numbered stats
there ('pgstat_reset_after_failure() -> kind_info->reset_all_cb()'), so
if we skip it, some stats stay NULL, for example the reset timestamps.
Some of the tests do not expect them to be NULL at the start, like
below:

SELECT stats_reset AS archiver_reset_ts FROM pg_stat_archiver \gset
SELECT pg_stat_reset_shared('archiver');
SELECT stats_reset > :'archiver_reset_ts'::timestamptz FROM pg_stat_archiver;

This test fails because archiver_reset_ts is NULL when the server is
started.

I was a bit surprised that Bertrand did not encounter the same problem
while working on the 'per backend WAL statistics' patch. Then I found
the reason: the problem happens only for WAL read and WAL init IOs,
which start being tracked in my patch.

That said, I could not decide in which thread to raise this, the
migrating WAL stats thread or this one. Since this thread is active and
other stats kinds may cause the same problem, I decided to write here.
Please tell me if you think I should move this to the migrating WAL
stats thread instead.
[1]

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index bf3dbda901d..0538859cc7f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5681,7 +5681,7 @@ StartupXLOG(void)
 	 */
 	if (didCrash)
 		pgstat_discard_stats();
-	else
+	else if (!IsBootstrapProcessingMode())
 		pgstat_restore_stats(checkPoint.redo);

--
Regards,
Nazir Bilal Yavuz
Microsoft