Hi,

On Wed, 15 Jan 2025 at 12:27, Michael Paquier <mich...@paquier.xyz> wrote:
>
> On Wed, Jan 15, 2025 at 08:30:20AM +0000, Bertrand Drouvot wrote:
> > On Wed, Jan 15, 2025 at 11:03:54AM +0300, Nazir Bilal Yavuz wrote:
> >> With this commit it may not be possible to count IOs in the critical
> >> sections. I think the problem happens only if the local
> >> PgStat_BackendPending entry is being created for the first time for
> >> this backend in the critical section.
> >
> > Yeah, I encountered the exact same thing and mentioned it in [1] (see R1.).
> >
> > In [1] I did propose to use a new PendingBackendWalStats variable to
> > "bypass" the pgstat_prep_backend_pending() usage.
> >
> > Michael mentioned in [2] that is not really consistent with the rest (what I
> > agree with) and that "we should rethink a bit the way pending entries are
> > retrieved". I did not think about it yet but that might be the way to
> > go, thoughts?
> >
> > [1]: https://www.postgresql.org/message-id/Z3zqc4o09dM/Ezyz%40ip-10-97-1-34.eu-west-3.compute.internal
> > [2]: https://www.postgresql.org/message-id/Z4dRlNuhSQ3hPPv2%40paquier.xyz
Thanks! I must have missed these emails.

> My problem is that this is not only related to backend stats, but to
> all variable-numbered stats kinds that require this behavior. One
> other case where I've seen this as being an issue is injection point
> stats, for example, while enabling these stats for all callbacks and
> some of them are run in critical sections.
>
> A generic solution to the problem would be to allow
> pgStatPendingContext to have MemoryContextAllowInCriticalSection() set
> temporarily for the allocation of the pending data.
> Perhaps not for all stats kinds, just for these where we're OK with
> the potential memory footprint so we could have a flag in
> PgStat_KindInfo. Giving to all stats kinds the responsibility to
> allocate a pending entry beforehand outside a critical section is
> another option, but that means going through different tweaks for all
> stats kinds that require them.

I think allowing MemoryContextAllowInCriticalSection() only for
pgStatPendingContext is not enough. We need to allow it for at least
pgStatSharedRefContext as well, since allocations can happen there too:

'''
pgstat_prep_pending_entry()
-> pgstat_get_entry_ref()
-> pgstat_get_entry_ref_cached()
-> MemoryContextAlloc(pgStatSharedRefContext, sizeof(PgStat_EntryRef))
'''

Also, I encountered another problem. I did not mention it in my first
email because I thought I had fixed it, but it seems I had not. During
initdb, we restore stats with pgstat_restore_stats() and do not expect
any pending stats.
The problem goes like this:

'''
pgstat_restore_stats()
-> pgstat_read_statsfile()
-> pgstat_reset_after_failure()
-> pgstat_drop_all_entries()
-> pgstat_drop_entry_internal()
-> We have an assertion here which checks that there is no pending stat
   entry:

/* should already have released local reference */
if (pgStatEntryRefHash)
    Assert(!pgstat_entry_ref_hash_lookup(pgStatEntryRefHash, shent->key));
'''

And we have this at the same time:

'''
BootstrapModeMain()
-> InitPostgres()
-> StartupXLOG()
-> ReadCheckpointRecord()
-> InitWalRecovery()
-> ...
-> XLogReadAhead()
-> XLogDecodeNextRecord()
-> ReadPageInternal()
-> state->routine.page_read = XLogPageRead(), then the WAL read happens
'''

So this assertion fails because we have pending stats for
PGSTAT_KIND_BACKEND.

My fix [1] was simply not restoring stats while bootstrap is happening,
but it did not work. That is because we reset all fixed-numbered stats
there ('pgstat_reset_after_failure() -> kind_info->reset_all_cb()'), so
if we skip it, some stats stay NULL, for example the reset timestamps.
Some of the tests do not expect them to be NULL at the start, like
below:

SELECT stats_reset AS archiver_reset_ts FROM pg_stat_archiver \gset
SELECT pg_stat_reset_shared('archiver');
SELECT stats_reset > :'archiver_reset_ts'::timestamptz FROM pg_stat_archiver;

This test fails because archiver_reset_ts is NULL when the server is
started.

I was a bit surprised that Bertrand did not encounter the same problem
while working on the 'per backend WAL statistics' patch. Then I found
the reason: the problem happens only for WAL read and WAL init IOs,
which start being tracked in my patch.

That said, I could not decide in which thread to raise this, the
migrating WAL stats thread or this one. Since this thread is active and
other stats kinds may cause the same problem, I decided to write here.
Please tell me if you think I should move this to the migrating WAL
stats thread instead.
[1]

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index bf3dbda901d..0538859cc7f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5681,7 +5681,7 @@ StartupXLOG(void)
 	 */
 	if (didCrash)
 		pgstat_discard_stats();
-	else
+	else if (!IsBootstrapProcessingMode())
 		pgstat_restore_stats(checkPoint.redo);

--
Regards,
Nazir Bilal Yavuz
Microsoft