Thanks for clarifying!

At Mon, 8 May 2023 14:46:43 -0700, Andres Freund <and...@anarazel.de> wrote in
> Hi,
>
> On 2023-04-26 18:47:14 +0900, Kyotaro Horiguchi wrote:
> > I see four issues here.
> >
> > 1. The current database stats omits buffer fetches that don't
> >    originate from a relation.
> >
> > In this case pgstat_relation can't work since recovery isn't conscious
> > of relids. We might be able to resolve relfilenode into a relid, but
> > it may not be that simple. Fortunately we already count fetches and
> > hits process-wide using pgBufferUsage, so we can use this for database
> > stats.
>
> I don't think we need to do anything about that for 16 - they aren't updated
> at process end either.
>
> I think the fix here is to do the architectural change of maintaining most
> stats keyed by relfilenode as we've discussed in some other threads. Then we
> also can have relation level write stats etc.

I think so.

> > 2. Even if we wanted to report stats for the startup process,
> >    pgstat_report_stats wouldn't permit it since transaction-end
> >    timestamp doesn't advance.
> >
> > I'm not certain if it's the correct approach, but perhaps we could use
> > GetCurrentTimestamp() instead of GetCurrentTransactionStopTimestamp()
> > specifically for the startup process.
>
> What about using GetCurrentTimestamp() when force == true? That'd make sense
> for other users as well, I think?

I'm not sure if I got you right, but when force==true, it allows
pgstat_report_stats to flush without considering whether the interval
has elapsed. In that case, there's no need to keep track of the last
flush time and the caller should handle the interval instead.

> > 3. When should we call pgstat_report_stats on the startup process?
> >
> > During recovery, I think we can call pgstat_report_stats() (or a
> > subset of it) right before invoking WaitLatch and at segment
> > boundaries.
>
> I've pondered that as well. But I don't think it's great - it's not exactly
> intuitive that stats reporting gets far less common if you use a 1GB
> wal_segment_size.

If the segment size gets larger, the archive intervals become
longer. So, I have a vague feeling that users wouldn't go for such a
large segment size. I don't have a clear idea about the ideal length
for stats reporting intervals in this case, but I think every few
minutes or so would not be so bad for the startup process to report
stats when recovery gets busy. Also, I think recovery will often wait
for new data once it catches up to the primary.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center