On Thu, Jul 23, 2020 at 11:46 AM Masahiko Sawada <masahiko.saw...@2ndquadrant.com> wrote: > > I've updated the patch so that the stats collector ignores the > 'update' message if the slot stats array is already full. >
This patch needs a rebase. I don't see this patch in the CF app. I hope you are still interested in working on this. Review comments: =============== 1. +CREATE VIEW pg_stat_replication_slots AS + SELECT + s.name, + s.spill_txns, + s.spill_count, + s.spill_bytes + FROM pg_stat_get_replication_slots() AS s; The view pg_stat_replication_slots should have a column 'stats_reset' (datatype: timestamp with time zone) as we provide a facitily to reset the slots. A similar column exists in pg_stat_slru as well, so is there a reason of not providing it here? 2. + </para> + </sect2> + + <sect2 id="monitoring-pg-stat-wal-receiver-view"> <title><structname>pg_stat_wal_receiver</structname></title> It is better to keep one empty line between </para> and </sect2> to keep it consistent with the documentation of other views. 3. <primary>pg_stat_reset_replication_slot</primary> + </indexterm> + <function>pg_stat_reset_replication_slot</function> ( <type>text</type> ) + <returnvalue>void</returnvalue> + </para> + <para> + Resets statistics to zero for a single replication slot, or for all + replication slots in the cluster. If the argument is NULL, all counters + shown in the <structname>pg_stat_replication_slots</structname> view for + all replication slots are reset. + </para> I think the information about the parameter for this function is not completely clear. It seems to me that it should be the name of the slot for which we want to reset the stats, if so, let's try to be clear. 4. +pgstat_reset_replslot_counter(const char *name) +{ + PgStat_MsgResetreplslotcounter msg; + + if (pgStatSock == PGINVALID_SOCKET) + return; + + pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETREPLSLOTCOUNTER); + if (name) + { + memcpy(&msg.m_slotname, name, NAMEDATALEN); + msg.clearall = false; + } Don't we want to verify here or in the caller of this function whether the passed slot_name is a valid slot or not? For ex. see pgstat_reset_shared_counters where we return an error if the target is not valid. 5. +static void +pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg, + int len) +{ + int i; + int idx = -1; + TimestampTz ts; + + if (!msg->clearall) + { + /* Get the index of replication slot statistics to reset */ + idx = pgstat_replslot_index(msg->m_slotname, false); + + if (idx < 0) + return; /* not found */ Can we add a comment to describe when we don't expect to find the slot here unless there is no way that can happen? 6. +pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg, + int len) { .. + for (i = 0; i < SLRU_NUM_ELEMENTS; i++) .. } I think here we need to traverse till nReplSlotStats, not SLRU_NUM_ELEMENTS. 7. Don't we need to change PGSTAT_FILE_FORMAT_ID for this patch? We can probably do at the end but better to change it now so that it doesn't slip from our mind. 8. @@ -5350,6 +5474,23 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep) break; + /* + * 'R' A PgStat_ReplSlotStats struct describing a replication slot + * follows. + */ + case 'R': + if (fread(&replSlotStats[nReplSlotStats], 1, sizeof(PgStat_ReplSlotStats), fpin) + != sizeof(PgStat_ReplSlotStats)) + { + ereport(pgStatRunningInCollector ? LOG : WARNING, + (errmsg("corrupted statistics file \"%s\"", + statfile))); + memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats)); + goto done; + } + nReplSlotStats++; + break; Both here and in pgstat_read_db_statsfile_timestamp(), the patch handles 'R' message after 'D' whereas while writing the 'R' is written before 'D'. So, I think it is better if we keep the order during read the same as during write. 9. While reviewing this patch, I noticed that in pgstat_read_db_statsfile_timestamp(), if we fail to read ArchiverStats or SLRUStats, we return 'false' from this function but OTOH, if we fail to read 'D' or 'R' message, we will return 'true'. I feel the handling of 'D' and 'R' message is fine because once we read GlobalStats, we can return the stats_timestamp. So the other two stands corrected. I understand that this is not directly related to this patch but if you agree we can do this as a separate patch. -- With Regards, Amit Kapila.