Hi Naga,

Thank you for the thoughtful feedback and for drawing attention to
this issue. I appreciate you taking the time to review my patch.

You raise some good points about the trade-offs between a lightweight
function and the pgstat infrastructure. I actually think both
approaches have merit for different use cases, and they could
potentially coexist to serve the community better.

> I shared a patch [0] that adds a SQL-callable function exposing the same 
> counters via ReadMultiXactCounts() without complexity...introducing new 
> statistics infrastructure may be more than what's needed unless there's an 
> additional use case I'm overlooking...A lightweight function seems better 
> aligned with the nature of these metrics and the operational use cases they 
> serve, particularly for historical/ongoing diagnostics and periodic 
> monitoring.

I reviewed your patch in depth and I believe the pgstat approach I
took offers some advantages for continuous monitoring scenarios:

1. Performance under monitoring load: Many production environments,
including Metronome's, will poll these statistics frequently for
alerting. Each call to pg_get_multixact_count() goes through
ReadMultiXactCounts(), which acquires LWLocks, so several monitoring
systems polling at short intervals could create contention on those
locks. In high-throughput environments, this could become a
bottleneck. The pgstat view reads from shared memory snapshots without
additional lock acquisition, and the update side adds almost no cost
because we only update the pgstat structure at points where the lock
is already held.

2. Consistency with existing patterns: PostgreSQL currently uses the
pgstat infrastructure for similar cluster-wide metrics like
pg_stat_wal, pg_stat_wal_receiver, pg_stat_archiver, pg_stat_bgwriter,
and pg_stat_checkpointer. The multixact member count fits this same
pattern of cluster-wide resource monitoring.

3. Automatic updates: The stats update during natural multixact
operations (allocation, freeze threshold checks), providing current
data without requiring explicit polling of the underlying counters.
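
To make the contrast in point 1 concrete, here is a minimal standalone
sketch in C. It is a toy model, not code from either patch and not how
PostgreSQL implements LWLocks or pgstat: every name in it is
hypothetical, and a simple spinlock stands in for the real lock. It
only shows the pattern of publishing a copy of a counter while the
writer already holds the lock, so that readers of the copy never touch
the lock.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the lock protecting multixact state. */
static atomic_flag gen_lock = ATOMIC_FLAG_INIT;
static uint64_t lock_acquisitions = 0;  /* counts how often the lock is taken */
static uint64_t members_in_use = 0;     /* protected by gen_lock */

/* Hypothetical published copy that monitoring readers consult. */
static _Atomic uint64_t members_snapshot = 0;

static void lock_acquire(void)
{
    /* Spin until the flag was previously clear, i.e. we now own it. */
    while (atomic_flag_test_and_set(&gen_lock))
        ;
    lock_acquisitions++;                /* safe: we hold the lock here */
}

static void lock_release(void)
{
    atomic_flag_clear(&gen_lock);
}

/* Writer path: it needs the lock anyway, so publishing costs one store. */
static void allocate_members(uint64_t n)
{
    lock_acquire();
    members_in_use += n;
    atomic_store(&members_snapshot, members_in_use);
    lock_release();
}

/* Function-style reader: takes the lock on every call. */
static uint64_t read_members_locked(void)
{
    lock_acquire();
    uint64_t value = members_in_use;
    lock_release();
    return value;
}

/* View-style reader: reads the published copy without taking the lock. */
static uint64_t read_members_snapshot(void)
{
    return atomic_load(&members_snapshot);
}
```

In this model, each call to read_members_locked() increments
lock_acquisitions, while any number of read_members_snapshot() calls
leave it unchanged. Both readers return the same value once the writer
has released the lock.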

Your function approach has clear benefits for ad-hoc diagnostics and
simpler operational queries where call frequency is low. I also note
that your patch tracks both multixacts and members, which provides
valuable additional context.

I've also included isolation tests that verify the view accurately
reflects multixact member allocation, which helps ensure correctness
of the monitoring data.

Based on our production experience with multixact member exhaustion
at Metronome, I believe either approach would solve the core
observability problem.

I'm happy to keep discussing what the best approach for the community
is. It's great that more light is being shed on this particular issue.

[0] https://www.postgresql.org/message-id/CA%2BQeY%2BDTggHskCXOa39nag2sFds9BD-7k__zPbvL-_VVyJw7Sg%40mail.gmail.com

--
Respectfully,

Andrew Johnson
Software Engineer
Metronome, Inc.

