On Tue, Jun 10, 2025 at 11:40 AM Andrew Johnson <andr...@metronome.com> wrote:
>
> Hi Naga,
>
> Thank you for the thoughtful feedback and for driving attention to
> this issue. I appreciate you taking the time to review my patch.
>
> You raise some good points about the trade-offs between a lightweight
> function and the pgstat infrastructure. I actually think both
> approaches have merit for different use cases, and they could
> potentially coexist to serve the community better.
>
> > I shared a patch [0] that adds a SQL-callable function exposing the same 
> > counters via ReadMultiXactCounts() without complexity...introducing new 
> > statistics infrastructure may be more than what's needed unless there's an 
> > additional use case I'm overlooking...A lightweight function seems better 
> > aligned with the nature of these metrics and the operational use cases they 
> > serve, particularly for historical/ongoing diagnostics and periodic 
> > monitoring.
>
> I reviewed your patch in depth and I believe the pgstat approach I
> took offers some advantages for continuous monitoring scenarios:
>
> 1. Performance under monitoring load: Many production environments,
> including Metronome's, will poll these statistics frequently for
> alerting. Using a direct call to pg_get_multixact_count() ->
> ReadMultiXactCounts() acquires LWLocks, which could create significant
> contention when multiple monitoring systems are polling frequently. In
> high-throughput environments, this could become a bottleneck. The
> pgstat view reads from shared memory snapshots without additional lock
> acquisition, making it essentially free since we only update the
> pgstat structure while we have the lock in the first place.
>
> 2. Consistency with existing patterns: PostgreSQL currently uses the
> pgstat infrastructure for similar global, clusterwide metrics like
> pg_stat_wal, pg_stat_wal_receiver, pg_stat_archiver, pg_stat_bgwriter,
> and pg_stat_checkpointer. The multixact member count fits this same
> pattern of cluster-wide resource monitoring.
>
> 3. Automatic updates: The stats update during natural multixact
> operations (allocation, freeze threshold checks), providing current
> data without requiring explicit polling of the underlying counters.
>
> Your function approach has clear benefits for ad-hoc diagnostics and
> simpler operational queries where call frequency is low. I also note
> that your patch tracks both multixacts and members, which provides
> valuable additional context.
>
> I've also included isolation tests that verify the view accurately
> reflects multixact member allocation, which helps ensure correctness
> of the monitoring data.
>
> Given our production experience with multixact membership exhaustion
> at Metronome, both approaches would solve the core observability
> problem.
>
> I'm happy to keep discussing what the best approach for the community
> is. It's great that more light is being shed on this particular issue.
>
> [0] 
> https://www.postgresql.org/message-id/CA%2BQeY%2BDTggHskCXOa39nag2sFds9BD-7k__zPbvL-_VVyJw7Sg%40mail.gmail.com
>
> --
> Respectfully,
>
> Andrew Johnson
> Software Engineer
> Metronome, Inc.

Hi Andrew,

Thanks again for the thoughtful discussion and for reviewing my patch.

I would like to directly address the concern around LWLock contention
and potential performance overhead from pg_get_multixact_count().

To evaluate this, I ran repeated invocations of both
pg_get_multixact_count() and pg_stat_multixact under sustained system
load. At the time of testing, the system showed a very high load
average with heavy LWLock contention:

----------------------------------------------------------------------------------
$ uptime
 00:23:30 up  2:08,  4 users,  load average: 103.18, 117.93, 116.77
----------------------------------------------------------------------------------

----------------------------------------------------------------------------------
postgres=# SELECT
    a.wait_event_type,
    COUNT(*) AS count
FROM pg_stat_activity a
WHERE a.wait_event_type IS NOT NULL
GROUP BY a.wait_event_type
ORDER BY count DESC;
 wait_event_type | count
-----------------+-------
 LWLock          |  1978
 Client          |  1014
 Timeout         |    21
 Activity        |     6
(4 rows)
----------------------------------------------------------------------------------

To simulate realistic monitoring intervals (typically in the 100 ms to
1 s range), I used pgbench with -R 10 (a target rate of 10
transactions per second, one query each) over a 60-second duration for
both the function and the view. Here are the full results:

----------------------------------------------------------------------------------
$ pgbench -n -T 60 -R 10 -f <(echo "SELECT * FROM
pg_get_multixact_count();") -h 127.0.0.1 -p 5593 -U postgres postgres
pgbench (18beta1)
transaction type: /dev/fd/63
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 60 s
number of transactions actually processed: 579
number of failed transactions: 0 (0.000%)
latency average = 0.892 ms
latency stddev = 1.667 ms
rate limit schedule lag: avg 0.160 (max 8.789) ms
initial connection time = 7.106 ms
tps = 9.655158 (without initial connection time)
----------------------------------------------------------------------------------

----------------------------------------------------------------------------------
$ pgbench -n -T 60 -R 10 -f <(echo "SELECT * FROM pg_stat_multixact;")
-h 127.0.0.1 -p 5593 -U postgres postgres
pgbench (18beta1)
transaction type: /dev/fd/63
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 60 s
number of transactions actually processed: 589
number of failed transactions: 0 (0.000%)
latency average = 0.378 ms
latency stddev = 0.273 ms
rate limit schedule lag: avg 0.137 (max 4.347) ms
initial connection time = 4.516 ms
tps = 9.846161 (without initial connection time)
----------------------------------------------------------------------------------

Even under heavy LWLock contention, both approaches maintained stable
performance. pg_get_multixact_count() executed comfortably under 1 ms
on average, and schedule lag remained low, demonstrating that it can
be safely used in periodic monitoring setups.
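For instance, a lightweight polling setup needs nothing more than
psql's \watch (shown here at a 1-second interval; the interval is
illustrative):

----------------------------------------------------------------------------------
postgres=# SELECT * FROM pg_get_multixact_count() \watch 1
----------------------------------------------------------------------------------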

I also compared the live counter from the function to the stats
snapshot from the view:

----------------------------------------------------------------------------------
postgres=# SELECT
    'from_pg_get_multixact_count' AS source,
    members
FROM
    pg_get_multixact_count ()
UNION ALL
SELECT
    'from_pg_stat_multixact' AS source,
    members
FROM
    pg_stat_multixact;
           source            |  members
-----------------------------+-----------
 from_pg_get_multixact_count | 839438187
 from_pg_stat_multixact      | 839438011
(2 rows)
----------------------------------------------------------------------------------

While the values are quite close, I think it's worth highlighting that
pg_get_multixact_count() returns real-time state by calling
ReadMultiXactCounts() at query time, whereas pg_stat_multixact reports
values as of the cumulative statistics system's last update. Although
the stats machinery also calls ReadMultiXactCounts() internally, its
updates are asynchronous and may lag behind the live counters. In
addition, reads of the view can be served from a transaction-local
snapshot depending on the stats_fetch_consistency setting [1], and in
environments under high load or memory pressure stats updates may be
delayed further.
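For completeness, the snapshot behavior is easy to observe from a
session (a sketch; in 'snapshot' mode the first read within a
transaction pins the stats values for the remainder of that
transaction):

----------------------------------------------------------------------------------
postgres=# BEGIN;
postgres=# SET LOCAL stats_fetch_consistency = 'snapshot';
postgres=# SELECT members FROM pg_stat_multixact;  -- first read takes a snapshot
postgres=# SELECT members FROM pg_stat_multixact;  -- repeats the same value
postgres=# COMMIT;
----------------------------------------------------------------------------------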

This distinction was part of the motivation behind my earlier proposal
[0], which introduced a lightweight SQL-callable function using
ReadMultiXactCounts(). Since these counters are global, not
aggregatable per backend, and don’t reset meaningfully, it seemed
reasonable to expose them without adding new statistics
infrastructure.

I understand the appeal of following the pg_stat_* view pattern for
consistency and passive observability, and I think both approaches
could certainly coexist. The view-based design offers a familiar
interface for long-term monitoring, while the function complements it
in scenarios where timely visibility matters, such as watching
multixact activity closely during periods of elevated usage or
investigating potential pressure before it escalates.

While the two can coexist, I believe the function alone already
addresses the majority of practical needs efficiently, without
introducing noticeable contention even on heavily loaded systems, and
without adding the complexity of new statistics infrastructure.

I would be happy to continue the discussion or help refine the direction.

[0] 
https://www.postgresql.org/message-id/CA%2BQeY%2BDTggHskCXOa39nag2sFds9BD-7k__zPbvL-_VVyJw7Sg%40mail.gmail.com
[1] 
https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-VIEWS

Best regards,
Naga
