Dear Pulsar Community,

We recently found a bug in `pulsar-admin topics partitioned-stats api` that
could incur a memory burst, high GC time, or OOM.

For this issue, I proposed a fix
<https://github.com/apache/pulsar/pull/18254> by deprecating the
aggregatePublisherStatsByProducerName
config and always aggregating the publishers' stats by publisherName,
instead of the list index(aggregatePublisherStatsByProducerName=false,
default).


   -  The index-based aggregation is inherently wrong in a highly
   concurrent producer environment(where the order and size of the publisher
   stat list are not guaranteed to be the same). The publisher stats need to
   be aggregated by a unique key, preferably the producer
   name(aggregatePublisherStatsByProducerName=true).


However, this fix will break some of the old client's compatibility since
the way Pulsar generates the producer name has changed over time, as
described here
<https://github.com/apache/pulsar/pull/18254#issuecomment-1311168182>.

As I replied here
<https://github.com/apache/pulsar/pull/18254#issuecomment-1312084363>, although
it is not desirable, I think we could be lenient on this change in the stat
API response(assuming thispublishers'stat struct is used for human admins
only for ad-hoc checks).

Are we OK with this non-backward-compatible fix for some of the old
clients? Or, do you have any other suggestions?

One idea for a long-term fix could be:
When there are thousands of producers(consumers) for a (partitioned-)topic,
it is expensive to aggregate each publisher(subscriptions)'s stats
on-the-fly across the brokers. Alternatively, for the next major version, I
think we could further define producers(subscriptions)' API like the below
and drop the publishers and subscriptions structs from topics
(partitioned-)stats returns.

pulsar-admin publishers list my-topic --last-pagination-key xyz
pulsar-admin publishers stats my-producer

# similarly for subscriptions

Regards,
Heesung

Reply via email to