Hi Nozomi,

I didn't look into the proposal at the moment. But I noticed the
producer name generation logic you mentioned here. At least in C++
clients, the producer name can only be set from the CommandProducer
response. i.e. PIP-79 is not catched up in C++ clients.

Thanks,
Yunze

On Fri, Nov 25, 2022 at 2:34 PM Nozomi Kurihara <nkuri...@apache.org> wrote:
>
> Hi Heesung,
>
> Thank you for opening this discussion.
>
> IMO backward-compatibility should be kept at least across minor releases.
>
> Although the performance issue you mentioned (e.g. memory burst, high GC
> and OOM) looks problematic, backward-compatibility is also important.
> I think which has higher priority depends on the case.
>
> Your current change seems to remove the index-based aggregation completely.
> However I think we should keep room for choice.
>
> In order to allow users (here Pulsar server-side admins in particular) to
> choose the performance or backward-compatibility, how about introducing a
> "force" setting, e.g. "forceAggregatePublisherStatsByProducerName"?
>
> Those who place more importance on the performance than
> backward-compatibility can set this flag to true.
> Others, those who want to keep backward-compatibility, set this flag to
> false.
>
> By the way, I'm not sure, is the producer name generation logic already
> implemented in C++, Go and other clients?
> If not so, first we should implement it before switching the
> producer-name-based aggregation.
>
> Best Regards,
> Nozomi
>
> 2022年11月17日(木) 8:51 Heesung Sohn <heesung.s...@streamnative.io.invalid>:
>
> > Hi,
> >
> > To add more about the backward incompatibility issue
> > <https://github.com/apache/pulsar/pull/18254#issuecomment-1311168182>,
> >
> > Before fix:
> > % ./bin/pulsar-admin topics partitioned-stats
> > persistent://public/default/pt
> > ...
> >   "publishers" : [ {
> >     "msgRateIn" : 0.0,
> >     "msgThroughputIn" : 0.0,
> >     "averageMsgSize" : 0.0,
> >     "chunkedMessageRate" : 0.0,
> >     "producerId" : 0,
> >     "supportsPartialProducer" : false
> >   } ],
> >
> > After fix:
> > % ./bin/pulsar-admin topics partitioned-stats
> > persistent://public/default/pt
> > ...
> >   "publishers" : [ {
> >     "msgRateIn" : 0.0,
> >     "msgThroughputIn" : 0.0,
> >     "averageMsgSize" : 0.0,
> >     "chunkedMessageRate" : 0.0,
> >     "producerId" : 0,
> >     "supportsPartialProducer" : true,
> >     "producerName" : "standalone-0-1"
> >   }, {
> >     "msgRateIn" : 0.0,
> >     "msgThroughputIn" : 0.0,
> >     "averageMsgSize" : 0.0,
> >     "chunkedMessageRate" : 0.0,
> >     "producerId" : 0,
> >     "supportsPartialProducer" : true,
> >     "producerName" : "standalone-0-0"
> >   } ],
> > ...
> >
> >
> > The broker side's producer name generation has been there since this PR
> > <https://github.com/apache/pulsar/pull/1178>(1.22-incubating).
> > ProducerName was automatically generated in this format
> > {clusterName}-{brokerInstanceId}-{producerNameGenerationCounter} until
> > 2.10.
> > So, a producer to a partitioned topic(2) results in the two producer names,
> > like the following.
> > standalone-0-0
> > standalone-0-1
> >
> > And since 2.10, by default, partitioned producers have the same producer
> > name between partitions by this PR
> > <https://github.com/apache/pulsar/pull/10279>(generated on the client
> > side).
> > standalone-0-0
> >
> > Hence, the impacted versions(backward incompatibility issue
> > <https://github.com/apache/pulsar/pull/18254#issuecomment-1311168182>) by
> > the proposed fix(Option 1 below) are < 2.10.
> >
> > When we aggregate stats between partitions, the default(
> > aggregatePublisherStatsByProducerName=false) aggregates the producer stats
> > by the index of the producer stat list from each partition. So, when lucky,
> > it could output a single producer stat. However, this method can be buggy,
> > as each partition could return a different size and index of the
> > producer stat list.
> >
> >
> > To fix the original issue(described here
> > <https://github.com/apache/pulsar/pull/18254>), I think we have the
> > following options.
> >
> > Option 1(proposed): Deprecate aggregatePublisherStatsByProducerName (the
> > current PR here <https://github.com/apache/pulsar/pull/18254>) in the next
> > release and live with behavior change where we get `topics
> > partitioned-stats` per-producer-and-partition from old clients(Ver. <2.10),
> > instead of stats per-producer.
> >
> > Option 2: Defer the Option 1 fix and push it to the next major
> > version(3.0.0), as this is a breaking change.
> >
> > Option 3: Keep aggregatePublisherStatsByProducerName config but change the
> > default to aggregatePublisherStatsByProducerName=true.
> >
> > Option 4: As a long-term fix, create separate Admin-APIs for publisher and
> > subscription stats and drop their stats from `topics partitioned-stats` as
> > it is expensive to aggregate them on the fly. (for thousands of publishers
> > and subscriptions). Push this change to the next major version.
> >
> > Or other suggestions?
> >
> > Regards,
> > Heesung
> >
> >
> >
> >
> >
> >
> > On Mon, Nov 14, 2022 at 7:56 PM Heesung Sohn <heesung.s...@streamnative.io
> > >
> > wrote:
> >
> > > Dear Pulsar Community,
> > >
> > > We recently found a bug in `pulsar-admin topics partitioned-stats api`
> > that
> > > could incur a memory burst, high GC time, or OOM.
> > >
> > > For this issue, I proposed a fix
> > > <https://github.com/apache/pulsar/pull/18254> by deprecating the
> > aggregatePublisherStatsByProducerName
> > > config and always aggregating the publishers' stats by publisherName,
> > > instead of the list index(aggregatePublisherStatsByProducerName=false,
> > > default).
> > >
> > >
> > >    -  The index-based aggregation is inherently wrong in a highly
> > >    concurrent producer environment(where the order and size of the
> > publisher
> > >    stat list are not guaranteed to be the same). The publisher stats
> > need to
> > >    be aggregated by a unique key, preferably the producer
> > >    name(aggregatePublisherStatsByProducerName=true).
> > >
> > >
> > > However, this fix will break some of the old client's compatibility since
> > > the way Pulsar generates the producer name has changed over time, as
> > > described here
> > > <https://github.com/apache/pulsar/pull/18254#issuecomment-1311168182>.
> > >
> > > As I replied here
> > > <https://github.com/apache/pulsar/pull/18254#issuecomment-1312084363>,
> > although
> > > it is not desirable, I think we could be lenient on this change in the
> > stat
> > > API response(assuming thispublishers'stat struct is used for human admins
> > > only for ad-hoc checks).
> > >
> > > Are we OK with this non-backward-compatible fix for some of the old
> > > clients? Or, do you have any other suggestions?
> > >
> > > One idea for a long-term fix could be:
> > > When there are thousands of producers(consumers) for a
> > > (partitioned-)topic, it is expensive to aggregate each
> > > publisher(subscriptions)'s stats on-the-fly across the brokers.
> > Alternatively,
> > > for the next major version, I think we could further define
> > > producers(subscriptions)' API like the below and drop the publishers and
> > > subscriptions structs from topics (partitioned-)stats returns.
> > >
> > > pulsar-admin publishers list my-topic --last-pagination-key xyz
> > > pulsar-admin publishers stats my-producer
> > >
> > > # similarly for subscriptions
> > >
> > > Regards,
> > > Heesung
> > >
> >

Reply via email to