Tao, I support the idea of tracking the same metric for all the possible commands.
One additional bonus would be to at least add a label with the "tenant"
or, even better, the "namespace". This way, in a multi-tenant system,
you can track down the user/application that is causing the load.
Otherwise you will see that you have many requests of some kind, but you
won't know what is causing them.

What do you think?

Enrico

On Thu, Nov 10, 2022 at 13:37 Jiuming Tao
<jm...@streamnative.io.invalid> wrote:
>
> Hi Asaf,
>
> > After reaching the end of the PIP I discovered a link to an issue that
> > gives an example of a misconfiguration.
>
> I'm sorry about this; I've now updated the PIP and added a link for
> the `misconfigured` client.
>
> > 1. Maybe we should turn this code into a general wrapper for any command,
> > to avoid copy-pasting it for any other command we may need in the future.
> > I realize it's not an easy thing to do given the way the code is
> > structured today. The metric name you gave is generic enough, so this can
> > be refactored as I said, but in the future.
>
> Yes, the metric name is very general, so we can reuse it in the future.
> Say, if we want to add execution-latency or failure-counter metrics for
> other commands, we won't need another PIP.
>
> > 2. We provide some means for the operator to discover the issue, but
> > definitely not easy means. Also, we're not really solving the
> > user-experience problem. In theory, we can say:
> > * There should be a reasonable delay between requests for this metadata;
> > why not enforce it? Perhaps apply a rate limit to this command that seems
> > reasonable by default.
> > * The client can utilize the bi-directional nature of the Pulsar protocol
> > and get notified when partitions change (subscribe), therefore limiting
> > the need to poll at a fixed frequency.
>
> I'm sorry, but I can't give you an answer on this point; it would change
> our current client implementation. We would need another PIP for that.
>
> Thanks,
> Tao Jiuming
>
> > On Nov 8, 2022, at 10:56 PM, Asaf Mesika <asaf.mes...@gmail.com> wrote:
> >
> > Hi Jiuming,
> >
> > I have a few questions:
> >
> > Motivation
> >>
> >> Currently, there's no way to track CommandPartitionedTopicMetadata
> >> requests. There are no metrics that indicate that a broker is handling
> >> CommandPartitionedTopicMetadata requests.
> >>
> >> Misconfigured clients might flood brokers with
> >> CommandPartitionedTopicMetadata requests and cause high CPU consumption.
> >>
> > I managed to understand the first line: you can't know how many such
> > requests each broker received. From the second line, which is the problem
> > statement, I couldn't figure out:
> > 1. What do you mean by misconfigured clients? What configuration can you
> > change in a wrong way that will flood the brokers with such requests?
> > 2. Do clients ask a random broker for that metadata?
> >
> > So for now, you say that by having the metrics in place, I'll be able to
> > set up an alert on a spike of such requests, correlate that with the high
> > CPU, and then do what - reconfigure the client somehow to avoid sending
> > those requests?
> >
> > After reaching the end of the PIP I discovered a link to an issue that
> > gives an example of a misconfiguration.
> > Perhaps it would be good to copy the example, or at least link to it in
> > the motivation section, to say: here is a link that contains an example
> > (I prefer to inline it myself).
> >
> > Other than that, I think it's good.
> >
> > I do have a thought I want to share, to hear what people think:
> >
> > When I read it, a couple of things occurred to me:
> > 1. Maybe we should turn this code into a general wrapper for any command,
> > to avoid copy-pasting it for any other command we may need in the future.
> > I realize it's not an easy thing to do given the way the code is
> > structured today. The metric name you gave is generic enough, so this can
> > be refactored as I said, but in the future.
> >
> > 2. We provide some means for the operator to discover the issue, but
> > definitely not easy means. Also, we're not really solving the
> > user-experience problem. In theory, we can say:
> > * There should be a reasonable delay between requests for this metadata;
> > why not enforce it? Perhaps apply a rate limit to this command that seems
> > reasonable by default.
> > * The client can utilize the bi-directional nature of the Pulsar protocol
> > and get notified when partitions change (subscribe), therefore limiting
> > the need to poll at a fixed frequency.
> >
> > Thanks,
> >
> > Asaf
> >
> >
> > On Thu, Nov 3, 2022 at 12:19 PM Jiuming Tao
> > <jm...@streamnative.io.invalid> wrote:
> >
> >> Hi Pulsar community,
> >>
> >> I've opened a PIP to discuss: PIP-222: Add CommandPartitionedTopicMetadata
> >> metrics
> >>
> >> The PIP link: https://github.com/apache/pulsar/issues/18319
> >>
> >> Thanks,
> >> Tao Jiuming
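
To make the "tenant"/"namespace" label suggestion above concrete, here is a
minimal sketch of a generic per-command counter, assuming the Prometheus
Java client that Pulsar brokers already use. The metric name, class, and
helper method are hypothetical illustrations, not the names PIP-222 defines:

    import io.prometheus.client.Counter;

    public final class CommandMetrics {
        // Hypothetical metric name, for illustration only.
        private static final Counter COMMANDS = Counter.build()
                .name("pulsar_broker_command_total")
                .help("Protocol commands handled by the broker")
                .labelNames("command", "tenant", "namespace")
                .register();

        // Invoked from the handler of each tracked command.
        public static void record(String command, String tenant, String namespace) {
            COMMANDS.labels(command, tenant, namespace).inc();
        }
    }

A CommandPartitionedTopicMetadata handler could then call, for example,
CommandMetrics.record("PARTITIONED_METADATA", "public", "public/default").
One caveat: a per-namespace label multiplies time-series cardinality in
clusters with many namespaces, so it may need to be opt-in.

Likewise, the rate-limit idea raised in the thread could look roughly like
the sketch below, using Guava's RateLimiter (already a Pulsar dependency).
The default rate and the hook point are assumptions, not an existing broker
setting:

    import com.google.common.util.concurrent.RateLimiter;

    public class PartitionedMetadataRateLimiter {
        // Assumed default: 5 partitioned-metadata requests/sec per connection.
        private final RateLimiter limiter = RateLimiter.create(5.0);

        // Returns false when the request should be rejected or delayed.
        public boolean tryAcquire() {
            return limiter.tryAcquire();
        }
    }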