Hi Enrico,

> One additional bonus would be to at least add a label with the "tenant" or, even better, the "namespace".

I think this is a good idea, but there is a challenge: the client reuses the Channel, which means that several Consumers/Producers share a single Channel. On the broker side it is the same: one ServerCnx instance is shared by multiple producers/consumers, so it is difficult to tell which producer/consumer sent a given command. In addition, some commands (CommandConnect/CommandConnected/CommandSuccess/CommandProducerSuccess/CommandTopicMigrated/CommandAuthChallenge, etc.) don't carry a field that would let us know the topic/namespace/tenant.
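For the commands that do carry a topic field (CommandPartitionedTopicMetadata, for example), the label could in principle be derived from the topic name. A rough sketch, just to illustrate the idea (names are illustrative, not the final implementation):

    import org.apache.pulsar.common.naming.TopicName;

    final class CommandLabels {
        // Only possible for commands that carry a topic; CommandConnect,
        // CommandAuthChallenge, etc. have no such field, so the label
        // would have to be left empty for them.
        static String namespaceLabelOf(String topic) {
            // e.g. "persistent://my-tenant/my-ns/my-topic" -> "my-tenant/my-ns"
            return TopicName.get(topic).getNamespace();
        }
    }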
Thanks,
Tao Jiuming

> On 10 Nov 2022, at 21:08, Enrico Olivelli <eolive...@gmail.com> wrote:
>
> Tao,
> I support the idea of tracking the same metric for all the possible commands.
>
> One additional bonus would be to at least add a label with the "tenant" or, even better, the "namespace".
> This way in a multi-tenant system you can track down to the user/application who is causing the load
> Otherwise you will see that you have many requests of some kind but you won't know what is causing it
>
> What do you think ?
>
> Enrico
>
> On Thu, 10 Nov 2022 at 13:37, Jiuming Tao <jm...@streamnative.io.invalid> wrote:
>>
>> Hi Asaf,
>>
>>> After reaching the end of the PIP I discovered a link to an issue that gives an example of a misconfiguration.
>>
>> I'm sorry about this, currently, I've updated the PIP and added a link for the `misconfigured` client.
>>
>>> 1. Maybe we should put this code as a general wrapper for any command, to avoid copy-pasting it for any other command we may need in the future. I realize it's not an easy thing to do given the way the code is structured today. The metric name you gave is generic enough so this can be refactored as I said but in the future.
>>
>> Yes, the metrics name is very general, we can reuse it in the future. Say, if we want to add some others command execution latency or failed counter metrics, we don't need another PIP.
>>
>>> 2. We provide some means for the operator to discover the issue, but definitely not easy means. Also, we're not really solving any user experience. In theory, we can say:
>>> * There should be a reasonable delay between requests for this metadata, why not enforce it? Perhaps apply a rate limit to this command which seems reasonable by default
>>> * The client can utilize the bi-directional nature of Pulsar Protocol and get notified when partitions change (subscribe) therefore limiting the need to pull based on frequency.
>>
>> I'm sorry that I can't give you answer about this question, this will change our current client implementation. We need another PIP for this
>>
>> Thanks,
>> Tao Jiuming
>>
>>> On 8 Nov 2022, at 22:56, Asaf Mesika <asaf.mes...@gmail.com> wrote:
>>>
>>> Hi Jiuming,
>>>
>>> I have a few questions:
>>>
>>> Motivation
>>>>
>>>> Currently, there's no way to track CommandPartitionedTopicMetadata requests. There's no metrics that indicate that a broker is handling CommandPartitionedTopicMetadata requests.
>>>>
>>>> Misconfigured clients might flood brokers with CommandPartitionedTopicMetadata requests and cause high CPU consumption.
>>>>
>>> I managed to understand the first line: You can't know how many such requests each broker received. In the second line which is the problem statement I couldn't figure out:
>>> 1. What do you mean by misconfigured clients? What configuration can you change in a wrong way that will cause the brokers with such requests?
>>> 2. Clients ask a random broker for that metadata?
>>>
>>> So for now, you say that by having the metrics in place, I'll be able to set up an alert on a spike of such requests, correlate that with the high CPU, and then do what - reconfigure the client somehow to avoid sending those requests?
>>>
>>> After reaching the end of the PIP I discovered a link to an issue that gives an example of a misconfiguration.
>>> Perhaps then it would be good to copy the example or at least link to it in the motivation section to say - here is a link that contains an example (I prefer to inline it myself).
>>>
>>> Other than that, I think it's good.
>>>
>>> I do have a thought I want to share, to hear what people think:
>>>
>>> When I read it a couple of things occurred to me:
>>> 1. Maybe we should put this code as a general wrapper for any command, to avoid copy-pasting it for any other command we may need in the future. I realize it's not an easy thing to do given the way the code is structured today. The metric name you gave is generic enough so this can be refactored as I said but in the future.
>>>
>>> 2. We provide some means for the operator to discover the issue, but definitely not easy means. Also, we're not really solving any user experience. In theory, we can say:
>>> * There should be a reasonable delay between requests for this metadata, why not enforce it? Perhaps apply a rate limit to this command which seems reasonable by default
>>> * The client can utilize the bi-directional nature of Pulsar Protocol and get notified when partitions change (subscribe) therefore limiting the need to pull based on frequency.
>>>
>>> Thanks,
>>>
>>> Asaf
>>>
>>>
>>> On Thu, Nov 3, 2022 at 12:19 PM Jiuming Tao <jm...@streamnative.io.invalid> wrote:
>>>
>>>> Hi pulsar community,
>>>>
>>>> I've opened a PIP to discuss: PIP-222: Add CommandPartitionedTopicMetadata metrics
>>>>
>>>> The PIP link: https://github.com/apache/pulsar/issues/18319 <https://github.com/apache/pulsar/issues/18319>
>>>>
>>>> Thanks,
>>>> Tao Jiuming
>>