Hi Enrico,

> One additional bonus would be to at least add a label with the
> "tenant" or, even better, the "namespace”.

I think this is a good idea, but there is a challenge: we reuse the Channel in 
the client, which means that multiple Consumers/Producers share a single Channel.

On the broker side, the same ServerCnx instance is shared by multiple 
producers/consumers, so it's difficult to tell which producer/consumer sent a 
given command to the broker.

Also, some commands (CommandConnect, CommandConnected, CommandSuccess, 
CommandProducerSuccess, CommandTopicMigrated, CommandAuthChallenge, etc.) don't 
carry a field that tells us the Topic/Namespace/Tenant.
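
To make that concrete, here is a minimal, hypothetical sketch (the helper and 
its names are mine, not from the PIP): for a command that does carry a topic 
name, such as CommandPartitionedTopicMetadata, the tenant/namespace labels 
could be derived by parsing the topic; for the commands listed above there is 
simply nothing to parse, so we could only fall back to an "unknown" label.

    // Hypothetical sketch, not PIP code: derive tenant/namespace labels from a
    // command's topic field when one exists.
    public final class CommandLabels {

        // Parses "persistent://tenant/namespace/topic" (or non-persistent://...).
        // Returns {tenant, namespace}, or {"unknown", "unknown"} when the command
        // does not expose a topic name.
        static String[] tenantAndNamespace(String topicOrNull) {
            if (topicOrNull == null) {
                return new String[] {"unknown", "unknown"};
            }
            String rest = topicOrNull.replaceFirst("^(persistent|non-persistent)://", "");
            String[] parts = rest.split("/", 3);
            if (parts.length < 3) {
                return new String[] {"unknown", "unknown"};
            }
            return new String[] {parts[0], parts[1]};
        }

        public static void main(String[] args) {
            // CommandPartitionedTopicMetadata carries a topic, so labels are available:
            System.out.println(java.util.Arrays.toString(
                    tenantAndNamespace("persistent://my-tenant/my-ns/my-topic")));
            // CommandConnect carries no topic, so no tenant/namespace label is possible:
            System.out.println(java.util.Arrays.toString(tenantAndNamespace(null)));
        }
    }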

Thanks,
Tao Jiuming


> On Nov 10, 2022, at 9:08 PM, Enrico Olivelli <eolive...@gmail.com> wrote:
> 
> Tao,
> I support the idea of tracking the same metric for all the possible commands.
> 
> One additional bonus would be to at least add a label with the
> "tenant" or, even better, the "namespace".
> This way in a multi-tenant system you can track down to the
> user/application who is causing the load
> Otherwise you will see that you have many requests of some kind but
> you won't know what is causing it
> 
> What do you think ?
> 
> Enrico
> 
> On Thu, Nov 10, 2022 at 1:37 PM Jiuming Tao
> <jm...@streamnative.io.invalid> wrote:
>> 
>> Hi Asaf,
>> 
>>> After reaching the end of the PIP I discovered a link to an issue that
>>> gives an example of a misconfiguration.
>> 
>> I'm sorry about that; I've now updated the PIP and added a link about the 
>> `misconfigured` client.
>> 
>>> 1. Maybe we should put this code as a general wrapper for any command, to
>>> avoid copy-pasting it for any other command we may need in the future. I
>>> realize it's not an easy thing to do given the way the code is structured
>>> today. The metric name you gave is generic enough so this can be refactored
>>> as I said but in the future.
>> 
>> 
>> Yes, the metric name is generic enough that we can reuse it in the future. For 
>> example, if we want to add execution latency or failure counters for other 
>> commands, we won't need another PIP.
>> 
>>> 2. We provide some means for the operator to discover the issue, but
>>> definitely not easy means. Also, we're not really solving any user
>>> experience. In theory, we can say:
>>> * There should be a reasonable delay between requests for this metadata,
>>> why not enforce it? Perhaps apply a rate limit to this command which seems
>>> reasonable by default
>>> * The client can utilize the bi-directional nature of Pulsar Protocol and
>>> get notified when partitions change (subscribe) therefore limiting the need
>>> to pull based on frequency.
>> 
>> I'm sorry that I can't give you an answer to this question; it would change 
>> our current client implementation. We would need another PIP for this.
>> 
>> 
>> Thanks,
>> Tao Jiuming
>> 
>>> On Nov 8, 2022, at 10:56 PM, Asaf Mesika <asaf.mes...@gmail.com> wrote:
>>> 
>>> Hi Jiuming,
>>> 
>>> I have a few questions:
>>> 
>>> Motivation
>>>> 
>>>> Currently, there's no way to track CommandPartitionedTopicMetadata
>>>> requests. There's no metrics that indicate that a broker is handling
>>>> CommandPartitionedTopicMetadata requests.
>>>> 
>>>> Misconfigured clients might flood brokers with
>>>> CommandPartitionedTopicMetadata requests and cause high CPU consumption.
>>>> 
>>> I managed to understand the first line: You can't know how many such
>>> requests each broker received. In the second line which is the problem
>>> statement I couldn't figure out:
>>> 1. What do you mean by misconfigured clients? What configuration can you
>>> change in a wrong way that will cause the brokers with such requests?
>>> 2. Clients ask a random broker for that metadata?
>>> 
>>> So for now, you say that by having the metrics in place, I'll be able to
>>> set up an alert on a spike of such requests, correlate that with the high
>>> CPU, and then do what - reconfigure the client somehow to avoid sending
>>> those requests?
>>> 
>>> After reaching the end of the PIP I discovered a link to an issue that
>>> gives an example of a misconfiguration.
>>> Perhaps then it would be good to copy the example or at least link to it in
>>> the motivation section to say - here is a link that contains an example (I
>>> prefer to inline it myself).
>>> 
>>> Other than that, I think it's good.
>>> 
>>> I do have a thought I want to share, to hear what people think:
>>> 
>>> When I read it a couple of things occurred to me:
>>> 1. Maybe we should put this code as a general wrapper for any command, to
>>> avoid copy-pasting it for any other command we may need in the future. I
>>> realize it's not an easy thing to do given the way the code is structured
>>> today. The metric name you gave is generic enough so this can be refactored
>>> as I said but in the future.
>>> 
>>> 2. We provide some means for the operator to discover the issue, but
>>> definitely not easy means. Also, we're not really solving any user
>>> experience. In theory, we can say:
>>> * There should be a reasonable delay between requests for this metadata,
>>> why not enforce it? Perhaps apply a rate limit to this command which seems
>>> reasonable by default
>>> * The client can utilize the bi-directional nature of Pulsar Protocol and
>>> get notified when partitions change (subscribe) therefore limiting the need
>>> to pull based on frequency.
>>> 
>>> Thanks,
>>> 
>>> Asaf
>>> 
>>> 
>>> 
>>> On Thu, Nov 3, 2022 at 12:19 PM Jiuming Tao <jm...@streamnative.io.invalid>
>>> wrote:
>>> 
>>>> Hi pulsar community,
>>>> 
>>>> I’ve opened a PIP to discuss: PIP-222: Add CommandPartitionedTopicMetadata
>>>> metrics
>>>> 
>>>> The PIP link: https://github.com/apache/pulsar/issues/18319
>>>> 
>>>> Thanks,
>>>> Tao Jiuming
>> 
