Tao,
I support the idea of tracking the same metric for all the possible commands.

One additional bonus would be to add at least a label with the
"tenant" or, even better, the "namespace".
This way, in a multi-tenant system, you can track down the
user/application that is causing the load.
Otherwise you will see that you have many requests of some kind, but
you won't know what is causing them.
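
Just to illustrate what I mean, something along these lines would do (only a
rough sketch; the metric and class names are illustrative, not what the PIP
actually proposes):

    import io.prometheus.client.Counter;

    // Rough sketch only: metric/class names are illustrative.
    // A counter for CommandPartitionedTopicMetadata requests, labeled so an
    // operator can see which tenant/namespace is generating the load.
    public class PartitionedMetadataMetrics {

        private static final Counter REQUESTS = Counter.build()
                .name("pulsar_partitioned_topic_metadata_requests_total")
                .help("CommandPartitionedTopicMetadata requests handled by the broker")
                .labelNames("tenant", "namespace")
                .register();

        public static void record(String tenant, String namespace) {
            REQUESTS.labels(tenant, namespace).inc();
        }
    }

With both labels in place, an operator can aggregate per tenant in PromQL
(sum by (tenant)) and still drill down to the namespace when needed.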

What do you think?

Enrico

On Thu, Nov 10, 2022 at 1:37 PM Jiuming Tao
<jm...@streamnative.io.invalid> wrote:
>
> Hi Asaf,
>
> > After reaching the end of the PIP I discovered a link to an issue that
> > gives an example of a misconfiguration.
>
> I’m sorry about that. I’ve updated the PIP and added a link for the
> `misconfigured` client example.
>
> > 1. Maybe we should put this code as a general wrapper for any command, to
> > avoid copy-pasting it for any other command we may need in the future. I
> > realize it's not an easy thing to do given the way the code is structured
> > today. The metric name you gave is generic enough, so this can be refactored
> > as I said, but in the future.
>
>
> Yes, the metric name is very general, so we can reuse it in the future. Say, if
> we want to add execution latency or failure counter metrics for some other
> commands, we won't need another PIP.
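>
> Just to sketch what I have in mind (the names here are illustrative, not
> the final metric names from the PIP): one latency histogram and one failure
> counter, both keyed by a "command" label, so covering a new command is just
> another label value:
>
>     import io.prometheus.client.Counter;
>     import io.prometheus.client.Histogram;
>
>     // Illustrative sketch only; not the PIP's final metric names.
>     public class CommandMetrics {
>
>         private static final Histogram LATENCY = Histogram.build()
>                 .name("pulsar_command_execution_latency_seconds")
>                 .help("Time taken to handle a protocol command on the broker")
>                 .labelNames("command")
>                 .register();
>
>         private static final Counter FAILED = Counter.build()
>                 .name("pulsar_command_failed_total")
>                 .help("Protocol commands that failed on the broker")
>                 .labelNames("command")
>                 .register();
>
>         public static void observe(String command, double seconds, boolean failed) {
>             LATENCY.labels(command).observe(seconds);
>             if (failed) {
>                 FAILED.labels(command).inc();
>             }
>         }
>     }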
>
> > 2. We provide some means for the operator to discover the issue, but
> > definitely not easy means. Also, we're not really solving any user
> > experience issue. In theory, we can say:
> > * There should be a reasonable delay between requests for this metadata,
> > why not enforce it? Perhaps apply a rate limit to this command, which seems
> > reasonable by default.
> > * The client can utilize the bi-directional nature of the Pulsar protocol and
> > get notified when partitions change (subscribe), therefore limiting the need
> > to poll based on frequency.
>
> I’m sorry that I can’t give you an answer to this question; it would change
> our current client implementation. We would need another PIP for that.
>
>
> Thanks,
>  Tao Jiuming
>
> > On Nov 8, 2022, at 10:56 PM, Asaf Mesika <asaf.mes...@gmail.com> wrote:
> >
> > Hi Jiuming,
> >
> > I have a few questions:
> >
> > Motivation
> >>
> >> Currently, there's no way to track CommandPartitionedTopicMetadata
> >> requests. There are no metrics that indicate that a broker is handling
> >> CommandPartitionedTopicMetadata requests.
> >>
> >> Misconfigured clients might flood brokers with
> >> CommandPartitionedTopicMetadata requests and cause high CPU consumption.
> >>
> > I managed to understand the first line: you can't know how many such
> > requests each broker received. From the second line, which is the problem
> > statement, I couldn't figure out:
> > 1. What do you mean by misconfigured clients? What configuration can you
> > change in a wrong way that will flood the brokers with such requests?
> > 2. Do clients ask a random broker for that metadata?
> >
> > So for now, you say that by having the metrics in place, I'll be able to
> > set up an alert on a spike of such requests, correlate that with the high
> > CPU, and then do what - reconfigure the client somehow to avoid sending
> > those requests?
> >
> > After reaching the end of the PIP I discovered a link to an issue that
> > gives an example of a misconfiguration.
> > Perhaps it would then be good to copy the example into the motivation
> > section, or at least link to it there and say: here is a link that contains
> > an example (I would prefer to inline it myself).
> >
> > Other than that, I think it's good.
> >
> > I do have a thought I want to share, to hear what people think:
> >
> > When I read it a couple of things occurred to me:
> > 1. Maybe we should put this code as a general wrapper for any command, to
> > avoid copy-pasting it for any other command we may need in the future. I
> > realize it's not an easy thing to do given the way the code is structured
> > today. The metric name you gave is generic enough, so this can be refactored
> > as I said, but in the future.
> >
> > 2. We provide some means for the operator to discover the issue, but
> > definitely not easy means. Also, we're not really solving any user
> > experience issue. In theory, we can say:
> > * There should be a reasonable delay between requests for this metadata,
> > why not enforce it? Perhaps apply a rate limit to this command, which seems
> > reasonable by default.
> > * The client can utilize the bi-directional nature of the Pulsar protocol and
> > get notified when partitions change (subscribe), therefore limiting the need
> > to poll based on frequency.
> >
> > Thanks,
> >
> > Asaf
> >
> >
> >
> > On Thu, Nov 3, 2022 at 12:19 PM Jiuming Tao <jm...@streamnative.io.invalid>
> > wrote:
> >
> >> Hi pulsar community,
> >>
> >> I’ve opened a PIP to discuss: PIP-222: Add CommandPartitionedTopicMetadata
> >> metrics
> >>
> >> The PIP link: https://github.com/apache/pulsar/issues/18319
> >>
> >> Thanks,
> >> Tao Jiuming
>
