[ https://issues.apache.org/jira/browse/KAFKA-15863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17872943#comment-17872943 ]
Sanskar Jhajharia commented on KAFKA-15863:
-------------------------------------------

Hey [~junrao],

I ran some preliminary tests on throttling for the Client Metrics RPCs and wanted to share my observations so that we can discuss the next steps here.

On the broker side, I added logs in the ClientQuotaManager to indicate when we hit the limits. I started the broker locally and set a request-rate quota on it with the following command:

{code:java}
./bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type clients --entity-name sj-producer --alter --add-config 'request_percentage=1'
{code}

On the client side, I edited my client code so that it continuously sends GetTelemetrySubscriptions requests to the broker with no pause. With this change, I started a producer client with client id {{sj-producer}}. The producer was not producing any data during this time.

As expected, the client bombarded the broker with GetTelemetrySubscriptions requests. As soon as I applied the above quota, I could see throttling logs on the server, and the rate of the corresponding Get Telemetry log lines dropped as well. This indicates that the channel was indeed muted; the additional requests in the queue must have timed out.
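For context on the throttling described above, here is a rough sketch of how a delay can be derived from a quota overshoot. It is an illustration only, not the broker code: the class {{ThrottleSketch}}, the method {{throttleTimeMs}} and the window length are hypothetical, and the formula (relative overshoot multiplied by the measurement window) is my assumption about how such a delay is typically computed.

```java
public class ThrottleSketch {

    /**
     * Hypothetical helper: returns how long (in ms) a channel would be muted
     * so that the observed rate falls back to the quota bound over the window.
     * Assumed formula: (observed - bound) / bound * windowMs.
     */
    static long throttleTimeMs(double observed, double bound, long windowMs) {
        if (bound <= 0) {
            throw new IllegalArgumentException("bound must be positive");
        }
        if (observed <= bound) {
            return 0L; // within quota, no throttling needed
        }
        double overshoot = (observed - bound) / bound;
        return Math.round(overshoot * windowMs);
    }

    public static void main(String[] args) {
        // Illustrative numbers only: observed usage 1.5 against a bound of 1.0,
        // measured over an assumed 10 s window.
        System.out.println(throttleTimeMs(1.5, 1.0, 10_000L)); // prints 5000
    }
}
```

Under this assumption, a larger overshoot relative to the bound gives a proportionally longer mute, and a client that stays within the quota is never delayed.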
The relevant logs:

{code:java}
[SJ-1]: Get Telemetry Subs Request: (Z2yB04-jRg2OnTPda5VBkw)
[SJ-1]: Get Telemetry Subs Request: (Z2yB04-jRg2OnTPda5VBkw)
[SJ-1]: Get Telemetry Subs Request: (Z2yB04-jRg2OnTPda5VBkw)
[2024-08-12 16:11:23,632] INFO Changing REQUEST quota for client-id sj-producer to 0.1 (kafka.server.ClientRequestQuotaManager)
[SJ-1]: Get Telemetry Subs Request: (Z2yB04-jRg2OnTPda5VBkw)
[SJ-1]: Get Telemetry Subs Request: (Z2yB04-jRg2OnTPda5VBkw)
[SJ-1]: Get Telemetry Subs Request: (Z2yB04-jRg2OnTPda5VBkw)
[SJ-1]: Get Telemetry Subs Request: (Z2yB04-jRg2OnTPda5VBkw)
[SJ-1]: Get Telemetry Subs Request: (Z2yB04-jRg2OnTPda5VBkw)
[SJ-1]: Get Telemetry Subs Request: (Z2yB04-jRg2OnTPda5VBkw)
[SJ-1]: Get Telemetry Subs Request: (Z2yB04-jRg2OnTPda5VBkw)
[SJ-1]: Get Telemetry Subs Request: (Z2yB04-jRg2OnTPda5VBkw)
[SJ-1]: Get Telemetry Subs Request: (Z2yB04-jRg2OnTPda5VBkw)
[2024-08-12 16:11:23,633] INFO Changing REQUEST quota for client-id sj-producer to 0.1 (kafka.server.ClientRequestQuotaManager)
[SJ-1]: Get Telemetry Subs Request: (Z2yB04-jRg2OnTPda5VBkw)
[SJ-1]: Get Telemetry Subs Request: (Z2yB04-jRg2OnTPda5VBkw)
... // This was repeated multiple times. Truncating to add only relevant data
[SJ-1]: Get Telemetry Subs Request: (Z2yB04-jRg2OnTPda5VBkw)
[SJ-1]: Get Telemetry Subs Request: (Z2yB04-jRg2OnTPda5VBkw)
[SJ-1]: Get Telemetry Subs Request: (Z2yB04-jRg2OnTPda5VBkw)
[SJha]: Quota violated for sensor (Request-:sj-producer). Delay time: (19)
[SJha]: Channel throttled for sensor (Request-:sj-producer). Delay time: (19))
[SJ-1]: Get Telemetry Subs Request: (Z2yB04-jRg2OnTPda5VBkw)
[SJha]: Quota violated for sensor (Request-:sj-producer). Delay time: (50)
[SJha]: Channel throttled for sensor (Request-:sj-producer). Delay time: (50))
[SJ-1]: Get Telemetry Subs Request: (Z2yB04-jRg2OnTPda5VBkw)
[SJha]: Quota violated for sensor (Request-:sj-producer). Delay time: (197)
[SJha]: Channel throttled for sensor (Request-:sj-producer). Delay time: (197))
[SJ-1]: Get Telemetry Subs Request: (Z2yB04-jRg2OnTPda5VBkw)
[SJha]: Quota violated for sensor (Request-:sj-producer). Delay time: (152)
[SJha]: Channel throttled for sensor (Request-:sj-producer). Delay time: (152))
[SJ-1]: Get Telemetry Subs Request: (Z2yB04-jRg2OnTPda5VBkw)
[SJha]: Quota violated for sensor (Request-:sj-producer). Delay time: (146)
[SJha]: Channel throttled for sensor (Request-:sj-producer). Delay time: (146))
[SJ-1]: Get Telemetry Subs Request: (Z2yB04-jRg2OnTPda5VBkw)
[SJha]: Quota violated for sensor (Request-:sj-producer). Delay time: (151)
[SJha]: Channel throttled for sensor (Request-:sj-producer). Delay time: (151))
[SJ-1]: Get Telemetry Subs Request: (Z2yB04-jRg2OnTPda5VBkw)
[SJha]: Quota violated for sensor (Request-:sj-producer). Delay time: (123)
[SJha]: Channel throttled for sensor (Request-:sj-producer). Delay time: (123))
[SJ-1]: Get Telemetry Subs Request: (Z2yB04-jRg2OnTPda5VBkw)
{code}

Going forward, I would like to hear your suggestions on how to handle this effectively. Based on my understanding, the Client Telemetry RPCs use the same quota as the Producer/Consumer client itself (as we are setting the quota based on the client-id).

a) Similar to KIP-599, where we added a new quota {{controller_mutations_rate}}, do you suggest that we define a new quota (maybe something like {{client_telemetry_rate}}) to ensure that a client's telemetry requests do not interfere with its Produce / Consume bandwidth?

b) If yes, do we need a KIP similar to KIP-599 for this change as well, given that it was not specifically covered in the original KIP-714?

Thanks!

cc: [~apoorvmittal10]

> Handle push telemetry throttling with quota manager
> ---------------------------------------------------
>
>                 Key: KAFKA-15863
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15863
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Apoorv Mittal
>            Assignee: Sanskar Jhajharia
>            Priority: Major
>
> Details: https://github.com/apache/kafka/pull/14699#discussion_r1399714279

--
This message was sent by Atlassian Jira
(v8.20.10#820010)