[ https://issues.apache.org/jira/browse/KAFKA-13576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490617#comment-17490617 ]
RivenSun commented on KAFKA-13576:
----------------------------------

Hi [~rsivaram], [~ijuma], [~guozhang],
can you give any suggestions?
Thanks.

> Processor.ConnectionQueueSize provides configuration & metrics, SelectorMetrics adds connection-register related metrics
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-13576
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13576
>             Project: Kafka
>          Issue Type: Improvement
>          Components: metrics, network
>    Affects Versions: 3.0.0
>            Reporter: RivenSun
>            Assignee: Luke Chen
>            Priority: Major
>
> h1. Problem:
> After all client machines were switched to the company's private BYOIP
> addresses, producers that send messages frequently saw a significant
> increase in send latency. Producers that send messages infrequently often
> threw timeout exceptions while fetching metadata before sending.
> Everything was normal before the switch.
> h1. RC:
> 1. The clients' BYOIP addresses lack DNS PTR records.
> 2. When the port uses the SASL_SSL protocol,
> SaslChannelBuilder#buildTransportLayer, which is called from
> Processor#configureNewConnections, calls
> socketChannel.socket().getInetAddress().getHostName(), which triggers a
> reverse DNS lookup. If the client IP has no PTR record, getHostName() is
> slow.
> 3. Several steps in the processor's run method are executed serially. If
> configureNewConnections is slow, completed responses are not sent to the
> client in time, which increases the ack time the producer sees when
> sending messages.
> 4. A slow configureNewConnections also means that the elements in
> Processor.newConnections are not removed in time, which makes
> Acceptor#assignNewConnection slower. assignNewConnection may even block
> in newConnections.put(socketChannel). At that point, the Acceptor thread
> may reject any new TCP connection request.
> h1. Solution:
> 1. Add DNS PTR records for the clients' BYOIP addresses.
> 2. Newer Kafka versions have fixed this problem:
> https://issues.apache.org/jira/browse/KAFKA-8562
> [https://github.com/apache/kafka/pull/10059]
> 3. Add *connection-register* related metrics to the SelectorMetrics of
> each processor's selector.
> Update these metrics in Selector#register(String id, SocketChannel
> socketChannel). The metric type is expected to be a histogram
> (newHistogram), similar to *responseQueueTimeMs*.
> 4.
> 1) Make the queue size of Processor.newConnections configurable.
> Source code:
> {code:java}
> private[kafka] object Processor {
>   val IdlePercentMetricName = "IdlePercent"
>   val NetworkProcessorMetricTag = "networkProcessor"
>   val ListenerMetricTag = "listener"
>   val ConnectionQueueSize = 20
> }{code}
> The value is currently hard-coded to 20, perhaps by design, but it would
> still be better to make it configurable. *queued.max.connections* would
> apply to the processors of all ports, or each listener's processors could
> get an independent setting,
> *listener.name.\{listenerName}.queued.max.connections*.
> 2) Provide a metric for the size of each processor's newConnections
> queue, *ConnectionQueueSize*. It can be modeled on the
> *ResponseQueueSize* metric maintained in RequestChannel.
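
For illustration, here is a minimal Java sketch of the bounded-queue behavior described in RC item 4 and of the two proposals in Solution item 4. It assumes newConnections behaves like a java.util.concurrent.ArrayBlockingQueue. The class ConnectionQueueSketch, its method names, and the string connection ids are hypothetical stand-ins, not Kafka code. The constructor argument plays the role of the proposed queued.max.connections setting, and connectionQueueSize() returns the value a ConnectionQueueSize gauge might report.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical stand-in for a processor's newConnections queue; not Kafka code.
class ConnectionQueueSketch {
    private final BlockingQueue<String> newConnections;

    // capacity stands in for the proposed queued.max.connections setting
    // (an assumption; the value is hard-coded to 20 in the quoted source).
    ConnectionQueueSketch(int capacity) {
        this.newConnections = new ArrayBlockingQueue<>(capacity);
    }

    // Non-blocking hand-off from the acceptor: returns false when the queue
    // is full. A blocking put() would stall the caller at this point instead.
    boolean tryAssign(String connectionId) {
        return newConnections.offer(connectionId);
    }

    // The value a ConnectionQueueSize-style gauge could expose.
    int connectionQueueSize() {
        return newConnections.size();
    }

    // Simulates the processor taking one queued connection to configure it.
    // Returns null when nothing is queued.
    String configureNext() {
        return newConnections.poll();
    }

    public static void main(String[] args) {
        ConnectionQueueSketch queue = new ConnectionQueueSketch(2);
        System.out.println("assign conn-1: " + queue.tryAssign("conn-1"));
        System.out.println("assign conn-2: " + queue.tryAssign("conn-2"));
        // The queue is now full, so a third hand-off fails until the
        // processor drains an entry.
        System.out.println("assign conn-3: " + queue.tryAssign("conn-3"));
        System.out.println("queue size: " + queue.connectionQueueSize());
        System.out.println("configured: " + queue.configureNext());
        System.out.println("assign conn-3 again: " + queue.tryAssign("conn-3"));
    }
}
```

With a capacity of 2, the third hand-off fails until one entry has been drained. This mirrors how a slow configureNewConnections keeps the queue full and holds up the acceptor.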