RivenSun created KAFKA-13576:
--------------------------------

             Summary: Processor.ConnectionQueueSize provides configuration & 
metrics, SelectorMetrics adds connection-register related metrics
                 Key: KAFKA-13576
                 URL: https://issues.apache.org/jira/browse/KAFKA-13576
             Project: Kafka
          Issue Type: Improvement
          Components: metrics, network
    Affects Versions: 3.0.0
            Reporter: RivenSun
            Assignee: Luke Chen


h1. Problem:


After all client machines were switched to the company's private BYOIP, 
producers that send messages frequently saw a significant increase in send 
latency. Producers that send messages infrequently often threw timeout 
exceptions while fetching metadata for a send. Everything was normal before 
the switch.


h1. Root cause:


1. The clients' BYOIP addresses have no PTR records.

2. When a listener uses the SASL_SSL protocol, 
SaslChannelBuilder#buildTransportLayer, which is called from 
Processor#configureNewConnections, calls 
socketChannel.socket().getInetAddress().getHostName(), which triggers a 
reverse DNS lookup. If the client IP has no PTR record, getHostName() becomes 
slow.

3. The steps in the processor's run method execute serially. If 
configureNewConnections is slow, completed responses are not sent to clients 
in time, which increases the ack latency that producers observe.

4. A slow configureNewConnections also means that elements are not removed 
from Processor.newConnections in time, which slows down 
Acceptor#assignNewConnection. assignNewConnection can even block in 
newConnections.put(socketChannel), at which point the Acceptor thread may 
reject new TCP connection requests.
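
The back-pressure in step 4 can be reproduced with a plain bounded queue. This 
is only a minimal sketch: it assumes Processor.newConnections behaves like a 
java.util.concurrent.ArrayBlockingQueue with capacity 20, the class and method 
names are hypothetical, and strings stand in for SocketChannel instances.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ConnectionQueueDemo {
    // Same value as the hard-coded Processor.ConnectionQueueSize.
    static final int CONNECTION_QUEUE_SIZE = 20;

    /** Offers `attempts` connections without blocking and returns how many did not fit. */
    static int countRejected(BlockingQueue<String> queue, int attempts) {
        int rejected = 0;
        for (int i = 0; i < attempts; i++) {
            // offer() returns false when the queue is full; put() would block instead.
            if (!queue.offer("conn-" + i)) {
                rejected++;
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        BlockingQueue<String> newConnections = new ArrayBlockingQueue<>(CONNECTION_QUEUE_SIZE);

        // The "processor" is stalled (for example in a slow reverse DNS lookup),
        // so nothing drains the queue while 25 connections arrive.
        int rejected = countRejected(newConnections, 25);
        System.out.println("queued=" + newConnections.size() + " rejected=" + rejected);

        // As soon as the processor removes one element there is room again.
        newConnections.poll();
        System.out.println("offer after drain: " + newConnections.offer("conn-late"));
    }
}
```

With the queue full, a blocking put() on the acceptor side would wait until the 
processor drains an element, which is the stall described above.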


h1. Solution:


1. Add PTR records for the clients' BYOIP addresses.


2. Newer Kafka versions have already fixed this problem:

https://issues.apache.org/jira/browse/KAFKA-8562

https://github.com/apache/kafka/pull/10059

3. Add *connection-register* related metrics to the SelectorMetrics of each 
processor’s selector.
The metrics would be updated in Selector#register(String id, SocketChannel 
socketChannel). The metric type is expected to be a histogram (newHistogram), 
similar to the *responseQueueTimeMs* field.

4.

1) Make the queue size of Processor.newConnections configurable.

Source code:
{code:java}
private[kafka] object Processor {
  val IdlePercentMetricName = "IdlePercent"
  val NetworkProcessorMetricTag = "networkProcessor"
  val ListenerMetricTag = "listener"
  val ConnectionQueueSize = 20
}{code}

The current value is 20 and is hard-coded, perhaps by design, but it would 
still be useful to make it configurable. *queued.max.connections* would apply 
to the processors of all listeners.

Alternatively, the processors of each listener could have an independent 
setting: *listener.name.\{listenerName}.queued.max.connections*
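
One possible way to combine the two proposed keys is to let the per-listener 
key override the global one and fall back to the current value of 20. The 
sketch below is hypothetical: neither key exists in Kafka today, the class and 
method names are made up, and lower-casing the listener name in the prefix is 
an assumption based on how other per-listener overrides are written.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class QueuedMaxConnectionsConfig {
    // Proposed key from this issue; not an existing Kafka configuration.
    static final String GLOBAL_KEY = "queued.max.connections";
    // Current hard-coded Processor.ConnectionQueueSize.
    static final int DEFAULT_QUEUE_SIZE = 20;

    /** Resolves the queue size: listener override first, then global key, then default. */
    static int resolve(Map<String, String> props, String listenerName) {
        String listenerKey =
            "listener.name." + listenerName.toLowerCase(Locale.ROOT) + "." + GLOBAL_KEY;
        String value = props.getOrDefault(listenerKey, props.get(GLOBAL_KEY));
        return value == null ? DEFAULT_QUEUE_SIZE : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("queued.max.connections", "50");
        props.put("listener.name.external.queued.max.connections", "200");

        System.out.println(resolve(props, "EXTERNAL"));      // listener override: 200
        System.out.println(resolve(props, "INTERNAL"));      // global value: 50
        System.out.println(resolve(new HashMap<>(), "ANY")); // default: 20
    }
}
```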

2) Expose the size of each processor’s newConnections queue as a metric, 
{*}ConnectionQueueSize{*}. It can be modelled on the *ResponseQueueSize* 
metric maintained in RequestChannel.
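
A rough sketch of such a metric, assuming it is a gauge that reads the queue's 
current size each time it is polled. It uses only the JDK: IntSupplier stands 
in for whatever gauge type the broker's metrics library provides, and the 
class and method names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.IntSupplier;

public class ConnectionQueueGauge {
    /** Returns a gauge that reports the live size of the given queue on every read. */
    static IntSupplier queueSizeGauge(BlockingQueue<?> queue) {
        return queue::size;
    }

    public static void main(String[] args) {
        BlockingQueue<String> newConnections = new ArrayBlockingQueue<>(20);
        IntSupplier connectionQueueSize = queueSizeGauge(newConnections);

        newConnections.offer("conn-1");
        newConnections.offer("conn-2");
        System.out.println("ConnectionQueueSize=" + connectionQueueSize.getAsInt()); // 2

        newConnections.poll(); // the processor picks up one connection
        System.out.println("ConnectionQueueSize=" + connectionQueueSize.getAsInt()); // 1
    }
}
```

A value that stays close to the queue capacity would indicate that 
configureNewConnections is not keeping up with the acceptor.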



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
