RivenSun created KAFKA-13576:
--------------------------------
Summary: Processor.ConnectionQueueSize provides configuration &
metrics, SelectorMetrics adds connection-register related metrics
Key: KAFKA-13576
URL: https://issues.apache.org/jira/browse/KAFKA-13576
Project: Kafka
Issue Type: Improvement
Components: metrics, network
Affects Versions: 3.0.0
Reporter: RivenSun
Assignee: Luke Chen
h1. Problem:
After all client machines were switched to the company's private BYOIP
addresses, producers that send messages frequently saw a significant increase
in send latency. Producers that send messages infrequently often threw
exceptions because fetching metadata timed out while sending. Everything was
normal before the switch.
h1. Root cause:
1. The clients' BYOIP addresses have no PTR records configured.
2. When the listener uses the SASL_SSL protocol,
SaslChannelBuilder#buildTransportLayer, which is called underneath
Processor#configureNewConnections, calls
socketChannel.socket().getInetAddress().getHostName(), which triggers a
reverse DNS lookup. If the client IP has no PTR record, getHostName() is slow.
3. The steps in the processor's run method are executed serially. If
configureNewConnections is slow, completed responses are not sent to clients
in time, which increases the ack time producers see when sending messages.
4. A slow configureNewConnections also means the elements in
Processor.newConnections are not removed in time, which makes
Acceptor#assignNewConnection slower. assignNewConnection can even block in
newConnections.put(socketChannel). While it is blocked, the Acceptor thread
may reject any new TCP connection requests.
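The cost described in step 2 comes down to asking for a peer's host name rather than its address string. The following is a minimal sketch using only the JDK; it is not Kafka's code and not necessarily how the fix works. The class name PeerHostSketch, its method names, and the address 10.0.0.1 are made up for illustration.

{code:java}
import java.net.InetSocketAddress;

public class PeerHostSketch {
    // Returns the host name if one is already known, otherwise the literal IP
    // string. Never performs a reverse DNS lookup.
    public static String hostWithoutLookup(InetSocketAddress peer) {
        return peer.getHostString();
    }

    // Falls back to a reverse (PTR) lookup when no host name is cached, so it
    // can block for as long as the resolver takes to answer or time out.
    public static String hostWithLookup(InetSocketAddress peer) {
        return peer.getAddress().getHostName();
    }

    public static void main(String[] args) {
        // A literal IP is only parsed, not resolved. 10.0.0.1 is a placeholder.
        InetSocketAddress peer = new InetSocketAddress("10.0.0.1", 9092);
        System.out.println("no lookup: " + hostWithoutLookup(peer));

        // Opt in with any argument, since this call may stall without a PTR record.
        if (args.length > 0) {
            long start = System.nanoTime();
            String name = hostWithLookup(peer);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("with lookup: " + name + " (" + elapsedMs + " ms)");
        }
    }
}
{code}

Running the class with any argument from a machine whose IP has no PTR record should give a rough idea of how long a single getHostName() call can hold up the processor thread.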
h1. Solution:
1. Add PTR records for the clients' BYOIP addresses.
2. Later Kafka versions have already fixed this problem:
https://issues.apache.org/jira/browse/KAFKA-8562
https://github.com/apache/kafka/pull/10059
3. Add *connection-register* related metrics to the SelectorMetrics of each
processor’s selector.
These metrics would be updated in
Selector#register(String id, SocketChannel socketChannel). The metric type is
expected to be a histogram (newHistogram), similar to the
*responseQueueTimeMs* field.
4.
1) Make the queue size of Processor.newConnections configurable.
Source code:
{code:java}
private[kafka] object Processor {
  val IdlePercentMetricName = "IdlePercent"
  val NetworkProcessorMetricTag = "networkProcessor"
  val ListenerMetricTag = "listener"
  val ConnectionQueueSize = 20
}{code}
The value is currently hard-coded to 20, perhaps for design reasons, but it is
still recommended to make it configurable: either *queued.max.connections*,
which applies to the processors of all listeners, or an independent setting
for the processors of each listener,
*listener.name.\{listenerName}.queued.max.connections*
2) Provide a metric for the size of each processor’s newConnections queue:
{*}ConnectionQueueSize{*}. It can be modeled on the *ResponseQueueSize* metric
maintained in RequestChannel.
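The two parts of item 4 can be sketched together as a bounded queue whose capacity comes from configuration and whose current size can be read as a gauge. This is only an illustration under stated assumptions, not a patch: the class ConnectionQueueSketch and its methods are hypothetical, the two config keys are the ones proposed above and do not exist in Kafka today, the lower-casing of the listener name is an assumption about how the per-listener prefix would be formed, and offer() is used instead of the blocking put() so that a full queue is visible without stalling the caller.

{code:java}
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ConnectionQueueSketch {
    // Mirrors the hard-coded Processor.ConnectionQueueSize.
    public static final int DEFAULT_QUEUE_SIZE = 20;
    // Proposed key from this issue; not an existing Kafka config.
    static final String GLOBAL_KEY = "queued.max.connections";

    private final BlockingQueue<String> newConnections;

    public ConnectionQueueSketch(int capacity) {
        this.newConnections = new ArrayBlockingQueue<>(capacity);
    }

    // Per-listener key wins over the global key, which wins over the default.
    public static int resolveQueueSize(Map<String, String> props, String listenerName) {
        String listenerKey = "listener.name."
                + listenerName.toLowerCase(Locale.ROOT) + "." + GLOBAL_KEY;
        String value = props.getOrDefault(listenerKey, props.get(GLOBAL_KEY));
        return value == null ? DEFAULT_QUEUE_SIZE : Integer.parseInt(value);
    }

    // Acceptor side: returns false when the queue is full (put() would block here).
    public boolean accept(String connectionId) {
        return newConnections.offer(connectionId);
    }

    // Processor side: drains everything queued and returns how many were handled.
    public int configureNewConnections() {
        int configured = 0;
        while (newConnections.poll() != null) {
            configured++;
        }
        return configured;
    }

    // Value a ConnectionQueueSize gauge would report.
    public int queueSize() {
        return newConnections.size();
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put(GLOBAL_KEY, "50");
        props.put("listener.name.sasl_ssl." + GLOBAL_KEY, "2");

        int capacity = resolveQueueSize(props, "SASL_SSL");
        ConnectionQueueSketch queue = new ConnectionQueueSketch(capacity);
        for (int i = 1; i <= 3; i++) {
            System.out.println("conn-" + i + " accepted: " + queue.accept("conn-" + i));
        }
        System.out.println("queue size: " + queue.queueSize());
        System.out.println("configured: " + queue.configureNewConnections());
    }
}
{code}

With a gauge like queueSize() exported per processor, a queue that stays at its capacity would point directly at a slow configureNewConnections step.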