[ https://issues.apache.org/jira/browse/KAFKA-9648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Jacot resolved KAFKA-9648.
--------------------------------
    Fix Version/s: 3.2.0
         Reviewer: David Jacot
       Resolution: Fixed

> Add configuration to adjust listen backlog size for Acceptor
> ------------------------------------------------------------
>
>                 Key: KAFKA-9648
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9648
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 0.10.0.1
>            Reporter: li xiangyuan
>            Assignee: Haruki Okada
>            Priority: Minor
>             Fix For: 3.2.0
>
>
> I have described a mysterious problem
> (https://issues.apache.org/jira/browse/KAFKA-9211). In that issue I found that
> the Kafka server triggers TCP congestion control under some conditions. We
> have finally found the root cause.
>
> When a Kafka server restarts for any reason and preferred replica leader
> election is then executed, many partition leaders move back to it, which
> triggers a cluster metadata update. All clients then establish connections to
> this server at once. At that moment many TCP connection requests are waiting
> in the SYN queue, and then in the accept queue.
>
> Kafka creates the server socket in SocketServer.scala:
>
> {code:java}
> serverChannel.socket.bind(socketAddress);{code}
>
> This method has a second parameter, "backlog", and min(backlog,
> tcp_max_syn_backlog) decides the queue length. Because Kafka does not set it,
> the default value of 50 is used.
>
> If this queue is full and tcp_syncookies = 0, new connection requests are
> rejected. If tcp_syncookies = 1, the TCP SYN cookie mechanism is triggered.
> This mechanism allows Linux to handle more SYN requests, but it loses many TCP
> options, including "wscale" (window scaling), the option that lets a TCP
> connection use a receive window larger than 64 KB and therefore keep many more
> bytes in flight. Because SYN cookies were triggered, wscale is lost, and the
> connection stays very slow forever, until it is closed and another TCP
> connection is established.
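A minimal Java sketch of the workaround described in the report: passing an explicit backlog as the second argument to `ServerSocket.bind` instead of relying on the one-argument form. The excerpt above is Scala from SocketServer.scala, but it calls the same JDK method. The class name `BacklogBindExample`, the helper `openListener`, and the constant `LISTEN_BACKLOG` are hypothetical and are not Kafka's actual code; the value 512 simply mirrors the reporter's setting, whereas the resolution of this issue made the value configurable.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

public class BacklogBindExample {
    // Hypothetical constant: 512 mirrors the reporter's workaround value.
    static final int LISTEN_BACKLOG = 512;

    // Hypothetical helper: opens a listening channel on the loopback interface
    // with an explicitly requested accept-queue length (backlog).
    static ServerSocketChannel openListener(int port, int backlog) throws IOException {
        ServerSocketChannel serverChannel = ServerSocketChannel.open();
        try {
            // The one-argument socket().bind(addr) uses the default backlog of 50
            // noted in the report; the two-argument overload requests `backlog`.
            serverChannel.socket().bind(new InetSocketAddress("127.0.0.1", port), backlog);
            return serverChannel;
        } catch (IOException e) {
            // Avoid leaking the channel if binding fails.
            serverChannel.close();
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the OS for any free ephemeral port.
        try (ServerSocketChannel channel = openListener(0, LISTEN_BACKLOG)) {
            System.out.println("listening on " + channel.getLocalAddress()
                    + ", requested backlog " + LISTEN_BACKLOG);
        }
    }
}
```

On Linux the requested backlog is silently truncated to `net.core.somaxconn`, so raising it in the application alone may not lengthen the accept queue; that sysctl may need raising as well.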
>
> So after a preferred replica election is executed, many new TCP connections
> are established without wscale, and much of the network traffic to this
> server becomes very slow.
> I'm not sure whether newer Linux versions have resolved this problem, but
> Kafka should also set the backlog to a larger value. We have now changed it
> to 512, and everything seems OK.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)