[ https://issues.apache.org/jira/browse/KAFKA-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748671#comment-13748671 ]
Neha Narkhede commented on KAFKA-955:
-------------------------------------

Thanks for the patches, Guozhang. I reviewed patch v4 and here are some comments -

KafkaApis and SocketServer

1.1 One way to allow the socket server to close the channel is to just mark the request's key cancelled in the Response object. This way, when the socket server handles the response, it will throw a CancelledKeyException and we close the key in that case. One advantage of this approach is that we avoid introducing the close socket flag just to handle this case. To make sure the request metrics are always updated, we can move curr.request.updateRequestMetrics to the first statement in the (curr.responseSend == null) block.

1.2 I think the warn message below can be improved -

Sending the close socket signal due to error handling produce request [%s] with Ack=0

Let's include the client id, correlation id and the list of topics and partitions that this request had. This is probably more useful than printing the entire produce request as is, since that attempts to print things like ByteBufferMessageSet and is unreadable.

> After a leader change, messages sent with ack=0 are lost
> --------------------------------------------------------
>
>                 Key: KAFKA-955
>                 URL: https://issues.apache.org/jira/browse/KAFKA-955
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Rosenberg
>            Assignee: Guozhang Wang
>         Attachments: KAFKA-955.v1.patch, KAFKA-955.v1.patch,
> KAFKA-955.v2.patch, KAFKA-955.v3.patch, KAFKA-955.v4.patch
>
>
> If the leader changes for a partition, and a producer is sending messages
> with ack=0, then messages will be lost, since the producer has no active way
> of knowing that the leader has changed, until its next metadata refresh
> update.
> The broker receiving the message, which is no longer the leader, logs a
> message like this:
> Produce request with correlation id 7136261 from client on partition
> [mytopic,0] failed due to Leader not local for partition [mytopic,0] on
> broker 508818741
> This is exacerbated by the controlled shutdown mechanism, which forces an
> immediate leader change.
> A possible solution would be for a broker that receives a message for a
> topic it is no longer the leader for (when the ack level is 0) to silently
> forward the message over to the current leader.
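
To illustrate point 1.1 of the review comment, here is a minimal, self-contained Java NIO sketch of the mechanism it relies on: once a SelectionKey has been cancelled, a later call to interestOps() on it throws CancelledKeyException, which the response-handling path can catch in order to close the channel. This is not the KAFKA-955 patch or the actual SocketServer code. The class CancelledKeyDemo and the methods handleResponse and runDemo are hypothetical names made up for this example, and a Pipe stands in for a client socket.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

class CancelledKeyDemo {

    // Hypothetical stand-in for the socket server's response handling.
    // Returns true if the key was armed for writing, false if the key had
    // already been cancelled and the channel was closed instead.
    static boolean handleResponse(SelectionKey key) throws IOException {
        try {
            // interestOps throws CancelledKeyException on a cancelled key.
            key.interestOps(SelectionKey.OP_WRITE);
            return true;
        } catch (CancelledKeyException e) {
            // No extra "close socket" flag needed: the cancelled key is the signal.
            key.channel().close();
            return false;
        }
    }

    // Runs both cases and returns
    // { liveKeyAccepted, cancelledKeyAccepted, channelOpenAfterwards }.
    static boolean[] runDemo() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try (Pipe.SourceChannel source = pipe.source();
                 Pipe.SinkChannel sink = pipe.sink()) {
                sink.configureBlocking(false);
                SelectionKey key = sink.register(selector, 0);

                // Normal case: the key is valid, so the response is accepted.
                boolean live = handleResponse(key);

                // Error case: the request handler cancels the key, and the
                // next attempt to handle a response closes the channel.
                key.cancel();
                boolean cancelled = handleResponse(key);

                return new boolean[] { live, cancelled, sink.isOpen() };
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = runDemo();
        System.out.println("live key accepted: " + r[0]);
        System.out.println("cancelled key accepted: " + r[1]);
        System.out.println("channel open after cancelled response: " + r[2]);
    }
}
```

The point of the sketch is only that cancelling the key is enough to signal "close this connection" to whichever thread handles the response next. Where the cancel happens and how metrics are updated around it are left to the patch under review.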