[ https://issues.apache.org/jira/browse/KAFKA-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750375#comment-13750375 ]
Jay Kreps commented on KAFKA-955: --------------------------------- Great fix. A few minor comments, mostly stylistic. RequestChannel.scala: 1. This usage exposes a bit much: requestChannel.sendResponse(new RequestChannel.Response(request.processor, request, null, RequestChannel.CloseSocket)) I think it might be nicer to have this instead: requestChannel.close(request.processor, request) and requestChannel.noResponse(req.processor, request) Implementation would be the same, it just would just be a little more clear for the user and the response codes can be private. Likewise in the response object I should be able to 2. These are a little confusing: val SendResponse: Short = 0 val NoResponse: Short = 1 val CloseSocket: Short = 9 Why is it 0, 1, and 9? What is the relationship between these and ErrorMapping? It should be clear from reading. Is there a reason we can't use a case class case class ResponseAction case object SendAction extends ResponseAction case object NoOpAction extends ResponseAction case object CloseConnectionAction extends ResponseAction Then to use it response.action match { case SendAction => do send case NoOpAction => read more case CloseConnectionAction => something } This seems clearer to me and I don't think it is significantly more expensive. Can we also standardize the usage so that we no longer have the user EITHER give null or NoResponse? It should be one or the other. 3. This logging "Cancelling the request key to notify socket server close the connection due to error handling produce request " is not informative to the user. What does it mean to cancel a key? What broke? What should they do? I also think this should be info unless we want the server admin to take some action (I don't think so, right? This is a normal occurance). SocketServer.scala 4. The comment "a null response send object" is retained but we are no longer using null to indicate this we are using RequestChannel.NoResponse. I think this comment is actually a little verbose given that we now have a nicely named response action. ProducerTest.scala: 5. org.scalatest.TestFailedException: Is there a reason you are giving the full path here instead of importing it Question on testing, what is the message loss rate with acks=0 under moderate load if we do something like a controlled shutdown with other replicas available? > After a leader change, messages sent with ack=0 are lost > -------------------------------------------------------- > > Key: KAFKA-955 > URL: https://issues.apache.org/jira/browse/KAFKA-955 > Project: Kafka > Issue Type: Bug > Reporter: Jason Rosenberg > Assignee: Guozhang Wang > Attachments: KAFKA-955.v1.patch, KAFKA-955.v1.patch, > KAFKA-955.v2.patch, KAFKA-955.v3.patch, KAFKA-955.v4.patch, > KAFKA-955.v5.patch, KAFKA-955.v6.patch > > > If the leader changes for a partition, and a producer is sending messages > with ack=0, then messages will be lost, since the producer has no active way > of knowing that the leader has changed, until it's next metadata refresh > update. > The broker receiving the message, which is no longer the leader, logs a > message like this: > Produce request with correlation id 7136261 from client on partition > [mytopic,0] failed due to Leader not local for partition [mytopic,0] on > broker 508818741 > This is exacerbated by the controlled shutdown mechanism, which forces an > immediate leader change. > A possible solution to this would be for a broker which receives a message, > for a topic that it is no longer the leader for (and if the ack level is 0), > then the broker could just silently forward the message over to the current > leader. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira