[ 
https://issues.apache.org/jira/browse/KAFKA-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748671#comment-13748671
 ] 

Neha Narkhede commented on KAFKA-955:
-------------------------------------

Thanks for the patches, Guozhang. I reviewed patch v4; here are some 
comments:

KafkaApis and SocketServer
1.1 One way to let the socket server close the channel is to mark the 
request's key as cancelled in the Response object. Then, when the socket server 
handles the response, a CancelledKeyException is thrown and we close the key 
in that case. One advantage of this approach is that we avoid introducing the 
close socket flag just to handle this case. To make sure the request metrics 
are always updated, we can move curr.request.updateRequestMetrics to be the 
first statement in the (curr.responseSend == null) block.
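
A minimal sketch of the idea in 1.1, using plain Java NIO rather than the 
actual SocketServer code. The class, enum and method names below are 
hypothetical, and a Pipe stands in for the client socket so the example runs 
without a network. It only shows that a key cancelled upstream makes 
interestOps throw CancelledKeyException, which the response handler can treat 
as the signal to close the channel.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical stand-in for the socket server's response handling; not Kafka code.
class CancelledKeySketch {

    enum Outcome { WRITE_SCHEDULED, CHANNEL_CLOSED }

    // Mimics the response-handling step: try to register write interest.
    // If the key was cancelled upstream, close the channel instead.
    static Outcome handleResponse(SelectionKey key) {
        try {
            key.interestOps(SelectionKey.OP_WRITE);
            return Outcome.WRITE_SCHEDULED;
        } catch (CancelledKeyException e) {
            closeChannel(key);
            return Outcome.CHANNEL_CLOSED;
        }
    }

    static void closeChannel(SelectionKey key) {
        try {
            key.channel().close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Registers a channel, optionally cancels its key (as the API layer
    // would on an ack=0 error), then hands the key to handleResponse.
    static Outcome run(boolean cancelFirst) {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try {
                pipe.sink().configureBlocking(false);
                SelectionKey key = pipe.sink().register(selector, 0);
                if (cancelFirst) {
                    key.cancel();
                }
                return handleResponse(key);
            } finally {
                pipe.sink().close();
                pipe.source().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```

With this shape, no extra flag has to travel with the response: the cancelled 
key itself carries the "close this connection" signal.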
 
1.2 I think the warn message below can be improved:
Sending the close socket signal due to error handling produce request [%s] with 
Ack=0

Let's include the client id, the correlation id, and the list of topics and 
partitions in this request. This is probably more useful than printing the 
entire produce request as is, since that attempts to print things like 
ByteBufferMessageSet and is unreadable.
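
As a rough illustration of the message suggested in 1.2, the sketch below 
builds a warning from just the client id, correlation id and topic/partition 
pairs. The class name, method names, parameter shapes and exact wording are 
assumptions for illustration, not the patch's code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper; the real request classes and field names may differ.
class ProduceRequestSummary {

    // Formats only the identifying fields of the request, not its payload.
    static String closeSocketWarning(String clientId, int correlationId,
                                     Map<String, List<Integer>> partitionsByTopic) {
        StringJoiner topicPartitions = new StringJoiner(", ");
        for (Map.Entry<String, List<Integer>> entry : partitionsByTopic.entrySet()) {
            for (int partition : entry.getValue()) {
                topicPartitions.add("[" + entry.getKey() + "," + partition + "]");
            }
        }
        return String.format(
            "Sending the close socket signal due to error handling produce request "
                + "with Ack=0: clientId=%s, correlationId=%d, topicPartitions=%s",
            clientId, correlationId, topicPartitions);
    }

    // Example input using made-up values.
    static String example() {
        Map<String, List<Integer>> partitions = new LinkedHashMap<>();
        partitions.put("mytopic", Arrays.asList(0, 1));
        return closeSocketWarning("producer-1", 42, partitions);
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```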
                
> After a leader change, messages sent with ack=0 are lost
> --------------------------------------------------------
>
>                 Key: KAFKA-955
>                 URL: https://issues.apache.org/jira/browse/KAFKA-955
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Rosenberg
>            Assignee: Guozhang Wang
>         Attachments: KAFKA-955.v1.patch, KAFKA-955.v1.patch, 
> KAFKA-955.v2.patch, KAFKA-955.v3.patch, KAFKA-955.v4.patch
>
>
> If the leader changes for a partition, and a producer is sending messages 
> with ack=0, then messages will be lost, since the producer has no active way 
> of knowing that the leader has changed until its next metadata refresh 
> update.
> The broker receiving the message, which is no longer the leader, logs a 
> message like this:
> Produce request with correlation id 7136261 from client  on partition 
> [mytopic,0] failed due to Leader not local for partition [mytopic,0] on 
> broker 508818741
> This is exacerbated by the controlled shutdown mechanism, which forces an 
> immediate leader change.
> A possible solution would be for a broker that receives a message for a 
> topic it is no longer the leader for (when the ack level is 0) to just 
> silently forward the message over to the current leader.

