[ 
https://issues.apache.org/jira/browse/KAFKA-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13746810#comment-13746810
 ] 

Magnus Edenhill commented on KAFKA-955:
---------------------------------------

Hi Guozhang,

I understand that you might not want to introduce a new message semantic at 
this point of the 0.8 beta, but it won't get easier after the release.

My proposal is to change the protocol definition to allow the broker to send 
unsolicited metadata response messages. This would of course require changes 
in most clients, but only a very small one for those that are not interested 
in keeping their leader cache up to date.
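
To make the client-side cost concrete, here is a minimal sketch of how a 
client could dispatch such a message. It is an illustration only: the 0.8 
protocol defines no unsolicited responses, and the reserved correlation id 
(UNSOLICITED_CORR_ID), the MetadataResponse shape, and the Client class are 
all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Hypothetical marker: a correlation id the client never uses for its own
# requests, so the broker can flag a response that nobody asked for.
UNSOLICITED_CORR_ID = -1

TopicPartition = Tuple[str, int]


@dataclass
class MetadataResponse:
    corr_id: int
    leaders: Dict[TopicPartition, int]  # (topic, partition) -> leader broker id


@dataclass
class Client:
    track_leaders: bool = True
    leader_cache: Dict[TopicPartition, int] = field(default_factory=dict)
    pending: Dict[int, Callable[[MetadataResponse], None]] = field(default_factory=dict)

    def on_response(self, resp: MetadataResponse) -> None:
        if resp.corr_id == UNSOLICITED_CORR_ID:
            # Broker-initiated update: refresh the cache, or simply drop it
            # if this client does not care about leader changes.
            if self.track_leaders:
                self.leader_cache.update(resp.leaders)
            return
        # Ordinary reply: hand it to whoever sent the matching request.
        handler = self.pending.pop(resp.corr_id)
        handler(resp)
```

Under these assumptions, a client that does not track leaders only needs the 
early return: it recognises the marker and ignores the message.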

Consider a producer forwarding >100k msgs/s for a number of topics to a broker 
that suddenly drops the connection because one of those topics changed leader. 
The producer's message queue will quickly build up, and the producer might 
start dropping messages (for topics that didn't lose their leader) due to 
local queue thresholds, or recover only very slowly if the current message 
rate is close to the maximum throughput.
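
A back-of-the-envelope sketch of that scenario follows. Only the 100k msgs/s 
rate comes from the text above; the outage duration, the queue limit, the 
maximum rate, and the function name outage_impact are assumptions picked for 
illustration, and the model ignores batching and retries.

```python
def outage_impact(in_rate, max_rate, outage_s, queue_limit):
    """Estimate (dropped, recovery_s) for a connection down for outage_s seconds.

    in_rate     -- messages/s arriving at the producer
    max_rate    -- messages/s the producer can send once reconnected
    queue_limit -- local queue threshold, in messages
    """
    backlog = in_rate * outage_s             # messages that arrive while down
    dropped = max(0, backlog - queue_limit)  # overflow beyond the local queue
    queued = min(backlog, queue_limit)
    headroom = max_rate - in_rate            # spare capacity to drain the queue
    recovery_s = queued / headroom if headroom > 0 else float("inf")
    return dropped, recovery_s


# 2 s outage, queue of 150k messages, 10% headroom over the incoming rate
dropped, recovery_s = outage_impact(100_000, 110_000, 2, 150_000)
print(dropped, recovery_s)  # 50000 15.0
```

With these made-up numbers, a two-second disconnect drops 50,000 messages and 
takes about 15 seconds to drain, which is the slow recovery described above.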


In my mind, closing the socket because one topic+partition changed leader is a 
very intrusive way to signal an event that affects only a subset of the 
communication; it should instead be fixed properly with an unsolicited metadata 
response message.

The unsolicited metadata response message is useful for other scenarios as 
well: new brokers and topics being added, for instance.

My two cents on the topic, thank you.
                
> After a leader change, messages sent with ack=0 are lost
> --------------------------------------------------------
>
>                 Key: KAFKA-955
>                 URL: https://issues.apache.org/jira/browse/KAFKA-955
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Rosenberg
>            Assignee: Guozhang Wang
>         Attachments: KAFKA-955.v1.patch, KAFKA-955.v1.patch, 
> KAFKA-955.v2.patch, KAFKA-955.v3.patch
>
>
> If the leader changes for a partition, and a producer is sending messages 
> with ack=0, then messages will be lost, since the producer has no active way 
> of knowing that the leader has changed, until its next metadata refresh 
> update.
> The broker receiving the message, which is no longer the leader, logs a 
> message like this:
> Produce request with correlation id 7136261 from client  on partition 
> [mytopic,0] failed due to Leader not local for partition [mytopic,0] on 
> broker 508818741
> This is exacerbated by the controlled shutdown mechanism, which forces an 
> immediate leader change.
> A possible solution would be for a broker that receives a message for a 
> topic it is no longer the leader for (when the ack level is 0) to silently 
> forward the message to the current leader.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
