[ https://issues.apache.org/jira/browse/KAFKA-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747692#comment-13747692 ]
Jun Rao commented on KAFKA-955:
-------------------------------

Magnus, thanks for your comment. What you suggested is interesting and could be a more effective way for the producer and the broker to communicate. It does require the producer to be able to receive requests initiated by the broker. We do plan to make the producer-side processing selector-based for efficiency reasons. However, this will be a post-0.8 item. We could consider your suggestion then.

Regarding your concern about dropped messages, my take is the following. If a client chooses not to receive an ack, it probably means that losing a few batches of messages is not that important. If a client does care about data loss, it can choose an ack level of 1 or -1. The throughput will be lower. However, there are other ways to improve throughput (e.g., using a larger batch size and/or more producer instances).

Guozhang, patch v3 looks good to me overall. A few more comments:

30. SyncProducerTest.testMessagesizeTooLargeWithAckZero(): You hardcoded the sleep to 500ms. Could you change it to a waitUntil-style wait so that the test can finish early once the condition has been met?

31. KafkaApi.handleProducerRequest(): The logging should probably be at debug level, since this doesn't indicate an error at the broker. It's really an error for the client.

> After a leader change, messages sent with ack=0 are lost
> --------------------------------------------------------
>
>                 Key: KAFKA-955
>                 URL: https://issues.apache.org/jira/browse/KAFKA-955
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Rosenberg
>            Assignee: Guozhang Wang
>         Attachments: KAFKA-955.v1.patch, KAFKA-955.v1.patch, KAFKA-955.v2.patch, KAFKA-955.v3.patch
>
>
> If the leader changes for a partition, and a producer is sending messages
> with ack=0, then messages will be lost, since the producer has no active way
> of knowing that the leader has changed until its next metadata refresh
> update.
>
> The broker receiving the message, which is no longer the leader, logs a
> message like this:
>
> Produce request with correlation id 7136261 from client on partition
> [mytopic,0] failed due to Leader not local for partition [mytopic,0] on
> broker 508818741
>
> This is exacerbated by the controlled shutdown mechanism, which forces an
> immediate leader change.
>
> A possible solution would be for a broker that receives a message for a
> partition it no longer leads (when the ack level is 0) to silently forward
> the message to the current leader.
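The forwarding idea in the issue description amounts to a per-request routing decision on the broker. The Java sketch below models only that decision, not the forwarding itself. `ProduceAction`, `ProduceRouting.decide` and its parameters are hypothetical names chosen for illustration; this is not Kafka's broker code or the approach taken in the attached patches.

```java
import java.util.Optional;

/** Hypothetical outcomes for a produce request that reaches a broker. */
enum ProduceAction { APPEND_LOCALLY, FORWARD_TO_LEADER, REJECT_NOT_LEADER }

/** Illustrative sketch of the routing decision proposed in KAFKA-955. */
final class ProduceRouting {
    private ProduceRouting() {}

    /**
     * @param isLocalLeader whether this broker currently leads the partition
     * @param requiredAcks  ack level sent by the producer: 0, 1 or -1
     * @param knownLeader   broker id of the current leader, if this broker knows it
     */
    static ProduceAction decide(boolean isLocalLeader, int requiredAcks, Optional<Integer> knownLeader) {
        if (isLocalLeader) {
            // Normal path: this broker leads the partition, so it appends the data.
            return ProduceAction.APPEND_LOCALLY;
        }
        // With ack=0 the producer never reads a response, so an error response
        // is never seen; the proposal forwards the data to the leader instead.
        if (requiredAcks == 0 && knownLeader.isPresent()) {
            return ProduceAction.FORWARD_TO_LEADER;
        }
        // With ack=1 or -1 the producer reads the error and can refresh its
        // metadata and retry. With ack=0 and no known leader, nothing better
        // is possible, so the request still fails.
        return ProduceAction.REJECT_NOT_LEADER;
    }
}
```

The point the sketch highlights is that only ack=0 needs special handling. With ack=1 or -1, the producer already learns about the leader change from the error response.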
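Review comment 30 asks for a waitUntil-style wait in place of a fixed 500ms sleep. Below is a minimal Java sketch of such a helper. It assumes the test can express its success criterion as a boolean condition. `WaitUntil.waitUntilTrue` and its poll-interval parameter are hypothetical names, not the helper used in the Kafka test suite.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper; not part of Kafka's test utilities. */
final class WaitUntil {
    private WaitUntil() {}

    /**
     * Re-evaluates {@code condition} every {@code pollMs} milliseconds until it
     * returns true or {@code timeoutMs} has elapsed. Returns true as soon as the
     * condition holds, so a passing test does not wait for the full timeout.
     */
    static boolean waitUntilTrue(BooleanSupplier condition, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingNs = deadline - System.nanoTime();
            if (remainingNs <= 0) {
                return false; // timed out without the condition becoming true
            }
            // Sleep one poll interval, but not much past the deadline.
            long sleepMs = Math.max(1L, Math.min(pollMs, remainingNs / 1_000_000L));
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return condition.getAsBoolean();
            }
        }
    }
}
```

In a test, a fixed `Thread.sleep(500)` followed by a check would become a single call such as `waitUntilTrue(() -> conditionHolds(), 500, 10)`, where `conditionHolds` stands in for whatever the test verifies. The call returns as soon as the condition is true and still caps the wait at roughly 500ms.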