GitHub user ZoneMayor opened a pull request: https://github.com/apache/kafka/pull/689
KAFKA-2058: ProducerTest.testSendWithDeadBroker transient failure

I reproduced this transient failure. It turns out that waitUntilMetadataIsPropagated is not enough. In "onBrokerStartup", the methods below send both a LeaderAndIsrRequest and an UpdateMetadataRequest to KafkaApis:

    replicaStateMachine.handleStateChanges(allReplicasOnNewBrokers, OnlineReplica)
    partitionStateMachine.triggerOnlinePartitionStateChange()

The two kinds of request are handled separately, and their order is not guaranteed. If the UpdateMetadataRequest is handled first, the metadataCache of KafkaApis is updated. TestUtils.waitUntilMetadataIsPropagated is then satisfied, and the consumer can (and will) start fetching data. If the LeaderAndIsrRequest has not been handled at that moment, "becomeLeaderOrFollower" has not been called yet. Structures such as "leaderReplicaOpt" therefore have not been updated, and the consumer's fetch fails.

To fix this, the consumer should start fetching data only after the partition's leader replica has been refreshed, not merely after the leader has been elected.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ZoneMayor/kafka trunk-KAFKA-2058

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/689.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #689

----
commit 95374147a28208d4850f6e73f714bf418935fc2d
Author: ZoneMayor <jinxing6...@126.com>
Date:   2015-11-27T03:49:34Z

    Merge pull request #1 from apache/trunk

    merge

commit cec5b48b651a7efd3900cfa3c1fd0ab1eeeaa3ec
Author: ZoneMayor <jinxing6...@126.com>
Date:   2015-12-01T10:44:02Z

    Merge pull request #2 from apache/trunk

    2015-12-1

commit a119d547bf1741625ce0627073c7909992a20f15
Author: ZoneMayor <jinxing6...@126.com>
Date:   2015-12-04T13:42:27Z

    Merge pull request #3 from apache/trunk

    2015-12-04#KAFKA-2893

commit b767a8dff85fc71c75d4cf5178c3f6f03ff81bfc
Author: ZoneMayor <jinxing6...@126.com>
Date:   2015-12-09T10:42:30Z

    Merge pull request #5 from apache/trunk

    2015-12-9

commit 0070c2d71d06ee8baa1cddb3451cd5af6c6b1d4a
Author: ZoneMayor <jinxing6...@126.com>
Date:   2015-12-11T14:50:30Z

    Merge pull request #8 from apache/trunk

    2015-12-11

commit 09908ac646d4c84f854dad63b8c99213b74a7063
Author: ZoneMayor <jinxing6...@126.com>
Date:   2015-12-13T14:17:19Z

    Merge pull request #9 from apache/trunk

    2015-12-13

commit 30b26b2d3c714bff11f4c58f00f5d1b075a592e9
Author: ZoneMayor <jinxing6...@126.com>
Date:   2015-12-17T12:27:27Z

    Merge pull request #10 from apache/trunk

    2015-12-17

commit 6b1790b2742fa1244d3ba44aef459d8d5a6d3b55
Author: jinxing <jinx...@fenbi.com>
Date:   2015-12-17T12:30:38Z

    KAFKA-2058: 30b26b2d3c714bff11f4c58f00f5d1b075a592e9

----
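The ordering problem in the PR description can be shown with a small, self-contained Scala model. This is a hypothetical sketch. It is not the patch in this pull request and not Kafka's API. `BrokerModel`, `WaitUtil.waitUntilTrue` and `Kafka2058Sketch` are illustrative names only. A latch stands in for the nondeterministic order in which the broker handles the UpdateMetadataRequest and the LeaderAndIsrRequest. The model shows that waiting on the metadata condition alone can let a fetch run before the leader replica exists. Waiting on the leader replica itself avoids that.

```scala
import java.util.concurrent.CountDownLatch

// Hypothetical, simplified model of one broker's state for a single partition.
// These are NOT Kafka classes; the field names only mirror the PR description.
final class BrokerModel {
  // Set when an UpdateMetadataRequest is handled (what waitUntilMetadataIsPropagated observes).
  @volatile private var cachedLeader: Option[Int] = None
  // Set when a LeaderAndIsrRequest is handled (stands in for becomeLeaderOrFollower).
  @volatile private var leaderReplicaOpt: Option[Int] = None

  def handleUpdateMetadata(leaderId: Int): Unit = cachedLeader = Some(leaderId)
  def handleLeaderAndIsr(leaderId: Int): Unit = leaderReplicaOpt = Some(leaderId)

  def metadataPropagated: Boolean = cachedLeader.isDefined
  def leaderReplicaReady: Boolean = leaderReplicaOpt.isDefined

  // A fetch can only be served once the leader replica has been set up.
  def fetch(): Either[String, String] =
    if (leaderReplicaReady) Right("data") else Left("leader replica not available")
}

object WaitUtil {
  // Hypothetical polling helper: re-checks `condition` until it holds or `timeoutMs` elapses.
  def waitUntilTrue(condition: () => Boolean, timeoutMs: Long, pauseMs: Long = 10L): Boolean = {
    val deadline = System.currentTimeMillis() + timeoutMs
    var ok = condition()
    while (!ok && System.currentTimeMillis() < deadline) {
      Thread.sleep(pauseMs)
      ok = condition()
    }
    ok
  }
}

object Kafka2058Sketch {
  // Replays the problematic ordering: the UpdateMetadataRequest is handled first, while the
  // LeaderAndIsrRequest handler thread is blocked on a latch until `release` is counted down.
  def scenario(waitForLeaderReplica: Boolean): Either[String, String] = {
    val broker = new BrokerModel
    val release = new CountDownLatch(1)
    broker.handleUpdateMetadata(leaderId = 1)

    val leaderAndIsrHandler = new Thread(new Runnable {
      def run(): Unit = {
        release.await()
        broker.handleLeaderAndIsr(leaderId = 1)
      }
    })
    leaderAndIsrHandler.start()

    val result =
      if (waitForLeaderReplica) {
        // Proposed wait: let the LeaderAndIsrRequest through, then block until the leader replica exists.
        release.countDown()
        require(WaitUtil.waitUntilTrue(() => broker.leaderReplicaReady, timeoutMs = 5000L))
        broker.fetch()
      } else {
        // Original wait: metadata is already propagated, so the fetch runs before LeaderAndIsr is handled.
        require(WaitUtil.waitUntilTrue(() => broker.metadataPropagated, timeoutMs = 5000L))
        val early = broker.fetch()
        release.countDown()
        early
      }

    leaderAndIsrHandler.join()
    result
  }
}
```

In this model, `Kafka2058Sketch.scenario(waitForLeaderReplica = false)` returns `Left("leader replica not available")` and `scenario(waitForLeaderReplica = true)` returns `Right("data")`. The sketch follows the idea of the proposed fix: the wait condition checks the state that the fetch path actually depends on, rather than the metadata cache as a proxy for it.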