GitHub user ZoneMayor reopened a pull request: https://github.com/apache/kafka/pull/648
KAFKA-2837: fix transient failure of kafka.api.ProducerBounceTest > testBrokerFailure

I can reproduce this transient failure, though it seldom happens. The relevant test code is:

    // rolling bounce brokers
    for (i <- 0 until numServers) {
      for (server <- servers) {
        server.shutdown()
        server.awaitShutdown()
        server.startup()
        Thread.sleep(2000)
      }

      // Make sure the producer do not see any exception
      // in returned metadata due to broker failures
      assertTrue(scheduler.failed == false)

      // Make sure the leader still exists after bouncing brokers
      (0 until numPartitions).foreach(partition => TestUtils.waitUntilLeaderIsElectedOrChanged(zkUtils, topic1, partition))
    }

The brokers go through rolling restarts while the producer keeps sending messages. In every iteration of the outer loop, the test waits for the partition leaders to be elected. If the election is slow, more messages are buffered in the RecordAccumulator's BufferPool. The buffer limit is set to 30000 bytes, and when the buffer is exhausted the producer throws TimeoutException("Failed to allocate memory within the configured max blocking time").

Because the test sleeps for 2000 ms after every broker restart, this transient failure seldom happens. If I reduce the sleep period, the failure becomes more likely. For example, if the broker acting as controller is restarted, a new controller must be elected first and only then the partition leaders, which leads to more messages being blocked in KafkaProducer:RecordAccumulator:BufferPool.

In this fix, I just enlarge the producer's buffer size to 1MB.

@guozhangwang, could you give some comments?
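For illustration only, the change described above can be sketched as a producer configuration. This is not the patch itself: `producerProps`, the `localhost:9092` address, and the `max.block.ms` value are assumptions made for the example. The only values taken from the description are the buffer sizes (30000 before, 1MB after). `buffer.memory` and `max.block.ms` are the standard producer settings for the buffer pool size and the maximum time a send may block waiting for buffer space.

```scala
import java.util.Properties

object ProducerBufferSketch {
  // Hypothetical helper (not part of the patch): builds producer settings
  // for a given buffer pool size and maximum blocking time.
  def producerProps(bufferBytes: Long, maxBlockMs: Long): Properties = {
    val props = new Properties()
    // Placeholder broker address, assumed for the example.
    props.put("bootstrap.servers", "localhost:9092")
    // Total bytes the producer may use to buffer records awaiting send.
    props.put("buffer.memory", bufferBytes.toString)
    // How long a send may block when the buffer is full before timing out.
    props.put("max.block.ms", maxBlockMs.toString)
    props
  }

  def main(args: Array[String]): Unit = {
    // 30000 is the buffer limit mentioned in the description; 10000 ms is assumed.
    val before = producerProps(30000L, 10000L)
    // 1MB is the enlarged buffer size proposed in the fix.
    val after = producerProps(1024L * 1024L, 10000L)
    println(s"buffer.memory before: ${before.getProperty("buffer.memory")}")
    println(s"buffer.memory after:  ${after.getProperty("buffer.memory")}")
  }
}
```

A larger buffer gives the producer more room to absorb records while a controller and partition leaders are being re-elected, so a send is less likely to block past the timeout.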
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ZoneMayor/kafka trunk-KAFKA-2837

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/648.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #648

----
commit 95374147a28208d4850f6e73f714bf418935fc2d
Author: ZoneMayor <jinxing6...@126.com>
Date:   2015-11-27T03:49:34Z

    Merge pull request #1 from apache/trunk

    merge

commit cec5b48b651a7efd3900cfa3c1fd0ab1eeeaa3ec
Author: ZoneMayor <jinxing6...@126.com>
Date:   2015-12-01T10:44:02Z

    Merge pull request #2 from apache/trunk

    2015-12-1

commit a119d547bf1741625ce0627073c7909992a20f15
Author: ZoneMayor <jinxing6...@126.com>
Date:   2015-12-04T13:42:27Z

    Merge pull request #3 from apache/trunk

    2015-12-04#KAFKA-2893

commit b767a8dff85fc71c75d4cf5178c3f6f03ff81bfc
Author: ZoneMayor <jinxing6...@126.com>
Date:   2015-12-09T10:42:30Z

    Merge pull request #5 from apache/trunk

    2015-12-9

commit cd5e6f4700a4387f9383b84aca0ee9c4639b1033
Author: jinxing <jinx...@fenbi.com>
Date:   2015-12-09T13:49:07Z

    KAFKA-2837: fix transient failure kafka.api.ProducerBounceTest > testBrokerFailure

commit 8ded9104a04861f789a7a990c2ddd4fc38a899cd
Author: ZoneMayor <jinxing6...@126.com>
Date:   2015-12-10T04:47:06Z

    Merge pull request #6 from apache/trunk

    2015-12-10

commit 2bcf010c73923bb24bbd9cece7e39983b2bdce0c
Author: jinxing <jinx...@fenbi.com>
Date:   2015-12-10T04:47:39Z

    KAFKA-2837: WIP

commit dae4a3cc0b564bb25121d54e65b5ad363c3e866d
Author: jinxing <jinx...@fenbi.com>
Date:   2015-12-10T04:48:21Z

    Merge branch 'trunk-KAFKA-2837' of https://github.com/ZoneMayor/kafka into trunk-KAFKA-2837

commit 7118e11813e445bca3eab65a23028e76138b136a
Author: jinxing <jinx...@fenbi.com>
Date:   2015-12-10T04:51:43Z

    KAFKA-2837: WIP

----

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well.
If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---