GitHub user ZoneMayor reopened a pull request:
https://github.com/apache/kafka/pull/648
KAFKA-2837: fix transient failure of kafka.api.ProducerBounceTest >
testBrokerFailure
I can reproduce this transient failure, although it happens only rarely.
The relevant test code is:
    // rolling bounce brokers
    for (i <- 0 until numServers) {
      for (server <- servers) {
        server.shutdown()
        server.awaitShutdown()
        server.startup()
        Thread.sleep(2000)
      }
      // Make sure the producer do not see any exception
      // in returned metadata due to broker failures
      assertTrue(scheduler.failed == false)
      // Make sure the leader still exists after bouncing brokers
      (0 until numPartitions).foreach(partition =>
        TestUtils.waitUntilLeaderIsElectedOrChanged(zkUtils, topic1, partition))
    }
The brokers go through rolling restarts while the producer keeps sending
messages. On every iteration of the loop, the test waits for a leader to be
elected for each partition. If the election is slow, more messages pile up in
RecordAccumulator's BufferPool. The buffer limit in this test is set to
30000 bytes. When the buffer is full and no space frees up in time, the send
fails with TimeoutException("Failed to allocate memory within the configured
max blocking time").
Because the test sleeps for 2000 ms after every broker restart, this
transient failure rarely happens. If I reduce the sleep period, the failure
becomes more likely. For example, if the broker acting as controller is
restarted, a new controller has to be elected first and only then the
partition leaders, which leads to more messages being blocked in
KafkaProducer:RecordAccumulator:BufferPool.
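To make the failure mode concrete, here is a toy model of a bounded buffer that blocks for a limited time and then gives up. It is only a sketch: `ToyBufferPool` is a hypothetical class, not Kafka's BufferPool, and it throws `java.util.concurrent.TimeoutException` rather than Kafka's own exception type.

```scala
import java.util.concurrent.{Semaphore, TimeUnit, TimeoutException}

// Toy stand-in for a bounded producer buffer (hypothetical; NOT Kafka's BufferPool).
final class ToyBufferPool(totalBytes: Int) {
  // One permit per free byte.
  private val free = new Semaphore(totalBytes)

  // Reserve `size` bytes, waiting at most `maxBlockMs` for space to free up.
  def allocate(size: Int, maxBlockMs: Long): Unit =
    if (!free.tryAcquire(size, maxBlockMs, TimeUnit.MILLISECONDS))
      throw new TimeoutException(
        "Failed to allocate memory within the configured max blocking time")

  // Give `size` bytes back, as would happen once a batch reaches a leader.
  def release(size: Int): Unit = free.release(size)
}
```

With a 30000-byte pool and 10000-byte records, the fourth `allocate` call times out unless something calls `release` first. While no leader is available nothing is released, so a slow election makes the timeout more likely.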
In this fix, I simply enlarge the producer's buffer size to 1MB.
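As a rough illustration of what a 1MB buffer means in producer configuration terms, the sketch below builds a plain `Properties` object using the standard producer keys `buffer.memory` and `bootstrap.servers`. The object name, the helper, and the broker address are placeholders for illustration only; the test itself creates its producer through its own test utilities, which are not shown here.

```scala
import java.util.Properties

// Hypothetical helper for illustration; not the code changed by this PR.
object ProducerBufferConfig {
  // 1 MB, the enlarged buffer size described above.
  val EnlargedBufferBytes: Long = 1024L * 1024L

  // Build producer properties with the given buffer size in bytes.
  // "localhost:9092" is a placeholder broker address (an assumption).
  def producerProps(bufferBytes: Long,
                    brokers: String = "localhost:9092"): Properties = {
    val props = new Properties()
    props.put("bootstrap.servers", brokers)
    // Total bytes the producer may use to buffer records awaiting send.
    props.put("buffer.memory", bufferBytes.toString)
    props
  }
}
```

Calling `ProducerBufferConfig.producerProps(ProducerBufferConfig.EnlargedBufferBytes)` yields properties with `buffer.memory` set to 1048576.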
@guozhangwang, could you take a look and comment?
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ZoneMayor/kafka trunk-KAFKA-2837
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/kafka/pull/648.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #648
----
commit 95374147a28208d4850f6e73f714bf418935fc2d
Author: ZoneMayor <[email protected]>
Date: 2015-11-27T03:49:34Z
Merge pull request #1 from apache/trunk
merge
commit cec5b48b651a7efd3900cfa3c1fd0ab1eeeaa3ec
Author: ZoneMayor <[email protected]>
Date: 2015-12-01T10:44:02Z
Merge pull request #2 from apache/trunk
2015-12-1
commit a119d547bf1741625ce0627073c7909992a20f15
Author: ZoneMayor <[email protected]>
Date: 2015-12-04T13:42:27Z
Merge pull request #3 from apache/trunk
2015-12-04#KAFKA-2893
commit b767a8dff85fc71c75d4cf5178c3f6f03ff81bfc
Author: ZoneMayor <[email protected]>
Date: 2015-12-09T10:42:30Z
Merge pull request #5 from apache/trunk
2015-12-9
commit cd5e6f4700a4387f9383b84aca0ee9c4639b1033
Author: jinxing <[email protected]>
Date: 2015-12-09T13:49:07Z
KAFKA-2837: fix transient failure kafka.api.ProducerBounceTest >
testBrokerFailure
commit 8ded9104a04861f789a7a990c2ddd4fc38a899cd
Author: ZoneMayor <[email protected]>
Date: 2015-12-10T04:47:06Z
Merge pull request #6 from apache/trunk
2015-12-10
commit 2bcf010c73923bb24bbd9cece7e39983b2bdce0c
Author: jinxing <[email protected]>
Date: 2015-12-10T04:47:39Z
KAFKA-2837: WIP
commit dae4a3cc0b564bb25121d54e65b5ad363c3e866d
Author: jinxing <[email protected]>
Date: 2015-12-10T04:48:21Z
Merge branch 'trunk-KAFKA-2837' of https://github.com/ZoneMayor/kafka into
trunk-KAFKA-2837
commit 7118e11813e445bca3eab65a23028e76138b136a
Author: jinxing <[email protected]>
Date: 2015-12-10T04:51:43Z
KAFKA-2837: WIP
commit 310dd6b34547b52aad21a35dcf631bda3e15ab64
Author: jinxing <[email protected]>
Date: 2015-12-11T03:43:32Z
KAFKA-2837: WIP
----