[ https://issues.apache.org/jira/browse/KAFKA-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350725#comment-14350725 ]
Ewen Cheslack-Postava commented on KAFKA-1501:
----------------------------------------------

Uploaded a WIP patch. It gets rid of choosePorts entirely and makes the tests work using random ports instead, for both ZK and Kafka. A couple of notes:

1. One change this necessitated is that a number of things that used to be initialized during test class construction now have to be created dynamically, since you can't generate the Kafka configs until you know the ZK port. This has two impacts. First, KafkaServerTestHarness subclasses now have to override a generateConfigs() method rather than just overriding the configs field. Second, the minimal patch to make this work keeps some data (info about ZK, the list of configs) accessible with field-like syntax (no ()), but I think this might just be misleading or confusing to people writing tests. Something like getConfigs() might make it clearer that the value is only valid while a test is running.

2. A few tests were specifying ports directly instead of using choosePorts. I think I found them all, but it'd be good to have a couple more eyes looking for them.

3. Tests that bounce brokers became more difficult because a broker's port changes when it restarts. In most cases this isn't a problem; you just need to make sure you instantiate producers/consumers at the right time. However, one test (ProducerFailureHandlingTest.testBrokerFailure) revealed an underlying issue. If you bounce the brokers too quickly, then because of the way the new producer gets metadata, it can get stuck with old metadata in which none of the brokers are listening on the ports it has. I included a patch which in theory should address the problem, but the producer also has an issue where connection requests sometimes take a long time to finish, and during that time all the brokers bounce, leaving the producer with no useful addresses in its copy of the metadata.
   In practice you would never bounce your servers to new addresses that quickly, so this is purely an artifact of having to use random ports during tests. If anyone has suggestions for how to handle this, I'm all ears. In order to allow testing the rest of the patch, I commented out that test for the time being.

I wanted to get this up so we can discuss these issues, but also so [~guozhang] can test this to verify the approach will work before I spend much more time on it. I tested it a few times with 5 copies of the tests running concurrently.

> transient unit tests failures due to port already in use
> --------------------------------------------------------
>
>                 Key: KAFKA-1501
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1501
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>            Reporter: Jun Rao
>            Assignee: Guozhang Wang
>              Labels: newbie
>         Attachments: KAFKA-1501-choosePorts.patch, KAFKA-1501.patch,
>                      KAFKA-1501.patch, KAFKA-1501.patch, KAFKA-1501.patch, test-100.out,
>                      test-100.out, test-27.out, test-29.out, test-32.out, test-35.out,
>                      test-38.out, test-4.out, test-42.out, test-45.out, test-46.out, test-51.out,
>                      test-55.out, test-58.out, test-59.out, test-60.out, test-69.out, test-72.out,
>                      test-74.out, test-76.out, test-84.out, test-87.out, test-91.out, test-92.out
>
>
> Saw the following transient failures.
> kafka.api.ProducerFailureHandlingTest > testTooLargeRecordWithAckOne FAILED
>     kafka.common.KafkaException: Socket server failed to bind to localhost:59909: Address already in use.
>         at kafka.network.Acceptor.openServerSocket(SocketServer.scala:195)
>         at kafka.network.Acceptor.<init>(SocketServer.scala:141)
>         at kafka.network.SocketServer.startup(SocketServer.scala:68)
>         at kafka.server.KafkaServer.startup(KafkaServer.scala:95)
>         at kafka.utils.TestUtils$.createServer(TestUtils.scala:123)
>         at kafka.api.ProducerFailureHandlingTest.setUp(ProducerFailureHandlingTest.scala:68)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
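The "Address already in use" failure in the quoted issue, and the move away from choosePorts, both come down to when a port is actually reserved. The Java sketch below is illustrative only and is not code from the patch. It assumes that "random ports" means letting the OS assign an ephemeral port by binding to port 0, and that a choosePorts-style helper probes for a free port and releases it before the server binds. The class and method names (RandomPortDemo, choosePortThenRelease, bindEphemeral, ephemeralPortsAreDistinct, raceCausesBindFailure) are hypothetical; only java.net.ServerSocket is a real API here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;

public class RandomPortDemo {
    private static final InetAddress LOOPBACK = InetAddress.getLoopbackAddress();
    private static final int BACKLOG = 50;

    // Race-prone pattern (assumed to approximate a choosePorts-style helper):
    // let the OS pick a free port, then close the probe socket and return the number.
    // Nothing reserves the port between this close() and the server's later bind().
    static int choosePortThenRelease() {
        try (ServerSocket probe = new ServerSocket(0, BACKLOG, LOOPBACK)) {
            return probe.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Race-free pattern: the server binds to port 0 itself and keeps the socket open,
    // so the OS-assigned port stays reserved for as long as the server runs.
    static ServerSocket bindEphemeral() {
        try {
            return new ServerSocket(0, BACKLOG, LOOPBACK);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Two servers bound at the same time get distinct, nonzero ports.
    static boolean ephemeralPortsAreDistinct() {
        try (ServerSocket a = bindEphemeral(); ServerSocket b = bindEphemeral()) {
            return a.getLocalPort() > 0 && b.getLocalPort() > 0
                    && a.getLocalPort() != b.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reproduces the reported failure deterministically: a "thief" socket takes the
    // pre-chosen port first, so the server's own bind fails with a BindException
    // ("Address already in use").
    static boolean raceCausesBindFailure() {
        int chosen = choosePortThenRelease();
        try (ServerSocket thief = new ServerSocket(chosen, BACKLOG, LOOPBACK)) {
            try (ServerSocket server = new ServerSocket(chosen, BACKLOG, LOOPBACK)) {
                return false; // the second bind unexpectedly succeeded
            } catch (BindException e) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        try (ServerSocket broker = bindEphemeral()) {
            // The port is only known after binding, so anything derived from it
            // (e.g. a config other servers read) must be built after this point.
            System.out.println("bound to ephemeral port " + broker.getLocalPort());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        System.out.println("distinct ephemeral ports: " + ephemeralPortsAreDistinct());
        System.out.println("pre-chosen port stolen, bind failed: " + raceCausesBindFailure());
    }
}
```

Because a server's port is only known once it has bound, anything derived from it (such as a broker config that points at the ZK port) has to be built after startup rather than at test class construction, which is why the harness needs something like generateConfigs() instead of a plain field. The same property means a restarted server will usually come back on a different port, which is what exposes the stale-metadata problem in testBrokerFailure.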