Tony Reix created KAFKA-1970:
--------------------------------

             Summary: Several tests are not stable
                 Key: KAFKA-1970
                 URL: https://issues.apache.org/jira/browse/KAFKA-1970
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 0.8.1
         Environment: Several:
                       - RHEL7.1/x86_64 <=== My reference
                       - RHEL7.0/x86_64
                       - Ubuntu/x86_64
                       - RHEL7.1/PPC64LE
                       - RHEL7.0/PPC64BE
                       - OpenJDK 1.7
                       - IBM JVM 1.7
            Reporter: Tony Reix

I'm porting Kafka 0.8.1 to RHEL 7.1/PPC64LE. Since the tests looked unstable, I ran them on several environments in order to get a wider view.

I'm using:
 - ./gradlew build -x signArchives
or:
 - ./gradlew test -x signArchives

Results seem to show:
 - Tests are unstable everywhere (very few failures on your Ubuntu/x86_64 test env)
 - IBM JVM shows some more issues than OpenJDK
 - Sometimes, tests are not launched, for no apparent reason. But not on my reference environment (RHEL7.1/x86_64/OpenJDK)

 - OpenJDK: test run results:
   - dorado-vm2 - RHEL7.0/x86_64:
     - 238 tests completed, 82 failed
     - 238 tests completed, 94 failed
     - BUILD SUCCESSFUL
   - dorado-vm3 - Ubuntu/x86_64:
     - BUILD SUCCESSFUL
   - soe01x - RHEL7.1/x86_64:
     - 238 tests completed, 4 failed (x 2 times)
   - soe07-vm1 - RHEL7.1/PPC64LE:
     - BUILD SUCCESSFUL
     - 238 tests completed, 2 failed
     - 238 tests completed, 3 failed

 - IBM JVM: test run results:
   - dorado-vm2 - RHEL7.0/x86_64:
     - 1 failed + Tests Blocked
     - BUILD SUCCESSFUL
   - soe01x - RHEL7.1/x86_64:
     - 238 tests completed, 6 failed
     - 238 tests completed, 4 failed (x 3 times)
     - 238 tests completed, 5 failed
   - soe07-vm1 - RHEL7.1/PPC64LE:
     - 238 tests completed, 1 failed
     - BUILD SUCCESSFUL
   - laurel6 - RHEL7.0/PPC64BE:
     - 238 tests completed, 1 failed
     - BUILD SUCCESSFUL

=========================================================

I think that these tests are unstable:
  kafka.server.LogRecoveryTest > testHWCheckpointNoFailuresMultipleLogSegments
  kafka.server.LogRecoveryTest > testHWCheckpointWithFailuresMultipleLogSegments
  kafka.admin.DeleteTopicTest > testAutoCreateAfterDeleteTopic
  kafka.admin.DeleteTopicTest > testPreferredReplicaElectionDuringDeleteTopic
  kafka.server.RequestPurgatoryTest > testRequestExpiry

=========================================================

These tests fail often (always on my reference environment (RHEL7.1/x86_64/OpenJDK), but not on Ubuntu):
  kafka.server.LogOffsetTest > testEmptyLogsGetOffsets
  kafka.server.LogOffsetTest > testGetOffsetsBeforeLatestTime
  kafka.server.LogOffsetTest > testGetOffsetsBeforeEarliestTime
  kafka.server.LogOffsetTest > testGetOffsetsBeforeNow

=========================================================

As an example of random failures, on my reference environment (RHEL7.1/x86_64/OpenJDK), the test:
  kafka.server.LogRecoveryTest > testHWCheckpointNoFailuresMultipleLogSegments
failed 2 times out of 12.

=========================================================

On Ubuntu/x86_64/OpenJDK, out of 3 runs of:
  ./gradlew test -x signArchives
I've got:
 - 3 full successes
 - 1 launch that did NOT run the tests

=========================================================

Still on x86_64/OpenJDK, I'm surprised to always get 4 failures with RHEL 7.1 and none on Ubuntu. Some issue within RHEL 7.1 and/or Java?

=========================================================

On RHEL 7.1 / PPC64LE / IBM JVM, I see wide instability. I've done 12 runs. Counting the "FAILED" tests in each result file with:
  for i in 1 2 10 11 12 13 14 15 16 17 18 19; do N=`grep FAILED gradlew.build.IBMJVM.res$i | wc -l`; echo $i":"$N; done
and doing the same for "PASSED" tests gave:

  Run   FAILED   PASSED
    1        3      372
    2        0      238
   10        0        0
   11        3      372
   12        4      236
   13       49      191
   14        3      237
   15        4      236
   16        0      238
   17        0        0
   18        0        0
   19        0        0

Showing:
 - 4 launches did NOT run the tests
 - 2 launches were SUCCESSFUL
 - for the others, there were 3, 4 or 49 FAILED tests

=========================================================

Conclusions:
I think that it would be useful for the Kafka project:
 - to run tests with IBM JVM in addition to OpenJDK.
 - to run tests on a Linux distribution other than Ubuntu: RHEL.
 - to check (by running it many times) that the following test is stable in your standard test environment:
   kafka.server.LogRecoveryTest > testHWCheckpointNoFailuresMultipleLogSegments

On my side, there are other causes of instability in my specific environments (PPC64) that I have to study.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
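
The per-run tally described in the report (FAILED count, PASSED count, and whether the tests ran at all) could be produced in one pass with something like the sketch below. It assumes, as the grep in the report implies, that each saved result file contains one line per test with the word FAILED or PASSED; the function name classify_run and the verdict strings are made up for illustration.

```shell
#!/bin/sh
# classify_run FILE
# Prints "<file>: <failed> failed, <passed> passed -> <verdict>".
# Hypothetical helper: assumes one FAILED/PASSED line per test in FILE.
classify_run() {
    file=$1
    if [ ! -r "$file" ]; then
        echo "$file: missing"
        return 1
    fi
    # grep -c prints the number of matching lines (0 when there are none).
    failed=$(grep -c FAILED "$file")
    passed=$(grep -c PASSED "$file")
    if [ "$failed" -eq 0 ] && [ "$passed" -eq 0 ]; then
        verdict="tests not run"
    elif [ "$failed" -eq 0 ]; then
        verdict="success"
    else
        verdict="has failures"
    fi
    echo "$file: $failed failed, $passed passed -> $verdict"
}

# Classify every result file given on the command line (no-op without arguments).
for f in "$@"; do
    classify_run "$f"
done
```

Saved under a name such as classify.sh (hypothetical), it would be invoked as: sh classify.sh gradlew.build.IBMJVM.res*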