[ https://issues.apache.org/jira/browse/KAFKA-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093658#comment-14093658 ]
Joe Stein commented on KAFKA-1589:
----------------------------------

This would be a great contribution. I would also ask for documentation for these scripts. There are a lot of nooks and crannies in the system_test directory, and a bunch of gems, but once folks go in there they hit errors and turn away (so I have seen), e.g.

vagrant@precise64:~$ python kafka-0.8.2-SNAPSHOT-src/system_test/utils/metrics.py
Traceback (most recent call last):
  File "kafka-0.8.2-SNAPSHOT-src/system_test/utils/metrics.py", line 34, in <module>
    import matplotlib as mpl
ImportError: No module named matplotlib

Not that items like this can't be corrected with steps and effort, but it isn't a good "out of the box" experience; we could do it differently. I think that with a more communicable "how to" and better ease of use, more folks in the community will latch on to /system_test/ and make the tests part of their cycles in their environments. I think this also goes to the root cause of the pain described here.

> Strengthen System Tests
> -----------------------
>
>                 Key: KAFKA-1589
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1589
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Guozhang Wang
>             Fix For: 0.9.0
>
>
> Although the system test code is also part of the open source repository, not
> much attention is paid to this module today. The result is that we keep
> breaking the system tests, either with changes to the admin tools or with
> library upgrades that change APIs, such as Zookeeper. And when the system
> tests break / hang / etc., it is also hard to debug the issue. We need to
> treat the system test suite just as part of the open source code.
> Based on my personal experience troubleshooting system tests, I would
> propose doing at least the following enhancements around system tests.
> 1. Add unit tests for all system test util tools, for example:
>   kafka_system_test_utils.get_controller_attributes
>   kafka_system_test_utils.get_leader_for
> 2. Add exception handling logic in the python test framework to clean up the
> testbed upon failures, so that subsequent test cases will not be affected.
> 3. Remove timing-based mechanisms such as "sleep(5000) to wait for metadata to
> be propagated" as much as possible to avoid transient failures.
> After those enhancements, we should probably also pick a very small subset
> (say one from each suite) of the system test cases into the patch reviewing
> process along with the unit tests.
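
To make items 2 and 3 of the quoted proposal a bit more concrete, here is a minimal sketch, assuming Python 3. The names wait_until and run_test_case are hypothetical and are not part of system_test. The first helper polls a condition with a deadline instead of sleeping for a fixed time; the second always runs the testbed clean-up, even when the test body raises.

```python
import time


def wait_until(condition, timeout_sec=30.0, poll_interval_sec=0.5):
    """Poll condition() until it is truthy or timeout_sec has elapsed.

    Hypothetical helper (not part of system_test). Returns True if the
    condition was met, False if the deadline passed first.
    """
    deadline = time.monotonic() + timeout_sec
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_sec)


def run_test_case(setup, body, cleanup):
    """Run one test case and always clean up the testbed.

    Hypothetical helper (not part of system_test). setup() is inside the
    try block so that a partially built testbed is also torn down; this
    assumes cleanup() tolerates a partial setup.
    """
    try:
        setup()
        body()
    finally:
        # Runs on success and on failure; any exception from setup() or
        # body() still propagates to the caller afterwards.
        cleanup()
```

In place of a fixed sleep, the condition passed to wait_until could wrap whatever metadata check the test already performs, with the test failing explicitly when wait_until returns False.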