[ https://issues.apache.org/jira/browse/KAFKA-4558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15797404#comment-15797404 ]
Ewen Cheslack-Postava commented on KAFKA-4558:
----------------------------------------------

[~apurva] I'm thinking it'd be a lot easier to handle this type of issue if we forced system tests to use a custom MetricsReporter that exposes metrics via HTTP. Right now it's a pain to grab metrics because the JMX reporter isn't easily accessible to the test driver app written in python. If we provided an alternative that gave access to metrics via HTTP, we'd have a really easy way to validate metrics from system tests.

To fix this specific problem we might still need to add another metric to the consumer, but as I looked into how to get the metrics required in the system tests, it seemed really painful in its current form. This would probably also simplify some other tests currently relying on the {{JmxMixin}} class in the system tests. What do you think?

We could add this in the test binaries for now to avoid any KIPs, dependency issues, etc, though I suspect we might eventually want to graduate it to its own module as many folks might find it useful to be able to just ping a URL periodically and collect all metrics data from a Kafka process.

> throttling_test fails if the producer starts too fast.
> ------------------------------------------------------
>
>                 Key: KAFKA-4558
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4558
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Apurva Mehta
>            Assignee: Apurva Mehta
>
> As described in https://issues.apache.org/jira/browse/KAFKA-4526, the
> throttling test will fail if the producer in the produce-consume-validate
> loop starts up before the consumer is fully initialized.
> We need to block the start of the producer until the consumer is ready to go.
> The current plan is to poll the consumer for a particular metric (like, for
> instance, partition assignment) which will act as a good proxy for successful
> initialization. Currently, we just check for the existence of a process with
> the PID, which is not a strong enough check, causing the test to fail
> intermittently.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
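
For illustration only, here is a rough sketch of what the test-driver side of the idea above might look like if metrics were reachable over HTTP. Everything specific in it is an assumption rather than anything that exists in Kafka or its system tests: the helper names `fetch_metrics` and `wait_for_metric`, the endpoint URL, and the response format (a flat JSON object mapping metric names to values) are all hypothetical.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_metrics(url, timeout_sec=2.0):
    """Return the metrics served at `url` as a dict, or {} if unavailable.

    ASSUMPTION: the (hypothetical) reporter serves a flat JSON object
    mapping metric names to values.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_sec) as resp:
            data = json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Process not up yet, connection refused, or malformed body.
        return {}
    return data if isinstance(data, dict) else {}


def wait_for_metric(fetch, name, predicate, timeout_sec=30.0, backoff_sec=0.5):
    """Poll `fetch()` until predicate(metrics[name]) holds.

    Returns True once the metric satisfies the predicate, or False if
    `timeout_sec` elapses first. `fetch` is any zero-argument callable
    returning a dict, which keeps the polling loop independent of HTTP.
    """
    deadline = time.monotonic() + timeout_sec
    while True:
        metrics = fetch()
        if name in metrics and predicate(metrics[name]):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(backoff_sec)
```

A test could then hold back the producer with something like `wait_for_metric(lambda: fetch_metrics("http://worker1:8080/metrics"), "assigned-partitions", lambda v: v > 0)`, where the host, port, path, and metric name are placeholders. Polling a metric that only becomes non-zero after partition assignment would be a stronger readiness signal than checking that a PID exists.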