[ https://issues.apache.org/jira/browse/KAFKA-4558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15800434#comment-15800434 ]
Ewen Cheslack-Postava commented on KAFKA-4558:
----------------------------------------------

The number of partitions probably isn't reliable, since some consumers could end up with 0 assigned partitions (unless we always make sure tests have enough partitions to go around). Maybe just a metric indicating the consumer state, i.e. joining, member, etc.?

This is actually kind of tricky because we don't just need the partitions assigned; we also need to make sure that we've looked up and set the fetch offsets in the consumer. This might require per-assigned-partition offset information -- maybe have a metric for the list of assigned partitions and then a metric for the offset (or lag) for each of them?

Given some issues we've seen with other tests, I think there are others that have the same requirement. I'm fine with {{@ignore}}ing some tests if people think that's valuable, but I think it'd be better to just try to get the fix in asap, since it'll be quite a bit of effort to evaluate all the tests using ProduceConsumeValidate -- I see at least 13 or so in kafkatest.

> throttling_test fails if the producer starts too fast.
> ------------------------------------------------------
>
>                 Key: KAFKA-4558
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4558
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Apurva Mehta
>            Assignee: Apurva Mehta
>
> As described in https://issues.apache.org/jira/browse/KAFKA-4526, the
> throttling test will fail if the producer in the produce-consume-validate
> loop starts up before the consumer is fully initialized.
> We need to block the start of the producer until the consumer is ready to go.
> The current plan is to poll the consumer for a particular metric (like, for
> instance, partition assignment) which will act as a good proxy for successful
> initialization. Currently, we just check for the existence of a process with
> the PID, which is not a strong enough check, causing the test to fail
> intermittently.
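
The readiness condition discussed in the comment (group membership plus a known fetch position for every assigned partition, with an empty assignment still counting as ready) could be sketched roughly as below. This is only an illustration, not the fix that went into kafkatest: the `state` snapshot dict, its `group_state` and `positions` keys, and the `consumer_ready` and `poll_until` helpers are all hypothetical names, and how such a snapshot would actually be obtained from the consumer (e.g. via metrics) is left open.

```python
import time


def consumer_ready(state):
    """Return True once the consumer looks fully initialized.

    `state` is a hypothetical snapshot of the consumer, for example:
        {"group_state": "member", "positions": {"topic-0": 42, "topic-1": None}}
    where `positions` maps each assigned partition to its fetch offset,
    or None if the offset has not been looked up yet.
    """
    # The consumer must have finished joining the group.
    if state.get("group_state") != "member":
        return False
    # Every assigned partition needs a fetch position. An empty assignment
    # passes, since a consumer may legitimately get 0 partitions.
    return all(pos is not None for pos in state.get("positions", {}).values())


def poll_until(condition, timeout_sec, backoff_sec=0.1):
    """Poll `condition` until it returns True or `timeout_sec` elapses.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout_sec
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(backoff_sec)
```

Under these assumptions, a test would start the producer only after something like `poll_until(lambda: consumer_ready(fetch_state()), timeout_sec=30)` returns True, where `fetch_state` is likewise a hypothetical callable that reads the current snapshot.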

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)