[ https://issues.apache.org/jira/browse/KAFKA-8677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias J. Sax reopened KAFKA-8677:
------------------------------------

Reopening this ticket. Test failed again: [https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/4226/testReport/junit/kafka.api/GroupEndToEndAuthorizationTest/testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl/]

{code:java}
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'responses': Error reading array of size 131085, only 28 bytes available
    at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:110)
    at org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:313)
    at org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:719)
    at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:833)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:556)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
    at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1306)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1246)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1214)
    at kafka.utils.TestUtils$.pollUntilAtLeastNumRecords(TestUtils.scala:795)
    at kafka.utils.TestUtils$.consumeRecords(TestUtils.scala:1351)
    at kafka.api.EndToEndAuthorizationTest.consumeRecords(EndToEndAuthorizationTest.scala:537)
    at kafka.api.EndToEndAuthorizationTest.consumeRecordsIgnoreOneAuthorizationException(EndToEndAuthorizationTest.scala:556)
    at kafka.api.EndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl(EndToEndAuthorizationTest.scala:376)
{code}

> Flakey test GroupEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl
> -------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-8677
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8677
>             Project: Kafka
>          Issue Type: Bug
>          Components: core, security, unit tests
>    Affects Versions: 2.4.0
>            Reporter: Boyang Chen
>            Assignee: Guozhang Wang
>            Priority: Blocker
>              Labels: flaky-test
>             Fix For: 2.4.0
>
>
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/6325/console]
>
> *18:43:39* kafka.api.GroupEndToEndAuthorizationTest > testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl STARTED
> *18:44:00* kafka.api.GroupEndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl failed, log available in /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk11-scala2.12/core/build/reports/testOutput/kafka.api.GroupEndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl.test.stdout
> *18:44:00* kafka.api.GroupEndToEndAuthorizationTest > testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl FAILED
> *18:44:00* org.scalatest.exceptions.TestFailedException: Consumed 0 records before timeout instead of the expected 1 records
>
> ---------------------------
>
> I found that this flaky test is actually exposing a real bug in the consumer: within {{KafkaConsumer.poll}}, we have an optimization that tries to send the next fetch request before returning the data, in order to pipeline the fetch requests:
> {code}
> if (!records.isEmpty()) {
>     // before returning the fetched records, we can send off the next round of fetches
>     // and avoid block waiting for their responses to enable pipelining while the user
>     // is handling the fetched records.
>     //
>     // NOTE: since the consumed position has already been updated, we must not allow
>     // wakeups or any other errors to be triggered prior to returning the fetched records.
>     if (fetcher.sendFetches() > 0 || client.hasPendingRequests()) {
>         client.pollNoWakeup();
>     }
>     return this.interceptors.onConsume(new ConsumerRecords<>(records));
> }
> {code}
> As the NOTE says, this pollNoWakeup call should NOT throw any exceptions, because at this point the fetch position has already been updated. If an exception is thrown here and the caller decides to catch it and continue, those records will never be returned again, causing data loss.
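To make the failure mode described above concrete, here is a minimal, hypothetical sketch of the caller pattern the quoted description warns about: a poll loop that catches exceptions and keeps going. It is not taken from the test or from Kafka's own code; the broker address, group id, class name, and topic name are made up for illustration.

{code:java}
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.KafkaException;

public class CatchAndContinueLoop {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker
        props.put("group.id", "demo-group");              // hypothetical group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic")); // hypothetical topic
            while (true) {
                try {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                    }
                } catch (KafkaException e) {
                    // Catch-and-continue: if the exception escaped from the pipelined
                    // pollNoWakeup() after the consumed position was already advanced,
                    // the records fetched in that round were never returned, and the
                    // next poll() resumes from the advanced position, so that batch is
                    // silently skipped.
                    System.err.println("poll failed, continuing: " + e);
                }
            }
        }
    }
}
{code}

The sketch only illustrates the caller-side pattern; whether records are actually lost depends on the exception escaping after the consumed position has been updated, which is exactly the situation the NOTE in the quoted snippet guards against.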