[ https://issues.apache.org/jira/browse/KAFKA-16474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837342#comment-17837342 ]
Philip Nee commented on KAFKA-16474: ------------------------------------ The log was attached: See line 7481 and 7492 in : {code:java} AssignmentValidationTest/test_valid_assignment/metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer.group_remote_assignor=uniform/2/VerifiableConsumer-0-281473320420544/ducker11/verifiable_consumer.log {code} {code:java} 7480 [2024-04-15 16:03:35,964] TRACE [Consumer clientId=consumer-test_group_id-1, groupId=test_group_id] Found least loaded node ducker10:9092 (id: 2 rack: null) connected with no in-flight requests (org.apache.kafka.clients.NetworkClient) 7481 [2024-04-15 16:03:35,964] DEBUG [Consumer clientId=consumer-test_group_id-1, groupId=test_group_id] Sending CONSUMER_GROUP_HEARTBEAT request with header RequestHeader(apiKey=CONSUMER_GROUP_HEARTBEAT, apiVersion=0, clientId=consumer-test_gro up_id-1, correlationId=108, headerVersion=2) and timeout 30000 to node 2147483646: ConsumerGroupHeartbeatRequestData(groupId='test_group_id', memberId='', memberEpoch=0, instanceId=null, rackId=null, rebalanceTimeoutMs=300000, subscribedTop icNames=[test_topic], serverAssignor='uniform', topicPartitions=[]) (org.apache.kafka.clients.NetworkClient) 7482 [2024-04-15 16:03:35,964] TRACE For telemetry state SUBSCRIPTION_NEEDED, returning the value 299843 ms; (org.apache.kafka.common.telemetry.internals.ClientTelemetryReporter) 7483 [2024-04-15 16:03:35,964] TRACE [Consumer clientId=consumer-test_group_id-1, groupId=test_group_id] Starting processing of 1 event (org.apache.kafka.clients.consumer.internals.events.EventProcessor) 7484 [2024-04-15 16:03:35,964] TRACE [Consumer clientId=consumer-test_group_id-1, groupId=test_group_id] Processing event: ValidatePositionsEvent{type=VALIDATE_POSITIONS, id=nMQf3YgtRU6vaJu-oVpiQg, future=java.util.concurrent.CompletableFuture@5 7bc27f5[Not completed, 1 dependents], deadlineMs=9223372036854775807} (org.apache.kafka.clients.consumer.internals.events.EventProcessor) 7485 [2024-04-15 16:03:35,964] TRACE [Consumer clientId=consumer-test_group_id-1, groupId=test_group_id] Completed processing (org.apache.kafka.clients.consumer.internals.events.EventProcessor) 7486 [2024-04-15 16:03:35,964] TRACE [Consumer clientId=consumer-test_group_id-1, groupId=test_group_id] 999 ms remain before another request should be sent for RequestState{owner='org.apache.kafka.clients.consumer.internals.HeartbeatRequestMana ger$HeartbeatRequestState', exponentialBackoff=ExponentialBackoff{multiplier=2, expMax=3.3219280948873626, initialInterval=100, jitter=300000.0}, lastSentMs=1713197015963, lastReceivedMs=1713197015963, numAttempts=1, backoffMs=1000} (org.ap ache.kafka.clients.consumer.internals.RequestState) 7487 [2024-04-15 16:03:35,964] TRACE [Consumer clientId=consumer-test_group_id-1, groupId=test_group_id] Polling for fetches with timeout 0 (org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer) 7488 [2024-04-15 16:03:35,964] TRACE [Consumer clientId=consumer-test_group_id-1, groupId=test_group_id] Found least loaded node ducker10:9092 (id: 2 rack: null) connected with no in-flight requests (org.apache.kafka.clients.NetworkClient) 7489 [2024-04-15 16:03:35,964] TRACE [Consumer clientId=consumer-test_group_id-1, groupId=test_group_id] Enqueued event: PollEvent{type=POLL, id=T9IwUDeHQoGrnqJJIyzb8Q, pollTimeMs=1713197015964} (org.apache.kafka.clients.consumer.internals.event s.ApplicationEventHandler) 7490 [2024-04-15 16:03:35,964] TRACE [Consumer clientId=consumer-test_group_id-1, groupId=test_group_id] No events to process (org.apache.kafka.clients.consumer.internals.events.EventProcessor) 7491 [2024-04-15 16:03:35,964] TRACE [Consumer clientId=consumer-test_group_id-1, groupId=test_group_id] Enqueued event: ValidatePositionsEvent{type=VALIDATE_POSITIONS, id=otKgqakkS9COd01SkJ3btQ, future=java.util.concurrent.CompletableFuture@5fb 759d6[Not completed], deadlineMs=9223372036854775807} (org.apache.kafka.clients.consumer.internals.events.ApplicationEventHandler) 7492 [2024-04-15 16:03:35,964] DEBUG [Consumer clientId=consumer-test_group_id-1, groupId=test_group_id] Sending CONSUMER_GROUP_HEARTBEAT request with header RequestHeader(apiKey=CONSUMER_GROUP_HEARTBEAT, apiVersion=0, clientId=consumer-test_gro up_id-1, correlationId=109, headerVersion=2) and timeout 30000 to node 2147483646: ConsumerGroupHeartbeatRequestData(groupId='test_group_id', memberId='', memberEpoch=0, instanceId=null, rackId=null, rebalanceTimeoutMs=300000, subscribedTop icNames=[test_topic], serverAssignor='uniform', topicPartitions=[]) (org.apache.kafka.clients.NetworkClient) {code} > AsyncKafkaConsumer might send out heartbeat request without waiting for its > response > ------------------------------------------------------------------------------------ > > Key: KAFKA-16474 > URL: https://issues.apache.org/jira/browse/KAFKA-16474 > Project: Kafka > Issue Type: Bug > Components: clients, consumer > Reporter: Philip Nee > Assignee: Lianet Magrans > Priority: Critical > Labels: kip-848-client-support > Fix For: 3.8.0 > > Attachments: failing_results.zip > > > KAFKA-16389 > We've discovered that in some uncommon cases, the consumer could send out > successive heartbeats without waiting for the response to come back. this > might result in causing the consumer to revoke its just assigned assignments > in some cases. For example: > > The consumer first sends out a heartbeat with epoch=0 and memberId='' > The consumer then rapidly sends out another heartbeat with epoch=0 and > memberId='' because it has not gotten any response and thus not updating its > local state > > The consumer receives assignments from the first heartbeat and reconciles its > assignment. > > Since the second heartbeat has epoch=0 and memberId='', the server will think > this is a new member joining and therefore send out an empty assignment. > > The consumer receives the response from the second heartbeat. Revoke all of > its partitions. > > There are 2 issues associate with this bug: > # inflight logic > # rapid poll: In the KAFKA-16389 we've observe consumer polling interval to > be a few ms. -- This message was sent by Atlassian Jira (v8.20.10#820010)