[ 
https://issues.apache.org/jira/browse/KAFKA-16474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837342#comment-17837342
 ] 
Philip Nee commented on KAFKA-16474:
------------------------------------

The log was attached:

See line 7481 and 7492 in :
{code:java}
AssignmentValidationTest/test_valid_assignment/metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer.group_remote_assignor=uniform/2/VerifiableConsumer-0-281473320420544/ducker11/verifiable_consumer.log
 {code}
{code:java}
7480 [2024-04-15 16:03:35,964] TRACE [Consumer 
clientId=consumer-test_group_id-1, groupId=test_group_id] Found least loaded 
node ducker10:9092 (id: 2 rack: null) connected with no in-flight requests 
(org.apache.kafka.clients.NetworkClient)
7481 [2024-04-15 16:03:35,964] DEBUG [Consumer 
clientId=consumer-test_group_id-1, groupId=test_group_id] Sending 
CONSUMER_GROUP_HEARTBEAT request with header 
RequestHeader(apiKey=CONSUMER_GROUP_HEARTBEAT, apiVersion=0, 
clientId=consumer-test_gro     up_id-1, correlationId=108, headerVersion=2) and 
timeout 30000 to node 2147483646: 
ConsumerGroupHeartbeatRequestData(groupId='test_group_id', memberId='', 
memberEpoch=0, instanceId=null, rackId=null, rebalanceTimeoutMs=300000, 
subscribedTop     icNames=[test_topic], serverAssignor='uniform', 
topicPartitions=[]) (org.apache.kafka.clients.NetworkClient)
7482 [2024-04-15 16:03:35,964] TRACE For telemetry state SUBSCRIPTION_NEEDED, 
returning the value 299843 ms;  
(org.apache.kafka.common.telemetry.internals.ClientTelemetryReporter)
7483 [2024-04-15 16:03:35,964] TRACE [Consumer 
clientId=consumer-test_group_id-1, groupId=test_group_id] Starting processing 
of 1 event (org.apache.kafka.clients.consumer.internals.events.EventProcessor)
7484 [2024-04-15 16:03:35,964] TRACE [Consumer 
clientId=consumer-test_group_id-1, groupId=test_group_id] Processing event: 
ValidatePositionsEvent{type=VALIDATE_POSITIONS, id=nMQf3YgtRU6vaJu-oVpiQg, 
future=java.util.concurrent.CompletableFuture@5     7bc27f5[Not completed, 1 
dependents], deadlineMs=9223372036854775807} 
(org.apache.kafka.clients.consumer.internals.events.EventProcessor)
7485 [2024-04-15 16:03:35,964] TRACE [Consumer 
clientId=consumer-test_group_id-1, groupId=test_group_id] Completed processing 
(org.apache.kafka.clients.consumer.internals.events.EventProcessor)
7486 [2024-04-15 16:03:35,964] TRACE [Consumer 
clientId=consumer-test_group_id-1, groupId=test_group_id] 999 ms remain before 
another request should be sent for 
RequestState{owner='org.apache.kafka.clients.consumer.internals.HeartbeatRequestMana
     ger$HeartbeatRequestState', 
exponentialBackoff=ExponentialBackoff{multiplier=2, expMax=3.3219280948873626, 
initialInterval=100, jitter=300000.0}, lastSentMs=1713197015963, 
lastReceivedMs=1713197015963, numAttempts=1, backoffMs=1000} (org.ap     
ache.kafka.clients.consumer.internals.RequestState)
7487 [2024-04-15 16:03:35,964] TRACE [Consumer 
clientId=consumer-test_group_id-1, groupId=test_group_id] Polling for fetches 
with timeout 0 (org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer)
7488 [2024-04-15 16:03:35,964] TRACE [Consumer 
clientId=consumer-test_group_id-1, groupId=test_group_id] Found least loaded 
node ducker10:9092 (id: 2 rack: null) connected with no in-flight requests 
(org.apache.kafka.clients.NetworkClient)
7489 [2024-04-15 16:03:35,964] TRACE [Consumer 
clientId=consumer-test_group_id-1, groupId=test_group_id] Enqueued event: 
PollEvent{type=POLL, id=T9IwUDeHQoGrnqJJIyzb8Q, pollTimeMs=1713197015964} 
(org.apache.kafka.clients.consumer.internals.event     
s.ApplicationEventHandler)
7490 [2024-04-15 16:03:35,964] TRACE [Consumer 
clientId=consumer-test_group_id-1, groupId=test_group_id] No events to process 
(org.apache.kafka.clients.consumer.internals.events.EventProcessor)
7491 [2024-04-15 16:03:35,964] TRACE [Consumer 
clientId=consumer-test_group_id-1, groupId=test_group_id] Enqueued event: 
ValidatePositionsEvent{type=VALIDATE_POSITIONS, id=otKgqakkS9COd01SkJ3btQ, 
future=java.util.concurrent.CompletableFuture@5fb     759d6[Not completed], 
deadlineMs=9223372036854775807} 
(org.apache.kafka.clients.consumer.internals.events.ApplicationEventHandler)
7492 [2024-04-15 16:03:35,964] DEBUG [Consumer 
clientId=consumer-test_group_id-1, groupId=test_group_id] Sending 
CONSUMER_GROUP_HEARTBEAT request with header 
RequestHeader(apiKey=CONSUMER_GROUP_HEARTBEAT, apiVersion=0, 
clientId=consumer-test_gro     up_id-1, correlationId=109, headerVersion=2) and 
timeout 30000 to node 2147483646: 
ConsumerGroupHeartbeatRequestData(groupId='test_group_id', memberId='', 
memberEpoch=0, instanceId=null, rackId=null, rebalanceTimeoutMs=300000, 
subscribedTop     icNames=[test_topic], serverAssignor='uniform', 
topicPartitions=[]) (org.apache.kafka.clients.NetworkClient) {code}

> AsyncKafkaConsumer might send out heartbeat request without waiting for its 
> response
> ------------------------------------------------------------------------------------
>
>                 Key: KAFKA-16474
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16474
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer
>            Reporter: Philip Nee
>            Assignee: Lianet Magrans
>            Priority: Critical
>              Labels: kip-848-client-support
>             Fix For: 3.8.0
>
>         Attachments: failing_results.zip
>
>
> KAFKA-16389
> We've discovered that in some uncommon cases, the consumer could send out 
> successive heartbeats without waiting for the response to come back.  this 
> might result in causing the consumer to revoke its just assigned assignments 
> in some cases.  For example:
>  
> The consumer first sends out a heartbeat with epoch=0 and memberId='' 
> The consumer then rapidly sends out another heartbeat with epoch=0 and 
> memberId='' because it has not gotten any response and thus not updating its 
> local state
>  
> The consumer receives assignments from the first heartbeat and reconciles its 
> assignment.
>  
> Since the second heartbeat has epoch=0 and memberId='', the server will think 
> this is a new member joining and therefore send out an empty assignment.  
>  
> The consumer receives the response from the second heartbeat.  Revoke all of 
> its partitions.
>  
> There are 2 issues associate with this bug:
>  # inflight logic
>  # rapid poll: In the KAFKA-16389 we've observe consumer polling interval to 
> be a few ms.  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to