[ https://issues.apache.org/jira/browse/KAFKA-8052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rajini Sivaram resolved KAFKA-8052. ----------------------------------- Resolution: Fixed Reviewer: Jason Gustafson Fix Version/s: 2.3.0 > Intermittent INVALID_FETCH_SESSION_EPOCH error on FETCH request > ---------------------------------------------------------------- > > Key: KAFKA-8052 > URL: https://issues.apache.org/jira/browse/KAFKA-8052 > Project: Kafka > Issue Type: Bug > Components: clients > Affects Versions: 2.1.0 > Reporter: Bartek Jakub > Assignee: Rajini Sivaram > Priority: Major > Fix For: 2.3.0 > > > I noticed in my logs some weird behavior. I see in logs intermittent log: > {noformat} > 2019-03-06 14:02:13.024 INFO 1 --- [container-1-C-1] > o.a.kafka.clients.FetchSessionHandler : [Consumer clientId=consumer-4, > groupId=service-main] Node 2 was unable to process the fetch request with > (sessionId=1321134604, epoch=125730): INVALID_FETCH_SESSION_EPOCH.{noformat} > which happens every ~1 hour. > > I was wondering if it's my Kafka provider fault so I decided to investigate > the problem and I tried to reproduce the issue on my local - with success. My > configuration is: > * Kafka Clients version - 2.0.1 > * Kafka - 2.12_2.1.0 > > I enabled trace logs for 'org.apache.kafka.clients' and that's what I get: > {noformat} > 2019-03-05 21:04:16.161 DEBUG 3052 --- [container-0-C-1] > o.a.kafka.clients.FetchSessionHandler : [Consumer clientId=consumer-3, > groupId=service-main] Built incremental fetch (sessionId=197970881, > epoch=525) for node 1001. Added (), altered (), removed () out of > (itunes-command-19, itunes-command-18, itunes-command-11, itunes-command-10, > itunes-command-13, itunes-command-12, itunes-command-15, itunes-command-14, > itunes-command-17, itunes-command-16) > 2019-03-05 21:04:16.161 DEBUG 3052 --- [container-0-C-1] > o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=consumer-3, > groupId=service-main] Sending READ_UNCOMMITTED > IncrementalFetchRequest(toSend=(), toForget=(), implied=(itunes-command-19, > itunes-command-18, itunes-command-11, itunes-command-10, itunes-command-13, > itunes-command-12, itunes-command-15, itunes-command-14, itunes-command-17, > itunes-command-16)) to broker localhost:9092 (id: 1001 rack: null) > 2019-03-05 21:04:16.161 TRACE 3052 --- [container-0-C-1] > org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, > groupId=service-main] Sending FETCH > {replica_id=-1,max_wait_time=500,min_bytes=1,max_bytes=52428800,isolation_level=0,session_id=197970881,epoch=525,topics=[],forgotten_topics_data=[]} > with correlation id 629 to node 1001 > 2019-03-05 21:04:16.664 TRACE 3052 --- [container-0-C-1] > org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, > groupId=service-main] Completed receive from node 1001 for FETCH with > correlation id 629, received > {throttle_time_ms=0,error_code=0,session_id=197970881,responses=[]} > 2019-03-05 21:04:16.664 DEBUG 3052 --- [container-0-C-1] > o.a.kafka.clients.FetchSessionHandler : [Consumer clientId=consumer-3, > groupId=service-main] Node 1001 sent an incremental fetch response for > session 197970881 with response=(), implied=(itunes-command-19, > itunes-command-18, itunes-command-11, itunes-command-10, itunes-command-13, > itunes-command-12, itunes-command-15, itunes-command-14, itunes-command-17, > itunes-command-16) > 2019-03-05 21:04:16.665 DEBUG 3052 --- [container-0-C-1] > o.a.kafka.clients.FetchSessionHandler : [Consumer clientId=consumer-3, > groupId=service-main] Built incremental fetch (sessionId=197970881, > epoch=526) for node 1001. Added (), altered (), removed () out of > (itunes-command-19, itunes-command-18, itunes-command-11, itunes-command-10, > itunes-command-13, itunes-command-12, itunes-command-15, itunes-command-14, > itunes-command-17, itunes-command-16) > 2019-03-05 21:04:16.665 DEBUG 3052 --- [container-0-C-1] > o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=consumer-3, > groupId=service-main] Sending READ_UNCOMMITTED > IncrementalFetchRequest(toSend=(), toForget=(), implied=(itunes-command-19, > itunes-command-18, itunes-command-11, itunes-command-10, itunes-command-13, > itunes-command-12, itunes-command-15, itunes-command-14, itunes-command-17, > itunes-command-16)) to broker localhost:9092 (id: 1001 rack: null) > 2019-03-05 21:04:16.665 TRACE 3052 --- [container-0-C-1] > org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, > groupId=service-main - F630] Sending FETCH > {replica_id=-1,max_wait_time=500,min_bytes=1,max_bytes=52428800,isolation_level=0,session_id=197970881,epoch=526,topics=[],forgotten_topics_data=[]} > with correlation id 630 to node 1001 > 2019-03-05 21:04:17.152 DEBUG 3052 --- [ service-main] > o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-3, > groupId=service-main - ???] Sending Heartbeat request to coordinator > localhost:9092 (id: 2147482646 rack: null) > 2019-03-05 21:04:17.153 TRACE 3052 --- [ service-main] > org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, > groupId=service-main - H631] Sending HEARTBEAT > {group_id=service-main,generation_id=108,member_id=consumer-3-7e3c718e-f472-4bd8-9404-458ae5ed9a26} > with correlation id 631 to node 2147482646 > 2019-03-05 21:04:17.254 TRACE 3052 --- [ service-main] > org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, > groupId=service-main - ???] Completed receive from node 2147482646 for > HEARTBEAT with correlation id 631, received {throttle_time_ms=0,error_code=0} > // After two seconds FETCH response has been received... > 2019-03-05 21:04:19.141 TRACE 3052 --- [ service-main] > org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, > groupId=service-main - F630] Completed receive from node 1001 for FETCH with > correlation id 630, received > {throttle_time_ms=0,error_code=0,session_id=197970881,responses=[]} > 2019-03-05 21:04:19.142 DEBUG 3052 --- [ service-main] > o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-3, > groupId=service-main] Received successful Heartbeat response > // But in the same moment, Fethcer is sending FETCH with same epoch once again > 2019-03-05 21:04:19.142 DEBUG 3052 --- [container-0-C-1] > o.a.kafka.clients.FetchSessionHandler : [Consumer clientId=consumer-3, > groupId=service-main] Built incremental fetch (sessionId=197970881, > epoch=526) for node 1001. Added (), altered (), removed (itunes-command-19, > itunes-command-18, itunes-command-11) out of (itunes-command-10, > itunes-command-13, itunes-command-12, itunes-command-15, itunes-command-14, > itunes-command-17, itunes-command-16) > 2019-03-05 21:04:19.142 DEBUG 3052 --- [container-0-C-1] > o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=consumer-3, > groupId=service-main] Sending READ_UNCOMMITTED > IncrementalFetchRequest(toSend=(), toForget=(itunes-command-19, > itunes-command-18, itunes-command-11), implied=(itunes-command-10, > itunes-command-13, itunes-command-12, itunes-command-15, itunes-command-14, > itunes-command-17, itunes-command-16)) to broker localhost:9092 (id: 1001 > rack: null) > 2019-03-05 21:04:19.142 DEBUG 3052 --- [ service-main] > o.a.kafka.clients.FetchSessionHandler : [Consumer clientId=consumer-3, > groupId=service-main - F630 ] Node 1001 sent an incremental fetch response > for session 197970881 with response=(), implied=(itunes-command-10, > itunes-command-13, itunes-command-12, itunes-command-15, itunes-command-14, > itunes-command-17, itunes-command-16) > 2019-03-05 21:04:19.142 TRACE 3052 --- [container-0-C-1] > org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, > groupId=service-main - F632] Sending FETCH > {replica_id=-1,max_wait_time=500,min_bytes=1,max_bytes=52428800,isolation_level=0,session_id=197970881,epoch=526,topics=[],forgotten_topics_data=[{topic=itunes-command,partitions=[19,18,11]}]} > with correlation id 632 to node 1001 > 2019-03-05 21:04:19.152 TRACE 3052 --- [container-0-C-1] > org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, > groupId=service-main] Completed receive from node 1001 for FETCH with > correlation id 632, received > {throttle_time_ms=0,error_code=71,session_id=0,responses=[]} > 2019-03-05 21:04:52.546 TRACE 3052 --- [ service-main] > org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, > groupId=service-main] Found least loaded node localhost:9092 (id: 1001 rack: > null) > 2019-03-05 21:04:52.546 DEBUG 3052 --- [ service-main] > org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, > groupId=service-main] Sending metadata request (type=MetadataRequest, > topics=itunes-command) to node localhost:9092 (id: 1001 rack: null) > 2019-03-05 21:04:52.546 TRACE 3052 --- [ service-main] > org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-3, > groupId=service-main] Sending METADATA > {topics=[itunes-command],allow_auto_topic_creation=true} with correlation id > 633 to node 1001 > 2019-03-05 21:04:52.547 INFO 3052 --- [ service-main] > o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-3, > groupId=service-main] Group coordinator localhost:9092 (id: 2147482646 rack: > null) is unavailable or invalid, will attempt rediscovery > 2019-03-05 21:12:04.206 INFO 3052 --- [container-0-C-1] > o.a.kafka.clients.FetchSessionHandler : [Consumer clientId=consumer-3, > groupId=service-main] Node 1001 was unable to process the fetch request with > (sessionId=197970881, epoch=527): INVALID_FETCH_SESSION_EPOCH.{noformat} > > The first request has been sent with epoch 525 and that's ok. > The second request has been sent with epoch 526, and... in the meantime, > AbstractCoordinator starts sending Heartbeat request. Everything stops for ~2 > seconds. After these 2 seconds, the response for FETCH request has been > received. *But* *in the same moment*, Fetcher decides to send FETCH request > *with the same epoch* once again! That results with > INVALID_FETCH_SESSION_EPOCH response from Kafka node (which is > understandable). > I'm wondering if is there some workaround for this problem? > And the second question is - does it impact my service performance? -- This message was sent by Atlassian JIRA (v7.6.3#76005)