[ https://issues.apache.org/jira/browse/KAFKA-9137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lucas Bradstreet resolved KAFKA-9137.
-------------------------------------
    Resolution: Fixed

Closed by [https://github.com/apache/kafka/pull/7640]

> Maintenance of FetchSession cache causing FETCH_SESSION_ID_NOT_FOUND in live sessions
> -------------------------------------------------------------------------------------
>
>                 Key: KAFKA-9137
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9137
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: Lucas Bradstreet
>            Priority: Major
>
> We have recently seen cases where brokers end up in a bad state where fetch
> session evictions occur at a high rate (> 16 per second) after a roll. This
> increase in eviction rate included the following pattern in our logs:
>
> {noformat}
> broker 6: October 31st 2019, 17:52:45.496 Created a new incremental FetchContext for session id 2046264334, epoch 9790: added (), updated (), removed ()
> broker 6: October 31st 2019, 17:52:45.496 Created a new incremental FetchContext for session id 2046264334, epoch 9791: added (), updated (), removed ()
> broker 6: October 31st 2019, 17:52:45.500 Created a new incremental FetchContext for session id 2046264334, epoch 9792: added (), updated (lkc-7nv6o_tenant_soak_topic_144p-67), removed ()
> broker 6: October 31st 2019, 17:52:45.501 Created a new incremental FetchContext for session id 2046264334, epoch 9793: added (), updated (lkc-7nv6o_tenant_soak_topic_144p-59, lkc-7nv6o_tenant_soak_topic_144p-123, lkc-7nv6o_tenant_soak_topic_144p-11, lkc-7nv6o_tenant_soak_topic_144p-3, lkc-7nv6o_tenant_soak_topic_144p-67, lkc-7nv6o_tenant_soak_topic_144p-115), removed ()
> broker 6: October 31st 2019, 17:52:45.501 Evicting stale FetchSession 2046264334.
> broker 6: October 31st 2019, 17:52:45.502 Session error for 2046264334: no such session ID found.
> broker 4: October 31st 2019, 17:52:45.813 [ReplicaFetcher replicaId=4, leaderId=6, fetcherId=0] Node 6 was unable to process the fetch request with (sessionId=2046264334, epoch=9793): FETCH_SESSION_ID_NOT_FOUND.
> {noformat}
>
> This pattern appears to be problematic for two reasons. Firstly, the replica
> fetcher for broker 4 was clearly able to send multiple incremental fetch
> requests to broker 6, and receive replies, and did so right up to the point
> where broker 6 evicted its fetch session, within milliseconds of multiple
> fetch requests. The second problem is that replica fetchers are considered
> privileged for the fetch session cache, and should not be evicted by consumer
> fetch sessions. This cluster only has 12 brokers and 1000 fetch session cache
> slots (the default for max.incremental.fetch.session.cache.slots), and it is
> thus very unlikely that this session should have been evicted by another
> replica fetcher session.
>
> This cluster also appears to be causing cycles of fetch session evictions
> where the cluster never stabilizes into a state where fetch sessions are not
> evicted. The above logs are the best example I could find of a case where a
> session clearly should not have been evicted.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
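The eviction expectations stated in the report can be illustrated with a small toy model. This is only a sketch under assumptions, not Kafka's FetchSessionCache: the names Session, SessionCache, try_add, touch and the eviction_ms parameter are hypothetical, and the policy is simplified to the two rules the report relies on (a session idle for longer than eviction_ms is stale and may be evicted by anyone; otherwise only a privileged newcomer may evict an unprivileged session).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Session:
    session_id: int
    privileged: bool   # True for replica fetchers, False for consumers
    last_used_ms: int


class SessionCache:
    """Toy fetch-session cache (hypothetical; not Kafka's implementation)."""

    def __init__(self, max_slots: int, eviction_ms: int) -> None:
        self.max_slots = max_slots
        self.eviction_ms = eviction_ms
        self.sessions: Dict[int, Session] = {}

    def touch(self, session_id: int, now_ms: int) -> bool:
        """Record a fetch on an existing session; False if it is unknown."""
        session = self.sessions.get(session_id)
        if session is None:
            return False
        session.last_used_ms = now_ms
        return True

    def _pick_victim(self, privileged: bool, now_ms: int) -> Optional[Session]:
        if not self.sessions:
            return None
        # Rule 1: the least recently used session is evictable if it is stale.
        oldest = min(self.sessions.values(), key=lambda s: s.last_used_ms)
        if now_ms - oldest.last_used_ms > self.eviction_ms:
            return oldest
        # Rule 2: a privileged newcomer may displace an unprivileged session.
        if privileged:
            unprivileged = [s for s in self.sessions.values() if not s.privileged]
            if unprivileged:
                return min(unprivileged, key=lambda s: s.last_used_ms)
        # Otherwise nothing may be evicted and the newcomer is rejected.
        return None

    def try_add(self, session: Session, now_ms: int) -> bool:
        """Insert a session, evicting one victim if the cache is full."""
        if len(self.sessions) >= self.max_slots:
            victim = self._pick_victim(session.privileged, now_ms)
            if victim is None:
                return False
            del self.sessions[victim.session_id]
        self.sessions[session.session_id] = session
        return True
```

In this model, a replica fetcher session that was touched a few milliseconds ago can be removed neither by a consumer session nor as stale, which is the behavior the report expected from broker 6 and did not observe.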