[
https://issues.apache.org/jira/browse/KAFKA-9137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lucas Bradstreet resolved KAFKA-9137.
-------------------------------------
Resolution: Fixed
Closed by [https://github.com/apache/kafka/pull/7640]
> Maintenance of FetchSession cache causing FETCH_SESSION_ID_NOT_FOUND in live
> sessions
> -------------------------------------------------------------------------------------
>
> Key: KAFKA-9137
> URL: https://issues.apache.org/jira/browse/KAFKA-9137
> Project: Kafka
> Issue Type: Bug
> Components: core
> Reporter: Lucas Bradstreet
> Priority: Major
>
> We have recently seen cases where brokers end up in a bad state where fetch
> session evictions occur at a high rate (> 16 per second) after a roll. This
> increase in eviction rate included the following pattern in our logs:
>
> {noformat}
> broker 6: October 31st 2019, 17:52:45.496 Created a new incremental
> FetchContext for session id 2046264334, epoch 9790: added (), updated (),
> removed ()
> broker 6: October 31st 2019, 17:52:45.496 Created a new incremental
> FetchContext for session id 2046264334, epoch 9791: added (), updated (),
> removed () broker 6: October 31st 2019, 17:52:45.500 Created a new
> incremental FetchContext for session id 2046264334, epoch 9792: added (),
> updated (lkc-7nv6o_tenant_soak_topic_144p-67), removed ()
> broker 6: October 31st 2019, 17:52:45.501 Created a new incremental
> FetchContext for session id 2046264334, epoch 9793: added (), updated
> (lkc-7nv6o_tenant_soak_topic_144p-59, lkc-7nv6o_tenant_soak_topic_144p-123,
> lkc-7nv6o_tenant_soak_topic_144p-11, lkc-7nv6o_tenant_soak_topic_144p-3,
> lkc-7nv6o_tenant_soak_topic_144p-67, lkc-7nv6o_tenant_soak_topic_144p-115),
> removed ()
> broker 6: October 31st 2019, 17:52:45.501 Evicting stale FetchSession
> 2046264334.
> broker 6: October 31st 2019, 17:52:45.502 Session error for 2046264334: no
> such session ID found.
> broker 4: October 31st 2019, 17:52:45.813 [ReplicaFetcher replicaId=4,
> leaderId=6, fetcherId=0] Node 6 was unable to process the fetch request with
> (sessionId=2046264334, epoch=9793): FETCH_SESSION_ID_NOT_FOUND.
> {noformat}
> This pattern appears to be problematic for two reasons. Firstly, the replica
> fetcher for broker 4 was clearly able to send multiple incremental fetch
> requests to broker 6, and receive replies, and did so right up to the point
> where broker 6 evicted its fetch session within milliseconds of multiple
> fetch requests. The second problem is that replica fetchers are considered
> privileged for the fetch session cache, and should not be evicted by consumer
> fetch sessions. This cluster only has 12 brokers and 1000 fetch session cache
> slots (the default for max.incremental.fetch.session.cache.slots), and it
> thus very unlikely that this session should have been evicted by another
> replica fetcher session.
> This cluster also appears to be causing cycles of fetch session evictions
> where the cluster never stabilizes into a state where fetch sessions are not
> evicted. The above logs are the best example I could find of a case where a
> session clearly should not have been evicted.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)