weeco commented on pull request #10807: URL: https://github.com/apache/kafka/pull/10807#issuecomment-853223533
Hey @tombentley , thanks for tackling this! I understand your performance concerns in regards to the quota implementation; I can't judge if a quota for that is actually too expensive. However here are my thoughts regarding your PR: Unless I missunderstand your proposed changes here, I believe that logging the fetch session slot evictor is currently not super helpful because: 1. I believe fetch sessions are expected to be evicted. For me it seems like fetch sessions are not proactively relived from the session slots cache. In my clusters I can see the fetch session slots slowly growing over time (hours until all slots are taken) and then fetch sessions are regularly evicted at a rate of 1-3 evictions / second. I noticed this by tracking this via the JMX metrics `IncrementalFetchSessionEvictionsPerSec` and `NumIncrementalFetchSessions` (introduced in KIP-227) 2. A potential attacker (or misbehaving client such as sarama v1.26.0) can quickly create new fetch sessions (thousands of fetch sessions per seconds) which would then cause one log line for each eviction. This again can cause high CPU usage (for printing the log line) or at least cause problems for many logging pipelines. **I could envision two other solutions to mitigate the issue around leaking fetch sessions:** 1. Create one fetch session slot cache per client with a fixed, relatively low number of available fetch sessions. This way a single client can not negatively impact other clients. Possibly this is even a performance improvement because the lock contention as required in one large cache that is concurrently accessed from many fetch different fetch requests may be reduced? 2. Introduce a new counter metric that logs the number of fetch sessions created per client id. Something like `IncrementalFetchSessionsCreatedPerSec<ClientId=([-.\w]+)>` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org