lucasbru opened a new pull request, #12875: URL: https://github.com/apache/kafka/pull/12875
In this change, we enable backing off when the state directory is still locked during the initialization of a task. To do this, we introduce a new queue inside the state updater that holds all tasks that still need to be initialized. When a new task is added to the state updater, it is inserted into this initialization queue. If the task's state directory is locked, the task is reinserted into the queue. We then try to acquire the lock again after the next round of restoration. In the rare case where no task can be initialized because all of their state directories are still locked, we back off for 1 second before retrying. This avoids a busy wait on the lock.

During system testing, `ThreadCache` threw a `ConcurrentModificationException`. This happened when the state updater created a new cache while the main thread was computing the size of the caches for eviction inside `sizeBytes`. Since the data structure is designed to be thread-safe, this change also synchronizes the `sizeBytes` method.

### Committer Checklist (excluded from commit message)
- [x] Verify design and implementation
- [x] Verify test coverage and CI build status
- [x] Verify documentation (including upgrade notes)
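The initialization queue with back-off can be sketched roughly as follows. This is a minimal, simplified model and not the code from the PR. The class `InitQueueSketch`, its methods `add`, `initializeOnce`, and `backoffMs`, and the `tryLock` predicate are all hypothetical. Task ids are plain strings, and `tryLock` stands in for acquiring the state directory lock. The back-off condition is also an assumption: a pass initializes nothing while tasks are still queued, and there is no other restoration work to do.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical, simplified model of the state updater's initialization queue.
// None of these names are actual Kafka Streams APIs.
class InitQueueSketch {
    static final long BACKOFF_MS = 1_000L; // 1 second back-off, as described in the PR

    private final Deque<String> tasksToInit = new ArrayDeque<>();
    private final List<String> initialized = new ArrayList<>();
    private final Predicate<String> tryLock; // stands in for locking the state directory

    InitQueueSketch(Predicate<String> tryLock) {
        this.tryLock = tryLock;
    }

    // Newly added tasks are queued for initialization instead of being
    // initialized immediately.
    void add(String taskId) {
        tasksToInit.addLast(taskId);
    }

    // One pass over the queue: every task queued at the start of the pass is
    // tried exactly once. Tasks whose state directory is still locked are
    // reinserted and retried in a later pass (after the next restoration round).
    // Returns the number of tasks initialized in this pass.
    int initializeOnce() {
        int attempts = tasksToInit.size();
        int count = 0;
        for (int i = 0; i < attempts; i++) {
            String task = tasksToInit.pollFirst();
            if (tryLock.test(task)) {
                initialized.add(task);
                count++;
            } else {
                tasksToInit.addLast(task); // still locked: retry later
            }
        }
        return count;
    }

    // Assumed back-off rule: tasks are waiting, none could be initialized in
    // this pass, and there is no restoration work, so wait BACKOFF_MS instead
    // of busy-waiting on the lock.
    long backoffMs(int initializedThisPass, boolean hasRestoringTasks) {
        boolean allLocked = !tasksToInit.isEmpty() && initializedThisPass == 0;
        return (allLocked && !hasRestoringTasks) ? BACKOFF_MS : 0L;
    }

    List<String> initialized() { return initialized; }

    int pending() { return tasksToInit.size(); }
}
```

Suppose only task `0_1` is locked. The first pass initializes `0_0` and leaves `0_1` queued. A second pass in which `0_1` is still locked initializes nothing, so `backoffMs(0, false)` returns 1000. Reinserting at the tail rather than retrying immediately means one locked task never blocks the others in the same pass.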