[ https://issues.apache.org/jira/browse/KAFKA-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steven Schlansker resolved KAFKA-13593.
---------------------------------------
    Resolution: Fixed

> ThrottledChannelReaper slows broker shutdown by multiple seconds
> ----------------------------------------------------------------
>
>                 Key: KAFKA-13593
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13593
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 3.0.0
>            Reporter: Steven Schlansker
>            Priority: Minor
>
> We run an embedded KRaft broker in integration tests to check that our
> Producers / Consumers are all hooked up and to verify our overall pipeline.
> While trying to deliver CI speed improvements, we noticed that the majority
> of time for a small test is actually spent in Kafka broker shutdown. From the
> logs, it looks like the ClientQuotaManager has multiple
> ThrottledChannelReaper threads and each of them takes up to a second to
> shut down.
> {code:java}
> 2022-01-12T15:26:31.932Z [main] INFO kafka.log.LogManager - Shutdown complete.
> 2022-01-12T15:26:31.934Z [main] INFO k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Fetch]: Shutting down
> 2022-01-12T15:26:32.311Z [main] INFO k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Fetch]: Shutdown completed
> 2022-01-12T15:26:32.311Z [ThrottledChannelReaper-Fetch] INFO k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Fetch]: Stopped
> 2022-01-12T15:26:32.311Z [main] INFO k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Produce]: Shutting down
> 2022-01-12T15:26:33.312Z [ThrottledChannelReaper-Produce] INFO k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Produce]: Stopped
> 2022-01-12T15:26:33.312Z [main] INFO k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Produce]: Shutdown completed
> 2022-01-12T15:26:33.312Z [main] INFO k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Request]: Shutting down
> 2022-01-12T15:26:34.315Z [ThrottledChannelReaper-Request] INFO k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Request]: Stopped
> 2022-01-12T15:26:34.315Z [main] INFO k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Request]: Shutdown completed
> 2022-01-12T15:26:34.315Z [main] INFO k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-ControllerMutation]: Shutting down
> 2022-01-12T15:26:35.317Z [main] INFO k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-ControllerMutation]: Shutdown completed
> 2022-01-12T15:26:35.317Z [ThrottledChannelReaper-ControllerMutation] INFO k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-ControllerMutation]: Stopped{code}
> Inspecting the code, the ThrottledChannelReaper threads are marked as not
> interruptible, so ShutdownableThread does not interrupt the worker on
> shutdown. Unfortunately, the doWork method polls with a 1-second timeout, so
> you wait up to 1s for every type of quota, in this case almost 4s in total
> (the relevant loop is sketched below).
>
> While this is not a problem for production Kafka brokers, where instances are
> expected to stay up for months, in tests that spin a broker up and down it is
> easily noticed and adds real overhead to CI.
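> For reference, the reaper loop in question looks roughly like this
> (paraphrased, with imports and metrics omitted; not the exact source):
> {code:scala}
> // Paraphrased sketch of ClientQuotaManager.ThrottledChannelReaper, not the
> // exact Kafka source.
> class ThrottledChannelReaper(delayQueue: DelayQueue[ThrottledChannel], prefix: String)
>   extends ShutdownableThread(s"${prefix}ThrottledChannelReaper", isInterruptible = false) {
>
>   override def doWork(): Unit = {
>     // Blocks for up to 1 second waiting for the next expired channel. Because
>     // the thread is not interruptible, initiateShutdown() cannot break out of
>     // this poll early, so each reaper can add up to 1s to broker shutdown.
>     val throttledChannel = delayQueue.poll(1, TimeUnit.SECONDS)
>     if (throttledChannel != null) {
>       // ... notify the channel that its throttling is done (details elided) ...
>     }
>   }
> }
> {code}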
>
> Suggested possible remediations:
> * Allow ThrottledChannelReaper to be interrupted by ShutdownableThread
> * ThrottledChannelReaper overrides {{initiateShutdown}} and, after calling
> {{super.initiateShutdown}}, enqueues a {{null}} element on the delayQueue (or,
> since {{DelayQueue}} rejects {{null}}, a zero-delay sentinel) to force the
> worker thread to notice the shutdown state (see the sketch after this message)
> * Reduce the 1s poll timeout to something smaller (this carries an overhead
> penalty for all users, so it is less desirable), or make it configurable so it
> can be set to e.g. 10ms in unit tests

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
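A standalone sketch of the second remediation, for illustration only: this is
not Kafka code (names such as SentinelWakeupDemo and Item are invented, and the
eventual fix may differ). The shutdown path enqueues a zero-delay sentinel so
the worker's blocked poll returns immediately instead of sleeping out the rest
of its 1-second timeout.

{code:scala}
import java.util.concurrent.{DelayQueue, Delayed, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean

// Illustration only: a worker that polls a DelayQueue with a 1-second timeout
// (like ThrottledChannelReaper.doWork), plus a shutdown path that enqueues a
// zero-delay sentinel so the blocked poll() returns immediately.
object SentinelWakeupDemo {

  final class Item(val isSentinel: Boolean, expiryMs: Long) extends Delayed {
    override def getDelay(unit: TimeUnit): Long =
      unit.convert(expiryMs - System.currentTimeMillis(), TimeUnit.MILLISECONDS)
    override def compareTo(other: Delayed): Int =
      java.lang.Long.compare(getDelay(TimeUnit.MILLISECONDS), other.getDelay(TimeUnit.MILLISECONDS))
  }

  private val delayQueue = new DelayQueue[Item]()
  private val running = new AtomicBoolean(true)

  private val workerLoop: Runnable = () => {
    while (running.get()) {
      // Same 1-second poll as the reaper; without the sentinel, shutdown
      // could wait up to a full second here.
      val item = delayQueue.poll(1, TimeUnit.SECONDS)
      if (item != null && !item.isSentinel) {
        // ... handle the expired item ...
      }
    }
  }
  private val worker = new Thread(workerLoop, "sentinel-wakeup-demo")

  def initiateShutdown(): Unit = {
    running.set(false)
    // A sentinel whose delay is already expired wakes the worker right away.
    delayQueue.add(new Item(isSentinel = true, expiryMs = System.currentTimeMillis()))
  }

  def main(args: Array[String]): Unit = {
    worker.start()
    val startNs = System.nanoTime()
    initiateShutdown()
    worker.join()
    // Typically prints a few milliseconds rather than ~1000 ms.
    println(f"Shutdown took ${(System.nanoTime() - startNs) / 1e6}%.1f ms")
  }
}
{code}

Applied to Kafka itself, the same idea would be an initiateShutdown override on
ThrottledChannelReaper that drops an already-expired element into its
delayQueue, as described in the second bullet above.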