Hey folks,

Discovered recently that when taking historicals down for maintenance, we
get pretty significant query latency spikes across that node's tier.

These spikes seem to be related to contention from ZKCoordinator threads
unzipping segments from deep storage to replace those from the stopped
historical.

The default value for druid.segmentCache.numLoadingThreads is the number of
cores on the host. I haven't done any detailed profiling to be sure, but
intuitively it seems like a lower default might be safer to avoid contending
with query workloads; at least, setting it to a much lower value looks to
have fixed our problem.
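
For anyone wanting to try the same thing, it's a single line in the
historicals' runtime.properties. The value of 2 below is only an
illustration, not the value we actually used or a tuned recommendation:

    # Default is the number of cores on the host; capping it reduces
    # contention with query threads while segments are being loaded.
    druid.segmentCache.numLoadingThreads=2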

Maybe I've only noticed it because there's something unique in our setup,
so I'm curious whether anyone else has experienced something similar? It'd
also be interesting if anyone can confirm that they don't see an impact on
latency when nodes are taken down with the config left at its default.

(This is all on 0.16.1 btw; I haven't tried to replicate it on a newer
version yet.)

Best regards,
Dylan
