[
https://issues.apache.org/jira/browse/KAFKA-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155857#comment-16155857
]
Ewen Cheslack-Postava commented on KAFKA-5741:
----------------------------------------------
It would be good to have clear evidence that this is actually a problem in
practice, and that other threads starving the herder thread is what caused the
rebalance. First, heartbeating actually happens in a background thread, so
you'd have to starve that thread as well to hit the session timeout. And the
actual processing done in that thread is very minimal, so you'd have to
completely starve it for a long time. It's much more likely that things like
waiting for other threads to flush data during a rebalance are what cause the
worker to fall out of the group.

I'm also skeptical of the prioritization because, to me, if this really occurred
for this reason, it would suggest that the hardware is just underprovisioned
for the workload. Prioritizing the DistributedHerder thread would probably just
end up starving other threads if there really is that much resource contention,
so the connectors wouldn't really be functioning correctly anyway.
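To illustrate the first point in the comment above (heartbeats are sent from a background thread, so a busy or blocked herder thread does not by itself stop them), here is a minimal, self-contained sketch. It is not Connect code: the class name, method name, and the 10 ms interval are all hypothetical, and the calling thread merely stands in for a herder thread that is stuck waiting.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names come from Kafka Connect.
class BackgroundHeartbeatSketch {

    /**
     * Starts a separate "heartbeat" thread, then blocks the calling thread.
     * Returns true if the heartbeat thread fired expectedBeats times while
     * the caller was blocked.
     */
    static boolean heartbeatsContinueWhileBlocked(int expectedBeats) {
        CountDownLatch beats = new CountDownLatch(expectedBeats);
        ScheduledExecutorService heartbeatThread =
                Executors.newSingleThreadScheduledExecutor();
        try {
            // Each run of this task stands in for one heartbeat being sent.
            heartbeatThread.scheduleAtFixedRate(
                    beats::countDown, 0, 10, TimeUnit.MILLISECONDS);
            // The caller does no work here; it only waits, like a thread
            // stuck in a long-running operation.
            return beats.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            heartbeatThread.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("heartbeats continued: "
                + heartbeatsContinueWhileBlocked(3));
    }
}
```

The sketch only shows that the two threads make progress independently. It says nothing about what happens when the whole process is starved of CPU, which is the scenario the issue describes.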
> Prioritize threads in Connect distributed worker process
> --------------------------------------------------------
>
> Key: KAFKA-5741
> URL: https://issues.apache.org/jira/browse/KAFKA-5741
> Project: Kafka
> Issue Type: Improvement
> Components: KafkaConnect
> Affects Versions: 0.11.0.0
> Reporter: Randall Hauch
> Priority: Critical
>
> Connect's distributed worker process uses the {{DistributedHerder}} to
> perform all administrative operations, including: starting, stopping,
> pausing, resuming, reconfiguring connectors; rebalancing; etc. The
> {{DistributedHerder}} uses a single-threaded executor service to do all this
> work and to do it sequentially. If this thread gets preempted for any reason
> (e.g., connector tasks are bogging down the process, DoS, etc.), then the
> herder's membership in the group may be dropped, causing a rebalance.
>
> This herder thread should be run at a much higher priority than all of the
> other threads in the system.
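For readers unfamiliar with how such a change might look, the proposal in the description could be sketched with a custom ThreadFactory passed to a single-threaded executor. This is an assumption about one possible approach, not the actual DistributedHerder code or a proposed patch; the class name and thread name are hypothetical. Note also that the JVM treats thread priority as a hint, and depending on the platform and JVM settings it may have little or no effect on OS scheduling.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not taken from the Connect source.
class PrioritizedHerderExecutor {

    // Hypothetical thread name.
    static final String THREAD_NAME = "herder-sketch";

    /** Builds a single-threaded executor whose one thread requests the given priority. */
    static ExecutorService newSingleThreadExecutor(int priority) {
        ThreadFactory factory = runnable -> {
            Thread t = new Thread(runnable, THREAD_NAME);
            // Only a scheduling hint; the OS may ignore it.
            t.setPriority(priority);
            return t;
        };
        return Executors.newSingleThreadExecutor(factory);
    }

    /** Runs one task on such an executor and returns the priority the task observed. */
    static int observedPriority(int priority) {
        ExecutorService executor = newSingleThreadExecutor(priority);
        try {
            Callable<Integer> task = () -> Thread.currentThread().getPriority();
            return executor.submit(task).get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("priority seen by task: "
                + observedPriority(Thread.MAX_PRIORITY));
    }
}
```

Tasks submitted to this executor still run sequentially on one thread, as the description requires; the only difference from a default executor is the priority requested for that thread.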
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)