Sure thing: https://issues.apache.org/jira/browse/KAFKA-5571
On Fri, Jul 7, 2017 at 2:59 AM, Damian Guy <damian....@gmail.com> wrote:

> Hi Greg,
>
> Would you mind creating a JIRA for this with the thread dump? (I don't see
> it attached to your message.)
>
> Thanks,
> Damian
>
> On Fri, 7 Jul 2017 at 10:36 Greg Fodor <gfo...@gmail.com> wrote:
>
> > I'm running a 10.2 job across 5 nodes with 32 stream threads on each node,
> > and find that when I gracefully shut down all of them at once via an
> > Ansible script, some of the nodes end up freezing. At a glance, the
> > attached thread dump implies a deadlock between stream threads trying to
> > update their state via setState. We haven't had this problem before, but
> > it may or may not be related to changes in 10.2 (we are upgrading from
> > 10.0 to 10.2).
> >
> > When we gracefully shut down all nodes simultaneously, what typically
> > happens is that some subset of the nodes do not shut down completely but
> > go through a rebalance first. It seems this deadlock requires the
> > rebalance to occur simultaneously with the graceful shutdown. If we
> > happen to shut them down and no rebalance happens, I don't believe this
> > deadlock is triggered.
> >
> > The deadlock appears related to the state change handlers being
> > subscribed across threads and the fact that StreamThread#setState and
> > StreamStateListener#onChange are both synchronized methods.
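
For anyone following along, here is a minimal sketch of the kind of lock cycle the quoted message describes: one thread holds a worker's monitor and wants the shared listener's monitor, while another holds the listener's monitor and wants the worker's. This is not the Kafka Streams code. `Worker`, `Listener`, `LockOrderDemo` and the thread names are hypothetical stand-ins, and the exact cycle in the real job may well differ (the thread dump on the JIRA is the authority). The latch only exists to force the unlucky interleaving every time.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderDemo {

    // Blocks until both threads hold their first monitor (forces the bad interleaving).
    static void arriveAndWait(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Hypothetical stand-in for a listener shared across threads.
    static class Listener {
        private final CountDownLatch latch;
        Listener(CountDownLatch latch) { this.latch = latch; }

        synchronized void onChange(Worker changed) {
            arriveAndWait(latch);   // listener monitor is held here
            changed.state();        // now needs the worker monitor
        }
    }

    // Hypothetical stand-in for a thread object with a synchronized state setter.
    static class Worker {
        private final Listener listener;
        private final CountDownLatch latch;
        private String state = "RUNNING";

        Worker(Listener listener, CountDownLatch latch) {
            this.listener = listener;
            this.latch = latch;
        }

        synchronized String state() { return state; }

        synchronized void setState(String newState) {
            state = newState;
            arriveAndWait(latch);       // worker monitor is held here
            listener.onChange(this);    // now needs the listener monitor
        }
    }

    // Starts the two threads and reports whether the JVM sees them as deadlocked.
    static boolean deadlockDetected() {
        CountDownLatch latch = new CountDownLatch(2);
        Listener listener = new Listener(latch);
        Worker worker = new Worker(listener, latch);

        Thread a = new Thread(() -> worker.setState("PENDING_SHUTDOWN"), "demo-thread-a");
        Thread b = new Thread(() -> listener.onChange(worker), "demo-thread-b");
        a.setDaemon(true);  // daemon threads so the JVM can still exit
        b.setDaemon(true);
        a.start();
        b.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 50; i++) {
            long[] ids = mx.findMonitorDeadlockedThreads();  // null when no deadlock
            if (ids != null) {
                boolean sawA = false;
                boolean sawB = false;
                for (long id : ids) {
                    if (id == a.getId()) sawA = true;
                    if (id == b.getId()) sawB = true;
                }
                if (sawA && sawB) return true;
            }
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + deadlockDetected());
    }
}
```

In a model like this, the usual ways out are to avoid calling a listener while holding the worker's monitor, or to make sure every path takes the two monitors in the same order. Whether either applies to the real code is for the JIRA to settle.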