Thanks

On Fri, 7 Jul 2017 at 18:20, Greg Fodor <gfo...@gmail.com> wrote:
> Sure thing: https://issues.apache.org/jira/browse/KAFKA-5571
>
> On Fri, Jul 7, 2017 at 2:59 AM, Damian Guy <damian....@gmail.com> wrote:
>
> > Hi Greg,
> >
> > Would you mind creating a JIRA for this with the thread dump (i don't see
> > it attached to your message).
> >
> > Thanks,
> > Damian
> >
> > On Fri, 7 Jul 2017 at 10:36 Greg Fodor <gfo...@gmail.com> wrote:
> >
> > > I'm running a 10.2 job across 5 nodes with 32 stream threads on each
> > > node and find that when i gracefully shut down all of them at once via
> > > an ansible script, some of the nodes end up freezing -- at a glance the
> > > attached thread dump implies a deadlock between stream threads trying to
> > > update their state via setState. We haven't had this problem before but
> > > it may or may not be related to changes in 10.2 (we are upgrading from
> > > 10.0 to 10.2)
> > >
> > > when we gracefully shut down all nodes simultaneously, what typically
> > > happens is that some subset of the nodes do not shut down completely but
> > > go through a rebalance first. it seems this deadlock requires the
> > > rebalance to occur simultaneously with the graceful shutdown. if we
> > > happen to shut them down and no rebalance happens, i don't believe this
> > > deadlock is triggered.
> > >
> > > the deadlock appears related to the state change handlers being
> > > subscribed across threads and the fact that StreamThread#setState and
> > > StreamStateListener#onChange are both synchronized methods.
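
The lock cycle described in the quoted message can be sketched in isolation. The following is a minimal Java model, not Kafka's code: `Worker` and `Listener` are hypothetical stand-ins for StreamThread and StreamStateListener, and the `inspect` method (a listener path that holds the listener's monitor and then calls a synchronized method on a thread) is an assumption made for illustration; the actual call path is whatever the thread dump on KAFKA-5571 shows. A latch forces the unlucky interleaving, and the JDK's `ThreadMXBean` confirms the resulting monitor deadlock. Both threads are daemons so the JVM can still exit.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class SetStateDeadlockSketch {

    interface Task { void run() throws InterruptedException; }

    // Hypothetical stand-in for a stream thread; NOT Kafka's StreamThread.
    static class Worker {
        private final Listener listener;
        private final CountDownLatch bothLocked;
        private String state = "RUNNING";

        Worker(Listener listener, CountDownLatch bothLocked) {
            this.listener = listener;
            this.bothLocked = bothLocked;
        }

        // Holds the worker's monitor, then needs the listener's monitor.
        synchronized void setState(String newState) throws InterruptedException {
            state = newState;
            bothLocked.countDown();
            bothLocked.await();      // wait until the other thread holds the listener
            listener.onChange(this); // blocks: listener monitor is taken
        }

        synchronized String state() { return state; }
    }

    // Hypothetical stand-in for a listener shared across threads.
    static class Listener {
        synchronized void onChange(Worker source) { source.state(); }

        // Assumed path: holds the listener's monitor, then needs the worker's.
        synchronized void inspect(Worker w, CountDownLatch bothLocked) throws InterruptedException {
            bothLocked.countDown();
            bothLocked.await();      // wait until the other thread holds the worker
            w.state();               // blocks: worker monitor is taken
        }
    }

    private static Thread daemon(String name, Task task) {
        Thread t = new Thread(() -> {
            try { task.run(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }, name);
        t.setDaemon(true);
        t.start();
        return t;
    }

    private static boolean contains(long[] ids, long id) {
        for (long candidate : ids) if (candidate == id) return true;
        return false;
    }

    // Returns true if the JVM reports both threads as monitor-deadlocked within 5 seconds.
    static boolean reproducesDeadlock() {
        CountDownLatch bothLocked = new CountDownLatch(2);
        Listener listener = new Listener();
        Worker worker = new Worker(listener, bothLocked);

        Thread a = daemon("shutdown-path", () -> worker.setState("PENDING_SHUTDOWN"));
        Thread b = daemon("listener-path", () -> listener.inspect(worker, bothLocked));

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        try {
            while (System.nanoTime() < deadline) {
                long[] ids = mx.findMonitorDeadlockedThreads(); // null when no deadlock exists
                if (ids != null && contains(ids, a.getId()) && contains(ids, b.getId())) return true;
                Thread.sleep(50);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + reproducesDeadlock());
    }
}
```

In a model like this, the usual ways out are to invoke the listener after releasing the thread's own monitor, or to make every path acquire the two monitors in the same order. Whether either matches the eventual fix for KAFKA-5571 is not something this thread states.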