Re: Critical worker threads liveness checking drawbacks

Yakov Zhdanov Fri, 07 Sep 2018 09:10:46 -0700

Yes, and you should suggest a solution, e.g. throttle rebalancing threads
more so that they produce less load.


What you are suggesting kills the idea of this enhancement.

--Yakov

2018-09-07 19:03 GMT+03:00 Andrey Kuznetsov <stku...@gmail.com>:

> Yakov,
>
> Thanks for the reply. Indeed, the initial design assumed node termination
> when a hanging critical thread is detected. But sometimes this looks
> inappropriate. Suppose, for example, that an fsync in the WAL writer thread
> takes too long, and we terminate the node. Upon rebalancing, this may lead
> to long fsyncs on other nodes due to increased per-node load, hence we may
> terminate the next node as well. Eventually we could collapse the entire
> cluster. Is this a possible scenario?
>
> Fri, 7 Sep 2018 at 18:44, Yakov Zhdanov <yzhda...@apache.org>:
>
> > Andrey,
> >
> > I don't understand your point. In my opinion, the idea of these changes
> > is to make the cluster more stable and responsive by eliminating hung
> > nodes. I would not make much of a distinction between threads trapped in
> > a deadlock and threads hanging on fsync calls for too long. Both
> > situations lead to increasing latency in the cluster, up to its full
> > unavailability.
> >
> > So, killing a node hanging on fsync may be reasonable. Agree?
> >
> > You may implement an approach where warning messages are logged by
> > default, but a termination option should also be available.
> >
> > Thanks!
> >
> > --Yakov
> >
> >
>
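
The policy proposed in the quoted message (log a warning by default, with node termination available as an option) could be sketched roughly as below. This is only an illustration of the idea under discussion, not the actual implementation: `FailurePolicy`, `LivenessChecker`, `heartbeat`, `check` and the injectable `clock` are all hypothetical names, and the real design may differ.

```python
import time
from enum import Enum


class FailurePolicy(Enum):
    """Hypothetical reaction to a stalled critical worker."""
    WARN = "warn"            # default: only log a warning
    TERMINATE = "terminate"  # optional: stop the node


class LivenessChecker:
    """Tracks heartbeats of critical workers and reports stalled ones."""

    def __init__(self, timeout_sec, policy=FailurePolicy.WARN,
                 clock=time.monotonic):
        self.timeout_sec = timeout_sec
        self.policy = policy
        self.clock = clock      # injectable so the sketch is easy to test
        self.heartbeats = {}    # worker name -> time of last heartbeat

    def heartbeat(self, worker):
        """Called by a worker to signal progress, e.g. after each fsync."""
        self.heartbeats[worker] = self.clock()

    def check(self):
        """Return (worker, policy) pairs for workers silent past the timeout."""
        now = self.clock()
        return [
            (worker, self.policy)
            for worker, last in sorted(self.heartbeats.items())
            if now - last > self.timeout_sec
        ]
```

A caller would run `check()` periodically and either log each returned worker or stop the node, depending on the configured policy. Whether a slow fsync should count as a stall at all is exactly the open question in this thread.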
