P.S. Andrey Kuznetsov corrected me: we have no guarantee that a failed node will be able to notify the cluster.
But try { sendDiscoveryMessageWithFail(...); } catch (Exception e) { /* No-op. */ } is better than nothing, I think.

2018-04-20 14:22 GMT+03:00 Anton Vinogradov <a...@apache.org>:

> Sounds helpful and easy to implement.
>
> 2018-04-20 5:39 GMT+03:00 Dmitriy Setrakyan <dsetrak...@apache.org>:
>
>> On Thu, Apr 19, 2018 at 8:19 AM, Yakov Zhdanov <yzhda...@apache.org>
>> wrote:
>>
>> > Guys,
>> >
>> > We have activity to implement a set of mechanisms to handle critical
>> > issues on nodes (IEP-14 - [1]).
>> >
>> > I have an idea to spread message about critical issues to nodes through
>> > entire topology and put it to logs of all nodes. In my view this will
>> > add much more clarity. Imagine all nodes output message to log -
>> > "Critical system thread failed on node XXX [details=...]". This should
>> > help a lot with investigations.
>> >
>> > Andrey Gura, Alex Goncharuk what do you think?
>> >
>>
>> Yakov, even though you did not ask me what I think, but I really like the
>> idea :)
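
To make the best-effort idea from the P.S. concrete, here is a minimal Java sketch. It is only an illustration under assumptions: the Discovery interface, the notifyCluster and logLine helpers, and the signature given to sendDiscoveryMessageWithFail are all hypothetical and are not existing Ignite API. The point is only that the send attempt is wrapped so that a failure to notify never disturbs whatever failure handling runs next.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: best-effort broadcast of a critical failure. Not Ignite code. */
final class BestEffortFailureNotifier {
    /** Hypothetical stand-in for the discovery layer; the signature is an assumption. */
    interface Discovery {
        void sendDiscoveryMessageWithFail(String nodeId, String details) throws Exception;
    }

    /**
     * Tries to tell the cluster about a critical failure on this node.
     * Returns true if the message was handed to discovery, false if the attempt failed.
     * Never throws, so it cannot interfere with the failure handling that follows.
     */
    static boolean notifyCluster(Discovery disco, String nodeId, String details) {
        try {
            disco.sendDiscoveryMessageWithFail(nodeId, details);
            return true;
        }
        catch (Throwable ignored) {
            // No-op: a failed node may be unable to send anything at all.
            // Throwable is caught on purpose here because the node is already failing.
            return false;
        }
    }

    /** Formats the line each receiving node would write to its own log. */
    static String logLine(String nodeId, String details) {
        return "Critical system thread failed on node " + nodeId + " [details=" + details + "]";
    }

    public static void main(String[] args) {
        List<String> clusterLog = new ArrayList<>();

        // Healthy discovery: models the other nodes logging the message.
        Discovery healthy = (id, d) -> clusterLog.add(logLine(id, d));
        // Broken discovery: models a node too damaged to send.
        Discovery broken = (id, d) -> { throw new IllegalStateException("discovery is down"); };

        System.out.println(notifyCluster(healthy, "XXX", "worker terminated"));
        System.out.println(notifyCluster(broken, "XXX", "worker terminated"));
        System.out.println(clusterLog);
    }
}
```

Returning a boolean instead of swallowing the failure silently lets the caller at least record locally that the broadcast did not go out, which fits Andrey's remark that delivery cannot be guaranteed.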