Re: IEP-14: Ignite failures handling (Discussion)

Ivan Rakov Tue, 13 Mar 2018 16:59:41 -0700

One more note: "kill if standalone, stop if embedded" differs from what you are suggesting ("try graceful, then kill process regardless") only in the case when graceful shutdown hangs.

Do we have an understanding of how often graceful shutdown hangs? Obviously, a *grid hang* is a common case, but it shouldn't be confused with a *graceful shutdown hang*. From my experience, if something goes wrong, users just prefer to do kill -9 because it's much more reliable and easy. Probably, in most cases where kill -9 worked, a graceful stop would have worked as well; we just don't have such statistics.

It may be a bad example, but in our CI tests we intentionally break the grid in many harsh ways and perform a graceful stop after the test execution, and it doesn't hang. Otherwise we'd see many "Execution timeout" test suite hangs.
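To make the comparison concrete, here is a minimal Java sketch of one possible reading of the two policies. It is not Ignite code: the class and method names, the Runnable hooks and the timeout parameter are all hypothetical. In a real process the kill hook could be something like Runtime.getRuntime().halt(1), and the graceful stop hook would be whatever stops the node.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch of the two default policies under discussion. */
final class FailurePolicySketch {
    /**
     * Runs the graceful stop on a daemon thread and waits up to timeoutMs.
     * Returns true only if the stop completed normally within the timeout.
     */
    static boolean tryGracefulStop(Runnable gracefulStop, long timeoutMs) {
        ExecutorService exec = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "graceful-stop");
            t.setDaemon(true); // a hung stop must not keep the JVM alive
            return t;
        });
        Future<?> f = exec.submit(gracefulStop);
        try {
            f.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException | TimeoutException e) {
            return false; // the stop failed or hung
        } finally {
            exec.shutdownNow(); // interrupts the stop thread if it is still running
        }
    }

    /** "Try graceful, then kill process regardless." Returns whether the stop succeeded. */
    static boolean stopThenKill(Runnable gracefulStop, Runnable kill, long timeoutMs) {
        boolean stopped = tryGracefulStop(gracefulStop, timeoutMs);
        kill.run(); // always invoked, whether or not the stop succeeded
        return stopped;
    }

    /** "Kill if standalone, stop if embedded." Returns whether a graceful stop succeeded. */
    static boolean killOrStop(boolean embedded, Runnable gracefulStop, Runnable kill, long timeoutMs) {
        if (embedded)
            return tryGracefulStop(gracefulStop, timeoutMs); // the JVM is left running
        kill.run(); // standalone: no graceful attempt
        return false;
    }
}
```

In this sketch, when the stop hook hangs in embedded mode, killOrStop returns false and leaves the JVM running, while stopThenKill still invokes the kill hook.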


Best Regards,
Ivan Rakov


On 14.03.2018 2:24, Dmitriy Setrakyan wrote:

On Tue, Mar 13, 2018 at 7:13 PM, Ivan Rakov <ivan.glu...@gmail.com> wrote:

I just would like to add my +1 for "kill if standalone, stop if embedded"
default option. My arguments:

1) Regarding "If Ignite hangs - it will likely be impossible to stop":
Unfortunately, it's true that Ignite can hang during the stop procedure.
However, most of the failures described under IEP-14 (storage IO exceptions,
death of a critical system worker thread, etc.) normally shouldn't turn a node
into an "impossible to stop" state. Turning into that state is a bug in itself. I
guess that we shouldn't choose system behavior on the basis of known bugs.


The whole discussion is about protecting against force majeure issues,
including Ignite bugs. You are assuming that a user application will
somehow continue to function if an Ignite node is stopped. In most cases it
will just freeze itself and cause the rest of the application to hang.

Again, "kill+stop" is the most deterministic and the safest default
behavior. Try a graceful shutdown (which will make restart easier), and
then kill the process regardless.

Note that we are arguing about the default behavior. If a user does not
like this default, then this user can change it to another behavior.

2) A user might want to handle an Ignite node crash before shutting down the
whole JVM: raise an alert, close external resources, etc.

Very unlikely, but if a user is this advanced, then this user can change
the default behavior. Most users will not even know how to configure such
custom shutdown behavior and would prefer an automatic kill.

3) The IEP-14 document has important notes: "More than one Ignite node could be
started in one JVM process" and "Different nodes in one JVM process could
belong to different clusters". This is possible only in embedded mode. I
think we shouldn't shock the user with a sudden JVM halt (possibly along with
other healthy nodes) if there's a chance of a successful node stop.

Has anyone actually seen a real example of that? I have not. This scenario
is extremely unlikely and should not define the default behavior. Again, if
a user is advanced enough to come up with such a sophisticated deployment, then
the same user should be able to set different default behaviors for
different clusters.
