I tend to agree that providing a proper exception to the client is enough in
this case; there is no need to stop server nodes. However, I believe that's how it
used to work before we added failure handlers, so there was probably a
reason for the current implementation. Does anyone know what it was?

-Val

On Tue, Jan 11, 2022 at 10:10 PM Alexey Kukushkin <kukushkinale...@gmail.com> wrote:

> Hi Igniters,
>
> Currently Ignite treats the "not enough data region capacity" case as a
> critical failure and does not allow configuring any of the default critical
> failure handlers to ignore that error.
>
> In our company, several teams use Apache Ignite, and none of them
> wants to apply a default "stop server" or "restart server" handler when
> encountering the problem. We would rather report this problem to DevOps
> and the end users.
>
> We developed a custom failure handler to deal with the problem, but the
> solution is really clumsy. More importantly, we think that treating this
> problem as a critical failure is not what most users would want.
>
> What do you think about changing Ignite so that it no longer treats the
> "not enough data region capacity" case as a critical failure?
>
> We opened IGNITE-16272 <https://issues.apache.org/jira/browse/IGNITE-16272>
> for this discussion, with the description below:
>
> The Problem
>
> Ignite raises the IgniteOutOfMemoryException
> <https://github.com/apache/ignite/blob/2.11.1/modules/core/src/main/java/org/apache/ignite/internal/mem/IgniteOutOfMemoryException.java>
> if a data region size is exceeded when trying to add more data to a cache.
> Ignite considers the IgniteOutOfMemoryException a critical failure. With
> the default failure handler, this causes the Ignite server to shut down.
>
> However, reaching the data region capacity does not seem to be a problem
> critical enough to require a server shutdown or restart. For example, in
> our application we just want to report this problem back to the users and
> notify DevOps without applying the critical failure handler. To achieve
> that, we had to define a custom FailureHandler that detects and ignores the
> IgniteOutOfMemoryException and all exceptions caused by it, allowing the
> final exception to reach the application. This solution is clumsy and
> unreliable, since it uses the internal IgniteOutOfMemoryException definition
> and relies on a complex secondary exception structure, searching for the
> IgniteOutOfMemoryException among suppressed exceptions and causes.
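A minimal sketch of the kind of search such a custom handler has to perform, assuming it matches the internal class by its fully qualified name (taken from the 2.11.1 source link above) instead of importing it. The class CauseFinder and its methods are hypothetical illustrations, not part of Ignite.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

/** Hypothetical helper: finds an exception class anywhere in a cause/suppressed graph. */
final class CauseFinder {
    /** Internal Ignite class (2.11.1 source tree), matched by name so no Ignite dependency is needed. */
    static final String IGNITE_OOM = "org.apache.ignite.internal.mem.IgniteOutOfMemoryException";

    /** True if t's class, or any of its superclasses, has the given fully qualified name. */
    static boolean isInstanceByName(Throwable t, String className) {
        for (Class<?> c = t.getClass(); c != null; c = c.getSuperclass()) {
            if (c.getName().equals(className)) return true;
        }
        return false;
    }

    /** Depth-first walk over causes and suppressed exceptions, visiting each throwable once. */
    static boolean containsByName(Throwable root, String className) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Throwable> pending = new ArrayDeque<>();
        if (root != null) pending.push(root);
        while (!pending.isEmpty()) {
            Throwable cur = pending.pop();
            if (!seen.add(cur)) continue; // guard against cyclic cause/suppressed references
            if (isInstanceByName(cur, className)) return true;
            if (cur.getCause() != null) pending.push(cur.getCause());
            for (Throwable s : cur.getSuppressed()) pending.push(s);
        }
        return false;
    }
}
```

Inside a custom FailureHandler, a check like this would run against the failure context's error before deciding whether to skip the stop/restart logic. The name-based match illustrates the fragility described above: it silently stops working if the internal class is moved or renamed.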
>
> Ignite's out-of-the-box failure handlers have the ignoredFailureTypes
> property that allows filtering out some kinds of failures. However, none of
> the FailureType
> <https://github.com/apache/ignite/blob/2.11.1/modules/core/src/main/java/org/apache/ignite/failure/FailureType.java>
> values that can be ignored corresponds to the IgniteOutOfMemoryException.
>
> The Proposal
>
>    1. Does anyone really want to treat the "data region capacity exceeded"
>    problem as a critical failure and stop or restart the server?
>       - Consider never treating this condition as a critical failure. This
>       change is not backward-compatible.
>       - Or add another item to the FailureType enumeration so that users can
>       optionally opt out of treating it as a critical failure. This is
>       backward-compatible.
>    2. Make the IgniteOutOfMemoryException part of the public API (it is
>    currently in the internal package).
>    3. Consider renaming IgniteOutOfMemoryException (for example, to
>    something like NotEnoughStorageException), since the current name
>    resembles Java's OutOfMemoryError, which is truly critical and usually
>    unrecoverable, whereas the IgniteOutOfMemoryException is not that critical.
>
> --
> Best regards,
> Alexey
>
