I tend to agree that providing a proper exception to the client is enough in this case, and that there is no need to stop server nodes. However, I believe that's how it used to work before we added failure handlers, so there was probably a reason for the current implementation. Does anyone know?

-Val

On Tue, Jan 11, 2022 at 10:10 PM Alexey Kukushkin <kukushkinale...@gmail.com> wrote:

> Hi Igniters,
>
> Currently Ignite treats the "not enough data region capacity" case as a
> critical failure and does not allow configuring any of the default critical
> failure handlers to ignore that error.
>
> In our company we have different teams using Apache Ignite and none of them
> wants to apply a default "stop server" or "restart server" handler when
> encountering the problem. We would rather report this problem to DevOps
> and the end users.
>
> We developed a custom failure handler to deal with the problem but the
> solution is really clumsy. And the most important thing is we think
> treating this problem as a critical failure is not what most users would
> want.
>
> What do you think about enhancing Ignite not to treat the "not enough data
> region capacity" case as a critical failure?
>
> We opened IGNITE-16272 <https://issues.apache.org/jira/browse/IGNITE-16272>
> for this discussion with the description below:
>
> The Problem
>
> Ignite raises the IgniteOutOfMemoryException
> <https://github.com/apache/ignite/blob/2.11.1/modules/core/src/main/java/org/apache/ignite/internal/mem/IgniteOutOfMemoryException.java>
> if a data region size is exceeded when trying to add more data to a cache.
> Ignite considers the IgniteOutOfMemoryException a critical failure. This
> causes the default failure handler to shut down the Ignite server.
>
> However, reaching the data region capacity does not seem to be such a
> critical problem requiring the server shutdown or restart. For example, in
> our application we just want to report this problem back to the users and
> notify the DevOps without applying the critical failure handler. To achieve
> that, we had to define a custom FailureHandler that detects and ignores the
> IgniteOutOfMemoryException and all exceptions caused by the
> IgniteOutOfMemoryException, allowing the final exception to reach the
> application. This solution is clumsy and unreliable since it uses the
> internal IgniteOutOfMemoryException definition and relies on a complex
> secondary exception structure, trying to find the IgniteOutOfMemoryException
> among the suppressed exceptions and causes.
>
> Ignite out-of-the-box failure handlers have the ignoredFailure property
> that allows filtering out some kinds of failures. However, the
> IgniteOutOfMemoryException is not among the FailureType
> <https://github.com/apache/ignite/blob/2.11.1/modules/core/src/main/java/org/apache/ignite/failure/FailureType.java>
> values that can be ignored.
>
> The Proposal
>
> 1. Does anyone really want to treat the "data region capacity exceeded"
>    problem as a critical failure and stop or restart the server?
>    - Consider never treating this condition as a critical failure. This
>      change is not backward compatible.
>    - Or add another item to the FailureType enumeration to optionally
>      allow the users not to have that treated as a critical failure. This
>      is backward-compatible.
> 2. Make the IgniteOutOfMemoryException a public API (now it is in the
>    internal package)
> 3. Consider renaming IgniteOutOfMemoryException (for example, to
>    something like NotEnoughStorageException) since the current name is
>    similar to Java's OutOfMemoryError, which is really critical and
>    usually unrecoverable, although the IgniteOutOfMemoryException is not
>    that critical.
>
> --
> Best regards,
> Alexey
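
To make the workaround described in the quoted message more concrete, here is a minimal Java sketch of the kind of search it mentions: walking both the cause chain and the suppressed exceptions of a failure to look for one exception type. This is not Alexey's handler and it uses no Ignite API. DataRegionFullException and hasCauseOrSuppressed are hypothetical names, and the former only stands in for the internal IgniteOutOfMemoryException so that the example runs on the JDK alone.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

final class FailureCauseSearch {
    /** Hypothetical stand-in for Ignite's internal IgniteOutOfMemoryException. */
    static class DataRegionFullException extends RuntimeException {
        DataRegionFullException(String msg) {
            super(msg);
        }
    }

    /**
     * Returns true if root, or any throwable reachable from it through
     * getCause() or getSuppressed(), is an instance of the given type.
     */
    static boolean hasCauseOrSuppressed(Throwable root, Class<? extends Throwable> type) {
        // Identity-based "seen" set guards against cyclic cause chains.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Throwable> pending = new ArrayDeque<>();

        if (root != null)
            pending.push(root);

        while (!pending.isEmpty()) {
            Throwable t = pending.pop();

            if (!seen.add(t))
                continue; // Already visited.

            if (type.isInstance(t))
                return true;

            if (t.getCause() != null)
                pending.push(t.getCause());

            for (Throwable s : t.getSuppressed())
                pending.push(s);
        }

        return false;
    }

    public static void main(String[] args) {
        // The exception of interest is buried as the cause of a suppressed exception.
        RuntimeException failure = new RuntimeException("cache update failed");
        failure.addSuppressed(
            new IllegalStateException("page allocation failed",
                new DataRegionFullException("data region is full")));

        System.out.println(hasCauseOrSuppressed(failure, DataRegionFullException.class));
        System.out.println(hasCauseOrSuppressed(new RuntimeException("unrelated"),
            DataRegionFullException.class));
    }
}
```

Even in this reduced form the check depends on a concrete exception class and on how exceptions happen to be nested, which is the fragility the quoted message complains about.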