Alexey, As I understand, you suggest ErrorScope enum for easier analysis and it will be a part of a error code. But what about String representation?
I think we should use a common prefix for error codes in error messages. Such codes will be more searchable and as a bonus, vendor-specific. String representation might look like IGN-001042 or IGN-TBL042. On Wed, Apr 21, 2021 at 11:43 AM Alexei Scherbakov < alexey.scherbak...@gmail.com> wrote: > I've create the ticket for implementing this [1] > > [1] https://issues.apache.org/jira/browse/IGNITE-14611 > > пт, 16 апр. 2021 г. в 16:30, Alexei Scherbakov < > alexey.scherbak...@gmail.com > >: > > > > > > > чт, 15 апр. 2021 г. в 18:21, Andrey Mashenkov < > andrey.mashen...@gmail.com > > >: > > > >> Hi Alexey, > >> I like the idea. > >> > >> 1. > >> > >> > TBL-0001 is a *string representation* of the error. It is built > from 2 > >> > byte scope id (mapped to name TBL) and 2 byte number (0001). Both > >> > internally packed in int. No any kind of parsing will be necessary to > >> read > >> > scope/category. > >> > >> I think Alexey mean if it will be possible to make smth like that > >> > >> catch (IgniteException e) { > >> if (e.getScope() == "TBL" && e.getCode() == 1234) > >> continue; // E.g. retry my TX > >> } > >> > >> It looks useful to me. > >> > > > > I have in mind something like this: > > > > public class IgniteException extends RuntimeException { > > private int errorCode; > > > > public IgniteException(ErrorScope scope, int code, String message, > > Throwable cause) { > > super(message, cause); > > this.errorCode = make(scope, code); > > } > > > > public boolean matches(ErrorScope scope, int code) { > > return errorCode == make(scope, code); > > } > > > > private int make(ErrorScope scope, int code) { > > return ((scope.ordinal() << 16) | code); > > } > > > > public ErrorScope scope() { > > return ErrorScope.values()[errorCode >> 16]; > > } > > > > public int code() { > > return 0xFFFF & errorCode; > > } > > > > public static void main(String[] args) { > > IgniteException e = new IgniteException(ErrorScope.RAFT, 1, > > "test", null); > > > > System.out.println(e.matches(ErrorScope.RAFT, 2)); > > System.out.println(e.scope()); > > System.out.println(e.code()); > > > > try { > > throw e; > > } > > catch (IgniteException ee) { > > System.out.println(ee.matches(ErrorScope.RAFT, 1)); > > } > > } > > } > > > > > >> > >> 2. How you see a cross-module exception throwing? > >> Assume, user call -> module A, which recursively call -> module B, which > >> fails. > >> So, module A component calls a module B component and got an Exception > >> with > >> "B-1234" exception. > >> Module A do not expect any exception here and doesn't take care of > "B-xxx" > >> error codes, but only "A-yyyy. > >> Should it rethrow exception with "A-unknown" (maybe "UNK-0001") code > >> or reuse "B-xxxx" code with the own message, pointing original exception > >> as > >> a cause for both cases? > >> > >> The first approach may looks confusing, while the second one produces > too > >> many "UNK-" codes. > >> What code should get the user in that case? > >> > >> > >> > > The method should always report root cause, in your example it will be > > B-xxxx, no matter which module API is called. > > > > > >> > >> > >> > >> On Thu, Apr 15, 2021 at 5:36 PM Alexei Scherbakov < > >> alexey.scherbak...@gmail.com> wrote: > >> > >> > чт, 15 апр. 2021 г. в 14:26, Ilya Kasnacheev < > ilya.kasnach...@gmail.com > >> >: > >> > > >> > > Hello! > >> > > > >> > > > All public methods throw only unchecked > >> > > org.apache.ignite.lang.IgniteException containing aforementioned > >> fields. > >> > > > Each public method must have a section in the javadoc with a list > of > >> > all > >> > > possible error codes for this method. > >> > > > >> > > I don't think this is feasible at all. > >> > > Imagine javadoc for cache.get() method featuring three pages of > >> possible > >> > > error codes thrown by this method. > >> > > > >> > > >> > Of course there is no need to write 3 pages of error codes, this makes > >> no > >> > sense. > >> > I think we can use error ranges here, or, probably, document most > >> important > >> > error scenarios. > >> > The point here is to try to document errors as much as possible. > >> > > >> > > >> > > Also, updated every two weeks to account for changes in > >> > > underlying mechanisms. > >> > > > >> > > There is still a confusion between "error code for any error in > logs" > >> and > >> > > "error code for any pair of method & exception". Which one are we > >> > > discussing really? > >> > > > >> > > This is impossible to track or test, but is susceptible for common > >> > > error-hiding antipattern where all exceptions are caught in > >> cache.get() > >> > and > >> > > discarded, and instead a brand new CH-0001 "error in cache.get()" is > >> > thrown > >> > > to the user. > >> > > > >> > > Which is certainly not something that anybody wants. > >> > > > >> > > >> > Certainly not. We are talking here about root cause - what is exactly > >> the > >> > reason for method call failure. > >> > > >> > > >> > > > >> > > Regards, > >> > > -- > >> > > Ilya Kasnacheev > >> > > > >> > > > >> > > чт, 15 апр. 2021 г. в 13:03, Vladislav Pyatkov < > vldpyat...@gmail.com > >> >: > >> > > > >> > > > Hi Alexei, > >> > > > > >> > > > > Each public method *must *have a section in the javadoc with a > >> list > >> > of > >> > > > all possible error codes for this method. > >> > > > > >> > > > I consider it is redundant, because any public exception can be > >> thrown > >> > > from > >> > > > public API. > >> > > > If it not happens today, it may change tomorrow: one will be > >> removed, > >> > > > another one will be added. > >> > > > > >> > > > >Nested exceptions are not forbidden to use. They can provide > >> > additional > >> > > > details on the error for debug purposes > >> > > > > >> > > > I see another issue, which is in the Ignite 2.x, but not attend > >> here. > >> > We > >> > > > can have a deep nested exception and use it for handling. > >> > > > In the result, all time when we are handling an exception we use > >> > > > pattern like this: > >> > > > try{ > >> > > > ... > >> > > > } > >> > > > catch (Exception e) { > >> > > > if (X.hasCause(e, TimeoutException.class)) > >> > > > throw e; > >> > > > > >> > > > if (X.hasCause(e, ConnectException.class, EOFException.class)) > >> > > > continue; > >> > > > > >> > > > if (X.hasCause(e, InterruptedException.class)) > >> > > > return false; > >> > > > } > >> > > > > >> > > > If we have a pant to make only one exception to the client side, > we > >> can > >> > > > also do it for an internal exception. > >> > > > > >> > > > On Wed, Apr 14, 2021 at 11:42 AM Alexei Scherbakov < > >> > > > alexey.scherbak...@gmail.com> wrote: > >> > > > > >> > > > > Alexey, > >> > > > > > >> > > > > ср, 14 апр. 2021 г. в 01:52, Alexey Kukushkin < > >> > > kukushkinale...@gmail.com > >> > > > >: > >> > > > > > >> > > > > > Just some points looking questionable to me, although I > realize > >> the > >> > > > error > >> > > > > > handling style may be very opinionated: > >> > > > > > > >> > > > > > - Would it make sense splitting the proposed composite > error > >> > code > >> > > > > > (TBL-0001) into separate numeric code (0001) and > >> scope/category > >> > > > > ("TBL") > >> > > > > > to > >> > > > > > avoid parsing the code when an error handler needs to > analyze > >> > only > >> > > > the > >> > > > > > category or the code? > >> > > > > > > >> > > > > > >> > > > > TBL-0001 is a *string representation* of the error. It is > built > >> > from > >> > > 2 > >> > > > > byte scope id (mapped to name TBL) and 2 byte number (0001). > Both > >> > > > > internally packed in int. No any kind of parsing will be > >> necessary to > >> > > > read > >> > > > > scope/category. > >> > > > > > >> > > > > > >> > > > > > - "*The cause - short string description of an issue, > >> readable > >> > by > >> > > > > > user.*". > >> > > > > > This terminology sounds confusing to me since that "cause" > >> > sounds > >> > > > like > >> > > > > > Java > >> > > > > > Throwable's Message to me and the "Cause" is a lower level > >> > > > exception. > >> > > > > > > >> > > > > > >> > > > > The string describes the cause of error, so the name. I'm ok to > >> > rename > >> > > it > >> > > > > to a message. It will be stored in IgniteException.message field > >> > > anyway. > >> > > > > > >> > > > > > >> > > > > > - "*The action - steps for a user to resolve error...*". > The > >> > > action > >> > > > is > >> > > > > > very important but do we want to make it part of the > >> > > > IgniteException? > >> > > > > I > >> > > > > > do > >> > > > > > not think the recovery action text should be part of the > >> > > exception. > >> > > > > > IgniteException may include a URL pointing to the > >> corresponding > >> > > > > > documentation - this is discussable. > >> > > > > > > >> > > > > > >> > > > > This will not be the part of the exception. A user should visit > >> the > >> > > > > documentation page and read the action section by corresponding > >> error > >> > > > code. > >> > > > > > >> > > > > > >> > > > > > - "*All public methods throw only unchecked > >> IgniteException*" - > >> > I > >> > > > > assume > >> > > > > > we still respect JCache specification and prefer using > >> standard > >> > > Java > >> > > > > > exception to signal about invalid parameters. > >> > > > > > > >> > > > > > >> > > > > Using standard java exceptions whenever possible makes sense. > >> > > > > > >> > > > > > >> > > > > > - Why we do not follow the "classic" structured exception > >> > handling > >> > > > > > practices in Ignite: > >> > > > > > > >> > > > > > >> > > > > Ignite 3 will be multi language, and other languages use other > >> error > >> > > > > processing models. SQL for example uses error codes. > >> > > > > The single exception approach simplifies and unifies error > >> handling > >> > > > across > >> > > > > platforms for me. > >> > > > > > >> > > > > > >> > > > > > - Why do we not allow using checked exceptions? It seems > >> to > >> > me > >> > > > > > sometimes forcing the user to handle an error serves as > a > >> > hint > >> > > > and > >> > > > > > thus > >> > > > > > improves usability. For example, handling an > >> > > > optimistic/pessimistic > >> > > > > > transaction conflict/deadlock. Or handling a timeout for > >> > > > operations > >> > > > > > with > >> > > > > > timeouts. > >> > > > > > > >> > > > > > >> > > > > A valid point. Checked exceptions must be used for whose > methods, > >> > where > >> > > > > error handling is enforced, for example tx optimistic failure. > >> > > > > Such errors will also have corresponding error codes. > >> > > > > > >> > > > > > >> > > > > > - Why single public IgniteException and no exception > >> > hierarchy? > >> > > > > Java > >> > > > > > is optimized for structured exception handling instead > of > >> > using > >> > > > > > IF-ELSE to > >> > > > > > analyze the codes. > >> > > > > > > >> > > > > > >> > > > > Exception hierarchy is not required when using error codes and > >> > > applicable > >> > > > > only to java API, so I would avoid spending efforts on it. > >> > > > > > >> > > > > > >> > > > > > - Why no nested exceptions? Sometimes an error handler > is > >> > > > > interested > >> > > > > > only in high level exceptions (like Invalid > Configuration) > >> > and > >> > > > > > sometimes > >> > > > > > details are needed (like specific configuration parser > >> > > > exceptions). > >> > > > > > > >> > > > > > >> > > > > Nested exceptions are not forbidden to use. They can provide > >> > additional > >> > > > > details on the error for debug purposes, but not strictly > >> required, > >> > > > because > >> > > > > error code + message should provide enough information to the > >> user. > >> > > > > > >> > > > > > >> > > > > > - For async methods returning a Future we may have a > >> universal > >> > > rule > >> > > > on > >> > > > > > how to handle exceptions. For example, we may specify that > >> any > >> > > async > >> > > > > > method > >> > > > > > can throw only invalid argument exceptions. All other > errors > >> are > >> > > > > > reported > >> > > > > > via the exceptionally(IgniteException -> {}) callback even > if > >> > the > >> > > > > async > >> > > > > > method was executed synchronously. > >> > > > > > > >> > > > > > >> > > > > This is ok to me. > >> > > > > > >> > > > > > >> > > > > > > >> > > > > > > >> > > > > > вт, 13 апр. 2021 г. в 12:08, Alexei Scherbakov < > >> > > > > > alexey.scherbak...@gmail.com > >> > > > > > >: > >> > > > > > > >> > > > > > > Igniters, > >> > > > > > > > >> > > > > > > I would like to start the discussion about error handling in > >> > > Ignite 3 > >> > > > > and > >> > > > > > > how we can improve it compared to Ignite 2. > >> > > > > > > > >> > > > > > > The error handling in Ignite 2 was not very good because of > >> > generic > >> > > > > > > CacheException thrown on almost any occasion, having deeply > >> > nested > >> > > > root > >> > > > > > > cause and often containing no useful information on further > >> steps > >> > > to > >> > > > > fix > >> > > > > > > the issue. > >> > > > > > > > >> > > > > > > I aim to fix it by introducing some rules on error handling. > >> > > > > > > > >> > > > > > > *Public exception structure.* > >> > > > > > > > >> > > > > > > A public exception must have an error code, a cause, and an > >> > action. > >> > > > > > > > >> > > > > > > * The code - the combination of 2 byte scope id and 2 byte > >> error > >> > > > number > >> > > > > > > within the module. This allows up to 2^16 errors for each > >> scope, > >> > > > which > >> > > > > > > should be enough. The error code string representation can > >> look > >> > > like > >> > > > > > > RFT-0001 or TBL-0001 > >> > > > > > > * The cause - short string description of an issue, readable > >> by > >> > > user. > >> > > > > > This > >> > > > > > > can have dynamic parameters depending on the error type for > >> > better > >> > > > user > >> > > > > > > experience, like "Can't write a snapshot, no space left on > >> device > >> > > > {0}" > >> > > > > > > * The action - steps for a user to resolve error situation > >> > > described > >> > > > in > >> > > > > > the > >> > > > > > > documentation in the corresponding error section, for > example > >> > > "Clean > >> > > > up > >> > > > > > > disk space and retry the operation". > >> > > > > > > > >> > > > > > > Common errors should have their own scope, for example > >> IGN-0001 > >> > > > > > > > >> > > > > > > All public methods throw only unchecked > >> > > > > > > org.apache.ignite.lang.IgniteException containing > >> aforementioned > >> > > > > fields. > >> > > > > > > Each public method must have a section in the javadoc with a > >> list > >> > > of > >> > > > > all > >> > > > > > > possible error codes for this method. > >> > > > > > > > >> > > > > > > A good example with similar structure can be found here [1] > >> > > > > > > > >> > > > > > > *Async timeouts.* > >> > > > > > > > >> > > > > > > Because almost all API methods in Ignite 3 are async, they > all > >> > will > >> > > > > have > >> > > > > > a > >> > > > > > > configurable default timeout and can complete with timeout > >> error > >> > > if a > >> > > > > > > computation is not finished in time, for example if a > response > >> > has > >> > > > not > >> > > > > > been > >> > > > > > > yet received. > >> > > > > > > I suggest to complete the async op future with > >> TimeoutException > >> > in > >> > > > this > >> > > > > > > case to make it on par with synchronous execution using > >> > future.get, > >> > > > > which > >> > > > > > > will throw java.util.concurrent.TimeoutException on timeout. > >> > > > > > > For reference, see > >> > java.util.concurrent.CompletableFuture#orTimeout > >> > > > > > > No special error code should be used for this scenario. > >> > > > > > > > >> > > > > > > *Internal exceptions hierarchy.* > >> > > > > > > > >> > > > > > > All internal exceptions should extend > >> > > > > > > org.apache.ignite.internal.lang.IgniteInternalException for > >> > checked > >> > > > > > > exceptions and > >> > > > > > > > org.apache.ignite.internal.lang.IgniteInternalCheckedException > >> > for > >> > > > > > > unchecked exceptions. > >> > > > > > > > >> > > > > > > Thoughts ? > >> > > > > > > > >> > > > > > > [1] > >> > > > > https://docs.oracle.com/cd/B10501_01/server.920/a96525/preface.htm > >> > > > > > > > >> > > > > > > -- > >> > > > > > > > >> > > > > > > Best regards, > >> > > > > > > Alexei Scherbakov > >> > > > > > > > >> > > > > > > >> > > > > > > >> > > > > > -- > >> > > > > > Best regards, > >> > > > > > Alexey > >> > > > > > > >> > > > > > >> > > > > > >> > > > > -- > >> > > > > > >> > > > > Best regards, > >> > > > > Alexei Scherbakov > >> > > > > > >> > > > > >> > > > > >> > > > -- > >> > > > Vladislav Pyatkov > >> > > > > >> > > > >> > > >> > > >> > -- > >> > > >> > Best regards, > >> > Alexei Scherbakov > >> > > >> > >> > >> -- > >> Best regards, > >> Andrey V. Mashenkov > >> > > > > > > -- > > > > Best regards, > > Alexei Scherbakov > > > > > -- > > Best regards, > Alexei Scherbakov > -- Best regards, Andrey V. Mashenkov