Re: [DISCUSS] log guidelines

João Jandre Paraquetti Tue, 05 Dec 2023 11:21:43 -0800

Hello all,

As someone who has worked a lot on logging in this project, I feel likea set of log guidelines are a good addition. I generally agree withDaan's ideas; however, I think that the error level was oversimplified.As Daan already said, there are two main reasons for error logs: "thesystem can not execute a user request or part of the system is degraded".

When the system cannot execute a user request, a simple messageexplaining why it's not possible to execute the request is often enough.However, when the error happens because of system degradation, there aretimes when having a stacktrace can save hours of debugging, as the errormight be hard to reproduce. Also, if the error is occurring in aproduction environment, changing the log level to debug means restartingACS's components, possibly leading to downtime and/or more errors. Thesecond issue will be solved with the log4j2 PR, but the first one canstill be a pain.

In summary, a hard rule would not address all situations, but aguideline or good practices guide would be a great addition to ACS.


Best regards,
João Jandre


On 12/4/23 10:51, Daan Hoogland wrote:

There have been some discussions on github PRs about the subject of
proper logging and I want to use this to start a discussion and come
to some guidelines that hopefully everybody can agree to.

My idea is the following

Fatal:
- include a reason why the system can not function anymore,
- be followed by a System.exit() call,
- include a stack trace if it is reasonable to expect this would give
properly helpful information.
I did a quick search and we have 2 fatal calls in the system. Neither
is followed by a System.exit() and I think they can be replaced by
error()/debug() combinations

Error:
- include a reason why the system can not execute a user request or
why part of the system is degraded.
- should *not* include a stack trace

Warn:
- include a reason why the system might not behave as the user is
expecting. (configuration or integration missing for example,)
- should *not* include a stack trace

Info:
- is for informationals to the operator like resource lifecycle
messages or deployment planning considderations
- should *not* include a stack trace

Debug:
-  information to be turned on when trouble shooting or developing.
This information should not be needed for normal operation
- if a thrown exception is caught and not rethrown, the stack trace of
such exception

Trace:
- whatever you feel helps you during development of a
feature/enhancement/fix/improvement that is not needed in any kind of
operational way.

As I think the two fatal()s we have are not genuine, I think stack
traces are only for Debug and Trace level, where Debug can be
temporarily turned on in production and Trace is fro developers
exclusively,

I realise the current state of the system is far from this behaviour,
but i like to think we can grow in the direction as I describe and I
hope we can refine this to a generic guideline to be used in reviews.

Re: [DISCUSS] log guidelines

Reply via email to