Hello Team,

I am one of many people responsible for supporting the Hadoop products out in the field. Error handling and logging are crucial to my success. I've been reading over the code and I see many of the same mistakes again and again. I just wanted to bring some of these things to your attention so that, moving forward, we can make these products better.
The general best-practice is:

    import java.io.IOException;

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class TestExceptionLogging {
      private static final Logger LOG = LoggerFactory.getLogger(TestExceptionLogging.class);

      public static void main(String[] args) {
        try {
          processData();
        } catch (Exception e) {
          // The top level logs exactly once, with the full chain of causes
          LOG.error("Application failed", e);
        }
      }

      public static void processData() throws Exception {
        try {
          readData();
        } catch (Exception e) {
          // Lower levels wrap and rethrow, adding context; they do not log
          throw new Exception("Failed to process data", e);
        }
      }

      public static byte[] readData() throws Exception {
        throw new IOException("Failed to read device");
      }
    }

Produces:

    [main] ERROR TestExceptionLogging - Application failed
    java.lang.Exception: Failed to process data
        at TestExceptionLogging.processData(TestExceptionLogging.java:22)
        at TestExceptionLogging.main(TestExceptionLogging.java:12)
    Caused by: java.io.IOException: Failed to read device
        at TestExceptionLogging.readData(TestExceptionLogging.java:27)
        at TestExceptionLogging.processData(TestExceptionLogging.java:20)
        ... 1 more

Please notice that when an exception is thrown and caught, it is wrapped at each level, and each level adds some more context to describe what was happening when the error occurred. It also produces a complete stack trace that pinpoints the issue.

For Hive folks, it is rarely the case that a method consuming an HMS API call should itself throw a MetaException. MetaException has no way of wrapping an underlying Exception, so helpful data is often lost. A method may choose to wrap a MetaException, but it should not be throwing them around as the default behavior.

Also important to note is that there is exactly one place doing the logging. There does not need to be any logging at the lower levels. A catch block should throw or log, not both. Doing both is an anti-pattern and is annoying for the end user, who has to deal with multiple stack traces at different log levels for the same error condition. The log message should be at the highest level only.

https://community.oracle.com/docs/DOC-983543#logAndThrow

Both projects use SLF4J as the logging framework (a facade, anyway). Please familiarize yourself with how to correctly log an Exception. There is no need to log a thread name, a time stamp, a class name, or a stack trace; the logging framework will do all of that for you.

http://www.slf4j.org/faq.html#paramException

Again, there is no need to 'stringify' an exception. For example, do not use this:

https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/StringUtils.java#L86

If you do, however, want to dump a stack trace for debugging (or trace) purposes, consider doing the following:

    if (LOG.isDebugEnabled()) {
      LOG.debug("Dump Thread Stack", new Exception("Thread Stack Trace (Not an Error)"));
    }

Finally, I've seen it a couple of times in Apache projects that enabling debug-level logging causes the application to emit additional logs at other levels, for example:

    LOG.warn("Some error occurred: {}", e.getMessage());
    if (LOG.isDebugEnabled()) {
      LOG.warn("Dump Warning Thread Stack", e);
    }

Please refrain from doing this; the inner log statement should be at DEBUG level to match the check. Otherwise, when I enable DEBUG logging in the application, my expectation is that I will get exactly the same logging as at the INFO level, plus additional DEBUG details. I am going to be using 'grep' to find DEBUG, so I will miss this additional logging. I will then be peeved when I say "ah ha! It's a WARN log message, I don't need DEBUG enabled to get this logging when the issue happens again!"... but then I restart the application and discover that I do in fact need DEBUG logging enabled.
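To make that concrete, the corrected shape of the block above would be something like the following. It is only a sketch (the class and method names are invented for illustration), but the point is that the statement behind the isDebugEnabled() check logs at DEBUG, so enabling DEBUG never produces new WARN lines:

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class GuardedLoggingExample {
      private static final Logger LOG = LoggerFactory.getLogger(GuardedLoggingExample.class);

      static void handleFailure(Exception e) {
        // The always-on message stays at WARN
        LOG.warn("Some error occurred: {}", e.getMessage());
        // The extra detail behind the guard is logged at DEBUG to match the check
        if (LOG.isDebugEnabled()) {
          LOG.debug("Dump Warning Thread Stack", e);
        }
      }
    }

With that, grepping the log for DEBUG actually finds everything that only appears when DEBUG is enabled.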
Finally, finally, do not pass server-generated stack traces to a client application. When a client dumps a stack trace, it is not always trivial to decide whether the stack trace relates to the client or to the server. This can also be a security issue, as it may hint to the client which libraries (and, based on line numbers, which versions of those libraries) are being used. If the server is going to generate an error in response to a client API call, the server should log the exception (right before returning the response to the client) and it should pass only a helpful error message to the client. The operator/administrator can look in the server logs for the details.

Thanks!
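P.S. To illustrate that last point, here is roughly the server-side pattern I am hoping to see. It is only a sketch; the class and method names (ServerSideErrorHandling, getTableLocation, lookupLocation) are made up and not real APIs in either project:

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class ServerSideErrorHandling {
      private static final Logger LOG = LoggerFactory.getLogger(ServerSideErrorHandling.class);

      /** Hypothetical server-side handler for a client API call. */
      public String getTableLocation(String tableName) {
        try {
          return lookupLocation(tableName);
        } catch (Exception e) {
          // Log the full exception, with its stack trace, on the server side,
          // right before returning the response to the client...
          LOG.error("Failed to look up location for table {}", tableName, e);
          // ...and give the client only a helpful message, not the stack trace.
          // (How the error travels back is up to the API in question; an
          // exception type that carries only a message is typical.)
          throw new RuntimeException("Could not retrieve location for table '"
              + tableName + "'; see the server log for details");
        }
      }

      // Stand-in for whatever work the call actually does
      private String lookupLocation(String tableName) throws Exception {
        throw new Exception("disk failure");  // simulate a server-side error
      }
    }

The client still gets enough information to know what failed, and the operator can find the stack trace in the server log.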