Machine-readable logs are always good, especially if you can load the entire log into something you can query with SQL.
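As a rough sketch of what "read the logs into SQL" could look like: the snippet below assumes log lines shaped like the "After" example in the quoted proposal (`SPK-<id> <ERROR_CLASS>: <message> (SQLSTATE <code>)`). That format is only a proposal, and the regex, table layout and `load_logs` helper are made up for illustration.

```python
import re
import sqlite3

# Hypothetical log format, borrowed from the "After" example in the quoted
# proposal: "<ExceptionType>: SPK-<id> <ERROR_CLASS>: <message> (SQLSTATE <code>)"
LINE_RE = re.compile(
    r"^(?P<exc>\w+): SPK-(?P<id>\d+) (?P<cls>[A-Z_]+): "
    r"(?P<msg>.*?)(?: \(SQLSTATE (?P<sqlstate>\w{5})\))?$"
)


def load_logs(lines):
    """Parse log lines into an in-memory SQLite table; lines without an ID are skipped."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE errors (exc TEXT, error_id INTEGER, error_class TEXT, "
        "message TEXT, sqlstate TEXT)"
    )
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            conn.execute(
                "INSERT INTO errors VALUES (?, ?, ?, ?, ?)",
                (m["exc"], int(m["id"]), m["cls"], m["msg"], m["sqlstate"]),
            )
    return conn


sample = [
    "AnalysisException: SPK-12345 COLUMN_NOT_FOUND: Cannot find column 'fakeColumn'; line 1 pos 14; (SQLSTATE 42704)",
    "AnalysisException: SPK-12345 COLUMN_NOT_FOUND: Cannot find column 'other'; line 3 pos 2; (SQLSTATE 42704)",
    "AnalysisException: Arbitrary error message.",  # old style, no ID: skipped
]
conn = load_logs(sample)
counts = conn.execute(
    "SELECT error_class, COUNT(*) FROM errors GROUP BY error_class"
).fetchall()
print(counts)
```

Once the ID and class are separate columns, "which error classes are we hitting most" becomes a one-line GROUP BY rather than a pile of greps.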
It might be good to encode the hint/warn/fatal distinction in the numbering itself, so that any automated analysis of the logs can identify the class of an error even if it's an error the tool doesn't actually recognise. See the VMS docs for an example of this; the scheme in Windows is apparently based on their work:
https://www.stsci.edu/ftp/documents/system-docs/vms-guide/html/VUG_19.html
Even if things are only errors for now, leaving room in the format for other levels is wise.

The trend in cloud infrastructure is to always have some string such as "NoSuchBucket" which is (a) guaranteed to be maintained over time and (b) searchable in Google.

(That said, AWS has every service not just making up its own values, but not even giving consistent responses for the same problem. S3 throttling: 503. DynamoDB: 500 + one of two different messages. See com.amazonaws.retry.RetryUtils for the details.)

On Wed, 14 Apr 2021 at 20:04, Karen <karenfeng...@gmail.com> wrote:

> Hi all,
>
> We would like to kick off a discussion on adding error IDs to Spark.
>
> Proposal:
>
> Add error IDs to provide a language-agnostic, locale-agnostic, specific,
> and succinct answer for which class the problem falls under. When partnered
> with a text-based error class (eg. 12345 TABLE_OR_VIEW_NOT_FOUND), error
> IDs can provide meaningful categorization. They are useful for all Spark
> personas: from users, to support engineers, to developers.
>
> Add SQLSTATEs. As discussed in #32013
> <https://github.com/apache/spark/pull/32013>, SQLSTATEs
> <https://docs.teradata.com/r/EClCkxtGMW6hxXXtL8sBfA/ZDOZe5cOpMSSNnWOg8iLyw>
> are portable error codes that are part of the ANSI/ISO SQL-99 standard
> <https://github.com/apache/spark/files/6236838/ANSI.pdf>, and especially
> useful for JDBC/ODBC users.
> They are not mutually exclusive with adding
> product-specific error IDs, which can be more specific; for example, MySQL
> uses an N-1 mapping from error IDs to SQLSTATEs:
> https://dev.mysql.com/doc/refman/8.0/en/error-message-elements.html.
>
> Uniquely link error IDs to error messages (1-1). This simplifies the
> auditing process and ensures that we uphold quality standards, as outlined
> in SPIP: Standardize Error Message in Spark (
> https://docs.google.com/document/d/1XGj1o3xAFh8BA7RCn3DtwIPC6--hIFOaNUNSlpaOIZs/edit
> ).
>
> Requirements:
>
> Changes are backwards compatible; developers should still be able to throw
> exceptions in the existing style (eg. throw new
> AnalysisException(“Arbitrary error message.”)). Adding error IDs will be a
> gradual process, as there are thousands of exceptions thrown across the
> code base.
>
> Optional:
>
> Label errors as user-facing or internal. Internal errors should be logged,
> and end-users should be aware that they likely cannot fix the error
> themselves.
>
> End result:
>
> Before:
>
> AnalysisException: Cannot find column ‘fakeColumn’; line 1 pos 14;
>
> After:
>
> AnalysisException: SPK-12345 COLUMN_NOT_FOUND: Cannot find column
> ‘fakeColumn’; line 1 pos 14; (SQLSTATE 42704)
>
> Please let us know what you think about this proposal! We’d love to hear
> what you think.
>
> Best,
>
> Karen Feng
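Coming back to the severity point at the top of this reply, here is a minimal sketch of what "severity in the numbering" could buy a log analyser. Everything in it is an assumption for illustration: the idea that the first digit of a five-digit `SPK-` ID carries the level, the `Severity` values, and the `KNOWN` table are all hypothetical, and are not taken from the proposal or from the VMS scheme.

```python
import re
from enum import Enum


class Severity(Enum):
    HINT = 1
    WARN = 2
    ERROR = 3
    FATAL = 4


# Assumption for illustration only: the first digit of a five-digit ID is
# the severity; the remaining four digits identify the specific error.
ID_RE = re.compile(r"\bSPK-(?P<sev>\d)(?P<num>\d{4})\b")

# Errors this particular tool happens to know about (hypothetical entries).
KNOWN = {"SPK-30001": "COLUMN_NOT_FOUND"}


def classify(line):
    """Return (severity, error_class or None) for a log line, or None if no usable ID is present."""
    m = ID_RE.search(line)
    if m is None:
        return None
    try:
        sev = Severity(int(m["sev"]))
    except ValueError:
        return None  # leading digit outside the reserved range
    # An ID missing from KNOWN still yields a severity.
    return sev, KNOWN.get(m.group(0))


print(classify("AnalysisException: SPK-30001 COLUMN_NOT_FOUND: Cannot find column 'x';"))
print(classify("SomeException: SPK-49999 SOMETHING_NEW: not in this tool's table"))
```

The second call is the interesting one: the tool has never heard of SPK-49999, but under this made-up scheme it can still tell that the line is fatal rather than a hint.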