Re: [DISCUSS] Add error IDs

Wenchen Fan Wed, 21 Apr 2021 07:33:08 -0700

I think severity makes sense for logs, but I'm not sure about errors.

+1 to the proposal to improve the error message further.


On Fri, Apr 16, 2021 at 6:01 PM Yuming Wang <wgy...@gmail.com> wrote:

> +1 for this proposal.
>
> On Fri, Apr 16, 2021 at 5:15 AM Karen <karenfeng...@gmail.com> wrote:
>
>> We could leave space in the numbering system, but a more flexible method
>> may be to have the severity as a field associated with the error class -
>> the same way we would associate error ID with SQLSTATE, or with whether an
>> error is user-facing or internal. As you noted, I don't believe there is a
>> standard framework for hints/warnings in Spark today. I propose that we
>> leave out severity as a field until there is sufficient demand. We will
>> leave room in the format for other fields.
>>
>> On Thu, Apr 15, 2021 at 3:18 AM Steve Loughran
>> <ste...@cloudera.com.invalid> wrote:
>>
>>>
>>> Machine readable logs are always good, especially if you can read the
>>> entire logs into an SQL query.
>>>
>>> It might be good to use some specific differentiation between
>>> hint/warn/fatal error in the numbering so that any automated analysis of
>>> the logs can identify the class of an error even if it's an error not
>>> actually recognised. See the VMS docs for an example of this; the scheme
>>> in Windows is apparently based on their work:
>>>
>>> https://www.stsci.edu/ftp/documents/system-docs/vms-guide/html/VUG_19.html
>>>
>>> Even if things are only errors for now, leaving room in the format for
>>> other levels is wise.
>>>
>>> The trend in cloud infras is always to have some string such as
>>> "NoSuchBucket" which is (a) guaranteed to be maintained over time and
>>> (b) searchable in Google.
>>>
>>> (That said, AWS services not only make up their own values but do not
>>> even return consistent responses for the same problem. S3 throttling: 503.
>>> DynamoDB: 500 + one of two different messages. See
>>> com.amazonaws.retry.RetryUtils for the details.)
>>>
>>> On Wed, 14 Apr 2021 at 20:04, Karen <karenfeng...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> We would like to kick off a discussion on adding error IDs to Spark.
>>>>
>>>> Proposal:
>>>>
>>>> Add error IDs to provide a language-agnostic, locale-agnostic,
>>>> specific, and succinct answer for which class the problem falls under. When
>>>> partnered with a text-based error class (e.g. 12345
>>>> TABLE_OR_VIEW_NOT_FOUND), error IDs can provide meaningful categorization.
>>>> They are useful for all Spark personas: from users, to support engineers,
>>>> to developers.
>>>>
>>>> Add SQLSTATEs. As discussed in #32013
>>>> <https://github.com/apache/spark/pull/32013>, SQLSTATEs
>>>> <https://docs.teradata.com/r/EClCkxtGMW6hxXXtL8sBfA/ZDOZe5cOpMSSNnWOg8iLyw>
>>>> are portable error codes that are part of the ANSI/ISO SQL-99 standard
>>>> <https://github.com/apache/spark/files/6236838/ANSI.pdf>, and
>>>> especially useful for JDBC/ODBC users. They are not mutually exclusive with
>>>> adding product-specific error IDs, which can be more specific; for example,
>>>> MySQL uses an N-1 mapping from error IDs to SQLSTATEs:
>>>> https://dev.mysql.com/doc/refman/8.0/en/error-message-elements.html.
>>>>
>>>> Uniquely link error IDs to error messages (1-1). This simplifies the
>>>> auditing process and ensures that we uphold quality standards, as outlined
>>>> in SPIP: Standardize Error Message in Spark (
>>>> https://docs.google.com/document/d/1XGj1o3xAFh8BA7RCn3DtwIPC6--hIFOaNUNSlpaOIZs/edit
>>>> ).
>>>>
>>>> Requirements:
>>>>
>>>> Changes are backwards compatible; developers should still be able to
>>>> throw exceptions in the existing style (e.g. throw new
>>>> AnalysisException("Arbitrary error message.")). Adding error IDs will be a
>>>> gradual process, as there are thousands of exceptions thrown across the
>>>> code base.
>>>>
>>>> Optional:
>>>>
>>>> Label errors as user-facing or internal. Internal errors should be
>>>> logged, and end-users should be aware that they likely cannot fix the error
>>>> themselves.
>>>>
>>>> End result:
>>>>
>>>> Before:
>>>>
>>>> AnalysisException: Cannot find column ‘fakeColumn’; line 1 pos 14;
>>>>
>>>> After:
>>>>
>>>> AnalysisException: SPK-12345 COLUMN_NOT_FOUND: Cannot find column
>>>> ‘fakeColumn’; line 1 pos 14; (SQLSTATE 42704)
>>>>
>>>> Please let us know what you think about this proposal! We'd love to
>>>> hear your feedback.
>>>>
>>>> Best,
>>>>
>>>> Karen Feng
>>>>
>>>
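
The "After" format in the original proposal can be sketched roughly as
follows. This is only an illustration in Python, not Spark code: ErrorClass,
REGISTRY, from_class and parse are hypothetical names, the AnalysisException
here is a stand-in class, and the single registry entry reuses the placeholder
values from the thread (12345, COLUMN_NOT_FOUND, SQLSTATE 42704). It assumes
each error ID owns exactly one message template (1-1) and that exceptions
raised in the existing style simply carry no error class.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ErrorClass:
    """Hypothetical registry entry; field names are illustrative only."""
    error_id: int
    name: str
    sqlstate: str          # several error IDs may share one SQLSTATE (N-1)
    template: str          # exactly one message template per error ID (1-1)
    user_facing: bool = True


# Hypothetical registry using the placeholder values from the proposal.
REGISTRY = {
    "COLUMN_NOT_FOUND": ErrorClass(
        12345, "COLUMN_NOT_FOUND", "42704", "Cannot find column '{column}'"
    ),
}


class AnalysisException(Exception):
    """Stand-in for illustration; not Spark's AnalysisException."""

    def __init__(self, message: str, error_class: Optional[ErrorClass] = None):
        # Existing style: only a free-form message, error_class stays None.
        self.error_class = error_class
        super().__init__(message)

    @classmethod
    def from_class(cls, name: str, position: str = "", **params: str):
        """Build a tagged message: SPK-<id> <NAME>: <text> (SQLSTATE <code>)."""
        ec = REGISTRY[name]
        body = ec.template.format(**params)
        if position:
            body = f"{body}; {position};"
        text = f"SPK-{ec.error_id} {ec.name}: {body} (SQLSTATE {ec.sqlstate})"
        return cls(text, ec)


# Pattern for pulling the machine-readable parts back out of a log line.
TAGGED = re.compile(
    r"SPK-(?P<id>\d+) (?P<name>[A-Z_]+): (?P<text>.*) "
    r"\(SQLSTATE (?P<sqlstate>[0-9A-Z]{5})\)$"
)


def parse(message: str) -> Optional[dict]:
    """Return the ID, class name, text and SQLSTATE, or None if untagged."""
    match = TAGGED.search(message)
    return match.groupdict() if match else None
```

Calling AnalysisException.from_class("COLUMN_NOT_FOUND", position="line 1 pos
14", column="fakeColumn") builds a message in the proposed shape, while
AnalysisException("Arbitrary error message.") still works and parse() returns
None for it. Further fields such as severity could be added to the hypothetical
ErrorClass entry later without changing the ID itself.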
