Max Gekk created SPARK-57339:
--------------------------------

             Summary: Format nanosecond-precision timestamp literals in Literal.toString and Literal.sql
                 Key: SPARK-57339
                 URL: https://issues.apache.org/jira/browse/SPARK-57339
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 4.3.0
            Reporter: Max Gekk


h2. Background

As part of the nanosecond timestamp preview (SPARK-56822), literals of the types
{{TIMESTAMP_NTZ(p)}} and {{TIMESTAMP_LTZ(p)}} (with {{p}} in {{[7, 9]}}) hold
their values as {{TimestampNanosVal}}.

In {{Literal}}, both {{toString}} and {{sql}} have explicit, nicely-formatted
cases for every other temporal literal type ({{DateType}}, {{TimeType}},
{{TimestampType}}, {{TimestampNTZType}}), but the two nanosecond timestamp types
have no dedicated case and fall through to a generic default:

* {{Literal.toString}} -> {{case _ => other.toString}}, i.e. it prints the raw
{{TimestampNanosVal.toString}}.
* {{Literal.sql}} -> no {{TIMESTAMP_NTZ(p)}} / {{TIMESTAMP_LTZ(p)}} case at all.
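
The fall-through can be illustrated with a simplified model. This is only a sketch, not Spark's code: {{render}}, the {{DataType}} stand-ins and the field names of {{TimestampNanosVal}} are hypothetical, and the overridden {{toString}} merely mimics a raw rendering.

{code:scala}
// Hypothetical stand-ins for illustration; these are NOT Spark's classes.
final case class TimestampNanosVal(epochMicros: Long, nanosWithinMicro: Int) {
  // Mimics a raw, unformatted rendering of the value.
  override def toString: String = s"TimestampNanosVal($epochMicros, $nanosWithinMicro)"
}

sealed trait DataType
case object TimestampType extends DataType
final case class TimestampLTZNanosType(precision: Int) extends DataType

object FallThroughDemo {
  // Simplified model of a toString-style match: the microsecond type has a
  // dedicated case, while the nanosecond type hits the generic default.
  def render(value: Any, dataType: DataType): String = (value, dataType) match {
    case (micros: Long, TimestampType) => s"<formatted timestamp for $micros us>"
    case (other, _) => other.toString // generic default: raw value leaks out
  }

  def main(args: Array[String]): Unit = {
    val v = TimestampNanosVal(1577913875123456L, 789)
    println(render(v, TimestampLTZNanosType(9)))
    // prints: TimestampNanosVal(1577913875123456, 789)
  }
}
{code}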

h2. Problem

The raw {{TimestampNanosVal}} representation leaks into user-facing output such
as analyzed plans, schemas and generated SQL. For example, the analyzer result
of {{SELECT hour(TIMESTAMP_LTZ '2020-01-01 13:24:35.123456789')}} contains:

{code:sql}
Project [hour(cast(TimestampNanosVal(1577913875123456, 789) as timestamp), ...) AS hour(TimestampNanosVal(1577913875123456, 789))#x]
{code}

instead of a readable, round-trippable literal.
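
To see what the raw pair stands for, it can be decoded by hand. The sketch below assumes that the first field is microseconds since the epoch and the second is the remaining nanoseconds within that microsecond. It also assumes a session time zone of {{America/Los_Angeles}}. Neither assumption is stated in this ticket, but together they reproduce the wall-clock time used in the query. {{toInstant}} is a hypothetical helper.

{code:scala}
import java.time.{Instant, LocalDateTime, ZoneId}

object DecodeNanosExample {
  // Assumption: epochMicros = microseconds since 1970-01-01T00:00:00Z,
  // nanosWithinMicro = extra nanoseconds in [0, 999].
  def toInstant(epochMicros: Long, nanosWithinMicro: Int): Instant = {
    val seconds = java.lang.Math.floorDiv(epochMicros, 1000000L)
    val microsOfSecond = java.lang.Math.floorMod(epochMicros, 1000000L)
    Instant.ofEpochSecond(seconds, microsOfSecond * 1000L + nanosWithinMicro)
  }

  def main(args: Array[String]): Unit = {
    val instant = toInstant(1577913875123456L, 789)
    println(instant)
    // prints: 2020-01-01T21:24:35.123456789Z

    // Assumed session time zone (UTC-8 on this date).
    println(LocalDateTime.ofInstant(instant, ZoneId.of("America/Los_Angeles")))
    // prints: 2020-01-01T13:24:35.123456789
  }
}
{code}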

This was raised during the review of
[PR #56368|https://github.com/apache/spark/pull/56368].

h2. Expected

Add explicit cases for the nanosecond timestamp types so that the formatting is
consistent with the microsecond timestamp types:

* {{Literal.toString}} renders the value as a formatted timestamp string with up
to 9 fractional digits.
* {{Literal.sql}} emits typed literals, e.g.
{{TIMESTAMP_NTZ '2018-02-14 12:58:59.123456789'}} /
{{TIMESTAMP_LTZ '2020-01-01 13:24:35.123456789'}}.

Also review the other {{value}}/{{dataType}} match sites in {{Literal}} (e.g.
{{jsonFields}}, {{default}}, codegen) for the same missing nanos cases.
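
A minimal sketch of the rendering described in this section, built on {{java.time}} only. It is not the proposed patch: {{NanosValue}}, {{format}}, {{sqlNtz}} and {{sqlLtz}} are hypothetical names, the field layout is assumed to be epoch microseconds plus nanoseconds within the microsecond, and NTZ values are assumed to be stored as a local date-time relative to UTC.

{code:scala}
import java.time.{Instant, LocalDateTime, ZoneId, ZoneOffset}
import java.time.format.{DateTimeFormatter, DateTimeFormatterBuilder}
import java.time.temporal.ChronoField

object NanosLiteralFormat {
  // Hypothetical stand-in for the literal value; the field meaning is an assumption.
  final case class NanosValue(epochMicros: Long, nanosWithinMicro: Int)

  // Date and time followed by 0 to 9 fractional digits (trailing zeros dropped).
  private val formatter: DateTimeFormatter = new DateTimeFormatterBuilder()
    .appendPattern("uuuu-MM-dd HH:mm:ss")
    .appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true)
    .toFormatter

  private def toInstant(v: NanosValue): Instant = {
    val seconds = java.lang.Math.floorDiv(v.epochMicros, 1000000L)
    val microsOfSecond = java.lang.Math.floorMod(v.epochMicros, 1000000L)
    Instant.ofEpochSecond(seconds, microsOfSecond * 1000L + v.nanosWithinMicro)
  }

  // toString-style rendering: a plain timestamp string in the given zone.
  def format(v: NanosValue, zone: ZoneId): String =
    formatter.format(LocalDateTime.ofInstant(toInstant(v), zone))

  // sql-style rendering: typed literals. NTZ is zone-independent, so it is
  // rendered relative to UTC (assumption); LTZ uses the session zone.
  def sqlNtz(v: NanosValue): String = s"TIMESTAMP_NTZ '${format(v, ZoneOffset.UTC)}'"
  def sqlLtz(v: NanosValue, zone: ZoneId): String = s"TIMESTAMP_LTZ '${format(v, zone)}'"

  def main(args: Array[String]): Unit = {
    println(sqlNtz(NanosValue(1518613139123456L, 789)))
    // prints: TIMESTAMP_NTZ '2018-02-14 12:58:59.123456789'
    println(sqlLtz(NanosValue(1577913875123456L, 789), ZoneId.of("America/Los_Angeles")))
    // prints: TIMESTAMP_LTZ '2020-01-01 13:24:35.123456789'
  }
}
{code}

Whether trailing zeros should be dropped or padded to {{p}} digits is a design choice this sketch does not settle. It drops them, which matches the "up to 9 fractional digits" wording.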

h2. Scope

* {{sql/catalyst/.../expressions/literals.scala}}: {{Literal.toString}} and
{{Literal.sql}} (and any related match sites).
* A formatter producing nanosecond precision for the new types.

h2. Tests

* Unit tests for {{Literal.toString}} / {{Literal.sql}} over
{{TIMESTAMP_NTZ(p)}} / {{TIMESTAMP_LTZ(p)}} for {{p}} in {{[7, 9]}}.
* Update affected golden files (e.g. {{timestamp-ltz-nanos.sql.out}}) once the
formatting changes.
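
A rough shape for the per-precision checks, written without Spark's test harness. It assumes that a value of precision {{p}} keeps only the first {{p}} fractional digits, and it checks that the formatted text parses back to the same value. {{truncateToPrecision}} and {{roundTrips}} are hypothetical helpers, not existing Spark utilities.

{code:scala}
import java.time.LocalDateTime
import java.time.format.{DateTimeFormatter, DateTimeFormatterBuilder}
import java.time.temporal.ChronoField

object NanosRoundTripCheck {
  // Same layout as the microsecond literals, but with up to 9 fractional digits.
  val formatter: DateTimeFormatter = new DateTimeFormatterBuilder()
    .appendPattern("uuuu-MM-dd HH:mm:ss")
    .appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true)
    .toFormatter

  // Assumption: precision p in [7, 9] keeps the first p fractional digits.
  def truncateToPrecision(ts: LocalDateTime, p: Int): LocalDateTime = {
    require(p >= 7 && p <= 9, s"precision out of range: $p")
    val unit = (0 until (9 - p)).foldLeft(1)((acc, _) => acc * 10)
    ts.withNano(ts.getNano / unit * unit)
  }

  // A literal is round-trippable if parsing its text yields the same value.
  def roundTrips(ts: LocalDateTime): Boolean =
    LocalDateTime.parse(formatter.format(ts), formatter) == ts

  def main(args: Array[String]): Unit = {
    val base = LocalDateTime.of(2018, 2, 14, 12, 58, 59, 123456789)
    for (p <- 7 to 9) {
      val ts = truncateToPrecision(base, p)
      println(s"p=$p -> ${formatter.format(ts)}")
      assert(roundTrips(ts), s"round trip failed for p=$p")
    }
    // prints:
    // p=7 -> 2018-02-14 12:58:59.1234567
    // p=8 -> 2018-02-14 12:58:59.12345678
    // p=9 -> 2018-02-14 12:58:59.123456789
  }
}
{code}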

h2. Notes

This is a follow-up/cleanup item under the nanosecond timestamp preview
umbrella (SPARK-56822) and is independent of the HOUR/MINUTE/SECOND support
added in SPARK-57315.
