Max Gekk created SPARK-57340:
--------------------------------

             Summary: Support EXTRACT/date_part HOUR, MINUTE and SECOND over nanosecond-precision timestamps
                 Key: SPARK-57340
                 URL: https://issues.apache.org/jira/browse/SPARK-57340
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 4.3.0
            Reporter: Max Gekk


h2. Background

SPARK-57315 added support for the {{hour()}}, {{minute()}} and {{second()}}
functions over the nanosecond-precision timestamp types {{TIMESTAMP_NTZ(p)}} /
{{TIMESTAMP_LTZ(p)}} ({{p}} in {{[7, 9]}}), part of the nanosecond timestamp
preview (SPARK-56822). It did so by casting the nanosecond input down to the
matching microsecond timestamp type inside {{HourExpressionBuilder}},
{{MinuteExpressionBuilder}} and {{SecondExpressionBuilder}}.

h2. Problem

{{EXTRACT(field FROM source)}} and {{date_part(field, source)}} are supposed to
be equivalent to {{hour()}} / {{minute()}} / {{second()}}, but they do not go
through those function builders. They resolve via {{Extract}} ->
{{DatePart.parseExtractField}}, which constructs the core expressions directly
on the raw source:

{code:scala}
case "HOUR" | "H" | "HOURS" | "HR" | "HRS" => Hour(source)
case "MINUTE" | "M" | "MIN" | "MINS" | "MINUTES" => Minute(source)
case "SECOND" | "S" | "SEC" | "SECONDS" | "SECS" => SecondWithFraction(source)
{code}

This path has no nanosecond -> microsecond cast, and {{Hour}} / {{Minute}} /
{{Second}} (via {{GetTimeField}}) accept only {{AnyTimestampType}}, which covers
just the microsecond {{TimestampType}} / {{TimestampNTZType}}. As a result, with
the {{spark.sql.timestampNanosTypes.enabled}} flag on:

{code:sql}
SELECT EXTRACT(HOUR FROM TIMESTAMP_NTZ '2018-02-14 12:58:59.123456789');
-- fails analysis with a type-check error, while hour(...) returns 12
{code}
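
The mismatch can be illustrated with a small self-contained sketch. It is a toy model, not Spark code: {{TsType}}, {{Source}}, {{castToMicros}}, {{hour}}, {{hourFunction}} and {{extractHour}} are hypothetical stand-ins for the real Catalyst classes, and the only assumption carried over from the issue is that the function builder casts to microseconds first while the {{EXTRACT}} path does not.

```scala
import java.time.LocalDateTime
import java.time.temporal.ChronoUnit

object ExtractPathSketch {
  // Hypothetical stand-ins for the microsecond and nanosecond NTZ types.
  sealed trait TsType
  case object MicrosNtz extends TsType
  final case class NanosNtz(precision: Int) extends TsType

  final case class Source(tpe: TsType, value: LocalDateTime)

  // Stand-in for the nanosecond -> microsecond cast: drops digits 7..9.
  def castToMicros(s: Source): Source = s.tpe match {
    case NanosNtz(_) => Source(MicrosNtz, s.value.truncatedTo(ChronoUnit.MICROS))
    case MicrosNtz   => s
  }

  // Stand-in for the core expression: accepts microsecond inputs only.
  def hour(s: Source): Either[String, Int] = s.tpe match {
    case MicrosNtz => Right(s.value.getHour)
    case other     => Left(s"type check failed: unsupported input type $other")
  }

  // Function-builder path: cast first, then build the core expression.
  def hourFunction(s: Source): Either[String, Int] = hour(castToMicros(s))

  // EXTRACT / date_part path as described above: no cast on the raw source.
  def extractHour(s: Source): Either[String, Int] = hour(s)

  val nanosSource: Source =
    Source(NanosNtz(9), LocalDateTime.parse("2018-02-14T12:58:59.123456789"))
}
```

In this model {{hourFunction(nanosSource)}} yields {{Right(12)}}, while {{extractHour(nanosSource)}} yields a {{Left}} carrying the type-check message.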

h2. Expected

* {{EXTRACT(HOUR|MINUTE FROM <nanos ts>)}} and the {{date_part}} equivalents
return the same integer results as {{hour()}} / {{minute()}}, by reusing the
existing nanosecond -> microsecond cast (e.g. {{NanosTimestampCast.castToMicros}})
in the {{EXTRACT}} resolution path ({{DatePart.parseExtractField}}).
* Decide and document the semantics of {{EXTRACT(SECOND FROM <nanos ts>)}}:
unlike {{second()}}, it maps to {{SecondWithFraction}}, which returns
{{DECIMAL(8, 6)}}. This is the path that depends on sub-microsecond digits and
was intentionally excluded from SPARK-57315. Either truncate to microseconds or
widen the result to carry the nanosecond fraction.
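
To make the two {{SECOND}} options concrete, the following sketch computes both candidate results for the literal used in the Problem section, using only {{java.time}} and {{scala.math.BigDecimal}}. It is plain arithmetic, not Spark code. The names {{truncated}} and {{widened}} are illustrative, and scale 9 for the widened result is an assumption; the issue does not fix the widened type.

```scala
import java.time.LocalDateTime

object SecondFractionSketch {
  val ts: LocalDateTime = LocalDateTime.parse("2018-02-14T12:58:59.123456789")

  // Seconds within the minute, expressed as a whole number of nanoseconds.
  val nanosOfMinute: Long = ts.getSecond * 1000000000L + ts.getNano

  // Option 1: truncate to microseconds and keep scale 6, as in DECIMAL(8, 6).
  val truncated: BigDecimal = BigDecimal(nanosOfMinute / 1000L, 6)

  // Option 2: keep all nine fractional digits (assumed scale 9).
  val widened: BigDecimal = BigDecimal(nanosOfMinute, 9)
}
```

For this input, {{truncated}} is 59.123456 and {{widened}} is 59.123456789. The first keeps the existing result type at the cost of dropping the last three digits; the second preserves them but changes the result type of {{EXTRACT(SECOND ...)}} for nanosecond inputs.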

h2. Scope

* {{sql/catalyst/.../expressions/datetimeExpressions.scala}}:
{{DatePart.parseExtractField}} (and the {{Extract}} / {{date_part}} paths).

h2. Tests

* Golden tests for {{EXTRACT(HOUR|MINUTE|SECOND FROM ...)}} and {{date_part}}
over {{TIMESTAMP_NTZ(p)}} / {{TIMESTAMP_LTZ(p)}}, mirroring the {{hour}} /
{{minute}} / {{second}} cases added in SPARK-57315.

h2. Notes

Follow-up to SPARK-57315, under the nanosecond timestamp preview umbrella
SPARK-56822. Raised during review of PR #56368.



