[
https://issues.apache.org/jira/browse/SPARK-57340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-57340:
-----------------------------------
Labels: pull-request-available (was: )
> Support EXTRACT/date_part HOUR, MINUTE and SECOND over nanosecond-precision timestamps
> --------------------------------------------------------------------------------------
>
> Key: SPARK-57340
> URL: https://issues.apache.org/jira/browse/SPARK-57340
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.3.0
> Reporter: Max Gekk
> Priority: Major
> Labels: pull-request-available
>
> h2. Background
> SPARK-57315 added support for the {{hour()}}, {{minute()}} and {{second()}}
> functions over the nanosecond-precision timestamp types {{TIMESTAMP_NTZ(p)}} /
> {{TIMESTAMP_LTZ(p)}} ({{p}} in {{[7, 9]}}), part of the nanosecond timestamp
> preview (SPARK-56822). It did so by casting the nanosecond input down to the
> matching microsecond timestamp type inside {{HourExpressionBuilder}},
> {{MinuteExpressionBuilder}} and {{SecondExpressionBuilder}}.
> h2. Problem
> {{EXTRACT(field FROM source)}} and {{date_part(field, source)}} are supposed to
> be equivalent to {{hour()}} / {{minute()}} / {{second()}}, but they do not go
> through those function builders. They resolve via {{Extract}} ->
> {{DatePart.parseExtractField}}, which constructs the core expressions directly on
> the raw source:
> {code:scala}
> case "HOUR" | "H" | "HOURS" | "HR" | "HRS" => Hour(source)
> case "MINUTE" | "M" | "MIN" | "MINS" | "MINUTES" => Minute(source)
> case "SECOND" | "S" | "SEC" | "SECONDS" | "SECS" => SecondWithFraction(source)
> {code}
> There is no nanosecond -> microsecond cast on this path, and {{Hour}} / {{Minute}} /
> {{Second}} (via {{GetTimeField}}) accept only {{AnyTimestampType}}, which covers
> just the microsecond types {{TimestampType}} / {{TimestampNTZType}}. As a result,
> with the {{spark.sql.timestampNanosTypes.enabled}} flag on:
> {code:sql}
> SELECT EXTRACT(HOUR FROM TIMESTAMP_NTZ '2018-02-14 12:58:59.123456789');
> -- fails analysis with a type-check error, while hour(...) returns 12
> {code}
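> The difference between the two resolution paths can be illustrated with a toy
> model. This is only a sketch: {{ExtractPathSketch}}, {{TsType}}, {{Column}},
> {{hourFunction}} and {{extractHour}} are hypothetical names invented for the
> example, and the {{Hour}} / {{CastToMicros}} classes here merely mimic the shape
> of the real expressions; none of this is Spark's actual code.
> {code:scala}
> // Toy model only: mimics the shape of the problem, not Spark's real classes.
> object ExtractPathSketch {
>   sealed trait TsType
>   case object Micros extends TsType                  // stands in for the microsecond types
>   final case class Nanos(precision: Int) extends TsType // stands in for TIMESTAMP_NTZ(p), p in [7, 9]
>
>   sealed trait Source { def tsType: TsType }
>   final case class Column(tsType: TsType) extends Source
>   // Hypothetical stand-in for the nanosecond -> microsecond cast.
>   final case class CastToMicros(child: Source) extends Source {
>     val tsType: TsType = Micros
>   }
>
>   final case class Hour(child: Source) {
>     // Mimics the input type check: only microsecond timestamps pass.
>     def typeChecks: Boolean = child.tsType == Micros
>   }
>
>   // Function-builder path: nanosecond input is cast down before Hour is built.
>   def hourFunction(source: Source): Hour = source.tsType match {
>     case Nanos(_) => Hour(CastToMicros(source))
>     case Micros   => Hour(source)
>   }
>
>   // EXTRACT / date_part path as described above: Hour is built on the raw source.
>   def extractHour(source: Source): Hour = Hour(source)
> }
> {code}
> In this model {{hourFunction(Column(Nanos(9)))}} passes the type check while
> {{extractHour(Column(Nanos(9)))}} does not, which is the asymmetry reported here.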
> h2. Expected
> * {{EXTRACT(HOUR|MINUTE FROM <nanos ts>)}} and the {{date_part}} equivalents
> return the same integer results as {{hour()}} / {{minute()}}, by reusing the
> existing nanosecond -> microsecond cast (e.g. {{NanosTimestampCast.castToMicros}})
> in the {{EXTRACT}} resolution path ({{DatePart.parseExtractField}}).
> * Decide and document the semantics of {{EXTRACT(SECOND FROM <nanos ts>)}}:
> unlike {{second()}}, it maps to {{SecondWithFraction}} returning {{DECIMAL(8, 6)}},
> which is the sub-microsecond-dependent path intentionally excluded from
> SPARK-57315. Either truncate to microseconds or widen the result to carry the
> nanosecond fraction.
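> To make the two {{SECOND}} options concrete, the arithmetic for the example
> timestamp {{12:58:59.123456789}} can be sketched as below. This is a standalone
> illustration using {{scala.math.BigDecimal}}, not Spark code;
> {{SecondFractionSketch}} and {{secondWithFraction}} are hypothetical names, and
> the scale of 9 for the widened result is an assumption made for the example,
> not a decision recorded in this issue.
> {code:scala}
> import scala.math.BigDecimal.RoundingMode
>
> object SecondFractionSketch {
>   // Builds seconds-with-fraction from whole seconds plus nanos-of-second,
>   // truncating (not rounding) to the requested number of fractional digits.
>   def secondWithFraction(seconds: Int, nanosOfSecond: Long, scale: Int): BigDecimal =
>     (BigDecimal(seconds) + BigDecimal(nanosOfSecond, 9)).setScale(scale, RoundingMode.DOWN)
>
>   // Option 1: truncate to microseconds, which fits the existing DECIMAL(8, 6).
>   val truncated: BigDecimal = secondWithFraction(59, 123456789L, 6) // 59.123456
>
>   // Option 2: keep the nanosecond fraction (assumed scale 9, needing 11 digits).
>   val widened: BigDecimal = secondWithFraction(59, 123456789L, 9)   // 59.123456789
> }
> {code}
> Truncation keeps the current result type but silently drops the last three
> digits; widening preserves them at the cost of a different result type for
> nanosecond inputs.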
> h2. Scope
> * {{sql/catalyst/.../expressions/datetimeExpressions.scala}}:
> {{DatePart.parseExtractField}} (and the {{Extract}} / {{date_part}} paths).
> h2. Tests
> * Golden tests for {{EXTRACT(HOUR|MINUTE|SECOND FROM ...)}} and {{date_part}}
> over {{TIMESTAMP_NTZ(p)}} / {{TIMESTAMP_LTZ(p)}}, mirroring the {{hour}} /
> {{minute}} / {{second}} cases added in SPARK-57315.
> h2. Notes
> Follow-up to SPARK-57315, under the nanosecond timestamp preview umbrella
> SPARK-56822. Raised during review of PR #56368.