[
https://issues.apache.org/jira/browse/SPARK-57748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Max Gekk resolved SPARK-57748.
------------------------------
Fix Version/s: 4.3.0
Resolution: Fixed
Issue resolved by pull request 56888
[https://github.com/apache/spark/pull/56888]
> Use a dedicated tree-pattern bit for the TIME -> TIMESTAMP_NTZ cast rewrite in ComputeCurrentTime
> -------------------------------------------------------------------------------------------------
>
> Key: SPARK-57748
> URL: https://issues.apache.org/jira/browse/SPARK-57748
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.3.0
> Reporter: Max Gekk
> Assignee: Anupam Yadav
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.3.0
>
>
> h2. Summary
> Replace the broad {{containsPattern(CAST)}} pruning condition in the
> {{ComputeCurrentTime}} optimizer rule with a dedicated tree-pattern bit, so
> the rule only descends into plans that actually contain a cast to the
> {{TIMESTAMP_NTZ}} family, rather than into every plan that contains a cast.
> h2. Background
> SPARK-57618 added {{CAST(TIME(p) AS TIMESTAMP_NTZ(q))}}, whose date fields
> come from {{CURRENT_DATE}}. To keep the value query-stable,
> {{ComputeCurrentTime}} rewrites such casts into a date+time builder anchored
> on the same current-date literal as {{current_date()}}.
> For the rule to visit those casts, its pruning predicate was widened to:
> {code:scala}
> bits.containsPattern(CURRENT_LIKE) || bits.containsPattern(CAST)
> {code}
> Tagging the {{Cast}} with {{CURRENT_LIKE}} was rejected earlier because that
> pattern has shared semantics: for example, inline-table validation in
> {{EvaluateUnresolvedInlineTable}} treats {{CURRENT_LIKE}} expressions as safe
> to defer, which would let unrelated non-foldable NTZ-target casts such as
> {{CAST(rand() AS TIMESTAMP_NTZ)}} bypass validation.
> The {{CAST}} fallback is correct but defeats pruning: casts are present in
> almost every query, so {{ComputeCurrentTime}} now traverses the full
> expression tree of essentially every plan even though the
> {{TIME -> TIMESTAMP_NTZ}} rewrite fires rarely.
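> As a rough illustration of why the widened predicate defeats pruning, here is
> a self-contained toy model. It is a sketch only: {{Node}}, {{visited}},
> {{narrow}}, and {{widened}} are hypothetical names, not Spark's classes, and a
> plain {{Set}} stands in for the real pattern bits. On this toy plan the narrow
> predicate visits no nodes, while the widened one walks down to the cast.
> {code:scala}
> // Toy model only: names mirror the issue text, not Spark's real classes.
> object PruningSketch {
>   sealed trait Pattern
>   case object CURRENT_LIKE extends Pattern
>   case object CAST extends Pattern
>
>   // Each node carries the union of the patterns found in its subtree.
>   final case class Node(name: String, own: Set[Pattern], children: List[Node] = Nil) {
>     val patterns: Set[Pattern] = own ++ children.flatMap(_.patterns)
>     def containsPattern(p: Pattern): Boolean = patterns.contains(p)
>   }
>
>   // Counts the nodes a rule visits when it skips every subtree failing `cond`.
>   def visited(n: Node, cond: Node => Boolean): Int =
>     if (!cond(n)) 0 else 1 + n.children.map(visited(_, cond)).sum
>
>   // Predicate before the widening: only current-time expressions matter.
>   val narrow: Node => Boolean = _.containsPattern(CURRENT_LIKE)
>   // Predicate after the widening: any cast anywhere keeps the subtree alive.
>   val widened: Node => Boolean =
>     n => n.containsPattern(CURRENT_LIKE) || n.containsPattern(CAST)
>
>   // A plan with one ordinary cast and no current-time expression.
>   val plan: Node = Node("Project", Set.empty, List(
>     Node("Cast(string -> int)", Set(CAST), List(Node("col", Set.empty)))))
>
>   def main(args: Array[String]): Unit = {
>     println(s"narrow predicate visits ${visited(plan, narrow)} node(s)")
>     println(s"widened predicate visits ${visited(plan, widened)} node(s)")
>   }
> }
> {code}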
> h2. Proposal
> Introduce a dedicated {{TreePattern}} (e.g. {{CAST_TO_TIMESTAMP_NTZ}}) and:
> * Tag it in {{Cast.nodePatternsInternal}} keyed on the *target* type only
> (the {{TIMESTAMP_NTZ}} / {{TIMESTAMP_NTZ(p)}} families), never on
> {{child.dataType}}.
> * Prune {{ComputeCurrentTime}} on {{CURRENT_LIKE || CAST_TO_TIMESTAMP_NTZ}}
> instead of {{CAST}}.
> The node-level {{Cast.isTimeToTimestampNTZ}} guard stays, so only
> {{TIME -> TIMESTAMP_NTZ}} casts are actually rewritten.
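> The two-level scheme above (a coarse pattern tag for pruning, a precise
> node-level guard for the rewrite) might look like the following toy model.
> This is a sketch under stated assumptions, not the code from the pull
> request: the data types, {{Col}}, {{Cast}}, and {{shouldVisit}} are simplified
> stand-ins, and {{CAST_TO_TIMESTAMP_NTZ}} is the tentative name from the
> proposal.
> {code:scala}
> // Toy model only: simplified stand-ins, not Spark's Cast or TreePattern.
> object DedicatedBitSketch {
>   sealed trait DataType
>   case object IntType extends DataType
>   case object StringType extends DataType
>   case object TimeType extends DataType
>   case object TimestampNTZType extends DataType
>
>   sealed trait Pattern
>   case object CURRENT_LIKE extends Pattern
>   case object CAST extends Pattern
>   case object CAST_TO_TIMESTAMP_NTZ extends Pattern // tentative name from the proposal
>
>   sealed trait Expr { def patterns: Set[Pattern] }
>   final case class Col(name: String, dataType: DataType) extends Expr {
>     val patterns: Set[Pattern] = Set.empty
>   }
>   final case class Cast(child: Expr, target: DataType) extends Expr {
>     // The tag is derived from the target type only; the child's type is never read.
>     private val own: Set[Pattern] =
>       if (target == TimestampNTZType) Set(CAST, CAST_TO_TIMESTAMP_NTZ) else Set(CAST)
>     val patterns: Set[Pattern] = own ++ child.patterns
>   }
>
>   // Pruning predicate: coarse, may admit NTZ-target casts from any source type.
>   def shouldVisit(e: Expr): Boolean =
>     e.patterns.contains(CURRENT_LIKE) || e.patterns.contains(CAST_TO_TIMESTAMP_NTZ)
>
>   // Node-level guard: precise, checked only on nodes the rule actually visits.
>   def isTimeToTimestampNTZ(c: Cast): Boolean = c.child match {
>     case Col(_, TimeType) => c.target == TimestampNTZType
>     case _ => false
>   }
>
>   def main(args: Array[String]): Unit = {
>     val unrelated = Cast(Col("s", StringType), IntType)
>     val overTagged = Cast(Col("s", StringType), TimestampNTZType)
>     val rewritten = Cast(Col("t", TimeType), TimestampNTZType)
>     for (c <- List(unrelated, overTagged, rewritten))
>       println(s"$c visit=${shouldVisit(c)} rewrite=${isTimeToTimestampNTZ(c)}")
>   }
> }
> {code}
> In this model a {{string -> int}} cast is pruned, a {{string ->
> TIMESTAMP_NTZ}} cast is visited but left alone by the guard, and only the
> {{TIME -> TIMESTAMP_NTZ}} cast passes both checks.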
> h2. Constraints / notes
> * The tag must be keyed on the target type, not the source:
> {{nodePatternsInternal}} is computed eagerly at {{Cast}} construction, before
> the child is resolved, and reading {{child.dataType}} there can throw even
> when {{child.resolved}} is true (e.g. an {{OuterReference}} wrapping an
> unresolved attribute, which caused the {{makeSQLTableFunctionPlan}} /
> {{sql-udf.sql}} crash seen during SPARK-57618). The target type is always
> safe to read.
> * This still slightly over-tags (all NTZ-target casts, not only those with a
> {{TIME}} source), but the bit is dedicated with no other consumers, so it
> cannot leak into inline-table validation, streaming {{CURRENT_LIKE}}
> handling, or {{ReplaceCurrentLike}}. Keying the tag precisely on the source
> type is not safely achievable at construction time.
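> The construction-time hazard described above can be reproduced in a toy
> model. This is a sketch only: {{UnresolvedAttr}}, {{SourceKeyedCast}}, and
> {{TargetKeyedCast}} are hypothetical names, and the thrown exception merely
> stands in for whatever an unresolved expression raises in Spark.
> {code:scala}
> import scala.util.Try
>
> // Toy model only: shows why an eagerly computed tag must not read the child's type.
> object EagerTagSketch {
>   sealed trait DataType
>   case object TimeType extends DataType
>   case object TimestampNTZType extends DataType
>
>   trait Expr { def dataType: DataType }
>   final case class Col(dataType: DataType) extends Expr
>   // Stand-in for an expression whose type cannot be read yet.
>   case object UnresolvedAttr extends Expr {
>     def dataType: DataType = throw new UnsupportedOperationException("unresolved")
>   }
>
>   // Source-keyed tag: the val is evaluated in the constructor and reads child.dataType.
>   final class SourceKeyedCast(child: Expr, target: DataType) {
>     val tagged: Boolean = child.dataType == TimeType && target == TimestampNTZType
>   }
>
>   // Target-keyed tag: the constructor only looks at the declared target type.
>   final class TargetKeyedCast(child: Expr, target: DataType) {
>     val tagged: Boolean = target == TimestampNTZType
>   }
>
>   def main(args: Array[String]): Unit = {
>     val sourceKeyed = Try(new SourceKeyedCast(UnresolvedAttr, TimestampNTZType))
>     val targetKeyed = Try(new TargetKeyedCast(UnresolvedAttr, TimestampNTZType))
>     println(s"source-keyed construction failed: ${sourceKeyed.isFailure}")
>     println(s"target-keyed construction failed: ${targetKeyed.isFailure}")
>   }
> }
> {code}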
> h2. Testing
> * Keep the existing inline-table regression test
> ({{CAST(rand() AS TIMESTAMP_NTZ)}} still rejected).
> * Add a {{ComputeCurrentTimeSuite}} assertion that a plan whose only casts
> are unrelated (e.g. {{string -> int}}) is left untouched, while
> {{TIME -> TIMESTAMP_NTZ}} is still rewritten to a date literal consistent
> with {{current_date()}}.