Max Gekk created SPARK-57386:
--------------------------------
Summary: Render nanosecond timestamp types in HiveResult through the Types Framework
Key: SPARK-57386
URL: https://issues.apache.org/jira/browse/SPARK-57386
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.3.0
Reporter: Max Gekk
h2. Background
The nanosecond timestamp types {{TIMESTAMP_NTZ(p)}} and {{TIMESTAMP_LTZ(p)}} (preview feature under
SPARK-56822) are implemented solely through the Types Framework. External-value rendering for the
framework is centralized in {{TypeApiOps.formatExternal}}, which already backs Row JSON
({{Row.json}} / {{Row.prettyJson}}).

{{HiveResult.toHiveString}} dispatches through the framework first
({{TypeApiOps(dt).flatMap(_.formatExternal(value, nested))}}) and falls back to the legacy
{{toHiveStringDefault}}. However, the nanos ops deliberately override the two-arg
{{formatExternal(value, nested)}} to return {{None}}, so HiveResult instead renders nanos through
inline pattern-matching in {{toHiveStringDefault}}. That duplicates the formatter logic and was
documented in code as a temporary split "until nanos external rendering is unified across the
zone-less (Row JSON) and zone-aware (Hive) paths".
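
The current split can be illustrated with a small standalone model. This is only a sketch and not Spark's code: {{DataType}}, {{Ops}}, {{lookup}}, {{legacy}} and the rendered strings are hypothetical stand-ins for {{TypeApiOps}}, {{TypeApiOps(dt)}} and {{toHiveStringDefault}}, and the real signatures may differ.

```scala
// Simplified model of the dispatch described above. All names are hypothetical.
object DispatchSketch {
  sealed trait DataType
  case object IntType extends DataType
  case object NanosNtzType extends DataType

  trait Ops {
    // Single-arg renderer (the one Row JSON would use in this model).
    def formatExternal(value: Any): Option[String]
    // Two-arg overload; by default it delegates to the single-arg renderer.
    def formatExternal(value: Any, nested: Boolean): Option[String] =
      formatExternal(value)
  }

  // Nanos ops before the change: the two-arg overload opts out with None.
  object NanosOpsBefore extends Ops {
    def formatExternal(value: Any): Option[String] = Some(s"framework:$value")
    override def formatExternal(value: Any, nested: Boolean): Option[String] = None
  }

  // Stand-in for TypeApiOps(dt): only the nanos type has framework ops here.
  def lookup(dt: DataType): Option[Ops] = dt match {
    case NanosNtzType => Some(NanosOpsBefore)
    case _            => None
  }

  // Stand-in for toHiveStringDefault, including the duplicated inline nanos case.
  def legacy(value: Any, dt: DataType): String = dt match {
    case NanosNtzType => s"inline:$value"
    case _            => value.toString
  }

  // Framework first, legacy fallback.
  def toHiveString(value: Any, dt: DataType, nested: Boolean = false): String =
    lookup(dt).flatMap(_.formatExternal(value, nested)).getOrElse(legacy(value, dt))
}
```

In this model, {{DispatchSketch.toHiveString("x", DispatchSketch.NanosNtzType)}} yields {{inline:x}}: the framework lookup succeeds, but the {{None}} override pushes rendering into the legacy inline case.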
h2. Goal
Unify nanosecond timestamp rendering in HiveResult onto the Types Framework, and remove the inline
duplicate. The nanos types are a Types Framework feature and must NOT be supported in HiveResult
when the framework is disabled. They are gated by
{{timestampNanosTypesEnabled = timestampNanosTypes.enabled && types.framework.enabled}}, so a
nanos column cannot exist while the framework is off; the inline cases are therefore dead code in
that mode and redundant when the framework is on.
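
The gating argument can be checked by enumerating the flag combinations. The sketch below is illustrative only: the {{Flags}} case class and its field names are hypothetical and merely mirror the boolean expression quoted above.

```scala
// Hypothetical model of the two flags and the derived gate.
object GatingSketch {
  final case class Flags(timestampNanosTypes: Boolean, typesFramework: Boolean) {
    // Mirrors: timestampNanosTypes.enabled && types.framework.enabled
    def timestampNanosTypesEnabled: Boolean = timestampNanosTypes && typesFramework
  }

  // All four flag combinations.
  val all: Seq[Flags] =
    for (n <- Seq(false, true); f <- Seq(false, true)) yield Flags(n, f)

  // Combinations in which nanos types are usable although the framework is off.
  val nanosWithoutFramework: Seq[Flags] =
    all.filter(c => c.timestampNanosTypesEnabled && !c.typesFramework)
}
```

{{GatingSketch.nanosWithoutFramework}} is empty, which is the sense in which the inline nanos cases are unreachable when the framework is disabled.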
h2. Changes
* {{TimestampNanosTypeApiOps}}: remove the {{formatExternal(value, nested) = None}} override so the
Hive path shares each subclass's single-arg {{formatExternal}} renderer (the same one Row JSON
uses). {{nested}} does not affect timestamp formatting.
* {{HiveResult.toHiveStringDefault}}: remove the inline {{TimestampLTZNanosType}} /
{{TimestampNTZNanosType}} cases. The legacy path keeps no nanos handling, so a nanos value that
somehow reaches it (only possible with the framework off, which the gating forbids) is
unsupported rather than silently rendered.
* {{TypeApiOps}}: update the two-arg {{formatExternal}} scaladoc to reflect that Hive now shares the
single-arg renderer.
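
A standalone sketch of the intended end state follows. It is not the actual patch: every name is a hypothetical stand-in, and since the issue does not say how an unsupported value fails in the legacy path, the exception thrown here is purely an assumption for illustration.

```scala
// Simplified model of the dispatch after the change. All names are hypothetical.
object UnifiedSketch {
  sealed trait DataType
  case object StringType extends DataType
  case object NanosNtzType extends DataType

  trait Ops {
    def formatExternal(value: Any): Option[String]
    // With the None override gone, the two-arg overload reaches the single-arg renderer.
    def formatExternal(value: Any, nested: Boolean): Option[String] =
      formatExternal(value)
  }

  // Nanos ops after the change: only the single-arg renderer is defined.
  object NanosOps extends Ops {
    def formatExternal(value: Any): Option[String] = Some(s"framework:$value")
  }

  // Stand-in for TypeApiOps(dt); returns nothing when the framework is off.
  def lookup(dt: DataType, frameworkEnabled: Boolean): Option[Ops] = dt match {
    case NanosNtzType if frameworkEnabled => Some(NanosOps)
    case _                                => None
  }

  // Legacy path with no nanos case. The failure mode is an assumption.
  def legacy(value: Any, dt: DataType): String = dt match {
    case StringType => value.toString
    case other =>
      throw new UnsupportedOperationException(s"no legacy rendering for $other")
  }

  def toHiveString(
      value: Any,
      dt: DataType,
      nested: Boolean,
      frameworkEnabled: Boolean = true): String =
    lookup(dt, frameworkEnabled)
      .flatMap(_.formatExternal(value, nested))
      .getOrElse(legacy(value, dt))
}
```

Here a nanos value renders as {{framework:…}} for both {{nested = true}} and {{nested = false}}, while calling {{toHiveString}} with {{frameworkEnabled = false}} fails instead of falling back to a duplicated inline renderer.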
h2. Non-goals / notes
* {{TIME}} already renders through the framework when it is enabled (its single-arg
{{formatExternal}} returns a value and the two-arg overload delegates to it); no change. The
inline {{LocalTime}} case remains as the framework-disabled fallback, since {{TimeType}} is GA and
exists independently of the framework flag.
* No user-facing output change: nanos Hive output is identical (zone-aware LTZ, zone-independent
NTZ, precision flooring, trailing-zero trimming). Existing {{HiveResultSuite}} "SPARK-57257" tests
cover precision 7/8/9, pre-1970 epochs, nested arrays/maps/structs, NULLs, and session-zone vs
zone-independent rendering, and now exercise the framework path.
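
For readers unfamiliar with the two formatting properties named above, precision flooring and trailing-zero trimming can be sketched with {{java.time}} alone. This is not Spark's formatter: the {{render}} helper, the accepted precision range and the date pattern are assumptions chosen only to illustrate the zone-independent (NTZ-like) case.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Illustrative only; the exact output format of Spark's renderer is not asserted here.
object NanosRenderSketch {
  private val secondsFmt = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss")

  def render(ts: LocalDateTime, precision: Int): String = {
    require(precision >= 0 && precision <= 9, "precision must be in [0, 9]")
    // Size of one unit at the requested precision, in nanoseconds.
    val unit = math.pow(10, 9 - precision).toInt
    // getNano is always in [0, 999999999], so integer division floors,
    // including for pre-1970 values.
    val floored = ts.getNano / unit * unit
    val base = ts.format(secondsFmt)
    if (floored == 0) base
    else {
      // Zero-pad to nine digits, then drop trailing zeros.
      val fraction = f"$floored%09d".replaceAll("0+$", "")
      s"$base.$fraction"
    }
  }
}
```

For example, {{render(LocalDateTime.of(1969, 12, 31, 23, 59, 59, 123456789), 7)}} gives {{1969-12-31 23:59:59.1234567}}, and a fraction of 120000000 ns at precision 9 renders as {{.12}}.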