Max Gekk created SPARK-57386:
--------------------------------

             Summary: Render nanosecond timestamp types in HiveResult through the Types Framework
                 Key: SPARK-57386
                 URL: https://issues.apache.org/jira/browse/SPARK-57386
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 4.3.0
            Reporter: Max Gekk


h2. Background

The nanosecond timestamp types {{TIMESTAMP_NTZ(p)}} and {{TIMESTAMP_LTZ(p)}} (preview feature
under SPARK-56822) are implemented solely through the Types Framework. External-value rendering
for the framework is centralized in {{TypeApiOps.formatExternal}}, which already backs Row JSON
({{Row.json}} / {{Row.prettyJson}}).

{{HiveResult.toHiveString}} dispatches through the framework first
({{TypeApiOps(dt).flatMap(_.formatExternal(value, nested))}}) and falls back to the legacy
{{toHiveStringDefault}} when the framework returns no value. However, the nanos ops deliberately
override the two-arg {{formatExternal(value, nested)}} to return {{None}}, so HiveResult instead
renders nanos through inline pattern-matching in {{toHiveStringDefault}}. That duplicates the
formatter logic and was documented in code as a temporary split "until nanos external rendering is
unified across the zone-less (Row JSON) and zone-aware (Hive) paths".
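
The dispatch described above can be illustrated with a small stand-alone model. This is only a sketch: the names mirror the issue text, but the types, the {{opsFor}} lookup, the rendered strings, and the assumption that the two-arg overload delegates to the single-arg one by default are all hypothetical simplifications, not Spark's actual code.

```scala
// Simplified, hypothetical model of the current (pre-change) dispatch.
// Nothing here is Spark's real implementation.
object DispatchBefore {
  sealed trait DataType
  case object IntType extends DataType
  case object TimestampNTZNanosType extends DataType

  trait TypeApiOps {
    // Single-arg renderer, the one Row JSON uses.
    def formatExternal(value: Any): Option[String]
    // Two-arg overload used by the Hive path; assumed to delegate by default.
    def formatExternal(value: Any, nested: Boolean): Option[String] =
      formatExternal(value)
  }

  object NanosOps extends TypeApiOps {
    def formatExternal(value: Any): Option[String] = Some(s"json:$value")
    // The opt-out described in the issue: the Hive path gets no framework value.
    override def formatExternal(value: Any, nested: Boolean): Option[String] = None
  }

  // Hypothetical framework lookup: only the nanos type is framework-managed here.
  def opsFor(dt: DataType): Option[TypeApiOps] = dt match {
    case TimestampNTZNanosType => Some(NanosOps)
    case _                     => None
  }

  // Legacy fallback, including the duplicated inline nanos case.
  def toHiveStringDefault(value: Any, dt: DataType): String = dt match {
    case TimestampNTZNanosType => s"inline:$value"
    case _                     => value.toString
  }

  def toHiveString(value: Any, dt: DataType, nested: Boolean = false): String =
    opsFor(dt)
      .flatMap(_.formatExternal(value, nested))
      .getOrElse(toHiveStringDefault(value, dt))
}
```

In this model a nanos value always ends up in the {{inline:}} branch even though a framework renderer exists, which is the duplication the issue sets out to remove.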

h2. Goal

Unify nanosecond timestamp rendering in HiveResult onto the Types Framework, and remove the inline
duplicate. The nanos types are a Types Framework feature and must NOT be supported in HiveResult
when the framework is disabled. They are gated by
{{timestampNanosTypesEnabled = timestampNanosTypes.enabled && types.framework.enabled}}, so a
nanos column cannot exist while the framework is off; the inline cases are therefore dead code in
that mode and redundant when the framework is on.
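
The gating argument can be checked by enumerating the four flag combinations. The sketch below uses a hypothetical {{Flags}} case class as a stand-in for the two configuration values; it is not how Spark reads its configuration.

```scala
// Hypothetical stand-in for the two flags named in the gating expression.
object NanosGating {
  final case class Flags(timestampNanosTypes: Boolean, typesFramework: Boolean) {
    // Mirrors: timestampNanosTypes.enabled && types.framework.enabled
    def timestampNanosTypesEnabled: Boolean = timestampNanosTypes && typesFramework
  }

  // All four flag combinations paired with the resulting gate value.
  val table: Seq[(Flags, Boolean)] =
    for {
      nanos     <- Seq(false, true)
      framework <- Seq(false, true)
    } yield {
      val flags = Flags(nanos, framework)
      (flags, flags.timestampNanosTypesEnabled)
    }
}
```

Only the combination with both flags on enables nanos types, so no row of the table has the gate open while the framework is off.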

h2. Changes

* {{TimestampNanosTypeApiOps}}: remove the {{formatExternal(value, nested) = None}} override so the
  Hive path shares each subclass's single-arg {{formatExternal}} renderer (the same one Row JSON
  uses). {{nested}} does not affect timestamp formatting.
* {{HiveResult.toHiveStringDefault}}: remove the inline {{TimestampLTZNanosType}} /
  {{TimestampNTZNanosType}} cases. The legacy path keeps no nanos handling, so a nanos value that
  somehow reaches it (only possible with the framework off, which the gating forbids) is
  unsupported rather than silently rendered.
* {{TypeApiOps}}: update the two-arg {{formatExternal}} scaladoc to reflect that Hive now shares the
  single-arg renderer.
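
A minimal model of the intended end state, under the same caveats as any stand-alone sketch: the types, the {{opsFor}} lookup, the {{frameworkEnabled}} parameter, and the choice of {{UnsupportedOperationException}} for the unsupported case are assumptions made for illustration. The issue does not say how the legacy path reports an unsupported value.

```scala
// Simplified, hypothetical model of the dispatch after the proposed changes.
object DispatchAfter {
  sealed trait DataType
  case object StringType extends DataType
  case object TimestampNTZNanosType extends DataType

  trait TypeApiOps {
    def formatExternal(value: Any): Option[String]
    // Hive path shares the single-arg renderer; `nested` is ignored for timestamps.
    def formatExternal(value: Any, nested: Boolean): Option[String] =
      formatExternal(value)
  }

  // No two-arg override any more: Hive and Row JSON use the same renderer.
  object NanosOps extends TypeApiOps {
    def formatExternal(value: Any): Option[String] = Some(s"nanos:$value")
  }

  def opsFor(dt: DataType, frameworkEnabled: Boolean): Option[TypeApiOps] =
    if (frameworkEnabled && dt == TimestampNTZNanosType) Some(NanosOps) else None

  // Legacy path keeps no nanos case; the exception type is an assumption.
  def toHiveStringDefault(value: Any, dt: DataType): String = dt match {
    case StringType => value.toString
    case other =>
      throw new UnsupportedOperationException(s"no legacy rendering for $other")
  }

  def toHiveString(
      value: Any,
      dt: DataType,
      nested: Boolean,
      frameworkEnabled: Boolean = true): String =
    opsFor(dt, frameworkEnabled)
      .flatMap(_.formatExternal(value, nested))
      .getOrElse(toHiveStringDefault(value, dt))
}
```

With the framework on, nested and top-level nanos values render identically through {{NanosOps}}; with it off, the value is rejected instead of being rendered by a duplicate code path.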

h2. Non-goals / notes

* {{TIME}} already renders through the framework when it is enabled (its single-arg
  {{formatExternal}} returns a value and the two-arg overload delegates to it); no change. The
  inline {{LocalTime}} case remains as the framework-disabled fallback, since {{TimeType}} is GA and
  exists independently of the framework flag.
* No user-facing output change: nanos Hive output is identical (zone-aware LTZ, zone-independent
  NTZ, precision flooring, trailing-zero trimming). Existing {{HiveResultSuite}} "SPARK-57257" tests
  cover precision 7/8/9, pre-1970 epochs, nested arrays/maps/structs, NULLs, and session-zone vs
  zone-independent rendering, and now exercise the framework path.
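
To make the rendering properties listed above concrete, here is a rough sketch of precision flooring, trailing-zero trimming, and zone-aware versus zone-independent rendering using only {{java.time}}. The {{NanosRendering}} object, its method names, and the {{uuuu-MM-dd HH:mm:ss}} base pattern are assumptions for illustration; the issue does not spell out the exact output format or the framework's formatter.

```scala
import java.time.{Instant, LocalDateTime, ZoneId}
import java.time.format.DateTimeFormatter

// Hypothetical illustration; not the Types Framework renderer.
object NanosRendering {
  // Assumed base pattern for the seconds-resolution part.
  private val secondsFmt = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss")

  // Floors the fraction to `precision` digits, then trims trailing zeros.
  def render(ldt: LocalDateTime, precision: Int): String = {
    require(precision >= 0 && precision <= 9, "precision must be in 0..9")
    val unit = BigInt(10).pow(9 - precision).toInt
    // getNano is always in 0..999999999, also for pre-1970 values.
    val floored = ldt.getNano / unit * unit
    val digits = f"$floored%09d".reverse.dropWhile(_ == '0').reverse
    val base = secondsFmt.format(ldt)
    if (digits.isEmpty) base else s"$base.$digits"
  }

  // NTZ: zone-independent, the local date-time is rendered as is.
  def renderNtz(ldt: LocalDateTime, precision: Int): String =
    render(ldt, precision)

  // LTZ: zone-aware, the instant is first shifted into the session zone.
  def renderLtz(instant: Instant, precision: Int, sessionZone: ZoneId): String =
    render(LocalDateTime.ofInstant(instant, sessionZone), precision)
}
```

For example, under these assumptions {{renderNtz(LocalDateTime.of(1969, 12, 31, 23, 59, 59, 123456789), 7)}} yields {{1969-12-31 23:59:59.1234567}}, while the same instant rendered through {{renderLtz}} changes its wall-clock part with the session zone but keeps the same fraction.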


