Max Gekk created SPARK-57831:
--------------------------------
Summary: Align Hive-metastore compatibility classification for
nanosecond timestamp types
Key: SPARK-57831
URL: https://issues.apache.org/jira/browse/SPARK-57831
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.3.0
Reporter: Max Gekk
This sub-task is part of the umbrella SPARK-56822 (timestamps with nanosecond
precision).
h2. Problem
{{HiveExternalCatalog.isHiveCompatibleDataType}} (~L1543-1558) returns {{false}}
for microsecond {{TimestampNTZType}}, but the nanosecond timestamp types fall
through to {{case _ => true}}. As a result, a table with a nanosecond column may
take the Hive-compatible metastore path and store {{timestamp_ntz(9)}} /
{{timestamp_ltz(9)}} in the HMS {{FieldSchema}} (via {{toHiveColumn}} /
{{catalogString}}). Microsecond NTZ, by contrast, stores its schema in table
properties. The USING-datasource schema itself round-trips via the
{{spark.sql.sources.schema}} JSON, but the HMS {{FieldSchema}} type strings may
be non-standard for Hive.
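The fall-through can be illustrated with a small self-contained model. This is a sketch, not Spark code: the {{DataType}} hierarchy below is a hypothetical stand-in, the names {{TimestampNTZNanosType}} and {{TimestampLTZNanosType}} are assumptions made for illustration, and the match is a simplified approximation of {{isHiveCompatibleDataType}} rather than a copy of it.

```scala
object CurrentClassification {
  // Hypothetical stand-ins for Spark's type hierarchy (not the real classes).
  sealed trait DataType
  case object StringType extends DataType
  case object TimestampNTZType extends DataType // microsecond NTZ
  // Assumed names for the nanosecond types; the real class names may differ.
  final case class TimestampNTZNanosType(precision: Int) extends DataType
  final case class TimestampLTZNanosType(precision: Int) extends DataType

  // Simplified model of the behavior described in this ticket:
  // microsecond NTZ is rejected explicitly, everything else is accepted.
  def isHiveCompatibleDataType(dt: DataType): Boolean = dt match {
    case TimestampNTZType => false
    case _                => true // nanosecond types land here
  }
}
```

In this model, {{isHiveCompatibleDataType(TimestampNTZNanosType(9))}} evaluates to {{true}} while {{isHiveCompatibleDataType(TimestampNTZType)}} evaluates to {{false}}, which is the inconsistency described above.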
h2. Goal
Treat nanosecond types like {{TimestampNTZType}} ({{isHiveCompatibleDataType =
false}}) so the schema is stored in table properties, or explicitly justify and
test the {{FieldSchema}} round-trip.
h2. Scope
* Add nanosecond arms to {{isHiveCompatibleDataType}}.
* Add HMS {{toHiveColumn}} <-> {{fromHiveColumn}} round-trip tests.
* Add a {{CREATE TABLE ... USING parquet}} metastore reload e2e test.
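The shape of a type-string round-trip check can be sketched without Spark. Everything here is an assumption for illustration: {{NtzNanos}}, {{LtzNanos}}, {{catalogString}}, and {{parse}} are toy definitions that only mirror the {{timestamp_ntz(9)}} / {{timestamp_ltz(9)}} strings mentioned in this ticket. They are not the real {{toHiveColumn}} / {{fromHiveColumn}} code paths.

```scala
object TypeStringRoundTrip {
  // Toy model of the two nanosecond timestamp flavors (hypothetical names).
  sealed trait NanosTimestamp { def precision: Int }
  final case class NtzNanos(precision: Int) extends NanosTimestamp
  final case class LtzNanos(precision: Int) extends NanosTimestamp

  // Render the type string that would end up in a column's type field.
  def catalogString(t: NanosTimestamp): String = t match {
    case NtzNanos(p) => s"timestamp_ntz($p)"
    case LtzNanos(p) => s"timestamp_ltz($p)"
  }

  // Matches the whole string, e.g. "timestamp_ntz(9)".
  private val TypePattern = """timestamp_(ntz|ltz)\((\d+)\)""".r

  // Parse a type string back into the model; None if it is not recognized.
  def parse(s: String): Option[NanosTimestamp] = s match {
    case TypePattern("ntz", p) => Some(NtzNanos(p.toInt))
    case TypePattern("ltz", p) => Some(LtzNanos(p.toInt))
    case _                     => None
  }
}
```

A round-trip test in this style asserts that {{parse(catalogString(t))}} returns {{Some(t)}} for each nanosecond type, and that an unrecognized string is rejected rather than silently mapped to another type.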
h2. Acceptance criteria
* Nanosecond-column tables persist to and reload from HMS correctly.
* Their classification matches microsecond NTZ.
h2. Testing
{{HiveExternalCatalogSuite}}, {{MetastoreDataSourcesSuite}}.
h2. Dependencies
Relates to the preview-flag metastore-read-policy sub-task (same
catalog-persistence area); can land independently.