Shekhar Prasad Rajak created SPARK-51753:
--------------------------------------------

             Summary: Spark, Avro and Iceberg Timestamp, Timestamp NTZ definition need more clarifications
                 Key: SPARK-51753
                 URL: https://issues.apache.org/jira/browse/SPARK-51753
             Project: Spark
          Issue Type: Question
          Components: Documentation, Java API, Spark Core
    Affects Versions: 3.5.5
            Reporter: Shekhar Prasad Rajak


* The Spark documentation does not define the timestamp data type variants: [https://spark.apache.org/docs/3.5.0/sql-data-sources-avro.html#supported-types-for-spark-sql---avro-conversion]

* Avro spec: [https://avro.apache.org/docs/1.12.0/specification/]

{quote}
{{Timestamps

The timestamp-\{millis,micros,nanos} logical type represents an instant on the global timeline, independent of a particular time zone or calendar. Upon reading a value back, we can only reconstruct the instant, but not the original representation. In practice, such timestamps are typically displayed to users in their local time zones, therefore they may be displayed differently depending on the execution environment.

timestamp-millis: logical type annotates an Avro long, where the long stores the number of milliseconds from the unix epoch, 1 January 1970 00:00:00.000.

timestamp-micros: logical type annotates an Avro long, where the long stores the number of microseconds from the unix epoch, 1 January 1970 00:00:00.000000.

timestamp-nanos: logical type annotates an Avro long, where the long stores the number of nanoseconds from the unix epoch, 1 January 1970 00:00:00.000000000.

Example: Given an event at noon local time (12:00) on January 1, 2000, in Helsinki where the local time was two hours east of UTC (UTC+2). The timestamp is first shifted to UTC 2000-01-01T10:00:00 and that is then converted to Avro long 946720800000 (milliseconds) and written.

Local Timestamps

The local-timestamp-\{millis,micros,nanos} logical type represents a timestamp in a local timezone, regardless of what specific time zone is considered local.

local-timestamp-millis: logical type annotates an Avro long, where the long stores the number of milliseconds, from 1 January 1970 00:00:00.000.

local-timestamp-micros: logical type annotates an Avro long, where the long stores the number of microseconds, from 1 January 1970 00:00:00.000000.

local-timestamp-nanos: logical type annotates an Avro long, where the long stores the number of nanoseconds, from 1 January 1970 00:00:00.000000000.

Example: Given an event at noon local time (12:00) on January 1, 2000, in Helsinki where the local time was two hours east of UTC (UTC+2). The timestamp is converted to Avro long 946728000000 (milliseconds) and then written.}}
{quote}

* Iceberg spec for all the timestamp types: [https://iceberg.apache.org/spec/#avro]

{quote}
{{timestamp | Timestamp, microsecond precision, without timezone
timestamptz | Timestamp, microsecond precision, with timezone}}
{quote}

However, Spark 3 treats a timestamp without time zone as TimestampNTZType, while the Avro serialiser maps the time-zone-aware timestamp type (TimestampType) to the logical type *timestamp-millis* and TimestampNTZType to *local-timestamp-millis*.
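
To make the difference between the two Avro logical types concrete, here is a minimal Python sketch that reproduces the two Helsinki examples from the Avro spec quoted in this issue. It uses only the standard library and no Spark or Avro API. The function names {{to_timestamp_millis}} and {{to_local_timestamp_millis}} are hypothetical, and Helsinki is modelled as a fixed UTC+2 offset, which is an assumption that holds for the January date in the example.

```python
from datetime import datetime, timedelta, timezone

ONE_MS = timedelta(milliseconds=1)
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)  # epoch as an instant
EPOCH_WALL = datetime(1970, 1, 1)                      # epoch as wall-clock fields


def to_timestamp_millis(dt: datetime) -> int:
    """timestamp-millis semantics: an instant, counted from the UTC epoch."""
    if dt.tzinfo is None:
        raise ValueError("an instant needs a time zone or offset")
    # Subtracting two aware datetimes accounts for their UTC offsets.
    return (dt - EPOCH_UTC) // ONE_MS


def to_local_timestamp_millis(dt: datetime) -> int:
    """local-timestamp-millis semantics: wall-clock fields, zone ignored."""
    wall = dt.replace(tzinfo=None)  # keep 12:00 as 12:00, drop the offset
    return (wall - EPOCH_WALL) // ONE_MS


# Assumption: fixed UTC+2 offset stands in for Helsinki in January 2000.
helsinki = timezone(timedelta(hours=2))
noon = datetime(2000, 1, 1, 12, 0, tzinfo=helsinki)

instant_ms = to_timestamp_millis(noon)
local_ms = to_local_timestamp_millis(noon)

print(instant_ms)  # 946720800000, i.e. 2000-01-01T10:00:00 UTC
print(local_ms)    # 946728000000, i.e. 12:00 taken at face value
```

The two longs differ by exactly the two-hour offset (7200000 ms). That offset is the information a reader loses if a value written under one logical type is interpreted as the other.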