Shekhar Prasad Rajak created SPARK-51753:
--------------------------------------------

             Summary: Spark, Avro and Iceberg Timestamp, Timestamp NTZ 
definitions need more clarification
                 Key: SPARK-51753
                 URL: https://issues.apache.org/jira/browse/SPARK-51753
             Project: Spark
          Issue Type: Question
          Components: Documentation, Java API, Spark Core
    Affects Versions: 3.5.5
            Reporter: Shekhar Prasad Rajak


* The Spark documentation does not define the timestamp data type variants: 
[https://spark.apache.org/docs/3.5.0/sql-data-sources-avro.html#supported-types-for-spark-sql---avro-conversion]
 * Avro spec: [https://avro.apache.org/docs/1.12.0/specification/]

 
{quote}{{Timestamps
The timestamp-\{millis,micros,nanos} logical type represents an instant on the 
global timeline, independent of a particular time zone or calendar. Upon 
reading a value back, we can only reconstruct the instant, but not the original 
representation. In practice, such timestamps are typically displayed to users 
in their local time zones, therefore they may be displayed differently 
depending on the execution environment.

timestamp-millis: logical type annotates an Avro long, where the long stores 
the number of milliseconds from the unix epoch, 1 January 1970 00:00:00.000.
timestamp-micros: logical type annotates an Avro long, where the long stores 
the number of microseconds from the unix epoch, 1 January 1970 00:00:00.000000.
timestamp-nanos: logical type annotates an Avro long, where the long stores the 
number of nanoseconds from the unix epoch, 1 January 1970 00:00:00.000000000.
Example: Given an event at noon local time (12:00) on January 1, 2000, in 
Helsinki where the local time was two hours east of UTC (UTC+2). The timestamp 
is first shifted to UTC 2000-01-01T10:00:00 and that is then converted to Avro 
long 946720800000 (milliseconds) and written.

Local Timestamps
The local-timestamp-\{millis,micros,nanos} logical type represents a timestamp 
in a local timezone, regardless of what specific time zone is considered local.

local-timestamp-millis: logical type annotates an Avro long, where the long 
stores the number of milliseconds, from 1 January 1970 00:00:00.000.
local-timestamp-micros: logical type annotates an Avro long, where the long 
stores the number of microseconds, from 1 January 1970 00:00:00.000000.
local-timestamp-nanos: logical type annotates an Avro long, where the long 
stores the number of nanoseconds, from 1 January 1970 00:00:00.000000000.
Example: Given an event at noon local time (12:00) on January 1, 2000, in 
Helsinki where the local time was two hours east of UTC (UTC+2). The timestamp 
is converted to Avro long 946728000000 (milliseconds) and then written.}}
 {quote}
 * Iceberg spec for all the timestamp types: 
[https://iceberg.apache.org/spec/#avro]

 
{quote}{{timestamp | Timestamp, microsecond precision, without timezone

timestamptz | Timestamp, microsecond precision, with timezone}}
 {quote}
But Spark 3 treats a timestamp without a time zone as TimestampNTZType, while 
the Avro serializer maps TimestampTZType to the logical type *timestamp-millis* 
and TimestampNTZType to *local-timestamp-millis*.
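
The two Helsinki examples quoted from the Avro spec above can be reproduced with a short sketch. It uses only the Python standard library, not the Avro or Spark APIs; the helper names timestamp_millis and local_timestamp_millis are hypothetical and simply mirror the logical type names, and the UTC+2 offset is hard-coded as in the quoted example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Helsinki is fixed at UTC+2, as stated in the quoted Avro example.
HELSINKI = timezone(timedelta(hours=2))

EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)  # instant on the global timeline
EPOCH_LOCAL = datetime(1970, 1, 1)                     # naive wall-clock epoch


def timestamp_millis(aware_dt):
    """Hypothetical helper: milliseconds from the Unix epoch for an instant."""
    return (aware_dt - EPOCH_UTC) // timedelta(milliseconds=1)


def local_timestamp_millis(naive_dt):
    """Hypothetical helper: milliseconds from 1970-01-01 00:00 for a wall-clock time."""
    return (naive_dt - EPOCH_LOCAL) // timedelta(milliseconds=1)


# Noon local time on January 1, 2000.
noon_local = datetime(2000, 1, 1, 12, 0, 0)

# timestamp-millis: the value is shifted to UTC (10:00) before conversion.
instant = timestamp_millis(noon_local.replace(tzinfo=HELSINKI))

# local-timestamp-millis: the wall-clock value is converted with no shift.
wall_clock = local_timestamp_millis(noon_local)

print(instant)               # 946720800000
print(wall_clock)            # 946728000000
print(wall_clock - instant)  # 7200000, the two-hour offset in milliseconds
```

The same wall-clock reading therefore produces two different longs depending on which logical type is chosen, which is why the mapping from the Spark types to the Avro logical types needs to be documented.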



