[ https://issues.apache.org/jira/browse/SPARK-51734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Abhinav Koul updated SPARK-51734:
---------------------------------
    Priority: Major  (was: Minor)

> Wrong results when reading ORC Timestamp type with different Reader/Writer Timezones
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-51734
>                 URL: https://issues.apache.org/jira/browse/SPARK-51734
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.5.1
>            Reporter: Abhinav Koul
>            Priority: Major
>
> When reading ORC TimestampLTZ (timestamp with local time zone), Spark returns
> incorrect values if the reader and writer time zones are different.
> How to reproduce:
> {code:java}
> import java.util.TimeZone
>
> // Write one timestamp while both the JVM and the session use Europe/Berlin.
> TimeZone.setDefault(TimeZone.getTimeZone("Europe/Berlin"))
> sql("SET spark.sql.session.timeZone = Europe/Berlin")
> sql("DROP TABLE IF EXISTS t")
> sql("CREATE TABLE t (tz TIMESTAMP) USING hive OPTIONS(fileFormat 'orc')")
> sql("INSERT INTO t VALUES (TIMESTAMP('1996-08-02 09:00:00.723100809'))")
> // Read it back after switching the JVM and the session to Asia/Kolkata.
> TimeZone.setDefault(TimeZone.getTimeZone("Asia/Kolkata"))
> sql("SET spark.sql.session.timeZone = Asia/Kolkata")
> spark.table("t").collect() {code}
> Comparing the results of the above query for ORC against the same query run
> with Parquet, I found the following:
> || ||Parquet (ms)||ORC (ms)||Parquet (Timestamp)||ORC (Timestamp)||
> |Spark to file format writer|838969200723|838969200723|1996-08-02 09:00:00.723100809|1996-08-02 09:00:00.723100809|
> |File format reader to Spark|838969200723|838956600723|1996-08-02 12:30:00.723100809|1996-08-02 09:00:00.723100809|
> Inside the ORC reader, I found that ORC does read the correct millisecond
> value of 838969200723, but then deliberately adds the (WriterTZ - ReaderTZ)
> offset to it (-12600000 ms, i.e. -3 h 30 min), which yields 838956600723.
> As I understand it, Parquet's behaviour is the correct one: the timestamp
> should be adjusted to the reader's time zone and should not show the same
> wall-clock time, as it does with ORC's current behaviour. Please suggest what
> can be done further here.
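
The offset arithmetic reported above can be checked with a minimal standalone sketch. It uses only java.time and does not touch Spark or ORC. The class name OrcTimezoneShift and its helper methods are hypothetical, and modelling the adjustment as a single (WriterTZ - ReaderTZ) offset evaluated at the written instant is a simplifying assumption taken from the description in this report, not the ORC reader's actual code.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

// Hypothetical helper: reproduces only the arithmetic from the report.
class OrcTimezoneShift {
    // Offset of a zone from UTC, in milliseconds, at the given instant.
    static long offsetMillis(ZoneId zone, Instant at) {
        return zone.getRules().getOffset(at).getTotalSeconds() * 1000L;
    }

    // (WriterTZ - ReaderTZ) offset in milliseconds, as described in the report.
    static long shiftMillis(ZoneId writer, ZoneId reader, Instant at) {
        return offsetMillis(writer, at) - offsetMillis(reader, at);
    }

    public static void main(String[] args) {
        ZoneId writer = ZoneId.of("Europe/Berlin");
        ZoneId reader = ZoneId.of("Asia/Kolkata");

        // 1996-08-02 09:00:00.723 wall-clock time in the writer's zone.
        Instant written = LocalDateTime.of(1996, 8, 2, 9, 0, 0, 723_000_000)
                .atZone(writer).toInstant();
        long writtenMs = written.toEpochMilli();
        long shift = shiftMillis(writer, reader, written);
        long shiftedMs = writtenMs + shift;

        System.out.println(writtenMs);  // 838969200723
        System.out.println(shift);      // -12600000
        System.out.println(shiftedMs);  // 838956600723

        // Same instant rendered in the reader's zone (the Parquet column).
        System.out.println(Instant.ofEpochMilli(writtenMs)
                .atZone(reader).toLocalDateTime());  // 1996-08-02T12:30:00.723
        // Shifted value rendered in the reader's zone (the ORC column).
        System.out.println(Instant.ofEpochMilli(shiftedMs)
                .atZone(reader).toLocalDateTime());  // 1996-08-02T09:00:00.723
    }
}
```

Under these assumptions, the unshifted value renders as 12:30 in Asia/Kolkata, matching the Parquet column of the table, while the shifted value renders as 09:00, matching the ORC column.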