Abhinav Koul created SPARK-51734:
------------------------------------

             Summary: Wrong results when reading ORC Timestamp type with different Reader/Writer Timezones
                 Key: SPARK-51734
                 URL: https://issues.apache.org/jira/browse/SPARK-51734
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.5.1
            Reporter: Abhinav Koul
When reading ORC TimestampLTZ (timestamp with local time zone), Spark returns incorrect values if the reader and writer time zones differ.

How to replicate:

{code:scala}
import java.util.TimeZone

TimeZone.setDefault(TimeZone.getTimeZone("Europe/Berlin"))
sql("SET spark.sql.session.timeZone = Europe/Berlin")

sql("DROP TABLE IF EXISTS t")
sql("CREATE TABLE t (tz TIMESTAMP) USING hive OPTIONS(fileFormat 'orc')")
sql("INSERT INTO t VALUES (TIMESTAMP('1996-08-02 09:00:00.723100809'))")

TimeZone.setDefault(TimeZone.getTimeZone("Asia/Kolkata"))
sql("SET spark.sql.session.timeZone = Asia/Kolkata")

spark.table("t").collect()
{code}

Comparing the results of the query above for Parquet and ORC:

|| ||Parquet (ms)||ORC (ms)||Parquet (timestamp)||ORC (timestamp)||
|Spark to file-format writer|838969200723|838969200723|1996-08-02 09:00:00.723100809|1996-08-02 09:00:00.723100809|
|File-format reader to Spark|838969200723|838956600723|1996-08-02 12:30:00.723100809|1996-08-02 09:00:00.723100809|

Inside the ORC reader I found that ORC does read the correct millisecond value of 838969200723, but deliberately adds the WriterTZ - ReaderTZ offset to it: -12600000 ms, i.e. exactly -3h 30m, since Berlin is UTC+2 in August and Kolkata is UTC+5:30. In effect, ORC preserves the writer's wall-clock time instead of the instant.

Parquet's behaviour seems correct to my understanding: a timestamp with local time zone stores an instant, so the value should be adjusted to the reader's session time zone rather than displaying the same wall-clock time, as ORC currently does.

Please suggest what can be done further here.
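For what it's worth, the offset arithmetic can be checked outside Spark with plain java.time. The sketch below is a minimal standalone illustration (not Spark or ORC code; the variable names and printed values are mine, derived from the table above): it shows that the WriterTZ - ReaderTZ offset at this instant is exactly -12600000 ms, and that adding it to the written value 838969200723 yields the 838956600723 ms the ORC reader returns.

{code:scala}
import java.time.{Instant, ZoneId, ZonedDateTime}

// Hypothetical standalone check: verify the offset the ORC reader
// applies, using the millisecond values from the table above.
val writerZone = ZoneId.of("Europe/Berlin")
val readerZone = ZoneId.of("Asia/Kolkata")

// Millisecond value Spark handed to both writers (row 1 of the table).
val writtenMillis = 838969200723L
val instant = Instant.ofEpochMilli(writtenMillis)

// Zone offsets at that instant: Berlin is UTC+2 (summer time), Kolkata UTC+5:30.
val writerOffsetMs = writerZone.getRules.getOffset(instant).getTotalSeconds * 1000L
val readerOffsetMs = readerZone.getRules.getOffset(instant).getTotalSeconds * 1000L

// WriterTZ - ReaderTZ = 7200000 - 19800000 = -12600000 ms, i.e. -3h 30m.
println(writerOffsetMs - readerOffsetMs)                   // -12600000
println(writtenMillis + (writerOffsetMs - readerOffsetMs)) // 838956600723, what ORC returns

// Rendering the same instant in each zone shows the two behaviours:
// keeping the instant shifts the wall clock (Parquet), while keeping
// the wall clock shifts the instant (ORC's current behaviour).
println(ZonedDateTime.ofInstant(instant, writerZone)) // 1996-08-02T09:00:00.723+02:00[Europe/Berlin]
println(ZonedDateTime.ofInstant(instant, readerZone)) // 1996-08-02T12:30:00.723+05:30[Asia/Kolkata]
{code}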