[jira] [Updated] (SPARK-51734) Wrong results when reading ORC Timestamp type with different Reader/Writer Timezones

Abhinav Koul (Jira) Mon, 07 Apr 2025 22:32:54 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-51734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Abhinav Koul updated SPARK-51734:
---------------------------------
    Priority: Major  (was: Minor)

> Wrong results when reading ORC Timestamp type with different Reader/Writer 
> Timezones
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-51734
>                 URL: https://issues.apache.org/jira/browse/SPARK-51734
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.5.1
>            Reporter: Abhinav Koul
>            Priority: Major
>
> When reading an ORC TimestampLTZ (timestamp with local time zone) column, 
> Spark returns incorrect values if the reader and writer time zones differ. 
> How to replicate:
> {code:java}
> import java.util.TimeZone
>
> // Write one row with the JVM default and session time zones set to Europe/Berlin.
> TimeZone.setDefault(TimeZone.getTimeZone("Europe/Berlin"))
> sql("SET spark.sql.session.timeZone = Europe/Berlin")
> sql("DROP TABLE IF EXISTS t")
> sql("CREATE TABLE t (tz TIMESTAMP) USING hive OPTIONS(fileFormat 'orc')")
> sql("INSERT INTO t VALUES (TIMESTAMP('1996-08-02 09:00:00.723100809'))")
>
> // Read the row back with both time zones switched to Asia/Kolkata.
> TimeZone.setDefault(TimeZone.getTimeZone("Asia/Kolkata"))
> sql("SET spark.sql.session.timeZone = Asia/Kolkata")
> spark.table("t").collect()
> {code}
> Comparing the results of the above query with the same query run against a 
> Parquet table, I found the following:
> || ||Parquet (ms)||ORC (ms)||Parquet (Timestamp)||ORC (Timestamp)||
> |Spark to Fileformat Writer|838969200723|838969200723|1996-08-02 09:00:00.723100809|1996-08-02 09:00:00.723100809|
> |Fileformat Reader to Spark|838969200723|838956600723|1996-08-02 12:30:00.723100809|1996-08-02 09:00:00.723100809|
> Inside the ORC reader, I found that ORC reads the correct millisecond value, 
> 838969200723, but then deliberately adds the WriterTZ - ReaderTZ offset to 
> it (-12600000 ms, i.e. -3 h 30 min). 
> As far as I understand, what Parquet does is correct: the timestamp should 
> be adjusted to the reader's time zone and should not show the same 
> wall-clock time, as it does with ORC's current behaviour. Please suggest 
> what can be done further here.
>  
>  
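
The arithmetic behind the table and the offset quoted above can be checked with a small standalone sketch. It uses only `java.time` and is not taken from the ORC or Spark readers; the class name `OrcTimestampShift` and its helper methods are hypothetical, introduced for illustration. It assumes the shift is simply the writer zone's UTC offset minus the reader zone's, both evaluated at the stored instant.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

public class OrcTimestampShift {
    // UTC offset of a zone, in milliseconds, at the given instant.
    public static long offsetMillis(ZoneId zone, Instant at) {
        return zone.getRules().getOffset(at).getTotalSeconds() * 1000L;
    }

    // Wall-clock reading of an epoch-millisecond value in the given zone.
    public static LocalDateTime wallClock(long epochMillis, ZoneId zone) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), zone);
    }

    public static void main(String[] args) {
        ZoneId writer = ZoneId.of("Europe/Berlin");
        ZoneId reader = ZoneId.of("Asia/Kolkata");
        long written = 838969200723L; // millisecond value from the table

        // Assumed model of the adjustment: WriterTZ offset minus ReaderTZ offset.
        Instant at = Instant.ofEpochMilli(written);
        long shift = offsetMillis(writer, at) - offsetMillis(reader, at);
        long shifted = written + shift;

        System.out.println("shift (ms):            " + shift);
        System.out.println("shifted value (ms):    " + shifted);
        System.out.println("written, in Berlin:    " + wallClock(written, writer));
        System.out.println("unshifted, in Kolkata: " + wallClock(written, reader));
        System.out.println("shifted, in Kolkata:   " + wallClock(shifted, reader));
    }
}
```

Under that assumption, the unshifted value renders as 12:30 in Asia/Kolkata (the Parquet column of the table), while the shifted value renders as 09:00 (the ORC column).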



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
