Zoltan Ivanfi created HIVE-21291:
------------------------------------

             Summary: Restore historical way of handling timestamps in Avro 
while keeping the new semantics at the same time
                 Key: HIVE-21291
                 URL: https://issues.apache.org/jira/browse/HIVE-21291
             Project: Hive
          Issue Type: Sub-task
            Reporter: Zoltan Ivanfi


This sub-task is for implementing the Avro-specific parts of the following plan:

h1. Problem

Historically, the semantics of the TIMESTAMP type in Hive depended on the file 
format. Timestamps in Avro, Parquet and RCFiles with a binary SerDe had 
_Instant_ semantics, while timestamps in ORC, textfiles and RCFiles with a text 
SerDe had _LocalDateTime_ semantics.
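
As a rough illustration of the two semantics (not Hive code), the sketch below 
round-trips one wall-clock value between a writer and a reader in different 
zones. The function names and the fixed UTC offsets are assumptions made for 
this example only; real sessions would use proper time zone rules.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets stand in for real session time zones (assumption for brevity).
WRITER_ZONE = timezone(timedelta(hours=1))    # e.g. a UTC+1 session
READER_ZONE = timezone(timedelta(hours=-8))   # e.g. a UTC-8 session


def instant_round_trip(wall_clock, writer_zone, reader_zone):
    """Instant semantics: store the UTC instant, display it in the reader's zone."""
    stored_utc = wall_clock.replace(tzinfo=writer_zone).astimezone(timezone.utc)
    return stored_utc.astimezone(reader_zone).replace(tzinfo=None)


def local_date_time_round_trip(wall_clock, writer_zone, reader_zone):
    """LocalDateTime semantics: the wall-clock fields come back unchanged."""
    return wall_clock


noon = datetime(2019, 2, 19, 12, 0)
print(instant_round_trip(noon, WRITER_ZONE, READER_ZONE))          # 2019-02-19 03:00:00
print(local_date_time_round_trip(noon, WRITER_ZONE, READER_ZONE))  # 2019-02-19 12:00:00
```

With _Instant_ semantics the reader sees the same point in time expressed in 
its own zone; with _LocalDateTime_ semantics every reader sees the same digits.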

The Hive community wanted to get rid of this inconsistency and have 
_LocalDateTime_ semantics in Avro, Parquet and RCFiles with a binary SerDe as 
well. *Hive 3.1 turned off normalization to UTC* to achieve this. While this 
leads to the desired new semantics, it also leads to incorrect results in two 
cases: when new Hive versions read timestamps written by old Hive versions, and 
when old Hive versions or any other component not aware of this change 
(including legacy Impala and Spark versions) read timestamps written by new 
Hive versions.
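
The mismatch can be sketched as follows. This is a simplified model, not Hive 
code: the stored value is modelled as a zone-less datetime, the helper names 
are hypothetical, and a fixed UTC+1 offset is assumed for the session zone.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
SESSION_ZONE = timezone(timedelta(hours=1))  # assumed fixed UTC+1 session zone


def legacy_write(wall_clock, zone):
    """Old behaviour: normalize the wall-clock value to UTC before storing."""
    return wall_clock.replace(tzinfo=zone).astimezone(UTC).replace(tzinfo=None)


def legacy_read(stored, zone):
    """Old behaviour: treat the stored value as UTC and convert to the session zone."""
    return stored.replace(tzinfo=UTC).astimezone(zone).replace(tzinfo=None)


def unnormalized_write(wall_clock):
    """Hive 3.1 behaviour as described above: store the fields as-is."""
    return wall_clock


def unnormalized_read(stored):
    """Hive 3.1 behaviour as described above: return the fields as-is."""
    return stored


noon = datetime(2019, 2, 19, 12, 0)

# Old writer, new reader: the reader sees the UTC-normalized digits.
print(unnormalized_read(legacy_write(noon, SESSION_ZONE)))   # 2019-02-19 11:00:00

# New writer, old reader: the reader applies a conversion that was never undone.
print(legacy_read(unnormalized_write(noon), SESSION_ZONE))   # 2019-02-19 13:00:00
```

Both combinations return a value that differs from the 12:00 that was written, 
even though the writer and the reader use the same session zone.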

h1. Solution

To work around this issue, Hive *should restore the practice of normalizing to 
UTC* when writing timestamps to Avro, Parquet and RCFiles with a binary SerDe. 
In itself, this would restore the historical _Instant_ semantics, which is 
undesirable. In order to achieve the desired _LocalDateTime_ semantics in spite 
of normalizing to UTC, newer Hive versions should record the session-local 
time zone in the file metadata fields that serve as arbitrary key-value 
storage.
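
A minimal sketch of the proposed scheme, under stated assumptions: the metadata 
key name "writer.time.zone", the helper names and the use of plain UTC offset 
strings are all hypothetical and only serve the illustration. The handling of 
files that lack the metadata entry is not covered here.

```python
from datetime import datetime, timezone

UTC = timezone.utc
ZONE_KEY = "writer.time.zone"  # hypothetical key, not an actual Hive property


def parse_offset(offset):
    """Turn an offset string such as '+01:00' into a tzinfo object."""
    return datetime.strptime(offset, "%z").tzinfo


def write(wall_clock, session_offset):
    """Normalize to UTC and record the writer's session zone next to the data."""
    zone = parse_offset(session_offset)
    stored = wall_clock.replace(tzinfo=zone).astimezone(UTC).replace(tzinfo=None)
    return stored, {ZONE_KEY: session_offset}


def read_with_metadata(stored, metadata):
    """Newer reader: undo the normalization using the recorded writer zone."""
    writer_zone = parse_offset(metadata[ZONE_KEY])
    return stored.replace(tzinfo=UTC).astimezone(writer_zone).replace(tzinfo=None)


def read_legacy(stored, reader_offset):
    """Metadata-unaware reader: convert from UTC to its own session zone."""
    reader_zone = parse_offset(reader_offset)
    return stored.replace(tzinfo=UTC).astimezone(reader_zone).replace(tzinfo=None)


noon = datetime(2019, 2, 19, 12, 0)
stored, metadata = write(noon, "+01:00")

print(stored)                                # 2019-02-19 11:00:00
print(read_with_metadata(stored, metadata))  # 2019-02-19 12:00:00
print(read_legacy(stored, "-08:00"))         # 2019-02-19 03:00:00
```

The metadata-aware reader gets the written digits back regardless of its own 
zone (_LocalDateTime_ semantics), while a reader that ignores the metadata 
still sees a correctly normalized instant, as it did historically.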




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)