[ https://issues.apache.org/jira/browse/HIVE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825016#comment-16825016 ]

Karen Coppage commented on HIVE-21291:
--------------------------------------

[Review request|https://reviews.apache.org/r/70523/] is attached.

Implementation summary:

Writing timestamps:
-restore the practice of normalizing to UTC
-record the writer HiveServer's time zone in the file metadata (property: 
writer.time.zone)
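
A minimal Python sketch of the write path described above. It is not Hive's actual implementation, which is written in Java. A zone-less wall-clock timestamp is interpreted in the writer's time zone and normalized to UTC epoch milliseconds, which is the encoding used by Avro's timestamp-millis logical type. The writer zone is then recorded under writer.time.zone. The function names are hypothetical. Zones are modeled as fixed UTC offsets stored as "+HHMM" strings so that the sketch needs no time zone database. A real writer would more likely record a region-based zone ID.

```python
from datetime import datetime, timedelta, timezone, tzinfo

# Metadata key taken from the comment; everything else here is illustrative.
WRITER_TIME_ZONE_KEY = "writer.time.zone"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def zone_to_str(tz: tzinfo) -> str:
    """Encode a fixed-offset zone as '+HHMM' (a sketch assumption, not Hive's format)."""
    return datetime(2000, 1, 1, tzinfo=tz).strftime("%z")


def write_timestamp(wall_clock: datetime, writer_tz: tzinfo, metadata: dict) -> int:
    """Normalize a zone-less timestamp to UTC epoch millis and record the writer zone.

    `wall_clock` is naive and is interpreted in the writer's time zone.
    """
    if wall_clock.tzinfo is not None:
        raise ValueError("expected a naive (zone-less) timestamp")
    # Attach the writer zone, then normalize the instant to UTC.
    utc_value = wall_clock.replace(tzinfo=writer_tz).astimezone(timezone.utc)
    # Record the writer zone so that newer readers can restore the wall-clock value.
    metadata[WRITER_TIME_ZONE_KEY] = zone_to_str(writer_tz)
    # Integer milliseconds since the Unix epoch.
    return (utc_value - EPOCH) // timedelta(milliseconds=1)
```

For example, suppose a writer at +01:00 writes 2019-01-01 10:00. The sketch stores 1546333200000 (09:00 UTC) and sets writer.time.zone to "+0100".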

Reading timestamps:
-if property writer.time.zone is present in file metadata, convert from UTC to 
the writer time zone. This results in time zone agnostic behavior.
-if property writer.time.zone is NOT present in file metadata AND session-level 
property hive.avro.timestamp.skip.conversion is FALSE (default): the timestamp 
is converted to the session-local time zone. This was the historical behavior 
before Hive 3.1.
-if property writer.time.zone is NOT present in file metadata AND 
session-level property hive.avro.timestamp.skip.conversion is TRUE: the 
timestamp is not converted to any time zone, and is read as if the recorded 
timestamp was intended to be time zone agnostic. This matches the behavior of 
Hive 3.1 and later.
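
The three reading rules above can be sketched as a single decision in Python. This is a hedged illustration, not Hive's code. The function names are hypothetical. The skip_conversion argument stands in for hive.avro.timestamp.skip.conversion. The stored value is assumed to be UTC epoch milliseconds, and writer.time.zone is assumed to hold a fixed "+HHMM" offset string.

```python
from datetime import datetime, timedelta, timezone, tzinfo

WRITER_TIME_ZONE_KEY = "writer.time.zone"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def str_to_zone(value: str) -> tzinfo:
    """Decode a '+HHMM' offset string (a sketch assumption) into a fixed-offset zone."""
    return datetime.strptime(value, "%z").tzinfo


def read_timestamp(epoch_millis: int, metadata: dict, session_tz: tzinfo,
                   skip_conversion: bool = False) -> datetime:
    """Return the zone-less timestamp a reader sees under the three reading rules.

    `skip_conversion` stands in for hive.avro.timestamp.skip.conversion.
    """
    utc_value = EPOCH + timedelta(milliseconds=epoch_millis)
    writer_zone = metadata.get(WRITER_TIME_ZONE_KEY)
    if writer_zone is not None:
        # writer.time.zone present: convert from UTC to the writer's zone.
        target = str_to_zone(writer_zone)
    elif not skip_conversion:
        # Legacy file, skip.conversion=false (default): convert to the session zone.
        target = session_tz
    else:
        # Legacy file, skip.conversion=true: no conversion, keep the UTC wall clock.
        target = timezone.utc
    return utc_value.astimezone(target).replace(tzinfo=None)
```

Consider a stored value of 09:00 UTC and a session zone of -05:00. A file tagged "+0100" reads as 10:00 regardless of the flag. An untagged file reads as 04:00 by default, or as 09:00 when skip_conversion is true.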

Note:
-The session-level property hive.avro.timestamp.skip.conversion will influence 
how HBase tables that use the AvroSerDe are deserialized (timestamps will 
always be UTC-normalized during serialization).
-The same applies to Kafka.

> Restore historical way of handling timestamps in Avro while keeping the new 
> semantics at the same time
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-21291
>                 URL: https://issues.apache.org/jira/browse/HIVE-21291
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Zoltan Ivanfi
>            Assignee: Karen Coppage
>            Priority: Major
>         Attachments: HIVE-21291.1.patch, HIVE-21291.2.patch, 
> HIVE-21291.3.patch, HIVE-21291.4.patch, HIVE-21291.4.patch, 
> HIVE-21291.5.patch, HIVE-21291.6.patch, HIVE-21291.7.patch, HIVE-21291.7.patch
>
>
> This sub-task is for implementing the Avro-specific parts of the following 
> plan:
> h1. Problem
> Historically, the semantics of the TIMESTAMP type in Hive depended on the 
> file format. Timestamps in Avro, Parquet and RCFiles with a binary SerDe had 
> _Instant_ semantics, while timestamps in ORC, textfiles and RCFiles with a 
> text SerDe had _LocalDateTime_ semantics.
> The Hive community wanted to get rid of this inconsistency and have 
> _LocalDateTime_ semantics in Avro, Parquet and RCFiles with a binary SerDe as 
> well. *Hive 3.1 turned off normalization to UTC* to achieve this. While this 
> leads to the desired new semantics, it also leads to incorrect results in 
> two cases: when new Hive versions read timestamps written by old Hive 
> versions, and when old Hive versions or any other component not aware of 
> this change (including legacy Impala and Spark versions) read timestamps 
> written by new Hive versions.
> h1. Solution
> To work around this issue, Hive *should restore the practice of normalizing 
> to UTC* when writing timestamps to Avro, Parquet and RCFiles with a binary 
> SerDe. In itself, this would restore the historical _Instant_ semantics, 
> which is undesirable. In order to achieve the desired _LocalDateTime_ 
> semantics in spite of normalizing to UTC, newer Hive versions should record 
> the session-local time zone in the file metadata fields used for arbitrary 
> key-value storage.
> When reading back files with this time zone metadata, newer Hive versions (or 
> any other new component aware of this extra metadata) can achieve 
> _LocalDateTime_ semantics by *converting from UTC to the saved time zone 
> (instead of to the local time zone)*. Legacy components that are unaware of 
> the new metadata can read the files without any problem and the timestamps 
> will show the historical Instant behaviour to them.
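
The Python sketch below contrasts the schemes from the Problem and Solution sections. It follows one wall-clock value written in one zone and read in another. The helper names are hypothetical. Zones are fixed UTC offsets rather than region-based IDs, and values are stored as epoch milliseconds; both are assumptions. The Hive 3.1 behavior is modeled as storing the wall clock as if it were UTC.

```python
from datetime import datetime, timedelta, timezone, tzinfo

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_millis(aware: datetime) -> int:
    """Epoch milliseconds of a zone-aware datetime."""
    return (aware - EPOCH) // timedelta(milliseconds=1)


def view_in_zone(millis: int, zone: tzinfo) -> datetime:
    """Interpret stored epoch millis in `zone` and return the zone-less wall clock."""
    return (EPOCH + timedelta(milliseconds=millis)).astimezone(zone).replace(tzinfo=None)


def compare_readers(wall_clock: datetime, writer_tz: tzinfo, reader_tz: tzinfo) -> dict:
    """What different readers see for one zone-less timestamp written in writer_tz."""
    # Proposed scheme: normalize to UTC; writer_tz would also be saved in file metadata.
    normalized = to_millis(wall_clock.replace(tzinfo=writer_tz))
    # Hive 3.1 scheme (as modeled here): no normalization, wall clock stored as if UTC.
    unnormalized = to_millis(wall_clock.replace(tzinfo=timezone.utc))
    return {
        # Aware reader converts from UTC to the saved zone: LocalDateTime semantics.
        "new_reader": view_in_zone(normalized, writer_tz),
        # Legacy reader ignores the metadata and uses its local zone: Instant semantics.
        "legacy_reader": view_in_zone(normalized, reader_tz),
        # Legacy reader of a Hive 3.1 file: matches neither wall clock nor instant.
        "legacy_reader_hive31_file": view_in_zone(unnormalized, reader_tz),
    }
```

Suppose a writer at +01:00 writes 2019-01-01 10:00 and a reader sits at -05:00. The new reader sees 10:00, which is the written wall clock. A legacy reader sees 04:00, the same instant shown in its own zone. A legacy reader of a Hive 3.1 file sees 05:00, which matches neither.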



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
