[ https://issues.apache.org/jira/browse/HIVE-21002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919389#comment-16919389 ]
Piotr Findeisen commented on HIVE-21002:
----------------------------------------

[~klcopp] [~zi] this issue explicitly talks about Avro and Parquet, whereas the same problem also applies to "RCBinary" ({{ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe' STORED AS RCFILE;}}).
Has this been addressed too, or should I create a new issue?

> TIMESTAMP - Backwards incompatible change: Hive 3.1 reads back Avro and Parquet timestamps written by Hive 2.x incorrectly
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-21002
>                 URL: https://issues.apache.org/jira/browse/HIVE-21002
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.1.0, 3.1.1
>            Reporter: Zoltan Ivanfi
>            Priority: Major
>
> Hive 3.1 reads back Avro and Parquet timestamps written by Hive 2.x incorrectly. As an example session to demonstrate this problem, create a dataset using Hive version 2.x in America/Los_Angeles:
> {code:sql}
> hive> create table ts_‹format› (ts timestamp) stored as ‹format›;
> hive> insert into ts_‹format› values ('2018-01-01 00:00:00.000');
> {code}
> Querying this table by issuing
> {code:sql}
> hive> select * from ts_‹format›;
> {code}
> from different time zones using different versions of Hive and different storage formats gives the following results:
> |‹format›|Writer time zone (in Hive 2.x)|Reader time zone|Result in Hive 2.x reader|Result in Hive 3.1 reader|
> |Avro and Parquet|America/Los_Angeles|America/Los_Angeles|2018-01-01 *00*:00:00.0|2018-01-01 *08*:00:00.0|
> |Avro and Parquet|America/Los_Angeles|Europe/Paris|2018-01-01 *09*:00:00.0|2018-01-01 *08*:00:00.0|
> |Textfile and ORC|America/Los_Angeles|America/Los_Angeles|2018-01-01 00:00:00.0|2018-01-01 00:00:00.0|
> |Textfile and ORC|America/Los_Angeles|Europe/Paris|2018-01-01 00:00:00.0|2018-01-01 00:00:00.0|
> *Hive 3.1 clearly gives different results than Hive 2.x for timestamps stored in Avro and Parquet formats.* Apache ORC behaviour has not changed because it was modified to adjust timestamps to retain backwards compatibility. Textfile behaviour has not changed, because its processing involves parsing and formatting instead of proper serializing and deserializing, so Textfile timestamps inherently had LocalDateTime semantics even in Hive 2.x.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
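
For reference, the Avro and Parquet rows of the table in the quoted description can be reproduced with a small sketch. It is not Hive code. It assumes (this is not stated in the issue text) that Hive 2.x normalized the writer's wall-clock time to a UTC instant on write and converted it back to the reader's zone on read, while Hive 3.1 displays the stored UTC value without any zone conversion. The function names are hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+, needs the system tz database

FMT = "%Y-%m-%d %H:%M:%S"


def write_hive2(wall_clock: str, writer_zone: str) -> datetime:
    """Assumed Hive 2.x write path: wall clock in the writer zone -> UTC instant."""
    local = datetime.strptime(wall_clock, FMT).replace(tzinfo=ZoneInfo(writer_zone))
    return local.astimezone(timezone.utc)


def read_hive2(stored_utc: datetime, reader_zone: str) -> str:
    """Assumed Hive 2.x read path: UTC instant -> wall clock in the reader zone."""
    return stored_utc.astimezone(ZoneInfo(reader_zone)).strftime(FMT)


def read_hive31(stored_utc: datetime, reader_zone: str) -> str:
    """Assumed Hive 3.1 read path: show the stored value as-is.

    reader_zone is deliberately ignored (LocalDateTime-style semantics).
    """
    return stored_utc.strftime(FMT)


stored = write_hive2("2018-01-01 00:00:00", "America/Los_Angeles")

for zone in ("America/Los_Angeles", "Europe/Paris"):
    print(zone, "| Hive 2.x:", read_hive2(stored, zone),
          "| Hive 3.1:", read_hive31(stored, zone))
```

Under these assumptions the Hive 2.x reader yields 00:00:00 in America/Los_Angeles and 09:00:00 in Europe/Paris, while the Hive 3.1 reader yields 08:00:00 in both, matching the Avro and Parquet rows of the table.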