[ https://issues.apache.org/jira/browse/HIVE-9482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671191#comment-15671191 ]

Vitalii Diravka commented on HIVE-9482:
---------------------------------------

Why is the hive.parquet.timestamp.skip.conversion option enabled by default?
According to the [Parquet 
spec|https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#timestamp_millis],
Parquet files don't store the local time zone, so we can't tell from the file
itself what the value of that option was when the Parquet file was written.

> Hive parquet timestamp compatibility
> ------------------------------------
>
>                 Key: HIVE-9482
>                 URL: https://issues.apache.org/jira/browse/HIVE-9482
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats
>    Affects Versions: 0.15.0
>            Reporter: Szehon Ho
>            Assignee: Szehon Ho
>             Fix For: 1.2.0
>
>         Attachments: HIVE-9482.2.patch, HIVE-9482.patch, HIVE-9482.patch, parquet_external_time.parq
>
>
> In the current Hive implementation, timestamps are stored in UTC (converted from
> the current time zone), based on the original Parquet timestamp spec.
> However, we found that this is not compatible with other tools, and after some
> investigation, it is not how other file formats, or even some databases,
> behave (a Hive timestamp is closer to the 'timestamp without time zone'
> data type).
> This is the first part of the fix, which restores compatibility with
> Parquet timestamp files generated by external tools by skipping conversion on
> reading.
> A later fix will change the write path to not convert, and will stop the
> read conversion even for files written by Hive itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
