[ https://issues.apache.org/jira/browse/HIVE-12192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426302#comment-16426302 ]
Haozhun Jin commented on HIVE-12192: ------------------------------------ [~jcamachorodriguez], thank you for your patient answer. Please bear with us for a little bit more. What do you think after reading the two experiments before? h1. Experiment 1 I conducted this experiment once before. And this is why I'm under the impression that Hive Timestamp type means Instant. I redid the experiment today in both 1.2.1 and 2.6.3. The result is the same. Zone is America/Los_Angeles. The table below summarizes the outcome: [raw|https://gist.github.com/haozhun/03cd09b3fa2456271f2e01759c9c1b8e] ||Query||Actual||expected LocalDateTime||expect Instant|| |current_timestamp()|2018-04-04 14:51|2018-04-04 14:51|2018-04-04 14:51| |current_timestamp() - interval '2880' hour|2017-12-05 13:51|2018-12-05 14:51|2017-12-05 13:51| h1. Experiment 2 After reading your comments about ORC, I conducted some experiments to see how other file formats behave. The result is not what I was expecting. I did the experiment in 2.6.3. The table below summarizes the outcome: [raw|[https://gist.github.com/haozhun/2e4b7c52bf3c56c03ed06e1ad895e198]|https://gist.github.com/haozhun/2e4b7c52bf3c56c03ed06e1ad895e198])] ||(server +05:45)||ORC||RCBinary||RCText||Text|| |Insert (client -07:00)|06:00:00|06:00:00|06:00:00|06:00:00| |Read 1 (client -07:00)|18:45:00|06:00:00|18:45:00|18:45:00| |Read 2 (client -04:00)|18:45:00|09:00:00|18:45:00|18:45:00| h1. Summary This is my take away * experiment 1: Hive timestamp type has Instant semantics. If it's internal representation is changed from java.sql.Timestamp to java.time.LocalDateTime, it will be a user-visible behavior change. * experiment 2: Do not use Hive in a zone different from the server's (given insert and read does not round trip in a single hive cli session). Hopefully, that's indeed how every one uses Hive. In that case, it does not matter whether experiment 2 indicates Instant or LocalDateTime. > Hive should carry out timestamp computations in UTC > --------------------------------------------------- > > Key: HIVE-12192 > URL: https://issues.apache.org/jira/browse/HIVE-12192 > Project: Hive > Issue Type: Sub-task > Components: Hive > Reporter: Ryan Blue > Assignee: Jesus Camacho Rodriguez > Priority: Major > Labels: timestamp > Attachments: HIVE-12192.patch > > > Hive currently uses the "local" time of a java.sql.Timestamp to represent the > SQL data type TIMESTAMP WITHOUT TIME ZONE. The purpose is to be able to use > {{Timestamp#getYear()}} and similar methods to implement SQL functions like > {{year}}. > When the SQL session's time zone is a DST zone, such as America/Los_Angeles > that alternates between PST and PDT, there are times that cannot be > represented because the effective zone skips them. > {code} > hive> select TIMESTAMP '2015-03-08 02:10:00.101'; > 2015-03-08 03:10:00.101 > {code} > Using UTC instead of the SQL session time zone as the underlying zone for a > java.sql.Timestamp avoids this bug, while still returning correct values for > {{getYear}} etc. Using UTC as the convenience representation (timestamp > without time zone has no real zone) would make timestamp calculations more > consistent and avoid similar problems in the future. > Notably, this would break the {{unix_timestamp}} UDF that specifies the > result is with respect to ["the default timezone and default > locale"|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions]. > That function would need to be updated to use the > {{System.getProperty("user.timezone")}} zone. -- This message was sent by Atlassian JIRA (v7.6.3#76005)