[ https://issues.apache.org/jira/browse/HIVE-14412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15407038#comment-15407038 ]
Rui Li commented on HIVE-14412:
-------------------------------

Thanks [~sershe] for your comments.

For UDFs like "to/from UTC timestamp", the timezone is explicitly specified. But since we use Java's Timestamp internally, the TZ info is lost and, like you said, the local TZ applies when converting the TS to a string.

I haven't read through all the timestamp-related code yet, but my plan so far is:
# We can create a subclass of Timestamp, e.g. HiveTimestamp, with extra TZ info. This TZ info indicates the timestamp's offset from UTC. I think we can assume offsets are whole numbers of minutes. So if the user gives {{'2005-04-03 10:01:00','Asia/Shanghai'}}, the TZ info should be 480, meaning it's 8 hours ahead of UTC. Any timestamp that requires a specific TZ should be stored as a HiveTimestamp. When converting to and from strings, the specified TZ will apply. For example, the above timestamp will be converted to the string {{2005-04-03 10:01:00 GMT+08:00}}.
# The TZ info needs to be stored. For example, if we insert the results of the UDF into a table and later query that table, we need to be able to restore the timestamp properly.
# TimestampWritable needs to be modified to accommodate HiveTimestamp. Currently, the internal byte layout is {{|4-byte int| |1st VInt| |2nd VInt|}}, where {{1st VInt}} stores the nanoseconds, and the {{4-byte int}} and {{2nd VInt}} store the seconds. We need a 3rd VInt to store the TZ info. Since {{1st VInt}} ranges in [-1000000000, 999999999], we can use its second MSB to indicate whether the 3rd VInt exists. This way we can maintain backward compatibility.

Would like to know your opinions. Thanks.

> Add a timezone-aware timestamp
> ------------------------------
>
>                 Key: HIVE-14412
>                 URL: https://issues.apache.org/jira/browse/HIVE-14412
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Hive
>            Reporter: Rui Li
>            Assignee: Rui Li
>
> Java's Timestamp stores the time elapsed since the epoch. While it is
> unambiguous by itself, ambiguity arises when we parse a string into a
> timestamp or convert a timestamp to a string, causing problems like HIVE-14305.
> To solve the issue, I think we should make timestamp aware of timezone.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
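
To make items 1 and 3 of the comment above concrete, here is a minimal Java sketch, not Hive code. The class {{TzTimestampSketch}} and all of its methods are hypothetical names introduced only for illustration. It assumes the offset is kept as whole minutes and that the first VInt's value can be treated as a 32-bit int. Every legal value in [-1000000000, 999999999] has its second MSB (bit 30) equal to its sign bit, so one possible reading of "use its second MSB" is to flip bit 30 when a 3rd VInt follows; the actual encoding in TimestampWritable may well end up different.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration only; none of these names exist in Hive.
final class TzTimestampSketch {
    private static final DateTimeFormatter IN =
        DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");
    private static final DateTimeFormatter OUT =
        DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss 'GMT'xxx");

    // Second most significant bit of a 32-bit int.
    private static final int TZ_FLAG = 1 << 30;

    // Offset from UTC, in minutes, that the zone has at the given local time.
    static int offsetMinutes(String localTime, String zoneId) {
        LocalDateTime ldt = LocalDateTime.parse(localTime, IN);
        return ldt.atZone(ZoneId.of(zoneId)).getOffset().getTotalSeconds() / 60;
    }

    // Render the local time together with its stored offset.
    static String format(String localTime, int offsetMinutes) {
        ZoneOffset offset = ZoneOffset.ofTotalSeconds(offsetMinutes * 60);
        return OffsetDateTime.of(LocalDateTime.parse(localTime, IN), offset).format(OUT);
    }

    // Assumed encoding: flip bit 30 to signal that a TZ field follows.
    static int markHasTz(int field) {
        return field ^ TZ_FLAG;
    }

    // For legal values bit 30 equals the sign bit; a mismatch means "TZ present".
    static boolean hasTz(int field) {
        return ((field >>> 30) & 1) != (field >>> 31);
    }

    // Recover the original field value whether or not the flag was set.
    static int clearTz(int field) {
        return hasTz(field) ? field ^ TZ_FLAG : field;
    }

    public static void main(String[] args) {
        int tz = offsetMinutes("2005-04-03 10:01:00", "Asia/Shanghai");
        System.out.println(tz);                                // 480
        System.out.println(format("2005-04-03 10:01:00", tz)); // 2005-04-03 10:01:00 GMT+08:00

        int marked = markHasTz(-1000000000);
        System.out.println(hasTz(marked));                     // true
        System.out.println(clearTz(marked));                   // -1000000000
    }
}
```

Under this assumed encoding, a value written by existing code never has bit 30 differing from the sign bit, so it is read back as "no TZ field", which is the backward compatibility the comment asks for.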