[ 
https://issues.apache.org/jira/browse/HIVE-14412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15407038#comment-15407038
 ] 

Rui Li commented on HIVE-14412:
-------------------------------

Thanks [~sershe] for your comments.
For UDFs like "to/from UTC timestamp", the timezone is specified explicitly.
But since we use Java's Timestamp internally, the TZ info is lost and, as you
said, the local TZ applies when the timestamp is converted to a string.
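
A minimal sketch of the behaviour described above, using only the JDK: the
same instant is rendered by java.sql.Timestamp under two different JVM default
timezones. The class and method names (LocalTzDemo, render) are made up for
illustration and are not part of Hive.

```java
import java.sql.Timestamp;
import java.util.TimeZone;

// Hypothetical demo class, not Hive code.
final class LocalTzDemo {

    // Renders the given instant with Timestamp.toString() while the JVM
    // default timezone is temporarily set to zoneId. Changing the default
    // timezone is global state, so this is only suitable for a demo.
    static String render(long epochMillis, String zoneId) {
        TimeZone saved = TimeZone.getDefault();
        try {
            TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
            // A fresh Timestamp carries no timezone of its own, so
            // toString() falls back to the current default timezone.
            return new Timestamp(epochMillis).toString();
        } finally {
            TimeZone.setDefault(saved);
        }
    }
}
```

Calling render with the instant for 10:01:00 in Asia/Shanghai on 2005-04-03
gives a 10:01:00 wall-clock string under "Asia/Shanghai" but a 02:01:00 string
under "UTC", although the stored value is identical.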

I haven't read through all the timestamp-related code yet, but my plan so far
is:
# We can create a subclass of Timestamp, e.g. HiveTimestamp, that carries extra
TZ info. The TZ info is the timestamp's offset from UTC. I think we can assume
offsets are whole numbers of minutes. So if the user gives
{{'2005-04-03 10:01:00','Asia/Shanghai'}}, the TZ info should be 480, meaning
it's 8 hours ahead of UTC. Any timestamp that requires a specific TZ should be
stored as a HiveTimestamp. When converting to or from a string, the specified
TZ applies. For example, the above timestamp will be converted to the string
{{2005-04-03 10:01:00 GMT+08:00}}.
# The TZ info needs to be stored. For example, if we insert the results of the
UDF into a table and query that table later, we need to be able to restore the
timestamp properly.
# TimestampWritable needs to be modified to accommodate HiveTimestamp.
Currently, the internal byte layout is
{{|4-byte int| |1st VInt| |2nd VInt|}}, where {{1st VInt}} stores the
nanoseconds, and the {{4-byte int}} and {{2nd VInt}} store the seconds. We need
a 3rd VInt to store the TZ info. Since the value of {{1st VInt}} lies in
[-1000000000, 999999999], we can use its second MSB to indicate whether the 3rd
VInt exists. This way we can maintain backward compatibility.
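
The plan above could be sketched roughly as follows. This is an illustration
under assumptions, not actual Hive code: the HiveTimestamp constructor, the
of() factory, getOffsetMinutes() and the NanosFlag helper are all hypothetical
names, the string form ignores fractional seconds, and the flag helper treats
the VInt value as a 32-bit int. The flag works because every value in
[-1000000000, 999999999] has its second MSB equal to its sign bit (10^9 is
below 2^30), so a mismatch between the two bits can signal that a 3rd VInt
follows.

```java
import java.sql.Timestamp;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical sketch of a Timestamp subclass carrying a UTC offset in minutes.
class HiveTimestamp extends Timestamp {
    private static final DateTimeFormatter LOCAL =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss", Locale.ROOT);

    private final int offsetMinutes;

    HiveTimestamp(long epochMillis, int offsetMinutes) {
        super(epochMillis);
        this.offsetMinutes = offsetMinutes;
    }

    // Resolves the zone's offset at the given instant. Assumes the offset is
    // a whole number of minutes; any sub-minute part is truncated.
    static HiveTimestamp of(long epochMillis, String zoneId) {
        int seconds = ZoneId.of(zoneId).getRules()
            .getOffset(Instant.ofEpochMilli(epochMillis)).getTotalSeconds();
        return new HiveTimestamp(epochMillis, seconds / 60);
    }

    int getOffsetMinutes() {
        return offsetMinutes;
    }

    // Renders the wall-clock time at the stored offset, followed by GMT+hh:mm.
    // Fractional seconds are omitted in this sketch.
    @Override
    public String toString() {
        ZoneOffset offset = ZoneOffset.ofTotalSeconds(offsetMinutes * 60);
        String local = LOCAL.format(Instant.ofEpochMilli(getTime()).atOffset(offset));
        int abs = Math.abs(offsetMinutes);
        return String.format(Locale.ROOT, "%s GMT%s%02d:%02d",
            local, offsetMinutes < 0 ? "-" : "+", abs / 60, abs % 60);
    }
}

// Hypothetical helper for the "second MSB" flag on the nanoseconds value.
final class NanosFlag {
    private static final int FLAG = 1 << 30; // second most significant bit

    // For in-range values bit 30 equals the sign bit, so a mismatch means
    // the flag has been set.
    static boolean isSet(int v) {
        return ((v >>> 31) & 1) != ((v >>> 30) & 1);
    }

    // Marks the value as "a 3rd VInt follows"; no-op if already marked.
    static int set(int v) {
        return isSet(v) ? v : v ^ FLAG;
    }

    // Recovers the original nanoseconds value.
    static int clear(int v) {
        return isSet(v) ? v ^ FLAG : v;
    }
}
```

With this sketch, HiveTimestamp.of(t, "Asia/Shanghai") for the example instant
stores 480 and prints as 2005-04-03 10:01:00 GMT+08:00, and values written
without the flag still read back unchanged through NanosFlag.clear().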

Would like to know your opinions. Thanks.

> Add a timezone-aware timestamp
> ------------------------------
>
>                 Key: HIVE-14412
>                 URL: https://issues.apache.org/jira/browse/HIVE-14412
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Hive
>            Reporter: Rui Li
>            Assignee: Rui Li
>
> Java's Timestamp stores the time elapsed since the epoch. While it's by 
> itself unambiguous, ambiguity comes when we parse a string into timestamp, or 
> convert a timestamp to string, causing problems like HIVE-14305.
> To solve the issue, I think we should make timestamp aware of timezone.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
