Vincent Tran created HIVE-20154:
-----------------------------------

             Summary: Improve unix_timestamp(args) to handle automatic-DST 
switching timezones
                 Key: HIVE-20154
                 URL: https://issues.apache.org/jira/browse/HIVE-20154
             Project: Hive
          Issue Type: Improvement
            Reporter: Vincent Tran


Currently unix_timestamp(args) UDF will only handle static timezone specifiers. 
It does not recognize SystemV specifiers such as EST5EDT or PST8PDT.

Based on this experiment, when z is used to parse a TZ string like UTC4PDT 
(obviously not a valid SystemV specifier) - it will parse the time as UTC.
When z<n>z is used to parse a TZ string like UTC4PDT, it will take parse the 
timestamp as the TZ of the final z position. This is demonstrated by my final 
query when the format string z4z1z is used to parse UTC4PDT1EDT.

{noformat}
0: jdbc:hive2://localhost:10000/default>; select 
from_unixtime(unix_timestamp("2018-02-01 00:00:00 UTC4PDT", "yyyy-MM-dd 
HH:mm:ss z"), "yyyy-MM-dd HH:mm:ss zzzz");
+--------------------------------------------+--+
|                    _c0                     |
+--------------------------------------------+--+
| 2018-01-31 16:00:00 Pacific Standard Time  |
+--------------------------------------------+--+
1 row selected (0.041 seconds)
0: jdbc:hive2://localhost:10000/default>; select 
from_unixtime(unix_timestamp("2018-02-01 00:00:00 UTC", "yyyy-MM-dd HH:mm:ss 
z"), "yyyy-MM-dd HH:mm:ss zzzz");
+--------------------------------------------+--+
|                    _c0                     |
+--------------------------------------------+--+
| 2018-01-31 16:00:00 Pacific Standard Time  |
+--------------------------------------------+--+
1 row selected (0.041 seconds)
0: jdbc:hive2://localhost:10000/default>; select 
from_unixtime(unix_timestamp("2018-02-01 00:00:00 UTC4PDT", "yyyy-MM-dd 
HH:mm:ss z4z"), "yyyy-MM-dd HH:mm:ss zzzz");
+--------------------------------------------+--+
|                    _c0                     |
+--------------------------------------------+--+
| 2018-01-31 23:00:00 Pacific Standard Time  |
+--------------------------------------------+--+
1 row selected (0.047 seconds)
0: jdbc:hive2://localhost:10000/default>; select 
from_unixtime(unix_timestamp("2018-02-01 00:00:00 UTC4PDT1EDT", "yyyy-MM-dd 
HH:mm:ss z4z1z"), "yyyy-MM-dd HH:mm:ss zzzz");
+--------------------------------------------+--+
|                    _c0                     |
+--------------------------------------------+--+
| 2018-01-31 20:00:00 Pacific Standard Time  |
+--------------------------------------------+--+
1 row selected (0.055 seconds)
0: jdbc:hive2://localhost:10000/default>;
{noformat}



So all in all, I don't think the SystemV specifier EST5EDT or PST8PDT are valid 
to unix_timestamp(args) at all. And that those when parsed with the z<n>z 
format string, will be read as whatever valid timezone at the final position 
(effectively EDT and PDT respectively in when those valid SystemV TZ specifiers 
above are used).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to