Vincent Tran created HIVE-20154: ----------------------------------- Summary: Improve unix_timestamp(args) to handle automatic-DST switching timezones Key: HIVE-20154 URL: https://issues.apache.org/jira/browse/HIVE-20154 Project: Hive Issue Type: Improvement Reporter: Vincent Tran
Currently unix_timestamp(args) UDF will only handle static timezone specifiers. It does not recognize SystemV specifiers such as EST5EDT or PST8PDT. Based on this experiment, when z is used to parse a TZ string like UTC4PDT (obviously not a valid SystemV specifier) - it will parse the time as UTC. When z<n>z is used to parse a TZ string like UTC4PDT, it will take parse the timestamp as the TZ of the final z position. This is demonstrated by my final query when the format string z4z1z is used to parse UTC4PDT1EDT. {noformat} 0: jdbc:hive2://localhost:10000/default>; select from_unixtime(unix_timestamp("2018-02-01 00:00:00 UTC4PDT", "yyyy-MM-dd HH:mm:ss z"), "yyyy-MM-dd HH:mm:ss zzzz"); +--------------------------------------------+--+ | _c0 | +--------------------------------------------+--+ | 2018-01-31 16:00:00 Pacific Standard Time | +--------------------------------------------+--+ 1 row selected (0.041 seconds) 0: jdbc:hive2://localhost:10000/default>; select from_unixtime(unix_timestamp("2018-02-01 00:00:00 UTC", "yyyy-MM-dd HH:mm:ss z"), "yyyy-MM-dd HH:mm:ss zzzz"); +--------------------------------------------+--+ | _c0 | +--------------------------------------------+--+ | 2018-01-31 16:00:00 Pacific Standard Time | +--------------------------------------------+--+ 1 row selected (0.041 seconds) 0: jdbc:hive2://localhost:10000/default>; select from_unixtime(unix_timestamp("2018-02-01 00:00:00 UTC4PDT", "yyyy-MM-dd HH:mm:ss z4z"), "yyyy-MM-dd HH:mm:ss zzzz"); +--------------------------------------------+--+ | _c0 | +--------------------------------------------+--+ | 2018-01-31 23:00:00 Pacific Standard Time | +--------------------------------------------+--+ 1 row selected (0.047 seconds) 0: jdbc:hive2://localhost:10000/default>; select from_unixtime(unix_timestamp("2018-02-01 00:00:00 UTC4PDT1EDT", "yyyy-MM-dd HH:mm:ss z4z1z"), "yyyy-MM-dd HH:mm:ss zzzz"); +--------------------------------------------+--+ | _c0 | +--------------------------------------------+--+ | 2018-01-31 20:00:00 Pacific Standard Time | +--------------------------------------------+--+ 1 row selected (0.055 seconds) 0: jdbc:hive2://localhost:10000/default>; {noformat} So all in all, I don't think the SystemV specifier EST5EDT or PST8PDT are valid to unix_timestamp(args) at all. And that those when parsed with the z<n>z format string, will be read as whatever valid timezone at the final position (effectively EDT and PDT respectively in when those valid SystemV TZ specifiers above are used). -- This message was sent by Atlassian JIRA (v7.6.3#76005)