[ https://issues.apache.org/jira/browse/HIVE-21733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840603#comment-16840603 ]
Jesus Camacho Rodriguez commented on HIVE-21733: ------------------------------------------------ Thanks for opening this issue [~oricken]. [~kgyrtkirk] described well the motivation behind this change. There is a Hive wiki page that explains the change in semantics (as well as semantics for other systems too): https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types Like you mentioned, apparently we missed updating some documentation in the wiki, e.g., related to the UDFs. I will take care of that shortly. Regarding the timezone itself, we support the Java supported time zone IDs; this is something that should be added to the documentation too. To move forward with your problem, you can also leverage 'timestamp with local time zone' type which by default will use the system time zone. > 'from_unixtime' does not recognise timezone > ------------------------------------------- > > Key: HIVE-21733 > URL: https://issues.apache.org/jira/browse/HIVE-21733 > Project: Hive > Issue Type: Bug > Components: Hive, hpl/sql > Affects Versions: 3.1.0 > Reporter: Oliver Ricken > Priority: Major > > Dear experts, > I try to convert a unix timestamp to a date using 'from_unixtime' in Hive 3.1 > do not get the correct result in my local timezone but in UTC. As I have a > Hive 1.2 system avaialble also, I can state that in that version, the issue > does not exist. > I performed extensive tests on systems running HDP 3.1 as well as HDP 2.6. > For the detailled versions of the HDP stack and its components, see the > beeline results below. > I start from the European timezone "Europe/Berlin" or "CEST" (c.f. the date > of the post) set on all my system: > {code:java} > [user@host ~]$ date > Wed May 15 13:48:05 CEST 2019 > [user@host ~]${code} > In a next step, I launch beeline and use the following timestamp as a > test-value: > 1557921076 > (corresponding to Wed May 15 13:51:16 2019 CEST and Wed May 15 11:51:16 2019 > UTC) > On the HDP 2.6 system: > {code:java} > Connecting to > jdbc:hive2://host.domain.com:2181,host.domain.com:2181,host.domain.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2; > Connected to: Apache Hive (version 1.2.1000.2.6.0.3-8) > Driver: Hive JDBC (version 1.2.1000.2.6.0.3-8) > Transaction isolation: TRANSACTION_REPEATABLE_READ > Beeline version 1.2.1000.2.6.0.3-8 by Apache Hive > 0: jdbc:hive2://host.domain.com:2181,host> select from_unixtime(1557921076); > _c0 > 2019-05-15 13:51:16 > 1 row selected (0.373 seconds){code} > As you can see, using HDP 2.6 with Hive 1.2 the conversion works as expected. > The timestamp is cast into a time in the local timezone. Very nice! > On the HDP 3.1 system: > {code:java} > [user@host ~]$ beeline > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. > SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] > Connecting to > jdbc:hive2://host.domain.com:2181,host.domain.com:2181,host.domain.com:2181/default;principal=hive/_h...@domain.com;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2 > 19/05/15 13:55:36 [main]: INFO jdbc.HiveConnection: Connected to > host.domain.com:10000 > Connected to: Apache Hive (version 3.1.0.3.1.0.0-78) > Driver: Hive JDBC (version 3.1.0.3.1.0.0-78) > Transaction isolation: TRANSACTION_REPEATABLE_READ > Beeline version 3.1.0.3.1.0.0-78 by Apache Hive > 0: jdbc:hive2://host.domain.com:2181,host.n> select from_unixtime(1557921076); > DEBUG : Acquired the compile lock. > INFO : Compiling > command(queryId=hive_20190515135655_2926bbf9-4b57-4932-a79c-96e0c7d53514): > select from_unixtime(1557921076) > DEBUG : Encoding valid txns info > 13519:13517:13517:11708,11720,11725,11735,11739,11749,11751,11753,11755,11771,11779,11790,11813,11833,11875,11883,12050,12196,12310,12342,13116,13126,13128,13130,13153,13155,13157,13159,13161,13163,13166,13169,13277,13279,13281,13290,13414,13422,13424,13473,13476,13485 > txnid:13519 > INFO : Semantic Analysis Completed (retrial = false) > INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, > type:string, comment:null)], properties:null) > INFO : Completed compiling > command(queryId=hive_20190515135655_2926bbf9-4b57-4932-a79c-96e0c7d53514); > Time taken: 0.492 seconds > INFO : Executing > command(queryId=hive_20190515135655_2926bbf9-4b57-4932-a79c-96e0c7d53514): > select from_unixtime(1557921076) > INFO : Completed executing > command(queryId=hive_20190515135655_2926bbf9-4b57-4932-a79c-96e0c7d53514); > Time taken: 0.011 seconds > INFO : OK > DEBUG : Shutting down query select from_unixtime(1557921076) > +----------------------+ > | _c0 | > +----------------------+ > | 2019-05-15 11:51:16 | > +----------------------+ > 1 row selected (0.672 seconds){code} > Here, things go the wrong way, as described in the above post. The local > timezone is not recognised by Hive and the default of 'UTC' is chosen, i.e. > an offset of 2h is returned. > I was able to track the problem down a bit further. Doing the detour of > converting via 'from_unix_timestamp' where providing a timezone parameter is > possible, gives the correct result: > {code:java} > 0: jdbc:hive2://host.domain.com:2181,host.n> select > from_utc_timestamp(from_unixtime(1557921076), 'Europe/Berlin'); > DEBUG : Acquired the compile lock. > INFO : Compiling > command(queryId=hive_20190515135924_393ff49c-1ce3-4c2b-8801-c22899c0d2c8): > select from_utc_timestamp(from_unixtime(1557921076), 'Europe/Berlin') > DEBUG : Encoding valid txns info > 13520:13517:13517:11708,11720,11725,11735,11739,11749,11751,11753,11755,11771,11779,11790,11813,11833,11875,11883,12050,12196,12310,12342,13116,13126,13128,13130,13153,13155,13157,13159,13161,13163,13166,13169,13277,13279,13281,13290,13414,13422,13424,13473,13476,13485 > txnid:13520 > INFO : Semantic Analysis Completed (retrial = false) > INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, > type:timestamp, comment:null)], properties:null) > INFO : Completed compiling > command(queryId=hive_20190515135924_393ff49c-1ce3-4c2b-8801-c22899c0d2c8); > Time taken: 1.024 seconds > INFO : Executing > command(queryId=hive_20190515135924_393ff49c-1ce3-4c2b-8801-c22899c0d2c8): > select from_utc_timestamp(from_unixtime(1557921076), 'Europe/Berlin') > INFO : Completed executing > command(queryId=hive_20190515135924_393ff49c-1ce3-4c2b-8801-c22899c0d2c8); > Time taken: 0.01 seconds > INFO : OK > DEBUG : Shutting down query select > from_utc_timestamp(from_unixtime(1557921076), 'Europe/Berlin') > +------------------------+ > | _c0 | > +------------------------+ > | 2019-05-15 13:51:16.0 | > +------------------------+ > 1 row selected (1.187 seconds){code} > Giving 'CEST', however, leads to the observed problem from above: > {code:java} > 0: jdbc:hive2://host.domain.com:2181,host.n> select > from_utc_timestamp(from_unixtime(1557921076), 'CEST'); > DEBUG : Acquired the compile lock. > INFO : Compiling > command(queryId=hive_20190515140104_8b4293d0-53e8-480b-9fe4-3bdee8bc3caa): > select from_utc_timestamp(from_unixtime(1557921076), 'CEST') > DEBUG : Encoding valid txns info > 13522:13517:13517,13521:11708,11720,11725,11735,11739,11749,11751,11753,11755,11771,11779,11790,11813,11833,11875,11883,12050,12196,12310,12342,13116,13126,13128,13130,13153,13155,13157,13159,13161,13163,13166,13169,13277,13279,13281,13290,13414,13422,13424,13473,13476,13485 > txnid:13522 > INFO : Semantic Analysis Completed (retrial = false) > INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, > type:timestamp, comment:null)], properties:null) > INFO : Completed compiling > command(queryId=hive_20190515140104_8b4293d0-53e8-480b-9fe4-3bdee8bc3caa); > Time taken: 0.396 seconds > INFO : Executing > command(queryId=hive_20190515140104_8b4293d0-53e8-480b-9fe4-3bdee8bc3caa): > select from_utc_timestamp(from_unixtime(1557921076), 'CEST') > INFO : Completed executing > command(queryId=hive_20190515140104_8b4293d0-53e8-480b-9fe4-3bdee8bc3caa); > Time taken: 0.031 seconds > INFO : OK > DEBUG : Shutting down query select > from_utc_timestamp(from_unixtime(1557921076), 'CEST') > +------------------------+ > | _c0 | > +------------------------+ > | 2019-05-15 11:51:16.0 | > +------------------------+ > 1 row selected (0.547 seconds){code} > Although 'CEST' and 'Europe/Berlin' should be equivalent this time of the > year, the result differs. I assume that 'CEST' is simply not recognised by > Hive and the fall-back to UTC is used. > A last test with the European winter time 'CET' gives the following result: > {code:java} > 0: jdbc:hive2://host.domain.com:2181,host.n> select > from_utc_timestamp(from_unixtime(1557921076), 'CET'); > DEBUG : Acquired the compile lock. > INFO : Compiling > command(queryId=hive_20190515140616_7c41fa87-d375-49d6-ac5b-7aeabf46e821): > select from_utc_timestamp(from_unixtime(1557921076), 'CET') > DEBUG : Encoding valid txns info > 13525:9223372036854775807::11708,11720,11725,11735,11739,11749,11751,11753,11755,11771,11779,11790,11813,11833,11875,11883,12050,12196,12310,12342,13116,13126,13128,13130,13153,13155,13157,13159,13161,13163,13166,13169,13277,13279,13281,13290,13414,13422,13424,13473,13476,13485,13517,13521,13523 > txnid:13525 > INFO : Semantic Analysis Completed (retrial = false) > INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, > type:timestamp, comment:null)], properties:null) > INFO : Completed compiling > command(queryId=hive_20190515140616_7c41fa87-d375-49d6-ac5b-7aeabf46e821); > Time taken: 0.777 seconds > INFO : Executing > command(queryId=hive_20190515140616_7c41fa87-d375-49d6-ac5b-7aeabf46e821): > select from_utc_timestamp(from_unixtime(1557921076), 'CET') > INFO : Completed executing > command(queryId=hive_20190515140616_7c41fa87-d375-49d6-ac5b-7aeabf46e821); > Time taken: 0.022 seconds > INFO : OK > DEBUG : Shutting down query select > from_utc_timestamp(from_unixtime(1557921076), 'CET') > +------------------------+ > | _c0 | > +------------------------+ > | 2019-05-15 13:51:16.0 | > +------------------------+ > 1 row selected (1.02 seconds){code} > Although 'CET' differs by 1 hour from 'CEST' and 'Europe/Berlin', the result > is correct. I assume that Hive realises that 'CET'-member timezones are > actually in 'CEST' this time of the year and corrects accordingly. > As nice of a work-around using 'Europe/Berlin' may look at a first glance, I > suspect things to go wrong when the actual timezone is 'CET' in winter. > I suspect 'from_unixtime' to recognise 'CET' as a valid timezone once it is > the timezone of the local system, resulting in a wrongly cast 'from_unixtime' > in the inner parentheses. > A summer-/winter-time independent work-around is the following: > {code:java} > 0: jdbc:hive2://host.domain.com:2181,host.n> select > from_utc_timestamp(cast(1557921076 as BIGINT)*1000, 'Europe/Berlin'); > DEBUG : Acquired the compile lock. > INFO : Compiling > command(queryId=hive_20190515141328_7cda82c9-4dc1-448d-a8f4-864801132aed): > select from_utc_timestamp(cast(1557921076 as BIGINT)*1000, 'CET') > DEBUG : Encoding valid txns info > 13531:9223372036854775807::11708,11720,11725,11735,11739,11749,11751,11753,11755,11771,11779,11790,11813,11833,11875,11883,12050,12196,12310,12342,13116,13126,13128,13130,13153,13155,13157,13159,13161,13163,13166,13169,13277,13279,13281,13290,13414,13422,13424,13473,13476,13485 > txnid:13531 > INFO : Semantic Analysis Completed (retrial = false) > INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, > type:timestamp, comment:null)], properties:null) > INFO : Completed compiling > command(queryId=hive_20190515141328_7cda82c9-4dc1-448d-a8f4-864801132aed); > Time taken: 0.453 seconds > INFO : Executing > command(queryId=hive_20190515141328_7cda82c9-4dc1-448d-a8f4-864801132aed): > select from_utc_timestamp(cast(1557921076 as BIGINT)*1000, 'CET') > INFO : Completed executing > command(queryId=hive_20190515141328_7cda82c9-4dc1-448d-a8f4-864801132aed); > Time taken: 0.009 seconds > INFO : OK > DEBUG : Shutting down query select from_utc_timestamp(cast(1557921076 as > BIGINT)*1000, 'Europe/Berlin') > +------------------------+ > | _c0 | > +------------------------+ > | 2019-05-15 13:51:16.0 | > +------------------------+ > 1 row selected (0.586 seconds){code} > Here, I avoid the usage of the timezone-sensitive 'from_unixtime' entirly by > casting multiplying the timestamp by a factor of 1000 to get a genuine > unixtimestamp with milisecond-precision (in contrast to the truncated one > with second-precision used before). > For this trick to work, casting the truncated timestamp as BIGINT is > necessary as the '* 1000'-operation would exceed the size of a regular INT. > This neat work-around, however, does not work on the HDP 2.6/Hive 1.2 system > any more, here, the the 2h offset is applied in the wrong direction. It is > hence not applicable independently of the Hive-Version, unfortunately. > I would like to understand where the original issue arises from, the > ignorance of Hive towards the 'CEST' timezone (in my case). > As it is working nicely for HDP 2.6 and Hive 1.2, I suspect a fix of this bug > is possible also for HDP 3.1 and Hive 3.1. > I am looking forward to some input from the expert community. > Cheers > Oliver -- This message was sent by Atlassian JIRA (v7.6.3#76005)