[ https://issues.apache.org/jira/browse/HIVE-21733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840406#comment-16840406 ]
Zoltan Haindrich commented on HIVE-21733: ----------------------------------------- HIVE-20007 have changed from_unixtimestamp to explicitly use UTC https://github.com/apache/hive/blame/master/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFFromUnixTime.java#L123 I think the rationale behind that change was to bring consistency into a very badly situated cases: * suppose that the client (beeline) is in timezone X * suppose that the server (HS2) is in timezone Y * suppose that there is an executor which is using timezone Z * and another executor which is using timezone W Earlier the udf behaved better; because in most cases X=Y=Z=W; but when it is not...fromUnixtime might return a timestamp from Y, Z or W depending on the query/etc. > 'from_unixtime' does not recognise timezone > ------------------------------------------- > > Key: HIVE-21733 > URL: https://issues.apache.org/jira/browse/HIVE-21733 > Project: Hive > Issue Type: Bug > Components: Hive, hpl/sql > Affects Versions: 3.1.0 > Reporter: Oliver Ricken > Priority: Major > > Dear experts, > I try to convert a unix timestamp to a date using 'from_unixtime' in Hive 3.1 > do not get the correct result in my local timezone but in UTC. As I have a > Hive 1.2 system avaialble also, I can state that in that version, the issue > does not exist. > I performed extensive tests on systems running HDP 3.1 as well as HDP 2.6. > For the detailled versions of the HDP stack and its components, see the > beeline results below. > I start from the European timezone "Europe/Berlin" or "CEST" (c.f. the date > of the post) set on all my system: > {code:java} > [user@host ~]$ date > Wed May 15 13:48:05 CEST 2019 > [user@host ~]${code} > In a next step, I launch beeline and use the following timestamp as a > test-value: > 1557921076 > (corresponding to Wed May 15 13:51:16 2019 CEST and Wed May 15 11:51:16 2019 > UTC) > On the HDP 2.6 system: > {code:java} > Connecting to > jdbc:hive2://host.domain.com:2181,host.domain.com:2181,host.domain.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2; > Connected to: Apache Hive (version 1.2.1000.2.6.0.3-8) > Driver: Hive JDBC (version 1.2.1000.2.6.0.3-8) > Transaction isolation: TRANSACTION_REPEATABLE_READ > Beeline version 1.2.1000.2.6.0.3-8 by Apache Hive > 0: jdbc:hive2://host.domain.com:2181,host> select from_unixtime(1557921076); > _c0 > 2019-05-15 13:51:16 > 1 row selected (0.373 seconds){code} > As you can see, using HDP 2.6 with Hive 1.2 the conversion works as expected. > The timestamp is cast into a time in the local timezone. Very nice! > On the HDP 3.1 system: > {code:java} > [user@host ~]$ beeline > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. > SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] > Connecting to > jdbc:hive2://host.domain.com:2181,host.domain.com:2181,host.domain.com:2181/default;principal=hive/_h...@domain.com;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2 > 19/05/15 13:55:36 [main]: INFO jdbc.HiveConnection: Connected to > host.domain.com:10000 > Connected to: Apache Hive (version 3.1.0.3.1.0.0-78) > Driver: Hive JDBC (version 3.1.0.3.1.0.0-78) > Transaction isolation: TRANSACTION_REPEATABLE_READ > Beeline version 3.1.0.3.1.0.0-78 by Apache Hive > 0: jdbc:hive2://host.domain.com:2181,host.n> select from_unixtime(1557921076); > DEBUG : Acquired the compile lock. > INFO : Compiling > command(queryId=hive_20190515135655_2926bbf9-4b57-4932-a79c-96e0c7d53514): > select from_unixtime(1557921076) > DEBUG : Encoding valid txns info > 13519:13517:13517:11708,11720,11725,11735,11739,11749,11751,11753,11755,11771,11779,11790,11813,11833,11875,11883,12050,12196,12310,12342,13116,13126,13128,13130,13153,13155,13157,13159,13161,13163,13166,13169,13277,13279,13281,13290,13414,13422,13424,13473,13476,13485 > txnid:13519 > INFO : Semantic Analysis Completed (retrial = false) > INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, > type:string, comment:null)], properties:null) > INFO : Completed compiling > command(queryId=hive_20190515135655_2926bbf9-4b57-4932-a79c-96e0c7d53514); > Time taken: 0.492 seconds > INFO : Executing > command(queryId=hive_20190515135655_2926bbf9-4b57-4932-a79c-96e0c7d53514): > select from_unixtime(1557921076) > INFO : Completed executing > command(queryId=hive_20190515135655_2926bbf9-4b57-4932-a79c-96e0c7d53514); > Time taken: 0.011 seconds > INFO : OK > DEBUG : Shutting down query select from_unixtime(1557921076) > +----------------------+ > | _c0 | > +----------------------+ > | 2019-05-15 11:51:16 | > +----------------------+ > 1 row selected (0.672 seconds){code} > Here, things go the wrong way, as described in the above post. The local > timezone is not recognised by Hive and the default of 'UTC' is chosen, i.e. > an offset of 2h is returned. > I was able to track the problem down a bit further. Doing the detour of > converting via 'from_unix_timestamp' where providing a timezone parameter is > possible, gives the correct result: > {code:java} > 0: jdbc:hive2://host.domain.com:2181,host.n> select > from_utc_timestamp(from_unixtime(1557921076), 'Europe/Berlin'); > DEBUG : Acquired the compile lock. > INFO : Compiling > command(queryId=hive_20190515135924_393ff49c-1ce3-4c2b-8801-c22899c0d2c8): > select from_utc_timestamp(from_unixtime(1557921076), 'Europe/Berlin') > DEBUG : Encoding valid txns info > 13520:13517:13517:11708,11720,11725,11735,11739,11749,11751,11753,11755,11771,11779,11790,11813,11833,11875,11883,12050,12196,12310,12342,13116,13126,13128,13130,13153,13155,13157,13159,13161,13163,13166,13169,13277,13279,13281,13290,13414,13422,13424,13473,13476,13485 > txnid:13520 > INFO : Semantic Analysis Completed (retrial = false) > INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, > type:timestamp, comment:null)], properties:null) > INFO : Completed compiling > command(queryId=hive_20190515135924_393ff49c-1ce3-4c2b-8801-c22899c0d2c8); > Time taken: 1.024 seconds > INFO : Executing > command(queryId=hive_20190515135924_393ff49c-1ce3-4c2b-8801-c22899c0d2c8): > select from_utc_timestamp(from_unixtime(1557921076), 'Europe/Berlin') > INFO : Completed executing > command(queryId=hive_20190515135924_393ff49c-1ce3-4c2b-8801-c22899c0d2c8); > Time taken: 0.01 seconds > INFO : OK > DEBUG : Shutting down query select > from_utc_timestamp(from_unixtime(1557921076), 'Europe/Berlin') > +------------------------+ > | _c0 | > +------------------------+ > | 2019-05-15 13:51:16.0 | > +------------------------+ > 1 row selected (1.187 seconds){code} > Giving 'CEST', however, leads to the observed problem from above: > {code:java} > 0: jdbc:hive2://host.domain.com:2181,host.n> select > from_utc_timestamp(from_unixtime(1557921076), 'CEST'); > DEBUG : Acquired the compile lock. > INFO : Compiling > command(queryId=hive_20190515140104_8b4293d0-53e8-480b-9fe4-3bdee8bc3caa): > select from_utc_timestamp(from_unixtime(1557921076), 'CEST') > DEBUG : Encoding valid txns info > 13522:13517:13517,13521:11708,11720,11725,11735,11739,11749,11751,11753,11755,11771,11779,11790,11813,11833,11875,11883,12050,12196,12310,12342,13116,13126,13128,13130,13153,13155,13157,13159,13161,13163,13166,13169,13277,13279,13281,13290,13414,13422,13424,13473,13476,13485 > txnid:13522 > INFO : Semantic Analysis Completed (retrial = false) > INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, > type:timestamp, comment:null)], properties:null) > INFO : Completed compiling > command(queryId=hive_20190515140104_8b4293d0-53e8-480b-9fe4-3bdee8bc3caa); > Time taken: 0.396 seconds > INFO : Executing > command(queryId=hive_20190515140104_8b4293d0-53e8-480b-9fe4-3bdee8bc3caa): > select from_utc_timestamp(from_unixtime(1557921076), 'CEST') > INFO : Completed executing > command(queryId=hive_20190515140104_8b4293d0-53e8-480b-9fe4-3bdee8bc3caa); > Time taken: 0.031 seconds > INFO : OK > DEBUG : Shutting down query select > from_utc_timestamp(from_unixtime(1557921076), 'CEST') > +------------------------+ > | _c0 | > +------------------------+ > | 2019-05-15 11:51:16.0 | > +------------------------+ > 1 row selected (0.547 seconds){code} > Although 'CEST' and 'Europe/Berlin' should be equivalent this time of the > year, the result differs. I assume that 'CEST' is simply not recognised by > Hive and the fall-back to UTC is used. > A last test with the European winter time 'CET' gives the following result: > {code:java} > 0: jdbc:hive2://host.domain.com:2181,host.n> select > from_utc_timestamp(from_unixtime(1557921076), 'CET'); > DEBUG : Acquired the compile lock. > INFO : Compiling > command(queryId=hive_20190515140616_7c41fa87-d375-49d6-ac5b-7aeabf46e821): > select from_utc_timestamp(from_unixtime(1557921076), 'CET') > DEBUG : Encoding valid txns info > 13525:9223372036854775807::11708,11720,11725,11735,11739,11749,11751,11753,11755,11771,11779,11790,11813,11833,11875,11883,12050,12196,12310,12342,13116,13126,13128,13130,13153,13155,13157,13159,13161,13163,13166,13169,13277,13279,13281,13290,13414,13422,13424,13473,13476,13485,13517,13521,13523 > txnid:13525 > INFO : Semantic Analysis Completed (retrial = false) > INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, > type:timestamp, comment:null)], properties:null) > INFO : Completed compiling > command(queryId=hive_20190515140616_7c41fa87-d375-49d6-ac5b-7aeabf46e821); > Time taken: 0.777 seconds > INFO : Executing > command(queryId=hive_20190515140616_7c41fa87-d375-49d6-ac5b-7aeabf46e821): > select from_utc_timestamp(from_unixtime(1557921076), 'CET') > INFO : Completed executing > command(queryId=hive_20190515140616_7c41fa87-d375-49d6-ac5b-7aeabf46e821); > Time taken: 0.022 seconds > INFO : OK > DEBUG : Shutting down query select > from_utc_timestamp(from_unixtime(1557921076), 'CET') > +------------------------+ > | _c0 | > +------------------------+ > | 2019-05-15 13:51:16.0 | > +------------------------+ > 1 row selected (1.02 seconds){code} > Although 'CET' differs by 1 hour from 'CEST' and 'Europe/Berlin', the result > is correct. I assume that Hive realises that 'CET'-member timezones are > actually in 'CEST' this time of the year and corrects accordingly. > As nice of a work-around using 'Europe/Berlin' may look at a first glance, I > suspect things to go wrong when the actual timezone is 'CET' in winter. > I suspect 'from_unixtime' to recognise 'CET' as a valid timezone once it is > the timezone of the local system, resulting in a wrongly cast 'from_unixtime' > in the inner parentheses. > A summer-/winter-time independent work-around is the following: > {code:java} > 0: jdbc:hive2://host.domain.com:2181,host.n> select > from_utc_timestamp(cast(1557921076 as BIGINT)*1000, 'Europe/Berlin'); > DEBUG : Acquired the compile lock. > INFO : Compiling > command(queryId=hive_20190515141328_7cda82c9-4dc1-448d-a8f4-864801132aed): > select from_utc_timestamp(cast(1557921076 as BIGINT)*1000, 'CET') > DEBUG : Encoding valid txns info > 13531:9223372036854775807::11708,11720,11725,11735,11739,11749,11751,11753,11755,11771,11779,11790,11813,11833,11875,11883,12050,12196,12310,12342,13116,13126,13128,13130,13153,13155,13157,13159,13161,13163,13166,13169,13277,13279,13281,13290,13414,13422,13424,13473,13476,13485 > txnid:13531 > INFO : Semantic Analysis Completed (retrial = false) > INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, > type:timestamp, comment:null)], properties:null) > INFO : Completed compiling > command(queryId=hive_20190515141328_7cda82c9-4dc1-448d-a8f4-864801132aed); > Time taken: 0.453 seconds > INFO : Executing > command(queryId=hive_20190515141328_7cda82c9-4dc1-448d-a8f4-864801132aed): > select from_utc_timestamp(cast(1557921076 as BIGINT)*1000, 'CET') > INFO : Completed executing > command(queryId=hive_20190515141328_7cda82c9-4dc1-448d-a8f4-864801132aed); > Time taken: 0.009 seconds > INFO : OK > DEBUG : Shutting down query select from_utc_timestamp(cast(1557921076 as > BIGINT)*1000, 'Europe/Berlin') > +------------------------+ > | _c0 | > +------------------------+ > | 2019-05-15 13:51:16.0 | > +------------------------+ > 1 row selected (0.586 seconds){code} > Here, I avoid the usage of the timezone-sensitive 'from_unixtime' entirly by > casting multiplying the timestamp by a factor of 1000 to get a genuine > unixtimestamp with milisecond-precision (in contrast to the truncated one > with second-precision used before). > For this trick to work, casting the truncated timestamp as BIGINT is > necessary as the '* 1000'-operation would exceed the size of a regular INT. > This neat work-around, however, does not work on the HDP 2.6/Hive 1.2 system > any more, here, the the 2h offset is applied in the wrong direction. It is > hence not applicable independently of the Hive-Version, unfortunately. > I would like to understand where the original issue arises from, the > ignorance of Hive towards the 'CEST' timezone (in my case). > As it is working nicely for HDP 2.6 and Hive 1.2, I suspect a fix of this bug > is possible also for HDP 3.1 and Hive 3.1. > I am looking forward to some input from the expert community. > Cheers > Oliver -- This message was sent by Atlassian JIRA (v7.6.3#76005)