[ 
https://issues.apache.org/jira/browse/HIVE-21733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840406#comment-16840406
 ] 

Zoltan Haindrich commented on HIVE-21733:
-----------------------------------------

HIVE-20007 have changed from_unixtimestamp to explicitly use UTC
https://github.com/apache/hive/blame/master/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFFromUnixTime.java#L123

I think the rationale behind that change was to bring consistency into a very 
badly situated cases:

* suppose that the client (beeline) is in timezone X
* suppose that the server (HS2) is in timezone Y
* suppose that there is an executor which is using timezone Z
* and another executor which is using timezone W

Earlier the udf behaved better; because in most cases X=Y=Z=W; but when it is 
not...fromUnixtime might return a timestamp from Y, Z or W depending on the 
query/etc.

> 'from_unixtime' does not recognise timezone
> -------------------------------------------
>
>                 Key: HIVE-21733
>                 URL: https://issues.apache.org/jira/browse/HIVE-21733
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, hpl/sql
>    Affects Versions: 3.1.0
>            Reporter: Oliver Ricken
>            Priority: Major
>
> Dear experts,
> I try to convert a unix timestamp to a date using 'from_unixtime' in Hive 3.1 
> do not get the correct result in my local timezone but in UTC. As I have a 
> Hive 1.2 system avaialble also, I can state that in that version, the issue 
> does not exist.
> I performed extensive tests on systems running HDP 3.1 as well as HDP 2.6.
> For the detailled versions of the HDP stack and its components, see the 
> beeline results below.
> I start from the European timezone "Europe/Berlin" or "CEST" (c.f. the date 
> of the post) set on all my system:
> {code:java}
> [user@host ~]$ date
> Wed May 15 13:48:05 CEST 2019
> [user@host ~]${code}
> In a next step, I launch beeline and use the following timestamp as a 
> test-value:
> 1557921076
> (corresponding to Wed May 15 13:51:16 2019 CEST and  Wed May 15 11:51:16 2019 
> UTC)
> On the HDP 2.6 system:
> {code:java}
> Connecting to 
> jdbc:hive2://host.domain.com:2181,host.domain.com:2181,host.domain.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;
> Connected to: Apache Hive (version 1.2.1000.2.6.0.3-8)
> Driver: Hive JDBC (version 1.2.1000.2.6.0.3-8)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 1.2.1000.2.6.0.3-8 by Apache Hive
> 0: jdbc:hive2://host.domain.com:2181,host> select from_unixtime(1557921076);
> _c0
> 2019-05-15 13:51:16
> 1 row selected (0.373 seconds){code}
> As you can see, using HDP 2.6 with Hive 1.2 the conversion works as expected. 
> The timestamp is cast into a time in the local timezone. Very nice!
> On the HDP 3.1 system:
> {code:java}
> [user@host ~]$ beeline
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Connecting to 
> jdbc:hive2://host.domain.com:2181,host.domain.com:2181,host.domain.com:2181/default;principal=hive/_h...@domain.com;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
> 19/05/15 13:55:36 [main]: INFO jdbc.HiveConnection: Connected to 
> host.domain.com:10000
> Connected to: Apache Hive (version 3.1.0.3.1.0.0-78)
> Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 3.1.0.3.1.0.0-78 by Apache Hive
> 0: jdbc:hive2://host.domain.com:2181,host.n> select from_unixtime(1557921076);
> DEBUG : Acquired the compile lock.
> INFO  : Compiling 
> command(queryId=hive_20190515135655_2926bbf9-4b57-4932-a79c-96e0c7d53514): 
> select from_unixtime(1557921076)
> DEBUG : Encoding valid txns info 
> 13519:13517:13517:11708,11720,11725,11735,11739,11749,11751,11753,11755,11771,11779,11790,11813,11833,11875,11883,12050,12196,12310,12342,13116,13126,13128,13130,13153,13155,13157,13159,13161,13163,13166,13169,13277,13279,13281,13290,13414,13422,13424,13473,13476,13485
>  txnid:13519
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, 
> type:string, comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20190515135655_2926bbf9-4b57-4932-a79c-96e0c7d53514); 
> Time taken: 0.492 seconds
> INFO  : Executing 
> command(queryId=hive_20190515135655_2926bbf9-4b57-4932-a79c-96e0c7d53514): 
> select from_unixtime(1557921076)
> INFO  : Completed executing 
> command(queryId=hive_20190515135655_2926bbf9-4b57-4932-a79c-96e0c7d53514); 
> Time taken: 0.011 seconds
> INFO  : OK
> DEBUG : Shutting down query select from_unixtime(1557921076)
> +----------------------+
> |         _c0          |
> +----------------------+
> | 2019-05-15 11:51:16  |
> +----------------------+
> 1 row selected (0.672 seconds){code}
> Here, things go the wrong way, as described in the above post. The local 
> timezone is not recognised by Hive and the default of 'UTC' is chosen, i.e. 
> an offset of 2h is returned.
> I was able to track the problem down a bit further. Doing the detour of 
> converting via 'from_unix_timestamp' where providing a timezone parameter is 
> possible, gives the correct result:
> {code:java}
> 0: jdbc:hive2://host.domain.com:2181,host.n> select 
> from_utc_timestamp(from_unixtime(1557921076), 'Europe/Berlin');
> DEBUG : Acquired the compile lock.
> INFO  : Compiling 
> command(queryId=hive_20190515135924_393ff49c-1ce3-4c2b-8801-c22899c0d2c8): 
> select from_utc_timestamp(from_unixtime(1557921076), 'Europe/Berlin')
> DEBUG : Encoding valid txns info 
> 13520:13517:13517:11708,11720,11725,11735,11739,11749,11751,11753,11755,11771,11779,11790,11813,11833,11875,11883,12050,12196,12310,12342,13116,13126,13128,13130,13153,13155,13157,13159,13161,13163,13166,13169,13277,13279,13281,13290,13414,13422,13424,13473,13476,13485
>  txnid:13520
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, 
> type:timestamp, comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20190515135924_393ff49c-1ce3-4c2b-8801-c22899c0d2c8); 
> Time taken: 1.024 seconds
> INFO  : Executing 
> command(queryId=hive_20190515135924_393ff49c-1ce3-4c2b-8801-c22899c0d2c8): 
> select from_utc_timestamp(from_unixtime(1557921076), 'Europe/Berlin')
> INFO  : Completed executing 
> command(queryId=hive_20190515135924_393ff49c-1ce3-4c2b-8801-c22899c0d2c8); 
> Time taken: 0.01 seconds
> INFO  : OK
> DEBUG : Shutting down query select 
> from_utc_timestamp(from_unixtime(1557921076), 'Europe/Berlin')
> +------------------------+
> |          _c0           |
> +------------------------+
> | 2019-05-15 13:51:16.0  |
> +------------------------+
> 1 row selected (1.187 seconds){code}
> Giving 'CEST', however, leads to the observed problem from above:
> {code:java}
> 0: jdbc:hive2://host.domain.com:2181,host.n> select 
> from_utc_timestamp(from_unixtime(1557921076), 'CEST');
> DEBUG : Acquired the compile lock.
> INFO  : Compiling 
> command(queryId=hive_20190515140104_8b4293d0-53e8-480b-9fe4-3bdee8bc3caa): 
> select from_utc_timestamp(from_unixtime(1557921076), 'CEST')
> DEBUG : Encoding valid txns info 
> 13522:13517:13517,13521:11708,11720,11725,11735,11739,11749,11751,11753,11755,11771,11779,11790,11813,11833,11875,11883,12050,12196,12310,12342,13116,13126,13128,13130,13153,13155,13157,13159,13161,13163,13166,13169,13277,13279,13281,13290,13414,13422,13424,13473,13476,13485
>  txnid:13522
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, 
> type:timestamp, comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20190515140104_8b4293d0-53e8-480b-9fe4-3bdee8bc3caa); 
> Time taken: 0.396 seconds
> INFO  : Executing 
> command(queryId=hive_20190515140104_8b4293d0-53e8-480b-9fe4-3bdee8bc3caa): 
> select from_utc_timestamp(from_unixtime(1557921076), 'CEST')
> INFO  : Completed executing 
> command(queryId=hive_20190515140104_8b4293d0-53e8-480b-9fe4-3bdee8bc3caa); 
> Time taken: 0.031 seconds
> INFO  : OK
> DEBUG : Shutting down query select 
> from_utc_timestamp(from_unixtime(1557921076), 'CEST')
> +------------------------+
> |          _c0           |
> +------------------------+
> | 2019-05-15 11:51:16.0  |
> +------------------------+
> 1 row selected (0.547 seconds){code}
> Although 'CEST' and 'Europe/Berlin' should be equivalent this time of the 
> year, the result differs. I assume that 'CEST' is simply not recognised by 
> Hive and the fall-back to UTC is used.
> A last test with the European winter time 'CET' gives the following result:
> {code:java}
> 0: jdbc:hive2://host.domain.com:2181,host.n> select 
> from_utc_timestamp(from_unixtime(1557921076), 'CET');
> DEBUG : Acquired the compile lock.
> INFO  : Compiling 
> command(queryId=hive_20190515140616_7c41fa87-d375-49d6-ac5b-7aeabf46e821): 
> select from_utc_timestamp(from_unixtime(1557921076), 'CET')
> DEBUG : Encoding valid txns info 
> 13525:9223372036854775807::11708,11720,11725,11735,11739,11749,11751,11753,11755,11771,11779,11790,11813,11833,11875,11883,12050,12196,12310,12342,13116,13126,13128,13130,13153,13155,13157,13159,13161,13163,13166,13169,13277,13279,13281,13290,13414,13422,13424,13473,13476,13485,13517,13521,13523
>  txnid:13525
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, 
> type:timestamp, comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20190515140616_7c41fa87-d375-49d6-ac5b-7aeabf46e821); 
> Time taken: 0.777 seconds
> INFO  : Executing 
> command(queryId=hive_20190515140616_7c41fa87-d375-49d6-ac5b-7aeabf46e821): 
> select from_utc_timestamp(from_unixtime(1557921076), 'CET')
> INFO  : Completed executing 
> command(queryId=hive_20190515140616_7c41fa87-d375-49d6-ac5b-7aeabf46e821); 
> Time taken: 0.022 seconds
> INFO  : OK
> DEBUG : Shutting down query select 
> from_utc_timestamp(from_unixtime(1557921076), 'CET')
> +------------------------+
> |          _c0           |
> +------------------------+
> | 2019-05-15 13:51:16.0  |
> +------------------------+
> 1 row selected (1.02 seconds){code}
> Although 'CET' differs by 1 hour from 'CEST' and 'Europe/Berlin', the result 
> is correct. I assume that Hive realises that 'CET'-member timezones are 
> actually in 'CEST' this time of the year and corrects accordingly.
> As nice of a work-around using 'Europe/Berlin' may look at a first glance, I 
> suspect things to go wrong when the actual timezone is 'CET' in winter.
> I suspect 'from_unixtime' to recognise 'CET' as a valid timezone once it is 
> the timezone of the local system, resulting in a wrongly cast 'from_unixtime' 
> in the inner parentheses.
> A summer-/winter-time independent work-around is the following:
> {code:java}
> 0: jdbc:hive2://host.domain.com:2181,host.n> select 
> from_utc_timestamp(cast(1557921076 as BIGINT)*1000, 'Europe/Berlin');
> DEBUG : Acquired the compile lock.
> INFO  : Compiling 
> command(queryId=hive_20190515141328_7cda82c9-4dc1-448d-a8f4-864801132aed): 
> select from_utc_timestamp(cast(1557921076 as BIGINT)*1000, 'CET')
> DEBUG : Encoding valid txns info 
> 13531:9223372036854775807::11708,11720,11725,11735,11739,11749,11751,11753,11755,11771,11779,11790,11813,11833,11875,11883,12050,12196,12310,12342,13116,13126,13128,13130,13153,13155,13157,13159,13161,13163,13166,13169,13277,13279,13281,13290,13414,13422,13424,13473,13476,13485
>  txnid:13531
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, 
> type:timestamp, comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20190515141328_7cda82c9-4dc1-448d-a8f4-864801132aed); 
> Time taken: 0.453 seconds
> INFO  : Executing 
> command(queryId=hive_20190515141328_7cda82c9-4dc1-448d-a8f4-864801132aed): 
> select from_utc_timestamp(cast(1557921076 as BIGINT)*1000, 'CET')
> INFO  : Completed executing 
> command(queryId=hive_20190515141328_7cda82c9-4dc1-448d-a8f4-864801132aed); 
> Time taken: 0.009 seconds
> INFO  : OK
> DEBUG : Shutting down query select from_utc_timestamp(cast(1557921076 as 
> BIGINT)*1000, 'Europe/Berlin')
> +------------------------+
> |          _c0           |
> +------------------------+
> | 2019-05-15 13:51:16.0  |
> +------------------------+
> 1 row selected (0.586 seconds){code}
> Here, I avoid the usage of the timezone-sensitive 'from_unixtime' entirly by 
> casting multiplying the timestamp by a factor of 1000 to get a genuine 
> unixtimestamp with milisecond-precision (in contrast to the truncated one 
> with second-precision used before).
> For this trick to work, casting the truncated timestamp as BIGINT is 
> necessary as the '* 1000'-operation would exceed the size of a regular INT.
> This neat work-around, however, does not work on the HDP 2.6/Hive 1.2 system 
> any more, here, the the 2h offset is applied in the wrong direction. It is 
> hence not applicable independently of the Hive-Version, unfortunately.
> I would like to understand where the original issue arises from, the 
> ignorance of Hive towards the 'CEST' timezone (in my case).
> As it is working nicely for HDP 2.6 and Hive 1.2, I suspect a fix of this bug 
> is possible also for HDP 3.1 and Hive 3.1.
> I am looking forward to some input from the expert community.
> Cheers
> Oliver



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to