[ https://issues.apache.org/jira/browse/HIVE-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901291#comment-16901291 ]
Piotr Findeisen edited comment on HIVE-21376 at 8/6/19 7:52 PM:
----------------------------------------------------------------

bq. If bucketed data with those types has been written in 3.0 using v2, a user should recreate those bucketed tables using a more recent Hive version.

To me that means there is a disincentive to deploying Hive 3 in production until this issue is fixed. It's fixed in 3.1.2, but the latest available from HDP is 3.1.0.

[~jcamachorodriguez] do you have a timeline when 3.1.2 will be available in HDP?


was (Author: findepi):
bq. If bucketed data with those types has been written in 3.0 using v2, a user should recreate those bucketed tables using a more recent Hive version.

To me that means Hive 3 should not be deployed on production until this issue is fixed. It's fixed in 3.1.2, but the latest available from HDP is 3.1.0.

[~jcamachorodriguez] do you have a timeline when 3.1.2 will be available in HDP?


> Incompatible change in Hive bucket computation
> ----------------------------------------------
>
>                 Key: HIVE-21376
>                 URL: https://issues.apache.org/jira/browse/HIVE-21376
>             Project: Hive
>          Issue Type: Bug
>   Affects Versions: 3.1.0
>            Reporter: David Phillips
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major
>             Fix For: 4.0.0, 3.2.0, 3.1.2
>
>         Attachments: HIVE-21376.01.patch, HIVE-21376.patch
>
>
> HIVE-20007 seems to have inadvertently changed the bucket hash code
> computation via {{ObjectInspectorUtils.getBucketHashCodeOld()}} for the
> {{DATE}} and {{TIMESTAMP}} data types.
> {{DATE}} was previously computed using {{DateWritable}}, which uses
> {{daysSinceEpoch}} as the hash code. It is now computed using
> {{DateWritableV2}}, which uses the hash code of {{java.time.LocalDate}}
> (which is not days since epoch).
> {{TIMESTAMP}} was previously computed using {{TimestampWritable}} and now uses
> {{TimestampWritableV2}}. They ostensibly use the same hash code computation,
> but there are two important differences:
> # {{TimestampWritable}} rounds the number of milliseconds into the seconds
> portion of the computation, but {{TimestampWritableV2}} does not.
> # {{TimestampWritable}} gets the epoch time from {{java.sql.Timestamp}},
> which returns it relative to the JVM time zone, not UTC.
> {{TimestampWritableV2}} uses a {{LocalDateTime}} relative to UTC.
> I was unable to get Hive 3.1 running in order to verify if this actually
> causes data to be read or written incorrectly (there may be code above this
> library method which makes things work correctly). However, if my
> understanding is correct, this means Hive 3.1 is both forwards and backwards
> incompatible with bucketed tables using either of these data types. It also
> indicates that Hive needs tests to verify that the hash code does not change
> between releases.
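
For illustration only, the {{DATE}} part of the issue can be sketched in plain Java without any Hive classes. The class and method names below ({{BucketHashSketch}}, {{legacyDateHash}}, {{v2DateHash}}, {{bucketFor}}) are hypothetical and not Hive APIs. The sketch assumes, per the description, that the old path hashes to days since epoch and the new path to {{java.time.LocalDate.hashCode()}}, and it assumes buckets are assigned as a non-negative hash modulo the bucket count.

```java
import java.time.LocalDate;

public class BucketHashSketch {

    // Old behaviour as described in the issue: the hash code is days since epoch.
    static int legacyDateHash(LocalDate date) {
        return (int) date.toEpochDay();
    }

    // New behaviour as described in the issue: the hash code of java.time.LocalDate.
    static int v2DateHash(LocalDate date) {
        return date.hashCode();
    }

    // Assumption: bucket = non-negative hash modulo the number of buckets.
    static int bucketFor(int hash, int numBuckets) {
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2019, 8, 6);
        int numBuckets = 32; // arbitrary bucket count for the demonstration

        int oldHash = legacyDateHash(date);
        int newHash = v2DateHash(date);

        // Print both hashes and the buckets they would map to.
        System.out.println("old hash=" + oldHash + " bucket=" + bucketFor(oldHash, numBuckets));
        System.out.println("new hash=" + newHash + " bucket=" + bucketFor(newHash, numBuckets));
    }
}
```

If the two hashes map the same date to different buckets, a reader that prunes buckets using one scheme would look in the wrong file for rows written under the other, which is the incompatibility the issue describes.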