[ https://issues.apache.org/jira/browse/HIVE-25449?focusedWorklogId=638639&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-638639 ]
ASF GitHub Bot logged work on HIVE-25449: ----------------------------------------- Author: ASF GitHub Bot Created on: 17/Aug/21 12:57 Start Date: 17/Aug/21 12:57 Worklog Time Spent: 10m Work Description: ShubhamChaurasia opened a new pull request #2591: URL: https://github.com/apache/hive/pull/2591 Please find the repro and detailed description here - https://issues.apache.org/jira/browse/HIVE-25449 ### What changes were proposed in this pull request? Removed usage of SimpleDateFormat which was using local timezone parsing scalar dates. ### Why are the changes needed? This fixes the bug described in the Jira and datediff() now gives correct values. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Manual testing and added two new unit tests - testDateDiffColScalarWithTz() and testDateDiffScalarColWithTz() -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking ------------------- Worklog Id: (was: 638639) Remaining Estimate: 0h Time Spent: 10m > datediff() gives wrong output when run in a tez task with some non-UTC > timezone > ------------------------------------------------------------------------------- > > Key: HIVE-25449 > URL: https://issues.apache.org/jira/browse/HIVE-25449 > Project: Hive > Issue Type: Bug > Components: UDF > Reporter: Shubham Chaurasia > Assignee: Shubham Chaurasia > Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Repro (thanks Qiaosong Dong) - > Add -Duser.timezone=GMT+8 to {{tez.task.launch.cmd-opts}} > {code} > create external table test_dt(id string, dt date); > insert into test_dt values('11', '2021-07-06'), ('22', '2021-07-07'); > select datediff(dt1.dt, '2021-07-01') from test_dt dt1 left join test_dt dt > on dt1.id = dt.id; > +------+ > | _c0 | > +------+ > | 6 | > | 7 | > +------+ > {code} > Expected output - > {code} > +------+ > | _c0 | > +------+ > | 5 | > | 6 | > +------+ > {code} > *Cause* > This happens because in {{VectorUDFDateDiffColScalar}} class > 1. For second argument(scalar) , we use {{java.text.SimpleDateFormat}} to > parse the date strings which interprets it to be in local timezone. > 2. For first column we get a column vector which represents the date as epoch > day. This is always in UTC. > *Solution* > We need to check other variants of datediff UDFs as well and change the > parsing mechanism to always interpret date strings in UTC. > > I did a quick change in {{VectorUDFDateDiffColScalar}} which fixes the issue. > {code} > - date.setTime(formatter.parse(new String(bytesValue, > "UTF-8")).getTime()); > - baseDate = DateWritableV2.dateToDays(date); > + org.apache.hadoop.hive.common.type.Date hiveDate > + = org.apache.hadoop.hive.common.type.Date.valueOf(new > String(bytesValue, "UTF-8")); > + date.setTime(hiveDate.toEpochMilli()); > + baseDate = hiveDate.toEpochDay(); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)