[ https://issues.apache.org/jira/browse/HIVE-25449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
László Bodor updated HIVE-25449: -------------------------------- Fix Version/s: 4.0.0 > datediff() gives wrong output when run in a tez task with some non-UTC > timezone > ------------------------------------------------------------------------------- > > Key: HIVE-25449 > URL: https://issues.apache.org/jira/browse/HIVE-25449 > Project: Hive > Issue Type: Bug > Components: UDF > Reporter: Shubham Chaurasia > Assignee: Shubham Chaurasia > Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Repro (thanks Qiaosong Dong) - > Add -Duser.timezone=GMT+8 to {{tez.task.launch.cmd-opts}} > {code} > create external table test_dt(id string, dt date); > insert into test_dt values('11', '2021-07-06'), ('22', '2021-07-07'); > select datediff(dt1.dt, '2021-07-01') from test_dt dt1 left join test_dt dt > on dt1.id = dt.id; > +------+ > | _c0 | > +------+ > | 6 | > | 7 | > +------+ > {code} > Expected output - > {code} > +------+ > | _c0 | > +------+ > | 5 | > | 6 | > +------+ > {code} > *Cause* > This happens because in {{VectorUDFDateDiffColScalar}} class > 1. For second argument(scalar) , we use {{java.text.SimpleDateFormat}} to > parse the date strings which interprets it to be in local timezone. > 2. For first column we get a column vector which represents the date as epoch > day. This is always in UTC. > *Solution* > We need to check other variants of datediff UDFs as well and change the > parsing mechanism to always interpret date strings in UTC. > > I did a quick change in {{VectorUDFDateDiffColScalar}} which fixes the issue. > {code} > - date.setTime(formatter.parse(new String(bytesValue, > "UTF-8")).getTime()); > - baseDate = DateWritableV2.dateToDays(date); > + org.apache.hadoop.hive.common.type.Date hiveDate > + = org.apache.hadoop.hive.common.type.Date.valueOf(new > String(bytesValue, "UTF-8")); > + date.setTime(hiveDate.toEpochMilli()); > + baseDate = hiveDate.toEpochDay(); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)