Benjamin Bowman created HIVE-7166: ------------------------------------- Summary: Vectorization with UDFs returns incorrect results Key: HIVE-7166 URL: https://issues.apache.org/jira/browse/HIVE-7166 Project: Hive Issue Type: Bug Components: HiveServer2, UDF Affects Versions: 0.13.0 Environment: Hive 0.13 with Hadoop 2.4 on a 3 node cluster Reporter: Benjamin Bowman Priority: Minor
Using BETWEEN, a custom UDF, and vectorized query execution yields incorrect query results. Example Query: SELECT column_1 FROM table_1 WHERE column_1 BETWEEN (UDF_1 - X) and UDF_1 The following test scenario will reproduce the problem: TEST UDF (SIMPLE FUNCTION THAT TAKES NO ARGUMENTS AND RETURNS 10000): package com.test; import org.apache.hadoop.hive.ql.exec.Description; import org.apache.hadoop.hive.ql.exec.UDF; import org.apache.hadoop.io.LongWritable; import org.apache.hadoop.io.Text; import java.lang.String; import java.lang.*; public class tenThousand extends UDF { private final LongWritable result = new LongWritable(); public LongWritable evaluate() { result.set(10000); return result; } } TEST DATA (test.input): 1|CBCABC|12 2|DBCABC|13 3|EBCABC|14 40000|ABCABC|15 50000|BBCABC|16 60000|CBCABC|17 CREATING ORC TABLE: 0: jdbc:hive2://server:10002/db> create table testTabOrc (first bigint, second varchar(20), third int) partitioned by (range int) clustered by (first) sorted by (first) into 8 buckets stored as orc tblproperties ("orc.compress" = "SNAPPY", "orc.index" = "true"); CREATE LOADING TABLE: 0: jdbc:hive2://server:10002/db> create table loadingDir (first bigint, second varchar(20), third int) partitioned by (range int) row format delimited fields terminated by '|' stored as textfile; COPY IN DATA: [root@server]# hadoop fs -copyFromLocal /tmp/test.input /db/loading/. ORC DATA: [root@server]# beeline -u jdbc:hive2://server:10002/db -n root --hiveconf hive.exec.dynamic.partition.mode=nonstrict --hiveconf hive.enforce.sorting=true -e "insert into table testTabOrc partition(range) select * from loadingDir;" LOAD TEST FUNCTION: 0: jdbc:hive2://server:10002/db> add jar /opt/hadoop/lib/testFunction.jar 0: jdbc:hive2://server:10002/db> create temporary function ten_thousand as 'com.test.tenThousand'; TURN OFF VECTORIZATION: 0: jdbc:hive2://server:10002/db> set hive.vectorized.execution.enabled=false; QUERY (RESULTS AS EXPECTED): 0: jdbc:hive2://server:10002/db> select first from testTabOrc where first between ten_thousand()-10000 and ten_thousand()-9995; +--------+ | first | +--------+ | 1 | | 2 | | 3 | +--------+ 3 rows selected (15.286 seconds) TURN ON VECTORIZATION: 0: jdbc:hive2://server:10002/db> set hive.vectorized.execution.enabled=true; QUERY AGAIN (WRONG RESULTS): 0: jdbc:hive2://server:10002/db> select first from testTabOrc where first between ten_thousand()-10000 and ten_thousand()-9995; +--------+ | first | +--------+ +--------+ No rows selected (17.763 seconds) -- This message was sent by Atlassian JIRA (v6.2#6252)