[ 
https://issues.apache.org/jira/browse/HIVE-12228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenlei Xie updated HIVE-12228:
------------------------------
    Summary: Hive Error for nested query with UDF returns Struct type  (was: 
Hive Error When query nested query with UDF returns Struct type)

> Hive Error for nested query with UDF returns Struct type
> --------------------------------------------------------
>
>                 Key: HIVE-12228
>                 URL: https://issues.apache.org/jira/browse/HIVE-12228
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Query Planning, UDF
>    Affects Versions: 0.13.1
>            Reporter: Wenlei Xie
>         Attachments: SimpleStruct.java
>
>
> The following simple nested query with UDF returns Struct would fail on Hive 
> 0.13.1 . The UDF java code is attached as {{SimpleStruct.java}}
> {noformat}
> ADD JAR simplestruct.jar;
> CREATE TEMPORARY FUNCTION simplestruct AS 'test.SimpleStruct';
> SELECT *
>   FROM (
>     SELECT *
>     from mytest
>  ) subquery
> WHERE simplestruct(subquery.testStr).first
> {noformat}
> The error message is 
> {noformat}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row {"testint":1,"testname":"haha","teststr":"hehe"}
>         at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:549)
>         at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
>         ... 8 more
> Caused by: java.lang.RuntimeException: cannot find field teststr from 
> [0:_col0, 1:_col1, 2:_col2]
>         at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
>         at 
> org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:150)
> ..............................
> {noformat}
> The query works fine if we replace the UDF returns Boolean. By comparing the 
> query plan, we note when using the {{SimpleStruct}} UDF, the query plan is 
> {noformat}
>           TableScan
>             Select Operator
>               Filter Operator
>                 Select Operator
> {noformat}
> The first Select Operator would rename the columns to {{col_k}}, which cause 
> this trouble. If we use some UDF returns Boolean, the query plan becomes 
> {noformat}
>           TableScan
>             Filter Operator
>               Select Operator
> {noformat}
> It looks like the Query Planner failed to push down the Filter Operator when 
> the predicate is based on a UDF returns Struct. 
> This bug was fixed in Hive 1.2.1, but we cannot find the ticket to fix it.
> Appendix: 
> The table {{mytest}} is created in the following way
> {noformat}
> CREATE TABLE mytest(testInt INT, testName STRING, testStr STRING) ROW FORMAT 
> DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH 'test.txt' INTO TABLE mytest;
> {noformat}
> The file {{test.txt}} is a simple CSV file.
> {noformat}
> 1,haha,hehe
> 2,my,test
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to