[ https://issues.apache.org/jira/browse/HIVE-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460490#comment-13460490 ]
Yin Huai commented on HIVE-3495:
--------------------------------

Just looked at this problem in detail. Here is the reason: if we set map-side aggregation to false (set hive.map.aggr=false;) and, to keep the point clear, also assume hive.groupby.skewindata is false, then in org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan, genGroupByPlan1MR is used to generate the ReduceSinkOperator (through genGroupByPlanReduceSinkOperator) and the GroupByOperator (through genGroupByPlanGroupByOperator). Since there is no map-side aggregation in this case, genGroupByPlanReduceSinkOperator calls getReduceValuesForReduceSinkNoMapAgg to generate the reduce values by looking at every child of every aggregation tree. Inside getReduceValuesForReduceSinkNoMapAgg, genExprNodeDesc is called to generate the ExprNodeDesc, and for the case in the description an ExprNodeConstantDesc is produced for the constant parameters. For every child of every aggregation tree, an entry mapping the parameter to a ColumnInfo is added to the reduceSinkOutputRowResolver. However, based on the code, when an ASTNode has a ColumnInfo, an ExprNodeColumnDesc is used for that node (when a node has a ColumnInfo, the class of its ExprNodeDesc should be ExprNodeColumnDesc). Thus, in genGroupByPlanGroupByOperator, when all children of all aggregation trees are converted to aggParameters, an ExprNodeColumnDesc is used for every parameter, no matter what it is, because every parameter has its own ColumnInfo. That is how we get the error.

There are two ways to solve this problem:
1) Extend ColumnInfo to record the ExprNodeDesc, not just the TypeInfo.
2) For ExprNodeDescs other than ExprNodeColumnDesc, do not create a ColumnInfo, so the RowResolver will not find a match; then use genExprNodeDesc to generate the ExprNodeDesc in genGroupByPlanGroupByOperator and genGroupByPlanGroupByOperator1.

The second option seems to be the right way since it has a clear meaning. Will do that first. (A minimal sketch illustrating the two code paths is included after the quoted issue description below.)

> elements in aggParameters passed to SemanticAnalyzer.getGenericUDAFEvaluator are generated in two different ways
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-3495
>                 URL: https://issues.apache.org/jira/browse/HIVE-3495
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.10.0
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>            Priority: Minor
>
> When I was working on HIVE-3493, I also found that elements in aggParameters are generated in two different ways. One is org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(ASTNode, RowResolver). The other is to create an ExprNodeColumnDesc. Since a UDAF may need to check the type of its parameters, e.g. percentile_approx (GenericUDAFPercentileApprox), if the second way is used, we may get a UDFArgumentTypeException.
> An example used to reproduce the error is
> {code:sql}
> set hive.map.aggr=false;
> SELECT percentile_approx(cast(substr(src.value,5) AS double), 0.5) FROM src;
> {code}
> Here is the log
> {code}
> 2012-09-20 12:36:06,947 DEBUG exec.FunctionRegistry (FunctionRegistry.java:getGenericUDAFResolver(849)) - Looking up GenericUDAF: percentile_approx
> 2012-09-20 12:36:06,952 ERROR ql.Driver (SessionState.java:printError(400)) - FAILED: UDFArgumentTypeException The second argument must be a constant, but double was passed instead.
> org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: The second argument must be a constant, but double was passed instead.
>     at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentileApprox.getEvaluator(GenericUDAFPercentileApprox.java:149)
>     at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getGenericUDAFEvaluator(FunctionRegistry.java:774)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getGenericUDAFEvaluator(SemanticAnalyzer.java:2389)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanGroupByOperator(SemanticAnalyzer.java:2561)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlan1MR(SemanticAnalyzer.java:3341)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:6140)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6903)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7484)
>     at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:245)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:903)
>     at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
>     at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:713)
>     at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_replay(TestCliDriver.java:125)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at junit.framework.TestCase.runTest(TestCase.java:168)
>     at junit.framework.TestCase.runBare(TestCase.java:134)
>     at junit.framework.TestResult$1.protect(TestResult.java:110)
>     at junit.framework.TestResult.runProtected(TestResult.java:128)
>     at junit.framework.TestResult.run(TestResult.java:113)
>     at junit.framework.TestCase.run(TestCase.java:124)
>     at junit.framework.TestSuite.runTest(TestSuite.java:232)
>     at junit.framework.TestSuite.run(TestSuite.java:227)
>     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:520)
>     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1060)
>     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:911)
> {code}
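To make the distinction above concrete, here is a minimal, self-contained Java sketch of the two ways an element of aggParameters can be built. The names ConstantDesc, ColumnDesc, and getEvaluator are hypothetical stand-ins for ExprNodeConstantDesc, ExprNodeColumnDesc, and GenericUDAFPercentileApprox.getEvaluator; this only illustrates the failure mode described in the comment, not the actual Hive code.

{code:java}
import java.util.Arrays;
import java.util.List;

public class AggParamSketch {

  // Stand-in for ExprNodeDesc and its two relevant subclasses.
  interface ExprDesc { String typeName(); }

  // Path 1: genExprNodeDesc turns the literal 0.5 into a constant descriptor.
  static final class ConstantDesc implements ExprDesc {
    final Object value;
    ConstantDesc(Object value) { this.value = value; }
    public String typeName() { return "double"; }
  }

  // Path 2: looking the parameter up in the RowResolver yields a column
  // descriptor, even when the underlying expression was the literal 0.5.
  static final class ColumnDesc implements ExprDesc {
    final String internalName;
    ColumnDesc(String internalName) { this.internalName = internalName; }
    public String typeName() { return "double"; }
  }

  // Stand-in for the evaluator lookup: percentile_approx insists that its
  // second argument is a constant, so a column descriptor in that position
  // fails just like the UDFArgumentTypeException in the log above.
  static void getEvaluator(List<ExprDesc> aggParameters) {
    ExprDesc second = aggParameters.get(1);
    if (!(second instanceof ConstantDesc)) {
      throw new IllegalArgumentException(
          "The second argument must be a constant, but "
          + second.typeName() + " was passed instead.");
    }
    System.out.println("evaluator resolved");
  }

  public static void main(String[] args) {
    // When the parameters come from genExprNodeDesc, the 0.5 stays a constant
    // and the lookup succeeds.
    getEvaluator(Arrays.<ExprDesc>asList(
        new ColumnDesc("_col0"), new ConstantDesc(0.5)));

    // With hive.map.aggr=false, every parameter has a ColumnInfo in the
    // reduceSinkOutputRowResolver, so the 0.5 is rebuilt as a column
    // reference and the same lookup now fails.
    try {
      getEvaluator(Arrays.<ExprDesc>asList(
          new ColumnDesc("_col0"), new ColumnDesc("_col1")));
    } catch (IllegalArgumentException e) {
      System.out.println("FAILED: " + e.getMessage());
    }
  }
}
{code}

The second call in main mirrors the hive.map.aggr=false case: because the literal 0.5 already has its own ColumnInfo, it is rebuilt as a column reference before reaching the evaluator lookup, which is why the constant-ness check fails with the same message as in the log.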