[ https://issues.apache.org/jira/browse/HIVE-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460490#comment-13460490 ]
Yin Huai commented on HIVE-3495:
--------------------------------

Just looked at this problem in detail. Here is the reason: if we set map-side aggregation to false (set hive.map.aggr=false;) and, to keep the point clear, also assume hive.groupby.skewindata is false, then in org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan, genGroupByPlan1MR is used to generate the ReduceSinkOperator (through genGroupByPlanReduceSinkOperator) and the GroupByOperator (through genGroupByPlanGroupByOperator). Since there is no map-side aggregation in this case, genGroupByPlanReduceSinkOperator calls getReduceValuesForReduceSinkNoMapAgg to generate the reduce values by looking at every child of every aggregation tree. Inside getReduceValuesForReduceSinkNoMapAgg, genExprNodeDesc is called to generate the ExprNodeDesc, and for the case in the description an ExprNodeConstantDesc is produced for the constant parameters. For every child of every aggregation tree, an entry mapping the parameter to a ColumnInfo is added to the reduceSinkOutputRowResolver. However, based on the code, when an ASTNode has a ColumnInfo, an ExprNodeColumnDesc is used for that node (when a node has a ColumnInfo, the class of its ExprNodeDesc should be ExprNodeColumnDesc). Thus, in genGroupByPlanGroupByOperator, when all children of all aggregation trees are converted to aggParameters, an ExprNodeColumnDesc is used for every parameter, no matter what it is, because every parameter has its own ColumnInfo. That is how we get the error.

There are two ways to solve this problem:
1) Extend ColumnInfo to record the ExprNodeDesc, not just the TypeInfo.
2) For ExprNodeDescs other than ExprNodeColumnDesc, do not create a ColumnInfo, so the RowResolver will not find a match; then use genExprNodeDesc to generate the ExprNodeDesc in genGroupByPlanGroupByOperator and genGroupByPlanGroupByOperator1.

The second option seems to be the right way since it has a clear meaning. Will do that first. (A minimal sketch illustrating the two code paths is included after the quoted issue description below.)

> elements in aggParameters passed to SemanticAnalyzer.getGenericUDAFEvaluator are generated in two different ways
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-3495
>                 URL: https://issues.apache.org/jira/browse/HIVE-3495
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.10.0
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>            Priority: Minor
>
> When I was working on HIVE-3493, I also found that elements in aggParameters are generated in two different ways. One is org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(ASTNode, RowResolver). The other is to create an ExprNodeColumnDesc. Since a UDAF may need to check the type of its parameters, e.g. percentile_approx (GenericUDAFPercentileApprox), if the second way is used, we may get a UDFArgumentTypeException.
> An example used to reproduce the error is
> {code:sql}
> set hive.map.aggr=false;
> SELECT percentile_approx(cast(substr(src.value,5) AS double), 0.5) FROM src;
> {code}
> Here is the log
> {code}
> 2012-09-20 12:36:06,947 DEBUG exec.FunctionRegistry (FunctionRegistry.java:getGenericUDAFResolver(849)) - Looking up GenericUDAF: percentile_approx
> 2012-09-20 12:36:06,952 ERROR ql.Driver (SessionState.java:printError(400)) - FAILED: UDFArgumentTypeException The second argument must be a constant, but double was passed instead.
> org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: The second argument must be a constant, but double was passed instead.
>     at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentileApprox.getEvaluator(GenericUDAFPercentileApprox.java:149)
>     at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getGenericUDAFEvaluator(FunctionRegistry.java:774)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getGenericUDAFEvaluator(SemanticAnalyzer.java:2389)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanGroupByOperator(SemanticAnalyzer.java:2561)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlan1MR(SemanticAnalyzer.java:3341)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:6140)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6903)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7484)
>     at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:245)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:903)
>     at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
>     at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:713)
>     at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_replay(TestCliDriver.java:125)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at junit.framework.TestCase.runTest(TestCase.java:168)
>     at junit.framework.TestCase.runBare(TestCase.java:134)
>     at junit.framework.TestResult$1.protect(TestResult.java:110)
>     at junit.framework.TestResult.runProtected(TestResult.java:128)
>     at junit.framework.TestResult.run(TestResult.java:113)
>     at junit.framework.TestCase.run(TestCase.java:124)
>     at junit.framework.TestSuite.runTest(TestSuite.java:232)
>     at junit.framework.TestSuite.run(TestSuite.java:227)
>     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:520)
>     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1060)
>     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:911)
> {code}
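To make the distinction above concrete, here is a minimal, self-contained Java sketch of the two ways an element of aggParameters can be built. The names ConstantDesc, ColumnDesc, and getEvaluator are hypothetical stand-ins for ExprNodeConstantDesc, ExprNodeColumnDesc, and GenericUDAFPercentileApprox.getEvaluator; this only illustrates the failure mode described in the comment, not the actual Hive code.

{code:java}
import java.util.Arrays;
import java.util.List;

public class AggParamSketch {

  // Stand-in for ExprNodeDesc and its two relevant subclasses.
  interface ExprDesc { String typeName(); }

  // Path 1: genExprNodeDesc turns the literal 0.5 into a constant descriptor.
  static final class ConstantDesc implements ExprDesc {
    final Object value;
    ConstantDesc(Object value) { this.value = value; }
    public String typeName() { return "double"; }
  }

  // Path 2: looking the parameter up in the RowResolver yields a column
  // descriptor, even when the underlying expression was the literal 0.5.
  static final class ColumnDesc implements ExprDesc {
    final String internalName;
    ColumnDesc(String internalName) { this.internalName = internalName; }
    public String typeName() { return "double"; }
  }

  // Stand-in for the evaluator lookup: percentile_approx insists that its
  // second argument is a constant, so a column descriptor in that position
  // fails just like the UDFArgumentTypeException in the log above.
  static void getEvaluator(List<ExprDesc> aggParameters) {
    ExprDesc second = aggParameters.get(1);
    if (!(second instanceof ConstantDesc)) {
      throw new IllegalArgumentException(
          "The second argument must be a constant, but "
          + second.typeName() + " was passed instead.");
    }
    System.out.println("evaluator resolved");
  }

  public static void main(String[] args) {
    // When the parameters come from genExprNodeDesc, the 0.5 stays a constant
    // and the lookup succeeds.
    getEvaluator(Arrays.<ExprDesc>asList(
        new ColumnDesc("_col0"), new ConstantDesc(0.5)));

    // With hive.map.aggr=false, every parameter has a ColumnInfo in the
    // reduceSinkOutputRowResolver, so the 0.5 is rebuilt as a column
    // reference and the same lookup now fails.
    try {
      getEvaluator(Arrays.<ExprDesc>asList(
          new ColumnDesc("_col0"), new ColumnDesc("_col1")));
    } catch (IllegalArgumentException e) {
      System.out.println("FAILED: " + e.getMessage());
    }
  }
}
{code}

The second call in main mirrors the hive.map.aggr=false case: because the literal 0.5 already has its own ColumnInfo, it is rebuilt as a column reference before reaching the evaluator lookup, which is why the constant-ness check fails with the same message as in the log.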