[ https://issues.apache.org/jira/browse/HIVE-22074?focusedWorklogId=290873&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290873 ]
ASF GitHub Bot logged work on HIVE-22074:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/Aug/19 23:43
            Start Date: 07/Aug/19 23:43
    Worklog Time Spent: 10m
      Work Description: jcamachor commented on pull request #746: HIVE-22074: Slow compilation due to IN to OR transformation
URL: https://github.com/apache/hive/pull/746#discussion_r311806267

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java
 ##########
 @@ -1220,16 +1220,26 @@ protected ExprNodeDesc getXpathOrFuncExprNodeDesc(ASTNode expr,
            }
            outputOpList.add(nullConst);
          }
 +
          if (!ctx.isCBOExecuted()) {
 -          ArrayList<ExprNodeDesc> orOperands = TypeCheckProcFactoryUtils.rewriteInToOR(children);
 -          if (orOperands != null) {
 -            if (orOperands.size() == 1) {
 -              orOperands.add(new ExprNodeConstantDesc(TypeInfoFactory.booleanTypeInfo, false));
 +
 +          HiveConf conf;
 +          try {
 +            conf = Hive.get().getConf();

 Review comment:
 I think it is better to pass this value from the callers in the context. You would not need to change all callers; if the value is not passed, rewriting could be skipped.

 I see mainly two advantages of doing this:
 1) if the transformation never happens, we will not be retrieving the conf and this value for every IN clause in a query (note that the `isCBOExecuted` method is misleading: the value returned is the `foldExpr` boolean, which is sometimes `false` even for calls coming from CBO; cf. the first line of the `genFilterRelNode` method in `CalcitePlanner`), and
 2) it removes the static call to the Hive object from within the folding logic.

 I see there are other calls to `Hive.get()` in the class; that information should probably be moved to the context too. These can all be tackled together in a follow-up, but since we are cleaning up this logic, I think it would make sense to do it at some point.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 290873)
    Time Spent: 1h  (was: 50m)

> Slow compilation due to IN to OR transformation
> -----------------------------------------------
>
>                 Key: HIVE-22074
>                 URL: https://issues.apache.org/jira/browse/HIVE-22074
>             Project: Hive
>          Issue Type: Improvement
>          Components: Logical Optimizer
>            Reporter: Vineet Garg
>            Assignee: Vineet Garg
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-22074.1.patch, HIVE-22074.2.patch,
> HIVE-22074.3.patch, HIVE-22074.4.patch
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently, Hive transforms IN expressions to OR in order to apply various CBO rules.
> This incurs a significant performance hit if the IN consists of a large number of
> expressions.
> It is better not to transform IN expressions to OR in such cases, because the
> overall benefit of the various optimizations/transformations is not realized due to
> the compilation overhead.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)