[ https://issues.apache.org/jira/browse/HIVE-25758?focusedWorklogId=761831&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-761831 ]
ASF GitHub Bot logged work on HIVE-25758: ----------------------------------------- Author: ASF GitHub Bot Created on: 25/Apr/22 14:50 Start Date: 25/Apr/22 14:50 Worklog Time Spent: 10m Work Description: asolimando commented on code in PR #2966: URL: https://github.com/apache/hive/pull/2966#discussion_r857716612 ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveCalciteUtil.java: ########## @@ -1214,6 +1214,58 @@ public FixNullabilityShuttle(RexBuilder rexBuilder, } } + /** + * Find disjunction (OR) in an expression (at any level of nesting). + * + * Example 1: OR(=($0, $1), IS NOT NULL($2))):INTEGER (OR in the top-level expression) + * Example 2: NOT(AND(=($0, $1), IS NOT NULL($2)) + * this is equivalent to OR((<>($0, $1), IS NULL($2)) + * Example 3: AND(OR(=($0, $1), IS NOT NULL($2)))) (OR in inner expression) + */ Review Comment: I get no errors while compiling (without `-Dmaven.javadoc.skip` of course). Is there anything specific you are referring to? Issue Time Tracking ------------------- Worklog Id: (was: 761831) Time Spent: 3h 10m (was: 3h) > OOM due to recursive application of CBO rules > --------------------------------------------- > > Key: HIVE-25758 > URL: https://issues.apache.org/jira/browse/HIVE-25758 > Project: Hive > Issue Type: Bug > Components: CBO, Query Planning > Affects Versions: 4.0.0 > Reporter: Alessandro Solimando > Assignee: Alessandro Solimando > Priority: Major > Labels: pull-request-available > Time Spent: 3h 10m > Remaining Estimate: 0h > > > Reproducing query is as follows: > {code:java} > create table test1 (act_nbr string); > create table test2 (month int); > create table test3 (mth int, con_usd double); > EXPLAIN > SELECT c.month, > d.con_usd > FROM > (SELECT > cast(regexp_replace(substr(add_months(from_unixtime(unix_timestamp(), > 'yyyy-MM-dd'), -1), 1, 7), '-', '') AS int) AS month > FROM test1 > UNION ALL > SELECT month > FROM test2 > WHERE month = 202110) c > JOIN test3 d ON c.month = d.mth; {code} > > Different plans are generated during the first CBO steps, last being: > {noformat} > 2021-12-01T08:28:08,598 DEBUG [a18191bb-3a2b-4193-9abf-4e37dd1996bb main] > parse.CalcitePlanner: Plan after decorre > lation: > HiveProject(month=[$0], con_usd=[$2]) > HiveJoin(condition=[=($0, $1)], joinType=[inner], algorithm=[none], > cost=[not available]) > HiveProject(month=[$0]) > HiveUnion(all=[true]) > > HiveProject(month=[CAST(regexp_replace(substr(add_months(FROM_UNIXTIME(UNIX_TIMESTAMP, > _UTF-16LE'yyyy-MM-d > d':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), -1), 1, 7), > _UTF-16LE'-':VARCHAR(2147483647) CHARACTER SET "UTF- > 16LE", _UTF-16LE'':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")):INTEGER]) > HiveTableScan(table=[[default, test1]], table:alias=[test1]) > HiveProject(month=[$0]) > HiveFilter(condition=[=($0, CAST(202110):INTEGER)]) > HiveTableScan(table=[[default, test2]], table:alias=[test2]) > HiveTableScan(table=[[default, test3]], table:alias=[d]){noformat} > > Then, the HEP planner will keep expanding the filter expression with > redundant expressions, such as the following, where the identical CAST > expression is present multiple times: > > {noformat} > rel#118:HiveFilter.HIVE.[].any(input=HepRelVertex#39,condition=IN(CAST(regexp_replace(substr(add_months(FROM_UNIXTIME(UNIX_TIMESTAMP, > _UTF-16LE'yyyy-MM-dd':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), -1), 1, > 7), _UTF-16LE'-':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", > _UTF-16LE'':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")):INTEGER, > CAST(regexp_replace(substr(add_months(FROM_UNIXTIME(UNIX_TIMESTAMP, > _UTF-16LE'yyyy-MM-dd':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), -1), 1, > 7), _UTF-16LE'-':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", > _UTF-16LE'':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")):INTEGER, > 202110)){noformat} > > The problem seems to come from a bad interaction of at least > _HiveFilterProjectTransposeRule_ and > {_}HiveJoinPushTransitivePredicatesRule{_}, possibly more. > Most probably then UNION part can be removed and the reproducer be simplified > even further. > -- This message was sent by Atlassian Jira (v8.20.7#820007)