[ https://issues.apache.org/jira/browse/HIVE-26737?focusedWorklogId=827811&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-827811 ]
ASF GitHub Bot logged work on HIVE-26737:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 22/Nov/22 00:01
            Start Date: 22/Nov/22 00:01
    Worklog Time Spent: 10m
      Work Description: scarlin-cloudera commented on code in PR #3761:
URL: https://github.com/apache/hive/pull/3761#discussion_r1028622471


##########
ql/src/test/results/clientpositive/perf/tpcds30tb/tez/cbo_query1.q.out:
##########
@@ -19,8 +19,8 @@ HiveSortLimit(sort0=[$0], dir0=[ASC], fetch=[100])
     HiveProject(s_store_sk=[$0])
       HiveFilter(condition=[=($24, _UTF-16LE'NM')])
         HiveTableScan(table=[[default, store]], table:alias=[store])
-  HiveProject(_o__c0=[*(CAST(/($1, $2)):DECIMAL(21, 6), 1.2:DECIMAL(2, 1))], ctr_store_sk=[$0])
-    HiveFilter(condition=[IS NOT NULL(CAST(/($1, $2)):DECIMAL(21, 6))])
+  HiveProject(ctr_store_sk=[$0], CAST=[CAST(*(CAST(/($1, $2)):DECIMAL(21, 6), 1.2:DECIMAL(2, 1))):DECIMAL(24, 7)])
+    HiveFilter(condition=[IS NOT NULL(CAST(*(CAST(/($1, $2)):DECIMAL(21, 6), 1.2:DECIMAL(2, 1))):DECIMAL(24, 7))])

Review Comment:
   Good catch! Looks like I missed this change, I thought I had only trivial changes.

   After looking at this, there was indeed a change of behavior in that it was looking at multiple aggregates in the RelNode stack for "group by" statements when it should only look at the first aggregate. I made the code change and this regression went away.


Issue Time Tracking
-------------------

    Worklog Id:     (was: 827811)
    Time Spent: 1.5h  (was: 1h 20m)

> Subquery returning wrong results when database has materialized views
> ---------------------------------------------------------------------
>
>                 Key: HIVE-26737
>                 URL: https://issues.apache.org/jira/browse/HIVE-26737
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>            Reporter: Steve Carlin
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When HS2 has materialized views in its registry, subqueries with correlated
> variables may return wrong results.
> An example of this:
>
> {code:java}
> CREATE TABLE t_test1(
>   id int,
>   int_col int,
>   year int,
>   month int
> );
> CREATE TABLE t_test2(
>   id int,
>   int_col int,
>   year int,
>   month int
> );
> CREATE TABLE dummy (
>   id int
> ) stored as orc TBLPROPERTIES ('transactional'='true');
> CREATE MATERIALIZED VIEW need_a_mat_view_in_registry AS
>   SELECT * FROM dummy where id > 5;
> INSERT INTO t_test1 VALUES (1, 1, 2009, 1), (10,0, 2009, 1);
> INSERT INTO t_test2 VALUES (1, 1, 2009, 1);
> select id, int_col, year, month from t_test1 s where s.int_col = (select count(*) from t_test2 t where s.id = t.id) order by id;
> {code}
> The select statement should produce 2 rows, but it is only producing one.
> The CBO plan produced has an inner join instead of a left join.
> {code:java}
> HiveSortLimit(sort0=[$0], dir0=[ASC])
>   HiveProject(id=[$0], int_col=[$1], year=[$2], month=[$3])
>     HiveJoin(condition=[AND(=($0, $5), =($4, $6))], joinType=[inner], algorithm=[none], cost=[not available])
>       HiveProject(id=[$0], int_col=[$1], year=[$2], month=[$3], CAST=[CAST($1):BIGINT])
>         HiveFilter(condition=[AND(IS NOT NULL($0), IS NOT NULL(CAST($1):BIGINT))])
>           HiveTableScan(table=[[default, t_test1]], table:alias=[s])
>       HiveProject(id=[$0], $f1=[$1])
>         HiveFilter(condition=[IS NOT NULL($1)])
>           HiveAggregate(group=[{0}], agg#0=[count()])
>             HiveFilter(condition=[IS NOT NULL($0)])
>               HiveTableScan(table=[[default, t_test2]], table:alias=[t]){code}
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
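
For readers who want to see why the join type matters in the reproduction from the issue description, here is a minimal sketch. It is an assumption-laden illustration, not Hive's code: it uses Python's built-in sqlite3 module rather than HiveServer2, omits the materialized view (which only triggers the bug in Hive), and the two hand-written join rewrites are hypothetical stand-ins for the plans, not the SQL Hive generates. The point it shows is that count(*) over an empty correlated group is 0, so the row (10, 0, 2009, 1) must survive, and an inner join against the aggregated t_test2 drops it.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption: semantics of
# correlated scalar subqueries and count(*) match for this small case).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_test1 (id INT, int_col INT, year INT, month INT);
    CREATE TABLE t_test2 (id INT, int_col INT, year INT, month INT);
    INSERT INTO t_test1 VALUES (1, 1, 2009, 1), (10, 0, 2009, 1);
    INSERT INTO t_test2 VALUES (1, 1, 2009, 1);
""")

# The query from the issue, unchanged apart from line wrapping.
correlated = conn.execute("""
    SELECT id, int_col, year, month FROM t_test1 s
    WHERE s.int_col = (SELECT count(*) FROM t_test2 t WHERE s.id = t.id)
    ORDER BY id
""").fetchall()

# Hypothetical rewrite mirroring the shape of the faulty plan:
# inner join on id and on int_col = count.
inner_join = conn.execute("""
    SELECT s.id, s.int_col, s.year, s.month FROM t_test1 s
    JOIN (SELECT id, count(*) AS cnt FROM t_test2
          WHERE id IS NOT NULL GROUP BY id) c
      ON s.id = c.id AND s.int_col = c.cnt
    ORDER BY s.id
""").fetchall()

# Hypothetical rewrite that preserves the semantics: left join, then treat
# a missing group as a count of 0.
left_join = conn.execute("""
    SELECT s.id, s.int_col, s.year, s.month FROM t_test1 s
    LEFT JOIN (SELECT id, count(*) AS cnt FROM t_test2 GROUP BY id) c
      ON s.id = c.id
    WHERE s.int_col = COALESCE(c.cnt, 0)
    ORDER BY s.id
""").fetchall()

print("correlated:", correlated)
print("inner join:", inner_join)
print("left join: ", left_join)
```

In this sketch the correlated query and the left-join rewrite both return two rows, while the inner-join rewrite returns only the row with id 1, which matches the symptom reported above.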