[jira] [Work logged] (HIVE-24775) Incorrect null handling when rebuilding Materialized view incrementally

ASF GitHub Bot (Jira) Tue, 16 Feb 2021 03:07:05 -0800


     [ 
https://issues.apache.org/jira/browse/HIVE-24775?focusedWorklogId=552906&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-552906
 ]


ASF GitHub Bot logged work on HIVE-24775:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/Feb/21 11:06
            Start Date: 16/Feb/21 11:06
    Worklog Time Spent: 10m 
      Work Description: kasakrisz opened a new pull request #1981:
URL: https://github.com/apache/hive/pull/1981


   ### What changes were proposed in this pull request?
   When transforming plan of materialized view rebuild to incremental rebuild 
in case of the view definition has aggregate:
   * Instead of checking the aggregate key is null check the aggregated columns 
are null or not. Example:
   ```
   HiveProject($f0=[$3], $f1=[$4], $f2=[CASE(IS NULL($2), $5, +($5, $2))])
   ```
   where `$2` is coming from the Materialized view and `$5` is coming from the 
delta result set (The rows inserted after the last MV refresh)
   * When transforming the `newAST` generated from the CBO plan from an insert 
overwrite plan to a multi insert plan by 
`CalcitePlanner.fixUpASTAggregateIncrementalRebuild` replace equality operators 
with null safe equality operators in join condition
   * The CBO plan contains a Filter on top of the MV scan checking all the 
aggregate key columns should be not null. Remove this in 
`fixUpASTAggregateIncrementalRebuild` since we need all rows from the view.
   * Split the result of the Right outer join to the insert and update branches 
of the "multi insert statement" plan by checking that the RowId coming from the 
materialized view is null or not.
   
   ### Why are the changes needed?
   Rows with null aggregate keys was not handled by incremental MV rebuild and 
it could lead to data corruption.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   ```
   vn test -Dtest.output.overwrite -DskipSparkTests 
-Dtest=TestMiniLlapLocalCliDriver 
-Dqfile=materialized_view_create_rewrite_4.q,materialized_view_create_rewrite_nulls.q
 -pl itests/qtest -Pitests
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 552906)
    Remaining Estimate: 0h
            Time Spent: 10m

> Incorrect null handling when rebuilding Materialized view incrementally
> -----------------------------------------------------------------------
>
>                 Key: HIVE-24775
>                 URL: https://issues.apache.org/jira/browse/HIVE-24775
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> CREATE TABLE t1 (a int, b varchar(256), c decimal(10,2), d int) STORED AS orc 
> TBLPROPERTIES ('transactional'='true');
> INSERT INTO t1 VALUES
>  (NULL, 'null_value', 100.77, 7),
>  (1, 'calvin', 978.76, 3),
>  (1, 'charlie', 9.8, 1);
> CREATE MATERIALIZED VIEW mat1 TBLPROPERTIES ('transactional'='true') AS
>   SELECT a, b, sum(d)
>   FROM t1
>   WHERE c > 10.0
>   GROUP BY a, b;
> INSERT INTO t1 VALUES
>  (NULL, 'null_value', 100.88, 8),
>  (1, 'charlie', 15.8, 1);
> ALTER MATERIALIZED VIEW mat1 REBUILD;
> SELECT * FROM mat1
> ORDER BY a, b;
> {code}
> View contains:
> {code}
> 1     calvin  3
> 1     charlie 1
> NULL  null_value      8
> NULL  null_value      7
> {code}
> but it should contain:
> {code}
> 1     calvin  3
> 1     charlie 1
> NULL  null_value      15
> {code}
> Rows with aggregate key columns having NULL values are not aggregated because 
> incremental materialized view rebuild plan is altered by 
> [applyPreJoinOrderingTransforms|https://github.com/apache/hive/blob/76732ad27e139fbdef25b820a07cf35934771083/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L1975]:
>   IS NOT NULL filter added for each of these columns on top of the view scan 
> when joining with the branch pulls the rows inserted after the last rebuild:
> {code}
> HiveProject($f0=[$3], $f1=[$4], $f2=[CASE(AND(IS NULL($0), IS NULL($1)), $5, 
> +($5, $2))])
>   HiveFilter(condition=[OR(AND(IS NULL($0), IS NULL($1)), AND(=($0, $3), 
> =($1, $4)))])
>     HiveJoin(condition=[AND(=($0, $3), =($1, $4))], joinType=[right], 
> algorithm=[none], cost=[not available])
>       HiveProject(a=[$0], b=[$1], _c2=[$2])
>         HiveFilter(condition=[AND(IS NOT NULL($0), IS NOT NULL($1))])
>           HiveTableScan(table=[[default, mat1]], table:alias=[default.mat1])
>       HiveProject(a=[$0], b=[$1], $f2=[$2])
>         HiveAggregate(group=[{0, 1}], agg#0=[sum($3)])
>           HiveFilter(condition=[AND(<(1, $6.writeid), >($2, 10))])
>             HiveTableScan(table=[[default, t1]], table:alias=[t1])
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (HIVE-24775) Incorrect null handling when rebuilding Materialized view incrementally

Reply via email to