[ 
https://issues.apache.org/jira/browse/IMPALA-14993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Maslov updated IMPALA-14993:
------------------------------------
    Description: 
On Iceberg V2 tables that contain delete files, queries whose select list holds 
only constants and no {{count(*)}} (e.g. {{SELECT 1 FROM tbl}}) silently return 
fewer rows than they should.
h3. Steps to reproduce

{{CREATE TABLE ice1 (id INT, c1 INT)}}
{{STORED AS ICEBERG TBLPROPERTIES ('format-version' = '2');}}
{{INSERT INTO ice1 SELECT 1, 10;}}
{{INSERT INTO ice1 SELECT 2, 20;}}

{{DELETE FROM ice1 WHERE id = 1;}}
{{SELECT 1 FROM ice1;  -- expected: 1 row, actual: 0 rows}}
h3. Root cause

{{SelectStmt.optimizePlainCountStarQueryV2()}} decides whether to enable the 
optimization with a loop that _rejects_ any select-list item that is neither 
{{count(*)}} nor a constant, but it never checks that at least one {{count(*)}} 
is actually present. For {{SELECT 1 FROM ice1}} the loop accepts the constant 
and falls through to {{tableRef.setOptimizeCountStarForIcebergV2(true)}}.
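
The fall-through can be reproduced in isolation. The sketch below is a simplified model, not the Impala source: the {{ItemKind}} enum and the {{enablesOptimization}} helper are hypothetical stand-ins for the real {{Expr}} checks, and only the select-list loop described above is modelled.

```java
import java.util.List;

public class BuggyCountStarCheck {
  // Hypothetical classification of a select-list item; the real code
  // inspects Expr objects instead.
  public enum ItemKind { CONSTANT, COUNT_STAR, OTHER }

  // Models the pre-fix decision: reject anything that is neither a
  // constant nor count(*), otherwise fall through and optimize.
  public static boolean enablesOptimization(List<ItemKind> selectList) {
    for (ItemKind kind : selectList) {
      if (kind == ItemKind.CONSTANT) continue;
      if (kind != ItemKind.COUNT_STAR) return false;
    }
    // Reached even when the loop never saw a count(*).
    return true;
  }

  public static void main(String[] args) {
    // SELECT count(*) FROM ice1 -> optimization is wanted
    System.out.println(enablesOptimization(List.of(ItemKind.COUNT_STAR))); // true
    // SELECT id FROM ice1 -> correctly rejected
    System.out.println(enablesOptimization(List.of(ItemKind.OTHER)));      // false
    // SELECT 1 FROM ice1 -> wrongly accepted, this is the bug
    System.out.println(enablesOptimization(List.of(ItemKind.CONSTANT)));   // true
  }
}
```

The third case is the one from the reproduction steps: a constant-only select list passes every check inside the loop, so nothing stops the optimization from being switched on.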
h3. Proposed fix

Guard the V2 method the same way the V1 method is guarded: add a 
{{hasCountStarFunc}} flag to {{optimizePlainCountStarQueryV2()}} in 
{{fe/src/main/java/org/apache/impala/analysis/SelectStmt.java}} and return early 
unless a {{count(*)}} (or an already-optimized accumulator) was seen:

{{boolean hasCountStarFunc = false;}}
{{boolean alreadyOptimized = false;}}
{{for (SelectListItem selectItem : getSelectList().getItems()) {}}
{{  Expr expr = selectItem.getExpr();}}
{{  if (expr == null) return;}}
{{  if (expr.isConstant()) continue;}}
{{  if (expr instanceof IcebergV2CountStarAccumulator) {}}
{{    alreadyOptimized = true;}}
{{    continue;}}
{{  }}}
{{  if (!FunctionCallExpr.isCountStarFunctionCallExpr(expr)) return;}}
{{  hasCountStarFunc = true;}}
{{}}}
{{if (!hasCountStarFunc && !alreadyOptimized) return;}}
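
To check the effect of the guard without an Impala build, the decision can be modelled on its own. This is a minimal sketch under stated assumptions: {{ItemKind}} and {{enablesOptimization}} are hypothetical names, the real method returns {{void}} and may apply further conditions, and only the select-list check from the snippet above is reproduced.

```java
import java.util.List;

public class FixedCountStarCheck {
  // Hypothetical classification of a select-list item. ACCUMULATOR stands
  // in for an expression that is an IcebergV2CountStarAccumulator.
  public enum ItemKind { CONSTANT, COUNT_STAR, ACCUMULATOR, OTHER }

  // Models the proposed decision: same rejection loop as before, plus a
  // final check that a count(*) or an accumulator was actually seen.
  public static boolean enablesOptimization(List<ItemKind> selectList) {
    boolean hasCountStarFunc = false;
    boolean alreadyOptimized = false;
    for (ItemKind kind : selectList) {
      if (kind == ItemKind.CONSTANT) continue;
      if (kind == ItemKind.ACCUMULATOR) {
        alreadyOptimized = true;
        continue;
      }
      if (kind != ItemKind.COUNT_STAR) return false;
      hasCountStarFunc = true;
    }
    return hasCountStarFunc || alreadyOptimized;
  }

  public static void main(String[] args) {
    // SELECT 1 FROM ice1 -> no longer optimized
    System.out.println(enablesOptimization(List.of(ItemKind.CONSTANT)));
    // SELECT 1, count(*) FROM ice1 -> still optimized
    System.out.println(enablesOptimization(
        List.of(ItemKind.CONSTANT, ItemKind.COUNT_STAR)));
    // Select list that already holds the accumulator -> still accepted
    System.out.println(enablesOptimization(List.of(ItemKind.ACCUMULATOR)));
  }
}
```

In this model a constant-only select list now falls out at the final check, while select lists that contain {{count(*)}} or the accumulator keep their previous behaviour.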


> Iceberg V2 count(*) optimization is incorrectly applied to queries without 
> count(*), causing row loss
> -----------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-14993
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14993
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 4.5.0
>            Reporter: Dmitriy Maslov
>            Priority: Major
>              Labels: iceberg


