Arnaud Jegou created CALCITE-7144:
-------------------------------------

             Summary: LIMIT should not be pushed through projections containing 
window functions
                 Key: CALCITE-7144
                 URL: https://issues.apache.org/jira/browse/CALCITE-7144
             Project: Calcite
          Issue Type: Bug
    Affects Versions: 1.40.0
            Reporter: Arnaud Jegou


_This is an issue with the RelBuilder, we have not noticed this behaviour with 
rules_

When adding a limit/offset with no collation, the RelBuilder will try to push 
it down to a lower sort with no limit or offset, even if there is a project in 
between

Basically instead of generating the following:
{code:java}
LogicalSort(fetch=[10])
  LogicalProject(xxx)
    LogicalSort(sort0=[$1], dir0=[ASC]) {code}
it will generate:
{code:java}
LogicalProject(xxx)
  LogicalSort(fetch=[10], sort0=[$1], dir0=[ASC]) {code}
This is valid for "regular" projects, but not if the project contains a window 
function as in this case the limit would be applied before the aggregation and 
produce invalid results

What actually happens in the code is, in 
[RelBuilder.sortLimit:|https://github.com/apache/calcite/blob/66afac689ea5b3f59c8050f9a59619c5c5bee771/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L3662]
 # if there is [no 
collation|https://github.com/apache/calcite/blob/66afac689ea5b3f59c8050f9a59619c5c5bee771/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L3687]
 # and the current top node [is a 
project|https://github.com/apache/calcite/blob/66afac689ea5b3f59c8050f9a59619c5c5bee771/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L3706]
 # and the input of that project [is a 
Sort|https://github.com/apache/calcite/blob/66afac689ea5b3f59c8050f9a59619c5c5bee771/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L3708]
 # and that sort has [no limit or 
offset|https://github.com/apache/calcite/blob/66afac689ea5b3f59c8050f9a59619c5c5bee771/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L3710]
 # then the "new" limit and offset are [pushed down into that 
sort|https://github.com/apache/calcite/blob/66afac689ea5b3f59c8050f9a59619c5c5bee771/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L3711]

{code:java}
if (fieldCollations.isEmpty()) { // 1
  // ...
  if (top instanceof Project) { // 2
    final Project project = (Project) top;
    if (project.getInput() instanceof Sort) { // 3
      final Sort sort2 = (Sort) project.getInput();
      if (sort2.offset == null && sort2.fetch == null) { // 4
        final RelNode sort =
            struct.sortFactory.createSort(sort2.getInput(), 
                sort2.collation, offsetNode, fetchNode); // 5
        replaceTop(
            struct.projectFactory.createProject(sort,
                project.getHints(),
                project.getProjects(),
                Pair.right(project.getNamedProjects()),
                project.getVariablesSet()));
        return this; {code}
One possible fix would simply be to check that the project has no window 
function in 3) by replacing the condition with 
{code:java}
if (!project.containsOver() && project.getInput() instanceof Sort) { {code}
I am not 100% sure that this is the right fix but it looks correct, and the 
tests do run against a version containing this change without errors

 

Here is a test case that covers this scenario:
{code:java}
  /**
   * Test case for
   * <a href="https://issues.apache.org/jira/browse/CALCITE-XXX";>[CALCITE-XXX]
   * RelBuilder pushes limits through window functions which is invalid</a>.
   * The test checks that the RelBuilder does not merge a limit through a 
Project
   * that contains a windowed aggregate function into a lower sort. */
  @Test void testLimitOverProjectWithWindowFunctions() {
    final Function<RelBuilder, RelNode> f = b -> b.scan("EMP")
        .sort(1)
        .project(b.field("DEPTNO"),
            b.aggregateCall(SqlStdOperatorTable.ROW_NUMBER)
                .over()
                .partitionBy()
                .orderBy(b.field("EMPNO"))
                .rowsUnbounded()
                .as("x"))
        .limit(0, 10)
        .build();
    final String expected = ""
        + "LogicalSort(fetch=[10])\n"
        + "  LogicalProject(DEPTNO=[$7], x=[ROW_NUMBER() OVER (ORDER BY $0)])\n"
        + "    LogicalSort(sort0=[$1], dir0=[ASC])\n"
        + "      LogicalTableScan(table=[[scott, EMP]])\n";
    assertThat(f.apply(createBuilder()), hasTree(expected));
  }{code}
And I opened a pr to show what the fix would look like, if the proposed fix is 
correct: [https://github.com/apache/calcite/pull/4503] (but feel free to 
discard it and create a new one if need be)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to