Repeating the comments I made in the JIRA case [1]. I do find your argument compelling: if the rewritten version contains the same number of calls to the UDF, the rewrite should be OK.
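
To make the "same number of calls" criterion concrete, here is a minimal sketch. It does not use Calcite's classes: `Expr`, `merge`, and `mergePreservesCallCount` are hypothetical names, and the model is deliberately flat (an expression is either an input reference or a single call, with no nesting). It only compares how many non-deterministic calls are evaluated per row before and after two projects are merged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CallCountCheck {
    /** Hypothetical expression: an input reference or a single function call. */
    static final class Expr {
        final int inputIndex;         // >= 0 for an input reference, -1 for a call
        final String function;        // function name for a call, null for a reference
        final boolean deterministic;

        private Expr(int inputIndex, String function, boolean deterministic) {
            this.inputIndex = inputIndex;
            this.function = function;
            this.deterministic = deterministic;
        }

        static Expr ref(int index) {
            return new Expr(index, null, true);
        }

        static Expr call(String function, boolean deterministic) {
            return new Expr(-1, function, deterministic);
        }

        boolean isRef() {
            return inputIndex >= 0;
        }
    }

    /** Counts the non-deterministic calls in one project's expression list. */
    static int nonDeterministicCalls(List<Expr> exprs) {
        int count = 0;
        for (Expr e : exprs) {
            if (!e.isRef() && !e.deterministic) {
                count++;
            }
        }
        return count;
    }

    /** Merges two projects by inlining the bottom expression for each top reference. */
    static List<Expr> merge(List<Expr> top, List<Expr> bottom) {
        List<Expr> merged = new ArrayList<>();
        for (Expr e : top) {
            merged.add(e.isRef() ? bottom.get(e.inputIndex) : e);
        }
        return merged;
    }

    /** True if merging leaves the number of non-deterministic calls per row unchanged. */
    static boolean mergePreservesCallCount(List<Expr> top, List<Expr> bottom) {
        int before = nonDeterministicCalls(top) + nonDeterministicCalls(bottom);
        int after = nonDeterministicCalls(merge(top, bottom));
        return before == after;
    }

    public static void main(String[] args) {
        // Bottom project: DEPTNO=$0, NAME=$1, A=RAND()
        List<Expr> bottom =
            Arrays.asList(Expr.ref(0), Expr.ref(1), Expr.call("RAND", false));
        // Top project referencing A twice: NAME=$1, A1=$2, A2=$2
        List<Expr> twice = Arrays.asList(Expr.ref(1), Expr.ref(2), Expr.ref(2));
        // Top project referencing A once: NAME=$1, A1=$2
        List<Expr> once = Arrays.asList(Expr.ref(1), Expr.ref(2));

        System.out.println(mergePreservesCallCount(twice, bottom)); // false
        System.out.println(mergePreservesCallCount(once, bottom));  // true
    }
}
```

With two references to `A`, the merge turns one `RAND()` call per row into two, so the check fails; with a single reference the count is unchanged and the merge would be acceptable under this criterion.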
But there are other possible semantics. For instance, a “strict” semantic could allow a rewrite only if the calls to the UDF are guaranteed to be the same in number and in the same order. A “relaxed” semantic would allow non-deterministic functions (and dynamic functions, see [2]) to be rewritten at any time.

Perhaps there could be variants of this rule, one for each semantic, and the semantic could be chosen via a connection- or statement-level property. To enforce a particular semantic, several rules will need to modify their behavior (e.g. FilterProjectTransposeRule), so those rules would also be parameterized on the semantic.

Julian

[2] https://issues.apache.org/jira/browse/CALCITE-2638

> On Nov 19, 2018, at 7:51 AM, Hequn Cheng <[email protected]> wrote:
>
> Hi,
>
> Currently, there are some merge rules for Project, such as CalcMergeRule,
> ProjectMergeRule, and ProjectCalcMergeRule. I found that these merge rules
> should not be applied when a non-deterministic expression of the
> bottom (inner) project is referenced more than once by the top (outer)
> project. Take the following test as an example:
>
> @Test public void testProjectMergeCalcMergeWithNonDeterministic() throws
>     Exception {
>   HepProgram program = new HepProgramBuilder()
>       .addRuleInstance(FilterProjectTransposeRule.INSTANCE)
>       .addRuleInstance(ProjectMergeRule.INSTANCE)
>       .build();
>
>   checkPlanning(program,
>       "select name, a as a1, a as a2 from (\n"
>       + " select *, rand() as a\n"
>       + " from dept)\n"
>       + "where deptno = 10\n");
> }
>
> The first select generates `a` from `rand()` and the second select generates
> `a1` and `a2` from `a`. From the SQL, `a1` should equal `a2`.
> Let's take a look at the result plan:
>
> LogicalProject(NAME=[$1], A1=[RAND()], A2=[RAND()])
>   LogicalFilter(condition=[=($0, 10)])
>     LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>
> In this plan, a1 may not equal a2 because of the project merge, which
> contradicts the SQL (a1 equals a2).
> To keep a1 equal to a2, one option is to disable these merge rules in
> such cases, so that the result plan will be:
>
> LogicalProject(NAME=[$1], A1=[$2], A2=[$2])
>   LogicalProject(DEPTNO=[$0], NAME=[$1], A=[RAND()])
>     LogicalFilter(condition=[=($0, 10)])
>       LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>
> Do you have any good ideas, or have you encountered similar problems? Any
> suggestions are greatly appreciated.
>
> Best,
> Hequn
>
> [1] jira link: https://issues.apache.org/jira/browse/CALCITE-2683
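
P.S. As a rough illustration of the “strict” and “relaxed” semantics mentioned in the reply, here is a minimal sketch of how a rule might consult a semantic before rewriting. Nothing here exists in Calcite: the `UdfSemantic` enum, the `allowsRewrite` method, the `udfSemantic` property key, and the choice of STRICT as the default are all assumptions made for the example. The UDF calls are modeled simply as lists of function names in evaluation order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Properties;

public class UdfSemanticSketch {
    /** Hypothetical names for the two semantics described informally in the email. */
    enum UdfSemantic { STRICT, RELAXED }

    /**
     * Reads the semantic from connection- or statement-level properties.
     * The key "udfSemantic" and the STRICT default are assumptions.
     */
    static UdfSemantic fromProperties(Properties props) {
        String value = props.getProperty("udfSemantic", "STRICT");
        return UdfSemantic.valueOf(value.toUpperCase(Locale.ROOT));
    }

    /**
     * Decides whether a rewrite is allowed, given the UDF calls made per row
     * before and after the rewrite, listed in evaluation order.
     */
    static boolean allowsRewrite(UdfSemantic semantic,
            List<String> callsBefore, List<String> callsAfter) {
        switch (semantic) {
        case STRICT:
            // Same number of calls, in the same order.
            return callsBefore.equals(callsAfter);
        case RELAXED:
            // Non-deterministic functions may be rewritten at any time.
            return true;
        default:
            throw new AssertionError(semantic);
        }
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList("RAND");
        List<String> after = Arrays.asList("RAND", "RAND");

        Properties props = new Properties();
        System.out.println(allowsRewrite(fromProperties(props), before, after)); // false

        props.setProperty("udfSemantic", "relaxed");
        System.out.println(allowsRewrite(fromProperties(props), before, after)); // true
    }
}
```

A rule parameterized this way would take the `UdfSemantic` at construction time and call something like `allowsRewrite` before firing, so the same rule class could serve both semantics.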
