REGR_SYY do not work

Sergey Nuyanzin (JIRA) Tue, 10 Jul 2018 09:07:28 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16538821#comment-16538821
 ]


Sergey Nuyanzin edited comment on CALCITE-2402 at 7/10/18 4:06 PM:
-------------------------------------------------------------------

Ok I see thank you

while working on it I faced with some issues/questions, could you please 
comment them?
# Are there any requirements/rules/whatever to have SqlNodes (only in private) 
for expanding variance while in public there is also RexNode? Some more details 
related to the question. All not biased fail (denominator  {{count - 1}}) in 
boundary cases for window aggregated, e.g. {code}select var_samp(age) 
over(partition by age) from emps;
+------------+
|   EXPR$0   |
+------------+
java.lang.ArithmeticException: / by zero
        at Baz$4$1.current(Unknown Source)
{code} It happens only while window aggregation because in regular aggregate 
there is CASE statement for that, i.e.
While trying to do the similar in case of window for the code {code}final 
SqlNumericLiteral one = SqlLiteral.createExactNumeric("1", pos);
        final SqlLiteral nullLiteral = SqlLiteral.createNull(SqlParserPos.ZERO);
        denominator = new SqlCase(SqlParserPos.ZERO,
            count,
            SqlNodeList.of(
                SqlStdOperatorTable.EQUALS.createCall(pos,
                    count, SqlLiteral.createExactNumeric("1", 
SqlParserPos.ZERO))
            ),
            SqlNodeList.of(
                getCastedSqlNode(nullLiteral, varType, pos, null)
            ),
            SqlStdOperatorTable.MINUS.createCall(pos, count, one));{code} it 
fails like {noformat}Caused by: java.lang.UnsupportedOperationException: class 
org.apache.calcite.sql.SqlLiteral: NULL
        at org.apache.calcite.util.Util.needToImplement(Util.java:921)
        at 
org.apache.calcite.sql.validate.SqlValidatorImpl.getValidatedNodeType(SqlValidatorImpl.java:1551)
        at 
org.apache.calcite.sql2rel.SqlNodeToRexConverterImpl.convertLiteral(SqlNodeToRexConverterImpl.java:93)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4659)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:3977)
        at org.apache.calcite.sql.SqlLiteral.accept(SqlLiteral.java:532)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:4541)
        at 
org.apache.calcite.sql2rel.StandardConvertletTable.convertCase(StandardConvertletTable.java:379)
        ... 37 more{noformat} At the same time if I use RexBuilder and RexNode 
instead of SqlNodes for the similar CASE - everything works fine. I agree that 
it could make sense to have implementation for {{SqlLiteral: NULL}} however it 
looks like more generic issue not only relating to covariance/regression 
functions
 # For COVAR* and REGR* implementation I need a function to eliminate all rows 
where any of two arguments is null. Just double check if there is an existing 
function for that or no? Before I thought that {{SqlStdOperatorTable#FILTER}} 
could help however it fails that it does not have implementer.
# VARIANCE and VAR_SAMP are implemented as the same (in different rdbms they 
implemented a little different e.g. 
[Oracle|https://docs.oracle.com/en/database/oracle/oracle-database/12.2/sqlrf/VARIANCE.html#GUID-EC33717A-2509-402D-B3BB-7EECB2E4ED8B],
 
[PostgreSQL|https://www.postgresql.org/docs/8.2/static/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE]).
 Double check if Calcite's goes PostegreSQL way of not?


was (Author: sergey nuyanzin):
Ok I see thank you

while working on it I faced with some issues/questions, could you please 
comment them?
# Are there any requirements/rules/whatever to have SqlNodes (only in private) 
for expanding variance while in public there is also RexNode? Some more details 
related to the question. All not biased fail (denominator  {{count - 1}}) in 
boundary cases for window aggregated, e.g. {code}0: 
jdbc:calcite:model=target/test-classes/mod> select var_samp(age) over(partition 
by age) from emps;
+------------+
|   EXPR$0   |
+------------+
java.lang.ArithmeticException: / by zero
        at Baz$4$1.current(Unknown Source)
{code} It happens only while window aggregation because in regular aggregate 
there is CASE statement for that, i.e.
While trying to do the similar in case of window for the code {code}final 
SqlNumericLiteral one = SqlLiteral.createExactNumeric("1", pos);
        final SqlLiteral nullLiteral = SqlLiteral.createNull(SqlParserPos.ZERO);
        denominator = new SqlCase(SqlParserPos.ZERO,
            count,
            SqlNodeList.of(
                SqlStdOperatorTable.EQUALS.createCall(pos,
                    count, SqlLiteral.createExactNumeric("1", 
SqlParserPos.ZERO))
            ),
            SqlNodeList.of(
                getCastedSqlNode(nullLiteral, varType, pos, null)
            ),
            SqlStdOperatorTable.MINUS.createCall(pos, count, one));{code} it 
fails like {noformat}Caused by: java.lang.UnsupportedOperationException: class 
org.apache.calcite.sql.SqlLiteral: NULL
        at org.apache.calcite.util.Util.needToImplement(Util.java:921)
        at 
org.apache.calcite.sql.validate.SqlValidatorImpl.getValidatedNodeType(SqlValidatorImpl.java:1551)
        at 
org.apache.calcite.sql2rel.SqlNodeToRexConverterImpl.convertLiteral(SqlNodeToRexConverterImpl.java:93)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4659)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:3977)
        at org.apache.calcite.sql.SqlLiteral.accept(SqlLiteral.java:532)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:4541)
        at 
org.apache.calcite.sql2rel.StandardConvertletTable.convertCase(StandardConvertletTable.java:379)
        ... 37 more{noformat} At the same time if I use RexBuilder and RexNode 
instead of SqlNodes for the similar CASE - everything works fine. 
 # For COVAR* and REGR* implementation I need a function to eliminate all rows 
where any of two arguments is null. Just double check if there is an existing 
function for that or no? Before I thought that {{SqlStdOperatorTable#FILTER}} 
could help however it fails that it does not have implementer.
# VARIANCE and VAR_SAMP are implemented as the same (in different rdbms they 
implemented a little different e.g. 
[Oracle|https://docs.oracle.com/en/database/oracle/oracle-database/12.2/sqlrf/VARIANCE.html#GUID-EC33717A-2509-402D-B3BB-7EECB2E4ED8B],
 
[PostgreSQL|https://www.postgresql.org/docs/8.2/static/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE]).
 Double check if Calcite's goes PostegreSQL way of not?

> COVAR_POP/COVAR_SAMP/REGR_SXX/REGR_SYY  do not work
> ---------------------------------------------------
>
>                 Key: CALCITE-2402
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2402
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Sergey Nuyanzin
>            Assignee: Julian Hyde
>            Priority: Major
>
> Any query from {code:sql}
> select covar_samp(empno, deptno) from emps;
> select covar_pop(empno, deptno) from emps;
> select regr_sxx(empno, deptno) from emps;
> select regr_syy(empno, deptno) from emps;
> {code} 
> fails (the trace is below) 
> As I understand the reason is not fully implementation (did not find any 
> convertlet for them e.g.). 
> From my point of view I could fix this issue however I have a question: 
> How these function should be handled? As reducible functions (like 
> STDDEV_POP, STDDEV_SAMP, VAR_POP, VAR_SAMP) or as Rex operators in 
> RexImpTable? 
> Please give some advice here
> {noformat}0: jdbc:calcite:model=target/test-classes/mod> select 
> covar_samp(empno, deptno) from emps;
> Error: Error while executing SQL "select covar_samp(empno, deptno) from 
> emps": Node [rel#123:Subset#2.ENUMERABLE.[]] could not be implemented; 
> planner state:
> Root: rel#123:Subset#2.ENUMERABLE.[]
> Original rel:
> LogicalAggregate(subset=[rel#123:Subset#2.ENUMERABLE.[]], group=[{}], 
> EXPR$0=[COVAR_SAMP($0, $1)]): rowcount = 10.0, cumulative cost = {11.25 rows, 
> 0.0 cpu, 0.0 io}, id = 119
>   LogicalProject(subset=[rel#118:Subset#1.NONE.[]], EMPNO=[$0], DEPTNO=[$2]): 
> rowcount = 100.0, cumulative cost = {100.0 rows, 200.0 cpu, 0.0 io}, id = 117
>     LogicalTableScan(subset=[rel#116:Subset#0.NONE.[]], table=[[SALES, 
> EMPS]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, 
> id = 103
> Sets:
> Set#0, type: RecordType(INTEGER EMPNO, VARCHAR NAME, INTEGER DEPTNO, VARCHAR 
> GENDER, VARCHAR CITY, INTEGER EMPID, INTEGER AGE, BOOLEAN SLACKER, BOOLEAN 
> MANAGER, DATE JOINEDAT)
>         rel#116:Subset#0.NONE.[], best=null, importance=0.7290000000000001
>                 rel#103:LogicalTableScan.NONE.[](table=[SALES, EMPS]), 
> rowcount=100.0, cumulative cost={inf}
>         rel#128:Subset#0.ENUMERABLE.[], best=rel#135, 
> importance=0.36450000000000005
>                 rel#133:EnumerableTableScan.ENUMERABLE.[](table=[SALES, 
> EMPS]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
>                 
> rel#135:EnumerableInterpreter.ENUMERABLE.[](input=rel#132:Subset#0.BINDABLE.[]),
>  rowcount=100.0, cumulative cost={51.0 rows, 51.01 cpu, 0.0 io}
>         rel#132:Subset#0.BINDABLE.[], best=rel#131, 
> importance=0.36450000000000005
>                 rel#131:BindableTableScan.BINDABLE.[](table=[SALES, EMPS]), 
> rowcount=100.0, cumulative cost={1.0 rows, 1.01 cpu, 0.0 io}
> Set#1, type: RecordType(INTEGER EMPNO, INTEGER DEPTNO)
>         rel#118:Subset#1.NONE.[], best=null, importance=0.81
>                 
> rel#117:LogicalProject.NONE.[](input=rel#116:Subset#0.NONE.[],EMPNO=$0,DEPTNO=$2),
>  rowcount=100.0, cumulative cost={inf}
>         rel#125:Subset#1.ENUMERABLE.[], best=rel#134, importance=0.405
>                 
> rel#134:EnumerableProject.ENUMERABLE.[](input=rel#128:Subset#0.ENUMERABLE.[],EMPNO=$0,DEPTNO=$2),
>  rowcount=100.0, cumulative cost={151.0 rows, 251.01 cpu, 0.0 io}
> Set#2, type: RecordType(INTEGER EXPR$0)
>         rel#120:Subset#2.NONE.[], best=null, importance=0.9
>                 
> rel#119:LogicalAggregate.NONE.[](input=rel#118:Subset#1.NONE.[],group={},EXPR$0=COVAR_SAMP($0,
>  $1)), rowcount=10.0, cumulative cost={inf}
>                 
> rel#127:LogicalAggregate.NONE.[](input=rel#116:Subset#0.NONE.[],group={},EXPR$0=COVAR_SAMP($0,
>  $2)), rowcount=10.0, cumulative cost={inf}
>         rel#123:Subset#2.ENUMERABLE.[], best=null, importance=1.0
>                 
> rel#124:AbstractConverter.ENUMERABLE.[](input=rel#120:Subset#2.NONE.[],convention=ENUMERABLE,sort=[]),
>  rowcount=10.0, cumulative cost={inf} (state=,code=0)
> java.sql.SQLException: Error while executing SQL "select covar_samp(empno, 
> deptno) from emps": Node [rel#123:Subset#2.ENUMERABLE.[]] could not be 
> implemented; planner state:
> Root: rel#123:Subset#2.ENUMERABLE.[]
> Original rel:
> LogicalAggregate(subset=[rel#123:Subset#2.ENUMERABLE.[]], group=[{}], 
> EXPR$0=[COVAR_SAMP($0, $1)]): rowcount = 10.0, cumulative cost = {11.25 rows, 
> 0.0 cpu, 0.0 io}, id = 119
>   LogicalProject(subset=[rel#118:Subset#1.NONE.[]], EMPNO=[$0], DEPTNO=[$2]): 
> rowcount = 100.0, cumulative cost = {100.0 rows, 200.0 cpu, 0.0 io}, id = 117
>     LogicalTableScan(subset=[rel#116:Subset#0.NONE.[]], table=[[SALES, 
> EMPS]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, 
> id = 103
> Sets:
> Set#0, type: RecordType(INTEGER EMPNO, VARCHAR NAME, INTEGER DEPTNO, VARCHAR 
> GENDER, VARCHAR CITY, INTEGER EMPID, INTEGER AGE, BOOLEAN SLACKER, BOOLEAN 
> MANAGER, DATE JOINEDAT)
>         rel#116:Subset#0.NONE.[], best=null, importance=0.7290000000000001
>                 rel#103:LogicalTableScan.NONE.[](table=[SALES, EMPS]), 
> rowcount=100.0, cumulative cost={inf}
>         rel#128:Subset#0.ENUMERABLE.[], best=rel#135, 
> importance=0.36450000000000005
>                 rel#133:EnumerableTableScan.ENUMERABLE.[](table=[SALES, 
> EMPS]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
>                 
> rel#135:EnumerableInterpreter.ENUMERABLE.[](input=rel#132:Subset#0.BINDABLE.[]),
>  rowcount=100.0, cumulative cost={51.0 rows, 51.01 cpu, 0.0 io}
>         rel#132:Subset#0.BINDABLE.[], best=rel#131, 
> importance=0.36450000000000005
>                 rel#131:BindableTableScan.BINDABLE.[](table=[SALES, EMPS]), 
> rowcount=100.0, cumulative cost={1.0 rows, 1.01 cpu, 0.0 io}
> Set#1, type: RecordType(INTEGER EMPNO, INTEGER DEPTNO)
>         rel#118:Subset#1.NONE.[], best=null, importance=0.81
>                 
> rel#117:LogicalProject.NONE.[](input=rel#116:Subset#0.NONE.[],EMPNO=$0,DEPTNO=$2),
>  rowcount=100.0, cumulative cost={inf}
>         rel#125:Subset#1.ENUMERABLE.[], best=rel#134, importance=0.405
>                 
> rel#134:EnumerableProject.ENUMERABLE.[](input=rel#128:Subset#0.ENUMERABLE.[],EMPNO=$0,DEPTNO=$2),
>  rowcount=100.0, cumulative cost={151.0 rows, 251.01 cpu, 0.0 io}
> Set#2, type: RecordType(INTEGER EXPR$0)
>         rel#120:Subset#2.NONE.[], best=null, importance=0.9
>                 
> rel#119:LogicalAggregate.NONE.[](input=rel#118:Subset#1.NONE.[],group={},EXPR$0=COVAR_SAMP($0,
>  $1)), rowcount=10.0, cumulative cost={inf}
>                 
> rel#127:LogicalAggregate.NONE.[](input=rel#116:Subset#0.NONE.[],group={},EXPR$0=COVAR_SAMP($0,
>  $2)), rowcount=10.0, cumulative cost={inf}
>         rel#123:Subset#2.ENUMERABLE.[], best=null, importance=1.0
>                 
> rel#124:AbstractConverter.ENUMERABLE.[](input=rel#120:Subset#2.NONE.[],convention=ENUMERABLE,sort=[]),
>  rowcount=10.0, cumulative cost={inf}
>         at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
>         at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
>         at 
> org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:163)
>         at 
> org.apache.calcite.avatica.AvaticaStatement.execute(AvaticaStatement.java:217)
>         at sqlline.Commands.execute(Commands.java:823)
>         at sqlline.Commands.sql(Commands.java:733)
>         at sqlline.SqlLine.dispatch(SqlLine.java:795)
>         at sqlline.SqlLine.begin(SqlLine.java:668)
>         at sqlline.SqlLine.start(SqlLine.java:373)
>         at sqlline.SqlLine.main(SqlLine.java:265)
> Caused by: org.apache.calcite.plan.RelOptPlanner$CannotPlanException: Node 
> [rel#123:Subset#2.ENUMERABLE.[]] could not be implemented; planner state:
> Root: rel#123:Subset#2.ENUMERABLE.[]
> Original rel:
> LogicalAggregate(subset=[rel#123:Subset#2.ENUMERABLE.[]], group=[{}], 
> EXPR$0=[COVAR_SAMP($0, $1)]): rowcount = 10.0, cumulative cost = {11.25 rows, 
> 0.0 cpu, 0.0 io}, id = 119
>   LogicalProject(subset=[rel#118:Subset#1.NONE.[]], EMPNO=[$0], DEPTNO=[$2]): 
> rowcount = 100.0, cumulative cost = {100.0 rows, 200.0 cpu, 0.0 io}, id = 117
>     LogicalTableScan(subset=[rel#116:Subset#0.NONE.[]], table=[[SALES, 
> EMPS]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, 
> id = 103
> Sets:
> Set#0, type: RecordType(INTEGER EMPNO, VARCHAR NAME, INTEGER DEPTNO, VARCHAR 
> GENDER, VARCHAR CITY, INTEGER EMPID, INTEGER AGE, BOOLEAN SLACKER, BOOLEAN 
> MANAGER, DATE JOINEDAT)
>         rel#116:Subset#0.NONE.[], best=null, importance=0.7290000000000001
>                 rel#103:LogicalTableScan.NONE.[](table=[SALES, EMPS]), 
> rowcount=100.0, cumulative cost={inf}
>         rel#128:Subset#0.ENUMERABLE.[], best=rel#135, 
> importance=0.36450000000000005
>                 rel#133:EnumerableTableScan.ENUMERABLE.[](table=[SALES, 
> EMPS]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
>                 
> rel#135:EnumerableInterpreter.ENUMERABLE.[](input=rel#132:Subset#0.BINDABLE.[]),
>  rowcount=100.0, cumulative cost={51.0 rows, 51.01 cpu, 0.0 io}
>         rel#132:Subset#0.BINDABLE.[], best=rel#131, 
> importance=0.36450000000000005
>                 rel#131:BindableTableScan.BINDABLE.[](table=[SALES, EMPS]), 
> rowcount=100.0, cumulative cost={1.0 rows, 1.01 cpu, 0.0 io}
> Set#1, type: RecordType(INTEGER EMPNO, INTEGER DEPTNO)
>         rel#118:Subset#1.NONE.[], best=null, importance=0.81
>                 
> rel#117:LogicalProject.NONE.[](input=rel#116:Subset#0.NONE.[],EMPNO=$0,DEPTNO=$2),
>  rowcount=100.0, cumulative cost={inf}
>         rel#125:Subset#1.ENUMERABLE.[], best=rel#134, importance=0.405
>                 
> rel#134:EnumerableProject.ENUMERABLE.[](input=rel#128:Subset#0.ENUMERABLE.[],EMPNO=$0,DEPTNO=$2),
>  rowcount=100.0, cumulative cost={151.0 rows, 251.01 cpu, 0.0 io}
> Set#2, type: RecordType(INTEGER EXPR$0)
>         rel#120:Subset#2.NONE.[], best=null, importance=0.9
>                 
> rel#119:LogicalAggregate.NONE.[](input=rel#118:Subset#1.NONE.[],group={},EXPR$0=COVAR_SAMP($0,
>  $1)), rowcount=10.0, cumulative cost={inf}
>                 
> rel#127:LogicalAggregate.NONE.[](input=rel#116:Subset#0.NONE.[],group={},EXPR$0=COVAR_SAMP($0,
>  $2)), rowcount=10.0, cumulative cost={inf}
>         rel#123:Subset#2.ENUMERABLE.[], best=null, importance=1.0
>                 
> rel#124:AbstractConverter.ENUMERABLE.[](input=rel#120:Subset#2.NONE.[],convention=ENUMERABLE,sort=[]),
>  rowcount=10.0, cumulative cost={inf}
>         at 
> org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:448)
>         at 
> org.apache.calcite.plan.volcano.RelSubset.buildCheapestPlan(RelSubset.java:298)
>         at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:666)
>         at org.apache.calcite.tools.Programs$5.run(Programs.java:326)
>         at 
> org.apache.calcite.tools.Programs$SequenceProgram.run(Programs.java:387)
>         at org.apache.calcite.prepare.Prepare.optimize(Prepare.java:188)
>         at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:319)
>         at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:230)
>         at 
> org.apache.calcite.prepare.CalcitePrepareImpl.prepare2_(CalcitePrepareImpl.java:783)
>         at 
> org.apache.calcite.prepare.CalcitePrepareImpl.prepare_(CalcitePrepareImpl.java:642)
>         at 
> org.apache.calcite.prepare.CalcitePrepareImpl.prepareSql(CalcitePrepareImpl.java:612)
>         at 
> org.apache.calcite.jdbc.CalciteConnectionImpl.parseQuery(CalciteConnectionImpl.java:232)
>         at 
> org.apache.calcite.jdbc.CalciteMetaImpl.prepareAndExecute(CalciteMetaImpl.java:609)
>         at 
> org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:675)
>         at 
> org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:156)
>         ... 7 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Comment Edited] (CALCITE-2402) COVAR_POP/COVAR_SAMP/REGR_SXX/REGR_SYY do not work

Reply via email to