[
https://issues.apache.org/jira/browse/CALCITE-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16538821#comment-16538821
]
Sergey Nuyanzin edited comment on CALCITE-2402 at 7/10/18 4:06 PM:
-------------------------------------------------------------------
Ok I see thank you
while working on it I faced with some issues/questions, could you please
comment them?
# Are there any requirements/rules/whatever to have SqlNodes (only in private)
for expanding variance while in public there is also RexNode? Some more details
related to the question. All not biased fail (denominator {{count - 1}}) in
boundary cases for window aggregated, e.g. {code}select var_samp(age)
over(partition by age) from emps;
+------------+
| EXPR$0 |
+------------+
java.lang.ArithmeticException: / by zero
at Baz$4$1.current(Unknown Source)
{code} It happens only while window aggregation because in regular aggregate
there is CASE statement for that, i.e.
While trying to do the similar in case of window for the code {code}final
SqlNumericLiteral one = SqlLiteral.createExactNumeric("1", pos);
final SqlLiteral nullLiteral = SqlLiteral.createNull(SqlParserPos.ZERO);
denominator = new SqlCase(SqlParserPos.ZERO,
count,
SqlNodeList.of(
SqlStdOperatorTable.EQUALS.createCall(pos,
count, SqlLiteral.createExactNumeric("1",
SqlParserPos.ZERO))
),
SqlNodeList.of(
getCastedSqlNode(nullLiteral, varType, pos, null)
),
SqlStdOperatorTable.MINUS.createCall(pos, count, one));{code} it
fails like {noformat}Caused by: java.lang.UnsupportedOperationException: class
org.apache.calcite.sql.SqlLiteral: NULL
at org.apache.calcite.util.Util.needToImplement(Util.java:921)
at
org.apache.calcite.sql.validate.SqlValidatorImpl.getValidatedNodeType(SqlValidatorImpl.java:1551)
at
org.apache.calcite.sql2rel.SqlNodeToRexConverterImpl.convertLiteral(SqlNodeToRexConverterImpl.java:93)
at
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4659)
at
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:3977)
at org.apache.calcite.sql.SqlLiteral.accept(SqlLiteral.java:532)
at
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:4541)
at
org.apache.calcite.sql2rel.StandardConvertletTable.convertCase(StandardConvertletTable.java:379)
... 37 more{noformat} At the same time if I use RexBuilder and RexNode
instead of SqlNodes for the similar CASE - everything works fine. I agree that
it could make sense to have implementation for {{SqlLiteral: NULL}} however it
looks like more generic issue not only relating to covariance/regression
functions
# For COVAR* and REGR* implementation I need a function to eliminate all rows
where any of two arguments is null. Just double check if there is an existing
function for that or no? Before I thought that {{SqlStdOperatorTable#FILTER}}
could help however it fails that it does not have implementer.
# VARIANCE and VAR_SAMP are implemented as the same (in different rdbms they
implemented a little different e.g.
[Oracle|https://docs.oracle.com/en/database/oracle/oracle-database/12.2/sqlrf/VARIANCE.html#GUID-EC33717A-2509-402D-B3BB-7EECB2E4ED8B],
[PostgreSQL|https://www.postgresql.org/docs/8.2/static/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE]).
Double check if Calcite's goes PostegreSQL way of not?
was (Author: sergey nuyanzin):
Ok I see thank you
while working on it I faced with some issues/questions, could you please
comment them?
# Are there any requirements/rules/whatever to have SqlNodes (only in private)
for expanding variance while in public there is also RexNode? Some more details
related to the question. All not biased fail (denominator {{count - 1}}) in
boundary cases for window aggregated, e.g. {code}0:
jdbc:calcite:model=target/test-classes/mod> select var_samp(age) over(partition
by age) from emps;
+------------+
| EXPR$0 |
+------------+
java.lang.ArithmeticException: / by zero
at Baz$4$1.current(Unknown Source)
{code} It happens only while window aggregation because in regular aggregate
there is CASE statement for that, i.e.
While trying to do the similar in case of window for the code {code}final
SqlNumericLiteral one = SqlLiteral.createExactNumeric("1", pos);
final SqlLiteral nullLiteral = SqlLiteral.createNull(SqlParserPos.ZERO);
denominator = new SqlCase(SqlParserPos.ZERO,
count,
SqlNodeList.of(
SqlStdOperatorTable.EQUALS.createCall(pos,
count, SqlLiteral.createExactNumeric("1",
SqlParserPos.ZERO))
),
SqlNodeList.of(
getCastedSqlNode(nullLiteral, varType, pos, null)
),
SqlStdOperatorTable.MINUS.createCall(pos, count, one));{code} it
fails like {noformat}Caused by: java.lang.UnsupportedOperationException: class
org.apache.calcite.sql.SqlLiteral: NULL
at org.apache.calcite.util.Util.needToImplement(Util.java:921)
at
org.apache.calcite.sql.validate.SqlValidatorImpl.getValidatedNodeType(SqlValidatorImpl.java:1551)
at
org.apache.calcite.sql2rel.SqlNodeToRexConverterImpl.convertLiteral(SqlNodeToRexConverterImpl.java:93)
at
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4659)
at
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:3977)
at org.apache.calcite.sql.SqlLiteral.accept(SqlLiteral.java:532)
at
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:4541)
at
org.apache.calcite.sql2rel.StandardConvertletTable.convertCase(StandardConvertletTable.java:379)
... 37 more{noformat} At the same time if I use RexBuilder and RexNode
instead of SqlNodes for the similar CASE - everything works fine.
# For COVAR* and REGR* implementation I need a function to eliminate all rows
where any of two arguments is null. Just double check if there is an existing
function for that or no? Before I thought that {{SqlStdOperatorTable#FILTER}}
could help however it fails that it does not have implementer.
# VARIANCE and VAR_SAMP are implemented as the same (in different rdbms they
implemented a little different e.g.
[Oracle|https://docs.oracle.com/en/database/oracle/oracle-database/12.2/sqlrf/VARIANCE.html#GUID-EC33717A-2509-402D-B3BB-7EECB2E4ED8B],
[PostgreSQL|https://www.postgresql.org/docs/8.2/static/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE]).
Double check if Calcite's goes PostegreSQL way of not?
> COVAR_POP/COVAR_SAMP/REGR_SXX/REGR_SYY do not work
> ---------------------------------------------------
>
> Key: CALCITE-2402
> URL: https://issues.apache.org/jira/browse/CALCITE-2402
> Project: Calcite
> Issue Type: Bug
> Components: core
> Reporter: Sergey Nuyanzin
> Assignee: Julian Hyde
> Priority: Major
>
> Any query from {code:sql}
> select covar_samp(empno, deptno) from emps;
> select covar_pop(empno, deptno) from emps;
> select regr_sxx(empno, deptno) from emps;
> select regr_syy(empno, deptno) from emps;
> {code}
> fails (the trace is below)
> As I understand the reason is not fully implementation (did not find any
> convertlet for them e.g.).
> From my point of view I could fix this issue however I have a question:
> How these function should be handled? As reducible functions (like
> STDDEV_POP, STDDEV_SAMP, VAR_POP, VAR_SAMP) or as Rex operators in
> RexImpTable?
> Please give some advice here
> {noformat}0: jdbc:calcite:model=target/test-classes/mod> select
> covar_samp(empno, deptno) from emps;
> Error: Error while executing SQL "select covar_samp(empno, deptno) from
> emps": Node [rel#123:Subset#2.ENUMERABLE.[]] could not be implemented;
> planner state:
> Root: rel#123:Subset#2.ENUMERABLE.[]
> Original rel:
> LogicalAggregate(subset=[rel#123:Subset#2.ENUMERABLE.[]], group=[{}],
> EXPR$0=[COVAR_SAMP($0, $1)]): rowcount = 10.0, cumulative cost = {11.25 rows,
> 0.0 cpu, 0.0 io}, id = 119
> LogicalProject(subset=[rel#118:Subset#1.NONE.[]], EMPNO=[$0], DEPTNO=[$2]):
> rowcount = 100.0, cumulative cost = {100.0 rows, 200.0 cpu, 0.0 io}, id = 117
> LogicalTableScan(subset=[rel#116:Subset#0.NONE.[]], table=[[SALES,
> EMPS]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io},
> id = 103
> Sets:
> Set#0, type: RecordType(INTEGER EMPNO, VARCHAR NAME, INTEGER DEPTNO, VARCHAR
> GENDER, VARCHAR CITY, INTEGER EMPID, INTEGER AGE, BOOLEAN SLACKER, BOOLEAN
> MANAGER, DATE JOINEDAT)
> rel#116:Subset#0.NONE.[], best=null, importance=0.7290000000000001
> rel#103:LogicalTableScan.NONE.[](table=[SALES, EMPS]),
> rowcount=100.0, cumulative cost={inf}
> rel#128:Subset#0.ENUMERABLE.[], best=rel#135,
> importance=0.36450000000000005
> rel#133:EnumerableTableScan.ENUMERABLE.[](table=[SALES,
> EMPS]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
>
> rel#135:EnumerableInterpreter.ENUMERABLE.[](input=rel#132:Subset#0.BINDABLE.[]),
> rowcount=100.0, cumulative cost={51.0 rows, 51.01 cpu, 0.0 io}
> rel#132:Subset#0.BINDABLE.[], best=rel#131,
> importance=0.36450000000000005
> rel#131:BindableTableScan.BINDABLE.[](table=[SALES, EMPS]),
> rowcount=100.0, cumulative cost={1.0 rows, 1.01 cpu, 0.0 io}
> Set#1, type: RecordType(INTEGER EMPNO, INTEGER DEPTNO)
> rel#118:Subset#1.NONE.[], best=null, importance=0.81
>
> rel#117:LogicalProject.NONE.[](input=rel#116:Subset#0.NONE.[],EMPNO=$0,DEPTNO=$2),
> rowcount=100.0, cumulative cost={inf}
> rel#125:Subset#1.ENUMERABLE.[], best=rel#134, importance=0.405
>
> rel#134:EnumerableProject.ENUMERABLE.[](input=rel#128:Subset#0.ENUMERABLE.[],EMPNO=$0,DEPTNO=$2),
> rowcount=100.0, cumulative cost={151.0 rows, 251.01 cpu, 0.0 io}
> Set#2, type: RecordType(INTEGER EXPR$0)
> rel#120:Subset#2.NONE.[], best=null, importance=0.9
>
> rel#119:LogicalAggregate.NONE.[](input=rel#118:Subset#1.NONE.[],group={},EXPR$0=COVAR_SAMP($0,
> $1)), rowcount=10.0, cumulative cost={inf}
>
> rel#127:LogicalAggregate.NONE.[](input=rel#116:Subset#0.NONE.[],group={},EXPR$0=COVAR_SAMP($0,
> $2)), rowcount=10.0, cumulative cost={inf}
> rel#123:Subset#2.ENUMERABLE.[], best=null, importance=1.0
>
> rel#124:AbstractConverter.ENUMERABLE.[](input=rel#120:Subset#2.NONE.[],convention=ENUMERABLE,sort=[]),
> rowcount=10.0, cumulative cost={inf} (state=,code=0)
> java.sql.SQLException: Error while executing SQL "select covar_samp(empno,
> deptno) from emps": Node [rel#123:Subset#2.ENUMERABLE.[]] could not be
> implemented; planner state:
> Root: rel#123:Subset#2.ENUMERABLE.[]
> Original rel:
> LogicalAggregate(subset=[rel#123:Subset#2.ENUMERABLE.[]], group=[{}],
> EXPR$0=[COVAR_SAMP($0, $1)]): rowcount = 10.0, cumulative cost = {11.25 rows,
> 0.0 cpu, 0.0 io}, id = 119
> LogicalProject(subset=[rel#118:Subset#1.NONE.[]], EMPNO=[$0], DEPTNO=[$2]):
> rowcount = 100.0, cumulative cost = {100.0 rows, 200.0 cpu, 0.0 io}, id = 117
> LogicalTableScan(subset=[rel#116:Subset#0.NONE.[]], table=[[SALES,
> EMPS]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io},
> id = 103
> Sets:
> Set#0, type: RecordType(INTEGER EMPNO, VARCHAR NAME, INTEGER DEPTNO, VARCHAR
> GENDER, VARCHAR CITY, INTEGER EMPID, INTEGER AGE, BOOLEAN SLACKER, BOOLEAN
> MANAGER, DATE JOINEDAT)
> rel#116:Subset#0.NONE.[], best=null, importance=0.7290000000000001
> rel#103:LogicalTableScan.NONE.[](table=[SALES, EMPS]),
> rowcount=100.0, cumulative cost={inf}
> rel#128:Subset#0.ENUMERABLE.[], best=rel#135,
> importance=0.36450000000000005
> rel#133:EnumerableTableScan.ENUMERABLE.[](table=[SALES,
> EMPS]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
>
> rel#135:EnumerableInterpreter.ENUMERABLE.[](input=rel#132:Subset#0.BINDABLE.[]),
> rowcount=100.0, cumulative cost={51.0 rows, 51.01 cpu, 0.0 io}
> rel#132:Subset#0.BINDABLE.[], best=rel#131,
> importance=0.36450000000000005
> rel#131:BindableTableScan.BINDABLE.[](table=[SALES, EMPS]),
> rowcount=100.0, cumulative cost={1.0 rows, 1.01 cpu, 0.0 io}
> Set#1, type: RecordType(INTEGER EMPNO, INTEGER DEPTNO)
> rel#118:Subset#1.NONE.[], best=null, importance=0.81
>
> rel#117:LogicalProject.NONE.[](input=rel#116:Subset#0.NONE.[],EMPNO=$0,DEPTNO=$2),
> rowcount=100.0, cumulative cost={inf}
> rel#125:Subset#1.ENUMERABLE.[], best=rel#134, importance=0.405
>
> rel#134:EnumerableProject.ENUMERABLE.[](input=rel#128:Subset#0.ENUMERABLE.[],EMPNO=$0,DEPTNO=$2),
> rowcount=100.0, cumulative cost={151.0 rows, 251.01 cpu, 0.0 io}
> Set#2, type: RecordType(INTEGER EXPR$0)
> rel#120:Subset#2.NONE.[], best=null, importance=0.9
>
> rel#119:LogicalAggregate.NONE.[](input=rel#118:Subset#1.NONE.[],group={},EXPR$0=COVAR_SAMP($0,
> $1)), rowcount=10.0, cumulative cost={inf}
>
> rel#127:LogicalAggregate.NONE.[](input=rel#116:Subset#0.NONE.[],group={},EXPR$0=COVAR_SAMP($0,
> $2)), rowcount=10.0, cumulative cost={inf}
> rel#123:Subset#2.ENUMERABLE.[], best=null, importance=1.0
>
> rel#124:AbstractConverter.ENUMERABLE.[](input=rel#120:Subset#2.NONE.[],convention=ENUMERABLE,sort=[]),
> rowcount=10.0, cumulative cost={inf}
> at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
> at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
> at
> org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:163)
> at
> org.apache.calcite.avatica.AvaticaStatement.execute(AvaticaStatement.java:217)
> at sqlline.Commands.execute(Commands.java:823)
> at sqlline.Commands.sql(Commands.java:733)
> at sqlline.SqlLine.dispatch(SqlLine.java:795)
> at sqlline.SqlLine.begin(SqlLine.java:668)
> at sqlline.SqlLine.start(SqlLine.java:373)
> at sqlline.SqlLine.main(SqlLine.java:265)
> Caused by: org.apache.calcite.plan.RelOptPlanner$CannotPlanException: Node
> [rel#123:Subset#2.ENUMERABLE.[]] could not be implemented; planner state:
> Root: rel#123:Subset#2.ENUMERABLE.[]
> Original rel:
> LogicalAggregate(subset=[rel#123:Subset#2.ENUMERABLE.[]], group=[{}],
> EXPR$0=[COVAR_SAMP($0, $1)]): rowcount = 10.0, cumulative cost = {11.25 rows,
> 0.0 cpu, 0.0 io}, id = 119
> LogicalProject(subset=[rel#118:Subset#1.NONE.[]], EMPNO=[$0], DEPTNO=[$2]):
> rowcount = 100.0, cumulative cost = {100.0 rows, 200.0 cpu, 0.0 io}, id = 117
> LogicalTableScan(subset=[rel#116:Subset#0.NONE.[]], table=[[SALES,
> EMPS]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io},
> id = 103
> Sets:
> Set#0, type: RecordType(INTEGER EMPNO, VARCHAR NAME, INTEGER DEPTNO, VARCHAR
> GENDER, VARCHAR CITY, INTEGER EMPID, INTEGER AGE, BOOLEAN SLACKER, BOOLEAN
> MANAGER, DATE JOINEDAT)
> rel#116:Subset#0.NONE.[], best=null, importance=0.7290000000000001
> rel#103:LogicalTableScan.NONE.[](table=[SALES, EMPS]),
> rowcount=100.0, cumulative cost={inf}
> rel#128:Subset#0.ENUMERABLE.[], best=rel#135,
> importance=0.36450000000000005
> rel#133:EnumerableTableScan.ENUMERABLE.[](table=[SALES,
> EMPS]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
>
> rel#135:EnumerableInterpreter.ENUMERABLE.[](input=rel#132:Subset#0.BINDABLE.[]),
> rowcount=100.0, cumulative cost={51.0 rows, 51.01 cpu, 0.0 io}
> rel#132:Subset#0.BINDABLE.[], best=rel#131,
> importance=0.36450000000000005
> rel#131:BindableTableScan.BINDABLE.[](table=[SALES, EMPS]),
> rowcount=100.0, cumulative cost={1.0 rows, 1.01 cpu, 0.0 io}
> Set#1, type: RecordType(INTEGER EMPNO, INTEGER DEPTNO)
> rel#118:Subset#1.NONE.[], best=null, importance=0.81
>
> rel#117:LogicalProject.NONE.[](input=rel#116:Subset#0.NONE.[],EMPNO=$0,DEPTNO=$2),
> rowcount=100.0, cumulative cost={inf}
> rel#125:Subset#1.ENUMERABLE.[], best=rel#134, importance=0.405
>
> rel#134:EnumerableProject.ENUMERABLE.[](input=rel#128:Subset#0.ENUMERABLE.[],EMPNO=$0,DEPTNO=$2),
> rowcount=100.0, cumulative cost={151.0 rows, 251.01 cpu, 0.0 io}
> Set#2, type: RecordType(INTEGER EXPR$0)
> rel#120:Subset#2.NONE.[], best=null, importance=0.9
>
> rel#119:LogicalAggregate.NONE.[](input=rel#118:Subset#1.NONE.[],group={},EXPR$0=COVAR_SAMP($0,
> $1)), rowcount=10.0, cumulative cost={inf}
>
> rel#127:LogicalAggregate.NONE.[](input=rel#116:Subset#0.NONE.[],group={},EXPR$0=COVAR_SAMP($0,
> $2)), rowcount=10.0, cumulative cost={inf}
> rel#123:Subset#2.ENUMERABLE.[], best=null, importance=1.0
>
> rel#124:AbstractConverter.ENUMERABLE.[](input=rel#120:Subset#2.NONE.[],convention=ENUMERABLE,sort=[]),
> rowcount=10.0, cumulative cost={inf}
> at
> org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:448)
> at
> org.apache.calcite.plan.volcano.RelSubset.buildCheapestPlan(RelSubset.java:298)
> at
> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:666)
> at org.apache.calcite.tools.Programs$5.run(Programs.java:326)
> at
> org.apache.calcite.tools.Programs$SequenceProgram.run(Programs.java:387)
> at org.apache.calcite.prepare.Prepare.optimize(Prepare.java:188)
> at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:319)
> at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:230)
> at
> org.apache.calcite.prepare.CalcitePrepareImpl.prepare2_(CalcitePrepareImpl.java:783)
> at
> org.apache.calcite.prepare.CalcitePrepareImpl.prepare_(CalcitePrepareImpl.java:642)
> at
> org.apache.calcite.prepare.CalcitePrepareImpl.prepareSql(CalcitePrepareImpl.java:612)
> at
> org.apache.calcite.jdbc.CalciteConnectionImpl.parseQuery(CalciteConnectionImpl.java:232)
> at
> org.apache.calcite.jdbc.CalciteMetaImpl.prepareAndExecute(CalciteMetaImpl.java:609)
> at
> org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:675)
> at
> org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:156)
> ... 7 more
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)