RE: Problems with abstract syntax tree

Alexander Shoshin Tue, 01 Nov 2016 08:50:03 -0700

Julian, thank you for help.

I had a wrong picture of NULL values processing. So, it looks like there is 
some problem in my planner rules.
As for the AST, I was confused by the wrong Flink "explain()" function 
description :)



Regards,
Alexander

-----Original Message-----
From: Julian Hyde [mailto:[email protected]] 
Sent: Monday, October 31, 2016 10:43 PM
To: [email protected]
Subject: Re: Problems with abstract syntax tree

The behavior of NOT IN in SQL is complicated when there are NULL values around. 
In particular, if one "word" value from the sub-query is null, then the outer 
query must return 0 rows. (Why? Because "word NOT IN ('foo', 'bar' null)" would 
evaluate to UNKNOWN for every row.)

It is valid to deduce that "word" in the sub-query is never null, because of 
the "WHERE word = 'hello'" condition. I would have hoped that a constant 
reduction could do that, and then maybe the CASE expression can be simplified.

By the way, to be pedantic, what we are talking about here is the RelNode tree, 
the relational algebra, which comes out of the SqlToRelConverter. The AST is 
the SqlNode tree, which comes out of the parser and goes into the 
SqlToRelConverter.

On Mon, Oct 31, 2016 at 8:46 AM, Alexander Shoshin <[email protected]> 
wrote:
> Hello, everybody.
>
> Trying to resolve an Apache Flink issue I got some troubles with Calcite. Can 
> you help me to understand is there a problem in Calcite or just in wrong 
> settings passed to Calcite functions?
>
> I have a simple table "Words" with one column named "word" and a query with 
> NOT IN operator:
> val query = "SELECT word FROM Words WHERE word NOT IN (SELECT word FROM Words 
> WHERE word = 'hello')"
>
> This query parsed by org.apache.calcite.sql.parser.SqlParser.parseStmt() and 
> then transformed to a relational tree by 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(...).
>
> As a result I see the following abstract syntax tree
> LogicalProject(word=[$0])
>   LogicalFilter(condition=[NOT(CASE(=($1, 0), false, IS NOT NULL($5), true, 
> IS NULL($3), null, <($2, $1), null, false))])
>     LogicalJoin(condition=[=($3, $4)], joinType=[left])
>       LogicalProject($f0=[$0], $f1=[$1], $f2=[$2], $f3=[$0])
>         LogicalJoin(condition=[true], joinType=[inner])
>           EnumerableTableScan(table=[[Words]])
>           LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
>             LogicalProject($f0=[$0], $f1=[true])
>               LogicalProject(word=[$0])
>                 LogicalFilter(condition=[=($0, 'hello')])
>                   EnumerableTableScan(table=[[Words]])
>       LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
>         LogicalProject($f0=[$0], $f1=[true])
>           LogicalProject(word=[$0])
>             LogicalFilter(condition=[=($0, 'hello')])
>               EnumerableTableScan(table=[[Words]])
>
> which fails later during query plan optimization (while calling 
> org.apache.calcite.tools.Programs.RuleSetProgram.run()).
>
> I think it might be because of a very complex abstract syntax tree generated 
> by Calcite. Shouldn't it be more simple? This one looks good for me:
> LogicalProject(word=[$0])
>   LogicalFilter(condition=[IS NULL($2)])
>     LogicalJoin(condition=[=($0, $1)], joinType=[left])
>       EnumerableTableScan(table=[[Words]])
>       LogicalProject($f0=[$0], $f1=[true])
>         LogicalProject(word=[$0])
>           LogicalFilter(condition=[=($0, 'hello')])
>             EnumerableTableScan(table=[[Words]])
>
> And when I write a query using LEFT OUTER JOIN to receive this syntax tree - 
> the optimization works fine. And the query execution result is the same as 
> must be in case of using NOT IN. So am I wrong with a supposition about bad 
> abstract syntax tree or not? I will be glad to receive any comments.
>
> Regards,
> Alexander

RE: Problems with abstract syntax tree

Reply via email to