Liang-Chi Hsieh created SPARK-18395:
---------------------------------------
Summary: Evaluate common subexpression like lazy variable with a
function approach
Key: SPARK-18395
URL: https://issues.apache.org/jira/browse/SPARK-18395
Project: Spark
Issue Type: Improvement
Components: SQL
Reporter: Liang-Chi Hsieh
As per the discussion in PR 15807, we need to change how subexpression
elimination works.
In the current approach, common subexpressions are evaluated regardless of
whether they are actually used later. E.g., in the following generated code,
{{subexpr2}} is evaluated even when only the {{if}} branch runs.
{code}
if (isNull(subexpr)) {
...
} else {
AssertNotNull(subexpr) // subexpr2
....
SomeExpr(AssertNotNull(subexpr)) // SomeExpr(subexpr2)
}
{code}
Besides a possible performance regression, an expression like {{AssertNotNull}}
can throw an exception, so wrongly evaluating {{subexpr2}} will throw an
exception unexpectedly.
With this patch, common subexpressions are not evaluated until they are
used. We create a function for each common subexpression that evaluates it and
stores the result in a member variable. An initialization status variable
records whether the subexpression has been evaluated.
Thus, when an expression that uses the subexpression is about to be evaluated,
we check whether the subexpression is initialized: if so, we return the stored
result directly; if not, we call the function to evaluate it.
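The lazy scheme described above could look roughly like the following sketch. It is not the code the patch generates: the class {{LazySubexpr}}, the helper {{assertNotNull}}, and the names {{subexpr2Initialized}}, {{subexpr2Value}}, and {{evalCount}} are all hypothetical, chosen only to mirror the example earlier in this description.

```java
/** Hypothetical sketch: one common subexpression, evaluated lazily and cached. */
class LazySubexpr {
    // Hypothetical stand-in for AssertNotNull: throws when the input is null.
    static Object assertNotNull(Object value) {
        if (value == null) {
            throw new NullPointerException("unexpected null value");
        }
        return value;
    }

    private final Object input;

    // Initialization status and cached result for the common subexpression.
    private boolean subexpr2Initialized = false;
    private Object subexpr2Value;

    // Counts real evaluations; present only so the sketch can be checked.
    int evalCount = 0;

    LazySubexpr(Object input) {
        this.input = input;
    }

    // The per-subexpression function: evaluates once and stores the result.
    private void evalSubexpr2() {
        evalCount++;
        subexpr2Value = assertNotNull(input);
        subexpr2Initialized = true;
    }

    // Accessor used by every expression that depends on the subexpression.
    private Object subexpr2() {
        if (!subexpr2Initialized) {
            evalSubexpr2();
        }
        return subexpr2Value;
    }

    String eval() {
        if (input == null) {
            // subexpr2 is never touched here, so nothing can throw.
            return "null branch";
        } else {
            Object first = subexpr2();   // evaluates and caches
            Object second = subexpr2();  // reuses the cached value
            return first + "/" + second;
        }
    }
}
```

With a null input, {{eval()}} takes the {{if}} branch and the subexpression is never evaluated, so no exception is thrown; with a non-null input, the subexpression is evaluated once even though it is used twice. Code that processes many rows would presumably also need to reset the status variable for each row, which this sketch leaves out.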