[ https://issues.apache.org/jira/browse/SPARK-18395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653351#comment-15653351 ]
Apache Spark commented on SPARK-18395:
--------------------------------------
User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/15837
> Evaluate common subexpression like lazy variable with a function approach
> -------------------------------------------------------------------------
>
> Key: SPARK-18395
> URL: https://issues.apache.org/jira/browse/SPARK-18395
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Liang-Chi Hsieh
>
> As per the discussion in PR 15807, we need to change how subexpression
> elimination works.
> In the current approach, common subexpressions are evaluated whether or not
> they are actually used later. E.g., in the following generated code,
> {{subexpr2}} is evaluated even when only the {{if}} branch is run.
> {code}
> if (isNull(subexpr)) {
>   ...
> } else {
>   AssertNotNull(subexpr)  // subexpr2
>   ...
>   SomeExpr(AssertNotNull(subexpr))  // SomeExpr(subexpr2)
> }
> {code}
> Besides a possible performance regression, an expression like
> {{AssertNotNull}} can throw an exception, so wrongly evaluating {{subexpr2}}
> will throw an exception unexpectedly.
> With this patch, common subexpressions are not evaluated until they are
> used. We create a function for each common subexpression which evaluates it
> and stores the result in a member variable. An initialization status
> variable records whether the subexpression has been evaluated.
> Thus, when an expression that uses the subexpression is about to be
> evaluated, we check whether the subexpression is initialized: if yes, we
> return the stored result directly; if no, we call the function to evaluate it.
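
For illustration only, the lazy scheme described in the quoted text could look roughly like the hand-written Java sketch below. This is not the code Spark generates and not the content of the pull request. Every name in it (LazySubexprSketch, evalSubexpr2, subexpr2Initialized, evalCount) is hypothetical, Objects.requireNonNull merely stands in for AssertNotNull, and toUpperCase stands in for SomeExpr.

```java
import java.util.Objects;

// Hypothetical stand-in for generated code; all names here are assumptions.
final class LazySubexprSketch {
    // Input value that the common subexpression is computed from.
    private final String subexpr;

    // Member variable holding the stored result of subexpr2.
    private String subexpr2Value;
    // Initialization status: true once subexpr2 has been evaluated.
    private boolean subexpr2Initialized = false;
    // Counts evaluations, only to make the at-most-once behaviour visible.
    public int evalCount = 0;

    LazySubexprSketch(String subexpr) {
        this.subexpr = subexpr;
    }

    // Function created for the common subexpression: evaluates it once
    // and stores the result as a member variable.
    private void evalSubexpr2() {
        evalCount++;
        // Stand-in for AssertNotNull(subexpr): throws if subexpr is null.
        subexpr2Value = Objects.requireNonNull(subexpr, "subexpr is null");
        subexpr2Initialized = true;
    }

    // Every user of subexpr2 goes through this check.
    private String subexpr2() {
        if (!subexpr2Initialized) {
            evalSubexpr2();
        }
        return subexpr2Value;
    }

    public String eval() {
        if (subexpr == null) {
            // subexpr2 is never touched here, so nothing can throw.
            return "null branch";
        } else {
            String first = subexpr2();                // evaluates subexpr2
            String second = subexpr2().toUpperCase(); // reuses stored result
            return first + "/" + second;
        }
    }

    public static void main(String[] args) {
        LazySubexprSketch nullRow = new LazySubexprSketch(null);
        System.out.println(nullRow.eval() + " evals=" + nullRow.evalCount);
        LazySubexprSketch row = new LazySubexprSketch("a");
        System.out.println(row.eval() + " evals=" + row.evalCount);
    }
}
```

In this sketch the null input takes the first branch without ever calling evalSubexpr2, so the null check cannot fire, while the non-null input evaluates the subexpression once and reuses the stored value on the second use. Running main prints "null branch evals=0" followed by "a/A evals=1".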