Liang-Chi Hsieh created SPARK-18395:
---------------------------------------

             Summary: Evaluate common subexpression like lazy variable with a 
function approach
                 Key: SPARK-18395
                 URL: https://issues.apache.org/jira/browse/SPARK-18395
             Project: Spark
          Issue Type: Improvement
          Components: SQL
            Reporter: Liang-Chi Hsieh


As discussed in PR 15807, we need to change how subexpression elimination 
works.

In the current approach, common subexpressions are evaluated regardless of 
whether they are actually used later. E.g., in the following generated code, 
{{subexpr2}} is evaluated even when only the {{if}} branch is run.

{code}
    if (isNull(subexpr)) {
      ...
    } else {
      AssertNotNull(subexpr)  // subexpr2
      ...
      SomeExpr(AssertNotNull(subexpr)) // SomeExpr(subexpr2)
    }
{code}

Besides a possible performance regression, an expression such as 
{{AssertNotNull}} can throw an exception. Evaluating {{subexpr2}} when it is 
not needed can therefore throw an exception unexpectedly.

With this patch, common subexpressions are not evaluated until they are used. 
We create a function for each common subexpression that evaluates it and 
stores the result in a member variable. An initialization status variable 
records whether the subexpression has been evaluated.

Thus, when an expression that uses the subexpression is about to be 
evaluated, we check whether the subexpression is initialized. If it is, we 
return the stored result directly; if not, we call the function to evaluate it.
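
The scheme described above could look roughly like the following hand-written 
Java sketch. It is not the code Spark actually generates: the class name, the 
field and method names, the {{evalCount}} counter, and the per-row reset of 
the status flag are all assumptions made for illustration.

```java
// Illustrative sketch only; all names here are hypothetical, not Spark's codegen output.
class LazySubexprSketch {
    // Member variables: cached result plus initialization status for subexpr2.
    private boolean subexpr2Initialized = false;
    private int subexpr2Value;

    // Counts real evaluations, only so the laziness can be observed.
    int evalCount = 0;

    // The current input value, standing in for "subexpr" in the example.
    private Integer input;

    // Stand-in for AssertNotNull: throws when the value is null.
    private static int assertNotNull(Integer v) {
        if (v == null) {
            throw new NullPointerException("Null value appeared in non-nullable field");
        }
        return v;
    }

    // The function created for the common subexpression: evaluate and store.
    private void evalSubexpr2() {
        evalCount++;
        subexpr2Value = assertNotNull(input);
        subexpr2Initialized = true;
    }

    // Accessor used by consumers: evaluate on first use, then reuse the result.
    private int subexpr2() {
        if (!subexpr2Initialized) {
            evalSubexpr2();
        }
        return subexpr2Value;
    }

    // Processes one row. Resetting the flag per row is an assumption of this sketch.
    int apply(Integer row) {
        input = row;
        subexpr2Initialized = false;
        if (row == null) {
            // The "if" branch never touches subexpr2, so AssertNotNull never runs.
            return -1;
        } else {
            int first = subexpr2();       // evaluates and caches subexpr2
            int second = subexpr2() + 1;  // SomeExpr(subexpr2): reuses the cached value
            return first + second;
        }
    }
}
```

In this sketch, calling {{apply(null)}} takes the {{if}} branch and never 
invokes the {{AssertNotNull}} stand-in, while calling {{apply}} with a 
non-null value evaluates the subexpression once even though it is used twice.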




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
