I see. Thanks for the clarification. It's not a a big issue but I am
surprised my UDF can be executed in planning phase. If my UDF is doing
something expensive it could get weird.



On Fri, Jun 8, 2018 at 3:44 PM, Reynold Xin <r...@databricks.com> wrote:

> But from the user's perspective, optimization is not run right? So it is
> still lazy.
>
>
> On Fri, Jun 8, 2018 at 12:35 PM Li Jin <ice.xell...@gmail.com> wrote:
>
>> Hi All,
>>
>> Sorry for the long email title. I am a bit surprised to find that the
>> current optimizer rule "ConvertToLocalRelation" causes expressions to be
>> eager-evaluated in planning phase, this can be demonstrated with the
>> following code:
>>
>> scala> val myUDF = udf((x: String) => { println("UDF evaled"); "result" })
>>
>> myUDF: org.apache.spark.sql.expressions.UserDefinedFunction =
>> UserDefinedFunction(<function1>,StringType,Some(List(StringType)))
>>
>>
>> scala> val df = Seq(("test", 1)).toDF("s", "i").select(myUDF($"s"))
>>
>> df: org.apache.spark.sql.DataFrame = [UDF(s): string]
>>
>>
>> scala> println(df.queryExecution.optimizedPlan)
>>
>> UDF evaled
>>
>> LocalRelation [UDF(s)#9]
>>
>>  This is somewhat unexpected to me because of Spark's lazy execution
>> model.
>>
>> I am wondering if this behavior is by design?
>>
>> Thanks!
>> Li
>>
>>
>>

Reply via email to