I've noticed that after I use a Window function over a DataFrame, calling map() with a function makes Spark throw a "Task not serializable" exception. This is my code:
val hc = new org.apache.spark.sql.hive.HiveContext(sc)
import hc.implicits._
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

def f(): String = "test"
case class P(name: String, surname: String)

val lag_result = lag($"name", 1).over(Window.partitionBy($"surname"))
val lista = List(P("N1", "S1"), P("N2", "S2"), P("N2", "S2"))
val df = hc.createDataFrame(sc.parallelize(lista))

df.withColumn("lag_result", lag_result).map(x => f)
// df.withColumn("lag_result", lag_result).map { case x => def f(): String = "test"; f }.collect  // This works

And this is the stack trace:

org.apache.spark.SparkException: Task not serializable
        at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
        at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
        at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
        at org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
        at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:324)
        at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:323)
        ... and more
Caused by: java.io.NotSerializableException: org.apache.spark.sql.Column
Serialization stack:
        - object not serializable (class: org.apache.spark.sql.Column, value: 'lag(name,1,null) windowspecdefinition(surname,UnspecifiedFrame))
        - field (class: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC, name: lag_result, type: class org.apache.spark.sql.Column)
        ... and more
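For reference, here is a minimal sketch of a workaround I would expect to help (untested, and it assumes the Column is only needed on the driver to build the withColumn plan): mark lag_result as @transient so it is not serialized together with the spark-shell line object when the map closure is shipped.

// Sketch (untested): reuses sc, hc, df, f and the implicits from the snippet above.
// @transient keeps the non-serializable Column out of the serialized REPL line
// object; the Column is only used on the driver to build the query plan, never
// inside the map closure itself.
@transient val lag_result = lag($"name", 1).over(Window.partitionBy($"surname"))

df.withColumn("lag_result", lag_result)
  .map(x => f)
  .collect()

The commented-out variant above works for what I believe is the same reason: defining f inside the closure means the closure no longer references the shell's wrapper object, so the Column stored in it is never pulled in. In a compiled application (outside spark-shell) I would not expect the original code to hit this.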