Thanks for the help. I ran the script again with "bin/spark-shell --conf spark.serializer=org.apache.spark.serializer.KryoSerializer".
In the console I can see:

scala> sc.getConf.getAll.foreach(println)
(spark.tachyonStore.folderName,spark-eaabe986-03cb-41bd-bde5-993c7db3f048)
(spark.driver.host,10.1.51.127)
(spark.executor.extraJavaOptions,-Dsun.io.serialization.extendedDebugInfo=true)
(spark.serializer,org.apache.spark.serializer.KryoSerializer)
(spark.repl.class.uri,http://10.1.51.127:51319)
(spark.app.name,Spark shell)
(spark.driver.extraJavaOptions,-Dsun.io.serialization.extendedDebugInfo=true)
(spark.fileserver.uri,http://10.1.51.127:51322)
(spark.jars,)
(spark.driver.port,51320)
(spark.master,local[*])

But it fails again with the same error.
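If I read the extendedDebugInfo chain below correctly, the filter closure keeps an $outer reference to the REPL's per-line wrapper object ($iwC), and that wrapper holds hc as a field, so serializing the closure drags the HiveContext along with it. Here is a Spark-free sketch of the same capture pattern; NotSer, ReplLineWrapper and CaptureDemo are illustrative stand-ins I made up, not real Spark or REPL classes:

    import java.io.{ByteArrayOutputStream, ObjectOutputStream}

    // Stand-in for HiveContext: not Serializable.
    class NotSer

    // Stand-in for the $iwC wrapper the shell generates: every val typed
    // at the prompt becomes a field of such a wrapper object.
    class ReplLineWrapper extends Serializable {
      val hc = new NotSer
      // Referencing the field from a closure captures the whole wrapper,
      // and with it the non-serializable hc.
      val f: Int => Boolean = x => hc != null && x > 0
    }

    object CaptureDemo {
      def serialize(o: AnyRef): Unit =
        new ObjectOutputStream(new ByteArrayOutputStream).writeObject(o)

      def main(args: Array[String]) {
        // Compiled-program case: hc is a local that the closure never touches.
        val hcLocal = new NotSer
        val g: Int => Boolean = x => x > 0
        serialize(g)                        // fine
        serialize(new ReplLineWrapper().f)  // java.io.NotSerializableException: NotSer
      }
    }

If that is what happens, it would also explain why the compiled TestBroadcast2 below works: there hc is a local variable inside main(), not a field of anything the closure can reach.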
On Aug 20, 2014, at 15:59, Fengyun RAO <raofeng...@gmail.com> wrote:

> try:
>
> sparkConf.set("spark.serializer",
>   "org.apache.spark.serializer.KryoSerializer")
>
>
> 2014-08-20 14:27 GMT+08:00 田毅 <tia...@asiainfo.com>:
>
> Hi everyone!
>
> I got an exception when I ran my script with spark-shell.
>
> I added
>
> SPARK_JAVA_OPTS="-Dsun.io.serialization.extendedDebugInfo=true"
>
> in spark-env.sh to show the following stack:
>
> org.apache.spark.SparkException: Task not serializable
>   at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
>   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
>   at org.apache.spark.SparkContext.clean(SparkContext.scala:1242)
>   at org.apache.spark.rdd.RDD.filter(RDD.scala:282)
>   at org.apache.spark.sql.SchemaRDD.filter(SchemaRDD.scala:460)
>   at $iwC$$iwC$$iwC$$iwC.<init>(<console>:18)
>   at $iwC$$iwC$$iwC.<init>(<console>:23)
>   at $iwC$$iwC.<init>(<console>:25)
>   at $iwC.<init>(<console>:27)
>   at <init>(<console>:29)
>   at .<init>(<console>:33)
>   at .<clinit>(<console>)
>   at .<init>(<console>:7)
>   at .<clinit>(<console>)
>   at $print(<console>)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:789)
>   at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1062)
>   ……
> Caused by: java.io.NotSerializableException: org.apache.spark.sql.hive.HiveContext$$anon$3
>   - field (class "org.apache.spark.sql.hive.HiveContext", name: "functionRegistry", type: "class org.apache.spark.sql.hive.HiveFunctionRegistry")
>   - object (class "org.apache.spark.sql.hive.HiveContext", org.apache.spark.sql.hive.HiveContext@4648e685)
>   - field (class "$iwC$$iwC$$iwC$$iwC", name: "hc", type: "class org.apache.spark.sql.hive.HiveContext")
>   - object (class "$iwC$$iwC$$iwC$$iwC", $iwC$$iwC$$iwC$$iwC@23d652ef)
>   - field (class "$iwC$$iwC$$iwC", name: "$iw", type: "class $iwC$$iwC$$iwC$$iwC")
>   - object (class "$iwC$$iwC$$iwC", $iwC$$iwC$$iwC@71cc14f1)
>   - field (class "$iwC$$iwC", name: "$iw", type: "class $iwC$$iwC$$iwC")
>   - object (class "$iwC$$iwC", $iwC$$iwC@74eca89e)
>   - field (class "$iwC", name: "$iw", type: "class $iwC$$iwC")
>   - object (class "$iwC", $iwC@685c4cc4)
>   - field (class "$line9.$read", name: "$iw", type: "class $iwC")
>   - object (class "$line9.$read", $line9.$read@519f9aae)
>   - field (class "$iwC$$iwC$$iwC", name: "$VAL7", type: "class $line9.$read")
>   - object (class "$iwC$$iwC$$iwC", $iwC$$iwC$$iwC@4b996858)
>   - field (class "$iwC$$iwC$$iwC$$iwC", name: "$outer", type: "class $iwC$$iwC$$iwC")
>   - object (class "$iwC$$iwC$$iwC$$iwC", $iwC$$iwC$$iwC$$iwC@31d646d4)
>   - field (class "$iwC$$iwC$$iwC$$iwC$$anonfun$1", name: "$outer", type: "class $iwC$$iwC$$iwC$$iwC")
>   - root object (class "$iwC$$iwC$$iwC$$iwC$$anonfun$1", <function1>)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>   at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528)
>
> I wrote some simple scripts to reproduce the problem.
>
> Case 1:
>
> val barr1 = sc.broadcast("test")
> val sret = sc.parallelize(1 to 10, 2)
> val ret = sret.filter(row => !barr1.equals("test"))
> ret.collect.foreach(println)
>
> This works fine in local mode and yarn-client mode.
>
> Case 2:
>
> val barr1 = sc.broadcast("test")
> val hc = new org.apache.spark.sql.hive.HiveContext(sc)
> val sret = hc.sql("show tables")
> val ret = sret.filter(row => !barr1.equals("test"))
> ret.collect.foreach(println)
>
> This throws java.io.NotSerializableException: org.apache.spark.sql.hive.HiveContext
> in both local mode and yarn-client mode.
>
> But the same code works fine if I put it in a Scala file and run it from IntelliJ IDEA:
>
> import org.apache.spark.{SparkConf, SparkContext}
>
> object TestBroadcast2 {
>   def main(args: Array[String]) {
>     val sparkConf = new SparkConf().setAppName("Broadcast Test").setMaster("local[3]")
>     val sc = new SparkContext(sparkConf)
>     val barr1 = sc.broadcast("test")
>     val hc = new org.apache.spark.sql.hive.HiveContext(sc)
>     val sret = hc.sql("show tables")
>     val ret = sret.filter(row => !barr1.equals("test"))
>     ret.collect.foreach(println)
>   }
> }
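Two things I plan to try next, based on the chain above (untested on my side, and the val names are just for the sketch): mark the HiveContext @transient in the shell so the wrapper field is skipped during serialization, and compare against the broadcast's value rather than the Broadcast handle itself, since barr1.equals("test") compares a Broadcast[String] to a String and can never be true:

    @transient val hc = new org.apache.spark.sql.hive.HiveContext(sc)
    val barr1 = sc.broadcast("test")
    val sret = hc.sql("show tables")
    val needle = barr1.value   // plain String, serializes cleanly
    val ret = sret.filter(row => !needle.equals("test"))
    ret.collect.foreach(println)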