Hi,

I doubt the broadcast variable is your problem, since you are seeing:

org.apache.spark.SparkException: Task not serializable
Caused by: java.io.NotSerializableException: org.apache.spark.sql.hive.HiveContext$$anon$3

We have a knowledge base article that explains why this happens - it's a
very common error that I see users hit on the mailing list:

https://github.com/databricks/spark-knowledgebase/blob/master/troubleshooting/javaionotserializableexception.md
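
In short, the failing pattern looks something like this (a hypothetical
sketch, not your exact code):

    // Anti-pattern: hc lives on the driver, so referencing it inside an RDD
    // transformation forces Spark to serialize the HiveContext with the task.
    rdd.map { id =>
      hc.sql("select * from t where id = " + id)  // throws NotSerializableException
    }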

Are you using the HiveContext within a transformation that is called on an
RDD? That will definitely create a problem. (Note that setting
spark.serializer to Kryo won't make this go away either: that setting
governs data serialization, while task closures are still serialized with
Java serialization - which is exactly what ClosureCleaner is checking in
your stack trace.)
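
If so, the usual fix from that article is to copy whatever the closure
actually needs into plain local vals before the transformation, so only
those values are serialized. For your case 2 below, that would look roughly
like this (an untested sketch):

    // Copy the broadcast handle into a local val inside a block, so the closure
    // captures only `b` and not the shell's line wrapper that holds `hc`.
    val ret = {
      val b = barr1
      sret.filter(row => !b.value.equals("test"))
    }

This is also why your standalone program works: there, barr1 is a plain
local of main() and the closure captures it directly, whereas spark-shell
wraps every line in an object, and the closure captures that wrapper -
which in your session happens to hold the HiveContext field.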

-Vida


On Wed, Aug 20, 2014 at 1:20 AM, tianyi <tia...@asiainfo.com> wrote:

> Thanks for the help.
>
> I ran this script again with "bin/spark-shell --conf
> spark.serializer=org.apache.spark.serializer.KryoSerializer"
>
> In the console, I can see:
>
> scala> sc.getConf.getAll.foreach(println)
> (spark.tachyonStore.folderName,spark-eaabe986-03cb-41bd-bde5-993c7db3f048)
> (spark.driver.host,10.1.51.127)
> (spark.executor.extraJavaOptions,-Dsun.io.serialization.extendedDebugInfo=true)
> (spark.serializer,org.apache.spark.serializer.KryoSerializer)
> (spark.repl.class.uri,http://10.1.51.127:51319)
> (spark.app.name,Spark shell)
> (spark.driver.extraJavaOptions,-Dsun.io.serialization.extendedDebugInfo=true)
> (spark.fileserver.uri,http://10.1.51.127:51322)
> (spark.jars,)
> (spark.driver.port,51320)
> (spark.master,local[*])
>
> But it fails again with the same error.
>
> On Aug 20, 2014, at 15:59, Fengyun RAO <raofeng...@gmail.com> wrote:
>
> try:
>
> sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>
>
> 2014-08-20 14:27 GMT+08:00 田毅 <tia...@asiainfo.com>:
>
>> Hi everyone!
>>
>> I got an exception when I ran my script with spark-shell:
>>
>> I added
>>
>> SPARK_JAVA_OPTS="-Dsun.io.serialization.extendedDebugInfo=true"
>>
>> in spark-env.sh to show the following stack:
>>
>>
>> org.apache.spark.SparkException: Task not serializable
>> at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
>> at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
>> at org.apache.spark.SparkContext.clean(SparkContext.scala:1242)
>> at org.apache.spark.rdd.RDD.filter(RDD.scala:282)
>> at org.apache.spark.sql.SchemaRDD.filter(SchemaRDD.scala:460)
>> at $iwC$$iwC$$iwC$$iwC.<init>(<console>:18)
>> at $iwC$$iwC$$iwC.<init>(<console>:23)
>> at $iwC$$iwC.<init>(<console>:25)
>> at $iwC.<init>(<console>:27)
>> at <init>(<console>:29)
>> at .<init>(<console>:33)
>> at .<clinit>(<console>)
>> at .<init>(<console>:7)
>> at .<clinit>(<console>)
>> at $print(<console>)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:601)
>> at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:789)
>> at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1062)
>> ……
>> Caused by: java.io.NotSerializableException: org.apache.spark.sql.hive.HiveContext$$anon$3
>> - field (class "org.apache.spark.sql.hive.HiveContext", name: "functionRegistry", type: "class org.apache.spark.sql.hive.HiveFunctionRegistry")
>> - object (class "org.apache.spark.sql.hive.HiveContext", org.apache.spark.sql.hive.HiveContext@4648e685)
>> - field (class "$iwC$$iwC$$iwC$$iwC", name: "hc", type: "class org.apache.spark.sql.hive.HiveContext")
>> - object (class "$iwC$$iwC$$iwC$$iwC", $iwC$$iwC$$iwC$$iwC@23d652ef)
>> - field (class "$iwC$$iwC$$iwC", name: "$iw", type: "class $iwC$$iwC$$iwC$$iwC")
>> - object (class "$iwC$$iwC$$iwC", $iwC$$iwC$$iwC@71cc14f1)
>> - field (class "$iwC$$iwC", name: "$iw", type: "class $iwC$$iwC$$iwC")
>> - object (class "$iwC$$iwC", $iwC$$iwC@74eca89e)
>> - field (class "$iwC", name: "$iw", type: "class $iwC$$iwC")
>> - object (class "$iwC", $iwC@685c4cc4)
>> - field (class "$line9.$read", name: "$iw", type: "class $iwC")
>> - object (class "$line9.$read", $line9.$read@519f9aae)
>> - field (class "$iwC$$iwC$$iwC", name: "$VAL7", type: "class $line9.$read")
>> - object (class "$iwC$$iwC$$iwC", $iwC$$iwC$$iwC@4b996858)
>> - field (class "$iwC$$iwC$$iwC$$iwC", name: "$outer", type: "class $iwC$$iwC$$iwC")
>> - object (class "$iwC$$iwC$$iwC$$iwC", $iwC$$iwC$$iwC$$iwC@31d646d4)
>> - field (class "$iwC$$iwC$$iwC$$iwC$$anonfun$1", name: "$outer", type: "class $iwC$$iwC$$iwC$$iwC")
>> - root object (class "$iwC$$iwC$$iwC$$iwC$$anonfun$1", <function1>)
>> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>> at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528)
>>
>> I wrote a simple script to reproduce this problem.
>>
>> case 1 :
>>     val barr1 = sc.broadcast("test")
>>     val sret = sc.parallelize(1 to 10, 2)
>>     val ret = sret.filter(row => !barr1.value.equals("test"))
>>     ret.collect.foreach(println)
>>
>> It works fine in both local mode and yarn-client mode.
>>
>> case 2 :
>>     val barr1 = sc.broadcast("test")
>>     val hc = new org.apache.spark.sql.hive.HiveContext(sc)
>>     val sret = hc.sql("show tables")
>>     val ret = sret.filter(row => !barr1.value.equals("test"))
>>     ret.collect.foreach(println)
>>
>> It throws java.io.NotSerializableException:
>> org.apache.spark.sql.hive.HiveContext in both local mode and yarn-client
>> mode.
>>
>> But it works fine if I write the same code in a Scala file and run it in
>> IntelliJ IDEA.
>>
>> import org.apache.spark.{SparkConf, SparkContext}
>>
>> object TestBroadcast2 {
>>   def main(args: Array[String]) {
>>     val sparkConf = new SparkConf().setAppName("Broadcast Test").setMaster("local[3]")
>>     val sc = new SparkContext(sparkConf)
>>     val barr1 = sc.broadcast("test")
>>     val hc = new org.apache.spark.sql.hive.HiveContext(sc)
>>     val sret = hc.sql("show tables")
>>     val ret = sret.filter(row => !barr1.value.equals("test"))
>>     ret.collect.foreach(println)
>>   }
>> }
>>
>
>
