I was suggesting you mark the variable holding the HiveContext '@transient', since the Scala compiler is not correctly propagating the annotation through the tuple extraction. This is only a workaround; we can also remove the tuple extraction.
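Roughly, removing the tuple extraction could look something like the sketch below (names follow the diff quoted further down; newState() is just a placeholder for whatever the elided .orElse { ... } branch constructs):

    // Sketch only, inside HiveContext: keep two separate @transient lazy vals
    // instead of destructuring one lazy val into a tuple, so scalac has no
    // hidden, non-transient Tuple2 field (the "x$3" in the trace below) to emit.
    @transient protected[hive] lazy val sessionState: SessionState =
      Option(SessionState.get()).getOrElse(newState())  // newState() is a placeholder

    @transient protected[hive] lazy val hiveconf: HiveConf =
      sessionState.getConf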
On Mon, Feb 16, 2015 at 10:47 AM, Reynold Xin <r...@databricks.com> wrote:

> Michael - it is already transient. This should probably be considered a bug
> in the Scala compiler, but we can easily work around it by removing the use
> of destructuring binding.
>
> On Mon, Feb 16, 2015 at 10:41 AM, Michael Armbrust <mich...@databricks.com> wrote:
>
>> I'd suggest marking the HiveContext as @transient since it's not valid to
>> use it on the slaves anyway.
>>
>> On Mon, Feb 16, 2015 at 4:27 AM, Haopu Wang <hw...@qilinsoft.com> wrote:
>>
>> > While investigating this issue (described at the end of this email), I took a
>> > look at HiveContext's code and found this change
>> > (https://github.com/apache/spark/commit/64945f868443fbc59cb34b34c16d782dda0fb63d#diff-ff50aea397a607b79df9bec6f2a841db):
>> >
>> > - @transient protected[hive] lazy val hiveconf = new HiveConf(classOf[SessionState])
>> > - @transient protected[hive] lazy val sessionState = {
>> > -   val ss = new SessionState(hiveconf)
>> > -   setConf(hiveconf.getAllProperties)  // Have SQLConf pick up the initial set of HiveConf.
>> > -   ss
>> > - }
>> > + @transient protected[hive] lazy val (hiveconf, sessionState) =
>> > +   Option(SessionState.get())
>> > +     .orElse {
>> >
>> > With this change, the Scala compiler always generates a Tuple2 field in
>> > HiveContext, as shown below:
>> >
>> >   private Tuple2 x$3;
>> >   private transient OutputStream outputBuffer;
>> >   private transient HiveConf hiveconf;
>> >   private transient SessionState sessionState;
>> >   private transient HiveMetastoreCatalog catalog;
>> >
>> > That "x$3" field holds a HiveConf object, which cannot be serialized. So
>> > can you suggest how to resolve this issue? Thank you very much!
>> >
>> > ================================
>> >
>> > I have a streaming application which registers a temp table on a
>> > HiveContext for each batch duration.
>> >
>> > The application runs well in Spark 1.1.0, but I get the error below from
>> > 1.1.1.
>> >
>> > Do you have any suggestions to resolve it? Thank you!
>> >
>> > java.io.NotSerializableException: org.apache.hadoop.hive.conf.HiveConf
>> >   - field (class "scala.Tuple2", name: "_1", type: "class java.lang.Object")
>> >   - object (class "scala.Tuple2", (Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@2158ce23,org.apache.hadoop.hive.ql.session.SessionState@49b6eef9))
>> >   - field (class "org.apache.spark.sql.hive.HiveContext", name: "x$3", type: "class scala.Tuple2")
>> >   - object (class "org.apache.spark.sql.hive.HiveContext", org.apache.spark.sql.hive.HiveContext@4e6e66a4)
>> >   - field (class "example.BaseQueryableDStream$$anonfun$registerTempTable$2", name: "sqlContext$1", type: "class org.apache.spark.sql.SQLContext")
>> >   - object (class "example.BaseQueryableDStream$$anonfun$registerTempTable$2", <function1>)
>> >   - field (class "org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1", name: "foreachFunc$1", type: "interface scala.Function1")
>> >   - object (class "org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1", <function2>)
>> >   - field (class "org.apache.spark.streaming.dstream.ForEachDStream", name: "org$apache$spark$streaming$dstream$ForEachDStream$$foreachFunc", type: "interface scala.Function2")
>> >   - object (class "org.apache.spark.streaming.dstream.ForEachDStream", org.apache.spark.streaming.dstream.ForEachDStream@5ccbdc20)
>> >   - element of array (index: 0)
>> >   - array (class "[Ljava.lang.Object;", size: 16)
>> >   - field (class "scala.collection.mutable.ArrayBuffer", name: "array", type: "class [Ljava.lang.Object;")
>> >   - object (class "scala.collection.mutable.ArrayBuffer", ArrayBuffer(org.apache.spark.streaming.dstream.ForEachDStream@5ccbdc20))
>> >   - field (class "org.apache.spark.streaming.DStreamGraph", name: "outputStreams", type: "class scala.collection.mutable.ArrayBuffer")
>> >   - custom writeObject data (class "org.apache.spark.streaming.DStreamGraph")
>> >   - object (class "org.apache.spark.streaming.DStreamGraph", org.apache.spark.streaming.DStreamGraph@776ae7da)
>> >   - field (class "org.apache.spark.streaming.Checkpoint", name: "graph", type: "class org.apache.spark.streaming.DStreamGraph")
>> >   - root object (class "org.apache.spark.streaming.Checkpoint", org.apache.spark.streaming.Checkpoint@5eade065)
>> >   at java.io.ObjectOutputStream.writeObject0(Unknown Source)
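For reference, a minimal, standalone sketch (not Spark code) of the pattern that produces such a hidden tuple field; whether @transient actually fails to propagate to it depends on the Scala compiler version in use:

    import java.io.{ByteArrayOutputStream, ObjectOutputStream}

    class NotSerializablePart  // stands in for HiveConf
    class Holder extends Serializable {
      // scalac desugars this into two lazy vals plus a hidden tuple field
      // (e.g. x$1); on affected compiler versions that hidden field does not
      // receive the @transient annotation even though the lazy vals do.
      @transient lazy val (part, label) = (new NotSerializablePart, "ok")
    }

    object Repro {
      def main(args: Array[String]): Unit = {
        val h = new Holder
        h.part  // force the lazy val so the hidden tuple field is populated
        // On affected compiler versions this throws NotSerializableException
        // for NotSerializablePart, mirroring the HiveConf failure above.
        new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(h)
      }
    }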