Rewriting the paragraph to use the SparkSession fixed most of the class issues. The remaining ones were resolved by upgrading to SPARK2-2.0.0.cloudera2-1, which was released on 02-24-2017 (http://community.cloudera.com/t5/Community-News-Release/ANNOUNCE-Spark-2-0-Release-2/m-p/51464#M161).
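For anyone who hits this later, here is a minimal sketch of that kind of rewrite (illustrative only, not the exact paragraph from our notebook). It assumes Zeppelin's default binding of the SparkSession as 'spark' and the same sample path as in the thread below, and it uses the built-in split column function so no anonymous Scala function has to be shipped to the executors:

%spark
// Illustrative sketch only: keep the work on the SparkSession/DataFrame API
// instead of sc/RDD lambdas, so no $anonfun class needs to load on executors.
import org.apache.spark.sql.functions.{col, split}

val taxonomy = spark.read.text("/user/user1/data/")
  .select(split(col("value"), "\t").as("cols"))

taxonomy.first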
Thanks,
Rob

On Tue, Mar 7, 2017 at 4:49 PM, Jianfeng (Jeff) Zhang <jzh...@hortonworks.com> wrote:

> >>> It appears that during execution time on the yarn hosts, the native
> CDH spark1.5 jars are loaded before the new spark2 jars. I've tried
> using spark.yarn.archive to specify the spark2 jars in hdfs as well as
> using other spark options, none of which seems to make a difference.
>
> Where do you see “spark1.5 jars are loaded before the new spark2 jars”?
>
> Best Regards,
> Jeff Zhang
>
>
> From: Rob Anderson <rockclimbings...@gmail.com>
> Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
> Date: Wednesday, March 8, 2017 at 2:29 AM
> To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
> Subject: Re: java.lang.ClassNotFoundException: $anonfun$1
>
> Thanks. I can reach out to Cloudera, although the same commands seem to
> work via spark-shell (see below). So, the issue seems unique to Zeppelin.
>
> Spark context available as 'sc' (master = yarn, app id =
> application_1472496315722_481416).
>
> Spark session available as 'spark'.
>
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.0.0.cloudera1
>       /_/
>
> Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60)
> Type in expressions to have them evaluated.
> Type :help for more information.
>
> scala> val taxonomy = sc.textFile("/user/user1/data/")
> taxonomy: org.apache.spark.rdd.RDD[String] = /user/user1/data/ MapPartitionsRDD[1] at textFile at <console>:24
>
> scala> .map(l => l.split("\t"))
> res0: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[2] at map at <console>:27
>
> scala> taxonomy.first
> res1: String = 43  B&B  459  Sheets & Pillow  45  Sheets  1  Sheets
>
> On Mon, Mar 6, 2017 at 6:48 PM, moon soo Lee <m...@apache.org> wrote:
>
>> Hi Rob,
>>
>> Thanks for sharing the problem.
>> fyi, https://issues.apache.org/jira/browse/ZEPPELIN-1735 is tracking the
>> problem.
>>
>> If we can get help from the Cloudera forum, that would be great.
>>
>> Thanks,
>> moon
>>
>> On Tue, Mar 7, 2017 at 10:08 AM Jeff Zhang <zjf...@gmail.com> wrote:
>>
>>> It seems like a CDH-specific issue; you might be better off asking on the
>>> Cloudera forum.
>>>
>>> Rob Anderson <rockclimbings...@gmail.com> wrote on Tue, Mar 7, 2017 at 9:02 AM:
>>>
>>> Hey Everyone,
>>>
>>> We're running Zeppelin 0.7.0. We've just cut over to spark2, using
>>> scala11 via the CDH parcel (SPARK2-2.0.0.cloudera1-1.cdh5.7.0.p0.113931).
>>>
>>> Running a simple job throws a "Caused by:
>>> java.lang.ClassNotFoundException: $anonfun$1". It appears that during
>>> execution time on the yarn hosts, the native CDH spark1.5 jars are loaded
>>> before the new spark2 jars. I've tried using spark.yarn.archive to specify
>>> the spark2 jars in hdfs as well as using other spark options, none of which
>>> seems to make a difference.
>>>
>>> Any suggestions you can offer are appreciated.
>>>
>>> Thanks,
>>>
>>> Rob
>>>
>>> ------------------------
>>>
>>> %spark
>>> val taxonomy = sc.textFile("/user/user1/data/")
>>> .map(l => l.split("\t"))
>>>
>>> %spark
>>> taxonomy.first
>>>
>>> org.apache.spark.SparkException: Job aborted due to stage failure: Task
>>> 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage
>>> 1.0 (TID 7, data08.hadoop.prod.ostk.com, executor 2):
>>> java.lang.ClassNotFoundException: $anonfun$1
>>> at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:82)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>> at java.lang.Class.forName0(Native Method)
>>> at java.lang.Class.forName(Class.java:348)
>>> at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
>>> at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
>>> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
>>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>>> at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
>>> at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
>>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>> at org.apache.spark.scheduler.Task.run(Task.scala:86)
>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>> at java.lang.Thread.run(Thread.java:745)
>>> Caused by: java.lang.ClassNotFoundException: $anonfun$1
>>> at java.lang.ClassLoader.findClass(ClassLoader.java:530)
>>> at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.scala:26)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>> at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:34)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>> at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:30)
>>> at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:77)
>>> ... 30 more
>>> Driver stacktrace:
>>> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1454)
>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1442)
>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1441)
>>> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>>> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1441)
>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
>>> at scala.Option.foreach(Option.scala:257)
>>> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1669)
>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1624)
>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1613)
>>> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>>> at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:632)
>>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1893)
>>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1906)
>>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1919)
>>> at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1318)
>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>>> at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
>>> at org.apache.spark.rdd.RDD.take(RDD.scala:1292)
>>> at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1332)
>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>>> at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
>>> at org.apache.spark.rdd.RDD.first(RDD.scala:1331)
>>> ... 37 elided
>>> Caused by: java.lang.ClassNotFoundException: $anonfun$1
>>> at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:82)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>> at java.lang.Class.forName0(Native Method)
>>> at java.lang.Class.forName(Class.java:348)
>>> at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
>>> at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
>>> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
>>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>>> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>>> at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
>>> at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
>>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>> at org.apache.spark.scheduler.Task.run(Task.scala:86)
>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>> ... 1 more
>>> Caused by: java.lang.ClassNotFoundException: $anonfun$1
>>> at java.lang.ClassLoader.findClass(ClassLoader.java:530)
>>> at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.scala:26)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>> at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:34)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>> at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:30)
>>> at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:77)
>>> ... 30 more