Thanks. I can reach out to Cloudera, although the same commands seem to work via spark-shell (see below), so the issue appears unique to Zeppelin.
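For reference, the spark.yarn.archive attempt looked roughly like the following; the archive name and HDFS path here are illustrative placeholders, not the exact values from our cluster:

    # spark-defaults.conf for the SPARK2 parcel (placeholder path; the archive
    # was built by zipping the jars/ directory of the Spark 2 install and
    # uploading it to HDFS)
    spark.yarn.archive  hdfs:///user/spark/spark2-jars.zip

Here's the working spark-shell session: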
Spark context available as 'sc' (master = yarn, app id = application_1472496315722_481416).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0.cloudera1
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60)
Type in expressions to have them evaluated.
Type :help for more information.

scala> val taxonomy = sc.textFile("/user/user1/data/")
taxonomy: org.apache.spark.rdd.RDD[String] = /user/user1/data/ MapPartitionsRDD[1] at textFile at <console>:24

scala> .map(l => l.split("\t"))
res0: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[2] at map at <console>:27

scala> taxonomy.first
res1: String = 43 B&B 459 Sheets & Pillow 45 Sheets 1 Sheets

On Mon, Mar 6, 2017 at 6:48 PM, moon soo Lee <m...@apache.org> wrote:

> Hi Rob,
>
> Thanks for sharing the problem.
> FYI, https://issues.apache.org/jira/browse/ZEPPELIN-1735 is tracking the problem.
>
> If we can get help from the Cloudera forum, that would be great.
>
> Thanks,
> moon
>
> On Tue, Mar 7, 2017 at 10:08 AM Jeff Zhang <zjf...@gmail.com> wrote:
>
>> It seems to be a CDH-specific issue; you might have better luck asking on the Cloudera forum.
>>
>> Rob Anderson <rockclimbings...@gmail.com> wrote on Tue, Mar 7, 2017 at 9:02 AM:
>>
>> Hey everyone,
>>
>> We're running Zeppelin 0.7.0. We've just cut over to Spark 2, using Scala 2.11, via the CDH parcel (SPARK2-2.0.0.cloudera1-1.cdh5.7.0.p0.113931).
>>
>> Running a simple job throws a "Caused by: java.lang.ClassNotFoundException: $anonfun$1". It appears that at execution time on the YARN hosts, the native CDH Spark 1.5 jars are loaded before the new Spark 2 jars. I've tried using spark.yarn.archive to point at the Spark 2 jars in HDFS, as well as other Spark options, but none of it seems to make a difference.
>>
>> Any suggestions you can offer are appreciated.
>> Thanks,
>> Rob
>>
>> ------------------------
>>
>> %spark
>> val taxonomy = sc.textFile("/user/user1/data/")
>>   .map(l => l.split("\t"))
>>
>> %spark
>> taxonomy.first
>>
>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, data08.hadoop.prod.ostk.com, executor 2): java.lang.ClassNotFoundException: $anonfun$1
>>   at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:82)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>   at java.lang.Class.forName0(Native Method)
>>   at java.lang.Class.forName(Class.java:348)
>>   at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
>>   at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
>>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
>>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
>>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>>   at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
>>   at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
>>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>   at org.apache.spark.scheduler.Task.run(Task.scala:86)
>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>   at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.lang.ClassNotFoundException: $anonfun$1
>>   at java.lang.ClassLoader.findClass(ClassLoader.java:530)
>>   at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.scala:26)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>   at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:34)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>   at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:30)
>>   at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:77)
>>   ... 30 more
>> Driver stacktrace:
>>   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1454)
>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1442)
>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1441)
>>   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>>   at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1441)
>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
>>   at scala.Option.foreach(Option.scala:257)
>>   at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
>>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1669)
>>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1624)
>>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1613)
>>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>>   at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:632)
>>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1893)
>>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1906)
>>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1919)
>>   at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1318)
>>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
>>   at org.apache.spark.rdd.RDD.take(RDD.scala:1292)
>>   at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1332)
>>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
>>   at org.apache.spark.rdd.RDD.first(RDD.scala:1331)
>>   ... 37 elided
>> Caused by: java.lang.ClassNotFoundException: $anonfun$1
>>   at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:82)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>   at java.lang.Class.forName0(Native Method)
>>   at java.lang.Class.forName(Class.java:348)
>>   at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
>>   at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
>>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
>>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
>>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>>   at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
>>   at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
>>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>   at org.apache.spark.scheduler.Task.run(Task.scala:86)
>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>   ... 1 more
>> Caused by: java.lang.ClassNotFoundException: $anonfun$1
>>   at java.lang.ClassLoader.findClass(ClassLoader.java:530)
>>   at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.scala:26)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>   at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:34)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>   at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:30)
>>   at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:77)
>>   ... 30 more