At first the job read from HBase and the exception was thrown, so I started debugging the app. I removed the code that reads HBase and simply tried to save an RDD built from a list, and the exception was still thrown, so I'm sure the exception was not caused by reading HBase. A stripped-down version of that test is sketched just below.
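
For reference, this is roughly what the reduced test looked like. It is a minimal sketch assuming the Spark 0.9 core API; the app name and output path are taken from the code quoted further down, the object name SaveOnlyTest is made up for illustration, and no HBase classes are touched at all.

import org.apache.spark.{SparkConf, SparkContext}

object SaveOnlyTest {
  def main(args: Array[String]) {
    // Build an RDD from a plain Scala list; no HBase access anywhere.
    val sc = new SparkContext(new SparkConf().setAppName("-- Test HBase --"))
    val rdd = sc.parallelize(List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), 3)

    println(rdd.count)                      // succeeds on the cluster
    rdd.saveAsTextFile("/test/xt/saveRDD")  // still throws ClassNotFoundException
  }
}
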
While debugging I did not change the object name or the file name.

2014-10-13 0:00 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:

> Your app is named scala.HBaseApp
> Does it read / write to HBase ?
>
> Just curious.
>
> On Sun, Oct 12, 2014 at 8:00 AM, Tao Xiao <xiaotao.cs....@gmail.com> wrote:
>
>> Hi all,
>>
>> I'm using CDH 5.0.1 (Spark 0.9) and submitting a job in Spark Standalone
>> Cluster mode.
>>
>> The job is quite simple, as follows:
>>
>> object HBaseApp {
>>   def main(args: Array[String]) {
>>     testHBase("student", "/test/xt/saveRDD")
>>   }
>>
>>   def testHBase(tableName: String, outFile: String) {
>>     val sparkConf = new SparkConf()
>>       .setAppName("-- Test HBase --")
>>       .set("spark.executor.memory", "2g")
>>       .set("spark.cores.max", "16")
>>
>>     val sparkContext = new SparkContext(sparkConf)
>>
>>     val rdd = sparkContext.parallelize(List(1,2,3,4,5,6,7,8,9,10), 3)
>>
>>     val c = rdd.count // successful
>>     println("\n\n\n" + c + "\n\n\n")
>>
>>     // This line will throw "java.lang.ClassNotFoundException:
>>     // com.xt.scala.HBaseApp$$anonfun$testHBase$1"
>>     rdd.saveAsTextFile(outFile)
>>
>>     println("\n down \n")
>>   }
>> }
>>
>> I submitted this job using the following script:
>>
>> #!/bin/bash
>>
>> HBASE_CLASSPATH=$(hbase classpath)
>> APP_JAR=/usr/games/spark/xt/SparkDemo-0.0.1-SNAPSHOT.jar
>> SPARK_ASSEMBLY_JAR=/usr/games/spark/xt/spark-assembly_2.10-0.9.0-cdh5.0.1-hadoop2.3.0-cdh5.0.1.jar
>> SPARK_MASTER=spark://b02.jsepc.com:7077
>>
>> CLASSPATH=$CLASSPATH:$APP_JAR:$SPARK_ASSEMBLY_JAR:$HBASE_CLASSPATH
>> export SPARK_CLASSPATH=/usr/lib/hbase/lib/*
>>
>> CONFIG_OPTS="-Dspark.master=$SPARK_MASTER"
>>
>> java -cp $CLASSPATH $CONFIG_OPTS com.xt.scala.HBaseApp $@
>>
>> After I submitted the job, the count of the RDD could be computed
>> successfully, but that RDD could not be saved into HDFS and the following
>> exception was thrown:
>>
>> 14/10/11 16:09:33 WARN scheduler.TaskSetManager: Loss was due to java.lang.ClassNotFoundException
>> java.lang.ClassNotFoundException: com.xt.scala.HBaseApp$$anonfun$testHBase$1
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>     at java.lang.Class.forName0(Native Method)
>>     at java.lang.Class.forName(Class.java:270)
>>     at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:37)
>>     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>>     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>     at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>     at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
>>     at org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:63)
>>     at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:139)
>>     at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
>>     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)
>>     at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:195)
>>     at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
>>     at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>     at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>     at java.lang.Thread.run(Thread.java:744)
>>
>> I also noted that if I add "-Dspark.jars=$APP_JAR" to the variable
>> CONFIG_OPTS, i.e. CONFIG_OPTS="-Dspark.master=$SPARK_MASTER -Dspark.jars=$APP_JAR",
>> the job finishes successfully and the RDD can be written into HDFS.
>>
>> So, what does "java.lang.ClassNotFoundException:
>> com.xt.scala.HBaseApp$$anonfun$testHBase$1" mean and why would it be
>> thrown ?
>>
>> Thanks
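
For anyone who finds this thread later: a name like com.xt.scala.HBaseApp$$anonfun$testHBase$1 is the synthetic class the Scala compiler emits for an anonymous function defined inside testHBase. The exception means an executor received a serialized task that references that class but could not load it, because launching the driver with plain "java -cp ..." only puts the application jar on the driver's classpath. Setting spark.jars (as noted above with -Dspark.jars=$APP_JAR) tells Spark to ship the application jar to the executors, which is why the job then succeeds. The same thing can be done in code; the following is only a minimal sketch, assuming Spark 0.9's SparkConf.setJars and reusing the jar path from the script above, not a verified fix for this exact cluster.

import org.apache.spark.{SparkConf, SparkContext}

object HBaseApp {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf()
      .setAppName("-- Test HBase --")
      .set("spark.executor.memory", "2g")
      .set("spark.cores.max", "16")
      // Ship the application jar to the executors; this is the programmatic
      // equivalent of passing -Dspark.jars=$APP_JAR on the command line.
      .setJars(Seq("/usr/games/spark/xt/SparkDemo-0.0.1-SNAPSHOT.jar"))

    val sc = new SparkContext(sparkConf)
    val rdd = sc.parallelize(List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), 3)
    rdd.saveAsTextFile("/test/xt/saveRDD")  // no ClassNotFoundException once the jar is shipped
  }
}

If hard-coding the path is undesirable, SparkContext.jarOfClass(this.getClass) should locate the jar containing the application class and can be passed to setJars instead, if I recall the 0.9 API correctly.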