FYI, in case anybody else hits this problem: we switched to Spark 1.1 (outside CDH) and the same Spark application worked the first time (once recompiled against the Spark 1.1 libs, of course). I assume this is because Spark 1.1 is built with Hive support.
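In case it helps anyone else making the same switch, here is a rough sketch of what the dependency side of that recompile might look like in build.sbt. This is an assumption on my part (the thread never shows the build file); the project name, Scala version and scoping are illustrative, while the spark-core / spark-sql / spark-hive coordinates are the published Spark 1.1.0 artifacts:

// build.sbt -- illustrative sketch of recompiling the application against Spark 1.1.
// "provided" on spark-core assumes the standalone cluster supplies the Spark jars
// at runtime; spark-sql and spark-hive are left unscoped here, but they could also
// be "provided" if the cluster's Spark build includes Hive.
name := "aacApp"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.1.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.1.0",
  "org.apache.spark" %% "spark-hive" % "1.1.0"
)

With something like that in place, the same spark-submit command from the quoted mail below should work against a 1.1 master without further changes.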
On 29 September 2014 17:41, Patrick McGloin <mcgloin.patr...@gmail.com> wrote:
> Hi,
>
> I have an error when submitting a Spark SQL application to our Spark cluster:
>
> 14/09/29 16:02:11 WARN scheduler.TaskSetManager: Loss was due to java.lang.NoClassDefFoundError
> *java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf*
>         at org.apache.spark.sql.hive.SparkHiveHadoopWriter.setIDs(SparkHadoopWriter.scala:169)
>         at org.apache.spark.sql.hive.SparkHiveHadoopWriter.setup(SparkHadoopWriter.scala:69)
>         at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(hiveOperators.scala:260)
>         at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(hiveOperators.scala:274)
>         at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(hiveOperators.scala:274)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>         at org.apache.spark.scheduler.Task.run(Task.scala:51)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
>
> I assume this is because the Executor does not have the hadoop-core.jar file.
> I've tried adding it to the SparkContext using addJar but this didn't help.
>
> I also see that the documentation says you must rebuild Spark if you want to use Hive:
>
> https://spark.apache.org/docs/1.0.2/sql-programming-guide.html#hive-tables
>
> Is this really true or can we just package the jar files with the Spark
> Application we build? Rebuilding Spark itself isn't possible for us as it is
> installed on a VM without internet access and we are using the Cloudera
> distribution (Spark 1.0).
>
> Is it possible to assemble the Hive dependencies into our Spark Application
> and submit this to the cluster? I've tried to do this with spark-submit (and
> the Hadoop JobConf class is in AAC-assembly-1.0.jar) but the Executor doesn't
> find the class. Here is the command:
>
> sudo ./spark-submit --class aac.main.SparkDriver --master spark://localhost:7077 --jars AAC-assembly-1.0.jar aacApp_2.10-1.0.jar
>
> Any pointers would be appreciated!
>
> Best regards,
> Patrick
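For context, the failing code path in the quoted stack trace (InsertIntoHiveTable / SparkHiveHadoopWriter) is what gets exercised when a driver writes into a Hive table through HiveContext. A minimal sketch of that kind of driver, reusing only the class name from the thread; the table names and query are purely illustrative, not the actual application:

package aac.main

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Minimal sketch only. Inserting into a Hive table runs InsertIntoHiveTable
// on the executors, which needs the Hadoop classes (e.g.
// org.apache.hadoop.mapred.JobConf) on the executor classpath -- the class
// reported missing in the stack trace above.
object SparkDriver {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("AAC"))
    val hiveContext = new HiveContext(sc)

    hiveContext.hql("CREATE TABLE IF NOT EXISTS events (key INT, value STRING)")
    hiveContext.hql("INSERT INTO TABLE events SELECT key, value FROM staging_events")

    sc.stop()
  }
}

Nothing in a driver like this changes between Spark 1.0 and 1.1; what differs is whether the Hive and Hadoop classes it needs end up on the executor classpath, which is what the switch described at the top of the thread appears to have fixed.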