FYI, in case anybody else hits this problem: we switched to Spark 1.1
(outside CDH) and the same Spark application worked the first time (once
recompiled against the Spark 1.1 libs, of course).  I assume this is because
Spark 1.1 is built with Hive support.
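
For anyone wondering what that recompile looks like, it is essentially a version
bump in build.sbt, something like the sketch below (not our exact build; the
"provided" scoping assumes the cluster's Spark 1.1 assembly supplies these jars
at runtime):

    // build.sbt (sketch): build against Spark 1.1 instead of 1.0.
    // spark-hive is the module that brings in the Hive support.
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "1.1.0" % "provided",
      "org.apache.spark" %% "spark-sql"  % "1.1.0" % "provided",
      "org.apache.spark" %% "spark-hive" % "1.1.0" % "provided"
    )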

On 29 September 2014 17:41, Patrick McGloin <mcgloin.patr...@gmail.com>
wrote:

> Hi,
>
> I have an error when submitting a Spark SQL application to our Spark
> cluster:
>
> 14/09/29 16:02:11 WARN scheduler.TaskSetManager: Loss was due to java.lang.NoClassDefFoundError
> java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf
>         at org.apache.spark.sql.hive.SparkHiveHadoopWriter.setIDs(SparkHadoopWriter.scala:169)
>         at org.apache.spark.sql.hive.SparkHiveHadoopWriter.setup(SparkHadoopWriter.scala:69)
>         at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(hiveOperators.scala:260)
>         at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(hiveOperators.scala:274)
>         at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(hiveOperators.scala:274)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>         at org.apache.spark.scheduler.Task.run(Task.scala:51)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
>
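> For context, the code that hits this is essentially a HiveContext insert,
> something along these lines (a sketch only; the table names and query are
> illustrative, not our real ones):
>
>     // Sketch of the call path behind the trace above (illustrative names).
>     import org.apache.spark.{SparkConf, SparkContext}
>     import org.apache.spark.sql.hive.HiveContext
>
>     val sc = new SparkContext(new SparkConf().setAppName("AAC"))
>     val hiveContext = new HiveContext(sc)
>     // In Spark 1.0.x the Hive dialect is exposed through hql(); the INSERT is
>     // what ends up in InsertIntoHiveTable / SparkHiveHadoopWriter on executors.
>     hiveContext.hql("INSERT INTO TABLE results SELECT * FROM staging").collect()
>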
> I assume this is because the executor does not have the hadoop-core.jar
> file.  I've tried adding it to the SparkContext using addJar, but this
> didn't help.
>
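> For completeness, the addJar attempt looked roughly like this (a sketch; the
> jar path is illustrative):
>
>     // Sketch of the addJar attempt (illustrative path).  addJar makes the jar
>     // available to executors for task execution, but it did not fix the error.
>     import org.apache.spark.{SparkConf, SparkContext}
>
>     val sc = new SparkContext(new SparkConf().setAppName("AAC"))
>     sc.addJar("/path/to/hadoop-core.jar")
>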
> I also see that the documentation says you must rebuild Spark if you want
> to use Hive.
>
> https://spark.apache.org/docs/1.0.2/sql-programming-guide.html#hive-tables
>
> Is this really true, or can we just package the jar files with the Spark
> application we build?  Rebuilding Spark itself isn't possible for us, as it
> is installed on a VM without internet access and we are using the Cloudera
> distribution (Spark 1.0).
>
> Is it possible to assemble the Hive dependencies into our Spark
> application and submit this to the cluster?  I've tried to do this with
> spark-submit (and the Hadoop JobConf class is in AAC-assembly-1.0.jar), but
> the executor doesn't find the class.  Here is the command:
>
> sudo ./spark-submit --class aac.main.SparkDriver --master
> spark://localhost:7077 --jars AAC-assembly-1.0.jar aacApp_2.10-1.0.jar
>
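> For reference, the assembly is built from a dependency list roughly like the
> one below, which is how the JobConf class ends up inside AAC-assembly-1.0.jar
> (a sketch; the versions are illustrative rather than our exact CDH ones):
>
>     // build.sbt (sketch): dependencies bundled into the assembly jar.
>     // spark-hive pulls in the Hive classes; hadoop-client provides
>     // org.apache.hadoop.mapred.JobConf.
>     libraryDependencies ++= Seq(
>       "org.apache.spark"  %% "spark-core"    % "1.0.0",
>       "org.apache.spark"  %% "spark-sql"     % "1.0.0",
>       "org.apache.spark"  %% "spark-hive"    % "1.0.0",
>       "org.apache.hadoop"  % "hadoop-client" % "2.3.0"
>     )
>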
> Any pointers would be appreciated!
>
> Best regards,
> Patrick
