FYI, in case anybody else hits this problem: we switched to Spark 1.1 (outside CDH) and the same Spark application worked the first time (once recompiled against the Spark 1.1 libs, of course). I assume this is because Spark 1.1 is built with Hive support.
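In case it helps anyone else making the same switch, here is a rough sketch of what the dependency side of that recompile might look like in build.sbt. This is an assumption on my part (the thread never shows the build file); the project name, Scala version and scoping are illustrative, while the spark-core / spark-sql / spark-hive coordinates are the published Spark 1.1.0 artifacts:

// build.sbt -- illustrative sketch of recompiling the application against Spark 1.1.
// "provided" on spark-core assumes the standalone cluster supplies the Spark jars
// at runtime; spark-sql and spark-hive are left unscoped here, but they could also
// be "provided" if the cluster's Spark build includes Hive.
name := "aacApp"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.1.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.1.0",
  "org.apache.spark" %% "spark-hive" % "1.1.0"
)

With something like that in place, the same spark-submit command from the quoted mail below should work against a 1.1 master without further changes.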
On 29 September 2014 17:41, Patrick McGloin <mcgloin.patr...@gmail.com> wrote:
> Hi,
>
> I have an error when submitting a Spark SQL application to our Spark cluster:
>
> 14/09/29 16:02:11 WARN scheduler.TaskSetManager: Loss was due to java.lang.NoClassDefFoundError
> *java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf*
>         at org.apache.spark.sql.hive.SparkHiveHadoopWriter.setIDs(SparkHadoopWriter.scala:169)
>         at org.apache.spark.sql.hive.SparkHiveHadoopWriter.setup(SparkHadoopWriter.scala:69)
>         at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(hiveOperators.scala:260)
>         at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(hiveOperators.scala:274)
>         at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(hiveOperators.scala:274)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>         at org.apache.spark.scheduler.Task.run(Task.scala:51)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
>
> I assume this is because the Executor does not have the hadoop-core.jar file.
> I've tried adding it to the SparkContext using addJar but this didn't help.
>
> I also see that the documentation says you must rebuild Spark if you want to use Hive:
>
> https://spark.apache.org/docs/1.0.2/sql-programming-guide.html#hive-tables
>
> Is this really true or can we just package the jar files with the Spark
> Application we build? Rebuilding Spark itself isn't possible for us as it is
> installed on a VM without internet access and we are using the Cloudera
> distribution (Spark 1.0).
>
> Is it possible to assemble the Hive dependencies into our Spark Application
> and submit this to the cluster? I've tried to do this with spark-submit (and
> the Hadoop JobConf class is in AAC-assembly-1.0.jar) but the Executor doesn't
> find the class. Here is the command:
>
> sudo ./spark-submit --class aac.main.SparkDriver --master spark://localhost:7077 --jars AAC-assembly-1.0.jar aacApp_2.10-1.0.jar
>
> Any pointers would be appreciated!
>
> Best regards,
> Patrick
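For context, the failing code path in the quoted stack trace (InsertIntoHiveTable / SparkHiveHadoopWriter) is what gets exercised when a driver writes into a Hive table through HiveContext. A minimal sketch of that kind of driver, reusing only the class name from the thread; the table names and query are purely illustrative, not the actual application:

package aac.main

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Minimal sketch only. Inserting into a Hive table runs InsertIntoHiveTable
// on the executors, which needs the Hadoop classes (e.g.
// org.apache.hadoop.mapred.JobConf) on the executor classpath -- the class
// reported missing in the stack trace above.
object SparkDriver {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("AAC"))
    val hiveContext = new HiveContext(sc)

    hiveContext.hql("CREATE TABLE IF NOT EXISTS events (key INT, value STRING)")
    hiveContext.hql("INSERT INTO TABLE events SELECT key, value FROM staging_events")

    sc.stop()
  }
}

Nothing in a driver like this changes between Spark 1.0 and 1.1; what differs is whether the Hive and Hadoop classes it needs end up on the executor classpath, which is what the switch described at the top of the thread appears to have fixed.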