Hi, I am trying to run Hive on Spark on the HDP 2.3 sandbox virtual machine.
Following the wiki at https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started, I replaced all occurrences of hdp.version with 2.3.0.0-2557. I then start the Hive CLI and run:

    set hive.execution.engine=spark;
    set spark.master=yarn-client;
    set spark.executor.memory=512m;

    select count(*) from sample_07;
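In case it helps, this is roughly how I applied the substitution (a sketch from memory; the file paths are my assumption about where the ${hdp.version} placeholders live on the sandbox):

    # Replace every ${hdp.version} placeholder with the concrete build number.
    # File paths are an assumption -- I ran this for each file the wiki mentions.
    sed -i 's/\${hdp\.version}/2.3.0.0-2557/g' /usr/hdp/current/spark-client/conf/spark-defaults.conf
    sed -i 's/\${hdp\.version}/2.3.0.0-2557/g' /etc/hadoop/conf/mapred-site.xml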
The query starts and then fails. In the console:

    Status: Running (Hive on Spark job[0])
    Job Progress Format
    CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
    2015-11-01 23:40:26,411 Stage-0_0: 0(+1)/1  Stage-1_0: 0/1
    state = FAILED
    Status: Failed
    FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
    hive> select count(*) from sample_07;

In the Hive client log the failure looks like this (every line below arrives wrapped as "15/11/01 23:55:36 [stdout-redir-1]: INFO client.SparkClientImpl: ..."; I stripped that redirect prefix for readability):

    Failed to run job b8649c92-1504-43c7-8100-020b866e58da (RemoteDriver:389)
    java.util.concurrent.ExecutionException: Exception thrown by job
        at org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:311)
        at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:316)
        at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:382)
        at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:335)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, sandbox.hortonworks.com): java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:236)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:212)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:64)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
    Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

The client then receives the job result and reports failure (these lines keep their original prefixes):

    15/11/01 23:55:36 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Decoded message of type org.apache.hive.spark.client.rpc.Rpc$MessageHeader (5 bytes)
    15/11/01 23:55:36 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Decoded message of type org.apache.hive.spark.client.BaseProtocol$JobResult (3851 bytes)
    15/11/01 23:55:36 [RPC-Handler-3]: DEBUG rpc.RpcDispatcher: [ClientProtocol] Received RPC message: type=CALL id=2 payload=org.apache.hive.spark.client.BaseProtocol$JobResult
    15/11/01 23:55:36 [RPC-Handler-3]: INFO client.SparkClientImpl: Received result for b8649c92-1504-43c7-8100-020b866e58da
    15/11/01 23:55:36 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.rpc.Rpc$MessageHeader (5 bytes)
    15/11/01 23:55:36 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.rpc.Rpc$NullMessage (2 bytes)
    state = FAILED
    15/11/01 23:55:36 [main]: INFO status.SparkJobMonitor: state = FAILED
    Status: Failed
    15/11/01 23:55:36 [main]: ERROR status.SparkJobMonitor: Status: Failed

In the Resource Manager UI, however, the application shows as SUCCEEDED. How can I debug this?
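The only concrete next step I can think of is to pull the aggregated YARN container logs for the application and rerun with client-side DEBUG logging; roughly like this (the application id below is a placeholder for the real one shown in the RM UI):

    # Fetch the aggregated container logs for the failed application
    # (application id is a placeholder -- take the real one from the RM UI)
    yarn logs -applicationId application_1446XXXXXXXXXX_0001 > hive-on-spark-app.log

    # Re-run the query with the Hive client root logger at DEBUG on the console
    hive --hiveconf hive.root.logger=DEBUG,console

Thanks,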