[ https://issues.apache.org/jira/browse/HIVE-13314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Szehon Ho resolved HIVE-13314.
------------------------------
       Resolution: Duplicate
    Fix Version/s: 2.1.0

> Hive on spark mapjoin errors if spark.master is not set
> -------------------------------------------------------
>
>                 Key: HIVE-13314
>                 URL: https://issues.apache.org/jira/browse/HIVE-13314
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Szehon Ho
>            Assignee: Szehon Ho
>            Priority: Minor
>             Fix For: 2.1.0
>
>
> Some errors occur if spark.master is not set.
> This is despite the code defaulting to yarn-cluster if spark.master is not set by the user or in the config files:
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java#L51]
> The funny thing is that while it works the first time due to this default, subsequent tries will fail because the hiveConf is refreshed without that default being set.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java#L180]
> The exception is as follows:
> {noformat}
> Job aborted due to stage failure: Task 40 in stage 1.0 failed 4 times, most recent failure: Lost task 40.3 in stage 1.0 (TID 22, d2409.halxg.cloudera.com): java.lang.RuntimeException: Error processing row: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
>       at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:154)
>       at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:48)
>       at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
>       at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
>       at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>       at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>       at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>       at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
>       at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
>       at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2003)
>       at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2003)
>       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>       at org.apache.spark.scheduler.Task.run(Task.scala:89)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
>       at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:117)
>       at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:197)
>       at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:223)
>       at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
>       at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
>       at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
>       at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
>       at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:490)
>       at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141)
>       ... 16 more
> Caused by: java.lang.NullPointerException
>       at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.isDedicatedCluster(SparkUtilities.java:108)
>       at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:124)
>       at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:114)
>       ... 24 more
> Driver stacktrace:
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
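
For readers following the description, the failure mode it outlines (a default applied when the client is first built, then lost when the configuration is refreshed) can be modelled in a few lines of plain Java. This is only a sketch and not Hive code: the class name, the method names, and the use of a Map in place of HiveConf are all hypothetical, and the prefix check inside the cluster test is an assumption about what SparkUtilities.isDedicatedCluster does, based only on the NullPointerException in the trace.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of the issue; none of these names exist in Hive. */
class SparkMasterDefaultSketch {
    static final String SPARK_MASTER = "spark.master";
    static final String DEFAULT_MASTER = "yarn-cluster"; // default named in the issue

    /** First use: copy the user's settings and fill in the default if absent. */
    static Map<String, String> buildClientConf(Map<String, String> userConf) {
        Map<String, String> conf = new HashMap<>(userConf);
        conf.putIfAbsent(SPARK_MASTER, DEFAULT_MASTER);
        return conf;
    }

    /** Later refresh: copies only the user's settings, so the default is lost. */
    static Map<String, String> refreshWithoutDefault(Map<String, String> userConf) {
        return new HashMap<>(userConf);
    }

    /** Assumed shape of the check: dereferences spark.master without a null test. */
    static boolean isDedicatedClusterUnsafe(Map<String, String> conf) {
        String master = conf.get(SPARK_MASTER);
        return master.startsWith("yarn-") || master.startsWith("local");
    }

    /** One possible guard: fall back to the same default when the key is missing. */
    static boolean isDedicatedClusterSafe(Map<String, String> conf) {
        String master = conf.getOrDefault(SPARK_MASTER, DEFAULT_MASTER);
        return master.startsWith("yarn-") || master.startsWith("local");
    }

    public static void main(String[] args) {
        Map<String, String> userConf = new HashMap<>(); // spark.master not set

        // First attempt works because the default was applied.
        System.out.println(isDedicatedClusterUnsafe(buildClientConf(userConf)));

        // After the refresh the key is missing and the unsafe check throws.
        Map<String, String> refreshed = refreshWithoutDefault(userConf);
        try {
            isDedicatedClusterUnsafe(refreshed);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException after refresh");
        }

        // The guarded variant tolerates the missing key.
        System.out.println(isDedicatedClusterSafe(refreshed));
    }
}
```

The guard shown in isDedicatedClusterSafe is just one option; re-applying the default wherever the configuration is refreshed would address the same gap. Since this issue was resolved as a duplicate, the actual fix may differ from both.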