It looks like there is a code path in TableReader.scala where Hive.get() is
called without the thread-local Hive instance first being seeded via
Hive.get(hiveConf). I am running in yarn-client mode (compiled with
-Phive-provided and with hive-0.13.1a). The upshot is that the broadcast
HiveConf is never used on the executor; a fresh default HiveConf is created
and used instead, which seems wrong. My understanding is that the HiveConf
created on the driver should be used on all executors for correct
behaviour. The query I am running is:

      insert overwrite table X partition(month='2014-12')
      select colA, colB from Y where month='2014-12'
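
For context, the Hive thread-local contract as I read it (and as the stack
trace below bears out) is roughly the following; this is my reading of the
Hive 0.13 API, so take it with a grain of salt:

      import org.apache.hadoop.hive.conf.HiveConf
      import org.apache.hadoop.hive.ql.metadata.Hive

      // Hive caches one Hive instance per thread; passing a conf seeds it.
      def seeded(driverConf: HiveConf): Hive = Hive.get(driverConf)

      // The no-arg variant (what PlanUtils ends up calling) falls back to
      // constructing a default HiveConf when nothing suitable has been
      // seeded on the thread -- which appears to be what happens on the
      // executors here.
      def unseeded(): Hive = Hive.get()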

On the executor, no HiveContext is created, so somewhere in the
executor-only code path there should have been a call to
Hive.get(broadcastedHiveConf) before the no-arg Hive.get() is reached; a
rough sketch of what I mean follows. Let me know if my analysis is correct
and I can file a JIRA for this.
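
For illustration, here is the kind of change I have in mind in
HadoopTableReader.initializeLocalJobConfFunc. The broadcastedHiveConf
parameter is hypothetical and the surrounding body is paraphrased from
memory, so treat this as a sketch under those assumptions, not a patch:

      import org.apache.hadoop.fs.Path
      import org.apache.hadoop.hive.conf.HiveConf
      import org.apache.hadoop.hive.ql.exec.Utilities
      import org.apache.hadoop.hive.ql.metadata.Hive
      import org.apache.hadoop.hive.ql.plan.{PlanUtils, TableDesc}
      import org.apache.hadoop.mapred.{FileInputFormat, JobConf}

      object HadoopTableReaderSketch {
        // Hypothetical variant of initializeLocalJobConfFunc: seed the
        // executor thread's Hive with the driver's HiveConf *before*
        // PlanUtils runs, so the no-arg Hive.get() reached via
        // configureJobPropertiesForStorageHandler sees the broadcast conf
        // instead of constructing a default one.
        def initializeLocalJobConfFunc(
            path: String,
            tableDesc: TableDesc,
            broadcastedHiveConf: HiveConf)(jobConf: JobConf): Unit = {
          Hive.get(broadcastedHiveConf) // seed the thread-local Hive
          FileInputFormat.setInputPaths(jobConf, new Path(path))
          if (tableDesc != null) {
            // This is the call that bottoms out in the no-arg Hive.get()
            // shown in the trace below.
            PlanUtils.configureInputJobPropertiesForStorageHandler(tableDesc)
            Utilities.copyTableJobPropertiesToConf(tableDesc, jobConf)
          }
        }
      }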


  [1] org.apache.hadoop.hive.ql.metadata.Hive.get (Hive.java:211)
  [2] org.apache.hadoop.hive.ql.plan.PlanUtils.configureJobPropertiesForStorageHandler (PlanUtils.java:810)
  [3] org.apache.hadoop.hive.ql.plan.PlanUtils.configureInputJobPropertiesForStorageHandler (PlanUtils.java:789)
  [4] org.apache.spark.sql.hive.HadoopTableReader$.initializeLocalJobConfFunc (TableReader.scala:253)
  [5] org.apache.spark.sql.hive.HadoopTableReader$$anonfun$11.apply (TableReader.scala:229)
  [6] org.apache.spark.sql.hive.HadoopTableReader$$anonfun$11.apply (TableReader.scala:229)
  [7] org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply (HadoopRDD.scala:172)
  [8] org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply (HadoopRDD.scala:172)
  [9] scala.Option.map (Option.scala:145)
  [10] org.apache.spark.rdd.HadoopRDD.getJobConf (HadoopRDD.scala:172)
  [11] org.apache.spark.rdd.HadoopRDD$$anon$1.<init> (HadoopRDD.scala:216)
  [12] org.apache.spark.rdd.HadoopRDD.compute (HadoopRDD.scala:212)
  [13] org.apache.spark.rdd.HadoopRDD.compute (HadoopRDD.scala:101)
  [14] org.apache.spark.rdd.RDD.computeOrReadCheckpoint (RDD.scala:277)
  [15] org.apache.spark.rdd.RDD.iterator (RDD.scala:244)
  [16] org.apache.spark.rdd.MapPartitionsRDD.compute (MapPartitionsRDD.scala:35)
  [17] org.apache.spark.rdd.RDD.computeOrReadCheckpoint (RDD.scala:277)
  [18] org.apache.spark.rdd.RDD.iterator (RDD.scala:244)
  [19] org.apache.spark.rdd.MapPartitionsRDD.compute (MapPartitionsRDD.scala:35)
  [20] org.apache.spark.rdd.RDD.computeOrReadCheckpoint (RDD.scala:277)
  [21] org.apache.spark.rdd.RDD.iterator (RDD.scala:244)
  [22] org.apache.spark.rdd.UnionRDD.compute (UnionRDD.scala:87)
  [23] org.apache.spark.rdd.RDD.computeOrReadCheckpoint (RDD.scala:277)
  [24] org.apache.spark.rdd.RDD.iterator (RDD.scala:244)
  [25] org.apache.spark.scheduler.ResultTask.runTask (ResultTask.scala:61)
  [26] org.apache.spark.scheduler.Task.run (Task.scala:64)
  [27] org.apache.spark.executor.Executor$TaskRunner.run (Executor.scala:203)
  [28] java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1145)
  [29] java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:615)
  [30] java.lang.Thread.run (Thread.java:745)
