Hi,

First of all, I'm sorry if you have received this mail before (via the Apache Spark user mailing list). I posted it to the user list but didn't receive any information about resolving the issue, so I'd like to post it to the dev list as well.
This link points to the thread I'm forwarding, so if you find it more convenient to refer to the mail archive, please use this link:
https://mail-archives.apache.org/mod_mbox/spark-user/201511.mbox/%3CCAF5108jMXyOjiGmCgr%3Ds%2BNvTMcyKWMBVM1GsrH7Pz4xUj48LfA%40mail.gmail.com%3E

This behavior is a bit odd to me, so I'd like to get any hints on resolving it, or to report a bug if it is one.

Thanks!
Jungtaek Lim (HeartSaVioR)

---------- Forwarded message ----------
From: 임정택 <kabh...@gmail.com>
Date: 2015-11-17 18:01 GMT+09:00
Subject: zeppelin (or spark-shell) with HBase fails on executor level
To: u...@spark.apache.org

Hi all,

I'm evaluating Zeppelin to run a driver which interacts with HBase. I use a fat jar to include the HBase dependencies, and I see failures at the executor level. I thought it was a Zeppelin issue, but it fails on spark-shell, too.

I loaded the fat jar via the --jars option,

> ./bin/spark-shell --jars hbase-included-assembled.jar

and ran driver code using the provided SparkContext instance; a sketch of that code follows.
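Roughly, the driver code scans an HBase table via TableInputFormat, which matches the frames in the stack traces below. This is only a sketch: the table name and the final count() are illustrative placeholders, not my exact code.

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Result
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat

    // Point TableInputFormat at the table ("my_table" is a placeholder).
    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")

    // Scan the table as an RDD, using the shell-provided SparkContext `sc`.
    val hbaseRdd = sc.newAPIHadoopRDD(
      hbaseConf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])

    // Any action triggers the executor-side failure shown below.
    println(hbaseRdd.count())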
Running this, I see failures from the spark-shell console and in the executor logs. Below are the stack traces.

From the spark-shell console:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 55 in stage 0.0 failed 4 times, most recent failure: Lost task 55.3 in stage 0.0 (TID 281, <svr hostname>): java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.client.HConnectionManager
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:197)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:159)
    at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:101)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:128)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:104)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:66)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

From the executor logs:

15/11/16 18:59:57 ERROR Executor: Exception in task 14.0 in stage 0.0 (TID 14)
java.lang.ExceptionInInitializerError
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:197)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:159)
    at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:101)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:128)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:104)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:66)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: hbase-default.xml file seems to be for and old version of HBase (null), this version is 0.98.6-cdh5.2.0
    at org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:73)
    at org.apache.hadoop.hbase.HBaseConfiguration.addHbaseResources(HBaseConfiguration.java:105)
    at org.apache.hadoop.hbase.HBaseConfiguration.create(HBaseConfiguration.java:116)
    at org.apache.hadoop.hbase.client.HConnectionManager.<clinit>(HConnectionManager.java:222)
    ... 18 more

Please note that the same job runs smoothly with spark-submit.

By the way, if the issue is that hbase-default.xml is not properly loaded (maybe because of the classloader), it nevertheless seems to load properly at the driver level:

    import org.apache.hadoop.hbase.HBaseConfiguration

    val conf = HBaseConfiguration.create()
    println(conf.get("hbase.defaults.for.version"))

It prints "0.98.6-cdh5.2.0".

I'm using Spark 1.4.1 (the Hadoop 2.4 binary build), Zeppelin 0.5.5, and HBase 0.98.6-CDH5.2.0.

Thanks in advance!

Best,
Jungtaek Lim (HeartSaVioR)

--
Name : 임 정택
Blog : http://www.heartsavior.net / http://dev.heartsavior.net
Twitter : http://twitter.com/heartsavior
LinkedIn : http://www.linkedin.com/in/heartsavior
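P.S. To narrow this down, the same version lookup can be wrapped in a task so that it runs on the executors rather than the driver. This is just a sketch (using the shell-provided `sc`; the partition count is arbitrary); given the traces above, I'd expect the create() call to fail there with the same "old version of HBase (null)" RuntimeException:

    import org.apache.hadoop.hbase.HBaseConfiguration

    // Run the hbase-default.xml version lookup inside tasks, i.e. on the executors.
    val executorVersions = sc.parallelize(1 to 2, 2).map { _ =>
      // Expected to throw the RuntimeException from checkDefaultsVersion
      // if the executor classloader can't see hbase-default.xml.
      val conf = HBaseConfiguration.create()
      conf.get("hbase.defaults.for.version")
    }.collect()

    executorVersions.foreach(println)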