Here is the related code:

private static void checkDefaultsVersion(Configuration conf) {
  if (conf.getBoolean("hbase.defaults.for.version.skip", Boolean.FALSE)) return;
  String defaultsVersion = conf.get("hbase.defaults.for.version");
  String thisVersion = VersionInfo.getVersion();
  if (!thisVersion.equals(defaultsVersion)) {
    throw new RuntimeException(
      "hbase-default.xml file seems to be for an older version of HBase ("
        + defaultsVersion + "), this version is " + thisVersion);
  }
}

null means that "hbase.defaults.for.version" was not set in the other hbase-default.xml.

Can you retrieve the classpath of the Spark task so that we have more clues?
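For example, something along these lines (a rough, untested sketch) should report both the task's classpath and every hbase-default.xml the executor's classloader can see, together with the version each copy declares:

import scala.collection.JavaConverters._
import org.apache.hadoop.conf.Configuration

// Run a one-element job so the closure executes on an executor, not on the driver.
val report = sc.parallelize(Seq(1), 1).map { _ =>
  val cp = System.getProperty("java.class.path")
  val cl = Thread.currentThread().getContextClassLoader
  // List every hbase-default.xml visible to the executor's classloader and
  // read the version each one declares.
  val defaults = cl.getResources("hbase-default.xml").asScala.map { url =>
    val c = new Configuration(false)  // empty conf, load only this one resource
    c.addResource(url)
    url + " -> " + c.get("hbase.defaults.for.version")
  }.mkString("\n")
  "task classpath: " + cp + "\n" + defaults
}.collect().head

println(report)

If two hbase-default.xml entries show up with different versions (or one of them without the version property at all), that would explain the (null) in the error.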
Cheers

On Tue, Nov 17, 2015 at 10:06 PM, 임정택 <kabh...@gmail.com> wrote:

> Ted,
>
> Thanks for the reply.
>
> The only Spark-related dependency in my fat jar is spark-core, and it is marked as "provided".
> It seems Spark itself only pulls in hbase-common 0.98.7-hadoop2, in its spark-examples module.
>
> And if there are two hbase-default.xml files on the classpath, shouldn't one of them be loaded instead of showing (null)?
>
> Best,
> Jungtaek Lim (HeartSaVioR)
>
>
>
> 2015-11-18 13:50 GMT+09:00 Ted Yu <yuzhih...@gmail.com>:
>
>> Looks like there are two hbase-default.xml files on the classpath: one for 0.98.6 and another for 0.98.7-hadoop2 (used by Spark).
>>
>> You can specify hbase.defaults.for.version.skip as true in your hbase-site.xml.
>>
>> Cheers
>>
>> On Tue, Nov 17, 2015 at 1:01 AM, 임정택 <kabh...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I'm evaluating Zeppelin to run a driver that interacts with HBase.
>>> I use a fat jar to include the HBase dependencies, and I see failures at the executor level.
>>> I thought it was a Zeppelin issue, but it fails on spark-shell, too.
>>>
>>> I loaded the fat jar via the --jars option,
>>>
>>> > ./bin/spark-shell --jars hbase-included-assembled.jar
>>>
>>> and ran the driver code using the provided SparkContext instance, and saw failures in the spark-shell console and the executor logs.
>>>
>>> Below are the stack traces:
>>>
>>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 55 in stage 0.0 failed 4 times, most recent failure: Lost task 55.3 in stage 0.0 (TID 281, <svr hostname>): java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.client.HConnectionManager
>>>   at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:197)
>>>   at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:159)
>>>   at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:101)
>>>   at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:128)
>>>   at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:104)
>>>   at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:66)
>>>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>>>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>>>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>>>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>>>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
>>>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>   at org.apache.spark.scheduler.Task.run(Task.scala:70)
>>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>   at java.lang.Thread.run(Thread.java:745)
>>>
>>> Driver stacktrace:
>>>   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
>>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
>>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
>>>   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>   at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
>>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
>>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
>>>   at scala.Option.foreach(Option.scala:236)
>>>   at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
>>>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
>>>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
>>>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>>>
>>>
>>> 15/11/16 18:59:57 ERROR Executor: Exception in task 14.0 in stage 0.0 (TID 14)
>>> java.lang.ExceptionInInitializerError
>>>   at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:197)
>>>   at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:159)
>>>   at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:101)
>>>   at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:128)
>>>   at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:104)
>>>   at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:66)
>>>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>>>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>>>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>>>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>>>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
>>>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>   at org.apache.spark.scheduler.Task.run(Task.scala:70)
>>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>   at java.lang.Thread.run(Thread.java:745)
>>> Caused by: java.lang.RuntimeException: hbase-default.xml file seems to be for and old version of HBase (null), this version is 0.98.6-cdh5.2.0
>>>   at org.apache.hadoop.hbase.HBaseConfiguration.checkDefaultsVersion(HBaseConfiguration.java:73)
>>>   at org.apache.hadoop.hbase.HBaseConfiguration.addHbaseResources(HBaseConfiguration.java:105)
>>>   at org.apache.hadoop.hbase.HBaseConfiguration.create(HBaseConfiguration.java:116)
>>>   at org.apache.hadoop.hbase.client.HConnectionManager.<clinit>(HConnectionManager.java:222)
>>>   ... 18 more
>>>
>>>
>>> Please note that it runs smoothly with spark-submit.
>>>
>>> By the way, if the issue is that hbase-default.xml is not loaded properly (maybe because of the classloader), note that it does load properly at the driver level:
>>>
>>> import org.apache.hadoop.hbase.HBaseConfiguration
>>> val conf = HBaseConfiguration.create()
>>> println(conf.get("hbase.defaults.for.version"))
>>>
>>> It prints "0.98.6-cdh5.2.0".
>>>
>>> I'm using Spark 1.4.1 (hadoop-2.4 binary), Zeppelin 0.5.5, and HBase 0.98.6-CDH5.2.0.
>>>
>>> Thanks in advance!
>>>
>>> Best,
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>
>>
>
>
> --
> Name : 임 정택
> Blog : http://www.heartsavior.net / http://dev.heartsavior.net
> Twitter : http://twitter.com/heartsavior
> LinkedIn : http://www.linkedin.com/in/heartsavior
>
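A side note on the hbase.defaults.for.version.skip workaround suggested earlier in the thread: the entry would go into hbase-site.xml, presumably the one the executors actually pick up (since the failure happens in HConnectionManager's static initializer on the executor side), roughly like this. This is a sketch of the standard Hadoop property syntax; it only disables the version check and does not remove the duplicate hbase-default.xml from the classpath.

<property>
  <name>hbase.defaults.for.version.skip</name>
  <value>true</value>
</property>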