Hi All,

When I do a count on an HBase table from the Spark shell running in yarn-client mode, the job fails at count():

    MASTER=yarn-client ./spark-shell

    import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor, TableName}
    import org.apache.hadoop.hbase.client.HBaseAdmin
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat

    val conf = HBaseConfiguration.create()
    conf.set(TableInputFormat.INPUT_TABLE, "spark")

    val hBaseRDD = sc.newAPIHadoopRDD(conf,
      classOf[TableInputFormat],
      classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
      classOf[org.apache.hadoop.hbase.client.Result])

    hBaseRDD.count()

The tasks throw the exception below; the actual exception is swallowed due to a JDK bug (JDK-7172206). After installing the HBase client on all NodeManager machines, the Spark job ran fine, so I confirmed that the issue is the executor classpath. But I am looking for a way to include the HBase jars in the Spark executor classpath instead of installing the HBase client on every NM machine.

I tried adding all the HBase jars to spark.yarn.dist.files; the NM logs show that all the HBase jars were localized, but the job still fails. I also tried spark.executor.extraClassPath; the job still fails.
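Concretely, the shape of what I tried looks like the following (the jar names and paths here are only illustrative; the exact list depends on the HBase version):

    # Sketch of the attempt: spark.yarn.dist.files ships the jars so YARN
    # localizes them into each container's working directory, and
    # spark.executor.extraClassPath points at the localized copies there.
    # The three jars below are examples, not the complete list.
    MASTER=yarn-client ./spark-shell \
      --conf spark.yarn.dist.files=/usr/lib/hbase/lib/hbase-client.jar,/usr/lib/hbase/lib/hbase-common.jar,/usr/lib/hbase/lib/hbase-protocol.jar \
      --conf spark.executor.extraClassPath=./hbase-client.jar:./hbase-common.jar:./hbase-protocol.jar

I am also not sure whether passing the jars with --jars (which both distributes them and adds them to the executor classpath) should behave any differently here.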
Is there any way we can access HBase from the executors without installing hbase-client on all the machines?

16/02/09 02:34:57 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, prabhuFS1): java.lang.IllegalStateException: unread block data
        at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2428)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Thanks,
Prabhu Joseph