Hello, I am trying to set up Spark to connect to a Hive table that is backed by HBase, but I am running into a NullPointerException. The full shell session and stack trace are below.
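For context, hiveContext in the session is the standard Spark SQL HiveContext, created from the shell's SparkContext along these lines (a sketch of my setup, not copied verbatim from the session):

    import org.apache.spark.sql.hive.HiveContext

    // sc is the SparkContext that spark-shell already provides
    val hiveContext = new HiveContext(sc)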
Here is the session:

scala> val hiveCount = hiveContext.sql("select count(*) from dataset_records").collect().head.getLong(0)
14/08/18 06:34:29 INFO ParseDriver: Parsing command: select count(*) from dataset_records
14/08/18 06:34:29 INFO ParseDriver: Parse Completed
14/08/18 06:34:29 INFO HiveMetaStore: 0: get_table : db=default tbl=dataset_records
14/08/18 06:34:29 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=default tbl=dataset_records
14/08/18 06:34:30 INFO MemoryStore: ensureFreeSpace(160296) called with curMem=0, maxMem=280248975
14/08/18 06:34:30 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 156.5 KB, free 267.1 MB)
14/08/18 06:34:30 INFO SparkContext: Starting job: collect at SparkPlan.scala:85
14/08/18 06:34:31 WARN DAGScheduler: Creating new stage failed due to exception - job: 0
java.lang.NullPointerException
        at org.apache.hadoop.hbase.util.Bytes.toBytes(Bytes.java:502)
        at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getSplits(HiveHBaseTableInputFormat.java:418)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:179)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
        at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
        at scala.Option.getOrElse(Option.scala:120)

This happens on the Spark master. I am running HBase version hbase-0.98.4-hadoop1 and Hive version 0.13.1.
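The dataset_records table is defined in Hive on top of HBase via the hive-hbase-handler storage handler. I don't have the exact DDL in front of me, but it follows the usual pattern; the column names and mapping below are placeholders, not the real schema:

    CREATE EXTERNAL TABLE dataset_records (key string, value string)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:value")  -- placeholder mapping
    TBLPROPERTIES ("hbase.table.name" = "dataset_records");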
And here is how I am running the spark-shell (the driver classpath is a single colon-separated argument):

bin/spark-shell --driver-class-path /opt/hive/latest/lib/hive-hbase-handler-0.13.1.jar:/opt/hive/latest/lib/zookeeper-3.4.5.jar:/opt/spark-poc/lib_managed/jars/com.google.guava/guava/guava-14.0.1.jar:/opt/hbase/latest/lib/hbase-common-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/hbase-server-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/hbase-client-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/hbase-protocol-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/htrace-core-2.04.jar:/opt/hbase/latest/lib/netty-3.6.6.Final.jar:/opt/hbase/latest/lib/hbase-hadoop-compat-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-client/hbase-client-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-common/hbase-common-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-server/hbase-server-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-prefix-tree/hbase-prefix-tree-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-protocol/hbase-protocol-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/bundles/com.google.protobuf/protobuf-java/protobuf-java-2.5.0.jar:/opt/spark-poc/lib_managed/jars/org.cloudera.htrace/htrace-core/htrace-core-2.04.jar:/opt/spark/sql/hive/target/spark-hive_2.10-1.1.0-SNAPSHOT.jar:/opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-common/hive-common-0.12.0.jar:/opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-exec/hive-exec-0.12.0.jar:/opt/spark-poc/lib_managed/jars/org.apache.thrift/libthrift/libthrift-0.9.0.jar:/opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-shims/hive-shims-0.12.0.jar:/opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-metastore/hive-metastore-0.12.0.jar:/opt/spark/sql/catalyst/target/spark-catalyst_2.10-1.1.0-SNAPSHOT.jar:/opt/spark-poc/lib_managed/jars/org.antlr/antlr-runtime/antlr-runtime-3.4.jar:/opt/spark-poc/lib_managed/jars/org.apache.thrift/libfb303/libfb303-0.9.0.jar:/opt/spark-poc/lib_managed/jars/javax.jdo/jdo-api/jdo-api-3.0.1.jar:/opt/spark-poc/lib_managed/jars/org.datanucleus/datanucleus-api-jdo/datanucleus-api-jdo-3.2.1.jar:/opt/spark-poc/lib_managed/jars/org.datanucleus/datanucleus-core/datanucleus-core-3.2.2.jar:/opt/spark-poc/lib_managed/jars/org.datanucleus/datanucleus-rdbms/datanucleus-rdbms-3.2.1.jar:/opt/spark-poc/lib_managed/jars/org.apache.derby/derby/derby-10.4.2.0.jar:/opt/spark-poc/sbt/ivy/cache/org.apache.hive/hive-hbase-handler/jars/hive-hbase-handler-0.13.1.jar:/opt/spark-poc/lib_managed/jars/com.typesafe/scalalogging-slf4j_2.10/scalalogging-slf4j_2.10-1.0.1.jar:/opt/spark-poc/lib_managed/bundles/com.jolbox/bonecp/bonecp-0.7.1.RELEASE.jar:/opt/spark-poc/sbt/ivy/cache/com.datastax.cassandra/cassandra-driver-core/bundles/cassandra-driver-core-2.0.4.jar:/opt/spark-poc/lib_managed/jars/org.json/json/json-20090211.jar

Can anybody help me?

Best,
--
Cesar Arevalo
Software Engineer ❘ Zephyr Health
450 Mission Street, Suite #201 ❘ San Francisco, CA 94105
m: +1 415-571-7687 ❘ s: arevalocesar | t: @zephyrhealth <https://twitter.com/zephyrhealth>
o: +1 415-529-7649 ❘ f: +1 415-520-9288
http://www.zephyrhealth.com