Nope, it is NOT null. Check this out:

scala> hiveContext == null
res2: Boolean = false
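Since it's clearly not null, and the NPE is actually coming out of Bytes.toBytes inside HiveHBaseTableInputFormat.getSplits, my guess is that some HBase-related string the input format pulls from the job conf is what's null. One thing I still want to rule out is whether the HBase client config is even visible from the shell; something along these lines should tell (just a rough sketch, and hbase.zookeeper.quorum is only an example property):

  import org.apache.hadoop.hbase.HBaseConfiguration

  // Speculative check: if this prints the hbase-default.xml fallback ("localhost")
  // instead of the real quorum, hbase-site.xml is probably not on the driver classpath.
  val hbaseConf = HBaseConfiguration.create()
  println(hbaseConf.get("hbase.zookeeper.quorum"))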
And thanks for sending that link, but I had already looked at it. Any other ideas? I looked through some of the relevant Spark Hive code and I'm starting to think this may be a bug.

-Cesar


On Mon, Aug 18, 2014 at 12:00 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:

> Looks like your hiveContext is null. Have a look at this documentation:
> <https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables>
>
> Thanks
> Best Regards
>
>
> On Mon, Aug 18, 2014 at 12:09 PM, Cesar Arevalo <ce...@zephyrhealthinc.com> wrote:
>
>> Hello:
>>
>> I am trying to set up Spark to connect to a Hive table which is backed by
>> HBase, but I am running into the following NullPointerException:
>>
>> scala> val hiveCount = hiveContext.sql("select count(*) from dataset_records").collect().head.getLong(0)
>> 14/08/18 06:34:29 INFO ParseDriver: Parsing command: select count(*) from dataset_records
>> 14/08/18 06:34:29 INFO ParseDriver: Parse Completed
>> 14/08/18 06:34:29 INFO HiveMetaStore: 0: get_table : db=default tbl=dataset_records
>> 14/08/18 06:34:29 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=default tbl=dataset_records
>> 14/08/18 06:34:30 INFO MemoryStore: ensureFreeSpace(160296) called with curMem=0, maxMem=280248975
>> 14/08/18 06:34:30 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 156.5 KB, free 267.1 MB)
>> 14/08/18 06:34:30 INFO SparkContext: Starting job: collect at SparkPlan.scala:85
>> 14/08/18 06:34:31 WARN DAGScheduler: Creating new stage failed due to exception - job: 0
>> java.lang.NullPointerException
>>   at org.apache.hadoop.hbase.util.Bytes.toBytes(Bytes.java:502)
>>   at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getSplits(HiveHBaseTableInputFormat.java:418)
>>   at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:179)
>>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
>>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
>>   at scala.Option.getOrElse(Option.scala:120)
>>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
>>   at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
>>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
>>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
>>   at scala.Option.getOrElse(Option.scala:120)
>>
>> This is happening from the Spark master; I am running HBase version
>> hbase-0.98.4-hadoop1 and Hive version 0.13.1.
>> And here is how I am running the spark shell:
>>
>> bin/spark-shell --driver-class-path
>> /opt/hive/latest/lib/hive-hbase-handler-0.13.1.jar:/opt/hive/latest/lib/zookeeper-3.4.5.jar:/opt/spark-poc/lib_managed/jars/com.google.guava/guava/guava-14.0.1.jar:/opt/hbase/latest/lib/hbase-common-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/hbase-server-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/hbase-client-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/hbase-protocol-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/htrace-core-2.04.jar:/opt/hbase/latest/lib/netty-3.6.6.Final.jar:/opt/hbase/latest/lib/hbase-hadoop-compat-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-client/hbase-client-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-common/hbase-common-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-server/hbase-server-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-prefix-tree/hbase-prefix-tree-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-protocol/hbase-protocol-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/bundles/com.google.protobuf/protobuf-java/protobuf-java-2.5.0.jar:/opt/spark-poc/lib_managed/jars/org.cloudera.htrace/htrace-core/htrace-core-2.04.jar:/opt/spark/sql/hive/target/spark-hive_2.10-1.1.0-SNAPSHOT.jar:/opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-common/hive-common-0.12.0.jar:/opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-exec/hive-exec-0.12.0.jar:/opt/spark-poc/lib_managed/jars/org.apache.thrift/libthrift/libthrift-0.9.0.jar:/opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-shims/hive-shims-0.12.0.jar:/opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-metastore/hive-metastore-0.12.0.jar:/opt/spark/sql/catalyst/target/spark-catalyst_2.10-1.1.0-SNAPSHOT.jar:/opt/spark-poc/lib_managed/jars/org.antlr/antlr-runtime/antlr-runtime-3.4.jar:/opt/spark-poc/lib_managed/jars/org.apache.thrift/libfb303/libfb303-0.9.0.jar:/opt/spark-poc/lib_managed/jars/javax.jdo/jdo-api/jdo-api-3.0.1.jar:/opt/spark-poc/lib_managed/jars/org.datanucleus/datanucleus-api-jdo/datanucleus-api-jdo-3.2.1.jar:/opt/spark-poc/lib_managed/jars/org.datanucleus/datanucleus-core/datanucleus-core-3.2.2.jar:/opt/spark-poc/lib_managed/jars/org.datanucleus/datanucleus-rdbms/datanucleus-rdbms-3.2.1.jar:/opt/spark-poc/lib_managed/jars/org.apache.derby/derby/derby-10.4.2.0.jar:/opt/spark-poc/sbt/ivy/cache/org.apache.hive/hive-hbase-handler/jars/hive-hbase-handler-0.13.1.jar:/opt/spark-poc/lib_managed/jars/com.typesafe/scalalogging-slf4j_2.10/scalalogging-slf4j_2.10-1.0.1.jar:/opt/spark-poc/lib_managed/bundles/com.jolbox/bonecp/bonecp-0.7.1.RELEASE.jar:/opt/spark-poc/sbt/ivy/cache/com.datastax.cassandra/cassandra-driver-core/bundles/cassandra-driver-core-2.0.4.jar:/opt/spark-poc/lib_managed/jars/org.json/json/json-20090211.jar
>>
>> Can anybody help me?
>>
>> Best,
>> --
>> Cesar Arevalo
>> Software Engineer ❘ Zephyr Health
>> 450 Mission Street, Suite #201 ❘ San Francisco, CA 94105
>> m: +1 415-571-7687 ❘ s: arevalocesar | t: @zephyrhealth
>> <https://twitter.com/zephyrhealth>
>> o: +1 415-529-7649 ❘ f: +1 415-520-9288
>> http://www.zephyrhealth.com


--
Cesar Arevalo
Software Engineer ❘ Zephyr Health
450 Mission Street, Suite #201 ❘ San Francisco, CA 94105
m: +1 415-571-7687 ❘ s: arevalocesar | t: @zephyrhealth
<https://twitter.com/zephyrhealth>
o: +1 415-529-7649 ❘ f: +1 415-520-9288
http://www.zephyrhealth.com
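P.S. For completeness, the HiveContext itself was created the standard way from the Hive-tables section of the programming guide, roughly like this (a sketch; sc is the SparkContext that spark-shell provides):

  import org.apache.spark.sql.hive.HiveContext

  // sc is the SparkContext created automatically by spark-shell
  val hiveContext = new HiveContext(sc)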