Nope, it is NOT null. Check this out:

scala> hiveContext == null
res2: Boolean = false
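Since it's clearly not null, and the NPE is actually coming out of Bytes.toBytes inside HiveHBaseTableInputFormat.getSplits, my guess is that some HBase-related string the input format pulls from the job conf is what's null. One thing I still want to rule out is whether the HBase client config is even visible from the shell; something along these lines should tell (just a rough sketch, and hbase.zookeeper.quorum is only an example property):

  import org.apache.hadoop.hbase.HBaseConfiguration

  // Speculative check: if this prints the hbase-default.xml fallback ("localhost")
  // instead of the real quorum, hbase-site.xml is probably not on the driver classpath.
  val hbaseConf = HBaseConfiguration.create()
  println(hbaseConf.get("hbase.zookeeper.quorum"))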
And thanks for sending that link, but I had already looked at it. Any other ideas? I looked through some of the relevant Spark Hive code and I'm starting to think this may be a bug.

-Cesar


On Mon, Aug 18, 2014 at 12:00 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:

> Looks like your hiveContext is null. Have a look at this documentation:
> <https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables>
>
> Thanks
> Best Regards
>
>
> On Mon, Aug 18, 2014 at 12:09 PM, Cesar Arevalo <ce...@zephyrhealthinc.com> wrote:
>
>> Hello:
>>
>> I am trying to set up Spark to connect to a Hive table which is backed by
>> HBase, but I am running into the following NullPointerException:
>>
>> scala> val hiveCount = hiveContext.sql("select count(*) from dataset_records").collect().head.getLong(0)
>> 14/08/18 06:34:29 INFO ParseDriver: Parsing command: select count(*) from dataset_records
>> 14/08/18 06:34:29 INFO ParseDriver: Parse Completed
>> 14/08/18 06:34:29 INFO HiveMetaStore: 0: get_table : db=default tbl=dataset_records
>> 14/08/18 06:34:29 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=default tbl=dataset_records
>> 14/08/18 06:34:30 INFO MemoryStore: ensureFreeSpace(160296) called with curMem=0, maxMem=280248975
>> 14/08/18 06:34:30 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 156.5 KB, free 267.1 MB)
>> 14/08/18 06:34:30 INFO SparkContext: Starting job: collect at SparkPlan.scala:85
>> 14/08/18 06:34:31 WARN DAGScheduler: Creating new stage failed due to exception - job: 0
>> java.lang.NullPointerException
>>   at org.apache.hadoop.hbase.util.Bytes.toBytes(Bytes.java:502)
>>   at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getSplits(HiveHBaseTableInputFormat.java:418)
>>   at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:179)
>>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
>>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
>>   at scala.Option.getOrElse(Option.scala:120)
>>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
>>   at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
>>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
>>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
>>   at scala.Option.getOrElse(Option.scala:120)
>>
>> This is happening from the Spark master; I am running HBase version
>> hbase-0.98.4-hadoop1 and Hive version 0.13.1.
>> And here is how I am running the spark shell:
>>
>> bin/spark-shell --driver-class-path
>> /opt/hive/latest/lib/hive-hbase-handler-0.13.1.jar:/opt/hive/latest/lib/zookeeper-3.4.5.jar:/opt/spark-poc/lib_managed/jars/com.google.guava/guava/guava-14.0.1.jar:/opt/hbase/latest/lib/hbase-common-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/hbase-server-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/hbase-client-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/hbase-protocol-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/htrace-core-2.04.jar:/opt/hbase/latest/lib/netty-3.6.6.Final.jar:/opt/hbase/latest/lib/hbase-hadoop-compat-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-client/hbase-client-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-common/hbase-common-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-server/hbase-server-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-prefix-tree/hbase-prefix-tree-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-protocol/hbase-protocol-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/bundles/com.google.protobuf/protobuf-java/protobuf-java-2.5.0.jar:/opt/spark-poc/lib_managed/jars/org.cloudera.htrace/htrace-core/htrace-core-2.04.jar:/opt/spark/sql/hive/target/spark-hive_2.10-1.1.0-SNAPSHOT.jar:/opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-common/hive-common-0.12.0.jar:/opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-exec/hive-exec-0.12.0.jar:/opt/spark-poc/lib_managed/jars/org.apache.thrift/libthrift/libthrift-0.9.0.jar:/opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-shims/hive-shims-0.12.0.jar:/opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-metastore/hive-metastore-0.12.0.jar:/opt/spark/sql/catalyst/target/spark-catalyst_2.10-1.1.0-SNAPSHOT.jar:/opt/spark-poc/lib_managed/jars/org.antlr/antlr-runtime/antlr-runtime-3.4.jar:/opt/spark-poc/lib_managed/jars/org.apache.thrift/libfb303/libfb303-0.9.0.jar:/opt/spark-poc/lib_managed/jars/javax.jdo/jdo-api/jdo-api-3.0.1.jar:/opt/spark-poc/lib_managed/jars/org.datanucleus/datanucleus-api-jdo/datanucleus-api-jdo-3.2.1.jar:/opt/spark-poc/lib_managed/jars/org.datanucleus/datanucleus-core/datanucleus-core-3.2.2.jar:/opt/spark-poc/lib_managed/jars/org.datanucleus/datanucleus-rdbms/datanucleus-rdbms-3.2.1.jar:/opt/spark-poc/lib_managed/jars/org.apache.derby/derby/derby-10.4.2.0.jar:/opt/spark-poc/sbt/ivy/cache/org.apache.hive/hive-hbase-handler/jars/hive-hbase-handler-0.13.1.jar:/opt/spark-poc/lib_managed/jars/com.typesafe/scalalogging-slf4j_2.10/scalalogging-slf4j_2.10-1.0.1.jar:/opt/spark-poc/lib_managed/bundles/com.jolbox/bonecp/bonecp-0.7.1.RELEASE.jar:/opt/spark-poc/sbt/ivy/cache/com.datastax.cassandra/cassandra-driver-core/bundles/cassandra-driver-core-2.0.4.jar:/opt/spark-poc/lib_managed/jars/org.json/json/json-20090211.jar
>>
>> Can anybody help me?
>>
>> Best,
>> --
>> Cesar Arevalo
>> Software Engineer ❘ Zephyr Health
>> 450 Mission Street, Suite #201 ❘ San Francisco, CA 94105
>> m: +1 415-571-7687 ❘ s: arevalocesar | t: @zephyrhealth
>> <https://twitter.com/zephyrhealth>
>> o: +1 415-529-7649 ❘ f: +1 415-520-9288
>> http://www.zephyrhealth.com


--
Cesar Arevalo
Software Engineer ❘ Zephyr Health
450 Mission Street, Suite #201 ❘ San Francisco, CA 94105
m: +1 415-571-7687 ❘ s: arevalocesar | t: @zephyrhealth
<https://twitter.com/zephyrhealth>
o: +1 415-529-7649 ❘ f: +1 415-520-9288
http://www.zephyrhealth.com
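P.S. For completeness, the HiveContext itself was created the standard way from the Hive-tables section of the programming guide, roughly like this (a sketch; sc is the SparkContext that spark-shell provides):

  import org.apache.spark.sql.hive.HiveContext

  // sc is the SparkContext created automatically by spark-shell
  val hiveContext = new HiveContext(sc)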