I removed the JAR that you suggested, but now I get another error when I try
to create the HiveContext. Here is the error:

scala> val hiveContext = new HiveContext(sc)
error: bad symbolic reference. A signature in HiveContext.class refers to
term ql
in package org.apache.hadoop.hive which is not available.
It may be completely missing from the current classpath,
<omitted the rest of the stack trace for readability...>
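
From what I can tell, "term ql ... is not available" means the classes under
org.apache.hadoop.hive.ql (which ship inside hive-exec) are now missing from
the classpath entirely, so HiveContext has nothing to link against. As a quick
sanity check from the same shell I could try loading one of those classes
directly (the class below is just a representative pick on my part, not
something taken from the error):

scala> Class.forName("org.apache.hadoop.hive.ql.session.SessionState")

If that throws a ClassNotFoundException, then removing the jar took the Hive
execution classes away entirely rather than resolving the conflict.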


Best,
-Cesar


On Mon, Aug 18, 2014 at 12:47 AM, Akhil Das <ak...@sigmoidanalytics.com>
wrote:

> Then it's definitely a jar conflict. Can you try removing this jar from the
> classpath:
> /opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-exec/hive-exec-0.12.0.jar
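>
> If you want to confirm where the conflicting classes are being picked up
> before removing anything, something like this from the shell should print
> the jar a given Hive class was loaded out of (just a diagnostic sketch,
> using one class from hive-exec as an example):
>
> scala> classOf[org.apache.hadoop.hive.ql.metadata.Hive].getProtectionDomain.getCodeSource.getLocation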
>
> Thanks
> Best Regards
>
>
> On Mon, Aug 18, 2014 at 12:45 PM, Cesar Arevalo <ce...@zephyrhealthinc.com> wrote:
>
>> Nope, it is NOT null. Check this out:
>>
>> scala> hiveContext == null
>> res2: Boolean = false
>>
>>
>> And thanks for sending that link, but I had already looked at it. Any
>> other ideas?
>>
>> I looked through some of the relevant Spark Hive code and I'm starting to
>> think this may be a bug.
>>
>> -Cesar
>>
>>
>>
>> On Mon, Aug 18, 2014 at 12:00 AM, Akhil Das <ak...@sigmoidanalytics.com>
>> wrote:
>>
>>> Looks like your hiveContext is null. Have a look at this documentation.
>>> <https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables>
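>>>
>>> As a minimal sketch of what that page describes (using the table name from
>>> your mail), the flow would be roughly:
>>>
>>> scala> import org.apache.spark.sql.hive.HiveContext
>>> scala> val hiveContext = new HiveContext(sc)
>>> scala> hiveContext.sql("SELECT count(*) FROM dataset_records").collect()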
>>>
>>> Thanks
>>> Best Regards
>>>
>>>
>>> On Mon, Aug 18, 2014 at 12:09 PM, Cesar Arevalo <ce...@zephyrhealthinc.com> wrote:
>>>
>>>> Hello:
>>>>
>>>> I am trying to set up Spark to connect to a Hive table which is backed
>>>> by HBase, but I am running into the following NullPointerException:
>>>>
>>>> scala> val hiveCount = hiveContext.sql("select count(*) from
>>>> dataset_records").collect().head.getLong(0)
>>>> 14/08/18 06:34:29 INFO ParseDriver: Parsing command: select count(*)
>>>> from dataset_records
>>>> 14/08/18 06:34:29 INFO ParseDriver: Parse Completed
>>>> 14/08/18 06:34:29 INFO HiveMetaStore: 0: get_table : db=default
>>>> tbl=dataset_records
>>>> 14/08/18 06:34:29 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_table
>>>> : db=default tbl=dataset_records
>>>> 14/08/18 06:34:30 INFO MemoryStore: ensureFreeSpace(160296) called with
>>>> curMem=0, maxMem=280248975
>>>> 14/08/18 06:34:30 INFO MemoryStore: Block broadcast_0 stored as values
>>>> in memory (estimated size 156.5 KB, free 267.1 MB)
>>>> 14/08/18 06:34:30 INFO SparkContext: Starting job: collect at
>>>> SparkPlan.scala:85
>>>> 14/08/18 06:34:31 WARN DAGScheduler: Creating new stage failed due to
>>>> exception - job: 0
>>>> java.lang.NullPointerException
>>>> at org.apache.hadoop.hbase.util.Bytes.toBytes(Bytes.java:502)
>>>>  at
>>>> org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getSplits(HiveHBaseTableInputFormat.java:418)
>>>> at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:179)
>>>>  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
>>>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
>>>>  at scala.Option.getOrElse(Option.scala:120)
>>>> at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
>>>>  at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
>>>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
>>>>  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
>>>> at scala.Option.getOrElse(Option.scala:120)
>>>>
>>>>
>>>>
>>>>
>>>> This is happening on the Spark master. I am running HBase version
>>>> hbase-0.98.4-hadoop1 and Hive version 0.13.1, and here is how I am
>>>> running the Spark shell:
>>>>
>>>> bin/spark-shell --driver-class-path
>>>> /opt/hive/latest/lib/hive-hbase-handler-0.13.1.jar:/opt/hive/latest/lib/zookeeper-3.4.5.jar:/opt/spark-poc/lib_managed/jars/com.google.guava/guava/guava-14.0.1.jar:/opt/hbase/latest/lib/hbase-common-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/hbase-server-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/hbase-client-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/hbase-protocol-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/htrace-core-2.04.jar:/opt/hbase/latest/lib/netty-3.6.6.Final.jar:/opt/hbase/latest/lib/hbase-hadoop-compat-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-client/hbase-client-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-common/hbase-common-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-server/hbase-server-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-prefix-tree/hbase-prefix-tree-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-protocol/hbase-protocol-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/bundles/com.google.protobuf/protobuf-java/protobuf-java-2.5.0.jar:/opt/spark-poc/lib_managed/jars/org.cloudera.htrace/htrace-core/htrace-core-2.04.jar:/opt/spark/sql/hive/target/spark-hive_2.10-1.1.0-SNAPSHOT.jar:/opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-common/hive-common-0.12.0.jar:/opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-exec/hive-exec-0.12.0.jar:/opt/spark-poc/lib_managed/jars/org.apache.thrift/libthrift/libthrift-0.9.0.jar:/opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-shims/hive-shims-0.12.0.jar:/opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-metastore/hive-metastore-0.12.0.jar:/opt/spark/sql/catalyst/target/spark-catalyst_2.10-1.1.0-SNAPSHOT.jar:/opt/spark-poc/lib_managed/jars/org.antlr/antlr-runtime/antlr-runtime-3.4.jar:/opt/spark-poc/lib_managed/jars/org.apache.thrift/libfb303/libfb303-0.9.0.jar:/opt/spark-poc/lib_managed/jars/javax.jdo/jdo-api/jdo-api-3.0.1.jar:/opt/spark-poc/lib_managed/jars/org.datanucleus/datanucleus-api-jdo/datanucleus-api-jdo-3.2.1.jar:/opt/spark-poc/lib_managed/jars/org.datanucleus/datanucleus-core/datanucleus-core-3.2.2.jar:/opt/spark-poc/lib_managed/jars/org.datanucleus/datanucleus-rdbms/datanucleus-rdbms-3.2.1.jar:/opt/spark-poc/lib_managed/jars/org.apache.derby/derby/derby-10.4.2.0.jar:/opt/spark-poc/sbt/ivy/cache/org.apache.hive/hive-hbase-handler/jars/hive-hbase-handler-0.13.1.jar:/opt/spark-poc/lib_managed/jars/com.typesafe/scalalogging-slf4j_2.10/scalalogging-slf4j_2.10-1.0.1.jar:/opt/spark-poc/lib_managed/bundles/com.jolbox/bonecp/bonecp-0.7.1.RELEASE.jar:/opt/spark-poc/sbt/ivy/cache/com.datastax.cassandra/cassandra-driver-core/bundles/cassandra-driver-core-2.0.4.jar:/opt/spark-poc/lib_managed/jars/org.json/json/json-20090211.jar
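>>>>
>>>> The NPE comes out of Bytes.toBytes inside
>>>> HiveHBaseTableInputFormat.getSplits, so my guess (only a guess) is that
>>>> some HBase-related property isn't reaching the job configuration when the
>>>> splits are computed. Would setting one explicitly on the context before
>>>> running the query make any difference? Something along these lines, where
>>>> the quorum value below is just a placeholder for whatever the cluster
>>>> actually uses:
>>>>
>>>> scala> hiveContext.setConf("hbase.zookeeper.quorum", "zk-host")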
>>>>
>>>>
>>>>
>>>> Can anybody help me?
>>>>
>>>> Best,
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>>
>>
>


-- 
Cesar Arevalo
Software Engineer ❘ Zephyr Health
450 Mission Street, Suite #201 ❘ San Francisco, CA 94105
m: +1 415-571-7687 ❘ s: arevalocesar | t: @zephyrhealth
<https://twitter.com/zephyrhealth>
o: +1 415-529-7649 ❘ f: +1 415-520-9288
http://www.zephyrhealth.com
