Yep, CDH's Spark isn't compiled with the Thrift server.
My understanding is that Zeppelin uses the spark-shell REPL, not the Spark thrift server, though.

Thank you.



-- 
Ruslan Dautkhanov

On Thu, Nov 24, 2016 at 1:57 AM, Jeff Zhang <zjf...@gmail.com> wrote:

> AFAIK, the Spark that ships with CDH doesn't support the Spark thrift
> server, so it is possible it is not compiled with Hive. Can you run
> spark-shell to verify that? If it is built with Hive, a HiveContext will
> be created in spark-shell.
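>
> An equivalent quick check from the CDH pyspark shell (a sketch; in Spark
> 1.6 the shell falls back to a plain SQLContext when Hive support is
> missing):
>
>   # run /opt/cloudera/parcels/CDH/lib/spark/bin/pyspark, then:
>   print(type(sqlContext))  # HiveContext if built with Hive, SQLContext if not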
>
> On Thu, Nov 24, 2016 at 3:30 PM, Ruslan Dautkhanov <dautkha...@gmail.com> wrote:
>
>> I can't reproduce this in %spark or %sql;
>> it seems to be %pyspark-specific.
>>
>> It also runs fine the first time I start Zeppelin; after that it shows
>> this error:
>>
>> You must build Spark with Hive. Export 'SPARK_HIVE=true' and run
>> build/sbt assembly
>>
>>
>> from pyspark.sql import HiveContext  # sc is already provided by Zeppelin
>> sqlc = HiveContext(sc)
>> sqlc.sql("select count(*) from hivedb.someTable")
>>
>> It runs fine only once; after that:
>>
>> You must build Spark with Hive. Export 'SPARK_HIVE=true' and run
>> build/sbt assembly
>> Traceback (most recent call last):
>>   File "/tmp/zeppelin_pyspark-8000586427786928449.py", line 267, in <module>
>>     raise Exception(traceback.format_exc())
>> Exception: Traceback (most recent call last):
>>   File "/tmp/zeppelin_pyspark-8000586427786928449.py", line 265, in <module>
>>     exec(code)
>>   File "<stdin>", line 2, in <module>
>>   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 580, in sql
>>     return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
>>   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 683, in _ssql_ctx
>>     self._scala_HiveContext = self._get_hive_ctx()
>>   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 692, in _get_hive_ctx
>>     return self._jvm.HiveContext(self._jsc.sc())
>>   File "/opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 1064, in __call__
>>     answer, self._gateway_client, None, self._fqn)
>>   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/utils.py", line 45, in deco
>>     return f(*a, **kw)
>>
>>
>>
>> I don't see any more detail in the logs beyond the stack trace above.
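>>
>> A possible workaround (my assumption: Zeppelin's pyspark interpreter
>> already created a HiveContext, and constructing a second one in the
>> same JVM is what fails on re-run) is to reuse the injected context:
>>
>>   # sketch: use the sqlContext Zeppelin provides instead of a new HiveContext
>>   sqlContext.sql("select count(*) from hivedb.someTable").show()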
>>
>>
>> --
>> Ruslan Dautkhanov
>>
>> On Wed, Nov 23, 2016 at 7:02 AM, Felix Cheung <felixcheun...@hotmail.com>
>> wrote:
>>
>> Hmm, SPARK_HOME is set, so it should pick up the right Spark.
>>
>> Does this work with the Scala Spark interpreter instead of pyspark? If it
>> doesn't, is there more info in the log?
>>
>>
>> ------------------------------
>> *From:* Ruslan Dautkhanov <dautkha...@gmail.com>
>> *Sent:* Monday, November 21, 2016 1:52:36 PM
>> *To:* users@zeppelin.apache.org
>> *Subject:* "You must build Spark with Hive. Export 'SPARK_HIVE=true'"
>>
>> Getting:
>> You must *build Spark with Hive*. Export 'SPARK_HIVE=true'
>> See the full stack trace in [2] below.
>>
>> I'm using the Spark 1.6 that ships with CDH 5.8.3,
>> so it's definitely compiled with Hive.
>> We use Jupyter notebooks in the same environment without problems.
>>
>> Using Zeppelin 0.6.2, downloaded as zeppelin-0.6.2-bin-all.tgz from
>> apache.org.
>>
>> Is Zeppelin compiled with Hive too? I guess so.
>> Not sure what else is missing.
>>
>> Tried playing with ZEPPELIN_SPARK_USEHIVECONTEXT, but it doesn't make a
>> difference.
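>>
>> A quick check from a %pyspark paragraph (a sketch; assumes Zeppelin 0.6
>> injects sqlContext into the interpreter):
>>
>>   # prints HiveContext when hive support is active, SQLContext otherwise
>>   print(type(sqlContext))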
>>
>>
>> [1]
>> $ cat zeppelin-env.sh
>> export JAVA_HOME=/usr/java/java7
>> export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
>> export SPARK_SUBMIT_OPTIONS="--principal xxxx --keytab yyy --conf
>> spark.driver.memory=7g --conf spark.executor.cores=2 --conf
>> spark.executor.memory=8g"
>> export SPARK_APP_NAME="Zeppelin notebook"
>> export HADOOP_CONF_DIR=/etc/hadoop/conf
>> export HIVE_CONF_DIR=/etc/hive/conf
>> export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
>> export PYSPARK_PYTHON="/opt/cloudera/parcels/Anaconda/bin/python2"
>> export PYTHONPATH="/opt/cloudera/parcels/CDH/lib/spark/python:/opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.9-src.zip"
>> export MASTER="yarn-client"
>> export ZEPPELIN_SPARK_USEHIVECONTEXT=true
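>>
>> To verify the interpreter actually sees this environment, from a
>> %pyspark paragraph (a sketch):
>>
>>   import os, sys
>>   print(os.environ.get("SPARK_HOME"))  # expect /opt/cloudera/parcels/CDH/lib/spark
>>   print([p for p in sys.path if "py4j" in p or "spark/python" in p])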
>>
>>
>>
>>
>> [2]
>>
>> You must build Spark with Hive. Export 'SPARK_HIVE=true' and run
>> build/sbt assembly
>> Traceback (most recent call last):
>>   File "/tmp/zeppelin_pyspark-9143637669637506477.py", line 267, in <module>
>>     raise Exception(traceback.format_exc())
>> Exception: Traceback (most recent call last):
>>   File "/tmp/zeppelin_pyspark-9143637669637506477.py", line 265, in <module>
>>     exec(code)
>>   File "<stdin>", line 9, in <module>
>>   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 580, in sql
>>
>> [3]
>> Also have the correct symlinks in $ZEPPELIN_HOME/conf for:
>> - hive-site.xml
>> - hdfs-site.xml
>> - core-site.xml
>> - yarn-site.xml
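>>
>> A quick sanity check of those links from a %pyspark paragraph (a
>> sketch; assumes ZEPPELIN_HOME is set to the Zeppelin install dir):
>>
>>   import os
>>   conf = os.path.join(os.environ.get("ZEPPELIN_HOME", "."), "conf")
>>   for f in ("hive-site.xml", "hdfs-site.xml", "core-site.xml", "yarn-site.xml"):
>>       p = os.path.join(conf, f)
>>       print(p, os.path.islink(p), os.path.exists(p))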
>>
>>
>>
>> Thank you,
>> Ruslan Dautkhanov
>>
>>
>>
