Yep, CDH doesn't ship Spark compiled with the Thrift server. My understanding is that Zeppelin uses the spark-shell REPL and not the Spark Thrift server anyway.
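
To double-check from the notebook itself (along the lines of Jeff's spark-shell suggestion below), here is a minimal %pyspark sketch; it assumes the sqlContext variable Zeppelin injects for Spark 1.x, so take it as illustrative rather than authoritative:

%pyspark
# Rough check: if ZEPPELIN_SPARK_USEHIVECONTEXT took effect and the Spark
# build has the Hive classes, the context Zeppelin injects should already
# be a HiveContext rather than a plain SQLContext.
from pyspark.sql import HiveContext

print(type(sqlContext))                     # expect pyspark.sql.context.HiveContext
print(isinstance(sqlContext, HiveContext))  # True when Hive support is available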
Thank you.

--
Ruslan Dautkhanov

On Thu, Nov 24, 2016 at 1:57 AM, Jeff Zhang <zjf...@gmail.com> wrote:

> AFAIK, the Spark that ships with CDH doesn't support the Spark Thrift
> server, so it is possible it is not compiled with Hive. Can you run
> spark-shell to verify that? If it is built with Hive, a HiveContext will
> be created in spark-shell.
>
> Ruslan Dautkhanov <dautkha...@gmail.com> wrote on Thu, Nov 24, 2016 at 3:30 PM:
>
>> I can't reproduce this in %spark, nor in %sql.
>>
>> It seems to be %pyspark-specific.
>>
>> It also seems to run fine the first time I start Zeppelin; after that it
>> shows this error:
>>
>> You must build Spark with Hive. Export 'SPARK_HIVE=true' and run
>> build/sbt assembly
>>
>> sqlc = HiveContext(sc)
>> sqlc.sql("select count(*) from hivedb.someTable")
>>
>> It runs fine only one time, then:
>>
>> You must build Spark with Hive. Export 'SPARK_HIVE=true' and run
>> build/sbt assembly
>> Traceback (most recent call last):
>>   File "/tmp/zeppelin_pyspark-8000586427786928449.py", line 267, in <module>
>>     raise Exception(traceback.format_exc())
>> Exception: Traceback (most recent call last):
>>   File "/tmp/zeppelin_pyspark-8000586427786928449.py", line 265, in <module>
>>     exec(code)
>>   File "<stdin>", line 2, in <module>
>>   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 580, in sql
>>     return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
>>   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 683, in _ssql_ctx
>>     self._scala_HiveContext = self._get_hive_ctx()
>>   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 692, in _get_hive_ctx
>>     return self._jvm.HiveContext(self._jsc.sc())
>>   File "/opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 1064, in __call__
>>     answer, self._gateway_client, None, self._fqn)
>>   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/utils.py", line 45, in deco
>>     return f(*a, **kw)
>>
>> I don't see any more detail in the logs beyond the error stack above.
>>
>> --
>> Ruslan Dautkhanov
>>
>> On Wed, Nov 23, 2016 at 7:02 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:
>>
>> Hmm, SPARK_HOME is set, so it should pick up the right Spark.
>>
>> Does this work with the Scala Spark interpreter instead of pyspark? If it
>> doesn't, is there more info in the log?
>>
>> ------------------------------
>> *From:* Ruslan Dautkhanov <dautkha...@gmail.com>
>> *Sent:* Monday, November 21, 2016 1:52:36 PM
>> *To:* users@zeppelin.apache.org
>> *Subject:* "You must build Spark with Hive. Export 'SPARK_HIVE=true'"
>>
>> Getting
>>
>> You must *build Spark with Hive*. Export 'SPARK_HIVE=true'
>>
>> See the full stack [2] below.
>>
>> I'm using the Spark 1.6 that comes with CDH 5.8.3,
>> so it is definitely compiled with Hive.
>> We use Jupyter notebooks without problems in the same environment.
>>
>> Using Zeppelin 0.6.2, downloaded as zeppelin-0.6.2-bin-all.tgz from
>> apache.org.
>>
>> Is Zeppelin compiled with Hive too? I guess so.
>> Not sure what else is missing.
>>
>> Tried to play with ZEPPELIN_SPARK_USEHIVECONTEXT, but it does not make a
>> difference.
>>
>> [1]
>> $ cat zeppelin-env.sh
>> export JAVA_HOME=/usr/java/java7
>> export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
>> export SPARK_SUBMIT_OPTIONS="--principal xxxx --keytab yyy --conf spark.driver.memory=7g --conf spark.executor.cores=2 --conf spark.executor.memory=8g"
>> export SPARK_APP_NAME="Zeppelin notebook"
>> export HADOOP_CONF_DIR=/etc/hadoop/conf
>> export HIVE_CONF_DIR=/etc/hive/conf
>> export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
>> export PYSPARK_PYTHON="/opt/cloudera/parcels/Anaconda/bin/python2"
>> export PYTHONPATH="/opt/cloudera/parcels/CDH/lib/spark/python:/opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.9-src.zip"
>> export MASTER="yarn-client"
>> export ZEPPELIN_SPARK_USEHIVECONTEXT=true
>>
>> [2]
>> You must build Spark with Hive. Export 'SPARK_HIVE=true' and run
>> build/sbt assembly
>> Traceback (most recent call last):
>>   File "/tmp/zeppelin_pyspark-9143637669637506477.py", line 267, in <module>
>>     raise Exception(traceback.format_exc())
>> Exception: Traceback (most recent call last):
>>   File "/tmp/zeppelin_pyspark-9143637669637506477.py", line 265, in <module>
>>     exec(code)
>>   File "<stdin>", line 9, in <module>
>>   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 580, in sql
>>
>> [3]
>> Also have the correct symlinks in zeppelin_home/conf for:
>> - hive-site.xml
>> - hdfs-site.xml
>> - core-site.xml
>> - yarn-site.xml
>>
>> Thank you,
>> Ruslan Dautkhanov
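
One more thought on the repro quoted above: it constructs a new HiveContext(sc) on every paragraph run and only the first run succeeds, so a workaround sketch worth trying is to reuse the interpreter-provided context instead of creating a fresh one each time. This assumes Zeppelin's injected sqlContext is already Hive-enabled (see the check near the top); hivedb.someTable is just the table name taken from the repro:

%pyspark
# Workaround sketch, not a confirmed fix: reuse Zeppelin's injected
# sqlContext instead of calling HiveContext(sc) on every run; creating a
# second HiveContext in the same JVM is one plausible trigger for the
# error above.
df = sqlContext.sql("select count(*) from hivedb.someTable")
df.show()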