Also, to get rid of this problem: once HiveContext(sc) has been assigned to a variable at least twice, the only fix is to restart Zeppelin :-(
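To avoid triggering it in the first place, we now guard the assignment. Below is a minimal sketch; it assumes Zeppelin's pyspark interpreter has already injected sc into the session, and sqlCtx is just our own variable name, not something Zeppelin provides:

    from pyspark.sql import HiveContext

    try:
        sqlCtx  # defined by an earlier run of this paragraph - reuse it
    except NameError:
        sqlCtx = HiveContext(sc)  # construct the JVM HiveContext only once

With that guard the paragraph can be re-run safely, because HiveContext(sc) is only ever constructed once per interpreter session.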
--
Ruslan Dautkhanov

On Sun, Nov 27, 2016 at 9:00 AM, Ruslan Dautkhanov <dautkha...@gmail.com> wrote:

> I found a pattern when this happens.
>
> When I run
>
>     sqlCtx = HiveContext(sc)
>
> it works as expected. The second and any subsequent time gives the
> exception stack I reported earlier in this email chain:
>
>     sqlCtx = HiveContext(sc)
>     sqlCtx.sql('select * from marketview.spend_dim')
>
> You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly
> Traceback (most recent call last):
>   File "/tmp/zeppelin_pyspark-6752406810533348793.py", line 267, in <module>
>     raise Exception(traceback.format_exc())
> Exception: Traceback (most recent call last):
>   File "/tmp/zeppelin_pyspark-6752406810533348793.py", line 265, in <module>
>     exec(code)
>   File "<stdin>", line 2, in <module>
>   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 580, in sql
>     return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
>   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 683, in _ssql_ctx
>     self._scala_HiveContext = self._get_hive_ctx()
>   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 692, in _get_hive_ctx
>     return self._jvm.HiveContext(self._jsc.sc())
>   File "/opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 1064, in __call__
>     answer, self._gateway_client, None, self._fqn)
>   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/utils.py", line 45, in deco
>     return f(*a, **kw)
>
> The key piece to reproduce this issue: assign HiveContext(sc) to a
> variable more than once, and use that variable between assignments.
>
> --
> Ruslan Dautkhanov
>
> On Mon, Nov 21, 2016 at 2:52 PM, Ruslan Dautkhanov <dautkha...@gmail.com> wrote:
>
>> Getting
>>
>>     You must *build Spark with Hive*. Export 'SPARK_HIVE=true'
>>
>> See the full stack [2] below.
>>
>> I'm using the Spark 1.6 that comes with CDH 5.8.3, so it's definitely
>> compiled with Hive. We use Jupyter notebooks without problems in the
>> same environment.
>>
>> We're on Zeppelin 0.6.2, downloaded as zeppelin-0.6.2-bin-all.tgz from
>> apache.org.
>>
>> Is Zeppelin compiled with Hive too? I guess so. Not sure what else is
>> missing.
>>
>> Tried to play with ZEPPELIN_SPARK_USEHIVECONTEXT, but it does not make
>> a difference.
>>
>> [1]
>> $ cat zeppelin-env.sh
>> export JAVA_HOME=/usr/java/java7
>> export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
>> export SPARK_SUBMIT_OPTIONS="--principal xxxx --keytab yyy --conf spark.driver.memory=7g --conf spark.executor.cores=2 --conf spark.executor.memory=8g"
>> export SPARK_APP_NAME="Zeppelin notebook"
>> export HADOOP_CONF_DIR=/etc/hadoop/conf
>> export HIVE_CONF_DIR=/etc/hive/conf
>> export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
>> export PYSPARK_PYTHON="/opt/cloudera/parcels/Anaconda/bin/python2"
>> export PYTHONPATH="/opt/cloudera/parcels/CDH/lib/spark/python:/opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.9-src.zip"
>> export MASTER="yarn-client"
>> export ZEPPELIN_SPARK_USEHIVECONTEXT=true
>>
>> [2]
>> You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly
>> Traceback (most recent call last):
>>   File "/tmp/zeppelin_pyspark-9143637669637506477.py", line 267, in <module>
>>     raise Exception(traceback.format_exc())
>> Exception: Traceback (most recent call last):
>>   File "/tmp/zeppelin_pyspark-9143637669637506477.py", line 265, in <module>
>>     exec(code)
>>   File "<stdin>", line 9, in <module>
>>   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 580, in sql
>>
>> [3]
>> Also have the correct symlinks in zeppelin_home/conf for:
>> - hive-site.xml
>> - hdfs-site.xml
>> - core-site.xml
>> - yarn-site.xml
>>
>> Thank you,
>> Ruslan Dautkhanov