Hello,

I'm trying to install Zeppelin (0.7.2) on my CDH cluster, and I am unable to connect the SQL queries and graphical representations of the %sql interpreter to my Hive data. More surprisingly, I really can't find any good source on the internet (Apache Zeppelin documentation or Stack Overflow) that gives a practical answer on how to do this. Most of the time the data comes from compressed Hive tables rather than plain HDFS text files, so using a Hive context is far more convenient than a plain Spark SQL context.
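To make the goal concrete: since %sql appears to share a context with %spark and %pyspark, I would expect the supported pattern to look something like the following (the table names are just placeholders from my setup), with the result then rendered through the built-in %sql graphs:

    %spark
    // use the sqlContext instance injected by Zeppelin, not a freshly
    // constructed HiveContext, so that the registered temp table lands
    // in the catalog that the %sql interpreter queries
    val result = sqlContext.sql("select * from hivedb.hivetable")
    result.registerTempTable("myTest")

    %sql
    select * from myTest

As I detail below, neither this nor my variants work at the moment.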
The following:

    %spark
    val hc = new org.apache.spark.sql.hive.HiveContext(sc)
    val result = hc.sql("select * from hivedb.hivetable")
    result.registerTempTable("myTest")

works, but no myTest table is available in the subsequent %sql paragraph:

    %sql
    select * from myTest

    org.apache.spark.sql.AnalysisException: Table not found: myTest;

However, the following:

    %pyspark
    result = sqlContext.read.text("hdfs://cluster/test.txt")
    result.registerTempTable("mySqlTest")

works, since the %sql interpreter is "plugged" into sqlContext, whereas:

    %pyspark
    result = sqlContext.sql("select * from hivedb.hivetable")

does not, because sqlContext is not a Hive context.

I have set zeppelin.spark.useHiveContext to true, but it seems to have no effect (admittedly, this was more of a wild guess, since the documentation gives little detail on parameters and context configuration).

Can you direct me towards how to configure the context used by the %sql interpreter?

Best regards,
Arnaud

PS: %spark and %sql interpreter conf:

    master                                    yarn-client
    spark.app.name                            Zeppelin
    spark.cores.max
    spark.executor.memory                     5g
    zeppelin.R.cmd                            R
    zeppelin.R.image.width                    100%
    zeppelin.R.knitr                          true
    zeppelin.R.render.options                 out.format = 'html', comment = NA, echo = FALSE, results = 'asis', message = F, warning = F
    zeppelin.dep.additionalRemoteRepository   spark-packages,http://dl.bintray.com/spark-packages/maven,false;
    zeppelin.dep.localrepo                    local-repo
    zeppelin.interpreter.localRepo            /opt/zeppelin/local-repo/2CYVF45A9
    zeppelin.interpreter.output.limit         102400
    zeppelin.pyspark.python                   /usr/bin/pyspark
    zeppelin.spark.concurrentSQL              true
    zeppelin.spark.importImplicit             true
    zeppelin.spark.maxResult                  1000
    zeppelin.spark.printREPLOutput            true
    zeppelin.spark.sql.stacktrace             true
    zeppelin.spark.useHiveContext             true
________________________________
The integrity of this message cannot be guaranteed on the Internet. The company that sent this message cannot therefore be held liable for its content nor attachments. Any unauthorized use or dissemination is prohibited. If you are not the intended recipient of this message, then please delete it and notify the sender.