Hello. You don't need to install Hadoop on your machine, but you do need a proper Spark distribution [0] so that spark-submit is available. Then set SPARK_HOME to the directory where Spark lives and HADOOP_CONF_DIR to your Hadoop configuration directory [1], and set master to yarn-client for your Spark interpreter in the interpreter menu; a minimal sketch follows.
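For example, a minimal conf/zeppelin-env.sh might look like the sketch below (the SPARK_HOME path is a placeholder for wherever you unpacked the Spark distribution; the HADOOP_CONF_DIR path is the one from this thread):

    # conf/zeppelin-env.sh -- paths are examples, adjust to your machine
    export SPARK_HOME=/usr/local/lib/spark                    # unpacked Spark distribution; provides spark-submit
    export HADOOP_CONF_DIR=/usr/local/lib/hadoop/etc/hadoop   # directory containing yarn-site.xml

    # then, in the Zeppelin interpreter menu, set the Spark interpreter's
    # "master" property to yarn-client and restart the interpreter.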
[0] http://spark.apache.org/downloads.html
[1] http://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/interpreter/spark.html#1-export-spark_home
    http://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/install/spark_cluster_mode.html#4-configure-spark-interpreter-in-zeppelin

Hope this helps.

2016-11-02 19:06 GMT+09:00 Benoit Hanotte <benoit.h...@gmail.com>:

> I have only set HADOOP_CONF_DIR as follows (my hadoop conf files are in
> /usr/local/lib/hadoop/etc/hadoop/, e.g.
> /usr/local/lib/hadoop/etc/hadoop/yarn-site.xml):
>
> #!/bin/bash
> #
> # Licensed to the Apache Software Foundation (ASF) under one or more
> # contributor license agreements. See the NOTICE file distributed with
> # this work for additional information regarding copyright ownership.
> # The ASF licenses this file to You under the Apache License, Version 2.0
> # (the "License"); you may not use this file except in compliance with
> # the License. You may obtain a copy of the License at
> #
> #     http://www.apache.org/licenses/LICENSE-2.0
> #
> # Unless required by applicable law or agreed to in writing, software
> # distributed under the License is distributed on an "AS IS" BASIS,
> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> # See the License for the specific language governing permissions and
> # limitations under the License.
> #
>
> # export JAVA_HOME=
> # export MASTER=                        # Spark master url. eg. spark://master_addr:7077. Leave empty if you want to use local mode.
> # export ZEPPELIN_JAVA_OPTS             # Additional jvm options. for example, export ZEPPELIN_JAVA_OPTS="-Dspark.executor.memory=8g -Dspark.cores.max=16"
> # export ZEPPELIN_MEM                   # Zeppelin jvm mem options Default -Xms1024m -Xmx1024m -XX:MaxPermSize=512m
> # export ZEPPELIN_INTP_MEM              # zeppelin interpreter process jvm mem options. Default -Xms1024m -Xmx1024m -XX:MaxPermSize=512m
> # export ZEPPELIN_INTP_JAVA_OPTS        # zeppelin interpreter process jvm options.
> # export ZEPPELIN_SSL_PORT              # ssl port (used when ssl environment variable is set to true)
>
> # export ZEPPELIN_LOG_DIR               # Where log files are stored. PWD by default.
> # export ZEPPELIN_PID_DIR               # The pid files are stored. ${ZEPPELIN_HOME}/run by default.
> # export ZEPPELIN_WAR_TEMPDIR           # The location of jetty temporary directory.
> # export ZEPPELIN_NOTEBOOK_DIR          # Where notebook saved
> # export ZEPPELIN_NOTEBOOK_HOMESCREEN   # Id of notebook to be displayed in homescreen. ex) 2A94M5J1Z
> # export ZEPPELIN_NOTEBOOK_HOMESCREEN_HIDE # hide homescreen notebook from list when this value set to "true". default "false"
> # export ZEPPELIN_NOTEBOOK_S3_BUCKET    # Bucket where notebook saved
> # export ZEPPELIN_NOTEBOOK_S3_ENDPOINT  # Endpoint of the bucket
> # export ZEPPELIN_NOTEBOOK_S3_USER      # User in bucket where notebook saved. For example bucket/user/notebook/2A94M5J1Z/note.json
> # export ZEPPELIN_IDENT_STRING          # A string representing this instance of zeppelin. $USER by default.
> # export ZEPPELIN_NICENESS              # The scheduling priority for daemons. Defaults to 0.
> # export ZEPPELIN_INTERPRETER_LOCALREPO # Local repository for interpreter's additional dependency loading
> # export ZEPPELIN_NOTEBOOK_STORAGE      # Refers to pluggable notebook storage class, can have two classes simultaneously with a sync between them (e.g. local and remote).
> # export ZEPPELIN_NOTEBOOK_ONE_WAY_SYNC # If there are multiple notebook storages, should we treat the first one as the only source of truth?
>
> #### Spark interpreter configuration ####
>
> ## Use provided spark installation ##
> ## defining SPARK_HOME makes Zeppelin run spark interpreter process using spark-submit
> ##
> # export SPARK_HOME                     # (required) When it is defined, load it instead of Zeppelin embedded Spark libraries
> # export SPARK_SUBMIT_OPTIONS           # (optional) extra options to pass to spark submit. eg) "--driver-memory 512M --executor-memory 1G".
> # export SPARK_APP_NAME                 # (optional) The name of spark application.
>
> ## Use embedded spark binaries ##
> ## without SPARK_HOME defined, Zeppelin still able to run spark interpreter process using embedded spark binaries.
> ## however, it is not encouraged when you can define SPARK_HOME
> ##
> # Options read in YARN client mode
> export HADOOP_CONF_DIR=/usr/local/lib/hadoop/etc/hadoop/  # yarn-site.xml is located in configuration directory in HADOOP_CONF_DIR.
> # Pyspark (supported with Spark 1.2.1 and above)
> # To configure pyspark, you need to set spark distribution's path to 'spark.home' property in Interpreter setting screen in Zeppelin GUI
> # export PYSPARK_PYTHON                 # path to the python command. must be the same path on the driver(Zeppelin) and all workers.
> # export PYTHONPATH
>
> ## Spark interpreter options ##
> ##
> # export ZEPPELIN_SPARK_USEHIVECONTEXT  # Use HiveContext instead of SQLContext if set true. true by default.
> # export ZEPPELIN_SPARK_CONCURRENTSQL   # Execute multiple SQL concurrently if set true. false by default.
> # export ZEPPELIN_SPARK_IMPORTIMPLICIT  # Import implicits, UDF collection, and sql if set true. true by default.
> # export ZEPPELIN_SPARK_MAXRESULT       # Max number of Spark SQL result to display. 1000 by default.
> # export ZEPPELIN_WEBSOCKET_MAX_TEXT_MESSAGE_SIZE # Size in characters of the maximum text message to be received by websocket. Defaults to 1024000
>
>
> #### HBase interpreter configuration ####
>
> ## To connect to HBase running on a cluster, either HBASE_HOME or HBASE_CONF_DIR must be set
>
> # export HBASE_HOME=                    # (require) Under which HBase scripts and configuration should be
> # export HBASE_CONF_DIR=                # (optional) Alternatively, configuration directory can be set to point to the directory that has hbase-site.xml
>
> #### ZeppelinHub connection configuration ####
> # export ZEPPELINHUB_API_ADDRESS        # Refers to the address of the ZeppelinHub service in use
> # export ZEPPELINHUB_API_TOKEN          # Refers to the Zeppelin instance token of the user
> # export ZEPPELINHUB_USER_KEY           # Optional, when using Zeppelin with authentication.
>
>
> I also tried simply /usr/local/lib/hadoop, and I also created a conf
> directory within /usr/local/lib/hadoop/etc/hadoop and placed
> yarn-site.xml within this folder.
>
> Thanks
>
> On Wed, Nov 2, 2016 at 10:06 AM, Hyung Sung Shim <hss...@nflabs.com> wrote:
>
>> Could you share your zeppelin-env.sh?
>>
>> On Wed, Nov 2, 2016 at 4:57 PM, Benoit Hanotte <benoit.h...@gmail.com> wrote:
>>
>>> Thanks for your reply,
>>> I have tried setting it within zeppelin-env.sh, but it doesn't work any
>>> better.
>>>
>>> Thanks
>>>
>>> On Wed, Nov 2, 2016 at 2:13 AM, Hyung Sung Shim <hss...@nflabs.com> wrote:
>>>
>>> Hello.
>>> You should set HADOOP_CONF_DIR to /usr/local/lib/hadoop/etc/hadoop/
>>> in conf/zeppelin-env.sh.
>>> Thanks.
>>> On Wed, Nov 2, 2016 at 5:07 AM, Benoit Hanotte <benoit.h...@gmail.com> wrote:
>>>
>>> Hello,
>>>
>>> I'd like to use Zeppelin on my local computer and use it to run Spark
>>> executors on a remote YARN cluster, since I can't easily install Zeppelin
>>> on the cluster gateway.
>>>
>>> I installed the correct Hadoop version (2.6) and compiled Zeppelin
>>> (from the master branch) as follows:
>>>
>>> mvn clean package -DskipTests -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.5.0 -Pyarn -Pspark-2.0 -Pscala-2.11
>>>
>>> I also set HADOOP_HOME_DIR to /usr/local/lib/hadoop, where my Hadoop is
>>> installed (I also tried /usr/local/lib/hadoop/etc/hadoop/, where the conf
>>> files such as yarn-site.xml are). I set yarn.resourcemanager.hostname to
>>> the resource manager of the cluster (I copied the value from the config
>>> file on the cluster), but when I run a Spark command it still tries to
>>> connect to 0.0.0.0:8032, as one can see in the logs:
>>>
>>> INFO [2016-11-01 20:48:26,581] ({pool-2-thread-2} Client.java[handleConnectionFailure]:862) - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
>>>
>>> Am I missing something? Are there additional parameters to set?
>>>
>>> Thanks!
>>>
>>> Benoit
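For context, retrying against 0.0.0.0:8032 is what the YARN client falls back to when it cannot find yarn-site.xml on its classpath (0.0.0.0:8032 is the built-in default ResourceManager address). A quick sanity check, assuming the paths from this thread, is to confirm that the variable spark-submit actually reads in YARN client mode (HADOOP_CONF_DIR, not HADOOP_HOME_DIR) points at a directory whose yarn-site.xml names the cluster's ResourceManager:

    # verify the client can see the cluster's YARN configuration
    export HADOOP_CONF_DIR=/usr/local/lib/hadoop/etc/hadoop
    ls "$HADOOP_CONF_DIR/yarn-site.xml"
    grep -A 1 'yarn.resourcemanager' "$HADOOP_CONF_DIR/yarn-site.xml"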