I have only set HADOOP_CONF_DIR, as follows (my Hadoop conf files are in /usr/local/lib/hadoop/etc/hadoop/, e.g. /usr/local/lib/hadoop/etc/hadoop/yarn-site.xml):
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# export JAVA_HOME=
# export MASTER=                              # Spark master url. eg. spark://master_addr:7077. Leave empty if you want to use local mode.
# export ZEPPELIN_JAVA_OPTS                   # Additional jvm options. for example, export ZEPPELIN_JAVA_OPTS="-Dspark.executor.memory=8g -Dspark.cores.max=16"
# export ZEPPELIN_MEM                         # Zeppelin jvm mem options Default -Xms1024m -Xmx1024m -XX:MaxPermSize=512m
# export ZEPPELIN_INTP_MEM                    # zeppelin interpreter process jvm mem options. Default -Xms1024m -Xmx1024m -XX:MaxPermSize=512m
# export ZEPPELIN_INTP_JAVA_OPTS              # zeppelin interpreter process jvm options.
# export ZEPPELIN_SSL_PORT                    # ssl port (used when ssl environment variable is set to true)

# export ZEPPELIN_LOG_DIR                     # Where log files are stored. PWD by default.
# export ZEPPELIN_PID_DIR                     # The pid files are stored. ${ZEPPELIN_HOME}/run by default.
# export ZEPPELIN_WAR_TEMPDIR                 # The location of jetty temporary directory.
# export ZEPPELIN_NOTEBOOK_DIR                # Where notebook saved
# export ZEPPELIN_NOTEBOOK_HOMESCREEN         # Id of notebook to be displayed in homescreen. ex) 2A94M5J1Z
# export ZEPPELIN_NOTEBOOK_HOMESCREEN_HIDE    # hide homescreen notebook from list when this value set to "true". default "false"
# export ZEPPELIN_NOTEBOOK_S3_BUCKET          # Bucket where notebook saved
# export ZEPPELIN_NOTEBOOK_S3_ENDPOINT        # Endpoint of the bucket
# export ZEPPELIN_NOTEBOOK_S3_USER            # User in bucket where notebook saved. For example bucket/user/notebook/2A94M5J1Z/note.json
# export ZEPPELIN_IDENT_STRING                # A string representing this instance of zeppelin. $USER by default.
# export ZEPPELIN_NICENESS                    # The scheduling priority for daemons. Defaults to 0.
# export ZEPPELIN_INTERPRETER_LOCALREPO       # Local repository for interpreter's additional dependency loading
# export ZEPPELIN_NOTEBOOK_STORAGE            # Refers to pluggable notebook storage class, can have two classes simultaneously with a sync between them (e.g. local and remote).
# export ZEPPELIN_NOTEBOOK_ONE_WAY_SYNC       # If there are multiple notebook storages, should we treat the first one as the only source of truth?

#### Spark interpreter configuration ####

## Use provided spark installation ##
## defining SPARK_HOME makes Zeppelin run spark interpreter process using spark-submit
##
# export SPARK_HOME                           # (required) When it is defined, load it instead of Zeppelin embedded Spark libraries
# export SPARK_SUBMIT_OPTIONS                 # (optional) extra options to pass to spark submit. eg) "--driver-memory 512M --executor-memory 1G".
# export SPARK_APP_NAME                       # (optional) The name of spark application.

## Use embedded spark binaries ##
## without SPARK_HOME defined, Zeppelin still able to run spark interpreter process using embedded spark binaries.
## however, it is not encouraged when you can define SPARK_HOME
##
# Options read in YARN client mode
export HADOOP_CONF_DIR = /usr/local/lib/hadoop/etc/hadoop/  # yarn-site.xml is located in configuration directory in HADOOP_CONF_DIR.

# Pyspark (supported with Spark 1.2.1 and above)
# To configure pyspark, you need to set spark distribution's path to 'spark.home' property in Interpreter setting screen in Zeppelin GUI
# export PYSPARK_PYTHON                       # path to the python command. must be the same path on the driver(Zeppelin) and all workers.
# export PYTHONPATH

## Spark interpreter options ##
##
# export ZEPPELIN_SPARK_USEHIVECONTEXT        # Use HiveContext instead of SQLContext if set true. true by default.
# export ZEPPELIN_SPARK_CONCURRENTSQL         # Execute multiple SQL concurrently if set true. false by default.
# export ZEPPELIN_SPARK_IMPORTIMPLICIT        # Import implicits, UDF collection, and sql if set true. true by default.
# export ZEPPELIN_SPARK_MAXRESULT             # Max number of Spark SQL result to display. 1000 by default.
# export ZEPPELIN_WEBSOCKET_MAX_TEXT_MESSAGE_SIZE  # Size in characters of the maximum text message to be received by websocket. Defaults to 1024000

#### HBase interpreter configuration ####

## To connect to HBase running on a cluster, either HBASE_HOME or HBASE_CONF_DIR must be set

# export HBASE_HOME=                          # (require) Under which HBase scripts and configuration should be
# export HBASE_CONF_DIR=                      # (optional) Alternatively, configuration directory can be set to point to the directory that has hbase-site.xml

#### ZeppelinHub connection configuration ####

# export ZEPPELINHUB_API_ADDRESS              # Refers to the address of the ZeppelinHub service in use
# export ZEPPELINHUB_API_TOKEN                # Refers to the Zeppelin instance token of the user
# export ZEPPELINHUB_USER_KEY                 # Optional, when using Zeppelin with authentication.

I also tried simply /usr/local/lib/hadoop, and I also created a conf directory within /usr/local/lib/hadoop/etc/hadoop and placed yarn-site.xml within that folder.

Thanks

On Wed, Nov 2, 2016 at 10:06 AM, Hyung Sung Shim <hss...@nflabs.com> wrote:

> Could you share your zeppelin-env.sh?
>
> On Wed, Nov 2, 2016 at 4:57 PM, Benoit Hanotte <benoit.h...@gmail.com> wrote:
>
>> Thanks for your reply,
>> I have tried setting it within zeppelin-env.sh, but it doesn't work any
>> better.
>>
>> Thanks
>>
>> On Wed, Nov 2, 2016 at 2:13 AM, Hyung Sung Shim <hss...@nflabs.com>
>> wrote:
>>
>> Hello.
>> You should set HADOOP_CONF_DIR to /usr/local/lib/hadoop/etc/hadoop/
>> in conf/zeppelin-env.sh.
>> Thanks.
>>
>> On Wed, Nov 2, 2016 at 5:07 AM, Benoit Hanotte <benoit.h...@gmail.com> wrote:
>>
>> Hello,
>>
>> I'd like to use Zeppelin on my local computer and use it to run Spark
>> executors on a remote YARN cluster, since I can't easily install Zeppelin
>> on the cluster gateway.
>>
>> I installed the correct Hadoop version (2.6) and compiled Zeppelin (from
>> the master branch) as follows:
>>
>> *mvn clean package -DskipTests -Phadoop-2.6
>> -Dhadoop.version=2.6.0-cdh5.5.0 -Pyarn -Pspark-2.0 -Pscala-2.11*
>>
>> I also set HADOOP_HOME_DIR to /usr/local/lib/hadoop where my Hadoop is
>> installed (I also tried with /usr/local/lib/hadoop/etc/hadoop/ where the
>> conf files such as yarn-site.xml are).
>> I set yarn.resourcemanager.hostname to the resource manager of the cluster
>> (I copied the value from the config file on the cluster), but when I start
>> a Spark command it still tries to connect to 0.0.0.0:8032, as one can see
>> in the logs:
>>
>> *INFO [2016-11-01 20:48:26,581] ({pool-2-thread-2}
>> Client.java[handleConnectionFailure]:862) - Retrying connect to server:
>> 0.0.0.0/0.0.0.0:8032. Already tried 9 time(s); retry policy is
>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)*
>>
>> Am I missing something? Are there any additional parameters to set?
>>
>> Thanks!
>>
>> Benoit
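
For reference, a minimal zeppelin-env.sh sketch for this kind of setup might look like the lines below. The SPARK_HOME path is a placeholder (the thread does not say where a local Spark distribution lives), MASTER=yarn-client is one common choice for this scenario rather than something stated in the thread, and note that bash does not accept spaces around the '=' of an assignment, so the export is written without them:

# Minimal sketch for running the Spark interpreter against a remote YARN cluster.
# Paths below are assumptions based on the layout described in this thread.
export HADOOP_CONF_DIR=/usr/local/lib/hadoop/etc/hadoop   # directory holding yarn-site.xml / core-site.xml copied from the cluster
export SPARK_HOME=/usr/local/lib/spark                    # hypothetical local Spark install; Zeppelin then launches via spark-submit
export MASTER=yarn-client                                 # run executors on the YARN cluster in client mode

One quick way to check whether the client-side configuration actually names the cluster's ResourceManager (rather than falling back to the 0.0.0.0:8032 default seen in the log above) is:

grep -A 1 "yarn.resourcemanager" "$HADOOP_CONF_DIR"/yarn-site.xml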