I am pointing to the dirs on my local machine; what I want is simply for my jobs to be submitted to the remote YARN cluster.

Thanks
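The key requirement here is that HADOOP_CONF_DIR point at a local directory containing the cluster's own client configuration (yarn-site.xml, core-site.xml), copied from the cluster; if the YARN client finds no ResourceManager address in the loaded configuration, it falls back to the default 0.0.0.0:8032 seen later in this thread. A minimal sketch of preparing such a directory, assuming a hypothetical gateway host and config path:

    # Hypothetical sketch: pull the cluster's client configs onto the local machine.
    # "gateway.example.com" and /etc/hadoop/conf are placeholders for the real
    # gateway host and its Hadoop configuration directory.
    mkdir -p ~/remote-hadoop-conf
    scp gateway.example.com:/etc/hadoop/conf/{core-site.xml,hdfs-site.xml,yarn-site.xml} ~/remote-hadoop-conf/
    export HADOOP_CONF_DIR=~/remote-hadoop-conf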
On Wed, Nov 2, 2016 at 4:00 PM, Abhi Basu <9000r...@gmail.com> wrote:

I am assuming you are pointing to hadoop/spark on the remote host, right? Can you not point the hadoop conf and spark dirs to the remote machine? Not sure if this works, just suggesting; others may have tried.

--
Abhi Basu

On Wed, Nov 2, 2016 at 9:58 AM, Hyung Sung Shim <hss...@nflabs.com> wrote:

Hello.
You don't need to install hadoop on your machine, but you do need a proper version of spark [0] to use spark-submit. Then you can set [1] SPARK_HOME to where the Spark installation lives, set HADOOP_CONF_DIR, and set the master to yarn-client for your spark interpreter in the interpreter menu.

[0] http://spark.apache.org/downloads.html
[1] http://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/interpreter/spark.html#1-export-spark_home
    http://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/install/spark_cluster_mode.html#4-configure-spark-interpreter-in-zeppelin

Hope this helps.
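Concretely, the advice above amounts to something like the following in conf/zeppelin-env.sh, with the master set to yarn-client in the Spark interpreter settings; the paths are placeholders for wherever the Spark distribution and the cluster's Hadoop client configs actually live:

    # Minimal sketch (placeholder paths):
    export SPARK_HOME=/usr/local/lib/spark                     # local Spark install used for spark-submit
    export HADOOP_CONF_DIR=/usr/local/lib/hadoop/etc/hadoop    # directory containing the cluster's yarn-site.xml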
On 2016-11-02 at 19:06 GMT+09:00, Benoit Hanotte <benoit.h...@gmail.com> wrote:

I have only set HADOOP_CONF_DIR, as follows (my hadoop conf files are in /usr/local/lib/hadoop/etc/hadoop/, e.g. /usr/local/lib/hadoop/etc/hadoop/yarn-site.xml):

#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# export JAVA_HOME=
# export MASTER=                 # Spark master url. eg. spark://master_addr:7077. Leave empty if you want to use local mode.
# export ZEPPELIN_JAVA_OPTS      # Additional jvm options. for example, export ZEPPELIN_JAVA_OPTS="-Dspark.executor.memory=8g -Dspark.cores.max=16"
# export ZEPPELIN_MEM            # Zeppelin jvm mem options Default -Xms1024m -Xmx1024m -XX:MaxPermSize=512m
# export ZEPPELIN_INTP_MEM       # zeppelin interpreter process jvm mem options. Default -Xms1024m -Xmx1024m -XX:MaxPermSize=512m
# export ZEPPELIN_INTP_JAVA_OPTS # zeppelin interpreter process jvm options.
# export ZEPPELIN_SSL_PORT       # ssl port (used when ssl environment variable is set to true)

# export ZEPPELIN_LOG_DIR        # Where log files are stored. PWD by default.
# export ZEPPELIN_PID_DIR        # The pid files are stored. ${ZEPPELIN_HOME}/run by default.
# export ZEPPELIN_WAR_TEMPDIR    # The location of jetty temporary directory.
# export ZEPPELIN_NOTEBOOK_DIR   # Where notebook saved
# export ZEPPELIN_NOTEBOOK_HOMESCREEN       # Id of notebook to be displayed in homescreen. ex) 2A94M5J1Z
# export ZEPPELIN_NOTEBOOK_HOMESCREEN_HIDE  # hide homescreen notebook from list when this value set to "true". default "false"
# export ZEPPELIN_NOTEBOOK_S3_BUCKET        # Bucket where notebook saved
# export ZEPPELIN_NOTEBOOK_S3_ENDPOINT      # Endpoint of the bucket
# export ZEPPELIN_NOTEBOOK_S3_USER          # User in bucket where notebook saved. For example bucket/user/notebook/2A94M5J1Z/note.json
# export ZEPPELIN_IDENT_STRING   # A string representing this instance of zeppelin. $USER by default.
# export ZEPPELIN_NICENESS       # The scheduling priority for daemons. Defaults to 0.
# export ZEPPELIN_INTERPRETER_LOCALREPO     # Local repository for interpreter's additional dependency loading
# export ZEPPELIN_NOTEBOOK_STORAGE          # Refers to pluggable notebook storage class, can have two classes simultaneously with a sync between them (e.g. local and remote).
# export ZEPPELIN_NOTEBOOK_ONE_WAY_SYNC     # If there are multiple notebook storages, should we treat the first one as the only source of truth?

#### Spark interpreter configuration ####

## Use provided spark installation ##
## defining SPARK_HOME makes Zeppelin run spark interpreter process using spark-submit
##
# export SPARK_HOME              # (required) When it is defined, load it instead of Zeppelin embedded Spark libraries
# export SPARK_SUBMIT_OPTIONS    # (optional) extra options to pass to spark submit. eg) "--driver-memory 512M --executor-memory 1G".
# export SPARK_APP_NAME          # (optional) The name of spark application.

## Use embedded spark binaries ##
## without SPARK_HOME defined, Zeppelin still able to run spark interpreter process using embedded spark binaries.
## however, it is not encouraged when you can define SPARK_HOME
##
# Options read in YARN client mode
export HADOOP_CONF_DIR = /usr/local/lib/hadoop/etc/hadoop/ # yarn-site.xml is located in configuration directory in HADOOP_CONF_DIR.
# Pyspark (supported with Spark 1.2.1 and above)
# To configure pyspark, you need to set spark distribution's path to 'spark.home' property in Interpreter setting screen in Zeppelin GUI
# export PYSPARK_PYTHON          # path to the python command. must be the same path on the driver(Zeppelin) and all workers.
# export PYTHONPATH

## Spark interpreter options ##
##
# export ZEPPELIN_SPARK_USEHIVECONTEXT  # Use HiveContext instead of SQLContext if set true. true by default.
# export ZEPPELIN_SPARK_CONCURRENTSQL   # Execute multiple SQL concurrently if set true. false by default.
# export ZEPPELIN_SPARK_IMPORTIMPLICIT  # Import implicits, UDF collection, and sql if set true. true by default.
# export ZEPPELIN_SPARK_MAXRESULT       # Max number of Spark SQL result to display. 1000 by default.
# export ZEPPELIN_WEBSOCKET_MAX_TEXT_MESSAGE_SIZE  # Size in characters of the maximum text message to be received by websocket. Defaults to 1024000

#### HBase interpreter configuration ####

## To connect to HBase running on a cluster, either HBASE_HOME or HBASE_CONF_DIR must be set

# export HBASE_HOME=             # (require) Under which HBase scripts and configuration should be
# export HBASE_CONF_DIR=         # (optional) Alternatively, configuration directory can be set to point to the directory that has hbase-site.xml

#### ZeppelinHub connection configuration ####
# export ZEPPELINHUB_API_ADDRESS # Refers to the address of the ZeppelinHub service in use
# export ZEPPELINHUB_API_TOKEN   # Refers to the Zeppelin instance token of the user
# export ZEPPELINHUB_USER_KEY    # Optional, when using Zeppelin with authentication.

I also tried simply /usr/local/lib/hadoop, and I also created a conf directory within /usr/local/lib/hadoop/etc/hadoop and placed yarn-site.xml within this folder.

Thanks
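One detail worth double-checking in the file above: if the spaces around "=" are really present in zeppelin-env.sh (and not just an artifact of how the mail was wrapped), bash rejects "export HADOOP_CONF_DIR = ..." and leaves the variable unset. A shell assignment has to be written without spaces:

    # No spaces around "=" in a bash export:
    export HADOOP_CONF_DIR=/usr/local/lib/hadoop/etc/hadoop/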
On Wed, Nov 2, 2016 at 10:06 AM, Hyung Sung Shim <hss...@nflabs.com> wrote:

Could you share your zeppelin-env.sh?

On Wed, Nov 2, 2016 at 4:57 PM, Benoit Hanotte <benoit.h...@gmail.com> wrote:

Thanks for your reply.
I have tried setting it within zeppelin-env.sh, but it doesn't work any better.

Thanks

On Wed, Nov 2, 2016 at 2:13 AM, Hyung Sung Shim <hss...@nflabs.com> wrote:

Hello.
You should set HADOOP_CONF_DIR to /usr/local/lib/hadoop/etc/hadoop/ in conf/zeppelin-env.sh.
Thanks.

On Wed, Nov 2, 2016 at 5:07 AM, Benoit Hanotte <benoit.h...@gmail.com> wrote:

Hello,

I'd like to use zeppelin on my local computer and use it to run spark executors on a remote yarn cluster, since I can't easily install zeppelin on the cluster gateway.

I installed the correct hadoop version (2.6) and compiled zeppelin (from the master branch) as follows:

    mvn clean package -DskipTests -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.5.0 -Pyarn -Pspark-2.0 -Pscala-2.11

I also set HADOOP_HOME_DIR to /usr/local/lib/hadoop, where my hadoop is installed (I also tried with /usr/local/lib/hadoop/etc/hadoop/, where the conf files such as yarn-site.xml are). I set yarn.resourcemanager.hostname to the resource manager of the cluster (I copied the value from the config file on the cluster), but when I start a spark command it still tries to connect to 0.0.0.0:8032, as one can see in the logs:

    INFO [2016-11-01 20:48:26,581] ({pool-2-thread-2} Client.java[handleConnectionFailure]:862) - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

Am I missing something? Are there any additional parameters to set?

Thanks!

Benoit
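For reference, 0.0.0.0:8032 is the YARN client's built-in default, used when the loaded configuration does not name a ResourceManager, so a quick sanity check is whether the yarn-site.xml that Zeppelin actually loads defines one. A rough check along these lines, using the path from this thread:

    # Does the yarn-site.xml under HADOOP_CONF_DIR name the remote ResourceManager?
    grep -A1 'yarn.resourcemanager' /usr/local/lib/hadoop/etc/hadoop/yarn-site.xml
    # Expect yarn.resourcemanager.hostname (or yarn.resourcemanager.address) with the
    # cluster's host, not 0.0.0.0 or localhost.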