I am pointing to the dirs on my local machine; what I want is simply for my jobs to be submitted to the remote YARN cluster.

Thanks
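The key requirement here is that HADOOP_CONF_DIR point at a local directory containing the cluster's own client configuration (yarn-site.xml, core-site.xml), copied from the cluster; if the YARN client finds no ResourceManager address in the loaded configuration, it falls back to the default 0.0.0.0:8032 seen later in this thread. A minimal sketch of preparing such a directory, assuming a hypothetical gateway host and config path:

    # Hypothetical sketch: pull the cluster's client configs onto the local machine.
    # "gateway.example.com" and /etc/hadoop/conf are placeholders for the real
    # gateway host and its Hadoop configuration directory.
    mkdir -p ~/remote-hadoop-conf
    scp gateway.example.com:/etc/hadoop/conf/{core-site.xml,hdfs-site.xml,yarn-site.xml} ~/remote-hadoop-conf/
    export HADOOP_CONF_DIR=~/remote-hadoop-conf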
On Wed, Nov 2, 2016 at 4:00 PM, Abhi Basu <9000r...@gmail.com> wrote:

I am assuming you are pointing to hadoop/spark on the remote host, right? Can you not point the hadoop conf and spark dirs to the remote machine? Not sure if this works, just suggesting; others may have tried.

--
Abhi Basu

On Wed, Nov 2, 2016 at 9:58 AM, Hyung Sung Shim <hss...@nflabs.com> wrote:

Hello.
You don't need to install hadoop on your machine, but you do need a proper version of spark [0] to use spark-submit. Then you can set [1] SPARK_HOME to where the Spark installation lives, set HADOOP_CONF_DIR, and set the master to yarn-client for your spark interpreter in the interpreter menu.

[0] http://spark.apache.org/downloads.html
[1] http://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/interpreter/spark.html#1-export-spark_home
    http://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/install/spark_cluster_mode.html#4-configure-spark-interpreter-in-zeppelin

Hope this helps.
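Concretely, the advice above amounts to something like the following in conf/zeppelin-env.sh, with the master set to yarn-client in the Spark interpreter settings; the paths are placeholders for wherever the Spark distribution and the cluster's Hadoop client configs actually live:

    # Minimal sketch (placeholder paths):
    export SPARK_HOME=/usr/local/lib/spark                     # local Spark install used for spark-submit
    export HADOOP_CONF_DIR=/usr/local/lib/hadoop/etc/hadoop    # directory containing the cluster's yarn-site.xml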
On 2016-11-02 at 19:06 GMT+09:00, Benoit Hanotte <benoit.h...@gmail.com> wrote:

I have only set HADOOP_CONF_DIR, as follows (my hadoop conf files are in /usr/local/lib/hadoop/etc/hadoop/, e.g. /usr/local/lib/hadoop/etc/hadoop/yarn-site.xml):

#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# export JAVA_HOME=
# export MASTER=                 # Spark master url. eg. spark://master_addr:7077. Leave empty if you want to use local mode.
# export ZEPPELIN_JAVA_OPTS      # Additional jvm options. for example, export ZEPPELIN_JAVA_OPTS="-Dspark.executor.memory=8g -Dspark.cores.max=16"
# export ZEPPELIN_MEM            # Zeppelin jvm mem options Default -Xms1024m -Xmx1024m -XX:MaxPermSize=512m
# export ZEPPELIN_INTP_MEM       # zeppelin interpreter process jvm mem options. Default -Xms1024m -Xmx1024m -XX:MaxPermSize=512m
# export ZEPPELIN_INTP_JAVA_OPTS # zeppelin interpreter process jvm options.
# export ZEPPELIN_SSL_PORT       # ssl port (used when ssl environment variable is set to true)

# export ZEPPELIN_LOG_DIR        # Where log files are stored. PWD by default.
# export ZEPPELIN_PID_DIR        # The pid files are stored. ${ZEPPELIN_HOME}/run by default.
# export ZEPPELIN_WAR_TEMPDIR    # The location of jetty temporary directory.
# export ZEPPELIN_NOTEBOOK_DIR   # Where notebook saved
# export ZEPPELIN_NOTEBOOK_HOMESCREEN       # Id of notebook to be displayed in homescreen. ex) 2A94M5J1Z
# export ZEPPELIN_NOTEBOOK_HOMESCREEN_HIDE  # hide homescreen notebook from list when this value set to "true". default "false"
# export ZEPPELIN_NOTEBOOK_S3_BUCKET        # Bucket where notebook saved
# export ZEPPELIN_NOTEBOOK_S3_ENDPOINT      # Endpoint of the bucket
# export ZEPPELIN_NOTEBOOK_S3_USER          # User in bucket where notebook saved. For example bucket/user/notebook/2A94M5J1Z/note.json
# export ZEPPELIN_IDENT_STRING   # A string representing this instance of zeppelin. $USER by default.
# export ZEPPELIN_NICENESS       # The scheduling priority for daemons. Defaults to 0.
# export ZEPPELIN_INTERPRETER_LOCALREPO     # Local repository for interpreter's additional dependency loading
# export ZEPPELIN_NOTEBOOK_STORAGE          # Refers to pluggable notebook storage class, can have two classes simultaneously with a sync between them (e.g. local and remote).
# export ZEPPELIN_NOTEBOOK_ONE_WAY_SYNC     # If there are multiple notebook storages, should we treat the first one as the only source of truth?

#### Spark interpreter configuration ####

## Use provided spark installation ##
## defining SPARK_HOME makes Zeppelin run spark interpreter process using spark-submit
##
# export SPARK_HOME              # (required) When it is defined, load it instead of Zeppelin embedded Spark libraries
# export SPARK_SUBMIT_OPTIONS    # (optional) extra options to pass to spark submit. eg) "--driver-memory 512M --executor-memory 1G".
# export SPARK_APP_NAME          # (optional) The name of spark application.

## Use embedded spark binaries ##
## without SPARK_HOME defined, Zeppelin still able to run spark interpreter process using embedded spark binaries.
## however, it is not encouraged when you can define SPARK_HOME
##
# Options read in YARN client mode
export HADOOP_CONF_DIR = /usr/local/lib/hadoop/etc/hadoop/ # yarn-site.xml is located in configuration directory in HADOOP_CONF_DIR.
# Pyspark (supported with Spark 1.2.1 and above)
# To configure pyspark, you need to set spark distribution's path to 'spark.home' property in Interpreter setting screen in Zeppelin GUI
# export PYSPARK_PYTHON          # path to the python command. must be the same path on the driver(Zeppelin) and all workers.
# export PYTHONPATH

## Spark interpreter options ##
##
# export ZEPPELIN_SPARK_USEHIVECONTEXT  # Use HiveContext instead of SQLContext if set true. true by default.
# export ZEPPELIN_SPARK_CONCURRENTSQL   # Execute multiple SQL concurrently if set true. false by default.
# export ZEPPELIN_SPARK_IMPORTIMPLICIT  # Import implicits, UDF collection, and sql if set true. true by default.
# export ZEPPELIN_SPARK_MAXRESULT       # Max number of Spark SQL result to display. 1000 by default.
# export ZEPPELIN_WEBSOCKET_MAX_TEXT_MESSAGE_SIZE  # Size in characters of the maximum text message to be received by websocket. Defaults to 1024000

#### HBase interpreter configuration ####

## To connect to HBase running on a cluster, either HBASE_HOME or HBASE_CONF_DIR must be set

# export HBASE_HOME=             # (require) Under which HBase scripts and configuration should be
# export HBASE_CONF_DIR=         # (optional) Alternatively, configuration directory can be set to point to the directory that has hbase-site.xml

#### ZeppelinHub connection configuration ####
# export ZEPPELINHUB_API_ADDRESS # Refers to the address of the ZeppelinHub service in use
# export ZEPPELINHUB_API_TOKEN   # Refers to the Zeppelin instance token of the user
# export ZEPPELINHUB_USER_KEY    # Optional, when using Zeppelin with authentication.

I also tried simply /usr/local/lib/hadoop, and I also created a conf directory within /usr/local/lib/hadoop/etc/hadoop and placed yarn-site.xml within this folder.

Thanks
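One detail worth double-checking in the file above: if the spaces around "=" are really present in zeppelin-env.sh (and not just an artifact of how the mail was wrapped), bash rejects "export HADOOP_CONF_DIR = ..." and leaves the variable unset. A shell assignment has to be written without spaces:

    # No spaces around "=" in a bash export:
    export HADOOP_CONF_DIR=/usr/local/lib/hadoop/etc/hadoop/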
On Wed, Nov 2, 2016 at 10:06 AM, Hyung Sung Shim <hss...@nflabs.com> wrote:

Could you share your zeppelin-env.sh?

On Wed, Nov 2, 2016 at 4:57 PM, Benoit Hanotte <benoit.h...@gmail.com> wrote:

Thanks for your reply.
I have tried setting it within zeppelin-env.sh, but it doesn't work any better.

Thanks

On Wed, Nov 2, 2016 at 2:13 AM, Hyung Sung Shim <hss...@nflabs.com> wrote:

Hello.
You should set HADOOP_CONF_DIR to /usr/local/lib/hadoop/etc/hadoop/ in conf/zeppelin-env.sh.
Thanks.

On Wed, Nov 2, 2016 at 5:07 AM, Benoit Hanotte <benoit.h...@gmail.com> wrote:

Hello,

I'd like to use zeppelin on my local computer and use it to run spark executors on a remote yarn cluster, since I can't easily install zeppelin on the cluster gateway.

I installed the correct hadoop version (2.6) and compiled zeppelin (from the master branch) as follows:

    mvn clean package -DskipTests -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.5.0 -Pyarn -Pspark-2.0 -Pscala-2.11

I also set HADOOP_HOME_DIR to /usr/local/lib/hadoop, where my hadoop is installed (I also tried with /usr/local/lib/hadoop/etc/hadoop/, where the conf files such as yarn-site.xml are). I set yarn.resourcemanager.hostname to the resource manager of the cluster (I copied the value from the config file on the cluster), but when I start a spark command it still tries to connect to 0.0.0.0:8032, as one can see in the logs:

    INFO [2016-11-01 20:48:26,581] ({pool-2-thread-2} Client.java[handleConnectionFailure]:862) - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

Am I missing something? Are there any additional parameters to set?

Thanks!

Benoit
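For reference, 0.0.0.0:8032 is the YARN client's built-in default, used when the loaded configuration does not name a ResourceManager, so a quick sanity check is whether the yarn-site.xml that Zeppelin actually loads defines one. A rough check along these lines, using the path from this thread:

    # Does the yarn-site.xml under HADOOP_CONF_DIR name the remote ResourceManager?
    grep -A1 'yarn.resourcemanager' /usr/local/lib/hadoop/etc/hadoop/yarn-site.xml
    # Expect yarn.resourcemanager.hostname (or yarn.resourcemanager.address) with the
    # cluster's host, not 0.0.0.0 or localhost.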