Hi,

Here is a small patch that solves this issue. Considering all the scripts, I'm not sure whether sbin/stop-workers.sh and sbin/stop-worker.sh need a similar change: do they really care about SPARK_CONF_DIR to do their job?
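If they do, a minimal sketch of the analogous change could look like the following (sketch only, not part of the patch; it assumes sbin/stop-worker.sh would be given the same optional --config handling that the patch below adds to start-worker.sh):

  # hypothetical sbin/stop-workers.sh call, mirroring the start-workers.sh change:
  # forward the caller's SPARK_CONF_DIR explicitly instead of relying on the remote environment
  "${SPARK_HOME}/sbin/workers.sh" "${SPARK_HOME}/sbin"/stop-worker.sh --config "$SPARK_CONF_DIR"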
Note that I have also removed the cd "${SPARK_HOME}" \; part from the command passed to workers.sh. It doesn't seem helpful unless the current working directory matters, and it shouldn't. In the same spirit of minimalism, I think it could also be removed from:

sbin/start-workers.sh:46:"${SPARK_HOME}/sbin/workers.sh" cd "${SPARK_HOME}" \; "${SPARK_HOME}/sbin/start-worker.sh" "spark://$SPARK_MASTER_HOST:$SPARK_MASTER_PORT"
sbin/stop-workers.sh:28:"${SPARK_HOME}/sbin/workers.sh" cd "${SPARK_HOME}" \; "${SPARK_HOME}/sbin"/stop-worker.sh
sbin/spark-daemons.sh:36:exec "${SPARK_HOME}/sbin/workers.sh" cd "${SPARK_HOME}" \; "${SPARK_HOME}/sbin/spark-daemon.sh" "$@"

Regards,
Patrice

On Thu, Jul 18, 2024 at 3:34 PM Patrice Duroux <patrice.dur...@gmail.com> wrote:
>
> Hi,
>
> I'm trying to build a SLURM script that starts a Spark environment
> (master + workers) configured dynamically by the job submitted to the queue.
> The issue during job execution is that the workers all start with a
> default (empty) configuration.
> How can I "forward" SPARK_CONF_DIR at this step?
> Using SPARK_SSH_OPTS in sbin/workers.sh is of no help, because adding
> -o SendEnv requires the corresponding authorization in sshd. Is there any
> way to add options/parameters to the ssh command?
> Currently, here is the corresponding call in start-workers.sh:
>
> "${SPARK_HOME}/sbin/workers.sh" cd "${SPARK_HOME}" \;
> "${SPARK_HOME}/sbin/start-worker.sh"
> "spark://$SPARK_MASTER_HOST:$SPARK_MASTER_PORT"
>
> Also, modifying files like .profile or .bashrc, etc. is risky and not a
> solution here, mainly because each job has its own conf dir and multiple
> jobs may run in parallel.
>
> Many thanks!
>
> Regards,
> Patrice
>
> Here is a sample of such a script:
>
> #!/usr/bin/sh
>
> #SBATCH -N 2
> #SBATCH --time=00:05:00
>
> SPARK_HOME="$WORK"/spark-3.5.1-bin-hadoop3
>
> create_spark_conf(){
>   export SPARK_LOCAL_DIRS=$(mktemp -d spark-XXXXXXXX)
>   export SPARK_CONF_DIR="$SPARK_LOCAL_DIRS"/conf
>   mkdir -p $SPARK_CONF_DIR
>   echo "export SPARK_LOCAL_DIRS=\"$(realpath "$SPARK_LOCAL_DIRS")\"
> export SPARK_CONF_DIR=\"$(realpath "$SPARK_LOCAL_DIRS")/conf\"
> export SPARK_LOG_DIR=\"$(realpath "$SPARK_LOCAL_DIRS")/logs\"
> module load openjdk/11.0.2
> " > "$SPARK_CONF_DIR"/spark-env.sh
>   scontrol show hostname $SLURM_JOB_NODELIST > "$SPARK_CONF_DIR"/workers
> }
>
> cd "$SCRATCH"
> create_spark_conf
> "$SPARK_HOME"/sbin/start-all.sh
> "$SPARK_HOME"/bin/spark-submit "$HOME"/testspark-0.0.1-SNAPSHOT.jar "$@"
> "$SPARK_HOME"/sbin/stop-all.sh
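For reference, the ssh-based alternative mentioned in the quoted message would need cooperation from sshd on every worker host, which is what makes it impractical on a shared cluster. A sketch of what it would require (assuming SPARK_CONF_DIR is the only variable to forward):

  # client side, e.g. in the job script or spark-env.sh:
  export SPARK_SSH_OPTS="-o SendEnv=SPARK_CONF_DIR"
  # and on each worker host, in /etc/ssh/sshd_config (needs admin rights):
  #   AcceptEnv SPARK_CONF_DIR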
diff --git a/sbin/start-worker.sh b/sbin/start-worker.sh
index fd58f01..bb808d5 100755
--- a/sbin/start-worker.sh
+++ b/sbin/start-worker.sh
@@ -39,8 +39,8 @@ fi
 # Any changes need to be reflected there.
 CLASS="org.apache.spark.deploy.worker.Worker"
 
-if [[ $# -lt 1 ]] || [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
-  echo "Usage: ./sbin/start-worker.sh <master> [options]"
+if [[ $# -lt 1 ]] || [[ "$@" = *--help ]] || [[ "$@" = *-h ]] || [[ "$@" = *--config ]]; then
+  echo "Usage: ./sbin/start-worker.sh [--config <conf-dir>] <master> [options]"
   pattern="Usage:"
   pattern+="\|Using Spark's default log4j profile:"
   pattern+="\|Started daemon with process name"
@@ -52,6 +52,22 @@ fi
 
 . "${SPARK_HOME}/sbin/spark-config.sh"
 
+# Check if --config is passed as an argument. It is an optional parameter.
+# Exit if the argument is not a directory.
+if [ "$1" == "--config" ]
+then
+  shift
+  conf_dir="$1"
+  if [ ! -d "$conf_dir" ]
+  then
+    echo "ERROR: $conf_dir is not a directory"
+    exit 1
+  else
+    export SPARK_CONF_DIR="$conf_dir"
+  fi
+  shift
+fi
+
 . "${SPARK_HOME}/bin/load-spark-env.sh"
 
 # First argument should be the master; we need to store it aside because we may
diff --git a/sbin/start-workers.sh b/sbin/start-workers.sh
index 3867ef3..c891545 100755
--- a/sbin/start-workers.sh
+++ b/sbin/start-workers.sh
@@ -43,4 +43,4 @@ if [ "$SPARK_MASTER_HOST" = "" ]; then
 fi
 
 # Launch the workers
-"${SPARK_HOME}/sbin/workers.sh" cd "${SPARK_HOME}" \; "${SPARK_HOME}/sbin/start-worker.sh" "spark://$SPARK_MASTER_HOST:$SPARK_MASTER_PORT"
+"${SPARK_HOME}/sbin/workers.sh" "${SPARK_HOME}/sbin/start-worker.sh" --config "$SPARK_CONF_DIR" "spark://$SPARK_MASTER_HOST:$SPARK_MASTER_PORT"
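With this patch, the configuration directory is passed explicitly on the remote command line, so nothing needs to be forwarded through the ssh environment. A minimal usage sketch with the sample SLURM script above (it assumes start-all.sh still dispatches to start-workers.sh, as in current Spark):

  # SPARK_CONF_DIR is exported by create_spark_conf before start-all.sh runs
  export SPARK_CONF_DIR="$(realpath "$SPARK_LOCAL_DIRS")/conf"
  "$SPARK_HOME"/sbin/start-all.sh
  # start-workers.sh then runs, on each host listed in "$SPARK_CONF_DIR"/workers:
  #   "${SPARK_HOME}/sbin/start-worker.sh" --config "$SPARK_CONF_DIR" spark://<master-host>:<port>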