This might be a good discussion for the dev@ list; I don't know much about SLURM deployments personally.
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau

On Sat, Jul 20, 2024 at 11:00 AM Patrice Duroux <patrice.dur...@gmail.com> wrote:
> Hi,
>
> Here is a small patch that solves this issue.
> Considering all the scripts, I'm not sure whether sbin/stop-workers.sh and
> sbin/stop-worker.sh need a similar change. Do they really need
> SPARK_CONF_DIR to do their job?
>
> Note that I have also removed the following part of the command passed
> to workers.sh:
>     cd "${SPARK_HOME}" \;
> It doesn't seem helpful unless the working directory matters, and it
> shouldn't.
>
> In the same spirit of minimalism, I think it could also be removed from:
>
> sbin/start-workers.sh:46:"${SPARK_HOME}/sbin/workers.sh" cd "${SPARK_HOME}" \; "${SPARK_HOME}/sbin/start-worker.sh" "spark://$SPARK_MASTER_HOST:$SPARK_MASTER_PORT"
> sbin/stop-workers.sh:28:"${SPARK_HOME}/sbin/workers.sh" cd "${SPARK_HOME}" \; "${SPARK_HOME}/sbin"/stop-worker.sh
> sbin/spark-daemons.sh:36:exec "${SPARK_HOME}/sbin/workers.sh" cd "${SPARK_HOME}" \; "${SPARK_HOME}/sbin/spark-daemon.sh" "$@"
>
> Regards,
> Patrice
>
> On Thu, Jul 18, 2024 at 15:34, Patrice Duroux <patrice.dur...@gmail.com> wrote:
> >
> > Hi,
> >
> > I'm trying to build a SLURM script that starts a Spark environment
> > (master + workers) configured dynamically by the job submitted to the
> > queue. The problem is that, during job execution, the workers all start
> > with a default (empty) configuration.
> > How can I "forward" SPARK_CONF_DIR at this step?
> > Using SPARK_SSH_OPTS in sbin/workers.sh is of no help, because adding
> > -o SendEnv requires authorization in sshd. Is there any other way to
> > add options/parameters to the ssh command?
> > Currently, here is the corresponding call in start-workers.sh:
> >
> > "${SPARK_HOME}/sbin/workers.sh" cd "${SPARK_HOME}" \;
> > "${SPARK_HOME}/sbin/start-worker.sh"
> > "spark://$SPARK_MASTER_HOST:$SPARK_MASTER_PORT"
> >
> > Modifying files like .profile or .bashrc is also risky and not a
> > solution here, mainly because each job has its own conf dir and
> > multiple jobs may run in parallel.
> >
> > Many thanks!
> >
> > Regards,
> > Patrice
> >
> > Here is a sample of such a script:
> >
> > #!/usr/bin/sh
> >
> > #SBATCH -N 2
> > #SBATCH --time=00:05:00
> >
> > SPARK_HOME="$WORK"/spark-3.5.1-bin-hadoop3
> >
> > create_spark_conf(){
> >     export SPARK_LOCAL_DIRS=$(mktemp -d spark-XXXXXXXX)
> >     export SPARK_CONF_DIR="$SPARK_LOCAL_DIRS"/conf
> >     mkdir -p "$SPARK_CONF_DIR"
> >     echo "export SPARK_LOCAL_DIRS=\"$(realpath "$SPARK_LOCAL_DIRS")\"
> > export SPARK_CONF_DIR=\"$(realpath "$SPARK_LOCAL_DIRS")/conf\"
> > export SPARK_LOG_DIR=\"$(realpath "$SPARK_LOCAL_DIRS")/logs\"
> > module load openjdk/11.0.2
> > " > "$SPARK_CONF_DIR"/spark-env.sh
> >     scontrol show hostname "$SLURM_JOB_NODELIST" > "$SPARK_CONF_DIR"/workers
> > }
> >
> > cd "$SCRATCH"
> > create_spark_conf
> > "$SPARK_HOME"/sbin/start-all.sh
> > "$SPARK_HOME"/bin/spark-submit "$HOME"/testspark-0.0.1-SNAPSHOT.jar "$@"
> > "$SPARK_HOME"/sbin/stop-all.sh
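For reference, one untested workaround (this is not the attached patch) for forwarding the job's configuration without touching sshd or editing Spark's scripts may be to pass the environment assignments as part of the command that workers.sh executes over ssh on each host. This sketch assumes that SPARK_CONF_DIR and SPARK_LOG_DIR point to a filesystem shared by all worker nodes, that the remote login shell is POSIX-like, and that SPARK_MASTER_HOST and SPARK_MASTER_PORT are set (start-workers.sh normally defaults them):

    # Untested sketch: run start-worker.sh on each worker host, prefixing the
    # remote command with the environment variables the worker should inherit.
    # Assumes paths contain no spaces and are visible from every node (shared FS),
    # and that workers.sh passes its arguments through to the remote shell.
    "${SPARK_HOME}/sbin/workers.sh" \
        "SPARK_CONF_DIR=${SPARK_CONF_DIR}" \
        "SPARK_LOG_DIR=${SPARK_LOG_DIR}" \
        "${SPARK_HOME}/sbin/start-worker.sh" \
        "spark://${SPARK_MASTER_HOST}:${SPARK_MASTER_PORT}"

Whether the assignments survive the argument quoting that workers.sh applies would need to be verified on the target cluster.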