This might be a good discussion for the dev@ list; I don’t know much about
SLURM deployments personally.

Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


On Sat, Jul 20, 2024 at 11:00 AM Patrice Duroux <patrice.dur...@gmail.com>
wrote:

> Hi,
>
> Here is a small patch that solves this issue.
> Considering all the scripts, I'm not sure whether sbin/stop-workers.sh
> and sbin/stop-worker.sh need a similar change.
> Do they actually need SPARK_CONF_DIR to do their job?
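>
> The patch itself is not inlined above, so here is a rough sketch of one
> way such forwarding can work (an illustration, not necessarily the exact
> patch): prefix the remote command with a plain POSIX "VAR=value command"
> environment assignment, assuming workers.sh passes its arguments through
> to ssh as the command to run:
>
> # sbin/start-workers.sh (sketch): forward the per-job SPARK_CONF_DIR to
> # each worker as part of the remote command; no sshd changes needed.
> "${SPARK_HOME}/sbin/workers.sh" \
>   "SPARK_CONF_DIR=${SPARK_CONF_DIR}" \
>   "${SPARK_HOME}/sbin/start-worker.sh" \
>   "spark://$SPARK_MASTER_HOST:$SPARK_MASTER_PORT"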
>
> Note that I have also removed the following part of the command passed
> to workers.sh:
> cd "${SPARK_HOME}" \;
> It doesn't seem helpful unless the working directory matters, and it
> shouldn't.
>
> In the same spirit of minimalism, I think it could also be removed
> from the following lines (simplified versions are sketched after the
> list):
>
> sbin/start-workers.sh:46:"${SPARK_HOME}/sbin/workers.sh" cd
> "${SPARK_HOME}" \; "${SPARK_HOME}/sbin/start-worker.sh"
> "spark://$SPARK_MASTER_HOST:$SPARK_MASTER_PORT"
> sbin/stop-workers.sh:28:"${SPARK_HOME}/sbin/workers.sh" cd
> "${SPARK_HOME}" \; "${SPARK_HOME}/sbin"/stop-worker.sh
> sbin/spark-daemons.sh:36:exec "${SPARK_HOME}/sbin/workers.sh" cd
> "${SPARK_HOME}" \; "${SPARK_HOME}/sbin/spark-daemon.sh" "$@"
>
> Regards,
> Patrice
>
> On Thu, Jul 18, 2024 at 15:34, Patrice Duroux
> <patrice.dur...@gmail.com> wrote:
> >
> > Hi,
> >
> > I'm trying to build a SLURM script to start a Spark environment
> > (master+workers) dynamically configured by the job sent to the queue.
> > The issue during job execution is that the workers are then started
> > with a default (empty) configuration.
> > How can I "forward" SPARK_CONF_DIR to them at this step?
> > Using SPARK_SSH_OPTS in sbin/workers.sh is of no help, because adding
> > -o SendEnv requires a matching AcceptEnv authorization in sshd_config.
> > Is there any way to add options/parameters to the ssh command?
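> >
> > For reference, the SendEnv route would need both of the following
> > pieces, and the second one requires admin rights on the nodes (a
> > sketch, assuming OpenSSH):
> >
> > # client side, e.g. exported before calling start-workers.sh:
> > export SPARK_SSH_OPTS="-o SendEnv=SPARK_CONF_DIR"
> > # server side, in /etc/ssh/sshd_config on every worker node:
> > # AcceptEnv SPARK_CONF_DIR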
> > Currently, here is the corresponding call in start-workers.sh
> >
> > "${SPARK_HOME}/sbin/workers.sh" cd "${SPARK_HOME}" \;
> > "${SPARK_HOME}/sbin/start-worker.sh"
> > "spark://$SPARK_MASTER_HOST:$SPARK_MASTER_PORT"
> >
> > Also, modifying files like .profile or .bashrc is risky and not a
> > solution here, mainly because each job has its own conf dir and
> > multiple jobs may run in parallel.
> >
> > Many thanks!
> >
> > Regards,
> > Patrice
> >
> > Here is a sample of such a script:
> >
> > #!/usr/bin/sh
> >
> > #SBATCH -N 2
> > #SBATCH --time=00:05:00
> >
> > SPARK_HOME="$WORK"/spark-3.5.1-bin-hadoop3
> >
> > # Build a throwaway per-job Spark configuration in a temporary
> > # directory so that concurrent jobs don't step on each other.
> > create_spark_conf(){
> >     export SPARK_LOCAL_DIRS="$(mktemp -d spark-XXXXXXXX)"
> >     export SPARK_CONF_DIR="$SPARK_LOCAL_DIRS"/conf
> >     mkdir -p "$SPARK_CONF_DIR"
> >     # spark-env.sh is sourced by the Spark launch scripts.
> >     echo "export SPARK_LOCAL_DIRS=\"$(realpath "$SPARK_LOCAL_DIRS")\"
> > export SPARK_CONF_DIR=\"$(realpath "$SPARK_LOCAL_DIRS")/conf\"
> > export SPARK_LOG_DIR=\"$(realpath "$SPARK_LOCAL_DIRS")/logs\"
> > module load openjdk/11.0.2
> > " > "$SPARK_CONF_DIR"/spark-env.sh
> >     # One worker per node allocated by SLURM.
> >     scontrol show hostname "$SLURM_JOB_NODELIST" > "$SPARK_CONF_DIR"/workers
> > }
> >
> > cd "$SCRATCH"
> > create_spark_conf
> > "$SPARK_HOME"/sbin/start-all.sh
> > "$SPARK_HOME"/bin/spark-submit "$HOME"/testspark-0.0.1-SNAPSHOT.jar "$@"
> > "$SPARK_HOME"/sbin/stop-all.sh
>
