
I can try to answer from the PMIx/UCX perspective.
Do you have "MpiDefault=pmix" in your slurm.conf, or have you specified
"--mpi=pmix" in your srun command? If not, you are not running PMIx and
therefore not UCX either (UCX support exists only in the PMIx plugin).
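Two quick checks may help here. The sketch below assumes a source build with the default library directory, so plugins are installed under PREFIX/lib/slurm (an RPM build may use lib64 instead), and it takes the slurm.conf path as an argument. list_mpi_plugins and mpi_default_is_pmix are hypothetical helper names, not Slurm commands. The first lists which mpi_* plugins the build installed; the second tests whether slurm.conf selects a PMIx plugin by default.

```shell
#!/bin/sh
# Hypothetical helpers; paths are assumptions, adjust to your installation.

# List the MPI plugins a Slurm build installed under PREFIX/lib/slurm,
# e.g. "mpi_pmix" or "mpi_pmix_v3". Prints nothing if none are found.
list_mpi_plugins() {
    prefix="$1"
    for f in "$prefix"/lib/slurm/mpi_*.so; do
        # If the glob matched nothing, $f is the literal pattern; skip it.
        [ -e "$f" ] && basename "$f" .so
    done
    return 0
}

# Succeed (exit 0) if the given slurm.conf sets MpiDefault to a PMIx
# plugin (pmix, pmix_v2, pmix_v3, ...). The key is matched
# case-insensitively; commented-out lines are ignored.
mpi_default_is_pmix() {
    grep -Eiq '^[[:space:]]*MpiDefault[[:space:]]*=[[:space:]]*pmix' "$1"
}

# Example (paths are assumptions):
#   list_mpi_plugins /opt/slurm17
#   mpi_default_is_pmix /etc/slurm/slurm.conf && echo "PMIx is the default"
```

For example, run list_mpi_plugins with the prefix from the configure line quoted below. Slurm can also report the available plugin types itself with srun --mpi=list, and on a running cluster scontrol show config shows the effective MpiDefault. If PMIx is not the default, the test job can request it per step by changing sleep.sh to run srun --mpi=pmix sleep 30.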
I think the log output you provided confirms this: I don't see any traces
of the PMIx plugin in it.
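To look for those traces directly, one rough approach is to search the slurmd log on the compute node for any mention of PMIx. This is a sketch under assumptions: pmix_trace_count is a hypothetical helper, the log path is whatever SlurmdLogFile points to in your slurm.conf, and it simply counts lines containing "pmix" in any letter case rather than matching a specific Slurm message, since the exact wording may differ between versions.

```shell
#!/bin/sh
# Hypothetical helper: print how many lines of a log file mention PMIx
# (case-insensitive). Returns 2 if the file cannot be read.
pmix_trace_count() {
    if [ ! -r "$1" ]; then
        echo "cannot read $1" >&2
        return 2
    fi
    # grep -c prints 0 and exits 1 when nothing matches; that is not an error here.
    grep -ci 'pmix' "$1" || true
}

# Example (the log path is an assumption; use your SlurmdLogFile):
#   pmix_trace_count /var/log/slurm/slurmd.log
```

If the log covering the failed job yields 0, that is consistent with the PMIx plugin (and therefore UCX) never having been loaded for that step.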

On Fri, Aug 17, 2018 at 20:43, zhangtao102...@126.com <zhangtao102...@126.com> wrote:

> Hi,
> I have installed SLURM 18.08.0-0pre2 on my cluster based on RHEL 7.4
> (x86_64).
> My configure parameters look like this:
> ./configure --prefix=/opt/slurm17 --with-munge=/opt/munge \
>   --with-pmix=/opt/pmix --with-ucx=/opt/openucx --with-hwloc=/usr
> (openucx version is 1.5.0, pmix version is 3.0.0, hwloc version is 1.11.8)
> After completing the installation and configuration, Slurm appears to be
> working normally. But when I submitted a simple test job with sbatch
> sleep.sh (which just runs "srun sleep 30" on a single compute node), I
> found that the job (ID=1032) was in state R, but it did not actually start
> on the compute node (no process was found).
> The attachment contains the log output from the compute node and the
> management node.
> I can't tell whether this problem is related to the build options I
> specified (such as pmix and ucx), and I have never seen anything similar
> in earlier versions.
> Has anyone else run into a similar problem? How can I solve it?
> Best regards
> ------------------------------
> zhangtao102...@126.com

Best regards, Artem Y. Polyakov
