Hi all,
I am trying to run an MPI application through the SLURM job scheduler. Here is
my running sequence:
sbatch --> my_env_script.sh --> my_run_script.sh --> mpirun
In order to minimize modification of my production environment, I had to set up
the following hostlist management in the different scripts:
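A minimal sketch of such a chain, assuming Open MPI and a hostfile derived from
the SLURM allocation (script names, resource counts, and the application name
are illustrative):

    #!/bin/bash
    #SBATCH --nodes=2               # illustrative resource request
    #SBATCH --ntasks-per-node=4
    source ./my_env_script.sh       # environment setup (modules, paths, etc.)
    ./my_run_script.sh              # eventually invokes mpirun

    # my_run_script.sh (sketch): build a hostfile from the SLURM allocation
    scontrol show hostnames "$SLURM_JOB_NODELIST" > hostfile.txt
    mpirun --hostfile hostfile.txt -np "$SLURM_NTASKS" ./my_mpi_app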
On May 15, 2018, at 1:39 AM, Max Mellette wrote:
>
> Thanks everyone for all your assistance. The problem seems to be resolved
> now, although I'm not entirely sure why these changes made a difference.
> There were two things I changed:
>
> (1) I had some additional `export ...` lines in .bash
Hi Max
Name resolution in /etc/hosts is a simple solution for (2).
I hope this helps,
Gus
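For concreteness, entries of the kind Gus suggests would look like this
(addresses and hostnames are illustrative and must match your cluster):

    # /etc/hosts on every node
    192.168.1.101   node01
    192.168.1.102   node02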
The long story is that you always need a subnet manager to initialize
the fabric.
That means you can run the subnet manager once and then stop it, after each HCA
has been assigned a LID.
In that case, though, the commands that interact with the SM (ibhosts,
ibdiagnet) will obviously fail.
Cheers,
Gilles
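As a sketch of that start-once-then-stop workflow on one of the hosts (the
service name opensmd follows this thread and may differ by distribution):

    # run the subnet manager just long enough to assign LIDs
    systemctl start opensmd
    ibstat                    # wait for 'State: Active' and a non-zero 'Base lid'
    systemctl stop opensmd    # traffic keeps flowing, but SM queries now fail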
Xie, as far as I know you need to run OpenSM even with just two hosts.
On 15 May 2018 at 03:29, Blade Shieh wrote:
> Hi, John:
>
> You are right about the network setup. I have no IB switch and just
> connect the servers with an IB cable. I did not even start the opensmd
> service because it seems
Hi Gilles,
Thank you for pointing out my error on *-N*.
And you are right that I had started the opensmd service before, so the link
could be set up correctly. But many IB-related commands could not be executed
correctly, like ibhosts and ibdiagnet.
As for the pml, I am pretty sure I was using ob1, because ompi_inf
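For reference, a typical way to confirm the PML with standard Open MPI tooling
(the application name below is illustrative):

    ompi_info | grep -i pml                  # list available PML components
    mpirun --mca pml ob1 -np 2 ./my_mpi_app  # explicitly select the ob1 PML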