Passant,

I built a similar environment, and had no issue running a simple MPI program.


Can you please post your Slurm script (I assume it uses srun to start the MPI app),

the output of

scontrol show config | grep Mpi

and the full output of your job?
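
For reference, on a PMIx-enabled setup I would expect that scontrol command to print something like the following (pmix_v3 is just an example; your default may differ):

MpiDefault              = pmix_v3
MpiParams               = (null)

and the kind of script I have in mind is along these lines (node/task counts and the path to the binary are placeholders):

#!/bin/bash
#SBATCH --job-name=mpi_test
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --time=00:05:00

# launch the MPI app through Slurm's PMIx plugin
srun --mpi=pmix ./a.out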


Cheers,


Gilles


On 3/12/2019 7:59 AM, Passant A. Hafez wrote:

Hello,


So we now have Slurm 18.08.6-2 compiled with PMIx 3.1.2

Then I installed Open MPI 4.0.0 with:

--with-slurm --with-pmix=internal --with-libevent=internal --enable-shared --enable-static --with-x


(Following the thread: it was mentioned that building OMPI 4.0.0 against an external PMIx 3.1.2 fails with PMIX_MODEX and PMIX_INFO_ARRAY errors, so I used the internal PMIx.)
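
(A quick way to double-check which PMIx the build picked up, assuming ompi_info is in the PATH, is something like:

ompi_info | grep -i pmix

which should list the internal pmix component among the MCA entries.)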



The MPI program fails with:


*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[cn603-13-r:387088] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!


This is printed for each process. Please advise, what is going wrong here?







All the best,
--
Passant A. Hafez | HPC Applications Specialist
KAUST Supercomputing Core Laboratory (KSL)
King Abdullah University of Science and Technology
Building 1, Al-Khawarizmi, Room 0123
Mobile : +966 (0) 55-247-9568
Mobile : +20 (0) 106-146-9644
Office : +966 (0) 12-808-0367
------------------------------------------------------------------------
*From:* users <users-boun...@lists.open-mpi.org> on behalf of Ralph H Castain <r...@open-mpi.org>
*Sent:* Monday, March 4, 2019 5:29 PM
*To:* Open MPI Users
*Subject:* Re: [OMPI users] Building PMIx and Slurm support


On Mar 4, 2019, at 5:34 AM, Daniel Letai <d...@letai.org.il> wrote:

Gilles,
On 3/4/19 8:28 AM, Gilles Gouaillardet wrote:
Daniel,


On 3/4/2019 3:18 PM, Daniel Letai wrote:

So unless you have a specific reason not to mix both, you might also give the internal PMIx a try.
Does this hold true for libevent too? Configure complains if the libevent used for Open MPI is different from the one used by the other tools.


I am not exactly sure which scenario you are running.

Long story short,

 - If you use an external PMIx, then you have to use an external libevent (otherwise configure will fail).

It must be the same one used by PMIx, but I am not sure configure checks that.

- If you use the internal PMIx, then it is up to you: you can either use the internal libevent or an external one.
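
For concreteness, the two supported combinations look roughly like this (the /opt/... paths are placeholders for wherever your external libraries live):

# external PMIx: libevent must be external as well
./configure --with-slurm --with-pmix=/opt/pmix --with-libevent=/opt/libevent

# internal PMIx: libevent can be internal or external
./configure --with-slurm --with-pmix=internal --with-libevent=internal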

Thanks, that clarifies the issues I've experienced. Since the PMIx builds don't have to be the same for the server and the nodes, I can compile Slurm against an external PMIx with the system libevent, and compile Open MPI with the internal PMIx and libevent, and that should work. Is that correct?

Yes - that is indeed correct!
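
Just to spell it out, something along these lines should work (the paths are placeholders; Slurm's configure accepts --with-pmix=PATH):

# Slurm: built against an external PMIx that uses the system libevent
./configure --with-pmix=/usr/local/pmix-3.1.2

# Open MPI: internal PMIx and internal libevent
./configure --with-slurm --with-pmix=internal --with-libevent=internal

Afterwards, srun --mpi=list should show the pmix plugin on the Slurm side.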


BTW, building 4.0.1rc1 completed successfully using external libraries for everything; I will start testing in the near future.

Cheers,


Gilles

Thanks,
Dani_L.
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users