Hi there,
First, thank you for all your helpful answers!
On Mon, 2 Apr 2012 20:30:37 -0700, Ralph Castain <r...@open-mpi.org>
wrote:
> I'm afraid the 1.5 series doesn't offer any help in this regard. The
> required changes only exist in the developers trunk, which will be
> released as the 1.7 series in the not-too-distant future.
I've tested the same use case with 1.5.5 and I get exactly the same
result as with 1.4.5. I confirm that this version doesn't offer any help
on this.
I've also tested the latest available snapshot of the trunk, 1.7a1r26338,
but it seems to have two regressions:
- When PSM is enabled, there is an undefined symbol error within
mca_mtl_psm.so:
$ mpirun -n 1 get-allowed-cpu-ompi
[cn0286:23252] mca: base: component_find: unable to open
/home/H76170/openmpi/1.7a1r26338/lib/openmpi/mca_mtl_psm:
/home/H76170/openmpi/1.7a1r26338/lib/openmpi/mca_mtl_psm.so: undefined
symbol: ompi_mtl_psm_imrecv (ignored)
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.
Host: cn0286
Framework: mtl
Component: psm
--------------------------------------------------------------------------
[cn0286:23252] mca: base: components_open: component pml / cm open
function failed
--------------------------------------------------------------------------
No available pml components were found!
This means that there are no components of this type installed on your
system or all the components reported that they could not be used.
This is a fatal error; your MPI process is likely to abort. Check the
output of the "ompi_info" command and ensure that components of this
type are available on your system. You may also wish to check the
value of the "component_path" MCA parameter and ensure that it has at
least one directory that contains valid MCA components.
--------------------------------------------------------------------------
[cn0286:23252] PML cm cannot be selected
- When PSM support is disabled (in order to avoid the previous error),
binding to the cores allocated by Slurm fails (every task reports 0-23):
$ salloc --qos=debug -N 2 -n 20
$ srun hostname | sort | uniq -c
12 cn0564
8 cn0565
$ module load openmpi_1.7a1r26338
$ unset OMPI_MCA_mtl OMPI_MCA_pml
$ mpicc -o get-allowed-cpu-ompi get-allowed-cpu.c
$ mpirun get-allowed-cpu-ompi
Launch (null) Task 12 of 20 (cn0565): 0-23
Launch (null) Task 13 of 20 (cn0565): 0-23
Launch (null) Task 14 of 20 (cn0565): 0-23
Launch (null) Task 15 of 20 (cn0565): 0-23
Launch (null) Task 16 of 20 (cn0565): 0-23
Launch (null) Task 17 of 20 (cn0565): 0-23
Launch (null) Task 18 of 20 (cn0565): 0-23
Launch (null) Task 19 of 20 (cn0565): 0-23
Launch (null) Task 07 of 20 (cn0564): 0-23
Launch (null) Task 08 of 20 (cn0564): 0-23
Launch (null) Task 09 of 20 (cn0564): 0-23
Launch (null) Task 10 of 20 (cn0564): 0-23
Launch (null) Task 11 of 20 (cn0564): 0-23
Launch (null) Task 00 of 20 (cn0564): 0-23
Launch (null) Task 01 of 20 (cn0564): 0-23
Launch (null) Task 02 of 20 (cn0564): 0-23
Launch (null) Task 03 of 20 (cn0564): 0-23
Launch (null) Task 04 of 20 (cn0564): 0-23
Launch (null) Task 05 of 20 (cn0564): 0-23
Launch (null) Task 06 of 20 (cn0564): 0-23
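The source of get-allowed-cpu.c is not included in this message. As a rough
illustration only, a check of this kind could read the allowed CPU list that
Linux exposes in /proc/self/status. The sketch below assumes a Linux system,
and both function names (parse_cpus_allowed_list, get_allowed_cpus) are
hypothetical; it is not the program that produced the output above.

```c
/* Sketch only: NOT the original get-allowed-cpu.c, whose source is not shown. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: if `line` is the "Cpus_allowed_list:" entry of
 * /proc/<pid>/status, copy its value (e.g. "0-23") into buf and return 1;
 * otherwise return 0 and leave buf empty. */
int parse_cpus_allowed_list(const char *line, char *buf, size_t len)
{
    static const char key[] = "Cpus_allowed_list:";
    size_t n;

    assert(line != NULL && buf != NULL && len > 0);
    buf[0] = '\0';
    if (strncmp(line, key, sizeof key - 1) != 0)
        return 0;
    line += sizeof key - 1;
    while (*line == ' ' || *line == '\t')   /* skip separator whitespace */
        line++;
    n = strcspn(line, "\r\n");              /* value ends at end of line */
    if (n >= len)
        n = len - 1;                        /* truncate to fit buf */
    memcpy(buf, line, n);
    buf[n] = '\0';
    return 1;
}

/* Hypothetical helper: read the calling process's allowed CPU list from
 * /proc/self/status (Linux only). Returns 1 on success, 0 on failure. */
int get_allowed_cpus(char *buf, size_t len)
{
    char line[4096];
    int found = 0;
    FILE *fp;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    fp = fopen("/proc/self/status", "r");
    if (fp == NULL)
        return 0;
    while (!found && fgets(line, sizeof line, fp) != NULL)
        found = parse_cpus_allowed_list(line, buf, len);
    fclose(fp);
    return found;
}
```

In an MPI test, each rank could call get_allowed_cpus() after MPI_Init and
print the result next to its rank and host name. If binding were applied, one
would expect each task to report a subset of the node's CPUs rather than the
full 0-23 range.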
FYI, I am using Slurm 2.3.3.
Did I miss something with this trunk version?
Do you want me to send the corresponding generated config.log, and the
output of "ompi_info" and "mpirun ompi full"?
Regards,
--
Rémi Palancher
http://rezib.org