Couple of things:

1. Please do send the output from ompi_info.

2. Please send the Slurm envars from your allocation - i.e., after you do your salloc.

Are you sure that Slurm is actually "binding" us during this launch? If you just srun your get-allowed-cpu program, what does it show? I'm wondering if the limit just gets reflected in the allocation envar without the orteds actually being bound.
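For example, something along these lines (just a sketch - the exact set of SLURM_* envars varies with the Slurm version and configuration) would show whether the allocation is only advertised in the environment or actually applied to each task's affinity mask:

$ salloc --qos=debug -N 2 -n 20
$ env | grep SLURM_                                # what the allocation advertises
$ srun grep Cpus_allowed_list /proc/self/status    # what each task is actually bound to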
On Apr 27, 2012, at 8:41 AM, Rémi Palancher wrote:

> Hi there,
>
> First, thank you for all your helpful answers!
>
> On Mon, 2 Apr 2012 20:30:37 -0700, Ralph Castain <r...@open-mpi.org> wrote:
>> I'm afraid the 1.5 series doesn't offer any help in this regard. The
>> required changes only exist in the developers trunk, which will be
>> released as the 1.7 series in the not-too-distant future.
>
> I've tested the same use case with 1.5.5 and I obtain exactly the same
> result as with 1.4.5, so I can confirm this series doesn't offer any help
> on this either.
>
> I've also tested the latest available trunk snapshot, 1.7a1r26338, but it
> seems to have 2 regressions:
>
> - with PSM enabled, an undefined symbol error within mca_mtl_psm.so:
>
> $ mpirun -n 1 get-allowed-cpu-ompi
> [cn0286:23252] mca: base: component_find: unable to open
> /home/H76170/openmpi/1.7a1r26338/lib/openmpi/mca_mtl_psm:
> /home/H76170/openmpi/1.7a1r26338/lib/openmpi/mca_mtl_psm.so: undefined
> symbol: ompi_mtl_psm_imrecv (ignored)
> --------------------------------------------------------------------------
> A requested component was not found, or was unable to be opened. This
> means that this component is either not installed or is unable to be
> used on your system (e.g., sometimes this means that shared libraries
> that the component requires are unable to be found/loaded). Note that
> Open MPI stopped checking at the first component that it did not find.
>
> Host:      cn0286
> Framework: mtl
> Component: psm
> --------------------------------------------------------------------------
> [cn0286:23252] mca: base: components_open: component pml / cm open function
> failed
> --------------------------------------------------------------------------
> No available pml components were found!
>
> This means that there are no components of this type installed on your
> system or all the components reported that they could not be used.
>
> This is a fatal error; your MPI process is likely to abort. Check the
> output of the "ompi_info" command and ensure that components of this
> type are available on your system. You may also wish to check the
> value of the "component_path" MCA parameter and ensure that it has at
> least one directory that contains valid MCA components.
> --------------------------------------------------------------------------
> [cn0286:23252] PML cm cannot be selected
>
> - with PSM support disabled (in order to avoid the previous error), binding
> to the cores allocated by Slurm fails:
>
> $ salloc --qos=debug -N 2 -n 20
> $ srun hostname | sort | uniq -c
>      12 cn0564
>       8 cn0565
> $ module load openmpi_1.7a1r26338
> $ unset OMPI_MCA_mtl OMPI_MCA_pml
> $ mpicc -o get-allowed-cpu-ompi get-allowed-cpu.c
> $ mpirun get-allowed-cpu-ompi
> Launch (null) Task 12 of 20 (cn0565): 0-23
> Launch (null) Task 13 of 20 (cn0565): 0-23
> Launch (null) Task 14 of 20 (cn0565): 0-23
> Launch (null) Task 15 of 20 (cn0565): 0-23
> Launch (null) Task 16 of 20 (cn0565): 0-23
> Launch (null) Task 17 of 20 (cn0565): 0-23
> Launch (null) Task 18 of 20 (cn0565): 0-23
> Launch (null) Task 19 of 20 (cn0565): 0-23
> Launch (null) Task 07 of 20 (cn0564): 0-23
> Launch (null) Task 08 of 20 (cn0564): 0-23
> Launch (null) Task 09 of 20 (cn0564): 0-23
> Launch (null) Task 10 of 20 (cn0564): 0-23
> Launch (null) Task 11 of 20 (cn0564): 0-23
> Launch (null) Task 00 of 20 (cn0564): 0-23
> Launch (null) Task 01 of 20 (cn0564): 0-23
> Launch (null) Task 02 of 20 (cn0564): 0-23
> Launch (null) Task 03 of 20 (cn0564): 0-23
> Launch (null) Task 04 of 20 (cn0564): 0-23
> Launch (null) Task 05 of 20 (cn0564): 0-23
> Launch (null) Task 06 of 20 (cn0564): 0-23
>
> FYI, I am using Slurm 2.3.3.
>
> Did I miss something with this trunk version?
>
> Do you want me to send the corresponding generated config.log, "ompi_info"
> output and "mpirun ompi full"?
>
> Regards,
> --
> Rémi Palancher
> http://rezib.org
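For reference, the get-allowed-cpu.c source was not included in the thread. A minimal reconstruction that produces the same style of output might look like the sketch below: the "Launch" field is assumed to be a getenv() lookup that comes back NULL under mpirun (SLURM_STEP_ID is purely a guess here; glibc prints "(null)" for a NULL %s argument), and the allowed CPU list is read from /proc/self/status.

/* get-allowed-cpu.c - a minimal reconstruction, not the original source.
 * Prints MPI rank, world size, hostname, and the Cpus_allowed_list from
 * /proc/self/status, which reflects the kernel-level affinity mask of
 * the process (e.g. "0-23" when unbound on a 24-core node). */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    char host[256], line[256], cpus[256] = "?";
    FILE *f;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    gethostname(host, sizeof(host));

    /* Scan /proc/self/status for the Cpus_allowed_list line. */
    if ((f = fopen("/proc/self/status", "r")) != NULL) {
        while (fgets(line, sizeof(line), f)) {
            if (sscanf(line, "Cpus_allowed_list: %255s", cpus) == 1)
                break;
        }
        fclose(f);
    }

    /* SLURM_STEP_ID is an assumption: unset under mpirun, glibc
     * prints "(null)", matching the output seen in the thread. */
    printf("Launch %s Task %02d of %d (%s): %s\n",
           getenv("SLURM_STEP_ID"), rank, size, host, cpus);

    MPI_Finalize();
    return 0;
}

Run under a plain srun, a program like this should report the per-task binding Slurm itself applies; under mpirun it reports whatever affinity the orteds inherited, which is exactly the difference in question.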