May have to wait for Nathan on Mon - I'm not familiar enough with the XE environment. One thing I note: in your modules, I see cray-mpich2 but not OMPI. Are you sure you are using the OMPI you built?
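
For what it's worth, one quick way to check that - and to capture the allocation debug output suggested in the next paragraph - would be something along these lines. The install prefix is just a placeholder for wherever your OMPI build lives, and --display-allocation is the long form of the flag mentioned below; add whatever contrib/platform options you already configure with.

    # confirm the mpiexec and libmpi being picked up are the OMPI you built, not cray-mpich2
    which mpiexec
    ompi_info | grep "Open MPI:"
    ldd ./cpi | grep -i mpi    # only meaningful if the binary is dynamically linked

    # rebuild with debug support so the RAS verbose output is available
    ./configure --prefix=$HOME/ompi-install --enable-debug && make install

    # show what OMPI thinks the allocation is, plus how it parsed it
    mpiexec --display-allocation -mca ras_base_verbose 10 -np 32 ./cpi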
What version of OMPI is this? You can add --display-alloc to your cmd line to see what OMPI thinks it was given. If you configure OMPI with --enable-debug, you can also add -mca ras_base_verbose 10 to the cmd line to get more debug info.

My best guess is that the way you are requesting the allocation is causing OMPI to think you were given only two slots. We're reading the allocation out of the ALPS file, so it could be that the qsub you gave creates something in there that we don't know how to parse.

On Nov 23, 2013, at 3:27 PM, Teranishi, Keita <knte...@sandia.gov> wrote:

> Here is the module environment, and I allocate an interactive node with "qsub -I -lmppwidth=32 -lmppnppn=16".
>
> module list
> Currently Loaded Modulefiles:
>   1) modules/3.2.6.7
>   2) craype-network-gemini
>   3) cray-mpich2/5.6.4
>   4) atp/1.6.3
>   5) xe-sysroot/4.1.40
>   6) switch/1.0-1.0401.36779.2.72.gem
>   7) shared-root/1.0-1.0401.37253.3.50.gem
>   8) pdsh/2.26-1.0401.37449.1.1.gem
>   9) nodehealth/5.0-1.0401.38460.12.18.gem
>  10) lbcd/2.1-1.0401.35360.1.2.gem
>  11) hosts/1.0-1.0401.35364.1.115.gem
>  12) configuration/1.0-1.0401.35391.1.2.gem
>  13) ccm/2.2.0-1.0401.37254.2.142
>  14) audit/1.0.0-1.0401.37969.2.32.gem
>  15) rca/1.0.0-2.0401.38656.2.2.gem
>  16) dvs/1.8.6_0.9.0-1.0401.1401.1.120
>  17) csa/3.0.0-1_2.0401.37452.4.50.gem
>  18) job/1.5.5-0.1_2.0401.35380.1.10.gem
>  19) xpmem/0.1-2.0401.36790.4.3.gem
>  20) gni-headers/2.1-1.0401.5675.4.4.gem
>  21) dmapp/3.2.1-1.0401.5983.4.5.gem
>  22) pmi/4.0.1-1.0000.9421.73.3.gem
>  23) ugni/4.0-1.0401.5928.9.5.gem
>  24) udreg/2.3.2-1.0401.5929.3.3.gem
>  25) xt-libsci/12.0.00
>  26) xt-totalview/8.12.0
>  27) totalview-support/1.1.5
>  28) gcc/4.7.2
>  29) xt-asyncpe/5.22
>  30) eswrap/1.0.8
>  31) craype-mc8
>  32) PrgEnv-gnu/4.1.40
>  33) moab/5.4.4
>
> In interactive mode (as well as batch mode), "aprun -n 32" can run my OpenMPI code:
>
> aprun -n 32 ./cpi
> Process 5 of 32 is on nid00015
> Process 7 of 32 is on nid00015
> Process 2 of 32 is on nid00015
> Process 0 of 32 is on nid00015
> Process 13 of 32 is on nid00015
> Process 10 of 32 is on nid00015
> Process 3 of 32 is on nid00015
> Process 1 of 32 is on nid00015
> Process 6 of 32 is on nid00015
> Process 4 of 32 is on nid00015
> Process 15 of 32 is on nid00015
> Process 9 of 32 is on nid00015
> Process 12 of 32 is on nid00015
> Process 8 of 32 is on nid00015
> Process 11 of 32 is on nid00015
> Process 14 of 32 is on nid00015
> Process 29 of 32 is on nid00014
> Process 22 of 32 is on nid00014
> Process 17 of 32 is on nid00014
> Process 28 of 32 is on nid00014
> Process 31 of 32 is on nid00014
> Process 26 of 32 is on nid00014
> Process 30 of 32 is on nid00014
> Process 16 of 32 is on nid00014
> Process 25 of 32 is on nid00014
> Process 24 of 32 is on nid00014
> Process 21 of 32 is on nid00014
> Process 20 of 32 is on nid00014
> Process 27 of 32 is on nid00014
> Process 19 of 32 is on nid00014
> Process 18 of 32 is on nid00014
> Process 23 of 32 is on nid00014
> pi is approximately 3.1415926544231265, Error is 0.0000000008333334
> wall clock time = 0.004645
>
> Here is what I get with OpenMPI's mpiexec:
>
> mpiexec --bind-to-core --mca plm_base_strip_prefix_from_node_names 0 -np 32 ./cpi
> --------------------------------------------------------------------------
> There are not enough slots available in the system to satisfy the 32 slots
> that were requested by the application:
>   ./cpi
>
> Either request fewer slots for your application, or make more slots available
> for use.
> --------------------------------------------------------------------------
>
>
> From: Ralph Castain <r...@open-mpi.org>
> Reply-To: Open MPI Users <us...@open-mpi.org>
> Date: Saturday, November 23, 2013 2:27 PM
> To: Open MPI Users <us...@open-mpi.org>
> Subject: [EXTERNAL] Re: [OMPI users] (OpenMPI for Cray XE6 ) How to set mca parameters through aprun?
>
> My guess is that you aren't doing the allocation correctly - since you are using qsub, can I assume you have Moab as your scheduler?
>
> aprun should be forwarding the envars - do you see them if you just run "aprun -n 1 printenv"?
>
> On Nov 23, 2013, at 2:13 PM, Teranishi, Keita <knte...@sandia.gov> wrote:
>
>> Hi,
>>
>> I installed OpenMPI on our small XE6 using the configure options under the /contrib directory. It appears to be working fine, but it ignores MCA parameters set in environment variables. So I switched to mpirun (in OpenMPI), which can handle MCA parameters somehow. However, mpirun fails to allocate processes by core. For example, I allocated 32 cores (on 2 nodes) with "qsub -lmppwidth=32 -lmppnppn=16", but mpirun recognizes the allocation as only 2 slots. Is it possible for mpirun to handle the multicore nodes of the XE6 properly, or are there any options to handle MCA parameters for aprun?
>>
>> Regards,
>> -----------------------------------------------------------------------------
>> Keita Teranishi
>> Principal Member of Technical Staff
>> Scalable Modeling and Analysis Systems
>> Sandia National Laboratories
>> Livermore, CA 94551
>> +1 (925) 294-3738
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
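
As background for the MCA-parameter question in the thread: Open MPI reads MCA parameters from the mpirun/mpiexec command line, from environment variables of the form OMPI_MCA_<param>, and from the per-user file $HOME/.openmpi/mca-params.conf; whether the environment-variable form actually reaches the processes under a direct aprun launch is exactly what the "aprun -n 1 printenv" test above probes. A minimal sketch of the three forms, using btl_base_verbose purely as an example parameter:

    # 1) on the command line
    mpiexec -mca btl_base_verbose 10 -np 32 ./cpi

    # 2) as an environment variable (what aprun would need to forward)
    export OMPI_MCA_btl_base_verbose=10
    aprun -n 32 ./cpi

    # 3) persistently, in the per-user parameter file
    echo "btl_base_verbose = 10" >> $HOME/.openmpi/mca-params.conf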