Hi,
   I'd like to point out that Cray doesn't run a Workload Manager (WLM)
  on the compute nodes. So if you use PBS or Torque/Moab, your job
  ends up on the login node. You have to use something like "aprun"
  or "ccmrun" to launch the job on the compute nodes.
  Unless "mpirun" or "mpiexec" is Cray-aware, it will try to
  launch processes on the login or MOM node.
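
  For example, the two launch paths look roughly like this (the binary
  names and the "hosts" machinefile are just placeholders here):

  # Cray MPICH binaries: launch directly on the compute nodes with aprun
  aprun -n 32 ./a.out

  # OpenMPI binaries: launch through ccmrun in CCM mode; ccmrun places
  # mpirun on a compute node, and mpirun then starts the ranks
  # (the full set of options I use is shown below)
  ccmrun mpirun -np 32 -machinefile hosts ./a.out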

   I've only run OpenMPI-linked codes in CCM (Cray Cluster
  Compatibility) mode. On a system with PBS/Torque/Moab, I use:

  qsub -I -lgres=ccm -lmppwidth=32 -lmppnppn=16

  This gives me an interactive PBS/Torque/Moab session. I then do
  the following (a rough batch-script equivalent is sketched after
  the steps):
     1) cd $PBS_O_WORKDIR
     2) module load ccm               # gets access to the ccmrun command
     3) setenv PATH /lus/scratch/whitaker/OpenMPI/bin:$PATH
     4) setenv LD_LIBRARY_PATH /lus/scratch/whitaker/OpenMPI/lib
     5) \rm -rf hosts                 # clear out any stale machinefile
     6) cat $PBS_NODEFILE > hosts     # one entry per requested core
7) ccmrun /lus/scratch/whitaker/OpenMPI/bin/mpirun --mca plm ^tm --mca ras ^tm --mca btl openib,sm,self -np 32 -machinefile hosts ./hello
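
   For reference, an (untested) batch-script version of roughly the same
   steps might look like the sketch below; the install path under
   /lus/scratch is just my location, so adjust it and the resource
   counts for your site:

   #!/bin/csh
   #PBS -l gres=ccm
   #PBS -l mppwidth=32
   #PBS -l mppnppn=16

   cd $PBS_O_WORKDIR
   module load ccm                      # gets access to the ccmrun command
   setenv PATH /lus/scratch/whitaker/OpenMPI/bin:$PATH
   setenv LD_LIBRARY_PATH /lus/scratch/whitaker/OpenMPI/lib
   \rm -rf hosts                        # clear out any stale machinefile
   cat $PBS_NODEFILE > hosts            # one entry per requested core
   ccmrun /lus/scratch/whitaker/OpenMPI/bin/mpirun --mca plm ^tm --mca ras ^tm \
          --mca btl openib,sm,self -np 32 -machinefile hosts ./hello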

        If you are running under Torque/Moab, OpenMPI will attempt
      to use the Torque/Moab (tm) API to launch the job. This won't
      work, since Cray does not run a Torque/Moab MOM process on the
      compute nodes. Hence, you have to turn off OpenMPI's use of
      Torque/Moab, which is what the "--mca plm ^tm --mca ras ^tm"
      options above do.
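
        If it is more convenient, the same components can be disabled
      through environment variables instead of command-line options,
      since OpenMPI reads any MCA parameter from an OMPI_MCA_<name>
      environment variable. A minimal sketch (csh syntax, to match
      the steps above):

      setenv OMPI_MCA_plm "^tm"      # don't use the Torque/Moab launcher
      setenv OMPI_MCA_ras "^tm"      # don't use the Torque/Moab allocator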

        OpenMPI has a version that speaks uGNI natively on the
      Cray. I have no experience with that.

ccmrun /lus/scratch/whitaker/OpenMPI/bin/mpirun --mca plm ^tm --mca ras ^tm -np 32 -machinefile hosts ./hello
 Hello World!, I am 0 of 32 (NodeID=nid00056)
 Hello World!, I am 1 of 32 (NodeID=nid00056)
 Hello World!, I am 2 of 32 (NodeID=nid00056)
 Hello World!, I am 3 of 32 (NodeID=nid00056)
 Hello World!, I am 4 of 32 (NodeID=nid00056)
 Hello World!, I am 5 of 32 (NodeID=nid00056)
 Hello World!, I am 6 of 32 (NodeID=nid00056)
 Hello World!, I am 7 of 32 (NodeID=nid00056)
 Hello World!, I am 8 of 32 (NodeID=nid00056)
 Hello World!, I am 9 of 32 (NodeID=nid00056)
 Hello World!, I am 10 of 32 (NodeID=nid00056)
 Hello World!, I am 11 of 32 (NodeID=nid00056)
 Hello World!, I am 12 of 32 (NodeID=nid00056)
 Hello World!, I am 13 of 32 (NodeID=nid00056)
 Hello World!, I am 14 of 32 (NodeID=nid00056)
 Hello World!, I am 15 of 32 (NodeID=nid00056)
 Hello World!, I am 16 of 32 (NodeID=nid00057)
 Hello World!, I am 17 of 32 (NodeID=nid00057)
 Hello World!, I am 18 of 32 (NodeID=nid00057)
 Hello World!, I am 21 of 32 (NodeID=nid00057)
 Hello World!, I am 19 of 32 (NodeID=nid00057)
 Hello World!, I am 20 of 32 (NodeID=nid00057)
 Hello World!, I am 22 of 32 (NodeID=nid00057)
 Hello World!, I am 23 of 32 (NodeID=nid00057)
 Hello World!, I am 24 of 32 (NodeID=nid00057)
 Hello World!, I am 25 of 32 (NodeID=nid00057)
 Hello World!, I am 26 of 32 (NodeID=nid00057)
 Hello World!, I am 27 of 32 (NodeID=nid00057)
 Hello World!, I am 28 of 32 (NodeID=nid00057)
 Hello World!, I am 29 of 32 (NodeID=nid00057)
 Hello World!, I am 30 of 32 (NodeID=nid00057)
 Hello World!, I am 31 of 32 (NodeID=nid00057)

          Hope this helps,
                Dave



On 11/23/2013 05:27 PM, Teranishi, Keita wrote:
Here is the module environment; I allocate an interactive node with "qsub -I -lmppwidth=32 -lmppnppn=16".
module list
Currently Loaded Modulefiles:
  1) modules/3.2.6.7
  2) craype-network-gemini
  3) cray-mpich2/5.6.4
  4) atp/1.6.3
  5) xe-sysroot/4.1.40
  6) switch/1.0-1.0401.36779.2.72.gem
  7) shared-root/1.0-1.0401.37253.3.50.gem
  8) pdsh/2.26-1.0401.37449.1.1.gem
  9) nodehealth/5.0-1.0401.38460.12.18.gem
 10) lbcd/2.1-1.0401.35360.1.2.gem
 11) hosts/1.0-1.0401.35364.1.115.gem
 12) configuration/1.0-1.0401.35391.1.2.gem
 13) ccm/2.2.0-1.0401.37254.2.142
 14) audit/1.0.0-1.0401.37969.2.32.gem
 15) rca/1.0.0-2.0401.38656.2.2.gem
 16) dvs/1.8.6_0.9.0-1.0401.1401.1.120
 17) csa/3.0.0-1_2.0401.37452.4.50.gem
 18) job/1.5.5-0.1_2.0401.35380.1.10.gem
 19) xpmem/0.1-2.0401.36790.4.3.gem
 20) gni-headers/2.1-1.0401.5675.4.4.gem
 21) dmapp/3.2.1-1.0401.5983.4.5.gem
 22) pmi/4.0.1-1.0000.9421.73.3.gem
 23) ugni/4.0-1.0401.5928.9.5.gem
 24) udreg/2.3.2-1.0401.5929.3.3.gem
 25) xt-libsci/12.0.00
 26) xt-totalview/8.12.0
 27) totalview-support/1.1.5
 28) gcc/4.7.2
 29) xt-asyncpe/5.22
 30) eswrap/1.0.8
 31) craype-mc8
 32) PrgEnv-gnu/4.1.40
 33) moab/5.4.4


In interactive mode (as well as batch mode), "aprun -n 32" can run my OpenMPI code.
aprun -n 32 ./cpi
Process 5 of 32 is on nid00015
Process 7 of 32 is on nid00015
Process 2 of 32 is on nid00015
Process 0 of 32 is on nid00015
Process 13 of 32 is on nid00015
Process 10 of 32 is on nid00015
Process 3 of 32 is on nid00015
Process 1 of 32 is on nid00015
Process 6 of 32 is on nid00015
Process 4 of 32 is on nid00015
Process 15 of 32 is on nid00015
Process 9 of 32 is on nid00015
Process 12 of 32 is on nid00015
Process 8 of 32 is on nid00015
Process 11 of 32 is on nid00015
Process 14 of 32 is on nid00015
Process 29 of 32 is on nid00014
Process 22 of 32 is on nid00014
Process 17 of 32 is on nid00014
Process 28 of 32 is on nid00014
Process 31 of 32 is on nid00014
Process 26 of 32 is on nid00014
Process 30 of 32 is on nid00014
Process 16 of 32 is on nid00014
Process 25 of 32 is on nid00014
Process 24 of 32 is on nid00014
Process 21 of 32 is on nid00014
Process 20 of 32 is on nid00014
Process 27 of 32 is on nid00014
Process 19 of 32 is on nid00014
Process 18 of 32 is on nid00014
Process 23 of 32 is on nid00014
pi is approximately 3.1415926544231265, Error is 0.0000000008333334
wall clock time = 0.004645


Here is what I get with OpenMPI's mpiexec:
mpiexec --bind-to-core --mca plm_base_strip_prefix_from_node_names 0 -np 32 ./cpi
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 32 slots
that were requested by the application:
  ./cpi

Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------



From: Ralph Castain <r...@open-mpi.org>
Reply-To: Open MPI Users <us...@open-mpi.org>
Date: Saturday, November 23, 2013 2:27 PM
To: Open MPI Users <us...@open-mpi.org>
Subject: [EXTERNAL] Re: [OMPI users] (OpenMPI for Cray XE6 ) How to set mca parameters through aprun?

My guess is that you aren't doing the allocation correctly - since you are using qsub, can I assume you have Moab as your scheduler?

aprun should be forwarding the envars - do you see them if you just run "aprun -n 1 printenv"?

On Nov 23, 2013, at 2:13 PM, Teranishi, Keita <knte...@sandia.gov> wrote:

Hi,

I installed OpenMPI on our small XE6 using the configure options under the /contrib directory. It appears to be working fine, but it ignores MCA parameters (set in environment variables). So I switched to mpirun (from OpenMPI), and it can handle MCA parameters somehow. However, mpirun fails to allocate processes by core. For example, when I allocated 32 cores (on 2 nodes) with "qsub -lmppwidth=32 -lmppnppn=16", mpirun recognized it as only 2 slots. Is it possible for mpirun to handle the multicore nodes of the XE6 properly, or are there any options to handle MCA parameters for aprun?

Regards,
-----------------------------------------------------------------------------
Keita Teranishi
Principal Member of Technical Staff
Scalable Modeling and Analysis Systems
Sandia National Laboratories
Livermore, CA 94551
+1 (925) 294-3738





--
CCCCCCCCCCCCCCCCCCCCCCFFFFFFFFFFFFFFFFFFFFFFFFFDDDDDDDDDDDDDDDDDDDDD
David Whitaker, Ph.D.                              whita...@cray.com
Aerospace CFD Specialist                        phone: (651)605-9078
ISV Applications/Cray Inc                         fax: (651)605-9001
CCCCCCCCCCCCCCCCCCCCCCFFFFFFFFFFFFFFFFFFFFFFFFFDDDDDDDDDDDDDDDDDDDDD
