Hi, I think you misunderstood what a MIMD launch with mpirun/mpiexec actually does.
mpirun -np X prog1 : -np Y prog2 starts a *single* MPI job consisting of X+Y processes in total, of which X processes execute prog1 and Y processes execute prog2, but they all belong to the same MPI job and hence share the same rank space and MPI_COMM_WORLD. Ranks 0 to X-1 execute prog1 and ranks X to X+Y-1 execute prog2. (If you need each of the two programs to see its own communicator, see the sketch appended at the end of this message.)

Cheers,
Hristo

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Ufuk Utku Turuncoglu (BE)
Sent: Thursday, June 21, 2012 9:29 AM
To: us...@open-mpi.org
Subject: [OMPI users] two jobs in a single mpirun command and MPI_COMM_WORLD issue

Hi,

I am trying to submit two MPI jobs with a single Open MPI mpirun command (the command can be seen in the job submission script below). To test this configuration, I compiled a simple mpihello application and ran it. The problem is that the two distinct mpihello jobs (run1 and run2) use the same MPI_COMM_WORLD, and the ranks of the processes come out as follows:

--- out1 (comes from first mpihello.x) ---
node 17 : Hello world
node 28 : Hello world
...
...
--- out2 (comes from second mpihello.x) ---
node 115 : Hello world
node 113 : Hello world
node 74 : Hello world
...
...

If MPI_COMM_WORLD were created separately for each job, then the node number (or id, or rank) would run from 0 to 63 in each log file, but this is not the case: in the second log the node numbers run from 64 to 131. If a Fortran application uses MPI_COMM_SIZE and MPI_COMM_RANK to get the total number of processors (in this case it is 132), then the rank and the total number of processors will be wrong. I think mpirun is not smart enough in this case. What do you think? Any suggestions would help.

PS: I am using Open MPI version 1.5.3 compiled with the Intel 12.0.4 compilers.

Regards,
--ufuk

--- job submission script (in OpenPBS) ---
#!/bin/bash
#PBS -l walltime=01:00:00
#PBS -l nodes=11:ppn=12
#PBS -N both
#PBS -q esp

# load modules
. /etc/profile.d/modules.sh
module load openmpi/1.5.3/intel/2011
module load netcdf/4.1.1/intel/11.1

# parameters
WRKDIR1=/home/netapp/clima-users/users/uturunco/CAS/run.lake/BOTH
WRKDIR2=/home/netapp/clima-users/users/uturunco/CAS/run.lake/BOTH

# create node files
head -n 64 $PBS_NODEFILE >& $WRKDIR1/nodes1.txt
tail -n 64 $PBS_NODEFILE >& $WRKDIR2/nodes2.txt

# submit jobs
mpirun -np `cat $WRKDIR1/nodes1.txt | wc -l` -machinefile $WRKDIR1/nodes1.txt -wd $WRKDIR1 ./run1.sh : -np `cat $WRKDIR2/nodes2.txt | wc -l` -machinefile $WRKDIR2/nodes2.txt -wd $WRKDIR2 ./run2.sh
--- end of job submission script ---

--- script run1.sh ---
#!/bin/sh
./mpihello.x >> out1.txt
--- end of script run1.sh ---

--- script run2.sh ---
#!/bin/sh
./mpihello.x >> out2.txt
--- end of script run2.sh ---

--
Hristo Iliev, Ph.D. -- High Performance Computing
RWTH Aachen University, Center for Computing and Communication
Rechen- und Kommunikationszentrum der RWTH Aachen
Seffenter Weg 23, D 52074 Aachen (Germany)
Tel: +49 241 80 24367 -- Fax/UMS: +49 241 80 624367
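P.S. If the two programs should each behave as if they had their own MPI_COMM_WORLD, one option is to split the world communicator by the predefined MPI_APPNUM attribute, which identifies which part of the colon-separated launch a process belongs to. Below is a minimal sketch in C (the file name appsplit.c and the communicator name appcomm are just illustrative, and it is not tested against your exact setup):

--- appsplit.c (sketch) ---
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int wrank, wsize, arank, asize;
    int appnum = 0, flag = 0;
    int *val = NULL;
    MPI_Comm appcomm;   /* per-program communicator (illustrative name) */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &wrank);
    MPI_Comm_size(MPI_COMM_WORLD, &wsize);

    /* MPI_APPNUM is a predefined attribute of MPI_COMM_WORLD; in a
       "mpirun -np X prog1 : -np Y prog2" launch it should be 0 for the
       prog1 processes and 1 for the prog2 processes. */
    MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_APPNUM, &val, &flag);
    if (flag)
        appnum = *val;

    /* Split MPI_COMM_WORLD so that each program part gets its own
       communicator, with ranks starting again from 0. */
    MPI_Comm_split(MPI_COMM_WORLD, appnum, wrank, &appcomm);
    MPI_Comm_rank(appcomm, &arank);
    MPI_Comm_size(appcomm, &asize);

    printf("world rank %d/%d -> app %d rank %d/%d\n",
           wrank, wsize, appnum, arank, asize);

    MPI_Comm_free(&appcomm);
    MPI_Finalize();
    return 0;
}
--- end of appsplit.c ---

Compile with mpicc. If the attribute is not set (flag == 0), the code falls back to appnum 0, i.e. no effective split; with the mpirun line from your script, the first program's processes should get appnum 0 and the second's appnum 1, so the rank within appcomm runs 0 to 63 in each part. The same split can be done from Fortran with MPI_COMM_GET_ATTR and MPI_COMM_SPLIT.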