Re: [OMPI users] Running a hybrid MPI+openMP program

Reuti Thu, 21 Aug 2014 06:44:26 -0400 (EDT)

Hi,

Am 20.08.2014 um 20:08 schrieb Oscar Mojica:


> Well, with qconf -sq one.q I got the following:
> 
> [oscar@aguia free-noise]$ qconf -sq one.q
> qname                 one.q
> hostlist              compute-1-30.local compute-1-2.local compute-1-3.local \
>                       compute-1-4.local compute-1-5.local compute-1-6.local \
>                       compute-1-7.local compute-1-8.local compute-1-9.local \
>                       compute-1-10.local compute-1-11.local compute-1-12.local \
>                       compute-1-13.local compute-1-14.local compute-1-15.local
> seq_no                0
> load_thresholds         np_load_avg=1.75
> suspend_thresholds      NONE
> nsuspend              1
> suspend_interval        00:05:00
> priority                0
> min_cpu_interval        00:05:00
> processors             UNDEFINED
> qtype                 BATCH INTERACTIVE
> ckpt_list               NONE
> pe_list                 make mpich mpi orte
> rerun                 FALSE
> slots                  1,[compute-1-30.local=1],[compute-1-2.local=1], \
>                       [compute-1-3.local=1],[compute-1-5.local=1], \
>                       [compute-1-8.local=1],[compute-1-6.local=1], \
>                       [compute-1-4.local=1],[compute-1-9.local=1], \
>                       [compute-1-11.local=1],[compute-1-7.local=1], \
>                       [compute-1-13.local=1],[compute-1-10.local=1], \
>                       [compute-1-15.local=1],[compute-1-12.local=1], \
>                       [compute-1-14.local=1]
> 
> The admin created this queue, so I have to ask him to change the number of 
> slots to the number of threads that I wish to use. 

Yep. I think his intention was to grant exclusive use of each node this way 
(this can be done in SGE by other means too). While it works, the slot count 
then no longer tells SGE how many cores the user wants to use (it's more like 
the number of machines), so any accounting won't work, and `qacct` can't 
report what the job actually requested at the time it was submitted.


> Then I could make use of: 
> ===
> export OMP_NUM_THREADS=N 
> mpirun -map-by slot:pe=$OMP_NUM_THREADS -np $(bc <<<"$NSLOTS / $OMP_NUM_THREADS") ./inverse.exe
> ==

As mentioned by tmishima, it's sufficient to use:

$ qsub -pe orte 80 ...

export OMP_NUM_THREADS=8
mpirun -map-by slot:pe=$OMP_NUM_THREADS ./yourapp.exe


=> You get a proper binding here in either case: if you are alone on each 
machine, or if all jobs get a proper binding and Open MPI stays inside it (not 
all versions of SGE support this though).
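
To make the arithmetic explicit: 80 granted slots with 8 threads per process 
correspond to 10 MPI processes. A minimal sketch of that calculation, with a 
divisibility check added. NSLOTS is hard-coded here only for illustration 
(inside a job SGE sets it), and the variable name `nprocs` is made up:

```shell
# NSLOTS is set by SGE inside a job; hard-coded here for illustration only.
NSLOTS=80
OMP_NUM_THREADS=8

# Refuse slot counts that cannot be split evenly into groups of threads.
if [ $(( NSLOTS % OMP_NUM_THREADS )) -ne 0 ]; then
    echo "NSLOTS ($NSLOTS) is not a multiple of OMP_NUM_THREADS ($OMP_NUM_THREADS)" >&2
    exit 1
fi

# Number of MPI processes, each driving OMP_NUM_THREADS threads.
nprocs=$(( NSLOTS / OMP_NUM_THREADS ))
echo "MPI processes: $nprocs"
```

This is the same value the `bc` expression in the quoted command line 
computes.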


> For now in my case this command line just would work for 10 processes and the 
> work wouldn't be divided in threads, is it right?

It works for the 10 machines which you get exclusively. Hence oversubscribing 
the single granted slot on each machine with "-bind-to none", which Ralph 
mentioned in the beginning, is up to you (unless other users would get hurt 
because they have their jobs there too).

$ qsub -pe orte 10 ...

export OMP_NUM_THREADS=8
mpirun -bind-to none ./yourapp.exe


=> The OS will shift the processes around, while SGE doesn't know anything 
about the final number of slots/cores you want to use on each machine (or to 
leave free for others).
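
The gap between what SGE accounts for and what the job occupies can be put in 
numbers. A rough sketch, assuming the usual $PE_HOSTFILE layout (host, slots, 
queue, processor range); the file contents below are made up to match the 10 
machines of the job shown in this thread, and the names `granted` and `used` 
are only for illustration:

```shell
# Made-up $PE_HOSTFILE for `-pe orte 10` on this queue:
# one slot granted per machine (columns: host, slots, queue, processor range).
hostfile=$(mktemp)
for h in 2 4 5 8 9 10 12 13 14 15; do
    echo "compute-1-$h.local 1 one.q@compute-1-$h.local UNDEFINED"
done > "$hostfile"

OMP_NUM_THREADS=8

# What SGE accounts for: the sum of the granted slots.
granted=$(awk '{ sum += $2 } END { print sum }' "$hostfile")

# What the job occupies: one process per slot, each with OMP_NUM_THREADS threads.
used=$(( granted * OMP_NUM_THREADS ))

echo "granted by SGE: $granted, cores actually busy: $used"
rm -f "$hostfile"
```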

===

Both ways above work right now, but IMO neither is the optimum in a shared 
cluster for the SGE versions w/o hard-binding. In the second case Open MPI 
starts 1 process per node, as we need it. If you requested `qsub -pe orte 80 
...` here too, Open MPI would start 80 processes. To avoid this I came up with 
altering the machinefile to give Open MPI different information about the 
granted slots on each machine.

$ qsub -pe orte 80 ...

export OMP_NUM_THREADS=8
awk -v omp_num_threads=$OMP_NUM_THREADS '{ $2/=omp_num_threads; print }' $PE_HOSTFILE > $TMPDIR/machines
export PE_HOSTFILE=$TMPDIR/machines
mpirun -bind-to none ./yourapp.exe
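
For illustration, this is what the `awk` line does to a hostfile. A minimal 
sketch on a made-up two-host excerpt, assuming the usual $PE_HOSTFILE layout 
(host, slots, queue, processor range). The check for slot counts which are not 
a multiple of the thread count is an addition and not part of the one-liner 
above:

```shell
# Made-up excerpt of a $PE_HOSTFILE for `-pe orte 80` with 8 slots per machine
# (columns: host, slots, queue, processor range); only two hosts shown.
hostfile=$(mktemp)
cat > "$hostfile" <<'EOF'
compute-1-2.local 8 one.q@compute-1-2.local UNDEFINED
compute-1-5.local 8 one.q@compute-1-5.local UNDEFINED
EOF

OMP_NUM_THREADS=8

# Same division as above, but stop if a slot count is not a multiple
# of the thread count instead of writing a fractional entry.
machines=$(awk -v omp_num_threads="$OMP_NUM_THREADS" '
    $2 % omp_num_threads != 0 {
        print "slots on " $1 " not divisible by " omp_num_threads > "/dev/stderr"
        exit 1
    }
    { $2 /= omp_num_threads; print }
' "$hostfile")
status=$?

echo "$machines"
rm -f "$hostfile"
```

Each host is left with one slot, so Open MPI starts one process per machine 
and the 8 threads of each process stay inside the 8 slots SGE granted there.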

===

I hope having all three versions in one email sheds some light on it.

-- Reuti


> Can I set a maximum number of threads in the queue one.q (e.g. 15) and 
> change the number in the 'export' at my convenience?
> 
> I feel like a child hearing the adults speaking
> Thanks I'm learning a lot   
>   
> 
> Oscar Fabian Mojica Ladino
> Geologist M.S. in  Geophysics
> 
> 
> > From: re...@staff.uni-marburg.de
> > Date: Tue, 19 Aug 2014 19:51:46 +0200
> > To: us...@open-mpi.org
> > Subject: Re: [OMPI users] Running a hybrid MPI+openMP program
> > 
> > Hi,
> > 
> > Am 19.08.2014 um 19:06 schrieb Oscar Mojica:
> > 
> > > > I discovered what the error was. I forgot to include '-fopenmp' when I 
> > > > compiled the objects in the Makefile, so the program worked but it didn't 
> > > > divide the job into threads. Now the program is working and I can use up 
> > > > to 15 cores per machine in the queue one.q.
> > > 
> > > Anyway i would like to try implement your advice. Well I'm not alone in 
> > > the cluster so i must implement your second suggestion. The steps are
> > > 
> > > a) Use '$ qconf -mp orte' to change the allocation rule to 8
> > 
> > The number of slots defined in your used one.q was also increased to 8 
> > (`qconf -sq one.q`)?
> > 
> > 
> > > b) Set '#$ -pe orte 80' in the script
> > 
> > Fine.
> > 
> > 
> > > c) I'm not sure how to do this step. I'd appreciate your help here. I can 
> > > add some lines to the script to determine the PE_HOSTFILE path and 
> > > contents, but i don't know how alter it 
> > 
> > > For now you can put in your jobscript (just after OMP_NUM_THREADS is 
> > > exported):
> > > 
> > > awk -v omp_num_threads=$OMP_NUM_THREADS '{ $2/=omp_num_threads; print }' $PE_HOSTFILE > $TMPDIR/machines
> > export PE_HOSTFILE=$TMPDIR/machines
> > 
> > =============
> > 
> > > Unfortunately no one stepped into this discussion, as in my opinion it's a 
> > > much broader issue which targets all users who want to combine MPI with 
> > > OpenMP. The queuing system should get a proper request for the overall 
> > > amount of slots the user needs. For now this will be forwarded to Open MPI, 
> > > which will use this information to start the appropriate number of 
> > > processes (which was an achievement for the Tight Integration 
> > > out-of-the-box of course) and ignore any setting of OMP_NUM_THREADS. So, 
> > > where should the generated list of machines be adjusted? There are several 
> > > options:
> > > 
> > > a) The PE of the queuing system should do it:
> > 
> > + a one time setup for the admin
> > + in SGE the "start_proc_args" of the PE could alter the $PE_HOSTFILE
> > - the "start_proc_args" would need to know the number of threads, i.e. 
> > OMP_NUM_THREADS must be defined by "qsub -v ..." outside of the jobscript 
> > (tricky scanning of the submitted jobscript for OMP_NUM_THREADS would be 
> > too nasty)
> > > - limits the jobscript to calling only libraries which behave in the 
> > > same way as Open MPI
> > 
> > 
> > b) The particular queue should do it in a queue prolog:
> > 
> > same as a) I think
> > 
> > 
> > c) The user should do it
> > 
> > + no change in the SGE installation
> > > - each and every user must include it in all the jobscripts to adjust the 
> > > list and export the pointer to the $PE_HOSTFILE, but he could change it 
> > > back and forth for different steps of the jobscript though
> > 
> > 
> > d) Open MPI should do it
> > 
> > + no change in the SGE installation
> > + no change to the jobscript
> > + OMP_NUM_THREADS can be altered for different steps of the jobscript while 
> > staying inside the granted allocation automatically
> > o should MKL_NUM_THREADS be covered too (does it use OMP_NUM_THREADS 
> > already)?
> > 
> > -- Reuti
> > 
> > 
> > > echo "PE_HOSTFILE:"
> > > echo $PE_HOSTFILE
> > > echo
> > > echo "cat PE_HOSTFILE:"
> > > cat $PE_HOSTFILE 
> > > 
> > > > Thanks for taking the time to answer these emails; your advice has been 
> > > > very useful.
> > > 
> > > PS: The version of SGE is OGS/GE 2011.11p1
> > > 
> > > 
> > > Oscar Fabian Mojica Ladino
> > > Geologist M.S. in Geophysics
> > > 
> > > 
> > > > From: re...@staff.uni-marburg.de
> > > > Date: Fri, 15 Aug 2014 20:38:12 +0200
> > > > To: us...@open-mpi.org
> > > > Subject: Re: [OMPI users] Running a hybrid MPI+openMP program
> > > > 
> > > > Hi,
> > > > 
> > > > Am 15.08.2014 um 19:56 schrieb Oscar Mojica:
> > > > 
> > > > > Yes, my installation of Open MPI is SGE-aware. I got the following
> > > > > 
> > > > > [oscar@compute-1-2 ~]$ ompi_info | grep grid
> > > > > MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.2)
> > > > 
> > > > Fine.
> > > > 
> > > > 
> > > > > > I'm a bit slow and I didn't understand the last part of your message. 
> > > > > > So I made a test to try to clear up my doubts.
> > > > > This is the cluster configuration: There are some machines turned off 
> > > > > but that is no problem
> > > > > 
> > > > > [oscar@aguia free-noise]$ qhost
> > > > > HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS
> > > > > -------------------------------------------------------------------------------
> > > > > global - - - - - - -
> > > > > compute-1-10 linux-x64 16 0.97 23.6G 558.6M 996.2M 0.0
> > > > > compute-1-11 linux-x64 16 - 23.6G - 996.2M -
> > > > > compute-1-12 linux-x64 16 0.97 23.6G 561.1M 996.2M 0.0
> > > > > compute-1-13 linux-x64 16 0.99 23.6G 558.7M 996.2M 0.0
> > > > > compute-1-14 linux-x64 16 1.00 23.6G 555.1M 996.2M 0.0
> > > > > compute-1-15 linux-x64 16 0.97 23.6G 555.5M 996.2M 0.0
> > > > > compute-1-16 linux-x64 8 0.00 15.7G 296.9M 1000.0M 0.0
> > > > > compute-1-17 linux-x64 8 0.00 15.7G 299.4M 1000.0M 0.0
> > > > > compute-1-18 linux-x64 8 - 15.7G - 1000.0M -
> > > > > compute-1-19 linux-x64 8 - 15.7G - 996.2M -
> > > > > compute-1-2 linux-x64 16 1.19 23.6G 468.1M 1000.0M 0.0
> > > > > compute-1-20 linux-x64 8 0.04 15.7G 297.2M 1000.0M 0.0
> > > > > compute-1-21 linux-x64 8 - 15.7G - 1000.0M -
> > > > > compute-1-22 linux-x64 8 0.00 15.7G 297.2M 1000.0M 0.0
> > > > > compute-1-23 linux-x64 8 0.16 15.7G 299.6M 1000.0M 0.0
> > > > > compute-1-24 linux-x64 8 0.00 15.7G 291.5M 996.2M 0.0
> > > > > compute-1-25 linux-x64 8 0.04 15.7G 293.4M 996.2M 0.0
> > > > > compute-1-26 linux-x64 8 - 15.7G - 1000.0M -
> > > > > compute-1-27 linux-x64 8 0.00 15.7G 297.0M 1000.0M 0.0
> > > > > compute-1-29 linux-x64 8 - 15.7G - 1000.0M -
> > > > > compute-1-3 linux-x64 16 - 23.6G - 996.2M -
> > > > > compute-1-30 linux-x64 16 - 23.6G - 996.2M -
> > > > > compute-1-4 linux-x64 16 0.97 23.6G 571.6M 996.2M 0.0
> > > > > compute-1-5 linux-x64 16 1.00 23.6G 559.6M 996.2M 0.0
> > > > > compute-1-6 linux-x64 16 0.66 23.6G 403.1M 996.2M 0.0
> > > > > compute-1-7 linux-x64 16 0.95 23.6G 402.7M 996.2M 0.0
> > > > > compute-1-8 linux-x64 16 0.97 23.6G 556.8M 996.2M 0.0
> > > > > compute-1-9 linux-x64 16 1.02 23.6G 566.0M 1000.0M 0.0 
> > > > > 
> > > > > > I ran my program using only MPI with 10 processors of the queue one.q 
> > > > > > which has 14 machines (compute-1-2 to compute-1-15). With 'qstat -t' 
> > > > > > I got:
> > > > > 
> > > > > [oscar@aguia free-noise]$ qstat -t
> > > > > > job-ID prior name user state submit/start at queue master ja-task-ID task-ID state cpu mem io stat failed 
> > > > > -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > > > > 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-2.local 
> > > > > MASTER r 00:49:12 554.13753 0.09163 
> > > > > one.q@compute-1-2.local SLAVE 
> > > > > 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-5.local 
> > > > > SLAVE 1.compute-1-5 r 00:48:53 551.49022 0.09410 
> > > > > 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-9.local 
> > > > > SLAVE 1.compute-1-9 r 00:50:00 564.22764 0.09409 
> > > > > 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-12.local 
> > > > > SLAVE 1.compute-1-12 r 00:47:30 535.30379 0.09379 
> > > > > 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-13.local 
> > > > > SLAVE 1.compute-1-13 r 00:49:51 561.69868 0.09379 
> > > > > 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-14.local 
> > > > > SLAVE 1.compute-1-14 r 00:49:14 554.60818 0.09379 
> > > > > 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-10.local 
> > > > > SLAVE 1.compute-1-10 r 00:49:59 562.95487 0.09349 
> > > > > 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-15.local 
> > > > > SLAVE 1.compute-1-15 r 00:50:01 563.27221 0.09361 
> > > > > 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-8.local 
> > > > > SLAVE 1.compute-1-8 r 00:49:26 556.68431 0.09349 
> > > > > 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-4.local 
> > > > > SLAVE 1.compute-1-4 r 00:49:27 556.87510 0.04967 
> > > > 
> > > > Yes, here you got 10 slots (= cores) granted by SGE. So there is no 
> > > > free core left inside the allocation of SGE to allow the use of 
> > > > additional cores for your threads. If you use more cores than granted 
> > > > by SGE, it will oversubscribe the machines.
> > > > 
> > > > The issue is now:
> > > > 
> > > > a) If you want 8 threads per MPI process, your job will use 80 cores in 
> > > > total - for now SGE isn't aware of it.
> > > > 
> > > > b) Although you specified $fill_up as allocation rule, it looks like 
> > > > $round_robin. Is there more than one slot defined in the queue 
> > > > definition of one.q to get exclusive access?
> > > > 
> > > > c) What version of SGE are you using? Certain ones use cgroups or bind 
> > > > processes directly to cores (although it usually needs to be requested 
> > > > by the job: first line of `qconf -help`).
> > > > 
> > > > 
> > > > > In case you are alone in the cluster, you could bypass the allocation 
> > > > > with b) (unless you are hit by c)). But with a mixture of users and 
> > > > > jobs, a different handling would be necessary to do this in a proper 
> > > > > way IMO:
> > > > 
> > > > a) having a PE with a fixed allocation rule of 8
> > > > 
> > > > b) requesting this PE with an overall slot count of 80
> > > > 
> > > > c) copy and alter the $PE_HOSTFILE to show only (granted core count per 
> > > > machine) divided by (OMP_NUM_THREADS) per entry, change $PE_HOSTFILE so 
> > > > that it points to the altered file
> > > > 
> > > > > d) Open MPI with a Tight Integration will now start only N processes per 
> > > > > machine according to the altered hostfile, in your case one
> > > > 
> > > > e) Your application can start the desired threads and you stay inside 
> > > > the granted allocation
> > > > 
> > > > -- Reuti
> > > > 
> > > > 
> > > > > > I logged in to the MASTER node with 'ssh compute-1-2.local' and ran 
> > > > > > $ ps -e f. I'm showing only the last lines of the output:
> > > > > 
> > > > > 2506 ? Ss 0:00 /usr/sbin/atd
> > > > > 2548 tty1 Ss+ 0:00 /sbin/mingetty /dev/tty1
> > > > > 2550 tty2 Ss+ 0:00 /sbin/mingetty /dev/tty2
> > > > > 2552 tty3 Ss+ 0:00 /sbin/mingetty /dev/tty3
> > > > > 2554 tty4 Ss+ 0:00 /sbin/mingetty /dev/tty4
> > > > > 2556 tty5 Ss+ 0:00 /sbin/mingetty /dev/tty5
> > > > > 2558 tty6 Ss+ 0:00 /sbin/mingetty /dev/tty6
> > > > > 3325 ? Sl 0:04 /opt/gridengine/bin/linux-x64/sge_execd
> > > > > 17688 ? S 0:00 \_ sge_shepherd-2726 -bg
> > > > > > 17695 ? Ss 0:00 \_ -bash /opt/gridengine/default/spool/compute-1-2/job_scripts/2726
> > > > > > 17797 ? S 0:00 \_ /usr/bin/time -f %E /opt/openmpi/bin/mpirun -v -np 10 ./inverse.exe
> > > > > > 17798 ? S 0:01 \_ /opt/openmpi/bin/mpirun -v -np 10 ./inverse.exe
> > > > > > 17799 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-5.local PATH=/opt/openmpi/bin:$PATH ; expo
> > > > > > 17800 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-9.local PATH=/opt/openmpi/bin:$PATH ; expo
> > > > > > 17801 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-12.local PATH=/opt/openmpi/bin:$PATH ; exp
> > > > > > 17802 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-13.local PATH=/opt/openmpi/bin:$PATH ; exp
> > > > > > 17803 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-14.local PATH=/opt/openmpi/bin:$PATH ; exp
> > > > > > 17804 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-10.local PATH=/opt/openmpi/bin:$PATH ; exp
> > > > > > 17805 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-15.local PATH=/opt/openmpi/bin:$PATH ; exp
> > > > > > 17806 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-8.local PATH=/opt/openmpi/bin:$PATH ; expo
> > > > > > 17807 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-4.local PATH=/opt/openmpi/bin:$PATH ; expo
> > > > > 17826 ? R 31:36 \_ ./inverse.exe
> > > > > 3429 ? Ssl 0:00 automount --pid-file /var/run/autofs.pid 
> > > > > 
> > > > > > So the job is using the 10 machines; up to here everything is OK. Do 
> > > > > > you think that by changing the "allocation_rule" to a number instead 
> > > > > > of $fill_up, the MPI processes would divide the work into that number 
> > > > > > of threads?
> > > > > 
> > > > > Thanks a lot 
> > > > > 
> > > > > Oscar Fabian Mojica Ladino
> > > > > Geologist M.S. in Geophysics
> > > > > 
> > > > > 
> > > > > PS: I have another doubt, what is a slot? is a physical core?
> > > > > 
> > > > > 
> > > > > > From: re...@staff.uni-marburg.de
> > > > > > Date: Thu, 14 Aug 2014 23:54:22 +0200
> > > > > > To: us...@open-mpi.org
> > > > > > Subject: Re: [OMPI users] Running a hybrid MPI+openMP program
> > > > > > 
> > > > > > Hi,
> > > > > > 
> > > > > > I think this is a broader issue in case an MPI library is used in 
> > > > > > conjunction with threads while running inside a queuing system. 
> > > > > > First: whether your actual installation of Open MPI is SGE-aware 
> > > > > > you can check with:
> > > > > > 
> > > > > > $ ompi_info | grep grid
> > > > > > MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.5)
> > > > > > 
> > > > > > > Then we can look at the definition of your PE: "allocation_rule 
> > > > > > > $fill_up". This means that SGE will grant you 14 slots in total in 
> > > > > > > any combination on the available machines, i.e. an 8+4+2 slot 
> > > > > > > allocation is an allowed combination, as is 4+4+3+3 and so on. 
> > > > > > > Depending on the SGE-awareness the question is: will your 
> > > > > > > application just start processes on all nodes and completely 
> > > > > > > disregard the granted allocation, or, as the other extreme, does it 
> > > > > > > stay on one and the same machine for all started processes? On the 
> > > > > > > master node of the parallel job you can issue:
> > > > > > 
> > > > > > $ ps -e f
> > > > > > 
> > > > > > > (f w/o -) to have a look whether `ssh` or `qrsh -inherit ...` is 
> > > > > > > used to reach other machines and their requested process count.
> > > > > > 
> > > > > > 
> > > > > > Now to the common problem in such a set up:
> > > > > > 
> > > > > > AFAICS: for now there is no way in the Open MPI + SGE combination 
> > > > > > to specify the number of MPI processes and intended number of 
> > > > > > threads which are automatically read by Open MPI while staying 
> > > > > > inside the granted slot count and allocation. So it seems to be 
> > > > > > necessary to have the intended number of threads being honored by 
> > > > > > Open MPI too.
> > > > > > 
> > > > > > > Hence specifying e.g. "allocation_rule 8" in such a setup while 
> > > > > > > requesting 32 slots would for now start 32 processes by MPI 
> > > > > > > already, as Open MPI reads the $PE_HOSTFILE and acts accordingly.
> > > > > > 
> > > > > > Open MPI would have to read the generated machine file in a 
> > > > > > slightly different way regarding threads: a) read the $PE_HOSTFILE, 
> > > > > > b) divide the granted slots per machine by OMP_NUM_THREADS, c) 
> > > > > > throw an error in case it's not divisible by OMP_NUM_THREADS. Then 
> > > > > > start one process per quotient.
> > > > > > 
> > > > > > Would this work for you?
> > > > > > 
> > > > > > -- Reuti
> > > > > > 
> > > > > > PS: This would also mean to have a couple of PEs in SGE having a 
> > > > > > fixed "allocation_rule". While this works right now, an extension 
> > > > > > in SGE could be "$fill_up_omp"/"$round_robin_omp" and using 
> > > > > > OMP_NUM_THREADS there too, hence it must not be specified as an 
> > > > > > `export` in the job script but either on the command line or inside 
> > > > > > the job script in #$ lines as job requests. This would mean to 
> > > > > > collect slots in bunches of OMP_NUM_THREADS on each machine to 
> > > > > > reach the overall specified slot count. Whether OMP_NUM_THREADS or 
> > > > > > n times OMP_NUM_THREADS is allowed per machine needs to be 
> > > > > > discussed.
> > > > > > 
> > > > > > PS2: As Univa SGE can also supply a list of granted cores in the 
> > > > > > $PE_HOSTFILE, it would be an extension to feed this to Open MPI to 
> > > > > > allow any UGE aware binding.
> > > > > > 
> > > > > > 
> > > > > > Am 14.08.2014 um 21:52 schrieb Oscar Mojica:
> > > > > > 
> > > > > > > Guys
> > > > > > > 
> > > > > > > I changed the line to run the program in the script with both 
> > > > > > > options
> > > > > > > > /usr/bin/time -f "%E" /opt/openmpi/bin/mpirun -v --bind-to-none -np $NSLOTS ./inverse.exe
> > > > > > > > /usr/bin/time -f "%E" /opt/openmpi/bin/mpirun -v --bind-to-socket -np $NSLOTS ./inverse.exe
> > > > > > > 
> > > > > > > but I got the same results. When I use man mpirun appears:
> > > > > > > 
> > > > > > > -bind-to-none, --bind-to-none
> > > > > > > Do not bind processes. (Default.)
> > > > > > > 
> > > > > > > and the output of 'qconf -sp orte' is
> > > > > > > 
> > > > > > > pe_name orte
> > > > > > > slots 9999
> > > > > > > user_lists NONE
> > > > > > > xuser_lists NONE
> > > > > > > start_proc_args /bin/true
> > > > > > > stop_proc_args /bin/true
> > > > > > > allocation_rule $fill_up
> > > > > > > control_slaves TRUE
> > > > > > > job_is_first_task FALSE
> > > > > > > urgency_slots min
> > > > > > > accounting_summary TRUE
> > > > > > > 
> > > > > > > > I don't know if the installed Open MPI was compiled with 
> > > > > > > > '--with-sge'. How can I find that out?
> > > > > > > > Before thinking about a hybrid application I was using only MPI 
> > > > > > > > and the program used few processors (14). The cluster has 28 
> > > > > > > > machines, 15 with 16 cores and 13 with 8 cores, totaling 344 
> > > > > > > > processing units. When I submitted the job (only MPI), the MPI 
> > > > > > > > processes were spread to the cores directly; for that reason I 
> > > > > > > > created a new queue with 14 machines, trying to gain more time. 
> > > > > > > > The results were the same in both cases. In the last case I could 
> > > > > > > > verify that the processes were distributed to all machines 
> > > > > > > > correctly.
> > > > > > > > 
> > > > > > > > What must I do?
> > > > > > > Thanks 
> > > > > > > 
> > > > > > > Oscar Fabian Mojica Ladino
> > > > > > > Geologist M.S. in Geophysics
> > > > > > > 
> > > > > > > 
> > > > > > > > Date: Thu, 14 Aug 2014 10:10:17 -0400
> > > > > > > > From: maxime.boissonnea...@calculquebec.ca
> > > > > > > > To: us...@open-mpi.org
> > > > > > > > Subject: Re: [OMPI users] Running a hybrid MPI+openMP program
> > > > > > > > 
> > > > > > > > Hi,
> > > > > > > > You DEFINITELY need to disable OpenMPI's new default binding. 
> > > > > > > > Otherwise, 
> > > > > > > > your N threads will run on a single core. --bind-to socket 
> > > > > > > > would be my 
> > > > > > > > recommendation for hybrid jobs.
> > > > > > > > 
> > > > > > > > Maxime
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Le 2014-08-14 10:04, Jeff Squyres (jsquyres) a écrit :
> > > > > > > > > I don't know much about OpenMP, but do you need to disable 
> > > > > > > > > Open MPI's default bind-to-core functionality (I'm assuming 
> > > > > > > > > you're using Open MPI 1.8.x)?
> > > > > > > > >
> > > > > > > > > You can try "mpirun --bind-to none ...", which will have Open 
> > > > > > > > > MPI not bind MPI processes to cores, which might allow OpenMP 
> > > > > > > > > to think that it can use all the cores, and therefore it will 
> > > > > > > > > spawn num_cores threads...?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Aug 14, 2014, at 9:50 AM, Oscar Mojica 
> > > > > > > > > <o_moji...@hotmail.com> wrote:
> > > > > > > > >
> > > > > > > > >> Hello everybody
> > > > > > > > >>
> > > > > > > > >> I am trying to run a hybrid mpi + openmp program in a 
> > > > > > > > >> cluster. I created a queue with 14 machines, each one with 
> > > > > > > > >> 16 cores. The program divides the work among the 14 
> > > > > > > > >> processors with MPI and within each processor a loop is also 
> > > > > > > > >> divided into 8 threads for example, using openmp. The 
> > > > > > > > >> problem is that when I submit the job to the queue the MPI 
> > > > > > > > >> processes don't divide the work into threads and the program 
> > > > > > > > >> prints the number of threads that are working within each 
> > > > > > > > >> process as one.
> > > > > > > > >>
> > > > > > > > >> I made a simple test program that uses openmp and I logged 
> > > > > > > > >> in one machine of the fourteen. I compiled it using gfortran 
> > > > > > > > >> -fopenmp program.f -o exe, set the OMP_NUM_THREADS 
> > > > > > > > >> environment variable equal to 8 and when I ran directly in 
> > > > > > > > >> the terminal the loop was effectively divided among the 
> > > > > > > > >> cores and for example in this case the program printed the 
> > > > > > > > >> number of threads equal to 8
> > > > > > > > >>
> > > > > > > > >> This is my Makefile
> > > > > > > > >> 
> > > > > > > > >> # Start of the makefile
> > > > > > > > >> # Defining variables
> > > > > > > > >> objects = inv_grav3d.o funcpdf.o gr3dprm.o fdjac.o dsvd.o
> > > > > > > > >> #f90comp = /opt/openmpi/bin/mpif90
> > > > > > > > >> f90comp = /usr/bin/mpif90
> > > > > > > > >> #switch = -O3
> > > > > > > > >> executable = inverse.exe
> > > > > > > > >> # Makefile
> > > > > > > > >> all : $(executable)
> > > > > > > > >> $(executable) : $(objects)   
> > > > > > > > >> $(f90comp) -fopenmp -g -O -o $(executable) $(objects)
> > > > > > > > >> rm $(objects)
> > > > > > > > >> %.o: %.f
> > > > > > > > >> $(f90comp) -c $<
> > > > > > > > >> # Cleaning everything
> > > > > > > > >> clean:
> > > > > > > > >> rm $(executable)
> > > > > > > > >> #    rm $(objects)
> > > > > > > > >> # End of the makefile
> > > > > > > > >>
> > > > > > > > >> and the script that i am using is
> > > > > > > > >>
> > > > > > > > >> #!/bin/bash
> > > > > > > > >> #$ -cwd
> > > > > > > > >> #$ -j y
> > > > > > > > >> #$ -S /bin/bash
> > > > > > > > >> #$ -pe orte 14
> > > > > > > > >> #$ -N job
> > > > > > > > >> #$ -q new.q
> > > > > > > > >>
> > > > > > > > >> export OMP_NUM_THREADS=8
> > > > > > > > > >> /usr/bin/time -f "%E" /opt/openmpi/bin/mpirun -v -np $NSLOTS ./inverse.exe
> > > > > > > > >>
> > > > > > > > >> am I forgetting something?
> > > > > > > > >>
> > > > > > > > >> Thanks,
> > > > > > > > >>
> > > > > > > > >> Oscar Fabian Mojica Ladino
> > > > > > > > >> Geologist M.S. in Geophysics
> > > > > > > > >> _______________________________________________
> > > > > > > > >> users mailing list
> > > > > > > > >> us...@open-mpi.org
> > > > > > > > >> Subscription: 
> > > > > > > > >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> > > > > > > > >> Link to this post: 
> > > > > > > > >> http://www.open-mpi.org/community/lists/users/2014/08/25016.php
> > > > > > > > >
> > > > > > > > 
> > > > > > > > 
> > > > > > > > -- 
> > > > > > > > ---------------------------------
> > > > > > > > Maxime Boissonneault
> > > > > > > > Analyste de calcul - Calcul Québec, Université Laval
> > > > > > > > Ph. D. en physique
> > > > > > > > 
