Oscar,

As I mentioned before, I've never used SGE, so please ask for Reuti's advice.
The only thing I can tell you is that you have to use the Open MPI 1.8 series
to use the -map-by slot:pe=N option.

Tetsuya


> Hi
>
> Well, with qconf -sq one.q I got the following:
>
> [oscar@aguia free-noise]$ qconf -sq one.q
> qname                 one.q
> hostlist                compute-1-30.local compute-1-2.local compute-1-3.local \
>                         compute-1-4.local compute-1-5.local compute-1-6.local \
>                         compute-1-7.local compute-1-8.local compute-1-9.local \
>                         compute-1-10.local compute-1-11.local compute-1-12.local \
>                         compute-1-13.local compute-1-14.local compute-1-15.local
> seq_no                0
> load_thresholds         np_load_avg=1.75
> suspend_thresholds      NONE
> nsuspend              1
> suspend_interval        00:05:00
> priority                0
> min_cpu_interval        00:05:00
> processors             UNDEFINED
> qtype                 BATCH INTERACTIVE
> ckpt_list               NONE
> pe_list                 make mpich mpi orte
> rerun                 FALSE
> slots                  1,[compute-1-30.local=1],[compute-1-2.local=1], \
>                       [compute-1-3.local=1],[compute-1-5.local=1], \
>                       [compute-1-8.local=1],[compute-1-6.local=1], \
>                       [compute-1-4.local=1],[compute-1-9.local=1], \
>                       [compute-1-11.local=1],[compute-1-7.local=1], \
>                       [compute-1-13.local=1],[compute-1-10.local=1], \
>                       [compute-1-15.local=1],[compute-1-12.local=1], \
>                       [compute-1-14.local=1]
>
> the admin is the one who created this queue, so I have to speak to him about
> changing the number of slots to the number of threads that I wish to use.
>
> Then I could make use of:
> ===
> export OMP_NUM_THREADS=N
> mpirun -map-by slot:pe=$OMP_NUM_THREADS -np $(bc <<<"$NSLOTS / $OMP_NUM_THREADS") ./inverse.exe
> ===
>
> For now, in my case, this command line would only work for 10 processes and
> the work wouldn't be divided into threads, is that right?
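As a quick sanity check of the arithmetic in that command line (the values here are hypothetical: 80 granted slots, 8 threads per process):

```shell
NSLOTS=80                # hypothetical slot count granted by SGE
export OMP_NUM_THREADS=8

# MPI process count = granted slots / threads per process
# (same result as: bc <<<"$NSLOTS / $OMP_NUM_THREADS")
NPROCS=$(( NSLOTS / OMP_NUM_THREADS ))
echo "$NPROCS"           # 10 processes x 8 threads = all 80 slots used
```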
>
> Can I set a maximum number of threads in the queue one.q (e.g. 15) and change
> the number in the 'export' at my convenience?
>
> I feel like a child hearing the adults speak.
> Thanks, I'm learning a lot!
>
>
> Oscar Fabian Mojica Ladino
> Geologist M.S. in  Geophysics
>
>
> > From: re...@staff.uni-marburg.de
> > Date: Tue, 19 Aug 2014 19:51:46 +0200
> > To: us...@open-mpi.org
> > Subject: Re: [OMPI users] Running a hybrid MPI+openMP program
> >
> > Hi,
> >
> > Am 19.08.2014 um 19:06 schrieb Oscar Mojica:
> >
> > > I discovered what the error was. I forgot to include '-fopenmp' when I
> > > compiled the objects in the Makefile, so the program worked but it didn't
> > > divide the job into threads. Now the program is working and I can use up to
> > > 15 cores per machine in the queue one.q.
> > >
> > > Anyway, I would like to try to implement your advice. Since I'm not alone
> > > on the cluster, I must implement your second suggestion. The steps are
> > >
> > > a) Use '$ qconf -mp orte' to change the allocation rule to 8
> >
> > Was the number of slots defined in your one.q also increased to 8 (`qconf -sq one.q`)?
> >
> >
> > > b) Set '#$ -pe orte 80' in the script
> >
> > Fine.
> >
> >
> > > c) I'm not sure how to do this step. I'd appreciate your help here. I can
> > > add some lines to the script to determine the PE_HOSTFILE path and contents,
> > > but I don't know how to alter it.
> >
> > For now you can put this in your jobscript (just after OMP_NUM_THREADS is exported):
> >
> > awk -v omp_num_threads=$OMP_NUM_THREADS '{ $2/=omp_num_threads; print }' $PE_HOSTFILE > $TMPDIR/machines
> > export PE_HOSTFILE=$TMPDIR/machines
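For reference, here is what that awk one-liner does to a made-up $PE_HOSTFILE (the host names, slot counts, and four-column layout below are illustrative, not taken from the thread):

```shell
# A made-up PE_HOSTFILE: host, granted slots, queue instance, processor range
cat > pe_hostfile.sample <<'EOF'
compute-1-2.local 8 one.q@compute-1-2.local UNDEFINED
compute-1-5.local 8 one.q@compute-1-5.local UNDEFINED
EOF

OMP_NUM_THREADS=8
# Divide each host's slot count (field 2) by the thread count, as above
awk -v omp_num_threads=$OMP_NUM_THREADS '{ $2 /= omp_num_threads; print }' \
    pe_hostfile.sample > machines
cat machines   # each host now shows 1 slot, so Open MPI starts 1 process there
```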
> >
> > =============
> >
> > Unfortunately no one stepped into this discussion, as in my opinion it's a
> > much broader issue which targets all users who want to combine MPI with
> > OpenMP. The queuing system should get a proper request for the overall number
> > of slots the user needs. For now this will be forwarded to Open MPI, which
> > will use this information to start the appropriate number of processes (which
> > was an achievement for the out-of-the-box Tight Integration, of course) while
> > ignoring any setting of OMP_NUM_THREADS. So, where should the generated list
> > of machines be adjusted? There are several options:
> >
> > a) The PE of the queuingsystem should do it:
> >
> > + a one time setup for the admin
> > + in SGE the "start_proc_args" of the PE could alter the $PE_HOSTFILE
> > - the "start_proc_args" would need to know the number of threads, i.e.
> >   OMP_NUM_THREADS must be defined by "qsub -v ..." outside of the jobscript
> >   (tricky scanning of the submitted jobscript for OMP_NUM_THREADS would be
> >   too nasty)
> > - limited to libraries called inside the jobscript that behave in the same
> >   way as Open MPI
> >
> >
> > b) The particular queue should do it in a queue prolog:
> >
> > same as a) I think
> >
> >
> > c) The user should do it
> >
> > + no change in the SGE installation
> > - each and every user must include it in all their jobscripts to adjust the
> >   list and export the pointer to the $PE_HOSTFILE, though they could change
> >   it back and forth for different steps of the jobscript
> >
> >
> > d) Open MPI should do it
> >
> > + no change in the SGE installation
> > + no change to the jobscript
> > + OMP_NUM_THREADS can be altered for different steps of the jobscript while
> >   staying inside the granted allocation automatically
> > o should MKL_NUM_THREADS be covered too (does it use OMP_NUM_THREADS already)?
> >
> > -- Reuti
> >
> >
> > > echo "PE_HOSTFILE:"
> > > echo $PE_HOSTFILE
> > > echo
> > > echo "cat PE_HOSTFILE:"
> > > cat $PE_HOSTFILE
> > >
> > > Thanks for taking the time to answer these emails; your advice has been
> > > very useful.
> > >
> > > PS: The version of SGE is   OGS/GE 2011.11p1
> > >
> > >
> > > Oscar Fabian Mojica Ladino
> > > Geologist M.S. in  Geophysics
> > >
> > >
> > > > From: re...@staff.uni-marburg.de
> > > > Date: Fri, 15 Aug 2014 20:38:12 +0200
> > > > To: us...@open-mpi.org
> > > > Subject: Re: [OMPI users] Running a hybrid MPI+openMP program
> > > >
> > > > Hi,
> > > >
> > > > Am 15.08.2014 um 19:56 schrieb Oscar Mojica:
> > > >
> > > > > Yes, my installation of Open MPI is SGE-aware. I got the following:
> > > > >
> > > > > [oscar@compute-1-2 ~]$ ompi_info | grep grid
> > > > > MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.2)
> > > >
> > > > Fine.
> > > >
> > > >
> > > > > I'm a bit slow and I didn't understand the last part of your message,
> > > > > so I made a test to try to resolve my doubts.
> > > > > This is the cluster configuration (some machines are turned off, but
> > > > > that is not a problem):
> > > > >
> > > > > [oscar@aguia free-noise]$ qhost
> > > > > HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS
> > > > >
> > > > > -------------------------------------------------------------------------------
> > > > > global - - - - - - -
> > > > > compute-1-10 linux-x64 16 0.97 23.6G 558.6M 996.2M 0.0
> > > > > compute-1-11 linux-x64 16 - 23.6G - 996.2M -
> > > > > compute-1-12 linux-x64 16 0.97 23.6G 561.1M 996.2M 0.0
> > > > > compute-1-13 linux-x64 16 0.99 23.6G 558.7M 996.2M 0.0
> > > > > compute-1-14 linux-x64 16 1.00 23.6G 555.1M 996.2M 0.0
> > > > > compute-1-15 linux-x64 16 0.97 23.6G 555.5M 996.2M 0.0
> > > > > compute-1-16 linux-x64 8 0.00 15.7G 296.9M 1000.0M 0.0
> > > > > compute-1-17 linux-x64 8 0.00 15.7G 299.4M 1000.0M 0.0
> > > > > compute-1-18 linux-x64 8 - 15.7G - 1000.0M -
> > > > > compute-1-19 linux-x64 8 - 15.7G - 996.2M -
> > > > > compute-1-2 linux-x64 16 1.19 23.6G 468.1M 1000.0M 0.0
> > > > > compute-1-20 linux-x64 8 0.04 15.7G 297.2M 1000.0M 0.0
> > > > > compute-1-21 linux-x64 8 - 15.7G - 1000.0M -
> > > > > compute-1-22 linux-x64 8 0.00 15.7G 297.2M 1000.0M 0.0
> > > > > compute-1-23 linux-x64 8 0.16 15.7G 299.6M 1000.0M 0.0
> > > > > compute-1-24 linux-x64 8 0.00 15.7G 291.5M 996.2M 0.0
> > > > > compute-1-25 linux-x64 8 0.04 15.7G 293.4M 996.2M 0.0
> > > > > compute-1-26 linux-x64 8 - 15.7G - 1000.0M -
> > > > > compute-1-27 linux-x64 8 0.00 15.7G 297.0M 1000.0M 0.0
> > > > > compute-1-29 linux-x64 8 - 15.7G - 1000.0M -
> > > > > compute-1-3 linux-x64 16 - 23.6G - 996.2M -
> > > > > compute-1-30 linux-x64 16 - 23.6G - 996.2M -
> > > > > compute-1-4 linux-x64 16 0.97 23.6G 571.6M 996.2M 0.0
> > > > > compute-1-5 linux-x64 16 1.00 23.6G 559.6M 996.2M 0.0
> > > > > compute-1-6 linux-x64 16 0.66 23.6G 403.1M 996.2M 0.0
> > > > > compute-1-7 linux-x64 16 0.95 23.6G 402.7M 996.2M 0.0
> > > > > compute-1-8 linux-x64 16 0.97 23.6G 556.8M 996.2M 0.0
> > > > > compute-1-9 linux-x64 16 1.02 23.6G 566.0M 1000.0M 0.0
> > > > >
> > > > > I ran my program using only MPI with 10 processors of the queue one.q,
> > > > > which has 14 machines (compute-1-2 to compute-1-15). With 'qstat -t' I
> > > > > got:
> > > > >
> > > > > [oscar@aguia free-noise]$ qstat -t
> > > > > job-ID prior name user state submit/start at queue master
ja-task-ID task-ID state cpu mem io stat failed
> > > > >
> > > > > -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > > > > 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-2.local MASTER r 00:49:12 554.13753 0.09163
> > > > > one.q@compute-1-2.local SLAVE
> > > > > 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-5.local SLAVE 1.compute-1-5 r 00:48:53 551.49022 0.09410
> > > > > 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-9.local SLAVE 1.compute-1-9 r 00:50:00 564.22764 0.09409
> > > > > 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-12.local SLAVE 1.compute-1-12 r 00:47:30 535.30379 0.09379
> > > > > 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-13.local SLAVE 1.compute-1-13 r 00:49:51 561.69868 0.09379
> > > > > 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-14.local SLAVE 1.compute-1-14 r 00:49:14 554.60818 0.09379
> > > > > 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-10.local SLAVE 1.compute-1-10 r 00:49:59 562.95487 0.09349
> > > > > 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-15.local SLAVE 1.compute-1-15 r 00:50:01 563.27221 0.09361
> > > > > 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-8.local SLAVE 1.compute-1-8 r 00:49:26 556.68431 0.09349
> > > > > 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-4.local SLAVE 1.compute-1-4 r 00:49:27 556.87510 0.04967
> > > >
> > > > Yes, here you got 10 slots (= cores) granted by SGE. So there is no free
> > > > core left inside the allocation of SGE to allow the use of additional
> > > > cores for your threads. If you use more cores than granted by SGE, it will
> > > > oversubscribe the machines.
> > > >
> > > > The issue is now:
> > > >
> > > > a) If you want 8 threads per MPI process, your job will use 80 cores in
> > > > total - for now SGE isn't aware of it.
> > > >
> > > > b) Although you specified $fill_up as allocation rule, it looks like
> > > > $round_robin. Is there more than one slot defined in the queue definition
> > > > of one.q to get exclusive access?
> > > >
> > > > c) What version of SGE are you using? Certain ones use cgroups or bind
> > > > processes directly to cores (although it usually needs to be requested by
> > > > the job: first line of `qconf -help`).
> > > >
> > > >
> > > > In case you are alone on the cluster, you could bypass the allocation
> > > > with b) (unless you are hit by c)). But with a mixture of users and jobs,
> > > > a different handling would be necessary to handle this properly IMO:
> > > >
> > > > a) having a PE with a fixed allocation rule of 8
> > > >
> > > > b) requesting this PE with an overall slot count of 80
> > > >
> > > > c) copy and alter the $PE_HOSTFILE to show only (granted core count per
> > > > machine) divided by (OMP_NUM_THREADS) per entry, then change $PE_HOSTFILE
> > > > so that it points to the altered file
> > > >
> > > > d) Open MPI with a Tight Integration will now start only N processes per
> > > > machine according to the altered hostfile, in your case one
> > > >
> > > > e) Your application can start the desired threads and you stay inside the
> > > > granted allocation
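Putting steps a)-e) together, a jobscript could look roughly like this. This is only a sketch: the PE name "orte", the queue name, and the counts are assumptions, and the PE must already carry "allocation_rule 8" as described above.

```shell
#!/bin/bash
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -pe orte 80        # b) request the overall slot count (assumed PE with rule 8)
#$ -N job
#$ -q one.q

export OMP_NUM_THREADS=8

# c) shrink the granted slot counts so Open MPI starts one process per machine
awk -v t=$OMP_NUM_THREADS '{ $2 /= t; print }' $PE_HOSTFILE > $TMPDIR/machines
export PE_HOSTFILE=$TMPDIR/machines

# d)+e) 80/8 = 10 MPI processes, each free to run 8 OpenMP threads
/opt/openmpi/bin/mpirun -np $(( 80 / OMP_NUM_THREADS )) ./inverse.exe
```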
> > > >
> > > > -- Reuti
> > > >
> > > >
> > > > > I accessed the MASTER node with 'ssh compute-1-2.local', and with
> > > > > '$ ps -e f' I got this (I'm showing only the last lines):
> > > > >
> > > > > 2506 ? Ss 0:00 /usr/sbin/atd
> > > > > 2548 tty1 Ss+ 0:00 /sbin/mingetty /dev/tty1
> > > > > 2550 tty2 Ss+ 0:00 /sbin/mingetty /dev/tty2
> > > > > 2552 tty3 Ss+ 0:00 /sbin/mingetty /dev/tty3
> > > > > 2554 tty4 Ss+ 0:00 /sbin/mingetty /dev/tty4
> > > > > 2556 tty5 Ss+ 0:00 /sbin/mingetty /dev/tty5
> > > > > 2558 tty6 Ss+ 0:00 /sbin/mingetty /dev/tty6
> > > > > 3325 ? Sl 0:04 /opt/gridengine/bin/linux-x64/sge_execd
> > > > > 17688 ? S 0:00 \_ sge_shepherd-2726 -bg
> > > > > 17695 ? Ss 0:00 \_ -bash /opt/gridengine/default/spool/compute-1-2/job_scripts/2726
> > > > > 17797 ? S 0:00 \_ /usr/bin/time -f %E /opt/openmpi/bin/mpirun -v -np 10 ./inverse.exe
> > > > > 17798 ? S 0:01 \_ /opt/openmpi/bin/mpirun -v -np 10 ./inverse.exe
> > > > > 17799 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-5.local PATH=/opt/openmpi/bin:$PATH ; expo
> > > > > 17800 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-9.local PATH=/opt/openmpi/bin:$PATH ; expo
> > > > > 17801 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-12.local PATH=/opt/openmpi/bin:$PATH ; exp
> > > > > 17802 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-13.local PATH=/opt/openmpi/bin:$PATH ; exp
> > > > > 17803 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-14.local PATH=/opt/openmpi/bin:$PATH ; exp
> > > > > 17804 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-10.local PATH=/opt/openmpi/bin:$PATH ; exp
> > > > > 17805 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-15.local PATH=/opt/openmpi/bin:$PATH ; exp
> > > > > 17806 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-8.local PATH=/opt/openmpi/bin:$PATH ; expo
> > > > > 17807 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-4.local PATH=/opt/openmpi/bin:$PATH ; expo
> > > > > 17826 ? R 31:36 \_ ./inverse.exe
> > > > > 3429 ? Ssl 0:00 automount --pid-file /var/run/autofs.pid
> > > > >
> > > > > So the job is using the 10 machines; up to here everything is OK. Do
> > > > > you think that changing the "allocation_rule" to a number instead of
> > > > > $fill_up would make the MPI processes divide the work into that number
> > > > > of threads?
> > > > >
> > > > > Thanks a lot
> > > > >
> > > > > Oscar Fabian Mojica Ladino
> > > > > Geologist M.S. in Geophysics
> > > > >
> > > > >
> > > > > PS: I have another doubt, what is a slot? is a physical core?
> > > > >
> > > > >
> > > > > > From: re...@staff.uni-marburg.de
> > > > > > Date: Thu, 14 Aug 2014 23:54:22 +0200
> > > > > > To: us...@open-mpi.org
> > > > > > Subject: Re: [OMPI users] Running a hybrid MPI+openMP program
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I think this is a broader issue whenever an MPI library is used in
> > > > > > conjunction with threads while running inside a queuing system. First:
> > > > > > you can check whether your actual installation of Open MPI is
> > > > > > SGE-aware with:
> > > > > >
> > > > > > $ ompi_info | grep grid
> > > > > > MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.5)
> > > > > >
> > > > > > Then we can look at the definition of your PE: "allocation_rule
> > > > > > $fill_up". This means that SGE will grant you 14 slots in total in any
> > > > > > combination on the available machines, meaning an 8+4+2 slot
> > > > > > allocation is allowed, as is 4+4+3+3 and so on. Depending on the
> > > > > > SGE-awareness it's a question: will your application just start
> > > > > > processes on all nodes and completely disregard the granted
> > > > > > allocation, or, at the other extreme, does it stay on one and the same
> > > > > > machine for all started processes? On the master node of the parallel
> > > > > > job you can issue:
> > > > > >
> > > > > > $ ps -e f
> > > > > >
> > > > > > (f w/o -) to have a look at whether `ssh` or `qrsh -inherit ...` is
> > > > > > used to reach the other machines and their requested process count.
> > > > > >
> > > > > >
> > > > > > Now to the common problem in such a set up:
> > > > > >
> > > > > > AFAICS: for now there is no way in the Open MPI + SGE combination to
> > > > > > specify the number of MPI processes and the intended number of threads
> > > > > > such that they are automatically read by Open MPI while staying inside
> > > > > > the granted slot count and allocation. So it seems to be necessary to
> > > > > > have the intended number of threads honored by Open MPI too.
> > > > > >
> > > > > > Hence specifying e.g. "allocation_rule 8" in such a setup while
> > > > > > requesting 32 processes would for now start 32 MPI processes already,
> > > > > > as Open MPI reads the $PE_HOSTFILE and acts accordingly.
> > > > > >
> > > > > > Open MPI would have to read the generated machine file in a slightly
> > > > > > different way regarding threads: a) read the $PE_HOSTFILE, b) divide
> > > > > > the granted slots per machine by OMP_NUM_THREADS, c) throw an error in
> > > > > > case it's not divisible by OMP_NUM_THREADS. Then start one process per
> > > > > > quotient.
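A small prototype of that a)-c) logic in the shell (the two-column host/slots input here is made up for illustration; a real $PE_HOSTFILE has more columns):

```shell
OMP_NUM_THREADS=8
# Divide each host's granted slots by the thread count; refuse uneven counts (step c)
adjusted=$(printf 'node-a 16\nnode-b 16\n' | awk -v t=$OMP_NUM_THREADS '
    $2 % t != 0 { exit 1 }      # not divisible by OMP_NUM_THREADS: error out
    { print $1, $2 / t }')      # one process per quotient
echo "$adjusted"   # node-a 2 / node-b 2
```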
> > > > > >
> > > > > > Would this work for you?
> > > > > >
> > > > > > -- Reuti
> > > > > >
> > > > > > PS: This would also mean having a couple of PEs in SGE with a fixed
> > > > > > "allocation_rule". While this works right now, an extension in SGE
> > > > > > could be "$fill_up_omp"/"$round_robin_omp", using OMP_NUM_THREADS
> > > > > > there too; hence it must not be specified as an `export` in the job
> > > > > > script but either on the command line or inside the job script in #$
> > > > > > lines as job requests. This would mean collecting slots in bunches of
> > > > > > OMP_NUM_THREADS on each machine to reach the overall specified slot
> > > > > > count. Whether OMP_NUM_THREADS or n times OMP_NUM_THREADS is allowed
> > > > > > per machine needs to be discussed.
> > > > > >
> > > > > > PS2: As Univa SGE can also supply a list of granted cores in the
> > > > > > $PE_HOSTFILE, it would be an extension to feed this to Open MPI to
> > > > > > allow any UGE-aware binding.
> > > > > >
> > > > > >
> > > > > > Am 14.08.2014 um 21:52 schrieb Oscar Mojica:
> > > > > >
> > > > > > > Guys
> > > > > > >
> > > > > > > I changed the line that runs the program in the script to both options:
> > > > > > > /usr/bin/time -f "%E" /opt/openmpi/bin/mpirun -v --bind-to-none -np $NSLOTS ./inverse.exe
> > > > > > > /usr/bin/time -f "%E" /opt/openmpi/bin/mpirun -v --bind-to-socket -np $NSLOTS ./inverse.exe
> > > > > > >
> > > > > > > but I got the same results. 'man mpirun' shows:
> > > > > > >
> > > > > > > -bind-to-none, --bind-to-none
> > > > > > > Do not bind processes. (Default.)
> > > > > > >
> > > > > > > and the output of 'qconf -sp orte' is
> > > > > > >
> > > > > > > pe_name orte
> > > > > > > slots 9999
> > > > > > > user_lists NONE
> > > > > > > xuser_lists NONE
> > > > > > > start_proc_args /bin/true
> > > > > > > stop_proc_args /bin/true
> > > > > > > allocation_rule $fill_up
> > > > > > > control_slaves TRUE
> > > > > > > job_is_first_task FALSE
> > > > > > > urgency_slots min
> > > > > > > accounting_summary TRUE
> > > > > > >
> > > > > > > I don't know if the installed Open MPI was compiled with
> > > > > > > '--with-sge'. How can I know that?
> > > > > > > Before thinking of a hybrid application I was using only MPI, and
> > > > > > > the program used few processors (14). The cluster has 28 machines,
> > > > > > > 15 with 16 cores and 13 with 8 cores, totaling 344 processing units.
> > > > > > > When I submitted the job (only MPI), the MPI processes were spread
> > > > > > > across the cores directly; for that reason I created a new queue
> > > > > > > with 14 machines, trying to save time. The results were the same in
> > > > > > > both cases. In the last case I could verify that the processes were
> > > > > > > distributed to all machines correctly.
> > > > > > >
> > > > > > > What must I do?
> > > > > > > Thanks
> > > > > > >
> > > > > > > Oscar Fabian Mojica Ladino
> > > > > > > Geologist M.S. in Geophysics
> > > > > > >
> > > > > > >
> > > > > > > > Date: Thu, 14 Aug 2014 10:10:17 -0400
> > > > > > > > From: maxime.boissonnea...@calculquebec.ca
> > > > > > > > To: us...@open-mpi.org
> > > > > > > > Subject: Re: [OMPI users] Running a hybrid MPI+openMP
program
> > > > > > > >
> > > > > > > > Hi,
> > > > > > > > You DEFINITELY need to disable OpenMPI's new default binding.
> > > > > > > > Otherwise, your N threads will run on a single core. --bind-to
> > > > > > > > socket would be my recommendation for hybrid jobs.
> > > > > > > >
> > > > > > > > Maxime
> > > > > > > >
> > > > > > > >
> > > > > > > > On 2014-08-14 10:04, Jeff Squyres (jsquyres) wrote:
> > > > > > > > > I don't know much about OpenMP, but do you need to disable Open
> > > > > > > > > MPI's default bind-to-core functionality (I'm assuming you're
> > > > > > > > > using Open MPI 1.8.x)?
> > > > > > > > >
> > > > > > > > > You can try "mpirun --bind-to none ...", which will have Open
> > > > > > > > > MPI not bind MPI processes to cores, which might allow OpenMP to
> > > > > > > > > think it can use all the cores, and therefore it will spawn
> > > > > > > > > num_cores threads...?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Aug 14, 2014, at 9:50 AM, Oscar Mojica <o_moji...@hotmail.com> wrote:
> > > > > > > > >
> > > > > > > > >> Hello everybody
> > > > > > > > >>
> > > > > > > > >> I am trying to run a hybrid MPI + OpenMP program on a cluster.
> > > > > > > > >> I created a queue with 14 machines, each one with 16 cores. The
> > > > > > > > >> program divides the work among the 14 processors with MPI, and
> > > > > > > > >> within each processor a loop is also divided into 8 threads,
> > > > > > > > >> for example, using OpenMP. The problem is that when I submit
> > > > > > > > >> the job to the queue, the MPI processes don't divide the work
> > > > > > > > >> into threads, and the program prints the number of threads
> > > > > > > > >> working within each process as one.
> > > > > > > > >>
> > > > > > > > >> I made a simple test program that uses OpenMP, and I logged in
> > > > > > > > >> to one machine of the fourteen. I compiled it using 'gfortran
> > > > > > > > >> -fopenmp program.f -o exe', set the OMP_NUM_THREADS environment
> > > > > > > > >> variable equal to 8, and when I ran it directly in the terminal
> > > > > > > > >> the loop was effectively divided among the cores; in this case
> > > > > > > > >> the program printed the number of threads equal to 8.
> > > > > > > > >>
> > > > > > > > >> This is my Makefile
> > > > > > > > >>
> > > > > > > > >> # Start of the makefile
> > > > > > > > >> # Defining variables
> > > > > > > > >> objects = inv_grav3d.o funcpdf.o gr3dprm.o fdjac.o
dsvd.o
> > > > > > > > >> #f90comp = /opt/openmpi/bin/mpif90
> > > > > > > > >> f90comp = /usr/bin/mpif90
> > > > > > > > >> #switch = -O3
> > > > > > > > >> executable = inverse.exe
> > > > > > > > >> # Makefile
> > > > > > > > >> all : $(executable)
> > > > > > > > >> $(executable) : $(objects)
> > > > > > > > >> $(f90comp) -fopenmp -g -O -o $(executable) $(objects)
> > > > > > > > >> rm $(objects)
> > > > > > > > >> %.o: %.f
> > > > > > > > >> $(f90comp) -c $<
> > > > > > > > >> # Cleaning everything
> > > > > > > > >> clean:
> > > > > > > > >> rm $(executable)
> > > > > > > > >> #    rm $(objects)
> > > > > > > > >> # End of the makefile
> > > > > > > > >>
> > > > > > > > >> and the script that i am using is
> > > > > > > > >>
> > > > > > > > >> #!/bin/bash
> > > > > > > > >> #$ -cwd
> > > > > > > > >> #$ -j y
> > > > > > > > >> #$ -S /bin/bash
> > > > > > > > >> #$ -pe orte 14
> > > > > > > > >> #$ -N job
> > > > > > > > >> #$ -q new.q
> > > > > > > > >>
> > > > > > > > >> export OMP_NUM_THREADS=8
> > > > > > > > >> /usr/bin/time -f "%E" /opt/openmpi/bin/mpirun -v -np $NSLOTS ./inverse.exe
> > > > > > > > >>
> > > > > > > > >> am I forgetting something?
> > > > > > > > >>
> > > > > > > > >> Thanks,
> > > > > > > > >>
> > > > > > > > >> Oscar Fabian Mojica Ladino
> > > > > > > > >> Geologist M.S. in Geophysics
> > > > > > > > >> _______________________________________________
> > > > > > > > >> users mailing list
> > > > > > > > >> us...@open-mpi.org
> > > > > > > > >> Subscription:
http://www.open-mpi.org/mailman/listinfo.cgi/users
> > > > > > > > >> Link to this post:
http://www.open-mpi.org/community/lists/users/2014/08/25016.php
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > ---------------------------------
> > > > > > > > Maxime Boissonneault
> > > > > > > > Analyste de calcul - Calcul Québec, Université Laval
> > > > > > > > Ph. D. en physique
> > > > > > > >
> > > > > > > > _______________________________________________
> > > > > > > > users mailing list
> > > > > > > > us...@open-mpi.org
> > > > > > > > Subscription:
http://www.open-mpi.org/mailman/listinfo.cgi/users
> > > > > > > > Link to this post:
http://www.open-mpi.org/community/lists/users/2014/08/25020.php
> > > > > > > _______________________________________________
> > > > > > > users mailing list
> > > > > > > us...@open-mpi.org
> > > > > > > Subscription:
http://www.open-mpi.org/mailman/listinfo.cgi/users
> > > > > > > Link to this post:
http://www.open-mpi.org/community/lists/users/2014/08/25032.php
> > > > > >
> > > > > > _______________________________________________
> > > > > > users mailing list
> > > > > > us...@open-mpi.org
> > > > > > Subscription:
http://www.open-mpi.org/mailman/listinfo.cgi/users
> > > > > > Link to this post:
http://www.open-mpi.org/community/lists/users/2014/08/25034.php
> > > > > _______________________________________________
> > > > > users mailing list
> > > > > us...@open-mpi.org
> > > > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > > > > Link to this post:
http://www.open-mpi.org/community/lists/users/2014/08/25037.php
> > > >
> > > > _______________________________________________
> > > > users mailing list
> > > > us...@open-mpi.org
> > > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > > > Link to this post:
http://www.open-mpi.org/community/lists/users/2014/08/25038.php
> > > _______________________________________________
> > > users mailing list
> > > us...@open-mpi.org
> > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > > Link to this post:
http://www.open-mpi.org/community/lists/users/2014/08/25079.php
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post: http://www.open-mpi.org/community/lists/users/2014/08/25080.php
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2014/08/25096.php
