Reuti,

If you want to allocate 10 procs with N threads each, the following
Torque request and mpirun line should work for you:

qsub -l nodes=10:ppn=N
mpirun -map-by slot:pe=N -np 10 -x OMP_NUM_THREADS=N ./inverse.exe

Open MPI then automatically reduces the logical slot count to 10
by dividing the real slot count of 10N by the binding width of N.

I don't know why you want to use pe=N without binding, but unfortunately
Open MPI currently allocates successive cores to each process when you
use the pe option - it forcibly applies --bind-to core.
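
For example, a complete Torque job script along these lines should work
(just a sketch with N=8 plugged in; adjust the executable and paths to
your site):

#!/bin/bash
#PBS -l nodes=10:ppn=8
cd $PBS_O_WORKDIR
# 10 MPI processes, each bound to 8 cores, 8 OpenMP threads per process
mpirun --map-by slot:pe=8 -np 10 -x OMP_NUM_THREADS=8 ./inverse.exe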

Tetsuya


> Hi,
>
> On 20.08.2014 at 06:26, Tetsuya Mishima wrote:
>
> > Reuti and Oscar,
> >
> > I'm a Torque user and I myself have never used SGE, so I hesitated to
> > join the discussion.
> >
> > From my experience with Torque, the Open MPI 1.8 series has already
> > resolved the issue you pointed out in combining MPI with OpenMP.
> >
> > Please try adding the --map-by slot:pe=8 option if you want to use 8
> > threads. Then Open MPI 1.8 should allocate processes properly without
> > any modification of the hostfile provided by Torque.
> >
> > In your case (8 threads and 10 procs):
> >
> > # you have to request 80 slots using SGE command before mpirun
> > mpirun --map-by slot:pe=8 -np 10 ./inverse.exe
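> > (for example, requesting the slots via the "orte" PE from this thread:
> > qsub -pe orte 80 your_jobscript.sh)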
>
> Thanks for pointing me to this option; for now I can't get it working
> though (in fact, I essentially want to use it without binding). This
> allows one to tell Open MPI to bind more cores to each of the MPI
> processes - ok, but does it lower the slot count granted by Torque too? I
> mean, was your submission command like:
>
> $ qsub -l nodes=10:ppn=8 ...
>
> so that Torque knows that it should grant and remember this slot count
> of 80 in total for the correct accounting?
>
> -- Reuti
>
>
> > where you can omit the --bind-to option because --bind-to core is
> > assumed as the default when pe=N is provided by the user.
> > Regards,
> > Tetsuya
> >
> >> Hi,
> >>
> >> On 19.08.2014 at 19:06, Oscar Mojica wrote:
> >>
> >>> I discovered what the error was. I forgot to include '-fopenmp' when
> >>> I compiled the objects in the Makefile, so the program worked but it
> >>> didn't divide the job into threads. Now the program is working and I
> >>> can use up to 15 cores per machine in the queue one.q.
> >>>
> >>> Anyway I would like to try to implement your advice. Well, I'm not
> >>> alone on the cluster so I must implement your second suggestion. The
> >>> steps are
> >>>
> >>> a) Use '$ qconf -mp orte' to change the allocation rule to 8
> >>
> >> The number of slots defined in the one.q you use was also increased
> >> to 8 (`qconf -sq one.q`)?
> >>
> >>
> >>> b) Set '#$ -pe orte 80' in the script
> >>
> >> Fine.
> >>
> >>
> >>> c) I'm not sure how to do this step. I'd appreciate your help here. I
> >>> can add some lines to the script to determine the PE_HOSTFILE path and
> >>> contents, but I don't know how to alter it
> >>
> >> For now you can put this in your jobscript (just after OMP_NUM_THREADS
> >> is exported):
> >>
> >> awk -v omp_num_threads=$OMP_NUM_THREADS '{ $2/=omp_num_threads; print }' $PE_HOSTFILE > $TMPDIR/machines
> >> export PE_HOSTFILE=$TMPDIR/machines
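> >>
> >> For illustration only (hypothetical contents), with OMP_NUM_THREADS=8
> >> a $PE_HOSTFILE like
> >>
> >> compute-1-2.local 8 one.q@compute-1-2.local UNDEFINED
> >> compute-1-5.local 8 one.q@compute-1-5.local UNDEFINED
> >>
> >> would become
> >>
> >> compute-1-2.local 1 one.q@compute-1-2.local UNDEFINED
> >> compute-1-5.local 1 one.q@compute-1-5.local UNDEFINED
> >>
> >> so that Open MPI starts only one process per machine while the 8
> >> granted slots per machine stay reserved for the threads.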
> >>
> >> =============
> >>
> >> Unfortunately no one stepped into this discussion, as in my opinion
> >> it's a much broader issue which affects all users who want to combine
> >> MPI with OpenMP. The queuing system should get a proper request for
> >> the overall number of slots the user needs. For now this will be
> >> forwarded to Open MPI, which will use this information to start the
> >> appropriate number of processes (which was an achievement for the
> >> out-of-the-box Tight Integration, of course) and ignores any setting
> >> of OMP_NUM_THREADS. So, where should the generated list of machines
> >> be adjusted? There are several options:
> >>
> >> a) The PE of the queuing system should do it:
> >>
> >> + a one-time setup for the admin
> >> + in SGE the "start_proc_args" of the PE could alter the $PE_HOSTFILE
> >> - the "start_proc_args" would need to know the number of threads, i.e.
> >> OMP_NUM_THREADS must be defined by "qsub -v ..." outside of the
> >> jobscript (tricky scanning of the submitted jobscript for
> >> OMP_NUM_THREADS would be too nasty)
> >> - restricts the jobscript to calls to libraries that behave in the
> >> same way as Open MPI
> >>
> >>
> >> b) The particular queue should do it in a queue prolog:
> >>
> >> same as a) I think
> >>
> >>
> >> c) The user should do it
> >>
> >> + no change in the SGE installation
> >> - each and every user must include it in all the jobscripts to adjust
> >> the list and export the pointer to the $PE_HOSTFILE, but it could be
> >> changed back and forth for different steps of the jobscript though
> >>
> >>
> >> d) Open MPI should do it
> >>
> >> + no change in the SGE installation
> >> + no change to the jobscript
> >> + OMP_NUM_THREADS can be altered for different steps of the jobscript
> >> while staying inside the granted allocation automatically
> >> o should MKL_NUM_THREADS be covered too (does it use OMP_NUM_THREADS
> >> already)?
> >>
> >> -- Reuti
> >>
> >>
> >>> echo "PE_HOSTFILE:"
> >>> echo $PE_HOSTFILE
> >>> echo
> >>> echo "cat PE_HOSTFILE:"
> >>> cat $PE_HOSTFILE
> >>>
> >>> Thanks for taking the time to answer these emails, your advice has
> >>> been very useful
> >>>
> >>> PS: The version of SGE is OGS/GE 2011.11p1
> >>>
> >>>
> >>> Oscar Fabian Mojica Ladino
> >>> Geologist M.S. in  Geophysics
> >>>
> >>>
> >>>> From: re...@staff.uni-marburg.de
> >>>> Date: Fri, 15 Aug 2014 20:38:12 +0200
> >>>> To: us...@open-mpi.org
> >>>> Subject: Re: [OMPI users] Running a hybrid MPI+openMP program
> >>>>
> >>>> Hi,
> >>>>
> >>>> On 15.08.2014 at 19:56, Oscar Mojica wrote:
> >>>>
> >>>>> Yes, my installation of Open MPI is SGE-aware. I got the following
> >>>>>
> >>>>> [oscar@compute-1-2 ~]$ ompi_info | grep grid
> >>>>> MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.2)
> >>>>
> >>>> Fine.
> >>>>
> >>>>
> >>>>> I'm a bit slow and I didn't understand the last part of your
> >>>>> message. So I made a test trying to resolve my doubts.
> >>>>> This is the cluster configuration: there are some machines turned
> >>>>> off but that is no problem
> >>>>>
> >>>>> [oscar@aguia free-noise]$ qhost
> >>>>> HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS
> >>>>> -------------------------------------------------------------------------------
> >>>>> global - - - - - - -
> >>>>> compute-1-10 linux-x64 16 0.97 23.6G 558.6M 996.2M 0.0
> >>>>> compute-1-11 linux-x64 16 - 23.6G - 996.2M -
> >>>>> compute-1-12 linux-x64 16 0.97 23.6G 561.1M 996.2M 0.0
> >>>>> compute-1-13 linux-x64 16 0.99 23.6G 558.7M 996.2M 0.0
> >>>>> compute-1-14 linux-x64 16 1.00 23.6G 555.1M 996.2M 0.0
> >>>>> compute-1-15 linux-x64 16 0.97 23.6G 555.5M 996.2M 0.0
> >>>>> compute-1-16 linux-x64 8 0.00 15.7G 296.9M 1000.0M 0.0
> >>>>> compute-1-17 linux-x64 8 0.00 15.7G 299.4M 1000.0M 0.0
> >>>>> compute-1-18 linux-x64 8 - 15.7G - 1000.0M -
> >>>>> compute-1-19 linux-x64 8 - 15.7G - 996.2M -
> >>>>> compute-1-2 linux-x64 16 1.19 23.6G 468.1M 1000.0M 0.0
> >>>>> compute-1-20 linux-x64 8 0.04 15.7G 297.2M 1000.0M 0.0
> >>>>> compute-1-21 linux-x64 8 - 15.7G - 1000.0M -
> >>>>> compute-1-22 linux-x64 8 0.00 15.7G 297.2M 1000.0M 0.0
> >>>>> compute-1-23 linux-x64 8 0.16 15.7G 299.6M 1000.0M 0.0
> >>>>> compute-1-24 linux-x64 8 0.00 15.7G 291.5M 996.2M 0.0
> >>>>> compute-1-25 linux-x64 8 0.04 15.7G 293.4M 996.2M 0.0
> >>>>> compute-1-26 linux-x64 8 - 15.7G - 1000.0M -
> >>>>> compute-1-27 linux-x64 8 0.00 15.7G 297.0M 1000.0M 0.0
> >>>>> compute-1-29 linux-x64 8 - 15.7G - 1000.0M -
> >>>>> compute-1-3 linux-x64 16 - 23.6G - 996.2M -
> >>>>> compute-1-30 linux-x64 16 - 23.6G - 996.2M -
> >>>>> compute-1-4 linux-x64 16 0.97 23.6G 571.6M 996.2M 0.0
> >>>>> compute-1-5 linux-x64 16 1.00 23.6G 559.6M 996.2M 0.0
> >>>>> compute-1-6 linux-x64 16 0.66 23.6G 403.1M 996.2M 0.0
> >>>>> compute-1-7 linux-x64 16 0.95 23.6G 402.7M 996.2M 0.0
> >>>>> compute-1-8 linux-x64 16 0.97 23.6G 556.8M 996.2M 0.0
> >>>>> compute-1-9 linux-x64 16 1.02 23.6G 566.0M 1000.0M 0.0
> >>>>>
> >>>>> I ran my program using only MPI with 10 processors of the queue
> >>>>> one.q, which has 14 machines (compute-1-2 to compute-1-15). With
> >>>>> 'qstat -t' I got:
> >>>>>
> >>>>> [oscar@aguia free-noise]$ qstat -t
> >>>>> job-ID prior name user state submit/start at queue master ja-task-ID task-ID state cpu mem io stat failed
> >>>>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
> >>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-2.local MASTER r 00:49:12 554.13753 0.09163
> >>>>> one.q@compute-1-2.local SLAVE
> >>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-5.local SLAVE 1.compute-1-5 r 00:48:53 551.49022 0.09410
> >>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-9.local SLAVE 1.compute-1-9 r 00:50:00 564.22764 0.09409
> >>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-12.local SLAVE 1.compute-1-12 r 00:47:30 535.30379 0.09379
> >>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-13.local SLAVE 1.compute-1-13 r 00:49:51 561.69868 0.09379
> >>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-14.local SLAVE 1.compute-1-14 r 00:49:14 554.60818 0.09379
> >>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-10.local SLAVE 1.compute-1-10 r 00:49:59 562.95487 0.09349
> >>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-15.local SLAVE 1.compute-1-15 r 00:50:01 563.27221 0.09361
> >>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-8.local SLAVE 1.compute-1-8 r 00:49:26 556.68431 0.09349
> >>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-4.local SLAVE 1.compute-1-4 r 00:49:27 556.87510 0.04967
> >>>>
> >>>> Yes, here you got 10 slots (= cores) granted by SGE. So there is no
> >>>> free core left inside the allocation of SGE to allow the use of
> >>>> additional cores for your threads. If you use more cores than granted
> >>>> by SGE, it will oversubscribe the machines.
> >>>>
> >>>> The issue is now:
> >>>>
> >>>> a) If you want 8 threads per MPI process, your job will use 80 cores
> >>>> in total - for now SGE isn't aware of it.
> >>>>
> >>>> b) Although you specified $fill_up as allocation rule, it looks like
> >>>> $round_robin. Is there more than one slot defined in the queue
> >>>> definition of one.q to get exclusive access?
> >>>>
> >>>> c) What version of SGE are you using? Certain ones use cgroups or
> >>>> bind processes directly to cores (although it usually needs to be
> >>>> requested by the job: first line of `qconf -help`).
> >>>>
> >>>>
> >>>> In case you are alone in the cluster, you could bypass the allocation
> >>>> with b) (unless you are hit by c)). But with a mixture of users and
> >>>> jobs a different handling would be necessary to do this in a proper
> >>>> way IMO (see the sketch after this list):
> >>>>
> >>>> a) having a PE with a fixed allocation rule of 8
> >>>>
> >>>> b) requesting this PE with an overall slot count of 80
> >>>>
> >>>> c) copying and altering the $PE_HOSTFILE to show only (granted core
> >>>> count per machine) divided by (OMP_NUM_THREADS) per entry, and
> >>>> changing $PE_HOSTFILE so that it points to the altered file
> >>>>
> >>>> d) Open MPI with a Tight Integration will now start only N processes
> >>>> per machine according to the altered hostfile, in your case one
> >>>>
> >>>> e) your application can start the desired threads and you stay
> >>>> inside the granted allocation
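> >>>>
> >>>> A minimal jobscript sketch of a)-e) (only an illustration; it assumes
> >>>> a PE named "orte8" set up with allocation_rule 8, and 10 MPI
> >>>> processes with 8 threads each):
> >>>>
> >>>> #!/bin/bash
> >>>> #$ -cwd
> >>>> #$ -pe orte8 80
> >>>> export OMP_NUM_THREADS=8
> >>>> # divide the granted slots per machine by the thread count
> >>>> awk -v t=$OMP_NUM_THREADS '{ $2/=t; print }' $PE_HOSTFILE > $TMPDIR/machines
> >>>> export PE_HOSTFILE=$TMPDIR/machines
> >>>> # the Tight Integration now starts only one process per machine
> >>>> mpirun -np 10 ./inverse.exe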
> >>>>
> >>>> -- Reuti
> >>>>
> >>>>
> >>>>> I accessed the MASTER node with 'ssh compute-1-2.local', ran
> >>>>> '$ ps -e f' and got this; I'm showing only the last lines
> >>>>>
> >>>>> 2506 ? Ss 0:00 /usr/sbin/atd
> >>>>> 2548 tty1 Ss+ 0:00 /sbin/mingetty /dev/tty1
> >>>>> 2550 tty2 Ss+ 0:00 /sbin/mingetty /dev/tty2
> >>>>> 2552 tty3 Ss+ 0:00 /sbin/mingetty /dev/tty3
> >>>>> 2554 tty4 Ss+ 0:00 /sbin/mingetty /dev/tty4
> >>>>> 2556 tty5 Ss+ 0:00 /sbin/mingetty /dev/tty5
> >>>>> 2558 tty6 Ss+ 0:00 /sbin/mingetty /dev/tty6
> >>>>> 3325 ? Sl 0:04 /opt/gridengine/bin/linux-x64/sge_execd
> >>>>> 17688 ? S 0:00 \_ sge_shepherd-2726 -bg
> >>>>> 17695 ? Ss 0:00 \_ -bash /opt/gridengine/default/spool/compute-1-2/job_scripts/2726
> >>>>> 17797 ? S 0:00 \_ /usr/bin/time -f %E /opt/openmpi/bin/mpirun -v -np 10 ./inverse.exe
> >>>>> 17798 ? S 0:01 \_ /opt/openmpi/bin/mpirun -v -np 10 ./inverse.exe
> >>>>> 17799 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-5.local PATH=/opt/openmpi/bin:$PATH ; expo
> >>>>> 17800 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-9.local PATH=/opt/openmpi/bin:$PATH ; expo
> >>>>> 17801 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-12.local PATH=/opt/openmpi/bin:$PATH ; exp
> >>>>> 17802 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-13.local PATH=/opt/openmpi/bin:$PATH ; exp
> >>>>> 17803 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-14.local PATH=/opt/openmpi/bin:$PATH ; exp
> >>>>> 17804 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-10.local PATH=/opt/openmpi/bin:$PATH ; exp
> >>>>> 17805 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-15.local PATH=/opt/openmpi/bin:$PATH ; exp
> >>>>> 17806 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-8.local PATH=/opt/openmpi/bin:$PATH ; expo
> >>>>> 17807 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-4.local PATH=/opt/openmpi/bin:$PATH ; expo
> >>>>> 17826 ? R 31:36 \_ ./inverse.exe
> >>>>> 3429 ? Ssl 0:00 automount --pid-file /var/run/autofs.pid
> >>>>>
> >>>>> So the job is using the 10 machines; up to here everything is OK. Do
> >>>>> you think that by changing the "allocation_rule" to a number instead
> >>>>> of $fill_up the MPI processes would divide the work into that number
> >>>>> of threads?
> >>>>>
> >>>>> Thanks a lot
> >>>>>
> >>>>> Oscar Fabian Mojica Ladino
> >>>>> Geologist M.S. in Geophysics
> >>>>>
> >>>>>
> >>>>> PS: I have another doubt: what is a slot? Is it a physical core?
> >>>>>
> >>>>>
> >>>>>> From: re...@staff.uni-marburg.de
> >>>>>> Date: Thu, 14 Aug 2014 23:54:22 +0200
> >>>>>> To: us...@open-mpi.org
> >>>>>> Subject: Re: [OMPI users] Running a hybrid MPI+openMP program
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> I think this is a broader issue in case an MPI library is used in
> >>>>>> conjunction with threads while running inside a queuing system.
> >>>>>> First: whether your actual installation of Open MPI is SGE-aware
> >>>>>> you can check with:
> >>>>>>
> >>>>>> $ ompi_info | grep grid
> >>>>>> MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.5)
> >>>>>>
> >>>>>> Then we can look at the definition of your PE: "allocation_rule
> >>>>>> $fill_up". This means that SGE will grant you 14 slots in total in
> >>>>>> any combination on the available machines, i.e. an 8+4+2 slot
> >>>>>> allocation is an allowed combination, as is 4+4+3+3 and so on.
> >>>>>> Depending on the SGE-awareness it's a question: will your
> >>>>>> application just start processes on all nodes and completely
> >>>>>> disregard the granted allocation, or, as the other extreme, does it
> >>>>>> stay on one and the same machine for all started processes? On the
> >>>>>> master node of the parallel job you can issue:
> >>>>>>
> >>>>>> $ ps -e f
> >>>>>>
> >>>>>> (f w/o -) to have a look whether `ssh` or `qrsh -inherit ...` is
> >>>>>> used to reach other machines and their requested process count.
> >>>>>>
> >>>>>>
> >>>>>> Now to the common problem in such a set up:
> >>>>>>
> >>>>>> AFAICS: for now there is no way in the Open MPI + SGE combination
> >>>>>> to specify the number of MPI processes and the intended number of
> >>>>>> threads such that both are automatically read by Open MPI while
> >>>>>> staying inside the granted slot count and allocation. So it seems
> >>>>>> to be necessary to have the intended number of threads honored by
> >>>>>> Open MPI too.
> >>>>>>
> >>>>>> Hence specifying e.g. "allocation_rule 8" in such a setup while
> >>>>>> requesting 32 slots would for now already start 32 MPI processes,
> >>>>>> as Open MPI reads the $PE_HOSTFILE and acts accordingly.
> >>>>>>
> >>>>>> Open MPI would have to read the generated machine file in a
> >>>>>> slightly different way regarding threads: a) read the $PE_HOSTFILE,
> >>>>>> b) divide the granted slots per machine by OMP_NUM_THREADS, c)
> >>>>>> throw an error in case it's not divisible by OMP_NUM_THREADS. Then
> >>>>>> start one process per quotient.
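> >>>>>>
> >>>>>> Sketched in shell just for illustration (this is not an existing
> >>>>>> Open MPI option), the equivalent of a)-c) would be something like:
> >>>>>>
> >>>>>> awk -v t=$OMP_NUM_THREADS \
> >>>>>>   '$2 % t { print "slots on " $1 " not divisible by " t > "/dev/stderr"; exit 1 }
> >>>>>>    { $2 /= t; print }' $PE_HOSTFILE > $TMPDIR/machines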
> >>>>>>
> >>>>>> Would this work for you?
> >>>>>>
> >>>>>> -- Reuti
> >>>>>>
> >>>>>> PS: This would also mean having a couple of PEs in SGE with a
> >>>>>> fixed "allocation_rule". While this works right now, an extension
> >>>>>> in SGE could be "$fill_up_omp"/"$round_robin_omp" which uses
> >>>>>> OMP_NUM_THREADS too; hence it must not be specified as an `export`
> >>>>>> in the job script but either on the command line or inside the job
> >>>>>> script in #$ lines as job requests. This would mean collecting
> >>>>>> slots in bunches of OMP_NUM_THREADS on each machine to reach the
> >>>>>> overall specified slot count. Whether OMP_NUM_THREADS or n times
> >>>>>> OMP_NUM_THREADS is allowed per machine needs to be discussed.
> >>>>>>
> >>>>>> PS2: As Univa SGE can also supply a list of granted cores in the
> >>>>>> $PE_HOSTFILE, it would be an extension to feed this to Open MPI to
> >>>>>> allow any UGE-aware binding.
> >>>>>>
> >>>>>>
> >>>>>> On 14.08.2014 at 21:52, Oscar Mojica wrote:
> >>>>>>
> >>>>>>> Guys
> >>>>>>>
> >>>>>>> I changed the line to run the program in the script with both
> >>>>>>> options
> >>>>>>> /usr/bin/time -f "%E" /opt/openmpi/bin/mpirun -v --bind-to-none -np $NSLOTS ./inverse.exe
> >>>>>>> /usr/bin/time -f "%E" /opt/openmpi/bin/mpirun -v --bind-to-socket -np $NSLOTS ./inverse.exe
> >>>>>>>
> >>>>>>> but I got the same results. When I use man mpirun, it shows:
> >>>>>>>
> >>>>>>> -bind-to-none, --bind-to-none
> >>>>>>> Do not bind processes. (Default.)
> >>>>>>>
> >>>>>>> and the output of 'qconf -sp orte' is
> >>>>>>>
> >>>>>>> pe_name orte
> >>>>>>> slots 9999
> >>>>>>> user_lists NONE
> >>>>>>> xuser_lists NONE
> >>>>>>> start_proc_args /bin/true
> >>>>>>> stop_proc_args /bin/true
> >>>>>>> allocation_rule $fill_up
> >>>>>>> control_slaves TRUE
> >>>>>>> job_is_first_task FALSE
> >>>>>>> urgency_slots min
> >>>>>>> accounting_summary TRUE
> >>>>>>>
> >>>>>>> I don't know if the installed Open MPI was compiled with
> >>>>>>> '--with-sge'. How can I know that?
> >>>>>>> Before thinking about a hybrid application I was using only MPI
> >>>>>>> and the program used few processors (14). The cluster has 28
> >>>>>>> machines, 15 with 16 cores and 13 with 8 cores, totaling 344
> >>>>>>> processing units. When I submitted the job (MPI only), the MPI
> >>>>>>> processes were spread to the cores directly; for that reason I
> >>>>>>> created a new queue with 14 machines trying to gain more time. The
> >>>>>>> results were the same in both cases. In the last case I could
> >>>>>>> verify that the processes were distributed to all machines
> >>>>>>> correctly.
> >>>>>>>
> >>>>>>> What must I do?
> >>>>>>> Thanks
> >>>>>>>
> >>>>>>> Oscar Fabian Mojica Ladino
> >>>>>>> Geologist M.S. in Geophysics
> >>>>>>>
> >>>>>>>
> >>>>>>>> Date: Thu, 14 Aug 2014 10:10:17 -0400
> >>>>>>>> From: maxime.boissonnea...@calculquebec.ca
> >>>>>>>> To: us...@open-mpi.org
> >>>>>>>> Subject: Re: [OMPI users] Running a hybrid MPI+openMP program
> >>>>>>>>
> >>>>>>>> Hi,
> >>>>>>>> You DEFINITELY need to disable OpenMPI's new default binding.
> >>>>>>>> Otherwise, your N threads will run on a single core. --bind-to
> >>>>>>>> socket would be my recommendation for hybrid jobs.
> >>>>>>>>
> >>>>>>>> Maxime
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 2014-08-14 10:04, Jeff Squyres (jsquyres) wrote:
> >>>>>>>>> I don't know much about OpenMP, but do you need to disable Open
> >>>>>>>>> MPI's default bind-to-core functionality (I'm assuming you're
> >>>>>>>>> using Open MPI 1.8.x)?
> >>>>>>>>>
> >>>>>>>>> You can try "mpirun --bind-to none ...", which will have Open
> >>>>>>>>> MPI not bind MPI processes to cores, which might allow OpenMP to
> >>>>>>>>> think that it can use all the cores, and therefore it will spawn
> >>>>>>>>> num_cores threads...?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Aug 14, 2014, at 9:50 AM, Oscar Mojica <o_moji...@hotmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hello everybody
> >>>>>>>>>>
> >>>>>>>>>> I am trying to run a hybrid MPI + OpenMP program on a cluster.
> >>>>>>>>>> I created a queue with 14 machines, each one with 16 cores. The
> >>>>>>>>>> program divides the work among the 14 processors with MPI, and
> >>>>>>>>>> within each processor a loop is also divided into 8 threads,
> >>>>>>>>>> for example, using OpenMP. The problem is that when I submit
> >>>>>>>>>> the job to the queue the MPI processes don't divide the work
> >>>>>>>>>> into threads and the program prints the number of threads
> >>>>>>>>>> working within each process as one.
> >>>>>>>>>>
> >>>>>>>>>> I made a simple test program that uses OpenMP and I logged
> >>>>>>>>>> into one machine of the fourteen. I compiled it using gfortran
> >>>>>>>>>> -fopenmp program.f -o exe, set the OMP_NUM_THREADS environment
> >>>>>>>>>> variable to 8, and when I ran it directly in the terminal the
> >>>>>>>>>> loop was effectively divided among the cores and, for example,
> >>>>>>>>>> in this case the program printed the number of threads equal
> >>>>>>>>>> to 8
> >>>>>>>>>>
> >>>>>>>>>> This is my Makefile
> >>>>>>>>>>
> >>>>>>>>>> # Start of the makefile
> >>>>>>>>>> # Defining variables
> >>>>>>>>>> objects = inv_grav3d.o funcpdf.o gr3dprm.o fdjac.o dsvd.o
> >>>>>>>>>> #f90comp = /opt/openmpi/bin/mpif90
> >>>>>>>>>> f90comp = /usr/bin/mpif90
> >>>>>>>>>> #switch = -O3
> >>>>>>>>>> executable = inverse.exe
> >>>>>>>>>> # Makefile
> >>>>>>>>>> all : $(executable)
> >>>>>>>>>> $(executable) : $(objects)
> >>>>>>>>>> $(f90comp) -fopenmp -g -O -o $(executable) $(objects)
> >>>>>>>>>> rm $(objects)
> >>>>>>>>>> %.o: %.f
> >>>>>>>>>> $(f90comp) -c $<
> >>>>>>>>>> # Cleaning everything
> >>>>>>>>>> clean:
> >>>>>>>>>> rm $(executable)
> >>>>>>>>>> #  rm $(objects)
> >>>>>>>>>> # End of the makefile
> >>>>>>>>>>
> >>>>>>>>>> and the script that i am using is
> >>>>>>>>>>
> >>>>>>>>>> #!/bin/bash
> >>>>>>>>>> #$ -cwd
> >>>>>>>>>> #$ -j y
> >>>>>>>>>> #$ -S /bin/bash
> >>>>>>>>>> #$ -pe orte 14
> >>>>>>>>>> #$ -N job
> >>>>>>>>>> #$ -q new.q
> >>>>>>>>>>
> >>>>>>>>>> export OMP_NUM_THREADS=8
> >>>>>>>>>> /usr/bin/time -f "%E" /opt/openmpi/bin/mpirun -v -np $NSLOTS ./inverse.exe
> >>>>>>>>>>
> >>>>>>>>>> am I forgetting something?
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>>
> >>>>>>>>>> Oscar Fabian Mojica Ladino
> >>>>>>>>>> Geologist M.S. in Geophysics
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> ---------------------------------
> >>>>>>>> Maxime Boissonneault
> >>>>>>>> Computational analyst - Calcul Québec, Université Laval
> >>>>>>>> Ph.D. in physics
> >>>>>>>>
> >>>>>>
> >>>>
> >>
> >
> > ----
> > Tetsuya Mishima  tmish...@jcity.maeda.co.jp
>
