On Aug 20, 2014, at 6:58 AM, Reuti <re...@staff.uni-marburg.de> wrote:

> Hi,
> 
> On 20.08.2014 at 13:26, tmish...@jcity.maeda.co.jp wrote:
> 
>> Reuti,
>> 
>> If you want to allocate 10 procs with N threads, the Torque
>> script below should work for you:
>> 
>> qsub -l nodes=10:ppn=N
>> mpirun -map-by slot:pe=N -np 10 -x OMP_NUM_THREADS=N ./inverse.exe
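>> 
>> For illustration only, a complete Torque jobscript following this pattern
>> could look like the sketch below (N=8 here; the job name and the directives
>> other than the node request are placeholders, not taken from this thread):
>> 
>> #!/bin/bash
>> #PBS -N hybrid_job
>> #PBS -l nodes=10:ppn=8
>> cd $PBS_O_WORKDIR
>> export OMP_NUM_THREADS=8
>> # 10 MPI ranks, each mapped onto a width of 8 slots for its OpenMP threads
>> mpirun -map-by slot:pe=$OMP_NUM_THREADS -np 10 -x OMP_NUM_THREADS ./inverse.exe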
> 
> I played around with giving -np 10 in addition to a Tight Integration. The 
> slot count is not really divided, I think; only 10 out of the granted 
> maximum are used (while an `orted` is started on each of the listed 
> machines). Due to the fixed allocation this is of course the result we want 
> to achieve, as it subtracts bunches of 8 from the given list of machines 
> and their slots. In SGE it's sufficient to use the following, and AFAICS it 
> works (without touching the $PE_HOSTFILE any longer):
> 
> ===
> export OMP_NUM_THREADS=8
> mpirun -map-by slot:pe=$OMP_NUM_THREADS -np $(bc <<<"$NSLOTS / $OMP_NUM_THREADS") ./inverse.exe
> ===
> 
> and submit with:
> 
> $ qsub -pe orte 80 job.sh
> 
> as the variables are distributed to the slave nodes by SGE already.
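> 
> Put together as a complete jobscript this would be, as a sketch (the queue
> directives and binary name mirror the example further down in this thread):
> 
> ===
> #!/bin/bash
> #$ -cwd
> #$ -j y
> #$ -S /bin/bash
> #$ -pe orte 80
> #$ -N job
> 
> export OMP_NUM_THREADS=8
> # 80 granted slots / 8 threads = 10 MPI processes, each bound to 8 cores
> mpirun -map-by slot:pe=$OMP_NUM_THREADS \
>        -np $(bc <<<"$NSLOTS / $OMP_NUM_THREADS") ./inverse.exe
> ===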
> 
> Nevertheless, using -np in addition to the Tight Integration gives it a 
> taste of a kind of half-tight integration. And it would not work for us, 
> because "--bind-to none" can't be used on such a command line (see below) 
> and throws an error.
> 
> 
>> Then Open MPI automatically reduces the logical slot count to 10
>> by dividing the real slot count of 10N by the binding width of N.
>> 
>> I don't know why you want to use pe=N without binding, but unfortunately
>> Open MPI so far allocates successive cores to each process when you
>> use the pe option - it forcibly binds to core.
> 
> In a shared cluster with many users and different MPI libraries in use, 
> only the queuing system can know which cores were granted to which job. 
> This avoids oversubscribing some cores while others are idle.

FWIW: we detect the exterior binding constraint and work within it
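
For anyone who wants to see that exterior constraint themselves: the binding
imposed by the resource manager on the jobscript shell can be inspected
before mpirun starts, e.g. (illustrative commands; hwloc-bind is only
available where hwloc is installed):

$ taskset -cp $$       # CPU list the jobscript shell is confined to
$ hwloc-bind --get     # the same information as an hwloc bitmap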


> 
> -- Reuti
> 
> 
>> Tetsuya
>> 
>> 
>>> Hi,
>>> 
>>> On 20.08.2014 at 06:26, Tetsuya Mishima wrote:
>>> 
>>>> Reuti and Oscar,
>>>> 
>>>> I'm a Torque user and I myself have never used SGE, so I hesitated to
>>>> join the discussion.
>>>> 
>>>> From my experience with Torque, the Open MPI 1.8 series has already
>>>> resolved the issue you pointed out about combining MPI with OpenMP.
>>>> 
>>>> Please try to add the --map-by slot:pe=8 option if you want to use 8
>>>> threads.
>>>> Then Open MPI 1.8 should allocate processes properly without any
>>>> modification of the hostfile provided by Torque.
>>>> 
>>>> In your case (8 threads and 10 procs):
>>>> 
>>>> # you have to request 80 slots using SGE command before mpirun
>>>> mpirun --map-by slot:pe=8 -np 10 ./inverse.exe
>>> 
>>> Thx for pointing me to this option; for now I can't get it working though
>>> (in fact, I essentially want to use it without binding). This allows me to
>>> tell Open MPI to bind more cores to each of the MPI processes - ok, but
>>> does it lower the slot count granted by Torque too? I mean, was your
>>> submission command like:
>>> 
>>> $ qsub -l nodes=10:ppn=8 ...
>>> 
>>> so that Torque knows that it should grant and remember this slot count
>>> of a total of 80 for the correct accounting?
>>> 
>>> -- Reuti
>>> 
>>> 
>>>> where you can omit the --bind-to option because --bind-to core is
>>>> assumed as the default when pe=N is provided by the user.
>>>> Regards,
>>>> Tetsuya
>>>> 
>>>>> Hi,
>>>>> 
>>>>> On 19.08.2014 at 19:06, Oscar Mojica wrote:
>>>>> 
>>>>>> I discovered what the error was. I forgot to include '-fopenmp' when I
>>>>>> compiled the objects in the Makefile, so the program worked but it
>>>>>> didn't divide the job into threads. Now the program is working and I can
>>>>>> use up to 15 cores per machine in the queue one.q.
>>>>>> 
>>>>>> Anyway I would like to try to implement your advice. Well, I'm not
>>>>>> alone on the cluster, so I must implement your second suggestion. The
>>>>>> steps are
>>>>>> 
>>>>>> a) Use '$ qconf -mp orte' to change the allocation rule to 8
>>>>> 
>>>>> Was the number of slots defined in your one.q also increased to 8
>>>>> (`qconf -sq one.q`)?
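>>>>> 
>>>>> For example (a sketch only; the exact qconf syntax can differ between
>>>>> SGE versions):
>>>>> 
>>>>> $ qconf -sq one.q | grep slots       # check the current slot count
>>>>> $ qconf -mattr queue slots 8 one.q   # raise it to 8 for all hosts in one.q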
>>>>> 
>>>>> 
>>>>>> b) Set '#$ -pe orte 80' in the script
>>>>> 
>>>>> Fine.
>>>>> 
>>>>> 
>>>>>> c) I'm not sure how to do this step; I'd appreciate your help here. I
>>>>>> can add some lines to the script to determine the PE_HOSTFILE path and
>>>>>> contents, but I don't know how to alter it.
>>>>> 
>>>>> For now you can put this in your jobscript (just after OMP_NUM_THREADS
>>>>> is exported):
>>>>> 
>>>>> awk -v omp_num_threads=$OMP_NUM_THREADS '{ $2/=omp_num_threads; print }' $PE_HOSTFILE > $TMPDIR/machines
>>>>> export PE_HOSTFILE=$TMPDIR/machines
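>>>>> 
>>>>> A slightly more defensive variant of the same idea (a sketch only) would
>>>>> refuse to start when a host's slot count isn't divisible by
>>>>> OMP_NUM_THREADS, instead of silently writing a fractional count:
>>>>> 
>>>>> awk -v t=$OMP_NUM_THREADS '{ if ($2 % t != 0) exit 1; $2 /= t; print }' \
>>>>>     $PE_HOSTFILE > $TMPDIR/machines \
>>>>>   || { echo "slots not divisible by OMP_NUM_THREADS" >&2; exit 1; }
>>>>> export PE_HOSTFILE=$TMPDIR/machines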
>>>>> 
>>>>> =============
>>>>> 
>>>>> Unfortunately no one stepped into this discussion, as in my opinion it's
>>>>> a much broader issue which targets all users who want to combine MPI with
>>>>> OpenMP. The queuing system should get a proper request for the overall
>>>>> amount of slots the user needs. For now this will be forwarded to Open
>>>>> MPI, and it will use this information to start the appropriate number of
>>>>> processes (which was an achievement for the out-of-the-box Tight
>>>>> Integration, of course) and ignore any setting of OMP_NUM_THREADS. So,
>>>>> where should the generated list of machines be adjusted? There are
>>>>> several options:
>>>>> 
>>>>> a) The PE of the queuing system should do it:
>>>>> 
>>>>> + a one-time setup for the admin
>>>>> + in SGE the "start_proc_args" of the PE could alter the $PE_HOSTFILE
>>>>> - the "start_proc_args" would need to know the number of threads, i.e.
>>>>> OMP_NUM_THREADS must be defined by "qsub -v ..." outside of the jobscript
>>>>> (tricky scanning of the submitted jobscript for OMP_NUM_THREADS would be
>>>>> too nasty)
>>>>> - limits the jobscript to calls to libraries that behave in the same way
>>>>> as Open MPI
>>>>> 
>>>>> 
>>>>> b) The particular queue should do it in a queue prolog:
>>>>> 
>>>>> same as a) I think
>>>>> 
>>>>> 
>>>>> c) The user should do it
>>>>> 
>>>>> + no change in the SGE installation
>>>>> - each and every user must include it in all the jobscripts to adjust
>>>>> the list and export the pointer to the altered $PE_HOSTFILE, although it
>>>>> could be changed back and forth for different steps of the jobscript
>>>>> 
>>>>> 
>>>>> d) Open MPI should do it
>>>>> 
>>>>> + no change in the SGE installation
>>>>> + no change to the jobscript
>>>>> + OMP_NUM_THREADS can be altered for different steps of the jobscript
>>>>> while automatically staying inside the granted allocation
>>>>> o should MKL_NUM_THREADS be covered too (or does it honor OMP_NUM_THREADS
>>>>> already)?
>>>>> 
>>>>> -- Reuti
>>>>> 
>>>>> 
>>>>>> echo "PE_HOSTFILE:"
>>>>>> echo $PE_HOSTFILE
>>>>>> echo
>>>>>> echo "cat PE_HOSTFILE:"
>>>>>> cat $PE_HOSTFILE
>>>>>> 
>>>>>> Thanks for taking the time to answer these emails; your advice has been
>>>>>> very useful.
>>>>>> 
>>>>>> PS: The version of SGE is   OGS/GE 2011.11p1
>>>>>> 
>>>>>> 
>>>>>> Oscar Fabian Mojica Ladino
>>>>>> Geologist M.S. in  Geophysics
>>>>>> 
>>>>>> 
>>>>>>> From: re...@staff.uni-marburg.de
>>>>>>> Date: Fri, 15 Aug 2014 20:38:12 +0200
>>>>>>> To: us...@open-mpi.org
>>>>>>> Subject: Re: [OMPI users] Running a hybrid MPI+openMP program
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> On 15.08.2014 at 19:56, Oscar Mojica wrote:
>>>>>>> 
>>>>>>>> Yes, my installation of Open MPI is SGE-aware. I got the following
>>>>>>>> 
>>>>>>>> [oscar@compute-1-2 ~]$ ompi_info | grep grid
>>>>>>>> MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.2)
>>>>>>> 
>>>>>>> Fine.
>>>>>>> 
>>>>>>> 
>>>>>>>> I'm a bit slow and I didn't understand the last part of your message,
>>>>>>>> so I made a test to try to resolve my doubts.
>>>>>>>> This is the cluster configuration (there are some machines turned off,
>>>>>>>> but that is not a problem):
>>>>>>>> 
>>>>>>>> [oscar@aguia free-noise]$ qhost
>>>>>>>> HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS
>>>>>>>> -------------------------------------------------------------------------------
>>>>>>>> global - - - - - - -
>>>>>>>> compute-1-10 linux-x64 16 0.97 23.6G 558.6M 996.2M 0.0
>>>>>>>> compute-1-11 linux-x64 16 - 23.6G - 996.2M -
>>>>>>>> compute-1-12 linux-x64 16 0.97 23.6G 561.1M 996.2M 0.0
>>>>>>>> compute-1-13 linux-x64 16 0.99 23.6G 558.7M 996.2M 0.0
>>>>>>>> compute-1-14 linux-x64 16 1.00 23.6G 555.1M 996.2M 0.0
>>>>>>>> compute-1-15 linux-x64 16 0.97 23.6G 555.5M 996.2M 0.0
>>>>>>>> compute-1-16 linux-x64 8 0.00 15.7G 296.9M 1000.0M 0.0
>>>>>>>> compute-1-17 linux-x64 8 0.00 15.7G 299.4M 1000.0M 0.0
>>>>>>>> compute-1-18 linux-x64 8 - 15.7G - 1000.0M -
>>>>>>>> compute-1-19 linux-x64 8 - 15.7G - 996.2M -
>>>>>>>> compute-1-2 linux-x64 16 1.19 23.6G 468.1M 1000.0M 0.0
>>>>>>>> compute-1-20 linux-x64 8 0.04 15.7G 297.2M 1000.0M 0.0
>>>>>>>> compute-1-21 linux-x64 8 - 15.7G - 1000.0M -
>>>>>>>> compute-1-22 linux-x64 8 0.00 15.7G 297.2M 1000.0M 0.0
>>>>>>>> compute-1-23 linux-x64 8 0.16 15.7G 299.6M 1000.0M 0.0
>>>>>>>> compute-1-24 linux-x64 8 0.00 15.7G 291.5M 996.2M 0.0
>>>>>>>> compute-1-25 linux-x64 8 0.04 15.7G 293.4M 996.2M 0.0
>>>>>>>> compute-1-26 linux-x64 8 - 15.7G - 1000.0M -
>>>>>>>> compute-1-27 linux-x64 8 0.00 15.7G 297.0M 1000.0M 0.0
>>>>>>>> compute-1-29 linux-x64 8 - 15.7G - 1000.0M -
>>>>>>>> compute-1-3 linux-x64 16 - 23.6G - 996.2M -
>>>>>>>> compute-1-30 linux-x64 16 - 23.6G - 996.2M -
>>>>>>>> compute-1-4 linux-x64 16 0.97 23.6G 571.6M 996.2M 0.0
>>>>>>>> compute-1-5 linux-x64 16 1.00 23.6G 559.6M 996.2M 0.0
>>>>>>>> compute-1-6 linux-x64 16 0.66 23.6G 403.1M 996.2M 0.0
>>>>>>>> compute-1-7 linux-x64 16 0.95 23.6G 402.7M 996.2M 0.0
>>>>>>>> compute-1-8 linux-x64 16 0.97 23.6G 556.8M 996.2M 0.0
>>>>>>>> compute-1-9 linux-x64 16 1.02 23.6G 566.0M 1000.0M 0.0
>>>>>>>> 
>>>>>>>> I ran my program using only MPI with 10 processors of the queue
>>>>>>>> one.q, which has 14 machines (compute-1-2 to compute-1-15). With
>>>>>>>> 'qstat -t' I got:
>>>>>>>> 
>>>>>>>> [oscar@aguia free-noise]$ qstat -t
>>>>>>>> job-ID prior name user state submit/start at queue master ja-task-ID task-ID state cpu mem io stat failed
>>>>>>>> -------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-2.local MASTER r 00:49:12 554.13753 0.09163
>>>>>>>> one.q@compute-1-2.local SLAVE
>>>>>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-5.local SLAVE 1.compute-1-5 r 00:48:53 551.49022 0.09410
>>>>>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-9.local SLAVE 1.compute-1-9 r 00:50:00 564.22764 0.09409
>>>>>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-12.local SLAVE 1.compute-1-12 r 00:47:30 535.30379 0.09379
>>>>>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-13.local SLAVE 1.compute-1-13 r 00:49:51 561.69868 0.09379
>>>>>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-14.local SLAVE 1.compute-1-14 r 00:49:14 554.60818 0.09379
>>>>>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-10.local SLAVE 1.compute-1-10 r 00:49:59 562.95487 0.09349
>>>>>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-15.local SLAVE 1.compute-1-15 r 00:50:01 563.27221 0.09361
>>>>>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-8.local SLAVE 1.compute-1-8 r 00:49:26 556.68431 0.09349
>>>>>>>> 2726 0.50500 job oscar r 08/15/2014 12:38:21 one.q@compute-1-4.local SLAVE 1.compute-1-4 r 00:49:27 556.87510 0.04967
>>>>>>> 
>>>>>>> Yes, here you got 10 slots (= cores) granted by SGE. So there is no
>>>>>>> free core left inside the allocation of SGE to allow the use of
>>>>>>> additional cores for your threads. If you use more cores than granted
>>>>>>> by SGE, it will oversubscribe the machines.
>>>>>>> 
>>>>>>> The issue is now:
>>>>>>> 
>>>>>>> a) If you want 8 threads per MPI process, your job will use 80 cores
>>>>>>> in total - for now SGE isn't aware of it.
>>>>>>> 
>>>>>>> b) Although you specified $fill_up as the allocation rule, it looks
>>>>>>> like $round_robin. Is there more than one slot defined in the queue
>>>>>>> definition of one.q to get exclusive access?
>>>>>>> 
>>>>>>> c) What version of SGE are you using? Certain ones use cgroups or bind
>>>>>>> processes directly to cores (although it usually needs to be requested
>>>>>>> by the job: first line of `qconf -help`).
>>>>>>> 
>>>>>>> 
>>>>>>> In case you are alone on the cluster, you could bypass the allocation
>>>>>>> with b) (unless you are hit by c)). But with a mixture of users and
>>>>>>> jobs, a different handling would be necessary to do this in a proper
>>>>>>> way IMO:
>>>>>>> 
>>>>>>> a) having a PE with a fixed allocation rule of 8 (see the sketch after this list)
>>>>>>> 
>>>>>>> b) requesting this PE with an overall slot count of 80
>>>>>>> 
>>>>>>> c) copy and alter the $PE_HOSTFILE to show only (granted core count
>>>>>>> per machine) divided by (OMP_NUM_THREADS) per entry, and change
>>>>>>> $PE_HOSTFILE so that it points to the altered file
>>>>>>> 
>>>>>>> d) Open MPI with a Tight Integration will now start only N processes
>>>>>>> per machine according to the altered hostfile, in your case one
>>>>>>> 
>>>>>>> e) Your application can start the desired threads and you stay inside
>>>>>>> the granted allocation
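>>>>>>> 
>>>>>>> As a concrete illustration of a): starting from the `qconf -sp orte`
>>>>>>> output quoted further down, such an additional PE (the name is just an
>>>>>>> example) could look like:
>>>>>>> 
>>>>>>> pe_name            orte8
>>>>>>> slots              9999
>>>>>>> user_lists         NONE
>>>>>>> xuser_lists        NONE
>>>>>>> start_proc_args    /bin/true
>>>>>>> stop_proc_args     /bin/true
>>>>>>> allocation_rule    8
>>>>>>> control_slaves     TRUE
>>>>>>> job_is_first_task  FALSE
>>>>>>> urgency_slots      min
>>>>>>> accounting_summary TRUE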
>>>>>>> 
>>>>>>> -- Reuti
>>>>>>> 
>>>>>>> 
>>>>>>>> I accessed the MASTER node with 'ssh compute-1-2.local', ran
>>>>>>>> '$ ps -e f', and got this (I'm showing only the last lines):
>>>>>>>> 
>>>>>>>> 2506 ? Ss 0:00 /usr/sbin/atd
>>>>>>>> 2548 tty1 Ss+ 0:00 /sbin/mingetty /dev/tty1
>>>>>>>> 2550 tty2 Ss+ 0:00 /sbin/mingetty /dev/tty2
>>>>>>>> 2552 tty3 Ss+ 0:00 /sbin/mingetty /dev/tty3
>>>>>>>> 2554 tty4 Ss+ 0:00 /sbin/mingetty /dev/tty4
>>>>>>>> 2556 tty5 Ss+ 0:00 /sbin/mingetty /dev/tty5
>>>>>>>> 2558 tty6 Ss+ 0:00 /sbin/mingetty /dev/tty6
>>>>>>>> 3325 ? Sl 0:04 /opt/gridengine/bin/linux-x64/sge_execd
>>>>>>>> 17688 ? S 0:00 \_ sge_shepherd-2726 -bg
>>>>>>>> 17695 ? Ss 0:00 \_ -bash /opt/gridengine/default/spool/compute-1-2/job_scripts/2726
>>>>>>>> 17797 ? S 0:00 \_ /usr/bin/time -f %E /opt/openmpi/bin/mpirun -v -np 10 ./inverse.exe
>>>>>>>> 17798 ? S 0:01 \_ /opt/openmpi/bin/mpirun -v -np 10 ./inverse.exe
>>>>>>>> 17799 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-5.local PATH=/opt/openmpi/bin:$PATH ; expo
>>>>>>>> 17800 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-9.local PATH=/opt/openmpi/bin:$PATH ; expo
>>>>>>>> 17801 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-12.local PATH=/opt/openmpi/bin:$PATH ; exp
>>>>>>>> 17802 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-13.local PATH=/opt/openmpi/bin:$PATH ; exp
>>>>>>>> 17803 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-14.local PATH=/opt/openmpi/bin:$PATH ; exp
>>>>>>>> 17804 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-10.local PATH=/opt/openmpi/bin:$PATH ; exp
>>>>>>>> 17805 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-15.local PATH=/opt/openmpi/bin:$PATH ; exp
>>>>>>>> 17806 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-8.local PATH=/opt/openmpi/bin:$PATH ; expo
>>>>>>>> 17807 ? Sl 0:00 \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-4.local PATH=/opt/openmpi/bin:$PATH ; expo
>>>>>>>> 17826 ? R 31:36 \_ ./inverse.exe
>>>>>>>> 3429 ? Ssl 0:00 automount --pid-file /var/run/autofs.pid
>>>>>>>> 
>>>>>>>> So the job is using the 10 machines; up to here everything is OK. Do
>>>>>>>> you think that changing the "allocation_rule" to a number instead of
>>>>>>>> $fill_up would make the MPI processes divide the work into that number
>>>>>>>> of threads?
>>>>>>>> 
>>>>>>>> Thanks a lot
>>>>>>>> 
>>>>>>>> Oscar Fabian Mojica Ladino
>>>>>>>> Geologist M.S. in Geophysics
>>>>>>>> 
>>>>>>>> 
>>>>>>>> PS: I have another doubt: what is a slot? Is it a physical core?
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> From: re...@staff.uni-marburg.de
>>>>>>>>> Date: Thu, 14 Aug 2014 23:54:22 +0200
>>>>>>>>> To: us...@open-mpi.org
>>>>>>>>> Subject: Re: [OMPI users] Running a hybrid MPI+openMP program
>>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> I think this is a broader issue whenever an MPI library is used in
>>>>>>>>> conjunction with threads while running inside a queuing system.
>>>>>>>>> First: you can check whether your actual installation of Open MPI is
>>>>>>>>> SGE-aware with:
>>>>>>>>> 
>>>>>>>>> $ ompi_info | grep grid
>>>>>>>>> MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.5)
>>>>>>>>> 
>>>>>>>>> Then we can look at the definition of your PE: "allocation_rule
>>>>>>>>> $fill_up". This means that SGE will grant you 14 slots in total in
>>>>>>>>> any combination on the available machines, i.e. an 8+4+2 slot
>>>>>>>>> allocation is allowed just like 4+4+3+3 and so on. Depending on the
>>>>>>>>> SGE-awareness it's a question: will your application just start
>>>>>>>>> processes on all nodes and completely disregard the granted
>>>>>>>>> allocation, or, as the other extreme, does it stay on one and the
>>>>>>>>> same machine for all started processes? On the master node of the
>>>>>>>>> parallel job you can issue:
>>>>>>>>> 
>>>>>>>>> $ ps -e f
>>>>>>>>> 
>>>>>>>>> (f w/o -) to have a look at whether `ssh` or `qrsh -inherit ...` is
>>>>>>>>> used to reach the other machines and start their requested process
>>>>>>>>> count.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Now to the common problem in such a set up:
>>>>>>>>> 
>>>>>>>>> AFAICS: for now there is no way in the Open MPI + SGE combination to
>>>>>>>>> specify the number of MPI processes and the intended number of
>>>>>>>>> threads such that they are automatically read by Open MPI while
>>>>>>>>> staying inside the granted slot count and allocation. So it seems to
>>>>>>>>> be necessary to have the intended number of threads honored by Open
>>>>>>>>> MPI too.
>>>>>>>>> 
>>>>>>>>> Hence specifying e.g. "allocation_rule 8" in such a setup while
>>>>>>>>> requesting 32 slots would for now already start 32 MPI processes, as
>>>>>>>>> Open MPI reads the $PE_HOSTFILE and acts accordingly.
>>>>>>>>> 
>>>>>>>>> Open MPI would have to read the generated machine file in a slightly
>>>>>>>>> different way regarding threads: a) read the $PE_HOSTFILE, b) divide
>>>>>>>>> the granted slots per machine by OMP_NUM_THREADS, c) throw an error
>>>>>>>>> in case it's not divisible by OMP_NUM_THREADS. Then start one process
>>>>>>>>> per quotient.
>>>>>>>>> 
>>>>>>>>> Would this work for you?
>>>>>>>>> 
>>>>>>>>> -- Reuti
>>>>>>>>> 
>>>>>>>>> PS: This would also mean having a couple of PEs in SGE with a fixed
>>>>>>>>> "allocation_rule". While this works right now, an extension in SGE
>>>>>>>>> could be "$fill_up_omp"/"$round_robin_omp", using OMP_NUM_THREADS
>>>>>>>>> there too; hence it must not be specified as an `export` in the job
>>>>>>>>> script but either on the command line or inside the job script in #$
>>>>>>>>> lines as job requests. This would mean collecting slots in bunches of
>>>>>>>>> OMP_NUM_THREADS on each machine to reach the overall specified slot
>>>>>>>>> count. Whether OMP_NUM_THREADS or n times OMP_NUM_THREADS is allowed
>>>>>>>>> per machine needs to be discussed.
>>>>>>>>> 
>>>>>>>>> PS2: As Univa SGE can also supply a list of granted cores in the
>>>>>>>>> $PE_HOSTFILE, it would be an extension to feed this to Open MPI to
>>>>>>>>> allow any UGE-aware binding.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 14.08.2014 at 21:52, Oscar Mojica wrote:
>>>>>>>>> 
>>>>>>>>>> Guys
>>>>>>>>>> 
>>>>>>>>>> I changed the line that runs the program in the script, trying both
>>>>>>>>>> options:
>>>>>>>>>> 
>>>>>>>>>> /usr/bin/time -f "%E" /opt/openmpi/bin/mpirun -v --bind-to-none -np $NSLOTS ./inverse.exe
>>>>>>>>>> /usr/bin/time -f "%E" /opt/openmpi/bin/mpirun -v --bind-to-socket -np $NSLOTS ./inverse.exe
>>>>>>>>>> 
>>>>>>>>>> but I got the same results. When I run 'man mpirun' it shows:
>>>>>>>>>> 
>>>>>>>>>> -bind-to-none, --bind-to-none
>>>>>>>>>> Do not bind processes. (Default.)
>>>>>>>>>> 
>>>>>>>>>> and the output of 'qconf -sp orte' is
>>>>>>>>>> 
>>>>>>>>>> pe_name orte
>>>>>>>>>> slots 9999
>>>>>>>>>> user_lists NONE
>>>>>>>>>> xuser_lists NONE
>>>>>>>>>> start_proc_args /bin/true
>>>>>>>>>> stop_proc_args /bin/true
>>>>>>>>>> allocation_rule $fill_up
>>>>>>>>>> control_slaves TRUE
>>>>>>>>>> job_is_first_task FALSE
>>>>>>>>>> urgency_slots min
>>>>>>>>>> accounting_summary TRUE
>>>>>>>>>> 
>>>>>>>>>> I don't know if the installed Open MPI was compiled with
>>>>>>>>>> '--with-sge'. How can I know that?
>>>>>>>>>> Before thinking about a hybrid application I was using only MPI, and
>>>>>>>>>> the program used few processors (14). The cluster has 28 machines,
>>>>>>>>>> 15 with 16 cores and 13 with 8 cores, totaling 344 processing units.
>>>>>>>>>> When I submitted the job (MPI only), the MPI processes were spread
>>>>>>>>>> to the cores directly; for that reason I created a new queue with 14
>>>>>>>>>> machines, trying to save time. The results were the same in both
>>>>>>>>>> cases. In the latter case I could verify that the processes were
>>>>>>>>>> distributed to all the machines correctly.
>>>>>>>>>> 
>>>>>>>>>> What must I do?
>>>>>>>>>> Thanks
>>>>>>>>>> 
>>>>>>>>>> Oscar Fabian Mojica Ladino
>>>>>>>>>> Geologist M.S. in Geophysics
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> Date: Thu, 14 Aug 2014 10:10:17 -0400
>>>>>>>>>>> From: maxime.boissonnea...@calculquebec.ca
>>>>>>>>>>> To: us...@open-mpi.org
>>>>>>>>>>> Subject: Re: [OMPI users] Running a hybrid MPI+openMP program
>>>>>>>>>>> 
>>>>>>>>>>> Hi,
>>>>>>>>>>> You DEFINITELY need to disable OpenMPI's new default binding.
>>>>>>>>>>> Otherwise, your N threads will run on a single core. --bind-to
>>>>>>>>>>> socket would be my recommendation for hybrid jobs.
>>>>>>>>>>> 
>>>>>>>>>>> Maxime
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On 2014-08-14 10:04, Jeff Squyres (jsquyres) wrote:
>>>>>>>>>>>> I don't know much about OpenMP, but do you need to disable Open
>>>>>>>>>>>> MPI's default bind-to-core functionality (I'm assuming you're
>>>>>>>>>>>> using Open MPI 1.8.x)?
>>>>>>>>>>>> 
>>>>>>>>>>>> You can try "mpirun --bind-to none ...", which will have Open MPI
>>>>>>>>>>>> not bind MPI processes to cores, which might allow OpenMP to think
>>>>>>>>>>>> that it can use all the cores, and therefore it will spawn
>>>>>>>>>>>> num_cores threads...?
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Aug 14, 2014, at 9:50 AM, Oscar Mojica <o_moji...@hotmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hello everybody
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I am trying to run a hybrid MPI + OpenMP program on a cluster. I
>>>>>>>>>>>>> created a queue with 14 machines, each one with 16 cores. The
>>>>>>>>>>>>> program divides the work among the 14 processes with MPI, and
>>>>>>>>>>>>> within each process a loop is also divided into 8 threads, for
>>>>>>>>>>>>> example, using OpenMP. The problem is that when I submit the job
>>>>>>>>>>>>> to the queue, the MPI processes don't divide the work into
>>>>>>>>>>>>> threads, and the program prints the number of threads working
>>>>>>>>>>>>> within each process as one.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I made a simple test program that uses OpenMP and logged in to
>>>>>>>>>>>>> one machine of the fourteen. I compiled it using 'gfortran
>>>>>>>>>>>>> -fopenmp program.f -o exe', set the OMP_NUM_THREADS environment
>>>>>>>>>>>>> variable to 8, and when I ran it directly in the terminal the
>>>>>>>>>>>>> loop was effectively divided among the cores; in this case the
>>>>>>>>>>>>> program printed the number of threads equal to 8.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> This is my Makefile
>>>>>>>>>>>>> 
>>>>>>>>>>>>> # Start of the makefile
>>>>>>>>>>>>> # Defining variables
>>>>>>>>>>>>> objects = inv_grav3d.o funcpdf.o gr3dprm.o fdjac.o dsvd.o
>>>>>>>>>>>>> #f90comp = /opt/openmpi/bin/mpif90
>>>>>>>>>>>>> f90comp = /usr/bin/mpif90
>>>>>>>>>>>>> #switch = -O3
>>>>>>>>>>>>> executable = inverse.exe
>>>>>>>>>>>>> # Makefile
>>>>>>>>>>>>> all : $(executable)
>>>>>>>>>>>>> $(executable) : $(objects)
>>>>>>>>>>>>> $(f90comp) -fopenmp -g -O -o $(executable) $(objects)
>>>>>>>>>>>>> rm $(objects)
>>>>>>>>>>>>> %.o: %.f
>>>>>>>>>>>>> $(f90comp) -c $<
>>>>>>>>>>>>> # Cleaning everything
>>>>>>>>>>>>> clean:
>>>>>>>>>>>>> rm $(executable)
>>>>>>>>>>>>> # rm $(objects)
>>>>>>>>>>>>> # End of the makefile
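>>>>>>>>>>>>> 
>>>>>>>>>>>>> As described further up in this thread, the pattern rule above
>>>>>>>>>>>>> compiles the object files without -fopenmp, which is why the loop
>>>>>>>>>>>>> never split into threads. The corrected rule (the recipe line
>>>>>>>>>>>>> must be indented with a tab) would be:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> %.o: %.f
>>>>>>>>>>>>> 	$(f90comp) -fopenmp -c $<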
>>>>>>>>>>>>> 
>>>>>>>>>>>>> and the script that I am using is
>>>>>>>>>>>>> 
>>>>>>>>>>>>> #!/bin/bash
>>>>>>>>>>>>> #$ -cwd
>>>>>>>>>>>>> #$ -j y
>>>>>>>>>>>>> #$ -S /bin/bash
>>>>>>>>>>>>> #$ -pe orte 14
>>>>>>>>>>>>> #$ -N job
>>>>>>>>>>>>> #$ -q new.q
>>>>>>>>>>>>> 
>>>>>>>>>>>>> export OMP_NUM_THREADS=8
>>>>>>>>>>>>> /usr/bin/time -f "%E" /opt/openmpi/bin/mpirun -v -np $NSLOTS ./inverse.exe
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Am I forgetting something?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Oscar Fabian Mojica Ladino
>>>>>>>>>>>>> Geologist M.S. in Geophysics
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> ---------------------------------
>>>>>>>>>>> Maxime Boissonneault
>>>>>>>>>>> Computing Analyst - Calcul Québec, Université Laval
>>>>>>>>>>> Ph.D. in Physics
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>> 
>>>> 
>>>> ----
>>>> Tetsuya Mishima  tmish...@jcity.maeda.co.jp
>>> 
>> 
> 
