Reuti,

Sorry for the confusion. Under a managed condition the -np option is actually not necessary, so this command line also works for me with Torque:
$ qsub -l nodes=10:ppn=N
$ mpirun -map-by slot:pe=N ./inverse.exe

At least, Ralph confirmed it works with Slurm, and I confirmed it with Torque, as shown below:

[mishima@manage ~]$ qsub -I -l nodes=4:ppn=8
qsub: waiting for job 8798.manage.cluster to start
qsub: job 8798.manage.cluster ready

[mishima@node09 ~]$ cat $PBS_NODEFILE
node09
node09
node09
node09
node09
node09
node09
node09
node10
node10
node10
node10
node10
node10
node10
node10
node11
node11
node11
node11
node11
node11
node11
node11
node12
node12
node12
node12
node12
node12
node12
node12

[mishima@node09 ~]$ mpirun -map-by slot:pe=8 -display-map ~/mis/openmpi/demos/myprog
 Data for JOB [8050,1] offset 0

 ========================   JOB MAP   ========================

 Data for node: node09  Num slots: 8  Max slots: 0  Num procs: 1
        Process OMPI jobid: [8050,1] App: 0 Process rank: 0

 Data for node: node10  Num slots: 8  Max slots: 0  Num procs: 1
        Process OMPI jobid: [8050,1] App: 0 Process rank: 1

 Data for node: node11  Num slots: 8  Max slots: 0  Num procs: 1
        Process OMPI jobid: [8050,1] App: 0 Process rank: 2

 Data for node: node12  Num slots: 8  Max slots: 0  Num procs: 1
        Process OMPI jobid: [8050,1] App: 0 Process rank: 3

 =============================================================
Hello world from process 0 of 4
Hello world from process 2 of 4
Hello world from process 3 of 4
Hello world from process 1 of 4

[mishima@node09 ~]$ mpirun -map-by slot:pe=4 -display-map ~/mis/openmpi/demos/myprog
 Data for JOB [8056,1] offset 0

 ========================   JOB MAP   ========================

 Data for node: node09  Num slots: 8  Max slots: 0  Num procs: 2
        Process OMPI jobid: [8056,1] App: 0 Process rank: 0
        Process OMPI jobid: [8056,1] App: 0 Process rank: 1

 Data for node: node10  Num slots: 8  Max slots: 0  Num procs: 2
        Process OMPI jobid: [8056,1] App: 0 Process rank: 2
        Process OMPI jobid: [8056,1] App: 0 Process rank: 3

 Data for node: node11  Num slots: 8  Max slots: 0  Num procs: 2
        Process OMPI jobid: [8056,1] App: 0 Process rank: 4
        Process OMPI jobid: [8056,1] App: 0 Process rank: 5

 Data for node: node12  Num slots: 8  Max slots: 0  Num procs: 2
        Process OMPI jobid: [8056,1] App: 0 Process rank: 6
        Process OMPI jobid: [8056,1] App: 0 Process rank: 7

 =============================================================
Hello world from process 1 of 8
Hello world from process 0 of 8
Hello world from process 2 of 8
Hello world from process 3 of 8
Hello world from process 4 of 8
Hello world from process 5 of 8
Hello world from process 6 of 8
Hello world from process 7 of 8

I don't know why it doesn't work with SGE. Could you show me your output after adding the -display-map and -mca rmaps_base_verbose 5 options?

By the way, the option -map-by ppr:N:node or ppr:N:socket might be useful for your purpose. The ppr mapping can reduce the slot count given by the RM without binding, allocating N procs per specified resource unit.
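For the hybrid case, a minimal sketch of such a launch might be the following (I have not verified this exact combination; -bind-to none and -x OMP_NUM_THREADS are standard mpirun options, and 8 threads is only an example):

===
# one MPI rank per node via ppr; OpenMP fans out over the node's cores
export OMP_NUM_THREADS=8
mpirun -map-by ppr:1:node -bind-to none -x OMP_NUM_THREADS ./inverse.exe
===

Here is ppr:1:node by itself on my cluster: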
[mishima@node09 ~]$ mpirun -map-by ppr:1:node -display-map ~/mis/openmpi/demos/myprog
 Data for JOB [7913,1] offset 0

 ========================   JOB MAP   ========================

 Data for node: node09  Num slots: 8  Max slots: 0  Num procs: 1
        Process OMPI jobid: [7913,1] App: 0 Process rank: 0

 Data for node: node10  Num slots: 8  Max slots: 0  Num procs: 1
        Process OMPI jobid: [7913,1] App: 0 Process rank: 1

 Data for node: node11  Num slots: 8  Max slots: 0  Num procs: 1
        Process OMPI jobid: [7913,1] App: 0 Process rank: 2

 Data for node: node12  Num slots: 8  Max slots: 0  Num procs: 1
        Process OMPI jobid: [7913,1] App: 0 Process rank: 3

 =============================================================
Hello world from process 0 of 4
Hello world from process 2 of 4
Hello world from process 1 of 4
Hello world from process 3 of 4

Tetsuya

> Hi,
>
> On 20.08.2014 at 13:26, tmish...@jcity.maeda.co.jp wrote:
>
> > Reuti,
> >
> > If you want to allocate 10 procs with N threads, the Torque
> > script below should work for you:
> >
> > qsub -l nodes=10:ppn=N
> > mpirun -map-by slot:pe=N -np 10 -x OMP_NUM_THREADS=N ./inverse.exe
>
> I played around with giving -np 10 in addition to a Tight Integration. The slot count is not really divided, I think; only 10 out of the granted maximum are used (while an `orted` is started on each of the listed machines). Due to the fixed allocation this is of course the result we want to achieve, as it subtracts bunches of 8 from the given list of machines resp. slots. In SGE it's sufficient to use the following, and AFAICS it works (without touching the $PE_HOSTFILE any longer):
>
> ===
> export OMP_NUM_THREADS=8
> mpirun -map-by slot:pe=$OMP_NUM_THREADS -np $(bc <<<"$NSLOTS / $OMP_NUM_THREADS") ./inverse.exe
> ===
>
> and submit with:
>
> $ qsub -pe orte 80 job.sh
>
> as the variables are distributed to the slave nodes by SGE already.
>
> Nevertheless, using -np in addition to the Tight Integration gives it a taste of a kind of half-tight integration in some way. And it would not work for us, because "--bind-to none" can't be used in such a command (see below) and throws an error.
>
> > Then, Open MPI automatically reduces the logical slot count to 10
> > by dividing the real slot count of 10N by the binding width N.
> >
> > I don't know why you want to use pe=N without binding, but unfortunately
> > Open MPI so far allocates successive cores to each process when you
> > use the pe option - it forcibly binds to core.
>
> In a shared cluster with many users and different MPI libraries in use, only the queuing system can know which job got which cores granted. This avoids any oversubscription of cores while others are idle.
>
> -- Reuti
>
> > Tetsuya
> >
> >> Hi,
> >>
> >> On 20.08.2014 at 06:26, Tetsuya Mishima wrote:
> >>
> >>> Reuti and Oscar,
> >>>
> >>> I'm a Torque user and I myself have never used SGE, so I hesitated to join
> >>> the discussion.
> >>>
> >>> From my experience with Torque, the Open MPI 1.8 series has already
> >>> resolved the issue you pointed out when combining MPI with OpenMP.
> >>>
> >>> Please try adding the --map-by slot:pe=8 option if you want to use 8 threads.
> >>> Then Open MPI 1.8 should allocate processes properly without any modification
> >>> of the hostfile provided by Torque.
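Assembled into one job script, Reuti's recipe above might look like the following sketch (assumptions: the "orte" PE and inverse.exe are those discussed in this thread, NSLOTS is set by SGE, and shell arithmetic stands in for the bc call):

===
#!/bin/bash
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -pe orte 80
#$ -N job

# 80 granted slots = 10 MPI ranks x 8 OpenMP threads per rank
export OMP_NUM_THREADS=8
mpirun -map-by slot:pe=$OMP_NUM_THREADS \
       -np $(( NSLOTS / OMP_NUM_THREADS )) ./inverse.exe
===

With 80 slots granted and pe=8, mpirun starts 10 ranks and binds 8 cores to each.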
> >>>
> >>> In your case (8 threads and 10 procs):
> >>>
> >>> # you have to request 80 slots using an SGE command before mpirun
> >>> mpirun --map-by slot:pe=8 -np 10 ./inverse.exe
> >>
> >> Thx for pointing me to this option; for now I can't get it working though (in fact, I essentially want to use it without binding). This allows telling Open MPI to bind more cores to each of the MPI processes - OK, but does it lower the slot count granted by Torque too? I mean, was your submission command like:
> >>
> >> $ qsub -l nodes=10:ppn=8 ...
> >>
> >> so that Torque knows that it should grant and remember this slot count of a total of 80 for the correct accounting?
> >>
> >> -- Reuti
> >>
> >>> where you can omit the --bind-to option, because --bind-to core is assumed
> >>> as the default when pe=N is provided by the user.
> >>> Regards,
> >>> Tetsuya
> >>>
> >>>> Hi,
> >>>>
> >>>> On 19.08.2014 at 19:06, Oscar Mojica wrote:
> >>>>
> >>>>> I discovered what the error was. I forgot to include '-fopenmp' when I compiled the objects in the Makefile, so the program worked but didn't divide the job into threads. Now the program is working and I can use up to 15 cores per machine in the queue one.q.
> >>>>>
> >>>>> Anyway, I would like to try to implement your advice. Well, I'm not alone in the cluster, so I must implement your second suggestion. The steps are:
> >>>>>
> >>>>> a) Use '$ qconf -mp orte' to change the allocation rule to 8
> >>>>
> >>>> Was the number of slots defined in your used one.q also increased to 8 (`qconf -sq one.q`)?
> >>>>
> >>>>> b) Set '#$ -pe orte 80' in the script
> >>>>
> >>>> Fine.
> >>>>
> >>>>> c) I'm not sure how to do this step. I'd appreciate your help here. I can add some lines to the script to determine the $PE_HOSTFILE path and contents, but I don't know how to alter it.
> >>>>
> >>>> For now you can put this in your job script (just after OMP_NUM_THREADS is exported):
> >>>>
> >>>> awk -v omp_num_threads=$OMP_NUM_THREADS '{ $2/=omp_num_threads; print }' $PE_HOSTFILE > $TMPDIR/machines
> >>>> export PE_HOSTFILE=$TMPDIR/machines
> >>>>
> >>>> =============
> >>>>
> >>>> Unfortunately no one else stepped into this discussion, as in my opinion it's a much broader issue which concerns all users who want to combine MPI with OpenMP. The queuing system should get a proper request for the overall amount of slots the user needs. For now this is forwarded to Open MPI, which uses the information to start the appropriate number of processes (which was an achievement of the out-of-the-box Tight Integration, of course) and ignores any setting of OMP_NUM_THREADS. So, where should the generated list of machines be adjusted? There are several options:
> >>>>
> >>>> a) The PE of the queuing system should do it:
> >>>>
> >>>> + a one-time setup for the admin
> >>>> + in SGE the "start_proc_args" of the PE could alter the $PE_HOSTFILE
> >>>> - the "start_proc_args" would need to know the number of threads, i.e. OMP_NUM_THREADS must be defined by "qsub -v ..." outside of the job script (tricky scanning of the submitted job script for OMP_NUM_THREADS would be too nasty)
> >>>> - limits the job script to calls to libraries which behave the same way as Open MPI
> >>>>
> >>>> b) The particular queue should do it in a queue prolog:
> >>>>
> >>>> same as a), I think
> >>>>
> >>>> c) The user should do it:
> >>>>
> >>>> + no change in the SGE installation
> >>>> - each and every user must include it in all their job scripts to adjust the list and export the pointer to the altered $PE_HOSTFILE; they could change it back and forth for different steps of the job script, though
> >>>>
> >>>> d) Open MPI should do it:
> >>>>
> >>>> + no change in the SGE installation
> >>>> + no change to the job script
> >>>> + OMP_NUM_THREADS can be altered for different steps of the job script while automatically staying inside the granted allocation
> >>>> o should MKL_NUM_THREADS be covered too (does it use OMP_NUM_THREADS already)?
> >>>>
> >>>> -- Reuti
> >>>>
> >>>>> echo "PE_HOSTFILE:"
> >>>>> echo $PE_HOSTFILE
> >>>>> echo
> >>>>> echo "cat PE_HOSTFILE:"
> >>>>> cat $PE_HOSTFILE
> >>>>>
> >>>>> Thanks for taking the time to answer these emails; your advice has been very useful.
> >>>>>
> >>>>> PS: The version of SGE is OGS/GE 2011.11p1
> >>>>>
> >>>>> Oscar Fabian Mojica Ladino
> >>>>> Geologist M.S. in Geophysics
> >>>>>
> >>>>>> From: re...@staff.uni-marburg.de
> >>>>>> Date: Fri, 15 Aug 2014 20:38:12 +0200
> >>>>>> To: us...@open-mpi.org
> >>>>>> Subject: Re: [OMPI users] Running a hybrid MPI+openMP program
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> On 15.08.2014 at 19:56, Oscar Mojica wrote:
> >>>>>>
> >>>>>>> Yes, my installation of Open MPI is SGE-aware. I got the following:
> >>>>>>>
> >>>>>>> [oscar@compute-1-2 ~]$ ompi_info | grep grid
> >>>>>>> MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.2)
> >>>>>>
> >>>>>> Fine.
> >>>>>>
> >>>>>>> I'm a bit slow and didn't understand the last part of your message, so I made a test to try to resolve my doubts.
> >>>>>>> This is the cluster configuration (there are some machines turned off, but that is no problem):
> >>>>>>>
> >>>>>>> [oscar@aguia free-noise]$ qhost
> >>>>>>> HOSTNAME      ARCH       NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO   SWAPUS
> >>>>>>> -------------------------------------------------------------------------------
> >>>>>>> global        -             -     -       -       -       -       -
> >>>>>>> compute-1-10  linux-x64    16  0.97   23.6G  558.6M  996.2M     0.0
> >>>>>>> compute-1-11  linux-x64    16     -   23.6G       -  996.2M       -
> >>>>>>> compute-1-12  linux-x64    16  0.97   23.6G  561.1M  996.2M     0.0
> >>>>>>> compute-1-13  linux-x64    16  0.99   23.6G  558.7M  996.2M     0.0
> >>>>>>> compute-1-14  linux-x64    16  1.00   23.6G  555.1M  996.2M     0.0
> >>>>>>> compute-1-15  linux-x64    16  0.97   23.6G  555.5M  996.2M     0.0
> >>>>>>> compute-1-16  linux-x64     8  0.00   15.7G  296.9M 1000.0M     0.0
> >>>>>>> compute-1-17  linux-x64     8  0.00   15.7G  299.4M 1000.0M     0.0
> >>>>>>> compute-1-18  linux-x64     8     -   15.7G       - 1000.0M       -
> >>>>>>> compute-1-19  linux-x64     8     -   15.7G       -  996.2M       -
> >>>>>>> compute-1-2   linux-x64    16  1.19   23.6G  468.1M 1000.0M     0.0
> >>>>>>> compute-1-20  linux-x64     8  0.04   15.7G  297.2M 1000.0M     0.0
> >>>>>>> compute-1-21  linux-x64     8     -   15.7G       - 1000.0M       -
> >>>>>>> compute-1-22  linux-x64     8  0.00   15.7G  297.2M 1000.0M     0.0
> >>>>>>> compute-1-23  linux-x64     8  0.16   15.7G  299.6M 1000.0M     0.0
> >>>>>>> compute-1-24  linux-x64     8  0.00   15.7G  291.5M  996.2M     0.0
> >>>>>>> compute-1-25  linux-x64     8  0.04   15.7G  293.4M  996.2M     0.0
> >>>>>>> compute-1-26  linux-x64     8     -   15.7G       - 1000.0M       -
> >>>>>>> compute-1-27  linux-x64     8  0.00   15.7G  297.0M 1000.0M     0.0
> >>>>>>> compute-1-29  linux-x64     8     -   15.7G       - 1000.0M       -
> >>>>>>> compute-1-3   linux-x64    16     -   23.6G       -  996.2M       -
> >>>>>>> compute-1-30  linux-x64    16     -   23.6G       -  996.2M       -
> >>>>>>> compute-1-4   linux-x64    16  0.97   23.6G  571.6M  996.2M     0.0
> >>>>>>> compute-1-5   linux-x64    16  1.00   23.6G  559.6M  996.2M     0.0
> >>>>>>> compute-1-6   linux-x64    16  0.66   23.6G  403.1M  996.2M     0.0
> >>>>>>> compute-1-7   linux-x64    16  0.95   23.6G  402.7M  996.2M     0.0
> >>>>>>> compute-1-8   linux-x64    16  0.97   23.6G  556.8M  996.2M     0.0
> >>>>>>> compute-1-9   linux-x64    16  1.02   23.6G  566.0M 1000.0M     0.0
> >>>>>>>
> >>>>>>> I ran my program using only MPI with 10 processors of the queue one.q, which has 14 machines (compute-1-2 to compute-1-15).
> >>>>>>> With 'qstat -t' I got:
> >>>>>>>
> >>>>>>> [oscar@aguia free-noise]$ qstat -t
> >>>>>>> job-ID  prior    name  user   state  submit/start at      queue  master  ja-task-ID  task-ID  state  cpu  mem  io  stat  failed
> >>>>>>> -------------------------------------------------------------------------------------------------------------------------------------------------------------------
> >>>>>>> 2726  0.50500  job  oscar  r  08/15/2014 12:38:21  one.q@compute-1-2.local   MASTER                  r  00:49:12  554.13753  0.09163
> >>>>>>>                                                    one.q@compute-1-2.local   SLAVE
> >>>>>>> 2726  0.50500  job  oscar  r  08/15/2014 12:38:21  one.q@compute-1-5.local   SLAVE  1.compute-1-5    r  00:48:53  551.49022  0.09410
> >>>>>>> 2726  0.50500  job  oscar  r  08/15/2014 12:38:21  one.q@compute-1-9.local   SLAVE  1.compute-1-9    r  00:50:00  564.22764  0.09409
> >>>>>>> 2726  0.50500  job  oscar  r  08/15/2014 12:38:21  one.q@compute-1-12.local  SLAVE  1.compute-1-12   r  00:47:30  535.30379  0.09379
> >>>>>>> 2726  0.50500  job  oscar  r  08/15/2014 12:38:21  one.q@compute-1-13.local  SLAVE  1.compute-1-13   r  00:49:51  561.69868  0.09379
> >>>>>>> 2726  0.50500  job  oscar  r  08/15/2014 12:38:21  one.q@compute-1-14.local  SLAVE  1.compute-1-14   r  00:49:14  554.60818  0.09379
> >>>>>>> 2726  0.50500  job  oscar  r  08/15/2014 12:38:21  one.q@compute-1-10.local  SLAVE  1.compute-1-10   r  00:49:59  562.95487  0.09349
> >>>>>>> 2726  0.50500  job  oscar  r  08/15/2014 12:38:21  one.q@compute-1-15.local  SLAVE  1.compute-1-15   r  00:50:01  563.27221  0.09361
> >>>>>>> 2726  0.50500  job  oscar  r  08/15/2014 12:38:21  one.q@compute-1-8.local   SLAVE  1.compute-1-8    r  00:49:26  556.68431  0.09349
> >>>>>>> 2726  0.50500  job  oscar  r  08/15/2014 12:38:21  one.q@compute-1-4.local   SLAVE  1.compute-1-4    r  00:49:27  556.87510  0.04967
> >>>>>>
> >>>>>> Yes, here you got 10 slots (= cores) granted by SGE. So there is no free core left inside the SGE allocation to allow the use of additional cores for your threads. If you use more cores than granted by SGE, it will oversubscribe the machines.
> >>>>>>
> >>>>>> The issue is now:
> >>>>>>
> >>>>>> a) If you want 8 threads per MPI process, your job will use 80 cores in total - for now SGE isn't aware of that.
> >>>>>>
> >>>>>> b) Although you specified $fill_up as the allocation rule, it looks like $round_robin. Is there more than one slot defined in the queue definition of one.q to get exclusive access?
> >>>>>>
> >>>>>> c) What version of SGE are you using? Certain ones use cgroups or bind processes directly to cores (although it usually needs to be requested by the job: see the first line of `qconf -help`).
> >>>>>>
> >>>>>> In case you are alone in the cluster, you could bypass the allocation with b) (unless you are hit by c)).
> >>>>>> But with a mixture of users and jobs, a different handling would be necessary to do this in a proper way, IMO:
> >>>>>>
> >>>>>> a) having a PE with a fixed allocation rule of 8 (a sketch of such a PE follows after this quoted message)
> >>>>>>
> >>>>>> b) requesting this PE with an overall slot count of 80
> >>>>>>
> >>>>>> c) copying and altering the $PE_HOSTFILE to show only (granted core count per machine) divided by (OMP_NUM_THREADS) per entry, and changing $PE_HOSTFILE so that it points to the altered file
> >>>>>>
> >>>>>> d) Open MPI with a Tight Integration will then start only N processes per machine according to the altered hostfile - in your case, one
> >>>>>>
> >>>>>> e) your application can start the desired threads and you stay inside the granted allocation
> >>>>>>
> >>>>>> -- Reuti
> >>>>>>
> >>>>>>> I accessed the MASTER node with 'ssh compute-1-2.local', ran '$ ps -e f', and got this (I'm showing only the last lines):
> >>>>>>>
> >>>>>>>  2506 ?    Ss   0:00 /usr/sbin/atd
> >>>>>>>  2548 tty1 Ss+  0:00 /sbin/mingetty /dev/tty1
> >>>>>>>  2550 tty2 Ss+  0:00 /sbin/mingetty /dev/tty2
> >>>>>>>  2552 tty3 Ss+  0:00 /sbin/mingetty /dev/tty3
> >>>>>>>  2554 tty4 Ss+  0:00 /sbin/mingetty /dev/tty4
> >>>>>>>  2556 tty5 Ss+  0:00 /sbin/mingetty /dev/tty5
> >>>>>>>  2558 tty6 Ss+  0:00 /sbin/mingetty /dev/tty6
> >>>>>>>  3325 ?    Sl   0:04 /opt/gridengine/bin/linux-x64/sge_execd
> >>>>>>> 17688 ?    S    0:00  \_ sge_shepherd-2726 -bg
> >>>>>>> 17695 ?    Ss   0:00      \_ -bash /opt/gridengine/default/spool/compute-1-2/job_scripts/2726
> >>>>>>> 17797 ?    S    0:00          \_ /usr/bin/time -f %E /opt/openmpi/bin/mpirun -v -np 10 ./inverse.exe
> >>>>>>> 17798 ?    S    0:01              \_ /opt/openmpi/bin/mpirun -v -np 10 ./inverse.exe
> >>>>>>> 17799 ?    Sl   0:00                  \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-5.local PATH=/opt/openmpi/bin:$PATH ; expo
> >>>>>>> 17800 ?    Sl   0:00                  \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-9.local PATH=/opt/openmpi/bin:$PATH ; expo
> >>>>>>> 17801 ?    Sl   0:00                  \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-12.local PATH=/opt/openmpi/bin:$PATH ; exp
> >>>>>>> 17802 ?    Sl   0:00                  \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-13.local PATH=/opt/openmpi/bin:$PATH ; exp
> >>>>>>> 17803 ?    Sl   0:00                  \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-14.local PATH=/opt/openmpi/bin:$PATH ; exp
> >>>>>>> 17804 ?    Sl   0:00                  \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-10.local PATH=/opt/openmpi/bin:$PATH ; exp
> >>>>>>> 17805 ?    Sl   0:00                  \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-15.local PATH=/opt/openmpi/bin:$PATH ; exp
> >>>>>>> 17806 ?    Sl   0:00                  \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-8.local PATH=/opt/openmpi/bin:$PATH ; expo
> >>>>>>> 17807 ?    Sl   0:00                  \_ /opt/gridengine/bin/linux-x64/qrsh -inherit -nostdin -V compute-1-4.local PATH=/opt/openmpi/bin:$PATH ; expo
> >>>>>>> 17826 ?    R   31:36                  \_ ./inverse.exe
> >>>>>>>  3429 ?    Ssl  0:00 automount --pid-file /var/run/autofs.pid
> >>>>>>>
> >>>>>>> So the job is using the 10 machines; up to here everything is all right. Do you think that changing the "allocation_rule" to a number instead of $fill_up would make the MPI processes divide the work into that number of threads?
> >>>>>>>
> >>>>>>> Thanks a lot
> >>>>>>>
> >>>>>>> Oscar Fabian Mojica Ladino
> >>>>>>> Geologist M.S. in Geophysics
> >>>>>>>
> >>>>>>> PS: I have another doubt: what is a slot? Is it a physical core?
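For step a) of Reuti's list, only the allocation rule of the "orte" PE (its full definition appears later in this thread) would need to change. A sketch of the resulting definition, assuming all other fields stay as Oscar posted them, edited via `qconf -mp orte` as suggested earlier:

===
pe_name            orte
slots              9999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    8
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE
===

With a fixed "allocation_rule 8", SGE grants slots only in bunches of 8 per machine, so a request for 80 slots yields exactly 10 machines.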
> >>>>>>>
> >>>>>>>> From: re...@staff.uni-marburg.de
> >>>>>>>> Date: Thu, 14 Aug 2014 23:54:22 +0200
> >>>>>>>> To: us...@open-mpi.org
> >>>>>>>> Subject: Re: [OMPI users] Running a hybrid MPI+openMP program
> >>>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> I think this is a broader issue whenever an MPI library is used in conjunction with threads while running inside a queuing system. First: you can check whether your actual installation of Open MPI is SGE-aware with:
> >>>>>>>>
> >>>>>>>> $ ompi_info | grep grid
> >>>>>>>> MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.5)
> >>>>>>>>
> >>>>>>>> Then we can look at the definition of your PE: "allocation_rule $fill_up". This means that SGE will grant you 14 slots in total in any combination on the available machines, meaning an 8+4+2 slot allocation is an allowed combination, like 4+4+3+3 and so on. Depending on the SGE-awareness it's a question: will your application just start processes on all nodes and completely disregard the granted allocation, or, as the other extreme, does it stay on one and the same machine for all started processes? On the master node of the parallel job you can issue:
> >>>>>>>>
> >>>>>>>> $ ps -e f
> >>>>>>>>
> >>>>>>>> (f w/o -) to have a look at whether `ssh` or `qrsh -inherit ...` is used to reach the other machines, and at their requested process count.
> >>>>>>>>
> >>>>>>>> Now to the common problem in such a setup:
> >>>>>>>>
> >>>>>>>> AFAICS, for now there is no way in the Open MPI + SGE combination to specify the number of MPI processes and the intended number of threads such that both are automatically read by Open MPI while staying inside the granted slot count and allocation. So it seems to be necessary to have the intended number of threads honored by Open MPI too.
> >>>>>>>>
> >>>>>>>> Hence specifying e.g. "allocation_rule 8" in such a setup while requesting 32 processes would for now already start 32 MPI processes, as Open MPI reads the $PE_HOSTFILE and acts accordingly.
> >>>>>>>>
> >>>>>>>> Open MPI would have to read the generated machine file in a slightly different way regarding threads: a) read the $PE_HOSTFILE, b) divide the granted slots per machine by OMP_NUM_THREADS, c) throw an error in case it's not divisible by OMP_NUM_THREADS. Then start one process per quotient. (A user-side sketch of this follows after this message.)
> >>>>>>>>
> >>>>>>>> Would this work for you?
> >>>>>>>>
> >>>>>>>> -- Reuti
> >>>>>>>>
> >>>>>>>> PS: This would also mean having a couple of PEs in SGE with a fixed "allocation_rule". While this works right now, an extension in SGE could be "$fill_up_omp"/"$round_robin_omp", using OMP_NUM_THREADS there too; hence it must not be specified as an `export` in the job script, but either on the command line or inside the job script in #$ lines as job requests. This would mean collecting slots in bunches of OMP_NUM_THREADS on each machine to reach the overall specified slot count. Whether OMP_NUM_THREADS or n times OMP_NUM_THREADS is allowed per machine needs to be discussed.
> >>>>>>>>
> >>>>>>>> PS2: As Univa SGE can also supply a list of granted cores in the $PE_HOSTFILE, it would be an extension to feed this to Open MPI to allow UGE-aware binding.
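Until Open MPI does this itself, a user-side sketch of the division Reuti describes, extending the awk one-liner from earlier in this thread with the divisibility check (assumptions: the slot count is the second field of $PE_HOSTFILE, and $TMPDIR points to the job's scratch directory as in Reuti's snippet):

===
# divide each host's granted slot count by the thread count;
# abort if it is not a multiple of OMP_NUM_THREADS
awk -v t="$OMP_NUM_THREADS" '
  $2 % t != 0 { print "slots on " $1 " not divisible by " t > "/dev/stderr"; exit 1 }
  { $2 /= t; print }
' "$PE_HOSTFILE" > "$TMPDIR/machines" || exit 1
export PE_HOSTFILE="$TMPDIR/machines"
===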
> >>>>>>>>
> >>>>>>>> On 14.08.2014 at 21:52, Oscar Mojica wrote:
> >>>>>>>>
> >>>>>>>>> Guys,
> >>>>>>>>>
> >>>>>>>>> I changed the line that runs the program in the script to both options:
> >>>>>>>>>
> >>>>>>>>> /usr/bin/time -f "%E" /opt/openmpi/bin/mpirun -v --bind-to-none -np $NSLOTS ./inverse.exe
> >>>>>>>>> /usr/bin/time -f "%E" /opt/openmpi/bin/mpirun -v --bind-to-socket -np $NSLOTS ./inverse.exe
> >>>>>>>>>
> >>>>>>>>> but I got the same results. When I use 'man mpirun' it shows:
> >>>>>>>>>
> >>>>>>>>> -bind-to-none, --bind-to-none
> >>>>>>>>> Do not bind processes. (Default.)
> >>>>>>>>>
> >>>>>>>>> and the output of 'qconf -sp orte' is:
> >>>>>>>>>
> >>>>>>>>> pe_name            orte
> >>>>>>>>> slots              9999
> >>>>>>>>> user_lists         NONE
> >>>>>>>>> xuser_lists        NONE
> >>>>>>>>> start_proc_args    /bin/true
> >>>>>>>>> stop_proc_args     /bin/true
> >>>>>>>>> allocation_rule    $fill_up
> >>>>>>>>> control_slaves     TRUE
> >>>>>>>>> job_is_first_task  FALSE
> >>>>>>>>> urgency_slots      min
> >>>>>>>>> accounting_summary TRUE
> >>>>>>>>>
> >>>>>>>>> I don't know if the installed Open MPI was compiled with '--with-sge'. How can I know that?
> >>>>>>>>> Before thinking about a hybrid application I was using only MPI, and the program used few processors (14). The cluster has 28 machines, 15 with 16 cores and 13 with 8 cores, totaling 344 processing units. When I submitted the job (MPI only), the MPI processes were spread across the cores directly; for that reason I created a new queue with 14 machines, trying to gain more time. The results were the same in both cases. In the last case I could verify that the processes were distributed to all machines correctly.
> >>>>>>>>>
> >>>>>>>>> What must I do?
> >>>>>>>>> Thanks
> >>>>>>>>>
> >>>>>>>>> Oscar Fabian Mojica Ladino
> >>>>>>>>> Geologist M.S. in Geophysics
> >>>>>>>>>
> >>>>>>>>>> Date: Thu, 14 Aug 2014 10:10:17 -0400
> >>>>>>>>>> From: maxime.boissonnea...@calculquebec.ca
> >>>>>>>>>> To: us...@open-mpi.org
> >>>>>>>>>> Subject: Re: [OMPI users] Running a hybrid MPI+openMP program
> >>>>>>>>>>
> >>>>>>>>>> Hi,
> >>>>>>>>>> You DEFINITELY need to disable Open MPI's new default binding. Otherwise, your N threads will run on a single core. --bind-to socket would be my recommendation for hybrid jobs.
> >>>>>>>>>>
> >>>>>>>>>> Maxime
> >>>>>>>>>>
> >>>>>>>>>> On 2014-08-14 10:04, Jeff Squyres (jsquyres) wrote:
> >>>>>>>>>>> I don't know much about OpenMP, but do you need to disable Open MPI's default bind-to-core functionality (I'm assuming you're using Open MPI 1.8.x)?
> >>>>>>>>>>>
> >>>>>>>>>>> You can try "mpirun --bind-to none ...", which will have Open MPI not bind MPI processes to cores, which might allow OpenMP to think that it can use all the cores, and therefore it will spawn num_cores threads...?
> >>>>>>>>>>>
> >>>>>>>>>>> On Aug 14, 2014, at 9:50 AM, Oscar Mojica <o_moji...@hotmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hello everybody,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I am trying to run a hybrid MPI + OpenMP program on a cluster. I created a queue with 14 machines, each one with 16 cores. The program divides the work among the 14 processors with MPI, and within each process a loop is also divided into 8 threads, for example, using OpenMP.
> >>>>>>>>>>>> The problem is that when I submit the job to the queue, the MPI processes don't divide the work into threads, and the program prints the number of threads working within each process as one.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I made a simple test program that uses OpenMP, and I logged in to one machine of the fourteen. I compiled it using 'gfortran -fopenmp program.f -o exe', set the OMP_NUM_THREADS environment variable to 8, and when I ran it directly in the terminal the loop was effectively divided among the cores; in this case the program printed the number of threads as 8.
> >>>>>>>>>>>>
> >>>>>>>>>>>> This is my Makefile:
> >>>>>>>>>>>>
> >>>>>>>>>>>> # Start of the makefile
> >>>>>>>>>>>> # Defining variables
> >>>>>>>>>>>> objects = inv_grav3d.o funcpdf.o gr3dprm.o fdjac.o dsvd.o
> >>>>>>>>>>>> #f90comp = /opt/openmpi/bin/mpif90
> >>>>>>>>>>>> f90comp = /usr/bin/mpif90
> >>>>>>>>>>>> #switch = -O3
> >>>>>>>>>>>> executable = inverse.exe
> >>>>>>>>>>>> # Makefile
> >>>>>>>>>>>> all : $(executable)
> >>>>>>>>>>>> $(executable) : $(objects)
> >>>>>>>>>>>> 	$(f90comp) -fopenmp -g -O -o $(executable) $(objects)
> >>>>>>>>>>>> 	rm $(objects)
> >>>>>>>>>>>> %.o: %.f
> >>>>>>>>>>>> 	$(f90comp) -c $<
> >>>>>>>>>>>> # Cleaning everything
> >>>>>>>>>>>> clean:
> >>>>>>>>>>>> 	rm $(executable)
> >>>>>>>>>>>> #	rm $(objects)
> >>>>>>>>>>>> # End of the makefile
> >>>>>>>>>>>>
> >>>>>>>>>>>> and the script that I am using is:
> >>>>>>>>>>>>
> >>>>>>>>>>>> #!/bin/bash
> >>>>>>>>>>>> #$ -cwd
> >>>>>>>>>>>> #$ -j y
> >>>>>>>>>>>> #$ -S /bin/bash
> >>>>>>>>>>>> #$ -pe orte 14
> >>>>>>>>>>>> #$ -N job
> >>>>>>>>>>>> #$ -q new.q
> >>>>>>>>>>>>
> >>>>>>>>>>>> export OMP_NUM_THREADS=8
> >>>>>>>>>>>> /usr/bin/time -f "%E" /opt/openmpi/bin/mpirun -v -np $NSLOTS ./inverse.exe
> >>>>>>>>>>>>
> >>>>>>>>>>>> Am I forgetting something?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Oscar Fabian Mojica Ladino
> >>>>>>>>>>>> Geologist M.S. in Geophysics
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> ---------------------------------
> >>>>>>>>>> Maxime Boissonneault
> >>>>>>>>>> Computing analyst - Calcul Québec, Université Laval
> >>>>>>>>>> Ph.D. in physics
> >>>
> >>> ----
> >>> Tetsuya Mishima  tmish...@jcity.maeda.co.jp
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2014/08/25087.php