On 12.11.2012, at 13:47, Guillermo Marco Puche wrote:

> Hello,
> 
> This must be the problem. I've checked that each compute node can only resolve 
> its own IP address:
> 
> For example in compute-0-0:
> 
> /opt/gridengine/utilbin/lx26-amd64/gethostbyaddr 10.4.0.2
> Hostname: compute-0-0.local
> Aliases:  compute-0-0
> Host Address(es): 10.4.0.2
> 
> But 10.4.0.3 (compute-0-1) fails:

Check the /etc/hosts file, or any NIS you set up to distribute the hostnames.
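For example, a minimal sketch of matching /etc/hosts entries, using only the two addresses and host names that appear in this thread (the same file has to be distributed to every node so each can resolve the others):

```
10.4.0.2   compute-0-0.local   compute-0-0
10.4.0.3   compute-0-1.local   compute-0-1
```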

-- Reuti


> $ /opt/gridengine/utilbin/lx26-amd64/gethostbyaddr 10.4.0.3
> error resolving ip "10.4.0.3": can't resolve ip address (h_errno = 
> HOST_NOT_FOUND)
> 
> And the inverse on compute-0-1: it can resolve 10.4.0.3 but not 10.4.0.2.
> 
> Regards,
> Guillermo.
> On 12/11/2012 13:35, Guillermo Marco Puche wrote:
>> Hello,
>> 
>> OK, I've patched my nodes with the RPM fix for MPI and SGE (I forgot to 
>> install it on the compute nodes).
>> 
>> I removed the -np 16 argument and got this new error:
>> 
>> error: commlib error: access denied (client IP resolved to host name "". 
>> This is not identical to clients host name "")
>> error: executing task of job 97 failed: failed sending task to 
>> [email protected]: can't find connection
>> -------------------------------------------------------------------------- 
>> A daemon (pid 3037) died unexpectedly with status 1 while attempting
>> to launch so we are aborting.
>> 
>> There may be more information reported by the environment (see above).
>> 
>> This may be because the daemon was unable to find all the needed shared
>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>> location of the shared libraries on the remote nodes and this will
>> automatically be forwarded to the remote nodes.
>> -------------------------------------------------------------------------- 
>> -------------------------------------------------------------------------- 
>> mpirun noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> -------------------------------------------------------------------------- 
>> 
>> 
>> On 12/11/2012 13:11, Reuti wrote:
>>> On 12.11.2012, at 12:18, Guillermo Marco Puche wrote:
>>> 
>>>> Hello,
>>>> 
>>>> I'm currently trying the following job script, submitting it with qsub. 
>>>> I don't know why it uses the CPUs of only one of my two compute nodes 
>>>> instead of both. (compute-0-2 is currently a powered-off node.)
>>>> 
>>>> #!/bin/bash
>>>> #$ -S /bin/bash
>>>> #$ -V
>>>> ### name
>>>> #$ -N aln_left
>>>> ### work dir
>>>> #$ -cwd
>>>> ### outputs
>>>> #$ -j y
>>>> ### PE
>>>> #$ -pe orte 16
>>>> ### all.q
>>>> #$ -q all.q
>>>> 
>>>> mpirun -np 16 pBWA aln -f aln_left 
>>>> /data_in/references/genomes/human/hg19/bwa_ref/hg19.fa 
>>>> /data_in/data/rawdata/HapMap_1.fastq >
>>> If the compute-0-2 is powered off, it won't get slots assigned by SGE.
>>> 
>>> So the 16 slots are all available on that one machine - otherwise the job 
>>> would still be in the "qw" state. As Open MPI was compiled with tight 
>>> integration, the "-np 16" argument isn't necessary: it will detect the 
>>> granted number of slots and their locations automatically.
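As an illustration (a sketch only, not tested here): with the tight integration, the job script quoted above can drop the -np argument entirely and let Open MPI pick up the slot count and host list from SGE:

```
#!/bin/bash
#$ -S /bin/bash
#$ -V
#$ -N aln_left
#$ -cwd
#$ -j y
#$ -pe orte 16
#$ -q all.q

# No -np here: an SGE-aware Open MPI reads the granted slots
# and their hosts from the parallel environment automatically.
mpirun pBWA aln -f aln_left \
    /data_in/references/genomes/human/hg19/bwa_ref/hg19.fa \
    /data_in/data/rawdata/HapMap_1.fastq \
    > /data_out_2/tmp/05_11_12/mpi/HapMap_cloud.left.sai
```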
>>> 
>>> -- Reuti
>>> 
>>> 
>>>> /data_out_2/tmp/05_11_12/mpi/HapMap_cloud.left.sai
>>>> 
>>>> Here's all.q config file:
>>>> 
>>>> qname                 all.q
>>>> hostlist              @allhosts
>>>> seq_no                0
>>>> load_thresholds       np_load_avg=1.75
>>>> suspend_thresholds    NONE
>>>> nsuspend              1
>>>> suspend_interval      00:05:00
>>>> priority              0
>>>> min_cpu_interval      00:05:00
>>>> processors            UNDEFINED
>>>> qtype                 BATCH INTERACTIVE
>>>> ckpt_list             NONE
>>>> pe_list               make mpich mpi orte openmpi smp
>>>> rerun                 FALSE
>>>> slots                 0,[compute-0-0.local=8],[compute-0-1.local=8], \
>>>>                       [compute-0-2.local.sg=8]
>>>> tmpdir                /tmp
>>>> shell                 /bin/csh
>>>> prolog                NONE
>>>> epilog                NONE
>>>> shell_start_mode      posix_compliant
>>>> starter_method        NONE
>>>> suspend_method        NONE
>>>> resume_method         NONE
>>>> terminate_method      NONE
>>>> notify                00:00:60
>>>> owner_list            NONE
>>>> user_lists            NONE
>>>> xuser_lists           NONE
>>>> subordinate_list      NONE
>>>> complex_values        NONE
>>>> projects              NONE
>>>> xprojects             NONE
>>>> calendar              NONE
>>>> initial_state         default
>>>> s_rt                  INFINITY
>>>> h_rt                  INFINITY
>>>> s_cpu                 INFINITY
>>>> h_cpu                 INFINITY
>>>> s_fsize               INFINITY
>>>> h_fsize               INFINITY
>>>> s_data                INFINITY
>>>> h_data                INFINITY
>>>> s_stack               INFINITY
>>>> h_stack               INFINITY
>>>> s_core                INFINITY
>>>> h_core                INFINITY
>>>> s_rss                 INFINITY
>>>> h_rss                 INFINITY
>>>> s_vmem                INFINITY
>>>> h_vmem                INFINITY
>>>> 
>>>> Best regards,
>>>> Guillermo.
>>>> 
>>>> 
>>>> On 05/11/2012 12:01, Reuti wrote:
>>>>> Hi,
>>>>> 
>>>>> On 05.11.2012, at 10:55, Guillermo Marco Puche wrote:
>>>>> 
>>>>>> I've managed to compile Open MPI for Rocks:
>>>>>> ompi_info | grep grid
>>>>>>                  MCA ras: gridengine (MCA v2.0, API v2.0, Component 
>>>>>> v1.4.3)
>>>>>> 
>>>>>> Now I'm really confused about how I should run my pBWA program with 
>>>>>> Open MPI.
>>>>>> Program website (http://pbwa.sourceforge.net/) suggests something like:
>>>>>> 
>>>>>> sqsub -q mpi -n 240 -r 1h --mpp 4G ./pBWA bla bla bla...
>>>>> Seems to be a local proprietary command on Sharcnet, or at least a 
>>>>> wrapper to another unknown queuing system.
>>>>> 
>>>>> 
>>>>>> I don't have sqsub, only the qsub provided by SGE. The "-q" option 
>>>>>> isn't valid for SGE, since there it is used for queue selection.
>>>>> Correct, the SGE paradigm is to request resources, and SGE will select an 
>>>>> appropriate queue for your job which fulfils the requirements.
>>>>> 
>>>>> 
>>>>>> Maybe the solution is to create a simple bash job script that requests 
>>>>>> an SGE parallel environment and the number of slots (since pBWA 
>>>>>> internally supports Open MPI).
>>>>> What is the actual setup of your SGE? Most likely you will need to define 
>>>>> a PE and request it during submission, as for any other Open MPI 
>>>>> application:
>>>>> 
>>>>> $ qsub -pe orte 240 -l h_rt=1:00:00,h_vmem=4G ./pBWA bla bla bla...
>>>>> 
>>>>> Assuming "-n" gives the number of cores.
>>>>> Assuming "-r 1h" means wallclock time: -l h_rt=1:00:00
>>>>> Assuming "--mpp 4G" requests the memory per slot: -l h_vmem=4G
>>>>> 
>>>>> Necessary setup:
>>>>> 
>>>>> http://www.open-mpi.org/faq/?category=running#run-n1ge-or-sge
>>>>> 
>>>>> -- Reuti
>>>>> 
>>>>> 
>>>>>> Regards,
>>>>>> Guillermo.
>>>>>> 
>>>>>> On 26/10/2012 12:21, Reuti wrote:
>>>>>>> On 26.10.2012, at 12:02, Guillermo Marco Puche wrote:
>>>>>>> 
>>>>>>> 
>>>>>>>> Hello,
>>>>>>>> 
>>>>>>>> As I said, I'm using Rocks cluster 5.4.3, and it comes with mpirun 
>>>>>>>> (Open MPI) 1.4.3.
>>>>>>>> But $ ompi_info | grep gridengine shows nothing.
>>>>>>>> 
>>>>>>>> So I'm not sure whether I have to update and rebuild Open MPI to the 
>>>>>>>> latest version.
>>>>>>>> 
>>>>>>> You can also remove the supplied version 1.4.3 from your system and 
>>>>>>> build it from source with SGE support. But I don't see the advantage of 
>>>>>>> using an old version. Does ROCKS supply the source of the Open MPI 
>>>>>>> version they use?
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> Or whether I can keep the current version of MPI and rebuild it (that 
>>>>>>>> would be the preferred option, to keep the cluster stable).
>>>>>>>> 
>>>>>>> If you compile and install only into your own $HOME (as a normal user; 
>>>>>>> no root access necessary), then there is no impact on any system tool 
>>>>>>> at all. You just have to take care which version you use, by setting 
>>>>>>> the correct $PATH and $LD_LIBRARY_PATH during compilation of your 
>>>>>>> application and during its execution. Therefore I suggested including 
>>>>>>> the name of the compiler used and the Open MPI version in the build 
>>>>>>> installation's directory name.
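Such a per-user selection could be sketched like this (the install prefix is only an example, following the directory-name convention suggested here; adjust it to your own build):

```shell
# Select a self-built Open MPI installed under $HOME (example prefix).
MPI_HOME="$HOME/local/openmpi-1.6.2_gcc"

# Its mpicc/mpiexec must come first in the search path, and its
# libraries first for the dynamic linker, both when compiling the
# application and when running it.
export PATH="$MPI_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$MPI_HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```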
>>>>>>> 
>>>>>>> There was just a question on the MPICH2 mailing list about which 
>>>>>>> version of `mpiexec` to use; maybe it provides additional info:
>>>>>>> 
>>>>>>> 
>>>>>>> http://lists.mcs.anl.gov/pipermail/mpich-discuss/2012-October/013318.html
>>>>>>>  
>>>>>>> 
>>>>>>> 
>>>>>>> -- Reuti
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> Thanks !
>>>>>>>> 
>>>>>>>> Best regards,
>>>>>>>> Guillermo.
>>>>>>>> 
>>>>>>>> On 26/10/2012 11:59, Reuti wrote:
>>>>>>>> 
>>>>>>>>> On 26.10.2012, at 09:40, Guillermo Marco Puche wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> Hello,
>>>>>>>>>> 
>>>>>>>>>> Thank you for the links Reuti !
>>>>>>>>>> 
>>>>>>>>>> When they talk about:
>>>>>>>>>> 
>>>>>>>>>> shell $ ./configure --with-sge
>>>>>>>>>> 
>>>>>>>>>> Is that in the bash shell, or in some other special shell?
>>>>>>>>>> 
>>>>>>>>> There is no special shell required (please have a look at the INSTALL 
>>>>>>>>> file in Open MPI's tar-archive).
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> Do I have to be in a specific directory to execute that command?
>>>>>>>>>> 
>>>>>>>>> Depends.
>>>>>>>>> 
>>>>>>>>> As it's set up according to the GNU build system
>>>>>>>>> (http://en.wikipedia.org/wiki/GNU_build_system), you can either:
>>>>>>>>> 
>>>>>>>>> $ tar -xf openmpi-1.6.2.tar.gz
>>>>>>>>> $ cd openmpi-1.6.2
>>>>>>>>> $ ./configure --prefix=$HOME/local/openmpi-1.6.2_gcc --with-sge
>>>>>>>>> $ make
>>>>>>>>> $ make install
>>>>>>>>> 
>>>>>>>>> It's quite common to build inside the source tree. But if it is set 
>>>>>>>>> up in the right way, it also supports building in different 
>>>>>>>>> directories inside or outside the source tree which avoids a `make 
>>>>>>>>> distclean` in case you want to generate different builds:
>>>>>>>>> 
>>>>>>>>> $ tar -xf openmpi-1.6.2.tar.gz
>>>>>>>>> $ mkdir openmpi-gcc
>>>>>>>>> $ cd openmpi-gcc
>>>>>>>>> $ ../openmpi-1.6.2/configure --prefix=$HOME/local/openmpi-1.6.2_gcc 
>>>>>>>>> --with-sge
>>>>>>>>> $ make
>>>>>>>>> $ make install
>>>>>>>>> 
>>>>>>>>> Meanwhile, in another window, you can execute:
>>>>>>>>> 
>>>>>>>>> $ mkdir openmpi-intel
>>>>>>>>> $ cd openmpi-intel
>>>>>>>>> $ ../openmpi-1.6.2/configure --prefix=$HOME/local/openmpi-1.6.2_intel 
>>>>>>>>> CC=icc CXX=icpc FC=ifort F77=ifort --disable-vt --with-sge
>>>>>>>>> $ make
>>>>>>>>> $ make install
>>>>>>>>> 
>>>>>>>>> (Not to confuse anyone: there is a bug in the combination of the 
>>>>>>>>> Intel compiler and GNU headers with the above version of Open MPI; 
>>>>>>>>> disabling VampirTrace support helps.)
>>>>>>>>> 
>>>>>>>>> -- Reuti
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> Thank you !
>>>>>>>>>> Sorry again for my ignorance.
>>>>>>>>>> 
>>>>>>>>>> Regards,
>>>>>>>>>> Guillermo.
>>>>>>>>>> 
>>>>>>>>>> On 25/10/2012 19:50, Reuti wrote:
>>>>>>>>>> 
>>>>>>>>>>> On 25.10.2012, at 19:36, Guillermo Marco Puche wrote:
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> Hello,
>>>>>>>>>>>> 
>>>>>>>>>>>> I've no idea who compiled the application. I just found on the 
>>>>>>>>>>>> seqanswers forum that pBWA gives a nice speedup over the original 
>>>>>>>>>>>> BWA, since it natively supports Open MPI.
>>>>>>>>>>>> 
>>>>>>>>>>>> As you suggested, I'll look further into how to compile Open MPI 
>>>>>>>>>>>> with SGE support. If anyone knows a good introduction/tutorial for 
>>>>>>>>>>>> this, it would be appreciated.
>>>>>>>>>>>> 
>>>>>>>>>>> The Open MPI site has extensive documentation:
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> http://www.open-mpi.org/faq/?category=building#build-rte-sge
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> http://www.open-mpi.org/faq/?category=running#run-n1ge-or-sge
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Be sure that during execution you pick the correct `mpiexec` and 
>>>>>>>>>>> LD_LIBRARY_PATH from your own build. You can also adjust the 
>>>>>>>>>>> location of Open MPI with the usual --prefix. I put it in 
>>>>>>>>>>> --prefix=$HOME/local/openmpi-1.6.2_shared_gcc, reflecting the 
>>>>>>>>>>> version I built.
>>>>>>>>>>> 
>>>>>>>>>>> -- Reuti
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> Then I'll try to run it with my current version of Open MPI and 
>>>>>>>>>>>> update if needed.
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>> 
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Guillermo.
>>>>>>>>>>>> 
>>>>>>>>>>>> On 25/10/2012 18:53, Reuti wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Please keep the list posted, so that others can participate in 
>>>>>>>>>>>>> the discussion. I'm not aware of this application, but maybe 
>>>>>>>>>>>>> someone else on the list could be of broader help.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Again: who compiled the application, as I can see only the source 
>>>>>>>>>>>>> at the site you posted?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -- Reuti
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 25.10.2012, at 13:23, Guillermo Marco Puche wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> $ ompi_info | grep grid
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Returns nothing. Like I said, I'm a newbie to MPI. I didn't 
>>>>>>>>>>>>>> know that I had to compile anything; I have an out-of-the-box 
>>>>>>>>>>>>>> Rocks installation. So MPI is installed, but nothing more, I 
>>>>>>>>>>>>>> guess.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I've found an old thread in Rocks discuss list:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2012-April/057303.html
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The user asking is using this script:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> #$ -S /bin/bash
>>>>>>>>>>>>>> #
>>>>>>>>>>>>>> #
>>>>>>>>>>>>>> # Export all environment variables
>>>>>>>>>>>>>> #$ -V
>>>>>>>>>>>>>> # specify the PE and core #
>>>>>>>>>>>>>> #$ -pe mpi 128
>>>>>>>>>>>>>> # Customize job name
>>>>>>>>>>>>>> #$ -N job_hpl_2.0
>>>>>>>>>>>>>> # Use current working directory
>>>>>>>>>>>>>> #$ -cwd
>>>>>>>>>>>>>> # Join stdout and stderr into one file
>>>>>>>>>>>>>> #$ -j y
>>>>>>>>>>>>>> # The mpirun command; note the lack of host names, as SGE will
>>>>>>>>>>>>>> # provide them on-the-fly.
>>>>>>>>>>>>>> mpirun -np $NSLOTS ./xhpl >> xhpl.out
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> But then I read this:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> In the Rocks SGE PEs, mpi is loosely integrated, while mpich 
>>>>>>>>>>>>>> and orte are tightly integrated. The required qsub arguments 
>>>>>>>>>>>>>> differ between mpi/mpich and orte.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> mpi and mpich need a machinefile.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> By default, mpi and mpich are for MPICH2, and orte is for 
>>>>>>>>>>>>>> Open MPI.
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> -LT
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The program I need to run is pBWA:
>>>>>>>>>>>>>>  http://pbwa.sourceforge.net/
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> It uses MPI.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> At this moment I'm somewhat confused about what the next step is.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I thought I could just run pBWA with MPI, using a simple SGE job 
>>>>>>>>>>>>>> with multiple processes.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Guillermo.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 25/10/2012 13:17, Reuti wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 25.10.2012, at 13:11, Guillermo Marco Puche wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hello Reuti,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I'm stuck here. I've no idea which MPI library I've got. I'm 
>>>>>>>>>>>>>>>> using Rocks Cluster Viper 5.4.3, which comes with CentOS 
>>>>>>>>>>>>>>>> 5.6, SGE, SPM, Open MPI and MPI.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> How can I check which library I have installed?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I found this:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> $ mpirun -V
>>>>>>>>>>>>>>>> mpirun (Open MPI) 1.4.3
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Report bugs to
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> http://www.open-mpi.org/community/help/
>>>>>>>>>>>>>>> Good, and is this the one you also used to compile the 
>>>>>>>>>>>>>>> application?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> To check whether Open MPI was built with SGE support:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> $ ompi_info | grep grid
>>>>>>>>>>>>>>>                  MCA ras: gridengine (MCA v2.0, API v2.0, 
>>>>>>>>>>>>>>> Component v1.6.2)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> -- Reuti
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>> Guillermo.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On 25/10/2012 13:05, Reuti wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On 25.10.2012, at 10:37, Guillermo Marco Puche wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hello !
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I found a new version of my tool which supports 
>>>>>>>>>>>>>>>>>> multi-threading, but also MPI or Open MPI for additional 
>>>>>>>>>>>>>>>>>> processes.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I'm kind of new to MPI with SGE. What would be the right 
>>>>>>>>>>>>>>>>>> qsub command, or configuration inside a job file, to ask 
>>>>>>>>>>>>>>>>>> SGE to run with 2 MPI processes?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Will the following code work in a SGE job file?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> #$ -pe mpi 2
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> That's supposed to make the job run with 2 processes 
>>>>>>>>>>>>>>>>>> instead of 1.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Not out of the box: it will grant 2 slots for the job 
>>>>>>>>>>>>>>>>> according to the allocation rules of the PE. But how you 
>>>>>>>>>>>>>>>>> start your application inside the granted allocation, in the 
>>>>>>>>>>>>>>>>> job script, is up to you. Fortunately, MPI libraries 
>>>>>>>>>>>>>>>>> nowadays have an (almost) automatic integration into queuing 
>>>>>>>>>>>>>>>>> systems, without further user intervention.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Which of the MPI libraries mentioned above do you use when 
>>>>>>>>>>>>>>>>> you compile your application?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> -- Reuti
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>> Guillermo.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On 22/10/2012 17:19, Reuti wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On 22.10.2012, at 16:31, Guillermo Marco Puche wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I'm using a program where I can specify the number of 
>>>>>>>>>>>>>>>>>>>> threads I want to use.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Only threads and not additional processes? Then you are 
>>>>>>>>>>>>>>>>>>> limited to one node, unless you add something like 
>>>>>>>>>>>>>>>>>>> http://www.kerrighed.org/wiki/index.php/Main_Page or 
>>>>>>>>>>>>>>>>>>> http://www.scalemp.com to get a cluster-wide unique 
>>>>>>>>>>>>>>>>>>> process and memory space.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> -- Reuti
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I'm able to launch multiple instances of that tool on 
>>>>>>>>>>>>>>>>>>>> separate nodes. For example: job_process_00 on 
>>>>>>>>>>>>>>>>>>>> compute-0-0, job_process_01 on compute-1, etc. Each job 
>>>>>>>>>>>>>>>>>>>> calls the program, which splits into 8 threads (each of 
>>>>>>>>>>>>>>>>>>>> my nodes has 8 CPUs).
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> When I set up 16 threads I can't split them as 8 threads 
>>>>>>>>>>>>>>>>>>>> per node, so I would like to split them between 2 compute 
>>>>>>>>>>>>>>>>>>>> nodes.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Currently I have 4 compute nodes, and I would like to 
>>>>>>>>>>>>>>>>>>>> speed up the process by running my program with 16 
>>>>>>>>>>>>>>>>>>>> threads split across more than one compute node. At this 
>>>>>>>>>>>>>>>>>>>> moment I'm stuck using only 1 compute node per process, 
>>>>>>>>>>>>>>>>>>>> with 8 threads.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Thank you !
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>>> Guillermo.
>>>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>>> users mailing list
>>>>>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>>>> https://gridengine.org/mailman/listinfo/users
>> 
> 
> 


