Thanks. This is the allocation, which is also confirmed by the Open MPI output.
[eg: ] Exactly, but it is not the one used afterwards by Open MPI.

- Was the application compiled with the same version of Open MPI?
[eg: ] Yes, version 1.4.4 for all of them.
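For what it is worth, here is a quick way to double-check which Open MPI the 
application actually picks up at run time (a sketch, assuming the paths from 
the ps output below and that the binary links Open MPI dynamically):

  # MPI library the application binary resolves at run time
  ldd /opt/fft/actran_product/Actran_13.0.b.57333/bin/actranpy_mp | grep -i mpi

  # version of the Open MPI installation used by mpiexec/orted
  /opt/openmpi-1.4.4/bin/ompi_info | grep "Open MPI:"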

- Does the application start something on its own besides the tasks granted by 
mpiexec/orterun?
[eg: ] No.

You want 12 ranks in total, and for barney.fft and carl.fft "-mca 
orte_ess_num_procs 3" is also passed to the qrsh_starter. In this example I 
count only 10 ranks in total - 4+4+2 - do you observe the same?
[eg: ] I don't know why "-mca orte_ess_num_procs 3" is added here...
In the "Map generated by mapping policy" output in my last email, I see that 4 
processes were started on each node (barney, carl and charlie), but yes, in the 
ps -elf output, two of them are missing for one node (barney)... sorry about 
that, a bad copy/paste. Here is the actual output for this node:
2048 ?        Sl     3:33 /opt/sge/bin/lx-amd64/sge_execd
27502 ?        Sl     0:00  \_ sge_shepherd-1416 -bg
27503 ?        Ss     0:00      \_ /opt/sge/utilbin/lx-amd64/qrsh_starter 
/opt/sge/default/spool/barney/active_jobs/1416.1/1.barney
27510 ?        S      0:00          \_ bash -c  
PATH=/opt/openmpi-1.4.4/bin:$PATH ; export PATH ; 
LD_LIBRARY_PATH=/opt/openmpi-1.4.4/lib:$LD_LIBRARY_PATH ; export 
LD_LIBRARY_PATH ;  /opt/openmpi-1.4.4/bin/orted -mca ess env -mca 
orte_ess_jobid 3800367104 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 3 
--hnp-uri "3800367104.0;tcp://192.168.0.20:57233" --mca pls_gridengine_verbose 
1 --mca ras_gridengine_show_jobid 1 --mca ras_gridengine_verbose 1
27511 ?        S      0:00              \_ /opt/openmpi-1.4.4/bin/orted -mca 
ess env -mca orte_ess_jobid 3800367104 -mca orte_ess_vpid 1 -mca 
orte_ess_num_procs 3 --hnp-uri 3800367104.0;tcp://192.168.0.20:57233 --mca 
pls_gridengine_verbose 1 --mca ras_gridengine_show_jobid 1 --mca 
ras_gridengine_verbose 1
27512 ?        Rl    12:54                  \_ 
/opt/fft/actran_product/Actran_13.0.b.57333/bin/actranpy_mp 
--apl=/opt/fft/actran_product/Actran_13.0.b.57333 -e radiation -m 10000 
--parallel=frequency --scratch=/scratch/cluster/1416 
--inputfile=/home/jj/Projects/Toyota/REFERENCE_JPC/semi_green_PML_06/semi_green_coarse.edat
27513 ?        Rl    12:54                  \_ 
/opt/fft/actran_product/Actran_13.0.b.57333/bin/actranpy_mp 
--apl=/opt/fft/actran_product/Actran_13.0.b.57333 -e radiation -m 10000 
--parallel=frequency --scratch=/scratch/cluster/1416 
--inputfile=/home/jj/Projects/Toyota/REFERENCE_JPC/semi_green_PML_06/semi_green_coarse.edat
27514 ?        Rl    12:54                  \_ 
/opt/fft/actran_product/Actran_13.0.b.57333/bin/actranpy_mp 
--apl=/opt/fft/actran_product/Actran_13.0.b.57333 -e radiation -m 10000 
--parallel=frequency --scratch=/scratch/cluster/1416 
--inputfile=/home/jj/Projects/Toyota/REFERENCE_JPC/semi_green_PML_06/semi_green_coarse.edat
27515 ?        Rl    12:53                  \_ 
/opt/fft/actran_product/Actran_13.0.b.57333/bin/actranpy_mp 
--apl=/opt/fft/actran_product/Actran_13.0.b.57333 -e radiation -m 10000 
--parallel=frequency --scratch=/scratch/cluster/1416 
--inputfile=/home/jj/Projects/Toyota/REFERENCE_JPC/semi_green_PML_06/semi_green_coarse.edat
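
To rule out further copy/paste issues, the per-node count can also be checked 
directly and compared with what SGE granted. A minimal sketch, assuming the 
node names from this thread, passwordless ssh between them, and the binary 
name actranpy_mp:

  # ranks actually running on each node of the job
  for host in barney carl charlie; do
      printf "%s: " "$host"
      ssh "$host" pgrep -c -f actranpy_mp
  done

  # allocation granted by SGE, as seen from inside the job script
  # (one line per host: hostname slots queue processor-range)
  cat "$PE_HOSTFILE"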

It looks like Open MPI is doing the right thing, but the applications decided 
to start in a different allocation.
[eg: ] If the "Map generated by mapping policy" is different from the SGE 
allocation, then Open MPI is not doing the right thing, don't you think?
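One way to compare the two sides directly is to let mpiexec print both the 
allocation it read from SGE and the process map it derived from it. A sketch, 
assuming your 1.4.4 build accepts these options (they are documented for 
later Open MPI releases):

  # print the node/slot allocation read from SGE and the resulting map
  mpiexec --display-allocation --display-map -np 12 <application and args>

If the allocation printed there already differs from $PE_HOSTFILE, the 
problem is in the Open MPI/SGE integration; if the allocation matches but the 
map does not, the mapping policy is the place to look.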

Does the application use OpenMP or other kinds of threads in addition? The 
suffix "_mp" in the name "actranpy_mp" makes me suspicious about that.
[eg: ] No, the suffix "_mp" stands for "parallel".
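If it helps to confirm that on the cluster side as well, the thread count per 
rank can be read from ps. A sketch; NLWP is the number of lightweight 
processes, so a value of 1 means no extra threads were spawned:

  # threads per running actranpy_mp rank
  for pid in $(pgrep -f actranpy_mp); do
      printf "%s: %s thread(s)\n" "$pid" "$(ps -o nlwp= -p "$pid")"
  done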



