Hi,

it turned out that the problem was caused by the program code itself:
the subprocesses interact with each other. The solution was to
configure the PE with allocation_rule "$pe_slots" so that SGE schedules
the whole job onto a single node.
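
In case somebody finds this thread later: in my setup it is the orte PE
that the jobs request, and essentially the only line that changed in its
definition is the allocation_rule, i.e. roughly:

$ qconf -sp orte | grep allocation_rule
allocation_rule    $pe_slots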

I am sorry to have bothered you with this "false alarm". At least I have
learned some things from you. Thanks a lot for all your help, Reuti.

With kind regards, ulrich



On 08/16/2016 12:16 PM, Reuti wrote:
> Hi,
> 
>> Am 15.08.2016 um 18:48 schrieb Ulrich Hiller <hil...@mpia-hd.mpg.de>:
>>
>> Excuse me, I have made a stupid mistake. The extra mpihello
>> processes were leftovers from previous runs (SGE processes aborted by
>> the qdel command). So in this respect the world is as it should be: the
>> number of processes on the nodes now sums up to the number of allocated
>> slots.
>> I have attached the output of the 'ps -e f' command of the master node
>> and the output of the 'qstat -g t -u ulrich' command.
>>
>> This seems to me to be correct.
>>
>> What remains is the original problem: why do jobs allocate cores on a
>> node but do nothing?
>> As I wrote before, there is probably no OpenMP involvement.
>> The qmaster/messages file does not say anything about hanging/pending jobs.
>>
>> The problem is that today I could not reproduce nodes which do nothing
>> although their cores are allocated. Let me test a bit until I can
>> reproduce the problem. Then I will send you the output of 'ps -e f' and
>> qstat.
> 
> Fine.
> 
> 
>> Is there anything else which I could test?
> 
> Not for now.
> 
> -- Reuti
> 
> 
>> With kind regards, and thanks a lot for your help so far, ulrich
>>
>>
>> On 08/15/2016 05:37 PM, Reuti wrote:
>>>
>>>> Am 15.08.2016 um 17:03 schrieb Ulrich Hiller <hil...@mpia-hd.mpg.de>:
>>>>
>>>> Hello,
>>>>
>>>> thank you for the clarification. I must have misunderstood you.
>>>> Now I did it. In the example I am sending now the master node was
>>>> exec-node01 (it varied from attempt to attempt). The output is in the
>>>> master-node file. The qstat file is the output of
>>>> qstat -g t -u '*'
>>>> That looks normal to me.
>>>>
>>>> Now I created a simple C file with an endless loop:
>>>> #include <stdio.h>
>>>>
>>>> int main(void)
>>>> {
>>>>     /* x = 10 is an assignment, so the condition is always true
>>>>        and the loop runs forever */
>>>>     int x;
>>>>     for (x = 0; x = 10; x = x + 1)
>>>>     {
>>>>         puts("Hello");
>>>>     }
>>>>     return 0;
>>>> }
>>>>
>>>> and compiled it:
>>>> mpicc mpihello.c -o mpihello
>>>> and started qsub:
>>>> qsub -pe orte 300 -j yes -cwd -S /bin/bash <<< "mpiexec -n 300 mpihello"
>>>> The outputs look the same as for the sleep command above.
>>>> But now I counted the jobs:
>>>>
>>>> qstat -g t -u '*' | grep -ic slave
>>>> This results in the number '300', which I expected.
>>>>
>>>> On the execute nodes I did:
>>>> ps -ef | grep mpihello | grep -v grep | grep -vc mpiexec
>>>
>>> f w/o -
>>>
>>> $ ps -e f
>>>
>>> will list a nice tree of the processes.
>>>
>>>
>>>> (I counted the 'mpihello' processes.)
>>>> This is the result:
>>>> exec-node01: 43
>>>> exec-node02: 82
>>>> exec-node03: 83
>>>> exec-node04: 82
>>>> exec-node05: 82
>>>> exec-node06: 80
>>>> exec-node07: 64
>>>> exec-node08: 64
>>>
>>> To investigate this it would be good to post the complete slot allocation
>>> from `qstat -g t -u <your user>`, plus the `ps -e f --cols=500` output of
>>> the master of the MPI application and of one of the slave nodes. Any
>>> "mpihello" in the path?
>>>
>>> -- Reuti
>>>
>>>
>>>> Which gives a sum of 580.
>>>> When I count the number of free slots together (from 'qhost -q') I also
>>>> get 300, which is what I expect.
>>>> Where do the extra processes on the nodes come from?
>>>>
>>>> This difference is reproducible.
>>>>
>>>> The libgomp.so.1.0.0 library is installed, but apart from that nothing
>>>> with OpenMP.
>>>>
>>>> With kind regards, ulrich
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 08/15/2016 02:30 PM, Ulrich Hiller wrote:
>>>>> Hello,
>>>>>
>>>>>> The other issue seems to be that in fact your job is using only one
>>>>>> machine, which means that it is essentially ignoring any granted slot
>>>>>> allocation. While the job is running, can you please execute on the
>>>>>> master node of the parallel job:
>>>>>>
>>>>>> $ ps -e f
>>>>>>
>>>>>> (f w/o -) and post the relevant lines belonging to either sge_execd or
>>>>>> just running as kids of the init process, in case they jumped out of the
>>>>>> process tree. Maybe a good start would be to execute something like
>>>>>> `mpiexec sleep 300` in the jobscript.
>>>>>>
>>>>>
>>>>> I invoked
>>>>> qsub -pe orte 160 -j yes -cwd -S /bin/bash <<< "mpiexec -n 160 sleep 300"
>>>>>
>>>>> The only line ('ps -e f') on the master node was:
>>>>> 55722 ?        Sl     3:42 /opt/sge/bin/lx-amd64/sge_qmaster
>>>>>
>>>>> No other SGE lines, no child processes of it, and no other processes
>>>>> under init leading to SGE, while at the same time the sleep processes
>>>>> were running on the nodes (checked with the ps command on the nodes).
>>>>>
>>>>> The qstat command gave:
>>>>>  264 0.60500 STDIN      ulrich       r     08/15/2016 11:33:02
>>>>> all.q@exec-node01                  MASTER
>>>>>
>>>>> all.q@exec-node01                  SLAVE
>>>>>
>>>>> all.q@exec-node01                  SLAVE
>>>>>
>>>>> all.q@exec-node01                  SLAVE
>>>>> [ ...]
>>>>>
>>>>> 264 0.60500 STDIN      ulrich       r     08/15/2016 11:33:02
>>>>> all.q@exec-node03                  SLAVE
>>>>>
>>>>> all.q@exec-node03                  SLAVE
>>>>>
>>>>> all.q@exec-node03                  SLAVE
>>>>> [ ... ]
>>>>>  264 0.60500 STDIN      ulrich       r     08/15/2016 11:33:02
>>>>> all.q@exec-node05                  SLAVE
>>>>>
>>>>> all.q@exec-node05                  SLAVE
>>>>> [ ...]
>>>>>
>>>>>
>>>>> Because only the master daemon was running on the master node, and you
>>>>> were talking about child processes: Is this the normal behaviour of my
>>>>> cluster, or is there something wrong?
>>>>>
>>>>> Kind regards, ulrich
>>>>>
>>>>>
>>>>>
>>>>> On 08/12/2016 07:11 PM, Reuti wrote:
>>>>>> Hi,
>>>>>>
>>>>>>> Am 12.08.2016 um 18:48 schrieb Ulrich Hiller <hil...@mpia-hd.mpg.de>:
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I have a strange effect, and I am not sure whether it is "only" a
>>>>>>> misconfiguration or a bug.
>>>>>>>
>>>>>>> First: I run Son of Grid Engine 8.1.9-1.el6.x86_64 (I installed the RHEL
>>>>>>> RPM on an openSUSE 13.1 machine. This should not matter in this case,
>>>>>>> and it is reported to run on openSUSE).
>>>>>>>
>>>>>>> mpirun and mpiexec are from openmpi-1.10.3 (no other MPI was installed,
>>>>>>> neither on the master nor on the slaves). The installation was made with:
>>>>>>> ./configure --prefix=`pwd`/build --disable-dlopen --disable-mca-dso
>>>>>>> --with-orte --with-sge --with-x --enable-mpi-thread-multiple
>>>>>>> --enable-orterun-prefix-by-default --enable-mpirun-prefix-by-default
>>>>>>> --enable-orte-static-ports --enable-mpi-cxx --enable-mpi-cxx-seek
>>>>>>> --enable-oshmem --enable-java --enable-mpi-java
>>>>>>> make
>>>>>>> make install
>>>>>>>
>>>>>>> I attached the outputs of 'qconf -ap all.q', 'qconf -sconf' and
>>>>>>> 'qconf -sp orte' as text files.
>>>>>>>
>>>>>>> Now my problem:
>>>>>>> I asked for 20 cores, and if I run qstat -u '*' it shows that this job
>>>>>>> is being run on slave07 using 20 cores, but that is not true! If I run
>>>>>>> qstat -f -u '*' I see that this job is only using 3 cores on slave07 and
>>>>>>> that there are 17 cores on other nodes allocated to this job which are
>>>>>>> in fact unused!
>>>>>>
>>>>>> qstat will list only the master node of the parallel job and the overall
>>>>>> number of slots. You can check the granted allocation with:
>>>>>>
>>>>>> $ qstat -g t -u '*'
>>>>>>
>>>>>> The other issue seems to be that in fact your job is using only one
>>>>>> machine, which means that it is essentially ignoring any granted slot 
>>>>>> allocation. While the job is running, can you please execute on the 
>>>>>> master node of the parallel job:
>>>>>>
>>>>>> $ ps -e f
>>>>>>
>>>>>> (f w/o -) and post the relevant lines belonging to either sge_execd or 
>>>>>> just running as kids of the init process, in case they jumped out of the 
>>>>>> process tree. Maybe a good start would be to execute something like 
>>>>>> `mpiexec sleep 300` in the jobscript.
>>>>>>
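>>>>>> A minimal jobscript sketch for this (just an example, assuming your orte
>>>>>> PE and, say, 20 slots; SGE exports the granted slot count as $NSLOTS):
>>>>>>
>>>>>> #!/bin/bash
>>>>>> #$ -S /bin/bash
>>>>>> #$ -cwd -j y
>>>>>> #$ -pe orte 20
>>>>>> # With the tight SGE integration of your --with-sge Open MPI build,
>>>>>> # mpiexec picks up the granted allocation itself, no hostfile needed.
>>>>>> mpiexec -n $NSLOTS sleep 300
>>>>>>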
>>>>>> Next step could be a `mpihello.c` where you put an almost endless loop
>>>>>> inside and switch off all optimizations during compilation, to check
>>>>>> whether these slave processes are distributed in the correct way.
>>>>>>
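>>>>>> A minimal sketch of what I mean (untested; it prints the rank and the
>>>>>> host name in a long loop, so you can see on which nodes the processes
>>>>>> actually run; compile it with `mpicc -O0 mpihello.c -o mpihello`):
>>>>>>
>>>>>> #include <mpi.h>
>>>>>> #include <stdio.h>
>>>>>> #include <unistd.h>
>>>>>>
>>>>>> int main(int argc, char **argv)
>>>>>> {
>>>>>>     int rank, size, len;
>>>>>>     char host[MPI_MAX_PROCESSOR_NAME];
>>>>>>
>>>>>>     MPI_Init(&argc, &argv);
>>>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>>>>     MPI_Get_processor_name(host, &len);
>>>>>>
>>>>>>     /* keep the processes alive so they can be inspected on the nodes
>>>>>>        with `ps -e f`; end the job with qdel */
>>>>>>     for (;;) {
>>>>>>         printf("rank %d of %d on %s\n", rank, size, host);
>>>>>>         fflush(stdout);
>>>>>>         sleep(1);
>>>>>>     }
>>>>>>
>>>>>>     MPI_Finalize();  /* never reached */
>>>>>>     return 0;
>>>>>> }
>>>>>>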
>>>>>> Note that some applications will check the number of cores they are
>>>>>> running on and will start, via OpenMP (not Open MPI), as many threads as
>>>>>> cores are found. Could this be the case for your application too?
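>>>>>>
>>>>>> (If this should be the case, a quick check would be to set in the
>>>>>> jobscript:
>>>>>>
>>>>>> export OMP_NUM_THREADS=1
>>>>>>
>>>>>> but that is only a guess from here.)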
>>>>>>
>>>>>> -- Reuti
>>>>>>
>>>>>>
>>>>>>> Or another example:
>>>>>>> My job took, say, 6 CPUs on slave07 and 14 on slave06, but nothing was
>>>>>>> running on 06; therefore a waste of resources on 06 and an overload on
>>>>>>> 07 becomes highly possible (the numbers are made up).
>>>>>>> If I ran single-CPU jobs as many independent jobs that would not be an
>>>>>>> issue, but imagine I now request 60 CPUs on slave07; that would
>>>>>>> seriously overload the node in many cases.
>>>>>>>
>>>>>>> Or another example:
>>>>>>> If I ask for, say, 50 CPUs, the job will start on one node, e.g.
>>>>>>> slave01, but reserve only, say, 15 CPUs out of 64 there and reserve the
>>>>>>> rest on many other nodes (obviously wasting space doing nothing).
>>>>>>> This has the bad consequence of allocating many more CPUs than are
>>>>>>> available when many jobs are running. Imagine you have 10 jobs like this
>>>>>>> one... some nodes will run maybe 3 of them even if they only have 24
>>>>>>> CPUs...
>>>>>>>
>>>>>>> I hope that I have made clear what the issue is.
>>>>>>>
>>>>>>> I also see that `qstat` and `qstat -f` are in disagreement. The latter
>>>>>>> is correct; I checked the processes running on the nodes.
>>>>>>>
>>>>>>>
>>>>>>> Has somebody already encountered such a problem? Does somebody have an
>>>>>>> idea where to look or what to test?
>>>>>>>
>>>>>>> With kind regards, ulrich
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> <qhost.txt><qconf-sconf.txt><qconf-mp-orte.txt><qconf-all.q>
>>>>>>
>>>> <qstat.txt><master-node.txt>
>>>
>> <ps.txt><qstat.txt>
> 
> 
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
