Hi,

> On 15.08.2016 at 18:48, Ulrich Hiller <hil...@mpia-hd.mpg.de> wrote:
>
> Excuse me, I made a stupid mistake. The extra mpihello processes were
> leftovers from previous runs (SGE processes aborted by the qdel
> command). So in this respect the world is as it should be: the number
> of processes on the nodes now sums to the number of allocated slots.
> I have attached the output of 'ps -e f' on the master node and the
> output of 'qstat -g t -u ulrich'.
>
> This looks correct to me.
>
> The original problem remains: why do jobs allocate cores on a node
> but do nothing there?
> As I wrote before, OpenMP is probably not involved.
> The qmaster/messages file says nothing about hanging/pending jobs.
>
> The problem is that today I could not reproduce nodes that do nothing
> although their cores are allocated. Let me test a bit until I can
> reproduce the problem; then I will send you the output of 'ps -e f'
> and qstat.
Fine.

> Is there anything else which I could test?

Not for now.

-- Reuti

> With kind regards, and thanks a lot for your help so far, ulrich
>
>
> On 08/15/2016 05:37 PM, Reuti wrote:
>>
>>> On 15.08.2016 at 17:03, Ulrich Hiller <hil...@mpia-hd.mpg.de> wrote:
>>>
>>> Hello,
>>>
>>> thank you for the clarification. I must have misunderstood you.
>>> Now I did it. In the example I am sending now, the master node was
>>> exec-node01 (it varied from attempt to attempt). The output is in
>>> the master-node file. The qstat file is the output of
>>>
>>>    qstat -g t -u '*'
>>>
>>> That seems to look normal.
>>>
>>> Then I created a simple C file with an intentionally endless loop:
>>>
>>>    #include <stdio.h>
>>>
>>>    int main(void)
>>>    {
>>>        /* endless loop on purpose: keep every rank busy */
>>>        for (;;) {
>>>            puts("Hello");
>>>        }
>>>        return 0;
>>>    }
>>>
>>> and compiled it:
>>>
>>>    mpicc mpihello.c -o mpihello
>>>
>>> and started qsub:
>>>
>>>    qsub -pe orte 300 -j yes -cwd -S /bin/bash <<< "mpiexec -n 300 mpihello"
>>>
>>> The outputs look the same as for the sleep command above.
>>> But now I counted the jobs:
>>>
>>>    qstat -g t -u '*' | grep -ic slave
>>>
>>> This results in the number 300, which I expected.
>>>
>>> On the execute nodes I did:
>>>
>>>    ps -ef | grep mpihello | grep -v grep | grep -vc mpiexec
>>
>> f w/o -:
>>
>> $ ps -e f
>>
>> will list a nice tree of the processes.
>>
>>> (i.e. I counted the mpihello processes). This is the result:
>>>
>>>    exec-node01: 43
>>>    exec-node02: 82
>>>    exec-node03: 83
>>>    exec-node04: 82
>>>    exec-node05: 82
>>>    exec-node06: 80
>>>    exec-node07: 64
>>>    exec-node08: 64
>>
>> To investigate this it would be good to post the complete slot
>> allocation by `qstat -g t -u <your user>`, plus `ps -e f --cols=500`
>> from the master of the MPI application and from one of the slave
>> nodes. Any "mpihello" in the path?
>>
>> -- Reuti
>>
>>> This gives a sum of 580. When I add up the free slots (from
>>> 'qhost -q') I also get 300, which I expect.
>>> Where do the extra processes on the nodes come from?
>>>
>>> This difference is reproducible.
>>>
>>> The libgomp.so.1.0.0 library is installed, but apart from that
>>> nothing with OpenMP.
>>>
>>> With kind regards, ulrich
>>>
>>> On 08/15/2016 02:30 PM, Ulrich Hiller wrote:
>>>> Hello,
>>>>
>>>>> The other issue seems to be that in fact your job is using only
>>>>> one machine, which means that it is essentially ignoring any
>>>>> granted slot allocation. While the job is running, can you please
>>>>> execute on the master node of the parallel job:
>>>>>
>>>>> $ ps -e f
>>>>>
>>>>> (f w/o -) and post the relevant lines belonging to either
>>>>> sge_execd or just running as kids of the init process, in case
>>>>> they jumped out of the process tree. Maybe a good start would be
>>>>> to execute something like `mpiexec sleep 300` in the jobscript.
>>>>
>>>> I invoked
>>>>
>>>>    qsub -pe orte 160 -j yes -cwd -S /bin/bash <<< "mpiexec -n 160 sleep 300"
>>>>
>>>> The only line ('ps -e f') on the master node was:
>>>>
>>>>    55722 ?  Sl  3:42 /opt/sge/bin/lx-amd64/sge_qmaster
>>>>
>>>> No other SGE lines, no child processes of it, and no other init
>>>> processes leading to SGE, while at the same time the sleep
>>>> processes were running on the nodes (checked with ps on the nodes).
>>>>
>>>> The qstat command gave:
>>>>
>>>>    264 0.60500 STDIN ulrich r 08/15/2016 11:33:02  all.q@exec-node01 MASTER
>>>>                                                    all.q@exec-node01 SLAVE
>>>>                                                    all.q@exec-node01 SLAVE
>>>>                                                    all.q@exec-node01 SLAVE
>>>>    [...]
>>>>    264 0.60500 STDIN ulrich r 08/15/2016 11:33:02  all.q@exec-node03 SLAVE
>>>>                                                    all.q@exec-node03 SLAVE
>>>>                                                    all.q@exec-node03 SLAVE
>>>>    [...]
>>>>    264 0.60500 STDIN ulrich r 08/15/2016 11:33:02  all.q@exec-node05 SLAVE
>>>>                                                    all.q@exec-node05 SLAVE
>>>>    [...]
>>>>
>>>> Because only the master daemon was running on the master node, and
>>>> you were talking about child processes: is this the normal
>>>> behaviour for my cluster, or is something wrong?
>>>>
>>>> Kind regards, ulrich
>>>>
>>>> On 08/12/2016 07:11 PM, Reuti wrote:
>>>>> Hi,
>>>>>
>>>>>> On 12.08.2016 at 18:48, Ulrich Hiller <hil...@mpia-hd.mpg.de> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I see a strange effect, and I am not sure whether it is "only" a
>>>>>> misconfiguration or a bug.
>>>>>>
>>>>>> First: I run Son of Grid Engine 8.1.9-1.el6.x86_64 (I installed
>>>>>> the RHEL rpm on an openSUSE 13.1 machine; this should not matter
>>>>>> here, and it is reported to run on openSUSE).
>>>>>>
>>>>>> mpirun and mpiexec are from openmpi-1.10.3 (no other MPI was
>>>>>> installed, neither on the master nor on the slaves). The
>>>>>> installation was made with:
>>>>>>
>>>>>>    ./configure --prefix=`pwd`/build --disable-dlopen --disable-mca-dso \
>>>>>>        --with-orte --with-sge --with-x --enable-mpi-thread-multiple \
>>>>>>        --enable-orterun-prefix-by-default --enable-mpirun-prefix-by-default \
>>>>>>        --enable-orte-static-ports --enable-mpi-cxx --enable-mpi-cxx-seek \
>>>>>>        --enable-oshmem --enable-java --enable-mpi-java
>>>>>>    make
>>>>>>    make install
>>>>>>
>>>>>> I attached the outputs of 'qconf -sq all.q', 'qconf -sconf' and
>>>>>> 'qconf -sp orte' as text files.
>>>>>>
>>>>>> Now my problem: I asked for 20 cores, and if I run qstat -u '*'
>>>>>> it shows this job running on slave07 with 20 cores, but that is
>>>>>> not true! If I run qstat -f -u '*' I see that this job is using
>>>>>> only 3 cores on slave07, and the 17 cores allocated to this job
>>>>>> on other nodes are in fact unused!
>>>>>
>>>>> qstat will list only the master node of the parallel job and the
>>>>> number of overall slots. The granted allocation you can check with:
>>>>>
>>>>> $ qstat -g t -u '*'
>>>>>
>>>>> The other issue seems to be that in fact your job is using only
>>>>> one machine, which means that it is essentially ignoring any
>>>>> granted slot allocation. While the job is running, can you please
>>>>> execute on the master node of the parallel job:
>>>>>
>>>>> $ ps -e f
>>>>>
>>>>> (f w/o -) and post the relevant lines belonging to either
>>>>> sge_execd or just running as kids of the init process, in case
>>>>> they jumped out of the process tree. Maybe a good start would be
>>>>> to execute something like `mpiexec sleep 300` in the jobscript.
>>>>>
>>>>> A next step could be a `mpihello.c` with an almost endless loop
>>>>> inside, compiled with all optimizations switched off, to check
>>>>> whether the slave processes are distributed in the correct way.
>>>>>
>>>>> Note that some applications will check the number of cores they
>>>>> are running on and start via OpenMP (not Open MPI) as many threads
>>>>> as cores are found. Could this be the case for your application
>>>>> too?
>>>>>
>>>>> -- Reuti
>>>>>
>>>>>> Another example: my job took, say, 6 CPUs on slave07 and 14 on
>>>>>> slave06, but nothing was running on 06; therefore resources are
>>>>>> wasted on 06 and an overload on 07 becomes highly possible (the
>>>>>> numbers are made up).
>>>>>> If I ran 1-CPU jobs independently that would not be an issue,
>>>>>> but imagine I now request 60 CPUs on slave07; that would
>>>>>> seriously overload the node in many cases.
>>>>>>
>>>>>> Another example: if I ask for, say, 50 CPUs, the job will start
>>>>>> on one node, e.g. slave01, but reserve only, say, 15 of its 64
>>>>>> CPUs and reserve the rest on many other nodes (obviously wasting
>>>>>> resources doing nothing). This has the bad consequence of
>>>>>> allocating many more CPUs than are available when many jobs are
>>>>>> running. Imagine you have 10 jobs like this one... some nodes
>>>>>> will run maybe 3 of them even if they only have 24 CPUs...
>>>>>>
>>>>>> I hope that I have made clear what the issue is.
>>>>>>
>>>>>> I also see that `qstat` and `qstat -f` are in disagreement. The
>>>>>> latter is correct; I checked the processes running on the nodes.
>>>>>>
>>>>>> Did somebody already encounter such a problem? Does somebody have
>>>>>> an idea where to look or what to test?
>>>>>>
>>>>>> With kind regards, ulrich
>>>>>>
>>>>>> <qhost.txt> <qconf-sconf.txt> <qconf-mp-orte.txt> <qconf-all.q>
>>>
>>> <qstat.txt> <master-node.txt>
>
> <ps.txt> <qstat.txt>
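---

A few sketches, distilled from the commands discussed in this thread.

Since the suggestion was to switch off all optimizations for the test
program, the compile line for the endless-loop mpihello.c above would
become:

    mpicc -O0 mpihello.c -o mpihello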
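With an Open MPI built --with-sge, as in this thread, a tightly
integrated jobscript does not even need an explicit -n: mpiexec takes
the granted slot count from the parallel environment. A minimal sketch
(the PE name orte and the program name mpihello are taken from the
thread; the rest is illustrative):

    #!/bin/bash
    #$ -pe orte 300        # request 300 slots in the 'orte' PE
    #$ -j yes              # merge stdout and stderr
    #$ -cwd                # run in the submission directory
    #$ -S /bin/bash        # interpret the jobscript with bash

    # With SGE support compiled in, Open MPI starts one process per
    # granted slot, so no hard-coded "-n 300" is needed:
    mpiexec ./mpihello

    # Equivalent, using the slot count SGE exports to the job:
    # mpiexec -n $NSLOTS ./mpihello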
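Counting the per-node processes by hand is error-prone; a small loop
over the execution hosts makes the comparison with the granted
allocation repeatable. A sketch, assuming passwordless ssh to the
nodes and the host names used above:

    #!/bin/bash
    # Count the running mpihello processes on each execution host and
    # compare the total with the slots granted by SGE.
    total=0
    for h in exec-node0{1..8}; do
        n=$(ssh "$h" pgrep -xc mpihello)
        echo "$h: $n"
        total=$((total + n))
    done
    echo "total processes: $total"
    echo "granted slots:   $(qstat -g t -u "$USER" | grep -ic slave)"

If the two numbers disagree, the surplus processes are often leftovers
from aborted runs, which is exactly what the qdel leftovers at the top
of this thread turned out to be.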
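And to collect in one go exactly what was asked for, the slot
allocation plus wide process trees from the master of the MPI job and
from one slave, something along these lines (exec-node01 and
exec-node03 stand in for the actual master and slave of the job):

    #!/bin/bash
    # Snapshot the debugging information requested in the thread: the
    # granted slot allocation and full process trees, wide enough that
    # long mpiexec command lines are not truncated.
    qstat -g t -u "$USER"              > qstat-g-t.txt
    ssh exec-node01 ps -e f --cols=500 > ps-master.txt  # master of the MPI job
    ssh exec-node03 ps -e f --cols=500 > ps-slave.txt   # one of the slave nodes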