We have a cluster running gridengine 6.5u2 and are noticing strange behavior
when running MPI jobs. Our application finishes, yet its processes
continue to run and use up CPU. We configured a parallel
environment for MPI as follows:
    pe_name             mpi
    slots               500
    user_lists          NONE
    xuser_lists         NONE
    start_proc_args     NONE
    stop_proc_args      NONE
    allocation_rule     $round_robin
    control_slaves      TRUE
    job_is_first_task   FALSE
    urgency_slots       min
    accounting_summary  FALSE
We then run our application, "Maker", like this:
    qsub -cwd -N <NAME> -b y -V -pe mpi <CPUs> \
        /opt/mpich-install/bin/mpiexec maker <maker options>
It seems to run fine, and qstat shows it running. Once it has
completed, qstat is empty again and we have the desired output.
However, the "maker" processes continue to run on the compute nodes
until I log in to each node and "kill -9" them. We did not have
this problem when running mpiexec directly with Maker, or when running
Maker in stand-alone mode (without MPI), so I guess the problem is with
our qsub command or parallel environment. Any ideas?
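
For reference, the per-node check I currently do by hand could be scripted
roughly as below. This is only a sketch, not something we run in production:
it assumes passwordless ssh to the compute nodes and that pgrep is installed
there. The function name list_stray and the RUN_ON variable are made up for
illustration; RUN_ON exists only so the remote command can be swapped out
for a dry run.

```shell
#!/bin/sh
# Sketch only: list leftover processes by name on a set of nodes.
# Assumptions: passwordless ssh to each node, pgrep available there.
# RUN_ON is a hypothetical hook naming the command used to reach a node.
RUN_ON=${RUN_ON:-ssh}

# Print "host pid" for each process owned by the current user whose
# name is exactly $1, on every host given after it.
# Usage: list_stray <process-name> <host>...
list_stray() {
    name=$1
    shift
    for host in "$@"; do
        # pgrep -x matches the exact process name; -u limits to one user.
        # stdin is redirected so the remote command cannot swallow input.
        $RUN_ON "$host" pgrep -u "$(id -un)" -x "$name" </dev/null |
            while read -r pid; do
                echo "$host $pid"
            done
    done
}
```

After sourcing the file, something like "list_stray maker node01 node02" would
print one "host pid" line per leftover process (node01 and node02 are
placeholder host names). It only reports; killing the processes is still a
separate, deliberate step.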
Thanks,
--
Chandler / Systems Administrator
Arizona Genomics Institute
www.genome.arizona.edu