We have a cluster running gridengine 6.5u2 and are noticing strange behavior when running MPI jobs. Our application finishes, yet its processes continue to run and use up CPU. We configured a parallel environment for MPI as follows:

pe_name            mpi
slots              500
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE
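
For reference, the settings above can be read back with "qconf -sp mpi". Below is a minimal sketch for pulling a single attribute out of that output, for example to confirm that control_slaves is really TRUE on the live configuration. The helper name pe_value is made up for illustration and is not part of gridengine.

```shell
#!/bin/sh
# pe_value (hypothetical helper): print the value of one attribute
# from "qconf -sp <pe_name>" style output read on stdin.
pe_value() {
    awk -v key="$1" '$1 == key { print $2; exit }'
}

# Example usage on the cluster (requires qconf in PATH):
#   qconf -sp mpi | pe_value control_slaves
```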

We then run our application, "Maker", like this:
qsub -cwd -N <NAME> -b y -V -pe mpi <CPUs> /opt/mpich-install/bin/mpiexec maker <maker options>
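
An alternative we could try is submitting a small job script instead of the bare mpiexec binary, so that the host allocation is passed to mpiexec explicitly. The sketch below assumes that gridengine sets $PE_HOSTFILE, $NSLOTS and $JOB_ID inside a parallel job, and that our mpiexec is MPICH's Hydra launcher, which accepts "-f <machinefile>" with "host:count" lines. Neither assumption has been verified on this cluster, and the function name and machinefile name are placeholders.

```shell
#!/bin/sh
# pe_to_machinefile (hypothetical helper): convert a gridengine PE hostfile
# ("host slots queue processor-range" per line) into "host:slots" lines.
pe_to_machinefile() {
    awk '{ print $1 ":" $2 }' "$1"
}

# Inside a job script (assumed layout, not from our current setup):
#   pe_to_machinefile "$PE_HOSTFILE" > "machines.$JOB_ID"
#   /opt/mpich-install/bin/mpiexec -f "machines.$JOB_ID" -n "$NSLOTS" maker <maker options>
```

Note that this only controls where the ranks are placed. If mpiexec starts the remote ranks over ssh rather than through gridengine's "qrsh -inherit", the scheduler may still be unable to track or clean them up.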

The job seems to run fine, and qstat shows it running. Once it has completed, qstat is empty again and we have the desired output. However, the "maker" processes continue to run on the compute nodes until I log in to each node and "kill -9" them. We did not have this problem when running mpiexec directly with Maker, or when running Maker in stand-alone mode (without MPI), so I guess the problem is with our qsub command or parallel environment. Any ideas?
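
To see which nodes still hold leftover processes without logging in to each one by hand, something like the following could be run from the head node. It is only a sketch: it assumes passwordless ssh to the compute nodes and a file with one node name per line (nodes.txt here is a placeholder), and the helper name list_stray is made up.

```shell
#!/bin/sh
# list_stray (hypothetical helper): read "ps -eo pid,ppid,user,comm" output
# on stdin and print the PIDs of processes whose command name matches $1
# (defaults to "maker"). The first line is skipped as the ps header.
list_stray() {
    awk -v name="${1:-maker}" 'NR > 1 && $4 == name { print $1 }'
}

# Example (assumes passwordless ssh and a hypothetical nodes.txt):
#   while read -r node; do
#       ssh -n "$node" ps -eo pid,ppid,user,comm | list_stray maker | sed "s/^/$node: /"
#   done < nodes.txt
```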

Thanks,
--
Chandler / Systems Administrator
Arizona Genomics Institute
www.genome.arizona.edu
