Howdy.

A user accidentally submitted a job array with 1.4 BILLION tasks on our HPC cluster. How can I remove it?

I cannot qdel or qhold the job because either command crashes SGE. I can restart SGE just fine, but the job remains.

I removed the SGE job script itself from /var/spool/sge/job_scripts and restarted SGE, but the job remains.

The only thing I can do is remove tasks either one at a time or in groups. That works, but at 1.4 BILLION tasks it will take a while.
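
For anyone who wants to reproduce the group-wise removal, here is a minimal sketch of how it could be scripted. It assumes the `qdel <job_id> -t <first>-<last>` form is what works on the cluster. The function names, the job ID and the group size are hypothetical and only there for illustration. By default it only prints the qdel commands (DRY_RUN=1), so nothing is deleted until DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch only: function names, job ID and group size are assumptions,
# not taken from the original setup.

# Print task ranges "first-last" that cover 1..total in groups of "chunk".
chunk_ranges() {
    total=$1
    chunk=$2
    start=1
    while [ "$start" -le "$total" ]; do
        end=$((start + chunk - 1))
        # Clamp the last group to the final task ID.
        if [ "$end" -gt "$total" ]; then
            end=$total
        fi
        echo "${start}-${end}"
        start=$((end + 1))
    done
}

# Remove the tasks of one array job group by group.
# Arguments: job ID, total number of tasks, group size.
# With DRY_RUN=1 (the default) the qdel commands are printed, not run.
delete_in_chunks() {
    job_id=$1
    chunk_ranges "$2" "$3" | while read -r range; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "qdel $job_id -t $range"
        else
            qdel "$job_id" -t "$range" || echo "qdel failed for tasks $range" >&2
        fi
    done
}
```

With a hypothetical job ID of 12345, `delete_in_chunks 12345 1400000000 100000` would step through 14,000 groups of 100,000 tasks each. The group size is a guess; a smaller value may be needed if large ranges also upset SGE.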

I have added max_aj_tasks to the SGE configuration to prevent this in the future.

   # qconf -sconf|grep tasks
   max_aj_tasks                 100000


Any help appreciated.

Thank you,
Joseph

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
