[SGE-discuss] Qmaster unresponsive, process status "disk sleep"

2017-06-27 Thread juanesteban.jime...@mdc-berlin.de
I’ve got a problem with my qmaster. It is running but does not respond to commands like qstat. The process state is mostly D (disk sleep), and when I run it in non-daemon debug mode it spends a lot of time reading the Master_Job_List. Any clues? Best regards, Juan Jimenez System Administrator, BIH HP

Re: [SGE-discuss] Qmaster unresponsive, process status "disk sleep"

2017-06-27 Thread juanesteban.jime...@mdc-berlin.de
Never mind. One of my users submitted a job with 139k subjobs. A few other questions: 1) Is it possible to stop a job submission if I know it’s going to make the qmaster croak? 2) Will switching to the Berkeley DB setup in the qmaster alleviate this? 3) Can that be done and still retain the exis

Re: [SGE-discuss] Qmaster unresponsive, process status "disk sleep"

2017-06-27 Thread juanesteban.jime...@mdc-berlin.de
So, if I reinstall using the Berkeley DB spooler, will this mitigate this kind of problem, or will the qmaster still want to commit hara-kiri by trying to load everything into memory from the DB? Best regards, Juan Jimenez System Administrator, BIH HPC Cluster MDC Berlin / IT-Dept. Tel.: +49 30 9406 2800

Re: [SGE-discuss] Qmaster unresponsive, process status "disk sleep"

2017-06-27 Thread juanesteban.jime...@mdc-berlin.de
I can’t get qmaster to respond. Memory is no longer an issue, but the queue is 138,000+ jobs long and qmaster is not responding to any control commands. I need to manually delete the master job list. Am I correct in assuming that if I delete all the subdirectories in the jobs folder in spool/qmaster,