I’ve got a problem with my qmaster. It is running but is unresponsive to
commands like qstat. The process state is mostly D (uninterruptible sleep,
usually waiting on disk I/O), and when I run it in non-daemon debug mode it
spends a LOT of time reading the Master_Job_List.
Any clues?
Best regards,
Juan Jimenez
System Administrator, BIH HP
Never mind. One of my users submitted a job with 139k subjobs.
A few other questions:
1) Is it possible to stop a job submission if I know it’s going to make the
qmaster croak?
2) Will switching to the Berkeley DB setup in the qmaster alleviate this?
3) Can that be done and still retain the exis
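
On question 1, a rough sketch of the kind of check a submission wrapper could run before a job ever reaches the qmaster. It assumes the subjobs come from an array job submitted with a task range of the form n[-m[:s]], as accepted by qsub -t. The function names and the 50,000 threshold are made up for illustration and are not part of Grid Engine. sge_conf(5) also lists built-in limits such as max_aj_tasks, max_u_jobs and max_jobs; check what your version supports.

```python
# Hypothetical pre-submission check; nothing here calls Grid Engine itself.

HYPOTHETICAL_TASK_LIMIT = 50_000  # assumed site policy, not a Grid Engine default


def count_array_tasks(task_range: str) -> int:
    """Count the tasks in a range written as n, n-m, or n-m:s."""
    bounds, _, step_text = task_range.partition(":")
    step = int(step_text) if step_text else 1
    first_text, _, last_text = bounds.partition("-")
    first = int(first_text)
    last = int(last_text) if last_text else first
    if step < 1 or first < 1 or last < first:
        raise ValueError(f"invalid task range: {task_range!r}")
    # Tasks are first, first+step, ... up to and including last if reachable.
    return (last - first) // step + 1


def too_large(task_range: str, limit: int = HYPOTHETICAL_TASK_LIMIT) -> bool:
    """Return True when the range asks for more tasks than the limit allows."""
    return count_array_tasks(task_range) > limit


if __name__ == "__main__":
    # A range the size of the job that caused the trouble described above.
    print(too_large("1-139000"))
```

A wrapper around qsub could refuse the submission when too_large() returns True. This only catches array jobs; a user submitting 139k separate jobs in a loop would need a per-user job limit instead.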
So, if I reinstall using the Berkeley DB spooler, will this mitigate this kind
of problem, or will the qmaster still want to commit hara-kiri by trying to
load everything into memory from the DB?
Best regards,
Juan Jimenez
System Administrator, BIH HPC Cluster
MDC Berlin / IT-Dept.
Tel.: +49 30 9406 2800
I can’t get qmaster to respond. Memory is no longer an issue but the queue is
138,000+ jobs long and it’s not responding to any control commands. I need to
manually delete the master job list.
Am I correct in assuming that if I delete all the subdirectories in the jobs
folder in spool/qmaster,