Update: Going through the spool messages of comp065, I found this message:

10/21/2014 14:48:34| main|comp065|E|can't start job "155": can't open file /opt/gridengine/default/spool/comp065/active_jobs/155.1/pe_hostfile: No such file or
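A minimal sketch for checking, from the node itself, whether files can actually be created under that spool path. It assumes Python is available on comp065; the helper name check_spool_dir is hypothetical and not part of Grid Engine, and the path is the one from the error message.

```python
import os
import tempfile


def check_spool_dir(path):
    """Report whether `path` exists, is a directory, and accepts new files."""
    result = {
        "exists": os.path.exists(path),
        "is_dir": os.path.isdir(path),
        "writable": False,
    }
    if result["is_dir"]:
        try:
            # Create and remove a real file instead of trusting os.access(),
            # which can be misleading on network filesystems such as NFS.
            with tempfile.NamedTemporaryFile(dir=path):
                pass
            result["writable"] = True
        except OSError:
            pass
    return result


if __name__ == "__main__":
    # Path taken from the execd error message; adjust the host name per node.
    print(check_spool_dir("/opt/gridengine/default/spool/comp065/active_jobs"))
```

Running it on the node as the same user that runs sge_execd gives the most relevant answer, since that is the process that needs to write the job files.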
Note that the spool directory is an NFS-mounted directory. I tried opening up all permissions on the spool directory to rule out permission issues, but the problem persists.

---
Regards,
Waleed Lutfi

Cell: (+20) 122 344 8269

On Tue, Oct 21, 2014 at 2:43 PM, Waleed Lutfi <waleed.lutf...@gmail.com> wrote:

> Dear all,
>
> I am currently configuring Grid Engine on a fresh install of Rocks
> cluster. I have 3 compute nodes. Whenever I submit any job it only runs on
> 1 of the nodes and the other nodes' jobs halt in 't' state.
>
> Running 'qconf -tsm', I get the following log:
>
> Tue Oct 21 14:36:49 2014|-------------START-SCHEDULER-RUN-------------
> Tue Oct 21 14:36:49 2014|queue instance "all.q@comp065.local" dropped because it is temporarily not available
> Tue Oct 21 14:36:49 2014|queue instance "all.q@comp067.local" dropped because it is temporarily not available
> Tue Oct 21 14:36:49 2014|queues dropped because they are temporarily not available: all.q@comp065.local all.q@comp067.local
> Tue Oct 21 14:36:49 2014|JOB 153.1 [1] in queue all.q@comp066.local increased absolute lc of host comp066.local by 95 to 95
> Tue Oct 21 14:36:49 2014|no pending jobs to perform scheduling on
> Tue Oct 21 14:36:49 2014|--------------STOP-SCHEDULER-RUN-------------
>
> I reinstalled the problematic nodes, but I still get the same error.
>
> Any help would be appreciated. Thank you all for your time.
>
> ---
> Regards,
> Waleed Lutfi
>
> Cell: (+20) 122 344 8269
>
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users