Update:
Going through the spool messages file for comp065, I found this error:
10/21/2014 14:48:34|  main|comp065|E|can't start job "155": can't open file
/opt/gridengine/default/spool/comp065/active_jobs/155.1/pe_hostfile: No
such file or
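
For anyone who wants to filter lines like this out of a larger messages file, here is a minimal Python sketch. It assumes the line is five fields separated by '|', which is inferred only from the single line above. The field names (timestamp, component, host, level, text) are my own labels, not names taken from the Grid Engine documentation.

```python
def parse_message_line(line):
    """Split a 'timestamp|component|host|level|text' line into named fields.

    The field names are illustrative labels, inferred from one sample line.
    """
    # Split on the first four '|' only, so any '|' inside the text is kept.
    timestamp, component, host, level, text = line.split("|", 4)
    return {
        "timestamp": timestamp.strip(),
        "component": component.strip(),
        "host": host.strip(),
        "level": level.strip(),
        "text": text.strip(),
    }


if __name__ == "__main__":
    sample = '10/21/2014 14:48:34|  main|comp065|E|can\'t start job "155": can\'t open file'
    print(parse_message_line(sample))
```

Filtering on the level field (here "E") would then pick out lines like the one above.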

Note that the spool directory is an NFS-mounted directory.

I tried opening up all permissions on the spool directory to rule out a
permission problem, but the error persists.
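
Since permission bits alone may not tell the whole story on an NFS mount, a rough way to check is to actually create a file in the directory from the compute node. The Python sketch below does that. The function name check_spool_dir is hypothetical, the path is the one from the error message, and the result is only meaningful if the script runs on the node as the same user that writes the spool files.

```python
import os
import tempfile


def check_spool_dir(path):
    """Report whether `path` exists, is a directory, and accepts new files."""
    result = {
        "exists": os.path.exists(path),
        "is_dir": os.path.isdir(path),
        "writable": False,
    }
    if result["is_dir"]:
        try:
            # Creating (and automatically deleting) a real file exercises the
            # mount itself, which permission bits alone may not reflect on NFS.
            with tempfile.NamedTemporaryFile(dir=path):
                result["writable"] = True
        except OSError:
            pass
    return result


if __name__ == "__main__":
    # Path taken from the error message; adjust the host name per node.
    print(check_spool_dir("/opt/gridengine/default/spool/comp065/active_jobs"))
```

If "writable" comes back False even with open permissions, the problem is more likely in how the directory is exported or mounted than in the mode bits.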

---
Regards,
Waleed Lutfi

Cell: (+20) 122 344 8269

On Tue, Oct 21, 2014 at 2:43 PM, Waleed Lutfi <waleed.lutf...@gmail.com>
wrote:

> Dear all,
>
> I am currently configuring Grid Engine on a fresh install of a Rocks
> cluster. I have 3 compute nodes. Whenever I submit a job, it runs on only
> one of the nodes; the jobs sent to the other nodes hang in the 't' state.
>
> Running 'qconf -tsm', I get the following log:
>
> Tue Oct 21 14:36:49 2014|-------------START-SCHEDULER-RUN-------------
> Tue Oct 21 14:36:49 2014|queue instance "all.q@comp065.local" dropped
> because it is temporarily not available
> Tue Oct 21 14:36:49 2014|queue instance "all.q@comp067.local" dropped
> because it is temporarily not available
> Tue Oct 21 14:36:49 2014|queues dropped because they are temporarily not
> available: all.q@comp065.local all.q@comp067.local
> Tue Oct 21 14:36:49 2014|JOB 153.1 [1] in queue all.q@comp066.local
> increased absolute lc of host comp066.local by 95 to 95
> Tue Oct 21 14:36:49 2014|no pending jobs to perform scheduling on
> Tue Oct 21 14:36:49 2014|--------------STOP-SCHEDULER-RUN-------------
>
> I reinstalled the problematic nodes, but I still get the same error.
>
> Any help would be appreciated. Thank you all for your time.
>
> ---
> Regards,
> Waleed Lutfi
>
> Cell: (+20) 122 344 8269
>
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
