[gridengine users] job cannot run in parallel environment "smp" because it only offers 2 slots

Txema Heredia Genestar Tue, 21 Feb 2012 11:21:40 -0800

Hello all,

I am having trouble running threaded jobs in SGE 6.1u4. In our cluster, h_vmem is defined as a consumable attribute on all nodes. It is mandatory: every job must request it, and the default value is 6 GB. Because of that constraint, any "parallel" job sent to the cluster tries to reserve a lot of memory (h_vmem * slots). This is fine for most parallel processes (MPI and the like). But sometimes we need to run "threaded" jobs, where all threads share one chunk of memory (everything on a single node). This leads to situations where I need to submit an 8-threaded job that requires, say, 10 GB of memory, but it cannot be scheduled because no node can handle an 80 GB request. When a memory request cannot be fulfilled, the typical message 'cannot run in PE "smp" because it only offers N slots' appears in qstat (where N is the maximum number of slots I would be able to use given the requested h_vmem size).
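To make the arithmetic above concrete, here is a minimal sketch in Python. It assumes a simplified model in which the scheduler multiplies the per-slot h_vmem request by the slot count and offers as many slots as fit into a node's free memory. The function names are hypothetical, and the real scheduler takes more into account than this.

```python
# Illustrative model only: the function names are made up for this example,
# and the real SGE scheduler considers more than free memory on one node.

def total_reservation_gb(h_vmem_gb, slots):
    """Memory reserved for a job when h_vmem is consumed once per slot."""
    return h_vmem_gb * slots


def max_slots_offered(free_mem_gb, h_vmem_gb):
    """Largest slot count whose total reservation fits in the free memory
    (assumed model for the N in the "only offers N slots" message)."""
    return int(free_mem_gb // h_vmem_gb)


# An 8-slot job requesting h_vmem=10 GB reserves 80 GB in total.
print(total_reservation_gb(10, 8))

# With 48 GB free and h_vmem=10 GB, this model offers at most 4 slots.
print(max_slots_offered(48, 10))
```

Under this model, the N in the qstat message is simply the free memory on the best candidate node divided by the requested h_vmem, rounded down.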


This is the parallel environment I am trying to use:

# qconf -sp smp
pe_name           smp
slots             9999
user_lists        test_users
xuser_lists       NONE
start_proc_args   /bin/true
stop_proc_args    /bin/true
allocation_rule   $fill_up
control_slaves    FALSE
job_is_first_task FALSE
urgency_slots     min

The most annoying part of all this is that the behaviour is not consistent: this morning I was able to run a 6-threaded job requesting 10 GB of memory on a 48 GB node. But in the afternoon, the same job, submitted with the very same command to the same node, could not be run.


Does anyone have any suggestions on how to deal with this?

Thanks in advance,

Txema

