You could change the
consumable from YES to JOB.
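
As a rough sketch of what that change looks like: the h_vmem line shown by `qconf -sc` (and edited with `qconf -mc`) would go from the first form below to the second. Every column value other than the consumable field is an assumption, since the actual complex definition was not posted. As far as I know, the JOB setting only exists in Grid Engine 6.2u2 and later, so it may require an upgrade from 6.1u4.

```text
#name    shortcut  type    relop  requestable  consumable  default  urgency
h_vmem   h_vmem    MEMORY  <=     FORCED       YES         6G       0
h_vmem   h_vmem    MEMORY  <=     FORCED       JOB         6G       0
```

With JOB, the requested amount should be debited once per job rather than once per slot.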



On 02/21/2012 11:20 AM, Txema Heredia Genestar wrote:
Hello all,

I am having trouble running threaded jobs in SGE 6.1u4. In our cluster, h_vmem is defined as a consumable attribute on all nodes. It is mandatory: every job must request it, and the default value is 6 GB. Because of that constraint, any "parallel" job sent to the cluster tries to reserve a lot of memory (h_vmem * slots). This is fine for most parallel processes (MPI and the like). Sometimes, though, we need to run "threaded" jobs, where all threads share one chunk of memory (everything on a single node). This leads to situations where I need to submit an 8-thread job that requires, say, 10 GB of memory in total, but it cannot be scheduled because no node can satisfy an 80 GB request. When a memory request cannot be fulfilled, the usual message 'cannot run in PE "smp" because it only offers N slots' appears in qstat (where N is the maximum number of slots I would be able to use given the requested h_vmem size).
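
To make the arithmetic above concrete, here is a minimal sketch in shell. The function names are hypothetical helpers, not part of SGE. The first reproduces the h_vmem * slots multiplication; the second assumes the per-slot request could instead be set to the job total divided by the slot count.

```shell
#!/bin/sh
# Illustrative helpers only; these function names are hypothetical, not SGE commands.

# Memory reserved when h_vmem is a per-slot consumable:
# per-slot request (GB) multiplied by the number of slots.
total_reserved_gb() {
    per_slot_gb=$1
    slots=$2
    echo $(( per_slot_gb * slots ))
}

# Per-slot request (MB, rounded up) that adds up to a given job total.
per_slot_request_mb() {
    total_mb=$1
    slots=$2
    echo $(( (total_mb + slots - 1) / slots ))
}

total_reserved_gb 10 8        # the 8-thread, 10 GB example from the message
per_slot_request_mb 10240 8   # 10 GB spread evenly over 8 slots
```

Under that assumption, a submission along the lines of `qsub -pe smp 8 -l h_vmem=1280M` would reserve about 10 GB in total. Whether the resulting limit suits a threaded program depends on how the cluster enforces h_vmem.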

This is the parallel environment I am trying to use:

# qconf -sp smp
pe_name           smp
slots             9999
user_lists        test_users
xuser_lists       NONE
start_proc_args   /bin/true
stop_proc_args    /bin/true
allocation_rule   $fill_up
control_slaves    FALSE
job_is_first_task FALSE
urgency_slots     min

The most annoying part of all this is that the behaviour is not consistent: this morning I was able to run a 6-thread job requesting 10 GB of memory on a 48 GB node, but in the afternoon the same job, submitted with the very same command to the same node, could not be run.

Does anyone have any suggestion on how to deal with this?

Thanks in advance,

Txema

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
