On 22 February 2012 08:21, Hay, William <[email protected]> wrote:
> On 21 February 2012 19:20, Txema Heredia Genestar <[email protected]> 
> wrote:
>> Hello all,
>>
>> I am having some problems running threaded jobs in SGE 6.1u4. In our
>> cluster, h_vmem is defined as a consumable attribute on all nodes. It is
>> mandatory: all jobs must request it, with a default value of 6 GB. That
>> constraint makes any "parallel" job sent to the cluster try to reserve
>> a lot of memory (h_vmem * slots). This is fine for most parallel
>> processes (MPI and the like). But sometimes we need to run "threaded"
>> jobs, where all threads share one chunk of memory (everything on a single
>> node). This leads to situations where I need to submit an 8-threaded job
>> that requires, say, 10 GB of memory, but it cannot be scheduled because
>> no node can handle an 80 GB request. When a memory request cannot be
>> fulfilled, the typical message "cannot run in PE "smp" because it
>> only offers N slots" appears in qstat (where N is the maximum number of
>> slots I would be able to use given the requested h_vmem size).
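To make the arithmetic in the paragraph above concrete, here is a minimal shell sketch. It assumes h_vmem is the only consumable limiting the job and that the scheduler charges the requested h_vmem once per slot; the numbers are hypothetical, chosen to match the 8-thread, 10 GB, 48 GB-node example in this thread.

```shell
#!/bin/sh
# Hypothetical values matching the example in this thread.
node_free_gb=48   # h_vmem still unreserved on the node
per_slot_gb=10    # h_vmem requested by the job, charged once per slot
slots=8           # slots requested with "-pe smp 8"

# What the scheduler has to reserve on a single node for the whole job.
total_gb=$(( per_slot_gb * slots ))

# The largest slot count that fits: the "N" in the qstat message.
max_slots=$(( node_free_gb / per_slot_gb ))

echo "job needs ${total_gb}G; node fits at most ${max_slots} slots"
```

With these numbers the job needs 80 GB while the node can hold at most 4 such slots, which is the situation behind the "only offers N slots" message.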
> The trick we use here is that, rather than setting up a PE, we just add a
> per-host consumable called threads.
I should clarify that by per-host consumable I mean one for which the amount
available is specified in the complex_values section of each exechost,
rather than in the queue configuration or on the 'global' host. It is still
consumed per slot.
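A minimal configuration sketch of the per-host consumable approach, under stated assumptions: the complex name threads comes from this thread, while the shortcut thr, the host name node01, the capacities 8 and 48G, and the script name job.sh are hypothetical placeholders. The first line is added with qconf -mc, complex_values is set with qconf -me on each execution host, and the job is submitted as a single-slot job with no PE, so h_vmem is charged only once.

```text
# qconf -mc   (add one line for the hypothetical "threads" complex)
#name    shortcut  type  relop  requestable  consumable  default  urgency
threads  thr       INT   <=     YES          YES         0        0

# qconf -me node01   (hypothetical 8-core, 48 GB host)
complex_values        threads=8,h_vmem=48G

# 8-thread job: one slot, eight units of "threads", 10G of h_vmem in total
qsub -l threads=8,h_vmem=10G job.sh
```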

> We use a JSV to ensure everyone requests at least one thread per slot.
> I'm not sure whether JSVs are available in 6.1u4, so you might have to
> trust your users. In versions later than the 6.2u3 we're using, Grid
> Engine tries to allocate jobs to specific cores, which might create a few
> issues since it doesn't know about our consumable.
>
> I believe that for h_vmem the resource consumption is aggregated over all
> slots on a node before being applied, so you could just get your users to
> divide their memory consumption by the number of slots when submitting
> (or, if you have a JSV, get it to do that for them when using the SMP PE).
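The division suggested above can be sketched in shell. This assumes, as William says he believes but which is not verified here, that SGE sums per-slot h_vmem over all of a job's slots on a node, and that all 8 slots land on the same host. The total of 10 GB, the 8 slots and the script name job.sh are hypothetical. The per-slot value is rounded up so that slots times per-slot memory never falls below the job's total.

```shell
#!/bin/sh
# Assumption: SGE adds up per-slot h_vmem across the job's slots on one host.
total_mb=10240    # hypothetical: the threaded job needs 10 GB in total
slots=8           # hypothetical: 8 threads, one slot each in the smp PE

# Ceiling division, so slots * per_slot_mb >= total_mb.
per_slot_mb=$(( (total_mb + slots - 1) / slots ))

# Print (not run) the resulting submission; job.sh is a placeholder script.
echo "qsub -pe smp $slots -l h_vmem=${per_slot_mb}M job.sh"
```

For the 8-thread, 10 GB example this prints `qsub -pe smp 8 -l h_vmem=1280M job.sh`, so the scheduler reserves 8 x 1280M = 10 GB on the node instead of 80 GB.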
>
>
> William
>
>>
>> This is the parallel environment I am trying to use:
>>
>> # qconf -sp smp
>> pe_name           smp
>> slots             9999
>> user_lists        test_users
>> xuser_lists       NONE
>> start_proc_args   /bin/true
>> stop_proc_args    /bin/true
>> allocation_rule   $fill_up
>> control_slaves    FALSE
>> job_is_first_task FALSE
>> urgency_slots     min
>>
>> The most annoying part of all this is that this behaviour is not
>> consistent: this morning I was able to run a 6-threaded job requesting
>> 10 GB of memory on a 48 GB node, but in the afternoon the same job, using
>> the very same command on the same node, could not be run.
>>
>> Does anyone have any suggestions on how to deal with this?
>>
>> Thanks in advance,
>>
>> Txema
>>
>> _______________________________________________
>> users mailing list
>> [email protected]
>> https://gridengine.org/mailman/listinfo/users