On 22 February 2012 08:21, Hay, William <[email protected]> wrote:
> On 21 February 2012 19:20, Txema Heredia Genestar <[email protected]>
> wrote:
>> Hello all,
>>
>> I am having some problems running threaded jobs in SGE 6.1u4. In our
>> cluster, h_vmem is defined as a consumable attribute in all nodes. It is
>> mandatory, all jobs must request it, with a default value of 6Gb. That
>> constraint leads any "parallel" job sent to the cluster to try to
>> reserve a lot of memory (h_vmem * slots). This is ok for most parallel
>> processes (mpi and the like). But, sometimes, we need to run "threaded"
>> jobs, where all threads share a chunk of memory (everything on a single
>> node). This leads to situations where I need to send an 8-threaded job
>> that requires, say, 10 Gb of memory, but it cannot be scheduled because
>> no node can handle an 80Gb request. When a memory request cannot be
>> fulfilled, the typical message of "cannot run in PE "smp" because it
>> only offers N slots" appears in qstat (where N is the maximum number of
>> slots I would be able to use given the requested h_vmem size).
> The trick we use here is that rather than set up a PE we just add a
> per host consumable threads.

I should clarify that by per host consumable I mean one for which the
amount available is specified in the complex_values section of each
exechost rather than on the queue configuration or the 'global' host.
It is still consumed per slot.
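
As a rough sketch of the two approaches in this thread, the script below
only prints the commands for review; it does not run them. The consumable
name threads comes from the thread, but the shortcut thr, the default of 1,
the host name node01, the count of 8 and the script job.sh are illustrative
assumptions, and the helper functions are hypothetical. Check the qconf
syntax against your own 6.1u4 man pages before using any of it.

```shell
#!/bin/sh
# Dry run: prints commands for review, executes nothing against SGE.
# Assumed values (not from the thread): shortcut "thr", default 1,
# host "node01", 8 threads per host, job script "job.sh".

# Line to add to the complex list when editing it with "qconf -mc".
# Columns: name shortcut type relop requestable consumable default urgency.
# A default of 1 is assumed so that jobs omitting the request still
# consume one thread per slot.
complex_line() {
    printf '%s\n' 'threads  thr  INT  <=  YES  YES  1  0'
}

# Command that adds the per-host amount to an exechost's complex_values.
host_cmd() {   # usage: host_cmd HOST THREADS
    printf 'qconf -aattr exechost complex_values threads=%s %s\n' "$2" "$1"
}

# Per-slot h_vmem in MB for a job needing TOTAL_MB spread over SLOTS slots,
# rounded up so the slots together cover the total.
per_slot_mb() {   # usage: per_slot_mb TOTAL_MB SLOTS
    echo $(( ($1 + $2 - 1) / $2 ))
}

complex_line
host_cmd node01 8

# Approach 1: single-slot job that consumes 8 threads; since the job has
# one slot, the h_vmem request is counted once.
echo "qsub -l threads=8,h_vmem=10G job.sh"

# Approach 2: keep the smp PE and divide the 10 GB total by the 8 slots.
echo "qsub -pe smp 8 -l h_vmem=$(per_slot_mb 10240 8)M job.sh"
```

With the second approach the scheduler multiplies the per-slot request back
up by the slot count, so 8 slots at 1280M each reserve the intended 10 GB
on the node instead of 80 GB.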
> We use a JSV to ensure everyone requests at least one thread per slot.
> I'm not sure if a JSV is available in 6.1u4 so you might have to trust
> your users. In later versions than the 6.2u3 we're using, Grid Engine
> tries to allocate jobs to specific cores, which might create a few
> issues since it doesn't know about our consumable.
>
> I believe for h_vmem the resource consumption is aggregated for all
> slots on a node before being applied, so you could just get your users
> to divide their memory consumption by slots when submitting (or if you
> have a JSV get it to do that for them when using the SMP PE).
>
>
> William
>
>>
>> This is the parallel environment I am trying to use:
>>
>> # qconf -sp smp
>> pe_name            smp
>> slots              9999
>> user_lists         test_users
>> xuser_lists        NONE
>> start_proc_args    /bin/true
>> stop_proc_args     /bin/true
>> allocation_rule    $fill_up
>> control_slaves     FALSE
>> job_is_first_task  FALSE
>> urgency_slots      min
>>
>> The most annoying part of all this is that this behaviour is not
>> consistent: This morning I've been able to run a 6-threaded job
>> requesting 10Gb of memory in a 48Gb node. But, in the afternoon, the
>> same job using the very same command in the same node could not be run.
>>
>> Does anyone have any suggestion on how to deal with this?
>>
>> Thanks in advance,
>>
>> Txema
>>
>> _______________________________________________
>> users mailing list
>> [email protected]
>> https://gridengine.org/mailman/listinfo/users
