Re: [gridengine users] job cannot run in parallel environment "smp" because it only offers 2 slots

Txema Heredia Genestar Wed, 22 Feb 2012 09:58:43 -0800

Hi,

Thanks for your answers. I'll go through them one by one, but first, a few clarifications:

1- We are stuck with 6.1u4. In a few weeks we will install a new cluster with a more recent version.

2- I don't care about "smp" specifically. In fact, before reading your answers I never properly understood the difference between $pe_slots and $fill_up. I have a $pe_slots parallel environment called "threaded" and the problem is still there. Basically, I just want my PE to NOT multiply the memory reservation.


Now, your answers:

Bob - I would like to use "consumable JOB" but, unfortunately, it is not available until SGE 6.2. Even then, it would break any MPI job trying to run in our cluster. We mainly run single-core jobs, but from time to time some threaded or MPI jobs need to be run.

Mazouzi - Right now the PEs are only available on two "testing" nodes. The problem happens on both of them.

Reuti - I have tried both combinations: 1-queue@1-node and 1-queue@N-nodes. No luck, same problem everywhere. In fact, one node has 48 GB while the other has 56 GB, so when I ask for a 6-threaded 10 GB job (60 GB total), one node replies that it only offers 4 slots, and the other offers 5.

I have read your ticket and that is exactly my problem: the resources multiply. But, as far as I know, they solved it with the "consumable JOB" thing? Unfortunately the link is broken (http://gridengine.sunsource.net/nonav/source/browse/~checkout~/gridengine/doc/devel/rfe/non-multiplied-pe-requests.txt).
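
The slot counts reported by the two nodes match simple integer division, if the h_vmem request is charged once per slot. A minimal Python sketch of that arithmetic follows. The function name offered_slots is made up for illustration, and the sketch assumes each node's whole memory (48 GB and 56 GB) is free and configured as its h_vmem capacity. It is not how the SGE scheduler is actually implemented.

```python
def offered_slots(node_mem_gb: int, h_vmem_gb: int) -> int:
    """Slots a node can offer when h_vmem is charged once per slot.

    Assumption: the node's full memory is available as h_vmem capacity.
    """
    return node_mem_gb // h_vmem_gb


# A 6-slot job requesting 10 GB per slot needs 60 GB in total,
# which neither node can provide.
for node_mem_gb in (48, 56):
    print(node_mem_gb, "GB node offers", offered_slots(node_mem_gb, 10), "slots")
```

Under these assumptions the 48 GB node offers 4 slots and the 56 GB node offers 5, which is what the scheduler messages report.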

JSVs are not an option in 6.1u4.

William - Yours is my best bet. A long time ago I tried tinkering with the "slots" attribute, but never thought about adding this threaded one. I only see one (minor) flaw in your solution: I cannot ask for a range of threads (from 4 to 8) as with -pe. This condemns any job submitted while our cluster is under some load to oblivion in the waiting queue. That would need to be addressed by manual scheduling. But that will do, thanks.


Thank you very much.

Txema

PS: One last question: as I have no experience with 6.2 and JSV, what should be my go-to approach once we install our new cluster with an up-to-date version?





On 21/02/12 21:40, Reuti wrote:

Hi,

On 21.02.2012 at 20:20, Txema Heredia Genestar wrote:

Hello all,

I am having some problems running threaded jobs in SGE 6.1u4. In our cluster, h_vmem is defined as a consumable
attribute on all nodes. It is mandatory: all jobs must request it, and it has a default value of 6 GB. That constraint
makes any "parallel" job sent to the cluster try to reserve a lot of memory (h_vmem * slots). This is OK for
most parallel processes (MPI and the like). But sometimes we need to run "threaded" jobs, where all threads
share a chunk of memory (everything on a single node). This leads to situations where I need to submit an 8-threaded job
that requires, say, 10 GB of memory, but it cannot be scheduled because no node can handle an 80 GB request. When a
memory request cannot be fulfilled, the typical message 'cannot run in PE "smp" because it only
offers N slots' appears in qstat (where N is the maximum number of slots I would be able to use given the
requested h_vmem size).
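
For illustration, the reservation described above can be written out as a small Python sketch. The function reserved_gb and its per_job flag are hypothetical names, not SGE options. per_job=True only models a consumable that is charged once per job, as with the "consumable JOB" setting discussed elsewhere in this thread.

```python
def reserved_gb(h_vmem_gb: int, slots: int, per_job: bool = False) -> int:
    """Memory reserved for a job under two accounting models.

    per_job=False: the request is multiplied by the slot count
                   (the behaviour described in this message).
    per_job=True:  the request is charged once for the whole job
                   (hypothetical flag modelling a per-job consumable).
    """
    return h_vmem_gb if per_job else h_vmem_gb * slots


# The 8-threaded job asking for 10 GB:
print(reserved_gb(10, 8))                # multiplied by slots
print(reserved_gb(10, 8, per_job=True))  # charged once per job
```

With per-slot accounting the job reserves 80 GB. Charged once per job, it would reserve only the 10 GB the threads actually share.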

This is the parallel environment I am trying to use:

# qconf -sp smp
pe_name           smp
slots             9999
user_lists        test_users
xuser_lists       NONE
start_proc_args   /bin/true
stop_proc_args    /bin/true
allocation_rule   $fill_up

For SMP mode you will need $pe_slots here, unless you additionally request
exactly one node in the submission command.

I assume that before, you simply got more than one node.

==

The answer from Bob changing the complex h_vmem to JOB would help for this type 
of job, but not if you have also MPI jobs in the cluster. I had an RFE for 
introducing this on a PE level:

https://arc.liv.ac.uk/trac/SGE/ticket/197

To cite from the issue: "Therefore I wrote, that an entry in the PE would still be
advantageous: h_vmem can only be JOBS or YES"

==

For now: you could adjust the memory request in a JSV depending on the 
requested PE, but for this you need 6.2 IIRC.

-- Reuti

control_slaves    FALSE
job_is_first_task FALSE
urgency_slots     min

The most annoying part of all this is that the behaviour is not consistent:
this morning I was able to run a 6-threaded job requesting 10 GB of memory
on a 48 GB node. But in the afternoon, the same job, submitted with the very
same command to the same node, could not be run.

Does anyone have any suggestion on how to deal with this?

Thanks in advance,

Txema

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users



