This type of question should be on the Grid Engine users list.


On 5/15/2012 4:32 PM, Jake Carroll wrote:
Hi all.

A couple of quick questions this morning about ROCKS/SGE scheduler
semantics.


1. I've got some new users who want to drive the cluster we have set up with
the maximum possible efficiency, i.e., a user can use as much of the cluster as
possible when they submit a job. With over 1000 cores but many users, one of the
things we did was limit each user to about 300 or 400 slots, so that they could
only ever use roughly 20 to 30% of the cluster at any given time. My new users
don't like this; they want to be able to use 100% of the system if it's free and
no other jobs are running. Now, my understanding is that we could definitely
remove that 300 or 400 slot/job limit, but it would have a couple of
detrimental impacts:

Primarily, it would prevent any other user from starting jobs while their jobs
are running, as there would be no free slots.

2. My users told me, "No, no, you can simply put our jobs 'to sleep' when
others in the queue log in to run their jobs."

Now, my understanding is that, yes, that is possible (though I don't know how
it's implemented; fairshare policy / queue weight perhaps?), BUT it has the big
drawback that when a user's job is "asleep", it still holds its memory
allocation on the node. Thus, if another big-memory job comes along and the
node's memory becomes over-subscribed, crashes will ensue! Can somebody confirm
that functionality/concern for me?

3. My users want jobs to "persist" through a crash of the cluster head node.
Would I be right in saying that jobs can only persist across crashes if the
users are using checkpointing in their jobs? I've heard of it before but have
never implemented it and don't know where to start.

Thank you for your time, all.

JC


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
