The answer to this doesn't lie in the number of jobs or in comparing raw 
performance. Your users probably use completely different tools to generate 
jobs than mine do, and each submitted job can carry a completely different 
amount of data (environment variables, scripts, etc.).

We are using SGE 8.1.8 with classic spooling. The latter is probably a 
contributor to the issue we just had, but I only started working with SGE 6 
months ago, so I am still learning the options; mostly I am discovering how to 
tune things after the outage. :(

Regards,
Juan Jimenez
System Administrator, HPC
MDC Berlin / IT-Dept.
Tel.: +49 30 9406 2800


________________________________________
From: Jesse Becker [becke...@mail.nih.gov]
Sent: Monday, March 20, 2017 22:08
To: Jimenez, Juan Esteban
Cc: SGE-discuss@liv.ac.uk
Subject: Re: [SGE-discuss] Sizing the qmaster

On Mon, Mar 20, 2017 at 08:39:38PM +0000, juanesteban.jime...@mdc-berlin.de 
wrote:
>Hi folks,
>
>I just ran into my first episode of the scheduler crashing because of too many 
>submitted jobs. It pegged memory usage to as much as I could give it (12gb at 
>one point) and still crashed while it tries to work its way through the stack.

How many is "too many?"  We routinely have 50,000+ jobs, and there's
nary a blip in RAM usage on the qmaster.  I'm not even sure that the 
sge_qmaster process uses a Gig of RAM...

Just checked, with 3,000+ jobs in the queue, it's got 550MB RSS, and
a total of 2.3G of virtual memory (including a large mmap of
/usr/lib/locale/locale-archive).
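
For anyone who wants to check the same numbers on their own qmaster, a 
small sketch (assumes procps-style `ps`; `sge_qmaster` is the daemon name, 
and the `mem_usage_pid` helper is just something I made up for illustration):

```shell
# Print resident (RSS) and virtual (VSZ) memory for a given PID, in MB.
mem_usage_pid() {
  ps -o rss=,vsz= -p "$1" |
    awk '{printf "RSS: %d MB, VSZ: %d MB\n", $1/1024, $2/1024}'
}

# On the qmaster host, something like:
#   mem_usage_pid "$(pgrep -o sge_qmaster)"
```

Watching that while the pending list grows is a quick way to see whether 
memory tracks job count on your setup.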


>I need to figure out how to size a box properly for a dedicated sge_master. 
>How do you folks recommend I do this?

12G should be plenty, IME.  What version are you running, and what
spooling method are you using?



--
Jesse Becker (Contractor)
_______________________________________________
SGE-discuss mailing list
SGE-discuss@liv.ac.uk
https://arc.liv.ac.uk/mailman/listinfo/sge-discuss