The answer to this does not lie in counting jobs or comparing raw performance. Your users probably use completely different tools to generate jobs than mine do, and each submitted job can carry a completely different amount of data in environment variables, scripts, and so on.
We are using SGE 8.1.8 with classic spooling. That last one is probably a contributor to the issue we just had, but I started working with SGE just six months ago, so I am still learning the options, mostly discovering how to tune things after the outage. :(

Mfg,
Juan Jimenez
System Administrator, HPC
MDC Berlin / IT-Dept.
Tel.: +49 30 9406 2800

________________________________________
From: Jesse Becker [becke...@mail.nih.gov]
Sent: Monday, March 20, 2017 22:08
To: Jimenez, Juan Esteban
Cc: SGE-discuss@liv.ac.uk
Subject: Re: [SGE-discuss] Sizing the qmaster

On Mon, Mar 20, 2017 at 08:39:38PM +0000, juanesteban.jime...@mdc-berlin.de wrote:
>Hi folks,
>
>I just ran into my first episode of the scheduler crashing because of too many
>submitted jobs. It pegged memory usage to as much as I could give it (12GB at
>one point) and still crashed while trying to work its way through the stack.

How many is "too many"? We routinely have 50,000+ jobs, and there's nary a blip in RAM usage on the qmaster. I'm not even sure that the sge_qmaster process uses a gig of RAM... Just checked: with 3,000+ jobs in the queue, it's got 550MB RSS and a total of 2.3GB of virtual memory (including a large mmap of /usr/lib/locale/locale-archive).

>I need to figure out how to size a box properly for a dedicated sge_master.
>How do you folks recommend I do this?

12GB should be plenty, IME. What version are you running, and what spooling method are you using?

--
Jesse Becker (Contractor)
_______________________________________________
SGE-discuss mailing list
SGE-discuss@liv.ac.uk
https://arc.liv.ac.uk/mailman/listinfo/sge-discuss
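[Editor's note: a quick way to reproduce Jesse's RSS/VSZ check on a Linux qmaster host is `ps` with custom output columns. The sketch below assumes a procps-style `ps`; since `sge_qmaster` won't be running everywhere, the live invocation is shown as a comment and a sample line matching the numbers in the thread is piped through awk instead.]

```shell
# On a live qmaster host you would run (Linux, procps-style ps):
#   ps -C sge_qmaster -o pid=,rss=,vsz=,comm=
# ps reports rss/vsz in kilobytes; here we feed awk a sample line
# so the snippet is self-contained.
sample="12345 563200 2411520 sge_qmaster"
echo "$sample" | awk '{printf "RSS: %.0f MB, VSZ: %.1f GB\n", $2/1024, $3/1048576}'
# → RSS: 550 MB, VSZ: 2.3 GB
```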
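[Editor's note: on a standard install, the spooling method Jesse asks about is recorded in the qmaster's bootstrap file at `$SGE_ROOT/$SGE_CELL/common/bootstrap`. A minimal sketch follows; it uses a stand-in bootstrap fragment written to `/tmp`, since the real path varies by site.]

```shell
# The real file lives at $SGE_ROOT/$SGE_CELL/common/bootstrap;
# a stand-in fragment keeps this snippet self-contained.
cat > /tmp/bootstrap.example <<'EOF'
admin_user            sgeadmin
spooling_method       classic
spooling_lib          libspoolc
EOF
awk '$1 == "spooling_method" {print $2}' /tmp/bootstrap.example
# → classic
```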