John Hearns <hear...@googlemail.com> writes:

> Agree with what you say Dave.
>
> Regarding not wanting jobs to use certain cores, i.e. reserving low-numbered
> cores for OS processes, then surely a good way forward is to use a 'boot
> cpuset' of one or two cores and let your jobs run on the rest of the cores.

Maybe, if you make sure the resource manager knows about it, and users don't mind losing the cores, presumably resulting in a cock-eyed MPI process distribution. Is it really necessary, compared with simply using core binding? I'd expect the bulk of overheads to be due to the resource manager, especially if it tracks things by grovelling /proc frequently, not to the OS. In cases I've measured, it's typically ~1%, depending on parameters, scaling more slowly than core count.

> You're right about cpusets being helpful with 'badly behaved' jobs.
> War stories some other time!

Well [trying to bring this on topic], things got much more sanitary here after I replaced the wretched Streamline-supplied setup with tight integration of OMPI under SGE and then made the SGE core binding inherited by OMPI work sensibly with partially full nodes.
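
To make the "cock-eyed distribution" point concrete, here is a minimal sketch, not a description of any real site's setup. It assumes a hypothetical two-socket node with eight cores per socket, numbered contiguously per socket (real numbering varies, so check with something like hwloc), and a boot cpuset taking the two lowest-numbered cores. The function name and parameters are invented for illustration.

```python
def free_cores_per_socket(sockets, cores_each, reserved):
    """Count the cores left for jobs on each socket when the `reserved`
    lowest-numbered cores are given to a boot cpuset.

    Assumes core IDs are contiguous within a socket (an assumption;
    many machines interleave them instead).
    """
    free = []
    for s in range(sockets):
        first = s * cores_each
        ids = range(first, first + cores_each)
        # A core is available to jobs only if it lies outside the boot cpuset.
        free.append(sum(1 for core in ids if core >= reserved))
    return free


# Hypothetical 2 x 8-core node with cores 0 and 1 reserved.
print(free_cores_per_socket(sockets=2, cores_each=8, reserved=2))  # [6, 8]
```

Under those assumptions a job filling the node gets six ranks on one socket and eight on the other, which is the sort of imbalance meant above. Plain core binding with nothing reserved keeps the split at eight and eight.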