Also, kernel reporting of load average seems to be broken on CentOS 6; it is fixed on CentOS 7.
O.
-----Original Message-----
From: users-boun...@gridengine.org <users-boun...@gridengine.org> On Behalf Of Skylar Thompson
Sent: Thursday, August 29, 2019 5:38 PM
To: Mike Serkov <serko...@gmail.com>
Cc: users@gridengine.org
Subject: Re: [gridengine users] limit CPU/slot resource to the number of reserved slots

We actually run CentOS 6 as well, and haven't seen this problem, though maybe our users haven't done anything as untoward as yours. We do have a bunch of bioinformatics code (including Java), so I thought we would have seen the worst cases.

On Thu, Aug 29, 2019 at 10:50:27AM -0400, Mike Serkov wrote:
> Load average indeed. The thing is that if we have a parallel process bound
> to one core, the kernel scheduler has to constantly switch those threads
> from running to sleeping state and back, and do a context switch, which
> creates some overhead on the system itself. Imagine you have a 64-CPU box,
> each core runs such a job, and every job spawns 64 threads (which is a
> usual case, as many tools just do a system call to identify the number of
> CPUs they can use by default). In both cases, with affinity forced and
> without, it is not a good situation. When affinity is enforced, in extreme
> cases we had nodes simply freeze, especially when heavy I/O was also
> involved (probably because of overhead on the kernel scheduler). That was
> on RHEL 6; maybe on modern kernels it is much better. All I want to say is
> that unlike memory limitations with cgroups, where you are actually sure
> that a process can't allocate more, with cpusets it is a bit different.
> Users can still run as many parallel processes as they want. They are
> limited to a number of physical CPUs, but it may still affect the node and
> other jobs.
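Mike's point about tools sizing their thread pools from the visible CPU count can be seen directly on any Linux box. A minimal sketch (an illustrative aside, not part of the original thread; `nproc` is from coreutils, `taskset` from util-linux):

```shell
# Total CPUs on the box, ignoring any affinity mask -- this is what
# naive tools effectively count before spawning one thread per CPU:
grep -c '^processor' /proc/cpuinfo

# nproc honours the calling process's CPU affinity mask, so under a
# cpuset cgroup or taskset pinning it reports the restricted count:
nproc
taskset -c 0 nproc    # reports 1, even on a 64-CPU box
```

Tools that count CPUs via /proc/cpuinfo (or equivalent) rather than the affinity mask are the ones that oversubscribe a pinned job.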
>
> Best regards,
> Mikhail Serkov
>
> > On Aug 29, 2019, at 10:20 AM, Skylar Thompson <skyl...@uw.edu> wrote:
> >
> > Load average gets high if the job spawns more processes/threads than
> > allocated CPUs, but we haven't seen any problem with node
> > instability. We did have to remove np_load_avg from load_thresholds,
> > though, to keep our users from DoS'ing the cluster...
> >
> >> On Thu, Aug 29, 2019 at 05:27:36AM -0400, Mike Serkov wrote:
> >> Also, something to keep in mind - cgroups will not solve this issue
> >> completely. It is just affinity enforcement. If the job spawns multiple
> >> threads and they are all active, it will cause the load average to grow,
> >> as well as some other side effects, regardless of the affinity setting.
> >> On big SMP boxes it may actually cause more instability. Anyway, jobs
> >> should be configured to use the exact number of threads they request,
> >> and this should be monitored.
> >>
> >> Best regards,
> >> Mikhail Serkov
> >>
> >>> On Aug 29, 2019, at 4:16 AM, Ondrej Valousek
> >>> <ondrej.valou...@adestotech.com> wrote:
> >>>
> >>> Also a quick note: cgroups is the way to _enforce_ CPU affinity.
> >>> For the vast majority of jobs, I would say just a simple taskset
> >>> configuration (i.e. something like "-l binding linear") would do
> >>> as well.
> >>>
> >>> From: Dietmar Rieder <dietmar.rie...@i-med.ac.at>
> >>> Sent: Thursday, August 29, 2019 9:37 AM
> >>> To: users@gridengine.org; Ondrej Valousek
> >>> <ondrej.valou...@adestotech.com>; users <users@gridengine.org>
> >>> Subject: Re: [gridengine users] limit CPU/slot resource to the
> >>> number of reserved slots
> >>>
> >>> Great, thanks so much!
> >>>
> >>> Dietmar
> >>>
> >>> On 29 August 2019 at 09:05:35 MESZ, Ondrej Valousek
> >>> <ondrej.valou...@adestotech.com> wrote:
> >>> Nope,
> >>> SoGE (as of 8.1.9) supports CGROUPS w/o any code changes; just add
> >>> "USE_CGROUPS=yes" to the exec parameter list to make shepherd use
> >>> the CGroup saveset controller.
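Ondrej's taskset suggestion above can be sketched as follows (an illustrative aside, not from the thread; `taskset` is from util-linux, and the exact GridEngine binding syntax should be checked against your qsub(1) man page):

```shell
# Run a command pinned to CPU 0 only:
taskset -c 0 echo "ran pinned to CPU 0"

# Inspect the affinity mask of the current shell:
taskset -cp $$

# In GridEngine, the equivalent is requested at submit time, e.g.
# (syntax varies by version -- check qsub(1)):
#   qsub -pe smp 4 -binding linear:4 job.sh
```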
> >>> My patch only extends it to support systemd, and hence the possibility
> >>> to hard-enforce memory/CPU limits, etc.
> >>> Hth,
> >>> Ondrej
> >>>
> >>> From: Daniel Povey <dpo...@gmail.com>
> >>> Sent: Monday, August 26, 2019 10:12 PM
> >>> To: Dietmar Rieder <dietmar.rie...@i-med.ac.at>; Ondrej Valousek
> >>> <ondrej.valou...@adestotech.com>; users <users@gridengine.org>
> >>> Subject: Re: [gridengine users] limit CPU/slot resource to the
> >>> number of reserved slots
> >>>
> >>> I don't think it's supported in Son of GridEngine. Ondrej
> >>> Valousek (cc'd) described in the first thread here
> >>> http://arc.liv.ac.uk/pipermail/sge-discuss/2019-August/thread.html
> >>> how he was able to implement it, but it required code changes, i.e.
> >>> you would need to figure out how to build and install SGE from
> >>> source, which is a task in itself.
> >>>
> >>> Dan
> >>>
> >>> On Mon, Aug 26, 2019 at 12:46 PM Dietmar Rieder
> >>> <dietmar.rie...@i-med.ac.at> wrote:
> >>> Hi,
> >>>
> >>> thanks for your reply. This sounds promising.
> >>> We are using Son of Grid Engine, though. Can you point me to the
> >>> right docs to get cgroups enabled on the exec host (CentOS 7)? I
> >>> must admit I have no experience with cgroups.
> >>>
> >>> Thanks again
> >>> Dietmar
> >>>
> >>>> On 8/26/19 4:03 PM, Skylar Thompson wrote:
> >>>> At least for UGE, you will want to use the CPU set integration,
> >>>> which will assign the job to a cgroup that has one CPU per
> >>>> requested slot.
> >>>> Once you have cgroups enabled in the exec host OS, you can then set
> >>>> these options in sge_conf:
> >>>>
> >>>> cgroup_path=/cgroup
> >>>> cpuset=1
> >>>>
> >>>> You can use this mechanism to have the m_mem_free request enforced
> >>>> as well.
> >>>>
> >>>>> On Mon, Aug 26, 2019 at 02:15:22PM +0200, Dietmar Rieder wrote:
> >>>>> Hi,
> >>>>>
> >>>>> maybe this is a stupid question, but I'd like to limit the
> >>>>> used/usable number of cores to the number of slots that were
> >>>>> reserved for a job.
> >>>>>
> >>>>> We often see that people reserve 1 slot, e.g. "qsub -pe smp 1 [...]",
> >>>>> but their program then runs in parallel on multiple cores.
> >>>>> How can this be prevented? Is it possible that, having reserved
> >>>>> only one slot, a process cannot utilize more than this?
> >>>>>
> >>>>> I was told that this should be possible in Slurm (which we don't
> >>>>> have, and to which we don't want to switch currently).
> >>>>>
> >>>>> Thanks
> >>>>> Dietmar
> >>>>
> >>>
> >>>
> >>> --
> >>> _________________________________________
> >>> D i e t m a r   R i e d e r, Mag.Dr.
> >>> Innsbruck Medical University
> >>> Biocenter - Institute of Bioinformatics
> >>> Innrain 80, 6020 Innsbruck
> >>> Phone: +43 512 9003 71402
> >>> Fax: +43 512 9003 73100
> >>> Email: dietmar.rie...@i-med.ac.at
> >>> Web: http://www.icbi.at
> >>>
> >>> _______________________________________________
> >>> users mailing list
> >>> users@gridengine.org
> >>> https://gridengine.org/mailman/listinfo/users
> >
> > --
> > -- Skylar Thompson (skyl...@u.washington.edu)
> > -- Genome Sciences Department, System Administrator
> > -- Foege Building S046, (206)-685-7354
> > -- University of Washington School of Medicine

--
-- Skylar Thompson (skyl...@u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
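As a follow-up to Skylar's sge_conf snippet above (cgroup_path=/cgroup, cpuset=1): once a job lands in a cpuset cgroup, the restriction can be confirmed from inside the job itself. A minimal sketch, assuming only a Linux exec host (the paths are standard procfs, not SGE-specific):

```shell
# Which cgroups is this process a member of?
cat /proc/self/cgroup

# Which CPUs is this process actually allowed to run on?
# Under cpuset=1 this list should match the slots granted by the scheduler.
grep Cpus_allowed_list /proc/self/status
```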