When was the last time anyone heard from Dave Love? It's not clear to me that he (or anyone) is still maintaining GridEngine.
I started a GitHub repository with the intention of becoming the maintainer myself, here:

https://github.com/son-of-gridengine/sge

but stopped after Dave Love objected. I also realized that I didn't have the time, and didn't fully understand many aspects of, e.g., the build and packaging process. I would be happy to give up control of it if someone else wants to maintain it.

The ideal outcome would be a smooth, voluntary handover to a new generation. If someone can persuade Dave to do this, that would be best... perhaps led by someone he trusts.

Dan

On Fri, Aug 23, 2019 at 5:40 AM Ondrej Valousek <ondrej.valou...@adestotech.com> wrote:

> Hi,
>
> I just spent a few days on this and have a working proposal.
>
> Functionality:
>
> 1. Execd_params can now be set to "USE_CGROUPS=systemd"
> 2. This causes shepherd to launch jobs via "systemd-run" scope units
>    rather than exec'ing them directly.
> 3. Jobs are no longer monitored via PDC (SGE process data collector)
>    but via the cgroup that the systemd-run above creates.
> 4. Processes are no longer assigned an additional GID (gid-range is
>    ignored) as this is no longer needed.
> 5. As we are using cgroups, all forked tasks are automatically killed
>    once the job finishes (i.e. just like ENABLE_ADDGRP_KILL=true)
>
> Advantages:
>
> 1. 100% reliable process tracking via the kernel's control groups
> 2. We can now use tools like systemd-cgls or systemd-cgtop to monitor
>    a job's performance in real time
> 3. Unlike the classic systemd setup for the exec daemon (where it runs in
>    the foreground), restarting the sgeexecd service does not kill all
>    jobs (because jobs run in separate systemd scopes)
> 4. Should be backwards compatible (i.e. you just switch this on and
>    see) with the traditional shepherd functionality
> 5. Easy to implement control group limits for jobs (i.e.
>    MemoryLimit, CPUShares, memory reservation for a job)
>
> Implementation details:
>
> Only two important functions were created:
> - start_command_via_systemd(), the counterpart of start_command() in
>   shepherd
> - ptf_get_usage_from_systemd(), the counterpart of
>   ptf_get_usage_from_data_collector()
> - minor fixes in other places
> - systemd does not seem to support setting CPUAffinity via a control group,
>   so the existing code for handling cpusets works the same way it
>   did before.
>
> TODO:
>
> 1. I wanted to introduce some new complex attributes (like
>    "cgroup_memmax" etc.) that could be used to enforce cgroup limits, but
>    unfortunately it does not seem to be a trivial task. If anyone knows
>    how to do that, it would be great, as it would make my patches a lot
>    more attractive.
> 2. Testing: so far, everything looks good; interactive and
>    non-interactive jobs are working, env is passed, etc. It just needs
>    more testing.
>
> Now, can I send my patches somewhere so they can possibly be merged into the
> SoGE main repo?
>
> Thanks,
>
> Ondrej
>
>
> From: Ondrej Valousek
> Sent: Friday, August 9, 2019 1:40 PM
> To: 'us...@gridengine.org' <us...@gridengine.org>
> Subject: SGE & systemd integration
>
> Hi all,
>
> I am thinking of making SGE (or sge_execd) more systemd friendly.
> Right now, there is some (as of 8.1.9) support for cgroups via:
> USE_CGROUPS=y/n
> My proposal is to make it:
> USE_CGROUPS=y/n/systemd
> When set to systemd, we would not need to detect any cgroups (or set the
> cpuset controller) manually.
> Instead, the shepherd daemon would run the job via the "systemd-run" binary.
>
> https://www.freedesktop.org/software/systemd/man/systemd-run.html
>
> systemd-run can set various cgroup controllers via its "--property" flag,
> achieving the same thing we now do manually. We would probably also use the
> "--scope" flag to make the job run synchronously.
>
> Initially, I was thinking about implementing this via the "starter_method"
> flag, but systemd-run needs to be run as root, so it has to be hardcoded
> into shepherd.c, and the sge_execd daemon also needs to be running with root
> privileges; I am not sure whether capabilities would help here.
>
> Does this initiative make any sense?
> I can try to implement it myself, but I am not familiar with SGE
> internals. I can try...
>
> Ondrej
>
> _______________________________________________
> SGE-discuss mailing list
> SGE-discuss@liv.ac.uk
> https://arc.liv.ac.uk/mailman/listinfo/sge-discuss
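
For anyone trying to picture the launch path in the quoted proposal, here is a rough sketch in Python. It is not taken from Ondrej's patches (those change shepherd, which is C). It only shows how a "systemd-run --scope" command line with "--property" limits might be assembled, and how usage counters printed by "systemctl show" could be parsed. The function names and the "sge-job-<id>.scope" unit naming are hypothetical. MemoryLimit and CPUShares are the properties named in the thread; on cgroup v2 hosts, MemoryMax and CPUWeight are probably the ones to use instead.

```python
import shlex


def build_systemd_run_argv(job_id, command, memory_limit=None, cpu_shares=None):
    """Build an argv list that runs `command` inside a transient scope unit.

    The unit name scheme is an assumption made for this sketch only.
    """
    unit = f"sge-job-{job_id}.scope"  # hypothetical naming scheme
    argv = ["systemd-run", "--scope", f"--unit={unit}"]
    if memory_limit is not None:
        # Property named in the thread; MemoryMax is the cgroup v2 equivalent.
        argv.append(f"--property=MemoryLimit={memory_limit}")
    if cpu_shares is not None:
        # Property named in the thread; CPUWeight is the cgroup v2 equivalent.
        argv.append(f"--property=CPUShares={cpu_shares}")
    argv.append("--")  # everything after this is the job's own command line
    argv.extend(command)
    return argv


def parse_unit_usage(show_output):
    """Parse KEY=VALUE lines such as those printed by `systemctl show -p ...`.

    Only purely numeric values are kept; entries like "[not set]" are skipped.
    """
    usage = {}
    for line in show_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and value.isdigit():
            usage[key] = int(value)
    return usage


if __name__ == "__main__":
    argv = build_systemd_run_argv(
        42, ["/bin/sleep", "60"], memory_limit="2G", cpu_shares=512
    )
    # Print the command line instead of executing it, since running it needs root.
    print(shlex.join(argv))
```

The sketch prints the command rather than running it, so it can be tried on a machine without systemd or root access. The usage parser could be fed the output of something like "systemctl show sge-job-42.scope -p MemoryCurrent -p CPUUsageNSec", assuming accounting is enabled for the unit.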