Yes, you're right - by default systemd-run runs the job asynchronously, with PID 1 (systemd) as its parent... unless you use the "--scope" parameter, which keeps it running in the foreground, synchronously. Return codes and (surprisingly enough) even the parent PID remain the same.
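A minimal Python sketch of the --scope invocation described above, assuming a host running systemd and a caller with root privileges. The helper names scope_command and run_in_scope are hypothetical, and CPUQuota= and MemoryMax= are only example resource-control properties. Since --scope keeps the command in the foreground, the status returned by subprocess.run should be the job's own exit code, which matches the behaviour observed above (in scope mode systemd-run appears to register its own process in a new scope and then exec the command, which would also explain the unchanged parent PID).

```python
import subprocess


def scope_command(job_argv, properties=None):
    """Build a systemd-run invocation that runs job_argv in the foreground.

    --scope: run synchronously inside a transient scope unit instead of
             handing the job to a background service unit.
    --quiet: suppress the informational "Running scope as unit" message.
    properties: optional resource-control settings, each passed to
                systemd-run as --property=KEY=VALUE.
    """
    cmd = ["systemd-run", "--scope", "--quiet"]
    for key, value in (properties or {}).items():
        cmd.append(f"--property={key}={value}")
    return cmd + list(job_argv)


def run_in_scope(job_argv, properties=None):
    """Run the job in a scope and return its exit status (needs systemd and root)."""
    return subprocess.run(scope_command(job_argv, properties)).returncode
```

On such a host, run_in_scope(["/bin/sh", "-c", "exit 3"], {"CPUQuota": "50%"}) would be expected to return 3. MemoryMax= targets the unified (v2) cgroup hierarchy; hosts on cgroup v1 with older systemd releases may need MemoryLimit= instead.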
I think "--scope" could be the solution, right? I will try the us...@gridengine.org list to see if anyone has tried it yet.

Thanks,
Ondrej

-----Original Message-----
From: Hay, William <w....@ucl.ac.uk>
Sent: Friday, August 9, 2019 12:33 PM
To: Ondrej Valousek <ondrej.valou...@adestotech.com>
Cc: sge-disc...@liverpool.ac.uk
Subject: Re: [SGE-discuss] SGE & systemd integration

On Wed, Aug 07, 2019 at 03:59:52PM +0000, Ondrej Valousek wrote:
> Hi all,
>
> I am thinking of making SGE (or sge_execd) more systemd friendly.
> Right now, there is some support for cgroups (as of 8.1.9) via:
> USE_CGROUPS=y/n
> My proposal is to make it:
> USE_CGROUPS=y/n/systemd
> When set to systemd, we would not need to detect any cgroups (or set the
> cpuset controller) manually.
> Instead, the shepherd daemon would run the job via the "systemd-run" binary.
>
> https://www.freedesktop.org/software/systemd/man/systemd-run.html
>
> systemd-run can set various cgroup controllers via its "--property" flag,
> achieving the same thing we currently do manually.
>
> Initially, I was thinking about implementing this via the "starter_method"
> setting, but systemd-run needs to be run as root, so it has to be hardcoded into
> shepherd.c, and the sge_execd daemon also needs to run with root
> privileges; I am not sure whether capabilities would help here.
>
> Does this initiative make any sense?
> I can try to implement it myself, but I am not familiar with SGE internals. I
> can try...
>
AFAICT, when you use systemd-run the job would be a child of init, not a child of the shepherd. According to the sge_shepherd man page: "sge_shepherd provides the parent process functionality for a single Grid Engine job."
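Ondrej's proposed USE_CGROUPS=systemd mode could translate what the shepherd currently sets up by hand into systemd-run properties, roughly as in this Python sketch. The function names, the y/n/systemd dispatch, and the choice of AllowedCPUs= (for the cpuset binding) and MemoryMax= (for a memory limit) are illustrative assumptions, not existing SGE behaviour; AllowedCPUs= assumes systemd 244 or later on the unified cgroup v2 hierarchy.

```python
def systemd_properties(cores=None, mem_limit=None):
    """Translate a job's core binding and memory limit into systemd-run properties.

    cores: iterable of CPU ids the job is bound to (what the cpuset controller
           is set to today); mem_limit: a byte count or a systemd size string
           such as "4G". Both mappings are assumptions for illustration.
    """
    props = []
    if cores:
        props.append("AllowedCPUs=" + ",".join(str(c) for c in sorted(set(cores))))
    if mem_limit is not None:
        props.append(f"MemoryMax={mem_limit}")
    return props


def job_argv_for(use_cgroups, job_argv, cores=None, mem_limit=None):
    """Hypothetical launch logic for the proposed USE_CGROUPS=y/n/systemd setting."""
    mode = use_cgroups.strip().lower()
    if mode == "systemd":
        # Let systemd create the cgroup and apply the limits.
        cmd = ["systemd-run", "--scope", "--quiet"]
        cmd += ["--property=" + p for p in systemd_properties(cores, mem_limit)]
        return cmd + list(job_argv)
    if mode in ("y", "n"):
        # Unchanged paths: the shepherd manages cgroups itself (y) or not at all (n).
        return list(job_argv)
    raise ValueError(f"unsupported USE_CGROUPS value: {use_cgroups!r}")
```

Keeping --scope in the systemd branch matters for the point William raises next: without it the job would be started by PID 1 rather than by the shepherd.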
Having the shepherd not do the one thing it is designed to do will probably break a lot of functionality, which you would then need to reimplement. Off the top of my head:

- Determine whether the job exited.
- Determine whether it exited with a special exit status (e.g. 99).
- Measure resource usage.
- Send various signals to the job.

None of these are particularly hard, but I suspect you might well end up rewriting almost the entire shepherd functionality. It would probably be easier to write a replacement shepherd from scratch than to modify the existing one.

However, I think the easiest solution would be to write a wrapper that launches the shepherd itself via systemd-run, passes on any signals, and waits for the shepherd to terminate, relying on systemd to clean up any remnant processes. The wrapper can be installed via the shepherd_cmd setting in sge_conf.

William

_______________________________________________
SGE-discuss mailing list
SGE-discuss@liv.ac.uk
https://arc.liv.ac.uk/mailman/listinfo/sge-discuss
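William's wrapper suggestion might look roughly like this Python sketch, assuming it is installed as the shepherd_cmd and simply passes whatever arguments it receives on to the real shepherd. REAL_SHEPHERD (including its path), run_wrapped, main, and the FORWARDED_SIGNALS list are hypothetical names and choices. The launcher defaults to systemd-run --scope so that the wrapper stays the shepherd's parent and can wait for it and report its exit status (including special values such as 99) unchanged. Uncatchable signals (SIGKILL, SIGSTOP) cannot be relayed this way.

```python
import signal
import subprocess
import sys

# Hypothetical path to the real shepherd binary; adjust for the installation.
REAL_SHEPHERD = "/opt/sge/bin/lx-amd64/sge_shepherd"

# Signals relayed to the shepherd (an illustrative choice). SIGKILL and SIGSTOP
# cannot be caught, so they can never be passed on this way.
FORWARDED_SIGNALS = (signal.SIGTERM, signal.SIGINT, signal.SIGHUP,
                     signal.SIGUSR1, signal.SIGUSR2, signal.SIGCONT)

SCOPE_LAUNCHER = ("systemd-run", "--scope", "--quiet")


def run_wrapped(shepherd_argv, launcher=SCOPE_LAUNCHER):
    """Start the shepherd under `launcher`, relay signals, return its exit status."""
    child = subprocess.Popen(list(launcher) + list(shepherd_argv))

    def relay(signum, _frame):
        # Pass the signal on to the shepherd; ignore it if the shepherd is gone.
        try:
            child.send_signal(signum)
        except ProcessLookupError:
            pass

    previous = {sig: signal.signal(sig, relay) for sig in FORWARDED_SIGNALS}
    try:
        status = child.wait()  # retried automatically if a signal interrupts it
    finally:
        for sig, handler in previous.items():
            # signal.signal() returns None for handlers not installed from Python.
            signal.signal(sig, signal.SIG_DFL if handler is None else handler)
    # Popen reports death by signal N as -N; convert to the shell's 128+N convention.
    return status if status >= 0 else 128 - status


def main(argv=None):
    """Entry point for a wrapper script named in shepherd_cmd (hypothetical)."""
    args = sys.argv[1:] if argv is None else argv
    return run_wrapped([REAL_SHEPHERD] + list(args))
```

A wrapper script built from this would end with sys.exit(main()). One caveat: a transient scope stays active while any process in it remains, so a production wrapper would probably also need to stop the scope explicitly (for example, by naming it with --unit= and running systemctl stop on it) after the shepherd exits.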