Yes, you're right: by default systemd-run runs the job asynchronously, with 
PID 1 as its parent...
... unless you use the "--scope" parameter, which keeps it running in the 
foreground, synchronously.
Return codes and (surprisingly enough) even the parent PID remain the same.

I think "--scope" could be the solution, right?
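A minimal Python sketch of the "--scope" behaviour described above, assuming systemd-run is on PATH and the caller has the privileges it needs. The helper names `scope_argv` and `run_job` are hypothetical, not part of SGE; the point is only that in scope mode the job's exit status comes straight back to the caller:

```python
import subprocess
from typing import List, Optional, Sequence


def scope_argv(job_argv: Sequence[str], unit: Optional[str] = None) -> List[str]:
    """Prefix a job command line with 'systemd-run --scope'."""
    # --quiet suppresses systemd-run's "Running scope as unit ..." notice.
    argv = ["systemd-run", "--scope", "--quiet"]
    if unit is not None:
        argv.append(f"--unit={unit}")
    return argv + list(job_argv)


def run_job(job_argv: Sequence[str], use_systemd: bool = True,
            unit: Optional[str] = None) -> int:
    """Run the job in the foreground and return its exit status.

    In scope mode systemd-run starts the command itself instead of handing it
    to PID 1, so the job stays our child and its exit status reaches us.
    use_systemd=False runs the command directly (e.g. on hosts without systemd).
    """
    argv = scope_argv(job_argv, unit) if use_systemd else list(job_argv)
    return subprocess.run(argv).returncode
```

The `use_systemd=False` path exists only so the logic can be tried on a machine without systemd.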

I will try the us...@gridengine.org list to see if anyone has tried this yet.
Thanks,

Ondrej



-----Original Message-----
From: Hay, William <w....@ucl.ac.uk> 
Sent: Friday, August 9, 2019 12:33 PM
To: Ondrej Valousek <ondrej.valou...@adestotech.com>
Cc: sge-disc...@liverpool.ac.uk
Subject: Re: [SGE-discuss] SGE & systemd integration

On Wed, Aug 07, 2019 at 03:59:52PM +0000, Ondrej Valousek wrote:
> Hi all,
> 
> I am thinking of making SGE (or sge_execd) more systemd friendly.
> Right now (as of 8.1.9), there is some support for cgroups via:
> USE_CGROUPS=y/n
> My proposal is to make it:
> USE_CGROUPS=y/n/systemd
> When set to systemd, we would not detect any cgroups (or set the cpuset 
> controller) manually.
> Instead, the shepherd daemon would run the job via the "systemd-run" binary.
> 
> https://www.freedesktop.org/software/systemd/man/systemd-run.html
> 
> 
> systemd-run can set various cgroup controllers via its "--property" flag, 
> achieving the same thing we now do manually.
> 
> Initially, I was thinking about implementing this via the "starter_method" 
> parameter, but systemd-run needs to be run as root, so it has to be hardcoded 
> into shepherd.c, and the sge_execd daemon also needs to run with root 
> privileges. I am not sure whether capabilities would help here.
> 
> Does this initiative make any sense?
> I can try to implement it myself, but I am not familiar with SGE internals. I 
> can try...
>
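As a rough sketch of the proposed USE_CGROUPS=systemd mode, the shepherd would translate a job's resource limits into "--property" arguments for systemd-run. The function and its inputs are hypothetical; AllowedCPUs=, MemoryMax= and CPUQuota= are real systemd resource-control properties, but their availability depends on the systemd version and cgroup hierarchy on the host (AllowedCPUs= in particular needs systemd 244 or later with the unified hierarchy):

```python
from typing import List, Optional, Sequence


def cgroup_properties(cpus: Optional[Sequence[int]] = None,
                      memory_max_bytes: Optional[int] = None,
                      cpu_quota_percent: Optional[int] = None) -> List[str]:
    """Translate a (hypothetical) per-job resource request into systemd-run flags."""
    props: List[str] = []
    if cpus:
        # AllowedCPUs= drives the cpuset controller, replacing the manual cpuset setup.
        props.append("--property=AllowedCPUs=" + ",".join(str(c) for c in sorted(cpus)))
    if memory_max_bytes is not None:
        # MemoryMax= is the cgroup v2 memory limit (older setups used MemoryLimit=).
        props.append(f"--property=MemoryMax={memory_max_bytes}")
    if cpu_quota_percent is not None:
        # CPUQuota= caps CPU time relative to one CPU, so 200% means two CPUs.
        props.append(f"--property=CPUQuota={cpu_quota_percent}%")
    return props
```

The result would be spliced in between `systemd-run --scope` and the job's command, e.g. `systemd-run --scope --property=AllowedCPUs=1,3 --property=MemoryMax=1073741824 job.sh` (with `job.sh` standing in for the real job).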
AFAICT, when you use systemd-run the job would be a child of init, not a child of 
the shepherd. According to the sge_shepherd man page:

"sge_shepherd provides the parent process functionality for a single Grid 
Engine job."

Having the shepherd not do the one thing it is designed to do will probably 
break a lot of functionality, which you would then need to reimplement:

Off the top of my head:
Determine if the job exited.
Determine if it exited with a special exit status (e.g. 99).
Measure resource usage.
Send various signals to the job.
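To illustrate why staying the job's parent matters, here is a minimal Python sketch of three of the duties listed above: reaping the job, recognising a special exit status, and collecting resource usage via wait4(). It assumes the job is a direct child, which the "--scope" mode would preserve. The `JobResult` type and its outcome labels are hypothetical; treating 99 as "reschedule" follows SGE's convention for that exit status:

```python
import os
from typing import NamedTuple, Sequence

# In SGE a job exiting with status 99 asks to be rescheduled.
RESCHEDULE_EXIT = 99


class JobResult(NamedTuple):
    outcome: str       # "ok", "failed", "reschedule" or "signaled"
    code: int          # exit status, or the signal number if signaled
    user_cpu_s: float  # user CPU time taken from rusage
    max_rss: int       # peak resident set size (KiB on Linux)


def run_and_account(job_argv: Sequence[str]) -> JobResult:
    """Spawn the job as a direct child, reap it with wait4() and classify it."""
    pid = os.posix_spawnp(job_argv[0], list(job_argv), os.environ)
    # wait4() returns the child's rusage along with its status; this only works
    # while the job is our own child, not a child of PID 1.
    _, status, usage = os.wait4(pid, 0)
    if os.WIFSIGNALED(status):
        return JobResult("signaled", os.WTERMSIG(status), usage.ru_utime, usage.ru_maxrss)
    code = os.WEXITSTATUS(status)
    if code == RESCHEDULE_EXIT:
        outcome = "reschedule"
    elif code == 0:
        outcome = "ok"
    else:
        outcome = "failed"
    return JobResult(outcome, code, usage.ru_utime, usage.ru_maxrss)
```

Sending signals to the job would then just be `os.kill(pid, sig)` on the same PID, which is exactly what becomes awkward once the job is re-parented to init.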


None of these are particularly hard, but I suspect you might well end up 
rewriting almost the entire shepherd functionality. It would probably be 
easier to write a replacement shepherd from scratch than to modify the existing 
one.

However, I think the easiest solution would be to write a wrapper that launches 
the shepherd itself via systemd-run, passes on any signals, and waits for the 
shepherd to terminate, relying on systemd to clean up any remnant processes. The 
wrapper can be installed via the shepherd_cmd setting in sge_conf.
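A minimal Python sketch of such a wrapper, under these assumptions: it receives the real shepherd's command line, it is given a unique scope name (the naming scheme, e.g. one derived from the job ID, is hypothetical), it uses `systemctl stop <unit>.scope` afterwards so that systemd kills anything the job left behind, and sge_execd delivers its signals to the wrapper process. `run_wrapped` and `wrapper_argv` are illustrative names, not SGE or systemd APIs:

```python
import signal
import subprocess
from typing import List, Optional, Sequence

# Signals the wrapper relays to the shepherd (SIGKILL/SIGSTOP cannot be caught).
FORWARDED = (signal.SIGTERM, signal.SIGINT, signal.SIGHUP,
             signal.SIGUSR1, signal.SIGUSR2, signal.SIGCONT)


def wrapper_argv(shepherd_argv: Sequence[str], unit: Optional[str]) -> List[str]:
    """Command line that starts the shepherd inside its own transient scope."""
    if unit is None:  # no systemd: run the shepherd directly (useful for testing)
        return list(shepherd_argv)
    return ["systemd-run", "--scope", "--quiet", f"--unit={unit}"] + list(shepherd_argv)


def run_wrapped(shepherd_argv: Sequence[str], unit: Optional[str] = None) -> int:
    """Start the shepherd, relay signals to it, wait, then clean up the scope."""
    # In scope mode systemd-run execs the shepherd, so this child IS the shepherd.
    child = subprocess.Popen(wrapper_argv(shepherd_argv, unit))

    def relay(signum, _frame):
        # Pass the signal on; ignore the race where the child has just exited.
        try:
            child.send_signal(signum)
        except ProcessLookupError:
            pass

    previous = {s: signal.signal(s, relay) for s in FORWARDED}
    try:
        rc = child.wait()
    finally:
        for s, handler in previous.items():
            # getsignal() may report None for handlers not set from Python.
            signal.signal(s, handler if handler is not None else signal.SIG_DFL)
        if unit is not None:
            # Stopping the scope makes systemd kill any remnant processes;
            # a failure here just means the scope is already gone.
            subprocess.run(["systemctl", "stop", f"{unit}.scope"], check=False)
    # Popen reports death-by-signal as -N; map it to the usual 128+N convention.
    return 128 - rc if rc < 0 else rc
```

A small script calling `run_wrapped` with the path to the real sge_shepherd binary could then be set as shepherd_cmd. Passing `unit=None` skips systemd entirely, which is handy for testing the signal and exit-status plumbing on its own.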

William


_______________________________________________
SGE-discuss mailing list
SGE-discuss@liv.ac.uk
https://arc.liv.ac.uk/mailman/listinfo/sge-discuss
