On 2025-11-10 04:27, Andriy Gapon wrote:
I played a little bit with OCI containers and podman.
I had a hiccup with one specific container created for Docker / Linux.
Its difference from other containers is that it uses multiple daemons
and a supervisor process to take care of them. That particular
supervisor is another variation of an "advanced init" called s6.
Apparently, it is relatively popular for container use (not sure about
host systems). Probably other alternatives can be / are used for that
purpose as well.
I think that this is what a supervisor in a container needs:
1. its PID is 1;
2. orphaned processes get re-parented to it.
I think that (1) is not a hard requirement, but it's an easy way to
check if the process would be able to work as init.
Also, some other processes might expect to find init at PID 1, but I am
not sure about that.
(2) is important for doing the supervising (at least, when
procctl(PROC_REAP*) is not used).
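To make requirement (2) concrete, here is a minimal sketch of a process
that asks to have orphans re-parented to it without being PID 1. On
FreeBSD it uses procctl(2) with PROC_REAP_ACQUIRE; the prctl(2) branch is
the Linux analogue and is there only so the sketch builds on both. The
function names are made up for illustration and error handling is kept
to a minimum.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifdef __FreeBSD__
#include <sys/procctl.h>
#else
#include <sys/prctl.h>  /* Linux analogue, only so the sketch builds there */
#endif

/* Ask the kernel to re-parent orphaned descendants to this process. */
static int
become_reaper(void)
{
#ifdef __FreeBSD__
    return (procctl(P_PID, getpid(), PROC_REAP_ACQUIRE, NULL));
#else
    return (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0));
#endif
}

/*
 * Create an orphan: the intermediate child forks a grandchild and exits
 * at once. The grandchild waits until it has been re-parented, then
 * exits 0 if its new parent is the reaper and 1 otherwise.
 */
static int
spawn_orphan(void)
{
    pid_t reaper = getpid();
    pid_t middle = fork();

    if (middle == -1)
        return (-1);
    if (middle == 0) {
        pid_t self = getpid();
        pid_t grandchild = fork();

        if (grandchild == 0) {
            while (getppid() == self)
                (void)poll(NULL, 0, 1);  /* sleep about 1 ms */
            _exit(getppid() == reaper ? 0 : 1);
        }
        _exit(grandchild == -1 ? 2 : 0);
    }
    return (0);
}

/* Reap every child, adopted orphans included; return how many exited 0. */
static int
reap_all(void)
{
    int status, clean = 0;

    for (;;) {
        if (wait(&status) == -1) {
            if (errno == EINTR)
                continue;
            break;
        }
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            clean++;
    }
    assert(errno == ECHILD);  /* the loop ends only when no children remain */
    return (clean);
}
```

Calling become_reaper(), then spawn_orphan(), then reap_all() should
collect both the intermediate child and the adopted grandchild.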
I think that on Linux they have a separate PID namespace per container,
so the first process to run naturally gets PID 1.
I think that a per-container PID namespace may be overkill.
Maybe there is a way to make PID 1 special without going that way.
E.g., a jail could record the first process it runs.
We can patch up getpid() to return 1 for that process.
Also, we could patch up the process lookup to return the first process
in the jail for PID 1.
Re-parenting to the "jail init" sounds harder but should be possible as
well (e.g., using PROC_REAP).
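As a rough userland model of the PID 1 idea above (not kernel code), the
translation could look something like this. struct jail_model and all
three functions are hypothetical names invented for this sketch; a real
implementation would live in the jail and process-lookup code.

```c
#include <assert.h>
#include <sys/types.h>

/* Hypothetical per-jail record; not an existing kernel structure. */
struct jail_model {
    pid_t init_pid;  /* real PID of the first process run in the jail, 0 if none */
};

/* Record the first process the jail runs; later calls are ignored. */
static void
jail_note_first_process(struct jail_model *j, pid_t real_pid)
{
    assert(real_pid > 1);  /* the host's own init is never the jail's init */
    if (j->init_pid == 0)
        j->init_pid = real_pid;
}

/* What a patched getpid() would report to a process inside the jail. */
static pid_t
jail_visible_pid(const struct jail_model *j, pid_t real_pid)
{
    if (j->init_pid != 0 && real_pid == j->init_pid)
        return (1);
    return (real_pid);
}

/* What a patched process lookup would resolve a jail-supplied PID to. */
static pid_t
jail_lookup_pid(const struct jail_model *j, pid_t requested)
{
    if (requested == 1 && j->init_pid != 0)
        return (j->init_pid);
    return (requested);
}
```

Only the jail's first process is translated; every other process keeps
its real PID, which is what makes this cheaper than a full PID namespace.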
Not sure what to do if the "jail init" dies... should all processes in
the jail get killed and the jail should die as well (unless
persistent)?
This proposal sounds like a kludge but it could be a shortcut to
support more Linux containers and to allow similar FreeBSD jails /
containers with alternative inits / supervisors.
Far from being a kludge, I think it's a feature we need, and one at the
top of my list. Forcing it to look like PID 1 from the jail's perspective
is definitely doable (and something I did outside of the project a
decade ago). In addition to those two requirements, I would add one
that answers your last question:
3. signals to init and reboot(2) work as they would on the host side.
A jailed reboot would kill all processes and restart rc, and possibly do
other kernel-side cleanups yet to be clearly defined. A jailed halt
would remove the jail. A jailed single-user mode could exist where,
instead of spawning a shell, init just sits idle until someone on the
host has a chance to jexec into it.
init handles various signals by rebooting, halting, etc., and it should be
able to do that as it does now, by calling reboot(2), directing the
kernel to do what it needs to with the jail. If init goes away, it's
probably like a halt and removes the jail.
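A small sketch of how the signal handling described here might map onto
jail-level outcomes. The signal assignments follow my understanding of
FreeBSD init(8) (SIGINT reboots, SIGUSR1 and SIGUSR2 halt, SIGTERM goes
single-user); the enum and the function are hypothetical names. In a real
init the reboot and halt cases would end in reboot(2) with RB_AUTOBOOT or
RB_HALT, and the kernel would decide what that means for the jail.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical jail-level outcomes, named after the behaviour described. */
enum jail_action {
    JAIL_NONE,         /* signal does not change the jail's state */
    JAIL_RESTART,      /* jailed reboot: kill all processes, restart rc */
    JAIL_REMOVE,       /* jailed halt: remove the jail */
    JAIL_SINGLE_USER   /* init idles so the host can jexec into the jail */
};

/* Map a signal delivered to the jail's init onto a jail-level outcome. */
static enum jail_action
jail_action_for_signal(int signo)
{
    assert(signo > 0);  /* signal numbers start at 1 */
    switch (signo) {
    case SIGINT:
        return (JAIL_RESTART);
    case SIGUSR1:
    case SIGUSR2:
        return (JAIL_REMOVE);
    case SIGTERM:
        return (JAIL_SINGLE_USER);
    default:
        return (JAIL_NONE);
    }
}
```

Keeping the mapping separate from the action itself means init stays
unchanged, and only the kernel side of reboot(2) needs to know it is
running in a jail.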
This is definitely something that will be happening.
- Jamie