> That loop-kill-all thing should be a kind of last resort really, what's
> actually needed is some kind of "init 1" procd equivalent which shuts down
> all services in a more or less clean manner.

Oddly enough, the /lib/upgrade/stage2 script has some aspect of this. It explicitly shuts down (kill -9) telnet, dropbear, and ash before looping with SIGTERM, and then again with SIGKILL.
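
For comparison, the ordering I'd expect for any single process is SIGTERM first, and SIGKILL only after a grace period has expired. A rough sketch in shell follows; term_then_kill and its grace-period argument are names I made up for illustration, not anything that exists in stage2:

```shell
#!/bin/sh
# Hypothetical helper: ask a process to exit, escalate only if it refuses.
# $1 = PID, $2 = grace period in seconds (default 5).
# Returns 0 if the process exited on its own, 1 if SIGKILL was needed.
term_then_kill() {
    ttk_pid=$1
    ttk_grace=${2:-5}

    # Polite request first, so the process can close sockets and flush state.
    # If the signal cannot be delivered, the process is already gone
    # (or is not ours to signal), so there is nothing left to do.
    kill -TERM "$ttk_pid" 2>/dev/null || return 0

    while [ "$ttk_grace" -gt 0 ]; do
        # kill -0 sends no signal; it only tests whether the PID still exists.
        kill -0 "$ttk_pid" 2>/dev/null || return 0
        sleep 1
        ttk_grace=$((ttk_grace - 1))
    done

    # Grace period expired: fall back to SIGKILL as the last resort.
    kill -0 "$ttk_pid" 2>/dev/null || return 0
    kill -KILL "$ttk_pid" 2>/dev/null
    return 1
}
```

Something like `term_then_kill "$(pidof dropbear)" 3` would at least give the daemon a chance to close its connections before it is killed, assuming a single dropbear PID.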
I find it very odd that it's explicitly singling out telnet, dropbear, and ash. My OpenWrt build doesn't have any of these installed in the first place; e.g. I have OpenSSH. For the ones it does name, it's jumping straight to kill -9 instead of sending SIGTERM first like it should. I imagine this is the reason why I've had my SSH sessions hang indefinitely when sysupgrading a board with dropbear.

> I'm just not sure offhand how much possible error conditions there are
> besides the actual image writing itself, which you cannot recover from
> if it dies midway.

I would expect that if the image writing fails, at least one more attempt should be made before giving up. Rendering the device soft-bricked is very much not desirable...

> No it is not. When the logic was implemented there wasn't any cgroup support
> in OpenWrt. Sysupgrade was introduced in 2007 when we still supported Linux
> 2.4 on some targets. Using the freezer cgroup probably makes sense
> nowadays, it will however further bloat the kernel which might hurt
> various lower end targets, flash space wise.

Ok, noted. I suppose I should point out that I'm not personally interested in the lower end devices, but I understand where you're coming from there.

Perhaps a way to address this in a reliable way:

1) If cgroups support is detected at runtime (or via conditional compilation, to save even more space in the binary), procd, in its role as PID 1, places each service that it creates into its own cgroup. I don't know how this interacts with procd jails, but perhaps some code from that can be adapted and reused.
1.a) I would even add that there should be a top-level cgroup that contains all service cgroups as nested cgroups, so that *everything* can be terminated in one fell swoop.
2) On sysupgrade, just prior to execvp /sbin/upgraded, procd gracefully shuts down all services that are running.
2.a) If cgroups are available, then after shutting down all services, use the cgroup freezer to terminate any service cgroups that still have active processes.
2.b) Use the global cgroup to nuke everything from orbit.
3) /sbin/upgraded handles terminating any remaining processes. This isn't something that should practically be handled in a shell script. Moving the logic for this into /sbin/upgraded means that the only safety check needed is that it not try to terminate PID 1.
4) Now /lib/upgrade/stage2 doesn't need to worry about terminating processes, and can focus entirely on handling the ramdisk chroot logic.
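
To make 2.a a bit more concrete, here is a rough sketch of what killing one service cgroup could look like. It is written in shell only for brevity; as said above, the real thing would belong in procd or /sbin/upgraded. It assumes the cgroup v2 interface files (cgroup.kill, cgroup.freeze, cgroup.procs), and both kill_cgroup and the example path are hypothetical, since procd does not create such a hierarchy today:

```shell
#!/bin/sh
# Hypothetical helper: forcibly terminate everything in one service cgroup.
# $1 = cgroup directory, e.g. /sys/fs/cgroup/services/foo (made-up path).
kill_cgroup() {
    kc_dir=$1

    # Newer kernels expose cgroup.kill, which kills the whole subtree at once.
    if [ -w "$kc_dir/cgroup.kill" ]; then
        echo 1 > "$kc_dir/cgroup.kill"
        return 0
    fi

    # Otherwise freeze first, so nothing can fork while we walk the PID list.
    if [ -w "$kc_dir/cgroup.freeze" ]; then
        echo 1 > "$kc_dir/cgroup.freeze"
    fi

    # cgroup.procs lists only this cgroup's own processes, one PID per line;
    # nested cgroups would need the same treatment (not shown here).
    while read -r kc_pid; do
        # Never signal PID 1; everything else in the group is fair game.
        [ "$kc_pid" -eq 1 ] && continue
        kill -KILL "$kc_pid" 2>/dev/null
    done < "$kc_dir/cgroup.procs"
    return 0
}
```

The point of freezing before walking cgroup.procs is to avoid racing against processes that fork while they are being killed, which is exactly the weakness of a plain kill loop.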
_______________________________________________
openwrt-devel mailing list
openwrt-devel@lists.openwrt.org
https://lists.openwrt.org/mailman/listinfo/openwrt-devel