Hi,

> > That loop-kill-all thing should be a kind of last resort really, what's
> > actually needed is some kind of "init 1" procd equivalent which shuts
> > down all services in a more or less clean manner.
>
> Oddly enough, the /lib/upgrade/stage2 script has some aspect of this. It
> explicitly shuts down (kill -9) telnet, dropbear, and ash before looping
> with sigTERM, and then again with sigKILL.
>
> I find it very odd that it's explicitly singling out telnet, dropbear, and
> ash. My OpenWRT build doesn't have any of these installed in the first
> place. E.g. I have OpenSSH, and it's jumping straight to kill -9 instead of
> sending sigTERM first like it should.

These are (in the case of telnet, were) the default services offering shell
access in the standard images the sysupgrade script was tailored for. The
intention is to kill all user shell sessions so they cannot interfere with
the subsequent upgrade process. An OpenSSH case simply hasn't been added
since it is uncommon, especially on lower-end devices.

The subsequent TERM / KILL loops are a poor man's attempt to cleanly shut
down services. It obviously won't work for things with expensive teardown
procedures (databases, squid proxy, etc.) - those really should be handled
manually by the user before invoking sysupgrade. One could of course extend
the grace period, but I guess there will always be unhandled cases.

> I imagine this is the reason why I've had my SSH sessions hang
> indefinitely when sysupgrading a board with dropbear.

Hm, maybe. I usually see a "commencing upgrade" message and afterwards my
SSH connection is cleanly terminated.

> > I'm just not sure offhand how much possible error conditions there are
> > besides the actual image writing itself, which you cannot recover from
> > if it dies midway.
>
> I would expect that if the image writing fails, at least one more attempt
> should be made before giving up. Rendering the device soft-bricked is very
> much not desirable...

Uhm, yeah sure, we could try writing the image again I guess. But eventually
you have to give up if the storage device simply cannot be written cleanly.

> [...]
> Perhaps a way to address this in a reliable way:
>
> [...]

These points make sense, yes.

> 4) Now /lib/upgrade/stage2 doesn't need to worry about terminating
> processes, and can focus entirely on handling the ramdisk chroot logic.

Things like unmounting external disks, fsync, swapoff etc. come to mind as
well; those should be doable at this point.

~ Jo
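
To illustrate the TERM-then-KILL idea with an adjustable grace period
discussed above, here is a rough sketch. It is not the actual
/lib/upgrade/stage2 code: the function name term_then_kill and its
"grace seconds, then PIDs" argument order are made up for this example, and
it only handles an explicit PID list rather than walking all processes.

```shell
#!/bin/sh
# Hypothetical helper, NOT taken from /lib/upgrade/stage2.
# Usage: term_then_kill <grace-seconds> <pid>...
# Sends SIGTERM to every PID, polls once per second for up to
# <grace-seconds>, then sends SIGKILL to whatever is still around.
# Returns 0 if everything exited after SIGTERM, 1 if SIGKILL was needed.
term_then_kill() {
	grace="$1"
	shift
	pids="$*"

	# Ask politely first.
	for pid in $pids; do
		kill -TERM "$pid" 2>/dev/null
	done

	elapsed=0
	while :; do
		# Collect the PIDs that still exist (kill -0 only probes).
		alive=""
		for pid in $pids; do
			kill -0 "$pid" 2>/dev/null && alive="$alive $pid"
		done
		[ -z "$alive" ] && return 0
		[ "$elapsed" -ge "$grace" ] && break
		sleep 1
		elapsed=$((elapsed + 1))
	done

	# Grace period expired: last resort.
	for pid in $alive; do
		kill -KILL "$pid" 2>/dev/null
	done
	return 1
}
```

Something like `term_then_kill 5 $(pidof dropbear)` would then give dropbear
five seconds to exit before it is killed. A longer grace period helps slow
services, but as said above it will never cover every case.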
_______________________________________________
openwrt-devel mailing list
openwrt-devel@lists.openwrt.org
https://lists.openwrt.org/mailman/listinfo/openwrt-devel