On Fri, 2011-06-17 at 22:54 +0100, Aleksandar Radovanovic wrote:
> On 06/17/2011 11:49 AM, D.S. Ljungmark wrote:
> > On Thu, 2011-06-16 at 19:49 +0200, Ithamar R. Adema wrote:
> >> Hi,
> >>
> >> On Thu, 2011-06-16 at 12:21 +0200, Jo-Philipp Wich wrote:
> >>> Maybe calling "reboot" at all is not such a great idea as it
> >>> attempts to run the (former) init scripts to stop them.
> >>
> >> Maybe we should use 'reboot -f' here, as that will skip all
> >> userland 'shutdown' handling and simply call the kernel side reboot
> >> handler?
> >>
> >> This *should* flush all disk cache and such, so still be safe from
> >> that point, but keep us out of the usual 'init' handling....
> >
> >
> > That does sound like a good idea, especially as I suspect part of the
> > reason that the systems crash is because the mounted and active
> > filesystem just disappeared, and that the pages of the files needed
> > were not in active RAM. It basically looks as if things are
> > attempting to execute pure garbage (not too surprising, really).
> >
> > //D.S.
> >
> > _______________________________________________
> > openwrt-devel mailing list
> > openwrt-devel@lists.openwrt.org
> > https://lists.openwrt.org/mailman/listinfo/openwrt-devel
> >
> >
>
> Hi,
>
> If you don't mind the long mail, I can offer my two cents on the subject.
>
> I use OpenWRT in a number of products on various hardware and with varying
> filesystems (jffs2, yaffs, ...) and have seen quite a few problems with
> sysupgrade as it is now (including kernel crashes, segfaults and similar,
> plus the fact that it generally only handles jffs2). The idea of writing to a
> mounted filesystem just scares me.
>
> So, I have decided to write my own flash upgrade framework, borrowing bits of
> code from sysupgrade, but doing it in a completely different and safer
> fashion.
> Downside is that the whole process is a bit more complicated and
> requires three stages:
>
> Stage 1:
>
> Tell init to do a shutdown, but, instead of rebooting at the end, replace
> itself with a stage2 shell script.
>
> This can be done with the following bit of code:
>
>     # copy inittab and append our stage2 at the end
>     cp /etc/inittab /tmp/inittab
>     echo -e "\n::restart:/bin/sh /lib/upgrade/stage2.sh" >> /tmp/inittab
>
>     # remount it over existing inittab, so we don't disturb the original
>     mount --bind /tmp/inittab /etc/inittab
>
>     # make init re-read inittab
>     kill -HUP 1
>     sleep 1
>
>     # and make it run all the shutdown hooks and then replace
>     # itself with our restart hook (/lib/upgrade/stage2.sh)
>     kill -QUIT 1
>
> After this point, there are no user processes left, init is gone and the only
> process left is a PID 1 shell running our stage2 script (note that this shell
> is still holding references to rootfs and rootfs is still mounted and active).
>
> (Side note: you wouldn't need all this copy/append/mount/re-read crap if
> that ::restart line was a permanent part of the default inittab. A more
> flexible solution would be for that default restart line to read the script
> to execute from some file in tmpfs, so you echo your command(s) into that
> file and just call 'kill -QUIT 1' to execute various nice things in PID 1
> context.)
>
> Stage 2:
>
> Copy all needed binaries and files to ram and make an (optional) backup of
> the config data (or whatever else you need) and then pivot old root with
> tmpfs, in much the same way sysupgrade does.
> Then, exec /bin/sh /lib/upgrade/stage3.sh, replacing the PID 1 shell running
> from old root with a new one running exclusively from ram root.
>
> At this point, we only have a PID 1 shell running our stage3 script from ram.
> No references exist to rootfs anymore, so stage3 can safely unmount it.
>
> Stage 3:
>
> Unmount rootfs and flash everything over.
> Since rootfs is no longer active, you can fully unmount it, write the
> partition(s), and restore your backup data by simply mounting rootfs again
> and copying the files back (no need to use mtd jffs2 append, will work with
> any filesystem, e.g. yaffs - you may want to do an "mtd refresh", however,
> in case your partition layout changed).
>
> Finally, unmount everything and call reboot -f, since there are no processes
> running but the ram shell.
>
> That's it.
>
> (For simplicity's sake, I left out some gory bits, like handling overlays,
> remounting /proc, /sys, passing command line options between stage1 and
> stage2 - you have to use tmp files as they run in different contexts - but
> all this can be easily solved.)
>
> OK, so the upsides are:
> - safer, doesn't write over mounted filesystems, no page cache issues and such
> - all user processes are cleanly shut down before flashing (by
>   /etc/init.d/... stop), just like during a normal reboot
> - simpler config backup/restore
> - works with other filesystems besides jffs2, especially with yaffs on NAND
>   flashes (for example, I use this for upgrading firmware on Mikrotik rb411
>   boards with yaffs filesystem on the NAND)
>
> Downsides:
> - a bit more complicated, especially to debug
> - after the initial stage, all debug output goes to console only, since all
>   user processes are killed - if you're running over an ssh connection, init
>   will stop your ssh daemon, killing your connection - you won't be able to
>   follow any progress.
>
> The fact is, sysupgrade is a fundamental part of OpenWRT, so I never dared to
> try and push this upstream in any way, for fear of breaking all sorts of
> things.
>
> If you guys think there's merit in this approach, I'll gladly share the code.
> I've been using it for quite some time on a number of boards and it seems
> stable.
>
> The code is quite ugly at the moment, has some bits that are specific to my
> needs, isn't very modular (target-wise) and so needs a lot of cleanup.
> I'm currently up to my eyeballs with my regular job, so can't really spare
> the time to do it myself, but if anyone is willing to, shout.
>
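For readers who want to experiment with the stage-1 hand-off, the quoted commands can be wrapped in two shell functions. This is a minimal sketch, not Aleksandar's code. It assumes BusyBox init behaves as the mail describes: SIGHUP makes it re-read /etc/inittab, and SIGQUIT makes it run the shutdown actions, kill the remaining processes and exec the ::restart: entry. The names make_stage2_inittab, enter_stage2, run, STAGE2 and DRY_RUN are hypothetical and not part of OpenWrt. With DRY_RUN=1 the privileged steps (mount and kill) are only printed, so the sequence can be reviewed before it is run on a device.

```sh
#!/bin/sh
# Stage-1 hand-off from the mail above, wrapped in functions.
# ASSUMPTIONS: BusyBox init semantics (SIGHUP re-reads /etc/inittab,
# SIGQUIT runs shutdown actions, kills all processes and execs the
# ::restart: entry). make_stage2_inittab, enter_stage2, run, STAGE2 and
# DRY_RUN are hypothetical names, not OpenWrt APIs.

STAGE2=${STAGE2:-/lib/upgrade/stage2.sh}

# With DRY_RUN=1, print the command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Copy SRC to DST and append a ::restart: entry that starts stage 2.
# printf is used because "echo -e" is not portable across shells.
make_stage2_inittab() {
    cp "$1" "$2" || return 1
    printf '\n::restart:/bin/sh %s\n' "$STAGE2" >> "$2"
}

# Bind the modified copy over /etc/inittab (the file on flash is left
# untouched), make init reload it, then trigger the restart action.
enter_stage2() {
    src=${1:-/etc/inittab} tmp=${2:-/tmp/inittab}
    make_stage2_inittab "$src" "$tmp" || return 1
    run mount --bind "$tmp" /etc/inittab
    run kill -HUP 1   # init re-reads /etc/inittab
    run sleep 1       # give init a moment to process the reload
    run kill -QUIT 1  # shutdown actions run, then PID 1 becomes stage2.sh
}
```

On a device this would be invoked as plain enter_stage2. Note that the final kill -QUIT 1 also terminates the calling shell, so nothing placed after it in the same script should be expected to run.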

This sounds like something that could solve my issues, at least for the
future.

Right now, I'm working with the base theory that if I neither sync nor run
the initscripts, things will work out. First trials suggest that it does
indeed work, but since it hasn't been a sure-fire way so far, I'll let it go
a bit further before claiming that it's a guaranteed fix.

Anyhow, thank you for the description; it was detailed enough that I know
how to deal with this in the future, should the need arise.

Regards,
D.S.
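Stages 2 and 3 of the quoted procedure are described only in prose. The sketch below is one possible reading of them, not the author's code: stage 2 mounts a tmpfs, copies the listed files into it, swaps it in with pivot_root, moves /proc, /sys and /dev across (assuming they are separate mounts) and execs stage 3; stage 3 unmounts everything under the old root, writes the image with OpenWrt's mtd utility and calls reboot -f. The function names, the /tmp/ramroot and /old_root paths and the default "rootfs" partition name are assumptions. Overlays, config backup/restore, "mtd refresh" and passing arguments between stages are left out, as they were in the mail. The caller must list every file stage 3 needs (busybox and its applet links, shared libraries, stage3.sh and the firmware image) by absolute path. As in the first sketch, DRY_RUN=1 prints the privileged commands instead of running them.

```sh
#!/bin/sh
# One possible reading of stages 2 and 3 from the mail above.
# HYPOTHETICAL: function names, /tmp/ramroot, /old_root and the default
# "rootfs" MTD partition name. Overlays, config backup/restore and
# passing arguments between stages (tmp files, per the mail) are omitted.

# With DRY_RUN=1, print the command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Copy each absolute FILE into RAMROOT under the same path.
copy_to_ramroot() {
    ramroot=$1; shift
    for f in "$@"; do
        mkdir -p "$ramroot$(dirname "$f")" && cp -a "$f" "$ramroot$f" || return 1
    done
}

# Stage 2, run as PID 1: build a tmpfs root, make it /, keep the old
# root reachable at /old_root and exec stage 3 from RAM.
stage2() {
    ramroot=/tmp/ramroot
    run mkdir -p "$ramroot"
    run mount -t tmpfs tmpfs "$ramroot"
    run copy_to_ramroot "$ramroot" "$@"
    for d in old_root proc sys dev; do
        run mkdir -p "$ramroot/$d"
    done
    run pivot_root "$ramroot" "$ramroot/old_root"
    for d in proc sys dev; do
        run mount --move "/old_root/$d" "/$d"
    done
    run cd /
    # After this exec nothing references the old root any more.
    run exec /bin/sh /lib/upgrade/stage3.sh
}

# Stage 3: unmount the old root, write the image, reboot without init.
stage3() {
    image=$1 part=${2:-rootfs}
    # A mount point sorts after its parent, so "sort -r" unmounts
    # nested mounts before the directories that contain them.
    for m in $(awk '$2 ~ /^\/old_root\// { print $2 }' /proc/mounts 2>/dev/null | sort -r); do
        run umount "$m"
    done
    run umount /old_root
    run mtd write "$image" "$part"
    run reboot -f
}
```

On a device, stage2.sh would call stage2 with the file list, and stage3.sh would define the same functions and call stage3 with the image path it reads from a tmp file.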