Re: [OpenWrt-Devel] sysupgrade crashing

D.S. Ljungmark Wed, 22 Jun 2011 08:56:05 -0700

On Fri, 2011-06-17 at 22:54 +0100, Aleksandar Radovanovic wrote:
> On 06/17/2011 11:49 AM, D.S. Ljungmark wrote:
> > On Thu, 2011-06-16 at 19:49 +0200, Ithamar R. Adema wrote:
> >> Hi,
> >>
> >> On Thu, 2011-06-16 at 12:21 +0200, Jo-Philipp Wich wrote:
> >>> Maybe calling "reboot" at all is not such a great idea as it
> >>> attempts to run the (former) init scripts to stop them.
> >>
> >> Maybe we should use 'reboot -f' here, as that will skip all
> >> userland 'shutdown' handling and simply call the kernel side reboot
> >> handler?
> >>
> >> This *should* flush all disk cache and such, so still be safe from
> >> that point, but keep us out of the usual 'init' handling....
> >
> >
> > That does sound like a good idea, especially as I suspect part of the
> > reason that the systems crash is because the mounted&active
> > filesystem just disappeared, and that the pages of the files needed
> > were not in active RAM. It basically looks as if things are
> > attempting to execute pure garbage (Not too surprising, really)
> >
> > //D.S.
> >
> > _______________________________________________ openwrt-devel
> > mailing list openwrt-devel@lists.openwrt.org
> > https://lists.openwrt.org/mailman/listinfo/openwrt-devel
> >
> >
> 
> Hi,
> 
> If you don't mind the long mail, I can offer my two cents on the subject.
> 
> I use OpenWRT in a number of products on various hardware and with varying
> filesystems (jffs2, yaffs, ...) and have seen quite a few problems with
> sysupgrade as it is now (including kernel crashes, segfaults and similar,
> plus the fact that it generally only handles jffs2). The idea of writing to
> a mounted filesystem just scares me.
> 
> So, I have decided to write my own flash upgrade framework, borrowing bits of 
> code from sysupgrade, but doing it in a completely different and safer 
> fashion. Downside is that the whole process is a bit more complicated and 
> requires three stages:
> 
> Stage 1:
> 
> Tell init to do a shutdown, but, instead of rebooting at the end, replace 
> itself with a stage2 shell script.
> 
> This can be done with the following bit of code:
> 
> # copy inittab and append our stage2 at the end
> cp /etc/inittab /tmp/inittab
> echo -e "\n::restart:/bin/sh /lib/upgrade/stage2.sh" >> /tmp/inittab
>
> # bind-mount it over the existing inittab, so we don't disturb the original
> mount --bind /tmp/inittab /etc/inittab
>
> # make init re-read inittab
> kill -HUP 1
> sleep 1
>
> # and make it run all the shutdown hooks and then replace
> # itself with our restart hook (/lib/upgrade/stage2.sh)
> kill -QUIT 1
> 
> After this point, there are no user processes left, init is gone and the only 
> process left is a PID 1 shell running our stage2 script (note that this shell 
> is still holding references to rootfs and rootfs is still mounted and active)
> 
> (Side note: you wouldn't need all this copy/append/mount/re-read crap if
> that ::restart line was a permanent part of the default inittab. A more
> flexible solution would be for that default restart line to read the script
> to execute from some file in tmpfs, so you echo your command(s) into that
> file and just call 'kill -QUIT 1' to execute various nice things in PID 1
> context)
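A rough sketch of what such a permanent restart hook could look like, in case it helps anyone reading the archive. The script path, the /tmp/restart-cmd file name and the DRY_RUN switch are my own assumptions, not part of Aleksandar's code or of stock OpenWrt. The fallback of re-executing /sbin/init is meant to mimic a plain init restart.

```shell
#!/bin/sh
# Hypothetical /lib/upgrade/restart-hook.sh, intended as the permanent
# "::restart:" target in inittab. All names here are assumptions.
HOOK_FILE="${HOOK_FILE:-/tmp/restart-cmd}"   # tmpfs file holding queued commands
DRY_RUN="${DRY_RUN:-1}"                      # 1 = never exec init for real

restart_hook() {
    if [ -s "$HOOK_FILE" ]; then
        # Commands were queued: run them in this shell (PID 1 when
        # started by init). Note that this happens even in dry-run mode.
        . "$HOOK_FILE"
    elif [ "$DRY_RUN" = 1 ]; then
        echo "+ exec /sbin/init"
    else
        # Nothing queued: simply start init again.
        exec /sbin/init
    fi
}
```

The installed script would end with a call to restart_hook and run with DRY_RUN=0. An upgrade would then be started with something like: echo 'exec /bin/sh /lib/upgrade/stage2.sh' > /tmp/restart-cmd; kill -QUIT 1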
> 
> Stage 2:
> 
> Copy all needed binaries and files to ram and make an (optional) backup of
> the config data (or whatever else you need) and then pivot old root with
> tmpfs, in much the same way sysupgrade does.
> Then, exec /bin/sh /lib/upgrade/stage3.sh, replacing the PID 1 shell running
> from old root with a new one running exclusively from ram root.
> 
> At this point, we only have a PID 1 shell running our stage3 script from ram. 
> No references exist to rootfs anymore, so stage3 can safely unmount it.
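To make stage 2 a bit more concrete, here is a minimal sketch of how I read it. It is not Aleksandar's script: RAM_ROOT, OLD_ROOT, the file list and the backup path are assumptions, shared libraries needed by the copied binaries are not handled, and overlays are ignored. The pivot part is only loosely in the spirit of what sysupgrade does. With DRY_RUN=1 (the default) the commands are printed instead of executed.

```shell
#!/bin/sh
# Stage 2 sketch: build a RAM root, pivot into it, hand over to stage 3.
DRY_RUN="${DRY_RUN:-1}"              # 1 = only print the commands
RAM_ROOT="${RAM_ROOT:-/tmp/root}"    # assumed location inside tmpfs
OLD_ROOT="${OLD_ROOT:-/mnt}"         # where the old root ends up after the pivot

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Copy each file into the RAM root under the same absolute path.
install_file() {
    for f in "$@"; do
        run mkdir -p "$RAM_ROOT${f%/*}"
        run cp -p "$f" "$RAM_ROOT$f"
    done
}

# Turn RAM_ROOT into a mount point, swap roots, and move the
# remaining pseudo filesystems off the old root.
pivot_to_ram() {
    run mount -o bind "$RAM_ROOT" "$RAM_ROOT"
    run mkdir -p "$RAM_ROOT$OLD_ROOT" "$RAM_ROOT/proc" "$RAM_ROOT/dev" "$RAM_ROOT/tmp"
    run mount -o move /proc "$RAM_ROOT/proc"
    run pivot_root "$RAM_ROOT" "$RAM_ROOT$OLD_ROOT"
    run cd /
    run mount -o move "$OLD_ROOT/dev" /dev
    run mount -o move "$OLD_ROOT/tmp" /tmp
}

stage2() {
    # Optional config backup; it lives in tmpfs and survives the pivot.
    run tar -czf /tmp/backup.tgz /etc/config
    install_file /bin/busybox /bin/sh /sbin/mtd /lib/upgrade/stage3.sh
    pivot_to_ram
    # Replace this shell with one that runs purely from RAM.
    run exec /bin/sh /lib/upgrade/stage3.sh
}
```

Calling stage2 with the defaults only prints the command sequence, which is a cheap way to review the order of operations before trying it on real hardware.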
> 
> Stage 3:
> 
> Unmount rootfs and flash everything over. Since rootfs is no longer active,
> you can fully unmount it, write the partition(s), and restore your backup
> data by simply mounting rootfs again and copying the files back (no need to
> use mtd jffs2 append, will work with any filesystem, e.g. yaffs - you may
> want to do an "mtd refresh", however, in case your partition layout changed).
> 
> Finally, unmount everything and call reboot -f, since there are no processes
> running but the ram shell.
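Stage 3 could then look roughly like the sketch below. Again, these are my assumptions rather than the actual code: the partition name, block device, filesystem type and backup path are placeholders that would differ per board, and the backup is assumed to be a tarball written earlier to tmpfs. DRY_RUN=1 (the default) only prints the commands.

```shell
#!/bin/sh
# Stage 3 sketch: runs as PID 1 from the RAM root.
DRY_RUN="${DRY_RUN:-1}"                    # 1 = only print the commands
OLD_ROOT="${OLD_ROOT:-/mnt}"               # where the old rootfs is mounted
ROOT_PART="${ROOT_PART:-rootfs}"           # mtd partition name (assumed)
ROOT_DEV="${ROOT_DEV:-/dev/mtdblock2}"     # matching block device (assumed)
ROOT_FSTYPE="${ROOT_FSTYPE:-yaffs2}"       # filesystem of the new image (assumed)
BACKUP="${BACKUP-/tmp/backup.tgz}"         # set to empty to skip the restore

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

stage3() {
    image="$1"
    # Nothing references the old rootfs any more, so it can go away.
    run umount "$OLD_ROOT"
    run mtd write "$image" "$ROOT_PART"
    # Restore the backup by mounting the freshly written rootfs.
    if [ -n "$BACKUP" ]; then
        run mount -t "$ROOT_FSTYPE" "$ROOT_DEV" "$OLD_ROOT"
        run tar -xzf "$BACKUP" -C "$OLD_ROOT"
        run umount "$OLD_ROOT"
    fi
    # Only the RAM shell is left, so skip init entirely.
    run reboot -f
}
```

For example, stage3 /tmp/firmware.bin prints the unmount, write, restore and reboot steps in order.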
> 
> That's it.
> 
> (For simplicity's sake, I left out some gory bits, like handling overlays,
> remounting /proc, /sys, passing command line options between stage1 and
> stage2 - you have to use tmp files as they run in different contexts - but
> all this can be easily solved)
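On passing options between the stages through tmp files: one simple way, as far as I can tell, is a KEY=value file in tmpfs that the first stage writes and the later one reads. The file name and both helper names below are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for handing options from stage 1 to stage 2.
STATE_FILE="${STATE_FILE:-/tmp/upgrade.opts}"   # assumed tmpfs location

# Stage 1: record one option as a KEY=value line.
save_opt() {
    printf '%s=%s\n' "$1" "$2" >> "$STATE_FILE"
}

# Stage 2: print the value stored for a key (the last entry wins).
# The file is parsed line by line rather than sourced, so values
# are never executed as shell code.
load_opt() {
    value=""
    while IFS='=' read -r key val; do
        if [ "$key" = "$1" ]; then
            value="$val"
        fi
    done < "$STATE_FILE"
    printf '%s\n' "$value"
}
```

Stage 1 would call save_opt IMAGE /tmp/firmware.bin before 'kill -QUIT 1', and stage 2 would pick it up with image="$(load_opt IMAGE)".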
> 
> OK, so the upsides are:
> - safer, doesn't write over mounted filesystems, no page cache issues and such
> - all user processes are cleanly shut down before flashing (by
> /etc/init.d/... stop), just like during a normal reboot
> - simpler config backup/restore
> - works with other filesystems besides jffs2, especially with yaffs on NAND
> flashes (for example, I use this for upgrading firmware on Mikrotik rb411
> boards with yaffs filesystem on the NAND)
> 
> Downsides:
> - a bit more complicated, especially to debug
> - after the initial stage, all debug output goes to console only, since all 
> user processes are killed - if you're running over an ssh connection, init 
> will stop your ssh daemon, killing your connection - you won't be able to
> follow any progress.
> 
> The fact is, sysupgrade is a fundamental part of OpenWRT, so I never dared to 
> try and push this upstream in any way, for fear of breaking all sorts of 
> things. 
> 
> If you guys think there's merit in this approach, I'll gladly share the code. 
> I've been using it for quite some time on a number of boards and it seems 
> stable. 
> 
> The code is quite ugly at the moment, has some bits that are specific to my 
> needs, isn't very modular (target-wise) and so needs a lot of cleanup. I'm 
> currently up to my eyeballs with my regular job, so can't really spare the 
> time to do it myself, but if anyone is willing to, shout.
>



This sounds like something that could solve my issues, at least for the
future. Right now, I'm working on the theory that if I neither sync nor
run the initscripts, things will work out.

First trials suggest that it does indeed work, but since it hasn't been
a sure-fire way so far, I'll let it go a bit further before claiming
that it's a guaranteed fix.

Anyhow, thank you for the description, it was detailed enough that I
know how to deal with this in the future, should the need arise.


Regards,
  D.S.

