Re: [OpenWrt-Devel] sysupgrade crashing

Aleksandar Radovanovic Fri, 17 Jun 2011 14:55:00 -0700

On 06/17/2011 11:49 AM, D.S. Ljungmark wrote:
> On Thu, 2011-06-16 at 19:49 +0200, Ithamar R. Adema wrote:
>> Hi,
>>
>> On Thu, 2011-06-16 at 12:21 +0200, Jo-Philipp Wich wrote:
>>> Maybe calling "reboot" at all is not such a great idea as it
>>> attempts to run the (former) init scripts to stop them.
>>
>> Maybe we should use 'reboot -f' here, as that will skip all
>> userland 'shutdown' handling and simply call the kernel side reboot
>> handler?
>>
>> This *should* flush all disk cache and such, so still be safe from
>> that point, but keep us out of the usual 'init' handling....
>
>
> That does sound like a good idea, especially as I suspect part of the
> reason that the systems crash is because the mounted&active
> filesystem just disappeared, and that the pages of the files needed
> were not in active RAM. It basically looks as if things are
> attempting to execute pure garbage (Not too surprising, really)
>
> //D.S.
>
> _______________________________________________ openwrt-devel
> mailing list openwrt-devel@lists.openwrt.org
> https://lists.openwrt.org/mailman/listinfo/openwrt-devel
>
>


Hi,

If you don't mind the long mail, I can offer my two cents on the subject.

I use OpenWRT in a number of products on various hardware and with varying 
filesystems (jffs2, yaffs, ...) and have seen quite a few problems with 
sysupgrade as it is now (including kernel crashes, segfaults and similar, plus 
the fact that it generally only handles jffs2). The idea of writing to a 
mounted filesystem just scares me.

So, I have decided to write my own flash upgrade framework, borrowing bits of 
code from sysupgrade, but doing it in a completely different and safer 
fashion. Downside is that the whole process is a bit more complicated and 
requires three stages:

Stage 1:

Tell init to do a shutdown, but, instead of rebooting at the end, replace 
itself with a stage2 shell script.

This can be done with the following bit of code:

# copy inittab and append our stage2 at the end
cp /etc/inittab /tmp/inittab
echo -e "\n::restart:/bin/sh /lib/upgrade/stage2.sh" >> /tmp/inittab

# bind-mount it over the existing inittab, so we don't disturb the original
mount --bind /tmp/inittab /etc/inittab

# make init re-read inittab
kill -HUP 1
sleep 1

# and make it run all the shutdown hooks and then replace
# itself with our restart hook (/lib/upgrade/stage2.sh)
kill -QUIT 1

After this point, there are no user processes left, init is gone and the only 
process left is a PID 1 shell running our stage2 script (note that this shell 
is still holding references to rootfs, and rootfs is still mounted and active).

(Side note: you wouldn't need all this copy/append/mount/re-read crap, if that 
::restart line was a permanent part of the default inittab. A more flexible 
solution would be for that default restart line to read the script to execute 
from some file in tmpfs, so you echo your command(s) into that file and just 
call 'kill -QUIT 1' to execute various nice things in PID 1 context)
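
A rough sketch of what such a permanent restart hook might look like. The hook path /lib/upgrade/restart-hook.sh, the command file /tmp/restart-cmd and both function names are hypothetical, made up for illustration; nothing like this ships in the default inittab.

```shell
# Hypothetical /lib/upgrade/restart-hook.sh, meant to be started by a
# permanent inittab line such as
#   ::restart:/bin/sh /lib/upgrade/restart-hook.sh
# All names here are assumptions for illustration only.

RESTART_CMD_FILE=${RESTART_CMD_FILE:-/tmp/restart-cmd}

# Print the path of the script queued in the tmpfs file.
# Returns non-zero if nothing usable was queued.
queued_restart_script() {
    [ -s "$RESTART_CMD_FILE" ] || return 1
    IFS= read -r script < "$RESTART_CMD_FILE" || [ -n "$script" ] || return 1
    [ -f "$script" ] || return 1
    echo "$script"
}

# Hand PID 1 over to the queued script. If nothing was queued,
# simply re-exec init so a stray SIGQUIT does no harm.
restart_hook() {
    if script=$(queued_restart_script); then
        exec /bin/sh "$script"
    fi
    exec /sbin/init
}

# The real hook file would end with a call to: restart_hook
```

With a hook like this in place, stage 1 would reduce to writing /lib/upgrade/stage2.sh into /tmp/restart-cmd and calling 'kill -QUIT 1'.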

Stage 2:

Copy all needed binaries and files to ram and make an (optional) backup of the 
config data (or whatever else you need) and then pivot old root with tmpfs, in 
much the same way sysupgrade does.
Then, exec /bin/sh /lib/upgrade/stage3.sh, replacing the PID 1 shell running 
from old root with a new one running exclusively from ram root.

At this point, we only have a PID 1 shell running our stage3 script from ram. 
No references exist to rootfs anymore, so stage3 can safely unmount it.
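
Stage 2 could be sketched along these lines. This is only an outline under assumptions: the RAM root location /tmp/root, the old-root mount point /mnt and the helper names install_file and switch_to_ram_root are illustrative, and the real list of files to copy depends on the board.

```shell
RAM_ROOT=${RAM_ROOT:-/tmp/root}   # assumed location inside tmpfs
OLD_ROOT=${OLD_ROOT:-/mnt}        # where the old root ends up after the pivot

# Copy one file into the RAM root, keeping its absolute path.
# Symlinks are copied as symlinks, so their targets (e.g. /bin/busybox)
# have to be installed as well.
install_file() {
    src=$1
    dest="$RAM_ROOT$src"
    mkdir -p "$(dirname "$dest")" && cp -a "$src" "$dest"
}

# Pivot into the RAM root and replace this shell with stage 3.
# Only meaningful as PID 1 with root privileges; do not call it casually.
switch_to_ram_root() {
    # pivot_root needs the new root to be a mount point of its own
    mount -o bind "$RAM_ROOT" "$RAM_ROOT" || return 1
    mkdir -p "$RAM_ROOT$OLD_ROOT" "$RAM_ROOT/proc" "$RAM_ROOT/sys" \
             "$RAM_ROOT/dev" "$RAM_ROOT/tmp" || return 1
    mount -o move /proc "$RAM_ROOT/proc" || return 1
    pivot_root "$RAM_ROOT" "$RAM_ROOT$OLD_ROOT" || return 1
    cd /
    # bring the remaining virtual filesystems along
    mount -o move "$OLD_ROOT/dev" /dev
    mount -o move "$OLD_ROOT/sys" /sys
    mount -o move "$OLD_ROOT/tmp" /tmp
    # this shell still runs from the old root; replace it
    exec /bin/sh /lib/upgrade/stage3.sh
}
```

A stage2.sh built on this would call install_file for the shell, mount/umount, the flashing tool, any libraries they need and /lib/upgrade/stage3.sh itself, write the config backup to tmpfs, and then call switch_to_ram_root.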

Stage 3:

Unmount rootfs and flash everything over. Since rootfs is no longer active, 
you can fully unmount it, write the partition(s), and restore your backup 
data by simply mounting rootfs again and copying the files back (no need to 
use mtd jffs2 append, will work with any filesystem, e.g. yaffs - you may want 
to do an "mtd refresh", however, in case your partition layout changed).

Finally, unmount everything and call reboot -f, since there are no processes 
running but the ram shell.
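
The stage 3 sequence might look roughly like the sketch below. The partition name, block device, filesystem type and mount point are board-specific assumptions, and the run wrapper with its DRY_RUN switch is a hypothetical addition so that the sequence can be printed without touching the flash.

```shell
OLD_ROOT=${OLD_ROOT:-/mnt}                # old rootfs mount point (assumption)
ROOTFS_PART=${ROOTFS_PART:-rootfs}        # mtd partition name (assumption)
ROOTFS_DEV=${ROOTFS_DEV:-/dev/mtdblock2}  # matching block device (assumption)
ROOTFS_TYPE=${ROOTFS_TYPE:-yaffs2}        # filesystem type (assumption)

# Run a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# stage3 <image> [<backup.tar.gz>]
stage3() {
    image=$1
    backup=$2

    # nothing references the old rootfs any more, so this should succeed
    run umount "$OLD_ROOT" || return 1

    # write the new image, then refresh in case the layout changed
    run mtd write "$image" "$ROOTFS_PART" || return 1
    run mtd refresh "$ROOTFS_PART"

    # restore the backup by mounting the fresh rootfs and unpacking into it
    if [ -n "$backup" ]; then
        run mount -t "$ROOTFS_TYPE" "$ROOTFS_DEV" "$OLD_ROOT" &&
            run tar -xzf "$backup" -C "$OLD_ROOT"
        run umount "$OLD_ROOT"
    fi

    run sync
    run umount -a
    run reboot -f
}
```

Running it as 'DRY_RUN=1 stage3 /tmp/firmware.bin /tmp/backup.tar.gz' only prints the commands, which helps when checking the order on a new board.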

That's it.

(For simplicity's sake, I left out some gory bits, like handling overlays, 
remounting /proc, /sys, passing command line options between stage1 and stage2 
- you have to use tmp files as they run in different contexts - but all this 
can be easily solved)
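
As an illustration of the tmp file approach, one possible way to hand options from stage1 to stage2 is sketched below. The state file name, the two options and the function names are assumptions picked for the example.

```shell
UPGRADE_STATE=${UPGRADE_STATE:-/tmp/upgrade.state}   # assumed file in tmpfs

# stage1: save_opts <image path> <keep config: 0|1>
# One value per line, so stage2 never has to eval anything.
save_opts() {
    printf '%s\n%s\n' "$1" "$2" > "$UPGRADE_STATE"
}

# stage2: read the values back into IMAGE and KEEP_CONFIG
load_opts() {
    [ -r "$UPGRADE_STATE" ] || return 1
    {
        IFS= read -r IMAGE
        IFS= read -r KEEP_CONFIG
    } < "$UPGRADE_STATE"
}
```

Reading plain lines instead of sourcing the file keeps paths with spaces or shell metacharacters intact.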

OK, so the upsides are:
- safer, doesn't write over mounted filesystems, no page cache issues and such
- all user processes are cleanly shut down before flashing (by /etc/init.d/... 
stop), just like during a normal reboot
- simpler config backup/restore
- works with other filesystems besides jffs2, especially with yaffs on NAND 
flashes (for example, I use this for upgrading firmware on Mikrotik rb411 
boards with yaffs filesystem on the NAND)

Downsides:
- a bit more complicated, especially to debug
- after the initial stage, all debug output goes to console only, since all 
user processes are killed - if you're running over an ssh connection, init 
will stop your ssh daemon, killing your connection - you won't be able to 
follow any progress.

The fact is, sysupgrade is a fundamental part of OpenWRT, so I never dared to 
try and push this upstream in any way, for fear of breaking all sorts of 
things. 

If you guys think there's merit in this approach, I'll gladly share the code. 
I've been using it for quite some time on a number of boards and it seems 
stable. 

The code is quite ugly at the moment, has some bits that are specific to my 
needs, isn't very modular (target-wise) and so needs a lot of cleanup. I'm 
currently up to my eyeballs with my regular job, so can't really spare the 
time to do it myself, but if anyone is willing to, shout.

Cheers,
Aleksandar

