Hello,

We run approximately 10k IPQ40XX devices, and we have noticed that every time we run "sysupgrade -n" we lose approximately 1% of the routers in the process. After further investigation I am fairly confident that the sysupgrade process itself is not the culprit, so I put one test router into a reboot loop.
This is what I do:

Boot the router in a fresh state after a newly installed image. The image contains a reboot loop: a shell script that runs every minute and tries to run a PHP script which simply echoes "Hello World". If the PHP script exits normally, the script reboots the router. If the PHP script exits abnormally, the router stays up and does nothing other than inform me that a bus error prevented php from processing the hello world script.

With this loop running, the router reboots approximately 50 times before it boots into a faulty state in which, for example, I see bus errors whenever I try to run PHP scripts. Looking at dmesg you can see errors such as:

[10985.209438] SQUASHFS error: squashfs_read_data failed to read block 0x3a803e
[11045.218685] SQUASHFS error: xz decompression failed, data probably corrupt
[11045.218731] SQUASHFS error: squashfs_read_data failed to read block 0x3a803e
[11105.228157] SQUASHFS error: xz decompression failed, data probably corrupt
[11105.228203] SQUASHFS error: squashfs_read_data failed to read block 0x3a803e

or

[26218.687905] SQUASHFS error: Unable to read page, block 1b99a, size 10234
[26221.057472] SQUASHFS error: Unable to read data cache entry [1b99a]
[26221.057551] SQUASHFS error: Unable to read page, block 1b99a, size 10234
[26221.062926] SQUASHFS error: Unable to read data cache entry [1b99a]
[26221.069742] SQUASHFS error: Unable to read page, block 1b99a, size 10234
[26224.460239] SQUASHFS error: Unable to read data cache entry [1b99a]
[26224.460320] SQUASHFS error: Unable to read page, block 1b99a, size 10234

or

[62745.801178] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
[62773.347234] SQUASHFS error: xz decompression failed, data probably corrupt
[62773.347281] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
[62790.132661] SQUASHFS error: xz decompression failed, data probably corrupt
[62790.132706] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
[62790.216746] SQUASHFS error: xz decompression failed, data probably corrupt
[62790.216792] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
[62800.810525] SQUASHFS error: xz decompression failed, data probably corrupt
[62800.810570] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
[62828.336267] SQUASHFS error: xz decompression failed, data probably corrupt

Now, you would assume that the squashfs partition is broken, but if that were the case a reboot should not help. It does: rebooting the router after it has booted into this faulty state fixes the issue.

So approximately 1-2% of my reboots put the router into this faulty state. I am clueless about how to investigate this further. For now my workaround is a bash script that restarts the router when it notices bus errors or I/O errors.

Thanks
--
Ibrahim Tachijian
_______________________________________________
openwrt-devel mailing list
openwrt-devel@lists.openwrt.org
https://lists.openwrt.org/mailman/listinfo/openwrt-devel
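
For anyone who wants to reproduce the workaround, here is a minimal sketch of the kind of check it performs. This is not the exact script in use: the function names (has_squashfs_error, check_log), the "run" argument, and the logger tag are all hypothetical, and the grep patterns are simply taken from the dmesg lines quoted above. It only looks at the kernel log and does not cover detecting the bus error from php itself.

```shell
#!/bin/sh
# Hypothetical watchdog sketch. Names and patterns are assumptions based on
# the dmesg excerpts in this mail, not the script actually deployed.

# Succeeds (exit 0) if the kernel log text on stdin contains one of the
# SQUASHFS read or decompression errors quoted above.
has_squashfs_error() {
    grep -q -e 'SQUASHFS error: squashfs_read_data failed' \
            -e 'SQUASHFS error: xz decompression failed' \
            -e 'SQUASHFS error: Unable to read'
}

# Reads a kernel log on stdin and prints the decision: "reboot" or "ok".
check_log() {
    if has_squashfs_error; then
        echo reboot
    else
        echo ok
    fi
}

# Only act when called with "run", so the functions can be sourced and
# tested without side effects.
if [ "${1:-}" = "run" ]; then
    if [ "$(dmesg | check_log)" = "reboot" ]; then
        logger -t sqfs-watchdog "SQUASHFS read errors seen, rebooting"
        reboot
    fi
fi
```

Run from cron once a minute with the "run" argument, this reboots at most once per faulty boot, because the kernel log starts empty again after the reboot.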