> The 4-"byte"-address mode is used on 32 MiB flash chips.
> We had similar issues with other 32 MiB devices in the past
> which were fixed at some point by Felix Fietkau.

My device is 32 MiB. I'll check with Felix if he can give me any clues.

@Everyone else reading this, do you know how one can increase "the
reset duration during booting" for the flash chip?
(Not even sure I fully understand what this means)

On Sun, May 23, 2021 at 10:28 AM Vincent Wiemann
<vincent.wiem...@ironai.com> wrote:
>
> On 5/23/21 10:21 AM, Ibrahim Tachijian wrote:
> >> Is your firmware (sysupgrade) bigger than 16MB?
> >
> > No, the sysupgrade file is currently 13MB.
> >
> >> So maybe it has to do with switching to 4-address-mode...
> > What is this exactly?
>
> The 4-"byte"-address mode is used on 32 MiB flash chips.
> We had similar issues with other 32 MiB devices in the past
> which were fixed at some point by Felix Fietkau.
>
> >> My guess is that the error already happens when reading the flash.
> > At least we know that the flash is not being written to incorrectly
> > since after a reboot the flash is intact and does not produce any
> > errors. It's simply random if the system boots into this "faulty
> > state" or not (happens approx 1-2% of the time).
> >
> > Does anyone maybe know how I can re-read the squashfs partition and
> > verify the integrity while the system is booted to see if I encounter
> > the squashfs errors?
> > I'm really at a loss here - no idea where to even look into diagnosing
> > the issue.
> >
>
> I guess the reset line of the flash chip is not held long enough so
> that it is in an unclean state. I think the reset duration during
> booting needs to be increased. But I don't know the code and can't point
> you there. It's just a guess...
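
On the quoted question about re-reading the squashfs partition while the system is booted: one rough way to check read stability is to hash the raw partition several times and compare the digests. The following is only a sketch under assumptions. It presumes a Python interpreter is available on the device (often not the case on a stock image), and the device path in the usage note is a placeholder, not taken from this thread. Writing to /proc/sys/vm/drop_caches is the standard Linux way to discard clean cached pages, used here so that repeated reads are not simply served from RAM.

```python
import hashlib

CHUNK = 64 * 1024  # read in 64 KiB pieces to keep memory use low


def digest(path, chunk=CHUNK):
    """Return the SHA-256 hex digest of everything readable at `path`."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            h.update(block)
    return h.hexdigest()


def drop_page_cache():
    """Ask the kernel to drop clean caches so the next read hits the flash.

    Needs root; skipped silently if the control file cannot be written.
    """
    try:
        with open("/proc/sys/vm/drop_caches", "w") as f:
            f.write("3\n")
    except OSError:
        pass


def stable_read(path, passes=3, drop_caches=True):
    """Hash `path` `passes` times; return (all_equal, list_of_digests)."""
    digests = []
    for _ in range(passes):
        if drop_caches:
            drop_page_cache()
        digests.append(digest(path))
    return len(set(digests)) == 1, digests
```

A call such as `stable_read("/dev/mtdblockN")` (with `N` replaced by the rootfs partition number on the device in question) would show whether consecutive reads within one boot agree. Comparing the digest against one recorded during a healthy boot would additionally show whether a "faulty" boot returns consistently wrong data or data that changes from read to read.
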
>
> >
> > On Fri, May 21, 2021 at 6:16 PM Vincent Wiemann
> > <vincent.wiem...@ironai.com> wrote:
> >>
> >> On 5/21/21 3:58 PM, Koen Vandeputte wrote:
> >>>
> >>> On 21.05.21 13:19, Ibrahim Tachijian wrote:
> >>>> Hello,
> >>>>
> >>>> We use approximately 10k IPQ40XX devices and we have noticed that
> >>>> every time we run "sysupgrade -n" we lose approximately 1% of the
> >>>> routers in the process.
> >>>> After further investigation I'm almost confident that it is not the
> >>>> sysupgrade process that is the culprit - so what I did was that I put
> >>>> one test router into a reboot loop.
> >>>>
> >>>> This is what I do:
> >>>>
> >>>> Boot the router in a fresh state after a newly installed image.
> >>>> The image contains a reboot loop that consists of a shell script that
> >>>> runs every minute.
> >>>>
> >>>> The shell script tries to run a php-script which simply echoes "Hello
> >>>> World". If the php-script exits normally then we reboot the router.
> >>>>
> >>>> However, if the php-script exits abnormally then the router stops and
> >>>> does nothing other than informing me that there was a bus-error making
> >>>> php not able to process the hello world script.
> >>>>
> >>>> When this process runs the router reboots approximately 50 times
> >>>> before it boots into a state which is faulty where I see bus-errors
> >>>> when I try to run php scripts for example.
> >>>>
> >>>> Looking into dmesg you can see some errors such as,
> >>>>
> >>>> [10985.209438] SQUASHFS error: squashfs_read_data failed to read block 0x3a803e
> >>>> [11045.218685] SQUASHFS error: xz decompression failed, data probably corrupt
> >>>> [11045.218731] SQUASHFS error: squashfs_read_data failed to read block 0x3a803e
> >>>> [11105.228157] SQUASHFS error: xz decompression failed, data probably corrupt
> >>>> [11105.228203] SQUASHFS error: squashfs_read_data failed to read block 0x3a803e
> >>>>
> >>>> or
> >>>>
> >>>> [26218.687905] SQUASHFS error: Unable to read page, block 1b99a, size 10234
> >>>> [26221.057472] SQUASHFS error: Unable to read data cache entry [1b99a]
> >>>> [26221.057551] SQUASHFS error: Unable to read page, block 1b99a, size 10234
> >>>> [26221.062926] SQUASHFS error: Unable to read data cache entry [1b99a]
> >>>> [26221.069742] SQUASHFS error: Unable to read page, block 1b99a, size 10234
> >>>> [26224.460239] SQUASHFS error: Unable to read data cache entry [1b99a]
> >>>> [26224.460320] SQUASHFS error: Unable to read page, block 1b99a, size 10234
> >>>>
> >>>> or
> >>>>
> >>>> [62745.801178] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
> >>>> [62773.347234] SQUASHFS error: xz decompression failed, data probably corrupt
> >>>> [62773.347281] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
> >>>> [62790.132661] SQUASHFS error: xz decompression failed, data probably corrupt
> >>>> [62790.132706] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
> >>>> [62790.216746] SQUASHFS error: xz decompression failed, data probably corrupt
> >>>> [62790.216792] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
> >>>> [62800.810525] SQUASHFS error: xz decompression failed, data probably corrupt
> >>>> [62800.810570] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
> >>>> [62828.336267] SQUASHFS error: xz decompression failed, data probably corrupt
> >>>>
> >>>> Now, you would assume that the squashfs-partition is broken - but if
> >>>> this were the case then a reboot should not help. It does.
> >>>> Rebooting the router after it boots in this faulty state fixes the issue.
> >>>>
> >>>> So approximately 1-2% of my reboots make the router go into this
> >>>> faulty state.
> >>>>
> >>>> I am clueless on how to further investigate this issue. For now my
> >>>> workaround is restarting the router via a bash script should it
> >>>> notice there are bus-errors or i/o errors.
> >>>>
> >>>> Thanks
> >>>>
> >>> In the next kernel bump, the following patch is also present:
> >>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.10.38&id=2ed1d90162a0c0683ecbe0c4802187fa22d641c3
> >>>
> >>> I think it's worth a shot to retry the tests once it's bumped.
> >>>
> >>> Koen
> >>>
> >>
> >> My guess is that the error already happens when reading the flash.
> >> Is your firmware (sysupgrade) bigger than 16MB?
> >> So maybe it has to do with switching to 4-address-mode...
> >>
> >> Best,
> >>
> >> Vincent
> >>
> >> _______________________________________________
> >> openwrt-devel mailing list
> >> openwrt-devel@lists.openwrt.org
> >> https://lists.openwrt.org/mailman/listinfo/openwrt-devel
> >
> > --
> > Ibrahim Tachijian
>

--
Ibrahim Tachijian
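
The SQUASHFS errors quoted in the original report repeat the same block address many times within one boot. When collecting logs from many routers, it may help to tally failures per block to see whether they cluster on a few addresses. The sketch below is illustrative only: it assumes each dmesg message has been rejoined onto a single line, it matches just the two message shapes that appear in this thread, and it assumes the bare block number in "Unable to read page" is hexadecimal.

```python
import re
from collections import Counter

# Matches the two block-reporting message shapes seen in the report.
BLOCK_RE = re.compile(
    r"SQUASHFS error: (?:squashfs_read_data failed to read block (0x[0-9a-fA-F]+)"
    r"|Unable to read page, block ([0-9a-fA-F]+),)"
)


def failing_blocks(lines):
    """Count SQUASHFS read failures per block address (as an int)."""
    counts = Counter()
    for line in lines:
        m = BLOCK_RE.search(line)
        if m:
            # Exactly one of the two groups is set; both are parsed as hex.
            counts[int(m.group(1) or m.group(2), 16)] += 1
    return counts


# Sample lines copied from the report, with wrapped lines rejoined.
sample = [
    "[10985.209438] SQUASHFS error: squashfs_read_data failed to read block 0x3a803e",
    "[11045.218685] SQUASHFS error: xz decompression failed, data probably corrupt",
    "[11045.218731] SQUASHFS error: squashfs_read_data failed to read block 0x3a803e",
    "[26218.687905] SQUASHFS error: Unable to read page, block 1b99a, size 10234",
]

for block, n in failing_blocks(sample).items():
    print(f"block {block:#x}: {n} failure(s)")
```

Feeding it the full dmesg output of a faulty boot (for example via `failing_blocks(open("dmesg.txt"))`, where `dmesg.txt` is a hypothetical saved log) gives a quick per-block summary that can be compared across devices.
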