On Mon, Oct 10, 2022 at 02:14:25PM -0400, Tom Rini wrote:
> On Mon, Oct 10, 2022 at 08:01:23PM +0200, Pali Rohár wrote:
> > On Monday 10 October 2022 13:56:10 Tom Rini wrote:
> > > On Mon, Oct 10, 2022 at 07:44:05PM +0200, Pali Rohár wrote:
> > > > On Monday 10 October 2022 13:40:38 Tom Rini wrote:
> > > > > On Mon, Oct 10, 2022 at 07:22:56PM +0200, Pali Rohár wrote:
> > > > > > On Monday 10 October 2022 12:28:18 Tom Rini wrote:
> > > > > > > On Sun, Oct 09, 2022 at 09:12:25PM +0200, Pali Rohár wrote:
> > > > > > > > Hello! Watchdog code seems to be broken in u-boot master branch.
> > > > > > > > On Nokia N900 I'm getting following message in qemu:
> > > > > > > >
> > > > > > > > cyclic function rx51_watchdog took too long: 10000us vs 1000us
> > > > > > > > max, disabling
> > > > > > > >
> > > > > > > > Seems that watchdog core code is not prepared for "slower"
> > > > > > > > watchdogs
> > > > > > > > which communicate over slower i2c bus, like it is the case for
> > > > > > > > N900.
> > > > > > > >
> > > > > > > > Disabling slower watchdog is a bad idea as it would result in
> > > > > > > > reboot
> > > > > > > > loop instead of slower - but working code.
> > > > > > >
> > > > > > > So, looking at this in more detail, we have
> > > > > > > CONFIG_CYCLIC_MAX_CPU_TIME_US as a configuration option (which is
> > > > > > > where
> > > > > > > the too long comes from). And picking a random CI run:
> > > > > > > https://source.denx.de/u-boot/u-boot/-/jobs/511177
> > > > > > > I do see we hit this in CI once, but not every time, QEMU runs
> > > > > > > here. Is
> > > > > > > that the max time is configurable enough to satisfy your concerns
> > > > > > > here?
> > > > > >
> > > > > > It is needed to investigate, how to _properly_ fix this issue, not
> > > > > > just
> > > > > > workarounded it. Probably other boards may be affected.
> > > > >
> > > > > So it's the cyclic watchdog code, which we merged as early as possible
> > > > > that's the reason here. And it was merged as early as we could to see
> > > > > if
> > > > > there's problems. Are there problems? We're seeing "system too slow,
> > > > > disabling" on QEMU, sometimes, and the value of too slow is
> > > > > configurable. I know you reported other problems with n900 HW, so we
> > > > > can't see if it's failing there
> > > >
> > > > I was tested it with older asm code (as described in that other email,
> > > > via git checkout commit -- file) on n900 HW and watchdog problem is
> > > > there too. Phone reboots in about 20 seconds. But as I do not have
> > > > serial console, I do not know if that "disabling" message is printed
> > > > there too (but I guess it is).
> > >
> > > I think I'm a bit baffled at this point, honestly. The watchdog timeout
> > > is 60 seconds. If you're confident in it being about 20 seconds,
> > > consistently, changing WATCHDOG_TIMEOUT_MSECS to say 10000 (so, 10
> > > seconds) should let you see if U-Boot has configured the watchdog and
> > > it's being tripped, or if it's still at the prior stage value.
> >
> > $ git grep CONFIG_WATCHDOG_TIMEOUT_MSECS configs/nokia_rx51_defconfig
> > configs/nokia_rx51_defconfig:CONFIG_WATCHDOG_TIMEOUT_MSECS=31000
> >
> > Also watchdog is started by NOLO (which loads and execute U-Boot) so
> > there can be some smaller timeout.
> >
> > So I have feeling that on the real HW is same issue. cyclic code
> > disabled watchdog kicking and then watchdog restarted phone.
> >
> > I do not remember exact time (if it is 20s or 25s; I have not measured
> > it precisely), but it sounds plausible.
>
> OK, so what happens if you increase CONFIG_CYCLIC_MAX_CPU_TIME_US to
> something very high (so we should still enable the watchdog and
> configure the timeout) along with CONFIG_WATCHDOG_TIMEOUT_MSECS being
> high too (so if we can't service it in time really it's so long as to be
> noticeable) ? Or CONFIG_WATCHDOG_TIMEOUT_MSECS to something much lower
> (so that if the device is resetting quicker we're crashing elsewhere) ?
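
To make the two experiments quoted above concrete, here is a toy model in C
of how the cyclic CPU-time budget and the watchdog timeout could interact.
It is only a sketch under assumptions, not the cyclic or watchdog code in
the tree: struct wdt_sim, wdt_sim_poll() and wdt_sim_run() are made-up
names, the timings are simulated rather than measured, and it assumes the
kick that overruns the budget still reaches the hardware before the
function is disabled.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Toy model only: these names are hypothetical, not U-Boot's API. */
struct wdt_sim {
	uint64_t cost_us;      /* simulated duration of one watchdog kick */
	uint64_t max_cpu_us;   /* stand-in for CONFIG_CYCLIC_MAX_CPU_TIME_US */
	uint64_t timeout_ms;   /* stand-in for CONFIG_WATCHDOG_TIMEOUT_MSECS */
	uint64_t last_kick_ms; /* simulated time of the last kick */
	bool enabled;          /* false once the cyclic function is disabled */
};

/* One pass of the simulated cyclic loop at time now_ms. */
static void wdt_sim_poll(struct wdt_sim *w, uint64_t now_ms)
{
	if (!w->enabled)
		return;

	/* Assumption: the kick itself completes, it is just slow. */
	w->last_kick_ms = now_ms;

	if (w->cost_us > w->max_cpu_us) {
		printf("cyclic function took too long: %lluus vs %lluus max, disabling\n",
		       (unsigned long long)w->cost_us,
		       (unsigned long long)w->max_cpu_us);
		w->enabled = false; /* no further kicks after this point */
	}
}

/*
 * Poll every step_ms (must be non-zero) for duration_ms of simulated time.
 * Returns the simulated time in ms at which the watchdog would fire,
 * or -1 if it never fires within duration_ms.
 */
static long long wdt_sim_run(uint64_t cost_us, uint64_t max_cpu_us,
			     uint64_t timeout_ms, uint64_t duration_ms,
			     uint64_t step_ms)
{
	struct wdt_sim w = {
		.cost_us = cost_us,
		.max_cpu_us = max_cpu_us,
		.timeout_ms = timeout_ms,
		.last_kick_ms = 0,
		.enabled = true,
	};

	for (uint64_t now = 0; now <= duration_ms; now += step_ms) {
		wdt_sim_poll(&w, now);
		if (now - w.last_kick_ms >= w.timeout_ms)
			return (long long)now;
	}
	return -1;
}
```

In this model, a 10000us kick against a 1000us budget with a 31000ms
timeout, i.e. wdt_sim_run(10000, 1000, 31000, 60000, 1000), reports a reset
at 31000ms. Raising the budget to 20000us makes the same call return -1,
and lowering the timeout to 10000ms moves the reset to 10000ms. A reset
time that does not track the configured timeout would point at something
other than the disabled cyclic function.
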

OK, on my beagleboard xM with a small change:

diff --git a/drivers/watchdog/omap_wdt.c b/drivers/watchdog/omap_wdt.c
index ca2bc7cfb59e..f0e57b4f7286 100644
--- a/drivers/watchdog/omap_wdt.c
+++ b/drivers/watchdog/omap_wdt.c
@@ -39,7 +39,7 @@
 #include <common.h>
 #include <log.h>
 #include <watchdog.h>
-#include <asm/arch/hardware.h>
+#include <asm/ti-common/omap_wdt.h>
 #include <asm/io.h>
 #include <asm/processor.h>
 #include <asm/arch/cpu.h>

On my beagleboard xM I now see:

U-Boot SPL 2022.10-00459-g73e741b8ee46-dirty (Oct 10 2022 - 15:18:38 -0400)
Trying to boot from MMC1

U-Boot 2022.10-00459-g73e741b8ee46-dirty (Oct 10 2022 - 15:18:38 -0400)

OMAP3630/3730-GP ES1.1, CPU-OPP2, L3-200MHz, Max CPU Clock 800 MHz
Model: TI OMAP3 BeagleBoard
OMAP3 Beagle board + LPDDR/NAND
I2C:   ready
DRAM:  256 MiB
Core:  45 devices, 19 uclasses, devicetree: separate
WDT:   Started wdt@48314000 without servicing (60s timeout)
NAND:  0 MiB
MMC:   OMAP SD/MMC: 0
Loading Environment from NAND... *** Warning - readenv() failed, using default environment

Beagle xM Rev A/B
No EEPROM on expansion board
OMAP die ID: 6e5e00211ff00000015739eb08031024
Net:   No ethernet found.
Hit any key to stop autoboot:  0

So, this is as close as I can get to testing on n900 HW, and it's fine here.

-- 
Tom