On Wed, Jun 19, 2024 at 10:21:51AM +0200, Rasmus Villemoes wrote:
> On 18/06/2024 23.03, Tim Harvey wrote:
> > On Tue, Jun 18, 2024 at 7:32 AM Tom Rini <tr...@konsulko.com> wrote:
> >>
> >
> > Stefan and Tom,
> >
> > I'm seeing CI issues here even with 5000us [1]:
> > host bind 0 /tmp/sandbox/persistent-cyclic function wdt-gpio-level
> > took too long: 5368us vs 5000us max
>
> Yes, 5ms is way too little when you're not the only thing running on the
> cpu, which is why I went with 100ms.
>
> Random thoughts and questions:
>
> (1) Do we have any way to programmatically grab all the logs from azure
> CI, so we can get some kind of objective statistics on the number after
> "took too long:". Clicking through the web interface and randomly
> searching is too painful.
>
> It would also be helpful to know what percentage of CI runs have failed
> due to that, versus due to some genuine error.
I don't think we can easily grab logs via API. And at least for
https://dev.azure.com/u-boot/u-boot/_build?definitionId=2&_a=summary a lot
of the failures have rolled off already, because we only get so many logs
and I try to mark full release logs as keep-forever. But anecdotally I can
say 75%+ of the Azure runs fail at least once due to this, and most fail
often enough to fail the pipeline (we get 2 tries per job).

> (2) I considered a patch that just added a
>
> default $something big if SANDBOX
>
> to config CYCLIC_MAX_CPU_TIME_US, but since the problem also hit qemu, I
> dropped that. But, if my patch is too ugly (and I might tend to think
> that myself...), perhaps at least this would be an added improvement
> over the generic bump to 5000us.

I was fine with your approach, really. Maybe with a bit bigger comment,
and a note under doc/ as well on why we do it?

> (3) I also thought that perhaps for sandbox, we should simply measure
> the time using clock_gettime(CLOCK_PROCESS_CPUTIME_ID), instead of
> wallclock time. But it's a little ugly to implement since the "now"
> variable is both used to decide if it's time to run the callback, and as
> a starting point for measuring cpu time, and we probably still want the
> "is it time" to be measured on wallclock and not however much cpu-time
> the u-boot process has been given. Or maybe we don't, and
> CLOCK_PROCESS_CPUTIME_ID would simply be a better backend for
> os_get_nsec(). Sure, time in the sandbox would progress slower than on
> the host, but does that actually matter?
>
> (4) Btw., what kind of clock tick do we even get when run under qemu? I
> don't have much experience with qemu, but from quick googling it seems
> that -icount would be interesting. Also see
> https://github.com/zephyrproject-rtos/zephyr/issues/14173 . From quick
> reading it seems there were some issues back in 2019, but that today it
> mostly works for them, except some SMP issues (that are certainly not
> relevant to U-Boot).

That could be interesting too, yeah. I do worry it might open its own set
of problems to figure out, though.
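To make (2) concrete, this is roughly the shape I'd picture, assuming we
keep the existing prompt text and just stack defaults (the first matching
"default" wins in Kconfig, so sandbox gets your 100ms budget while
everything else keeps the strict 5000us):

config CYCLIC_MAX_CPU_TIME_US
	int "Sets the max allowed time of a cyclic function in us"
	default 100000 if SANDBOX
	default 5000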
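And if we went the (3) route of making CLOCK_PROCESS_CPUTIME_ID the
backend for os_get_nsec(), the swap itself looks small. A hypothetical
sketch only, not the current arch/sandbox/cpu/os.c code, with error
handling elided:

#include <stdint.h>
#include <time.h>

uint64_t os_get_nsec(void)
{
	struct timespec tp;

	/*
	 * CLOCK_PROCESS_CPUTIME_ID only advances while the u-boot
	 * process is actually scheduled, so time "stands still" while
	 * a loaded CI host is off running something else - exactly the
	 * case that trips the "took too long" check today.
	 */
	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tp);

	return (uint64_t)tp.tv_sec * 1000000000ULL + tp.tv_nsec;
}

The downside you already named (sandbox time progressing slower than the
host) falls out of that naturally.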
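For (4), anyone wanting to experiment could try something like the
following (machine and binary here are just placeholders, and I'm going
from the qemu documentation on -icount rather than from having tried it):

qemu-system-arm -machine virt -nographic -bios u-boot.bin \
	-icount shift=4

With shift=N the virtual clock is driven by the instruction count, each
instruction accounting for 2^N ns of virtual time, instead of following
the host clock; shift=auto lets qemu pick the factor itself.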
-- 
Tom