This is a rework of the "Coroutines" RFC v2 series [1], which made it possible to run functions in parallel, more specifically those that rely on udelay() to poll hardware and wait for some event to happen. That series showed that some initializations could be sped up (namely, efi_init_obj_list()).

In this new version I dropped coroutines in favor of threads and used the USB subsystem as a (hopefully) better example. The goal and the basic concepts are the same, but threads are likely more familiar to programmers. The API is more self-contained: the main thread just needs to create the threads, call a schedule function, then clean up. Threads may yield the processor to another thread by calling the same schedule function. When doing so they do not switch back to the main thread; they switch to the next thread directly.

Another major change is the simplification of the stack management. Each thread now has its own stack; there is no stack sharing anymore. The code is inspired by the barebox threads [2].

The custom assembly code that was present in the coroutines series is mostly replaced by setjmp()/longjmp(). As a result, supporting multiple architectures is much easier, although there is still a need for a non-standard extension to setjmp()/longjmp() called initjmp(). The new function is added in several patches, one for each architecture that supports HAVE_SETJMP. A new symbol is defined: HAVE_INITJMP. Two tests, one for initjmp() and one for uthread scheduling, are added to the lib suite.

NOTE: the SANDBOX version of initjmp() appears to have problems and needs to be worked on.

After introducing uthreads and making udelay() a thread rescheduling point, the USB stack initialization is modified to benefit from concurrency when UTHREAD is enabled: usb_init() uses uthreads to initialize and scan multiple buses at the same time.

The code was tested on arm64 and arm QEMU with 4 simulated XHCI buses and some devices. On this platform the USB scan takes 2.2 s instead of 5.6 s. Tested on i.MX93 EVK with two USB hubs, one ethernet adapter and one webcam on each, "usb start" takes 2.4 s instead of 4.6 s.

With UTHREAD=y on qemu_arm64_defconfig the code size increases by less than 1 KB (936 bytes exactly).
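
To illustrate the scheduling model described above (create threads, call a schedule function, clean up), here is a minimal host-side sketch. It is not the uthread code from this series: it assumes a Linux/glibc host and swaps in POSIX ucontext in place of setjmp()/longjmp()/initjmp(), and every demo_* name is hypothetical. It only aims to show threads with private stacks yielding directly to one another through a single schedule function.

```c
/* Illustrative sketch only; NOT the uthread implementation from this series. */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <ucontext.h>

#define DEMO_MAX_THREADS 4
#define DEMO_STACK_SIZE (64 * 1024)

struct demo_thread {
	ucontext_t ctx;
	void *stack;	/* private stack, no sharing between threads */
	int done;
};

static struct demo_thread threads[DEMO_MAX_THREADS];
static ucontext_t main_ctx;
static int nthreads;
static int current = -1;	/* -1 means the main thread is running */
static char trace[64];

/*
 * Switch to the next runnable thread (round-robin). Returns 0 when called
 * from the main thread with nothing left to run, 1 otherwise.
 */
static int demo_schedule(void)
{
	int prev = current, next = -1;

	for (int i = 1; i <= nthreads; i++) {
		int cand = (prev + i) % nthreads;

		if (!threads[cand].done) {
			next = cand;
			break;
		}
	}
	if (next < 0) {
		if (prev < 0)
			return 0;	/* main thread: all threads finished */
		current = -1;	/* last thread finished: resume main */
		swapcontext(&threads[prev].ctx, &main_ctx);
		return 0;
	}
	if (next == prev)
		return 1;	/* caller is the only runnable thread */
	current = next;
	/* Thread-to-thread switch, without going through the main thread */
	swapcontext(prev < 0 ? &main_ctx : &threads[prev].ctx,
		    &threads[next].ctx);
	return 1;
}

/* Hypothetical workload: log a letter, then yield, twice */
static void demo_poll(int id)
{
	for (int step = 0; step < 2; step++) {
		size_t n = strlen(trace);

		trace[n] = 'A' + id;
		trace[n + 1] = '\0';
		demo_schedule();	/* stands in for udelay() */
	}
}

static void demo_entry(void)
{
	demo_poll(current);
	threads[current].done = 1;
	demo_schedule();	/* a finished thread is never resumed */
}

static void demo_create(void)
{
	struct demo_thread *t;

	assert(nthreads < DEMO_MAX_THREADS);
	t = &threads[nthreads++];
	t->done = 0;
	t->stack = malloc(DEMO_STACK_SIZE);
	assert(t->stack);
	getcontext(&t->ctx);
	t->ctx.uc_stack.ss_sp = t->stack;
	t->ctx.uc_stack.ss_size = DEMO_STACK_SIZE;
	t->ctx.uc_link = &main_ctx;
	makecontext(&t->ctx, demo_entry, 0);
}

/* Main-thread view: create the threads, schedule until done, clean up */
static const char *demo_run(int count)
{
	nthreads = 0;
	current = -1;
	trace[0] = '\0';
	for (int i = 0; i < count; i++)
		demo_create();
	while (demo_schedule())
		;
	for (int i = 0; i < count; i++)
		free(threads[i].stack);
	return trace;
}
```

Called from a main() function, demo_run(2) returns the trace "ABAB": the two threads alternate at each yield, and control only comes back to the main thread once both have finished.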

CI:
- (UTHREAD not set):
  https://source.denx.de/u-boot/custodians/u-boot-net/-/pipelines/24625
- (UTHREAD enabled for QEMU arm/arm64/riscv32/riscv64):
  https://source.denx.de/u-boot/custodians/u-boot-net/-/pipelines/24626

[1] https://lists.denx.de/pipermail/u-boot/2025-January/578779.html
[2] https://github.com/barebox/barebox/blob/master/common/bthread.c

Jerome Forissier (10):
  arch: introduce symbol HAVE_INITJMP
  arm: add initjmp()
  riscv: add initjmp()
  sandbox: add initjmp()
  test: lib: add initjmp() test
  uthread: add cooperative multi-tasking interface
  lib: time: hook uthread_schedule() into udelay()
  dm: usb: move bus initialization into new static function
    usb_init_bus()
  dm: usb: initialize and scan multiple buses simultaneously with
    uthread
  test: lib: add uthread test

 arch/Kconfig                      |   8 ++
 arch/arm/include/asm/setjmp.h     |   1 +
 arch/arm/lib/setjmp.S             |  11 ++
 arch/arm/lib/setjmp_aarch64.S     |   9 ++
 arch/riscv/include/asm/setjmp.h   |   1 +
 arch/riscv/lib/setjmp.S           |  10 ++
 arch/sandbox/cpu/Makefile         |  11 +-
 arch/sandbox/cpu/initjmp.c        | 172 ++++++++++++++++++++++++++++++
 arch/sandbox/include/asm/setjmp.h |   5 +
 drivers/usb/host/usb-uclass.c     | 167 ++++++++++++++++++++---------
 include/uthread.h                 |  31 ++++++
 lib/Kconfig                       |  19 ++++
 lib/Makefile                      |   2 +
 lib/time.c                        |  17 ++-
 lib/uthread.c                     | 108 +++++++++++++++++++
 test/boot/bootdev.c               |  14 +--
 test/boot/bootflow.c              |   3 +-
 test/lib/Makefile                 |   2 +
 test/lib/initjmp.c                |  72 +++++++++++++
 test/lib/uthread.c                |  58 ++++++++++
 20 files changed, 660 insertions(+), 61 deletions(-)
 create mode 100644 arch/sandbox/cpu/initjmp.c
 create mode 100644 include/uthread.h
 create mode 100644 lib/uthread.c
 create mode 100644 test/lib/initjmp.c
 create mode 100644 test/lib/uthread.c

-- 
2.43.0