I have been reading about the ongoing SMP improvements in OpenBSD. My understanding is that when several CPUs enter the kernel at the same time, shared kernel data structures can be corrupted unless access to them is protected by locking. OpenBSD currently uses a "giant lock" to serialize access to kernel data structures. It will eventually be replaced by finer-grained locking so that the kernel can execute on multiple CPUs simultaneously.
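To check that I have the distinction right, here is a minimal userspace sketch in C with pthreads. It is not OpenBSD kernel code, and every name in it (giant_lock, struct subsys, net, fs, run_workers) is made up for illustration. With the giant lock, threads touching unrelated data still wait on each other; with one lock per structure, only threads touching the same structure contend.

```c
#include <pthread.h>

/* Hypothetical stand-in for a kernel subsystem with its own lock. */
struct subsys {
    pthread_mutex_t lock;
    long counter;
};

/* One lock covering everything, in the spirit of a giant lock. */
static pthread_mutex_t giant_lock = PTHREAD_MUTEX_INITIALIZER;
static struct subsys net = { PTHREAD_MUTEX_INITIALIZER, 0 };
static struct subsys fs = { PTHREAD_MUTEX_INITIALIZER, 0 };

/* Giant-lock style: every update serializes on the same mutex. */
static void giant_update(struct subsys *s)
{
    pthread_mutex_lock(&giant_lock);
    s->counter++;
    pthread_mutex_unlock(&giant_lock);
}

/* Finer-grained style: only updates to the same subsystem contend. */
static void fine_update(struct subsys *s)
{
    pthread_mutex_lock(&s->lock);
    s->counter++;
    pthread_mutex_unlock(&s->lock);
}

struct job {
    void (*update)(struct subsys *);
    struct subsys *target;
    int iterations;
};

static void *worker(void *arg)
{
    struct job *j = arg;
    for (int i = 0; i < j->iterations; i++)
        j->update(j->target);
    return NULL;
}

/* Run two threads per subsystem and return the combined counter value. */
static long run_workers(void (*update)(struct subsys *), int iterations)
{
    pthread_t tid[4];
    struct job jobs[4] = {
        { update, &net, iterations }, { update, &net, iterations },
        { update, &fs, iterations },  { update, &fs, iterations },
    };

    net.counter = fs.counter = 0;
    for (int i = 0; i < 4; i++)
        pthread_create(&tid[i], NULL, worker, &jobs[i]);
    for (int i = 0; i < 4; i++)
        pthread_join(tid[i], NULL);
    return net.counter + fs.counter;
}
```

Both variants give the same final counts; the difference is only in how much the threads working on net and fs block each other while running.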
Has any thought been given to an alternative design where each CPU has its own thread scheduler, like DragonFly BSD? https://www.dragonflybsd.org/presentations/dragonflybsd.asiabsdcon04.pdf