On Wed, Feb 5, 2025 at 1:06 PM chao an <magicd...@gmail.com> wrote:
> Hi, > > The behavior of spin_lock needs everyone's advice > > After PR14578 <https://github.com/apache/nuttx/pull/14578> was merged into > the NuttX, the behavior of spin_lock() and spin_lock_irqsave() added the > feature of disable the preemption: > > https://github.com/apache/nuttx/pull/14578 > > The spin lock behavior changed from: > > > *spin_lock: spin lockspin_lock_irqsave: spin lock + irqsave* > > to: > > > *spin_lock: spin lock + sched_lockspin_lock_irqsave: spin lock + irqsave + > sched_lock* > > This change has two key issues > 1. *Crash*: Since spin_lock depends on sched_lock, the code using this type > of API needs to re-evaluate the scope of impact, especially when tcb is not > initialized at the startup stage, sched_lock will cause crash > > https://github.com/apache/nuttx/pull/15728 > > It isn't an initialization problem, the real cause is some code abusing spin lock(lock/unlock in the different thread). After holding sched_lock in spinlock_irqsave, the api requires that the lock/unlock come from the same thread. 2. *Performance degradation*: spin_lock/spin_lock_irqsave is widely used in > the kernel, and more than 90% of the code protected by spin_lock will not > cause context switching, so after the introduction of sched_lock, the > kernel part has introduced new performance overhead > > https://github.com/apache/nuttx/pull/15684 > > Due to the change in API semantics, we need to further optimize all the > locations in the kernel that use spin_lock()/spin_lock_irqsave() to solve > performance issue in evolution. > All performance critical code is evaluating carefully to skip the sched_lock/sched_unlock, like this: https://github.com/apache/nuttx/pull/15695 And the optimization of sched_lock is under heavy development. > In PR15705 <https://github.com/apache/nuttx/pull/15705>, Xiaoxiang had > some > arguments about API naming with me. > > https://github.com/apache/nuttx/pull/15705 > > My point of view is to restore the original behavior of spin_lock() and > spin_lock_irqsave(), which brings the following benefits: > 1. The API semantics remain unchanged, and independent developers and > projects that use these APIs outside the nuttx repository can keep their > code without any adjustment > 2. The API naming is consistent with the internal implementation, and the > caller can know what is happening inside the function from the API naming > > *Option 1:* > > > > *spin_lock: spin lockspin_lock_nopreempt: > spin_lock + sched_lockspin_lock_irqsave: spin lock + irq > savespin_lock_irqsave_nopreempt: spin_lock + irq save + sched_lock* > > @xiaoxiang suggested changing the semantics of the API to hold sched_lock > by default, consistent with Linux: > 1. This means that all locations in the kernel that call spin_lock and > spin_lock_irqsave need to be changed > Since holding sched locks make spinlock more safe, the change doesn't require the change from the caller in general. > spin_lock_irqsave -> spin_lock_irqsave_preempt > 2. It is impossible to know what happens inside the function from the API > naming > > *Option 2:* > > > > *spin_lock: spin lock + sched_lockspin_lock_preempt: > spin_lockspin_lock_irqsave: spin lock + irq save + > sched_lockspin_lock_irqsave_preempt: spin_lock + irq save* > > I don't know which definition is better, or if any wise person has a better > choice, please give some advice. > Here is the summary why I prefer to add sched_lock in spin_lock/spin_lock_irqsave by default: - The default(short) api should be safe since not all people understand the schedule behaviour deeply, please see the related discussion: https://github.com/apache/nuttx/issues/9531 https://github.com/apache/nuttx/issues/1138 https://www.kernel.org/doc/Documentation/preempt-locking.txt - spin_lock without sched_lock is unusable in the normal thread context, since the scheduler may suspend the spinlock owner thread at any time - API semantic align with Linux to avoid Linux developer use the unsafe api without any notification - the caller of enter_critical section can call sem_post/mq_send without problem, spin_lock/spin_lock_irqsave can only achieve the same behaviour by holding sched lock