anchao commented on PR #15705: URL: https://github.com/apache/nuttx/pull/15705#issuecomment-2623338648
> Yes, but you need first change the exuction of ALL IRQ from the interupt context to the thread context. Before that happen, the sched lock must be taken at the same time we hold spinlock. > Let's emphasis the key point: > > 1. The busy loop lock(e.g. spinlock) must hold sched lock too > 2. The sleep lock(e.g. mutex) don't/shouldn't hold sched lock > > So, you can't do the change in seperate step without breaking the code: > > 1. Remove sched lock first > 2. Change spinlock to mutex at the late time > > Let's consider a VERY COMMON case in SMP: > > 1. thread0 on CPU0 hold a spinlock without hold sched lock > 2. thread1 on CPU1 wake up thread2 > 3. OS decide suspend thread0 and resume thread2 on CPU0 > 4. thread1 on CPU1 want to hold the spinlock hold by thread0 > > but thread1 is ver hard to know when it can finish the busy wait loop since thread0 is already in the suspend state. so, it's VERY VERY dangerous to schedule a thread which hold a spinlock. Let’s first unify our understanding: 1. There was no deadlock problem before in nuttx's API semantics, and all codes works very well, regardless of SMP or AMP Right? 2. The deadlock problem was caused by xiaomi's changes(from you @xiaoxiang781216 and @hujun260 ) 3. You first replaced **enter_critical_section()** with **spin_lock_irqsave()**, 4. After finding the deadlock, you guys replace **spin_lock_irqsave()** with **spin_lock_irqsave()**+**sched_lock()** 5. In order to simplify the code, you inside **sched_lock()** into **spin_lock_irqsave()**. Now you guys find that such a change will break all ESP devices and the performance of the entire system will be degraded 6. Look at what you are doing now, Prepare to replace all kernel codes from **spin_lock_irqsave()** to **raw_spin_lock_irqsave()** In order to solve the problem of startup crash, replace all **spin_lock_irqsave()** with **raw_spin_lock_irqsave()** I can't imagine how many board-level and chip codes will need similar adjustments in the future. How can other developers continue to trust NuttX when the kernel API is so unstable? Most kernel code does not need to hold **sched_lock()**, which is unfair to some code that does not cause scheduling. New API semantics should be expressed with new APIs instead of modifying the behavior of the original API. > > 3. You can look at the spin_lock implementation in other RTOS, no one will disable preemption(**zephyr**) > > https://github.com/zephyrproject-rtos/zephyr/blob/c66a0bb2a9269dc483079d5cc24ba77b8c332a9a/include/zephyr/spinlock.h#L182-L214 > > zephyr's k_spin_lock disable irq too, but nuttx's spin_lock doesn't disable irq. let remove **spin_lock_irqsave()** and move the **irqsave** to **spin_lock()** ok? > > Now we have 4 APIs available to developers > > 1.spin_lock 2. spin_lock_nopreempt 3. spin_lock_irqsave 4. spin_lock_irqsave_nopreempt > > If you think the code needs to hold sched_lock, please use *_spin_lock__nopreempt__ instead of **spin_lock()** > > No, the correct semantics(same as Linux) is: > > 1. spin_lock > 2. spin_lock_preempt > 3. spin_lock_irqsave > 4. spin_lock_irqsave_preempt > > Otherwise, the most code will sufer the random deadlock or long time busy loop like the example I mention previously. Only the special designed code can use spin_lock_preempt/spin_lock_irqsave_preempt as expect. The bottom line is that the most common used API should be SMP SAFE. No, keep the current API semantics unchanged, and use new APIs to represent new features -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@nuttx.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org