Fix-Point opened a new pull request, #17640:
URL: https://github.com/apache/nuttx/pull/17640

   ## Summary
   
   This PR is part III of https://github.com/apache/nuttx/pull/17556. It
primarily introduces fixes and improvements to the WDOG module, including the
following key changes:
   - Removed the `wd_recover` interface to simplify the WDOG module, reduce 
coupling, and decrease code size.
   - Added the `list_delete_fast` interface to improve WDOG deletion
performance (a hedged sketch follows this list). In theory, this saves at
least one write operation on the path where a WDOG needs to be deleted, and
two write operations when a WDOG needs to be reinserted.
   - Reverted the WDOG spin-lock to a big kernel lock 
(`enter_critical_section`) to fix concurrency-related bugs.
   - Simplified `wd_timer` to reduce performance overhead and Worst-Case 
Execution Time (WCET) for timeout handling.
   - Modified code organization to comply with the MISRA-C coding standard.
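
   To make the write savings concrete, here is a minimal sketch of a fast
unlink next to a conventional delete. The node layout, and the exact write
counts, are illustrative assumptions; the actual `list.h` definitions and the
WDOG bookkeeping may differ.

   ```c
   #include <stddef.h>

   struct list_node
   {
     struct list_node *prev;
     struct list_node *next;
   };

   /* Conventional delete: unlink the node and also reset its own links so
    * that later "is it queued?" checks see a clean node.  Two writes to the
    * neighbors plus two writes to the node itself.
    */

   static inline void list_delete(struct list_node *node)
   {
     node->prev->next = node->next;
     node->next->prev = node->prev;
     node->prev = node->next = NULL;
   }

   /* Fast delete: only unlink.  The caller guarantees it will either drop
    * the node or immediately re-insert it, so re-initializing node->prev
    * and node->next is wasted work and the self-pointer writes are skipped.
    */

   static inline void list_delete_fast(struct list_node *node)
   {
     node->prev->next = node->next;
     node->next->prev = node->prev;
   }
   ```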
   
   Special attention is required regarding why we reverted the spin-lock to a 
big kernel lock.
   
   In the earliest implementation, the WDOG had only two states:
   ```mermaid
   %%{
     init: {
       'theme': 'base',
       'themeVariables': {
         'primaryColor': '#FFFFFF',
         'primaryTextColor' : '#000000',
         'mermaid-container': '#FFFFFF',
         'primaryBorderColor': '#000000',
         'lineColor': '#000000',
         'secondaryColor': '#FFFFFF',
         'tertiaryColor': '#000000'
       },
       'sequence': { 'mirrorActors': false }
     }
   }%%
   
   stateDiagram-v2
       WDOG_INACTIVE --> WDOG_INACTIVE : wd_cancel
       WDOG_INACTIVE --> WDOG_ACTIVE : wd_start
       WDOG_ACTIVE --> WDOG_ACTIVE : wd_start
       WDOG_ACTIVE --> WDOG_INACTIVE : wd_cancel or callback finished
   ```
   
   Figure 1. Correct WDOG state diagram.
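
   As a hedged sketch of why the two-state model was race-free (assuming the
WDOG keeps its callback and argument in `func`/`arg` members; the real
expiration code is more involved), the callback ran inside the same critical
section that flips the state:

   ```c
   #include <nuttx/irq.h>
   #include <nuttx/wdog.h>

   /* Because the callback executes inside the same critical section that
    * marks the WDOG inactive, other CPUs only ever observe WDOG_ACTIVE
    * (func != NULL) or WDOG_INACTIVE (func == NULL); there is no
    * intermediate "callback in flight" state.
    */

   static void wd_expiration_sketch(FAR struct wdog_s *wdog)
   {
     irqstate_t flags = enter_critical_section();

     if (wdog->func != NULL)          /* WDOG_ACTIVE */
       {
         wdentry_t func = wdog->func;

         wdog->func = NULL;           /* WDOG_ACTIVE -> WDOG_INACTIVE */
         func(wdog->arg);             /* callback executes atomically */
       }

     leave_critical_section(flags);
   }
   ```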
   
   This implementation was functionally correct in an SMP environment.
However, after switching to a spin-lock and relaxing the critical section,
the WDOG now has three states:
   
   ```mermaid
   %%{
     init: {
       'theme': 'base',
       'themeVariables': {
         'primaryColor': '#FFFFFF',
         'primaryTextColor' : '#000000',
         'mermaid-container': '#FFFFFF',
         'primaryBorderColor': '#000000',
         'lineColor': '#000000',
         'secondaryColor': '#FFFFFF',
         'tertiaryColor': '#000000'
       },
       'sequence': { 'mirrorActors': false }
     }
   }%%
   
   stateDiagram-v2
       WDOG_INACTIVE|private --> WDOG_ACTIVE|shared : wd_start
       WDOG_ACTIVE|shared --> WDOG_INACTIVE|shared :  execute callback
       WDOG_INACTIVE|shared --> WDOG_ACTIVE|shared : wd_start in callback to 
restart
       WDOG_INACTIVE|shared --> WDOG_INACTIVE|private : callback finished
       WDOG_ACTIVE|shared --> WDOG_INACTIVE|private : wd_cancel
       WDOG_INACTIVE|shared --> WDOG_INACTIVE|shared : wd_cancel failed
   ```
   
   Figure 2. Incorrect WDOG state diagram in current implementation.
   
   After modifying the critical section, the execution of the WDOG callback 
function is no longer atomic. This inevitably introduces a new state: 
`WDOG_INACTIVE | shared`. In this state, even though `wdog->func == NULL`, its 
ownership does not belong to the user thread, as shown in Figure 2. If the user 
repurposes the WDOG memory space at this moment, a race condition occurs.
   
   **Race Condition Example** (a hedged code sketch follows this timeline):
      1. At time $$t_0$$: The interrupt handler thread executes the WDOG 
callback and is about to restart the WDOG within the callback.
      2. At time $$t_1$$: The user thread calls `wd_cancel`, reads `wdog->func
== NULL`, mistakenly assumes the WDOG is in the `WDOG_INACTIVE | private`
state, and returns directly. The user thread then repurposes the WDOG memory,
filling it with the poison pattern `0x55aaaa55`.
      3. At time $$t_2$$: The interrupt handler thread calls `wd_start`, reads 
a non-NULL `wdog->func` (`0x55aaaa55`), and attempts to remove the WDOG.
      4. At time $$t_3$$: During the removal process, the interrupt handler 
thread reads the pointer `0x55aaaa55`, causing a memory access fault.
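
   For concreteness, a simplified sketch of the racy usage pattern above. The
names `periodic_callback` and `user_thread` are hypothetical, and the real
`wd_cancel()`/`wd_start()` internals are more involved.

   ```c
   #include <string.h>

   #include <nuttx/clock.h>
   #include <nuttx/wdog.h>

   /* Interrupt context, t0/t2/t3: the callback restarts its own timer */

   static void periodic_callback(wdparm_t arg)
   {
     FAR struct wdog_s *wdog = (FAR struct wdog_s *)arg;

     /* t2: wd_start() reads the poisoned, non-NULL wdog->func, concludes
      * the timer is still queued, and tries to unlink it first.
      * t3: the unlink dereferences a garbage link pointer -> memory fault.
      */

     wd_start(wdog, MSEC2TICK(10), periodic_callback, arg);
   }

   /* Task context, t1: cancel and then repurpose the WDOG memory */

   static void user_thread(FAR struct wdog_s *wdog)
   {
     wd_cancel(wdog);                   /* sees func == NULL and returns */
     memset(wdog, 0x55, sizeof(*wdog)); /* reuse/poison the memory       */
   }
   ```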
   
   To address this issue, a straightforward idea is to call `wd_cancel` before 
using the WDOG to ensure it enters the `WDOG_INACTIVE | private` state. 
However, even this simple approach is not easily achievable. Closer inspection 
of the state diagram reveals:
   - Only by calling `wd_cancel` in the `WDOG_ACTIVE | shared` state can the 
WDOG return to the user-available `WDOG_INACTIVE | private` state.
   - In the `WDOG_INACTIVE | shared` state, `wd_cancel` fails. If the user 
thread mistakenly believes the WDOG is in the `WDOG_INACTIVE | private` state 
and requeues the WDOG, a race condition still occurs, leading to runtime errors.
   
   In this scenario, to ensure WDOG concurrency correctness, it becomes
necessary to loop `wd_cancel` until it succeeds (sketched after the list
below). However, this looping approach introduces the following problems:
   - **Starvation Risk**: Suppose the WDOG continuously cycles through 
`WDOG_INACTIVE | shared` $$\rightarrow^{wd\_start}$$ `WDOG_ACTIVE | shared` 
$$\rightarrow^{\text{execute callback}}$$ `WDOG_INACTIVE | shared` as a 
periodic timer. If `wd_cancel` is called precisely when the WDOG is in the 
`WDOG_INACTIVE | shared` state (executing the callback), `wd_cancel` might fail 
indefinitely. In other words, `wd_cancel` can starve, violating the system's 
real-time and deterministic properties.
   - **Requires Waiting for Callback Completion**: Even if the WDOG is not a 
periodic timer (i.e., no `WDOG_INACTIVE | shared` $$\rightarrow^{wd\_start}$$ 
`WDOG_ACTIVE | shared` transition), using the WDOG still requires waiting for 
the callback to finish. However, the current implementation lacks such a 
waiting mechanism.
   - **Additionally, in an SMP build, problematic thread interleavings can
occur.** For example, the `nxsem_timeout` callback armed for a previous wait
can fire during the second semaphore wait and wake it up incorrectly:
   
            Core 0 [nxsem_clockwait]       |  Core 1
            enter_critical_section()       |  ...
            wd_start(nxsem_timeout)        |  ...
            nxsem_wait(sem)                |  wd_expiration() --> nxsem_timeout
            woken up by others             |
            wd_cancel(&rtcb->waitdog)      |  try enter_critical_section()
            leave_critical_section()       |  Failed, retry... (blocked by Core n)
            ... call nxsem_clockwait again |  Failed, retry...
            enter_critical_section()       |  Failed, retry...
            wd_start(nxsem_timeout)        |  Failed, retry...
            nxsem_wait(sem)                |  Core 1 enters the critical section
                                           |  nxsem_wait_irq(wtcb, ETIMEDOUT)
                                           |  -> incorrectly wakes up the rtcb
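
   To make the "loop until cancel succeeds" approach concrete, here is a
hedged sketch of what callers would have to write. It is not code from this
PR, and it assumes `wd_cancel()` could even report the failure with a negated
errno, which it currently cannot distinguish from the idle case.

   ```c
   #include <nuttx/wdog.h>

   /* Hypothetical synchronous cancel built on top of wd_cancel().  If the
    * WDOG is in WDOG_INACTIVE | shared (callback in flight), the cancel is
    * assumed to fail and we can only retry.  For a periodic timer that keeps
    * rearming itself, this loop may never terminate (starvation), and there
    * is no primitive to wait for the in-flight callback to finish.
    */

   static void wd_cancel_sync(FAR struct wdog_s *wdog)
   {
     while (wd_cancel(wdog) < 0)
       {
         /* Busy-wait: unbounded in the worst case, which breaks the
          * system's real-time and deterministic guarantees.
          */
       }
   }
   ```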
   
   **Explanation of other related points:**
   
   **Why not use a recursive spin-lock (`rspin_lock`) instead of reverting to
a big kernel lock?**
      Because it could lead to circular waiting and deadlock (illustrated
after this list):
      - Attempting to acquire the big kernel lock while holding the WDOG lock: 
Calling system functions within the WDOG timeout callback will attempt to 
acquire the big kernel lock.
      - Attempting to acquire the WDOG lock while holding the big kernel lock: 
This occurs when `tcb->waitdog` is restarted within a critical section.
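
   A hedged illustration of the two acquisition orders follows. The names
`g_wdog_lock`, `demo_expiration`, `demo_callback`, and `demo_waiter` are
hypothetical; the real expiration and wait paths are more involved.

   ```c
   #include <nuttx/clock.h>
   #include <nuttx/irq.h>
   #include <nuttx/spinlock.h>
   #include <nuttx/wdog.h>

   static spinlock_t g_wdog_lock = SP_UNLOCKED; /* hypothetical WDOG lock */
   static struct wdog_s g_waitdog;

   /* Runs from the WDOG callback: the WDOG lock is already held, and the
    * callback takes the big kernel lock (directly, or via almost any OS
    * service it calls).
    */

   static void demo_callback(wdparm_t arg)
   {
     irqstate_t flags = enter_critical_section(); /* WDOG lock -> BKL */
     leave_critical_section(flags);
   }

   /* Path A: timer expiration takes the WDOG lock, then runs the callback */

   static void demo_expiration(void)
   {
     irqstate_t flags = spin_lock_irqsave(&g_wdog_lock);
     demo_callback(0);
     spin_unlock_irqrestore(&g_wdog_lock, flags);
   }

   /* Path B: a thread already inside the critical section restarts its
    * waitdog, which needs the WDOG lock: BKL -> WDOG lock.
    */

   static void demo_waiter(void)
   {
     irqstate_t flags = enter_critical_section();
     wd_start(&g_waitdog, MSEC2TICK(10), demo_callback, 0);
     leave_critical_section(flags);
   }
   ```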
   
   **Why is the most time-consuming operation, `nxsched_reassess_timer()`, also
placed within the critical section?**
      - If it were left outside the critical section, then after inserting a
timer that needs to trigger immediately, the thread might be preempted (e.g.,
by a network interrupt) before the timer setup completes. This would introduce
nondeterministic timer delays and could cause tasks to miss their deadlines
(see the sketch below).
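
   A hedged sketch of the intended structure follows. `wd_start_sketch` and
`wd_insert` are hypothetical names, the real `wd_start()` is more involved,
and `nxsched_reassess_timer()` is assumed to take no arguments.

   ```c
   #include <sys/types.h>

   #include <nuttx/irq.h>
   #include <nuttx/wdog.h>

   /* Normally declared in the scheduler's internal headers */

   extern void nxsched_reassess_timer(void);

   /* Hypothetical helper: link the WDOG into the active queue in expiration
    * order and record wdentry/arg (details elided).
    */

   static void wd_insert(FAR struct wdog_s *wdog, sclock_t delay,
                         wdentry_t wdentry, wdparm_t arg)
   {
   }

   int wd_start_sketch(FAR struct wdog_s *wdog, sclock_t delay,
                       wdentry_t wdentry, wdparm_t arg)
   {
     irqstate_t flags = enter_critical_section();

     /* 1. Insert the WDOG into the active queue (possibly at the head). */

     wd_insert(wdog, delay, wdentry, arg);

     /* 2. Re-evaluate the next expiration and reprogram the interval timer
      *    while still atomic with respect to the insertion.  Doing this
      *    outside the critical section would open a preemption window
      *    between steps 1 and 2, delaying an already-due timer.
      */

     nxsched_reassess_timer();

     leave_critical_section(flags);
     return OK;
   }
   ```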
   
   ## Impact
   
   This change will affect the WDOG module's code size, performance, critical 
section entry/exit time, and the WCET (Worst-Case Execution Time) of timeout 
handling.
   - The removal of `wd_recover` should result in an overall reduction in code 
size.
   - Adding `list_delete_fast` and replacing the original WDOG deletion 
interface reduces code size and slightly improves performance.
   - While simplifying `wd_timer` optimizes the performance and WCET of timer
timeout handling, reverting the spin-lock to a big kernel lock significantly
increases the execution time within the critical section. However, this
sacrifice is necessary for functional correctness; without functional
correctness, high performance is meaningless. Once the HRTimer is integrated,
we will gradually replace WDOG calls with hrtimer and apply optimizations to
minimize WDOG's critical sections as much as possible.
   
   ## Testing
   
   We ran `ostest` on `rv-virt:smp` to verify functional correctness, with 
`CONFIG_SCHED_TICKLESS=y` and `CONFIG_RR_INTERVAL=0` enabled.
   We also launched 4 instances and executed `ostest` 64 times in total. All 
test results passed.
   

