On 5/28/26 17:09, Nam Cao wrote:
Gabriele Monaco <[email protected]> writes:
From: Wen Yang <[email protected]>

da_monitor_start() set monitoring=1 before calling da_monitor_init_hook(),
may racing with the sched_switch handler:

   da_monitor_start()               sched_switch handler
   -------------------------        ---------------------------------
   da_mon->monitoring = 1;
                                    if (da_monitoring(da_mon))  /* true  */
                                        ha_start_timer_ns(...);
                                        /* hrtimer->base == NULL, crash */
   da_monitor_init_hook(da_mon);
   /* hrtimer_setup() sets base */

Fix the ordering and pair with release/acquire semantics:

   da_monitor_init_hook(da_mon);
   smp_store_release(&da_mon->monitoring, 1);    /* da_monitor_start()  */
   return smp_load_acquire(&da_mon->monitoring); /* da_monitoring()     */

On ARM64 a plain STR + LDR does not form a release-acquire pair, so
the load can observe monitoring=1 while hrtimer->base is still NULL.
The plain accesses are also data races under KCSAN.

Use WRITE_ONCE for the monitoring=0 store in da_monitor_reset() to
cover the reset path.

Fixes: 792575348ff7 ("rv/include: Add deterministic automata monitor definition via 
C macros")
Signed-off-by: Wen Yang <[email protected]>
Reviewed-by: Gabriele Monaco <[email protected]>
Signed-off-by: Gabriele Monaco <[email protected]>

Looks correct to me.
Reviewed-by: Nam Cao <[email protected]>

Wen, I am curious, how did you find this issue?


It was caught during development of the tlob monitor by a KUnit test (since moved to selftests) that monitored a kthread running concurrently on another CPU. On a multi-core box the window between setting monitoring=1 and da_monitor_init_hook() was wide enough to trigger the crash, and KCSAN flagged the plain monitoring accesses as data races as well.

--
Best wishes,
Wen



Reply via email to