On Sat, Mar 07 2026 at 10:01, Thomas Gleixner wrote:
> I gave up staring at it yesterday as my brain started to melt. Let me
> try again.

[Un]Surprisingly a rested and awake brain works way better.

The good news is that I actually found a nasty brown paperbag bug in
mm_cid_schedout() while going through all of this with a fine-toothed comb:

     cid = cid_from_transit_cid(...);

     That preserves the MM_CID_ONCPU bit, which makes mm_drop_cid()
     clear bit 0x40000000 + CID. That is obviously way outside of the
     bitmap. So the actual CID bit is not cleared and the clear just
     corrupts some other piece of memory.

     I just retried with all the K*SAN muck enabled which should catch
     that out of bounds access, but it never triggered and I haven't
     seen syzbot reports to that effect either.

     Fix for that is below.
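
     To make the arithmetic concrete, here is a small userspace sketch
     (not kernel code). The MM_CID_ONCPU value is the one mentioned
     above; the MM_CID_TRANSIT value, the NR_CPUS_SKETCH bound, the
     helper bodies and cid_in_bitmap_range() are assumptions made up for
     illustration and may not match what sched.h really does.

```c
#include <stdbool.h>

/* ONCPU bit value as stated above; the TRANSIT value is an assumption. */
#define MM_CID_ONCPU	0x40000000U
#define MM_CID_TRANSIT	0x20000000U

/* Hypothetical stand-in for num_possible_cpus(). */
#define NR_CPUS_SKETCH	64U

/* Assumed behaviour: strips only the TRANSIT bit, ONCPU survives. */
static inline unsigned int cid_from_transit_cid(unsigned int cid)
{
	return cid & ~MM_CID_TRANSIT;
}

/* Assumed behaviour: strips only the ONCPU bit. */
static inline unsigned int cpu_cid_to_cid(unsigned int cid)
{
	return cid & ~MM_CID_ONCPU;
}

/* Hypothetical helper: is the value a usable index into the CID bitmap? */
static inline bool cid_in_bitmap_range(unsigned int cid)
{
	return cid < NR_CPUS_SKETCH;
}
```

     With a stored value of 3 | MM_CID_ONCPU | MM_CID_TRANSIT, stripping
     only TRANSIT leaves 0x40000003, which is far outside any sane
     bitmap. Stripping ONCPU as well yields the plain CID 3, which is
     what the range check and the clear_bit() need.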

The bad news is that I couldn't come up with a scenario yet where this
bug leads to the outcome observed by Jiri and Matthieu, because the CID
bit which was not dropped from the bitmap gets cleaned up, by chance, on
the next schedule-in on that CPU, as the ONCPU bit is still set.

I'll look at it more tomorrow in the hope that this rested brain
approach works out again.

Thanks,

        tglx
---
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3809,7 +3809,8 @@ static __always_inline bool cid_on_task(
 
 static __always_inline void mm_drop_cid(struct mm_struct *mm, unsigned int cid)
 {
-       clear_bit(cid, mm_cidmask(mm));
+       if (!WARN_ON_ONCE(cid >= num_possible_cpus()))
+               clear_bit(cid, mm_cidmask(mm));
 }
 
 static __always_inline void mm_unset_cid_on_task(struct task_struct *t)
@@ -3978,7 +3979,13 @@ static __always_inline void mm_cid_sched
                return;
 
        mode = READ_ONCE(mm->mm_cid.mode);
+
+       /*
+        * Needs to clear both TRANSIT and ONCPU to make the range comparison
+        * and mm_drop_cid() work correctly.
+        */
        cid = cid_from_transit_cid(prev->mm_cid.cid);
+       cid = cpu_cid_to_cid(cid);
 
        /*
         * If transition mode is done, transfer ownership when the CID is
@@ -3994,6 +4001,11 @@ static __always_inline void mm_cid_sched
        } else {
                mm_drop_cid(mm, cid);
                prev->mm_cid.cid = MM_CID_UNSET;
+               /*
+                * Invalidate the per CPU CID so that the next mm_cid_schedin()
+                * can't observe MM_CID_ONCPU on the per CPU CID.
+                */
+               mm_cid_update_pcpu_cid(mm, 0);
        }
 }
 
