I think this sequence ought to work (keep in mind we are already under a mutex, so the global data is safe even if we are preempted):

    set up page table entries
    invlpg
    set up bp patching global data

    cpu = get_cpu()

    bp_old_value = atomic_read(bp_write_addr)

    do {
        atomic_write(&bp_poke_state, 1)

        atomic_write(bp_write_addr, 0xcc)

        mask <- online_cpu_mask - self
        send IPIs
        wait for mask = 0

    } while (cmpxchg(&bp_poke_state, 1, 2) != 1);

    patch sites, remove breakpoints after patching each one

    atomic_write(&bp_poke_state, 3);

    mask <- online_cpu_mask - self
    send IPIs
    wait for mask = 0

    atomic_write(&bp_poke_state, 0);

    tear down patching global data
    tear down page table entries

The #BP handler would then look like:

    state = cmpxchg(&bp_poke_state, 1, 4);
    switch (state) {
        case 1:
        case 4:
            invlpg
            cmpxchg(bp_write_addr, 0xcc, bp_old_value)
            break;
        case 2:
            invlpg
            complete patch sequence
            remove breakpoint
            break;
        case 3:
            /* If we are here, the #BP will go away on its own */
            break;

        case 0:
            /* No patching in progress!!! */
            return 0;
    }

    clear bit in mask
    return 1;

The IPI handler:

    clear bit in mask
    sync_core    /* Needed if multiple IPI events are chained */
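
To make the retry loop around cmpxchg(&bp_poke_state, 1, 2) easier to reason about, here is a rough user-space model of just the state transitions, using C11 atomics. It is only a sketch: the BP_* names, cmpxchg_int(), bp_handler_model() and patcher_arm_model() are made up for illustration, and the real work (page tables, invlpg, writing 0xcc, IPIs, sync_core) is reduced to comments. A #BP that arrives while the state is 1 is simulated by calling the handler model directly from inside the loop.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical names for the numeric states used in the pseudocode. */
enum {
    BP_IDLE      = 0,   /* no patching in progress */
    BP_ARMED     = 1,   /* breakpoint written, IPIs in flight */
    BP_PATCHING  = 2,   /* patcher owns the site and is patching */
    BP_FINISHING = 3,   /* patch done, final IPI round in flight */
    BP_HIT       = 4    /* a CPU took the #BP while the state was 1 */
};

static atomic_int bp_poke_state;

/* Kernel-style cmpxchg: returns the value observed before the attempt. */
static int cmpxchg_int(atomic_int *p, int old, int new_val)
{
    int expected = old;

    /* On failure, expected is overwritten with the current value. */
    atomic_compare_exchange_strong(p, &expected, new_val);
    return expected;
}

/* Model of the #BP handler; returns 1 if the trap was ours, 0 otherwise. */
static int bp_handler_model(void)
{
    int state = cmpxchg_int(&bp_poke_state, BP_ARMED, BP_HIT);

    switch (state) {
    case BP_ARMED:
    case BP_HIT:
        /* real code: invlpg, then put bp_old_value back over the 0xcc */
        return 1;
    case BP_PATCHING:
        /* real code: invlpg, complete patch sequence, remove breakpoint */
        return 1;
    case BP_FINISHING:
        /* the #BP will go away on its own */
        return 1;
    default:
        /* BP_IDLE: no patching in progress, not our breakpoint */
        return 0;
    }
}

/*
 * Model of the patcher's arming loop. If hit_in_first_round is nonzero,
 * a #BP is simulated during the first pass. Returns the number of passes.
 */
static int patcher_arm_model(int hit_in_first_round)
{
    int attempts = 0;

    do {
        atomic_store(&bp_poke_state, BP_ARMED);
        /* real code: write 0xcc, send IPIs, wait for mask = 0 */
        if (attempts == 0 && hit_in_first_round)
            bp_handler_model();     /* moves the state from 1 to 4 */
        attempts++;
    } while (cmpxchg_int(&bp_poke_state, BP_ARMED, BP_PATCHING) != BP_ARMED);

    /* The loop can only exit with the patcher owning the site. */
    assert(atomic_load(&bp_poke_state) == BP_PATCHING);
    return attempts;
}
```

In this model, patcher_arm_model(0) gets through in one pass, while patcher_arm_model(1) needs two: the simulated handler flips the state to 4, the patcher's cmpxchg sees 4 instead of 1, and the breakpoint is re-armed before the patcher is allowed to move on to state 2.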