* Oleg Nesterov <o...@redhat.com> [2012-09-03 17:26:09]:

> Afaics the usage of update_debugctlmsr() and TIF_BLOCKSTEP in
> step.c was always very wrong.
>
> 1. update_debugctlmsr() was simply unneeded. The child sleeps
>    TASK_TRACED, __switch_to_xtra(next_p => child) should notice
>    TIF_BLOCKSTEP and set/clear DEBUGCTLMSR_BTF after resume if
>    needed.
>
> 2. It is wrong. The state of DEBUGCTLMSR_BTF bit in CPU register
>    should always match the state of current's TIF_BLOCKSTEP bit.
>
> 3. Even get_debugctlmsr() + update_debugctlmsr() itself does not
>    look right. Irq can change other bits in MSR_IA32_DEBUGCTLMSR
>    register or the caller can be preempted in between.
>
> 4. It is not safe to play with TIF_BLOCKSTEP if task != current.
>    DEBUGCTLMSR_BTF and TIF_BLOCKSTEP should always match each
>    other if the task is running. The tracee is stopped but it
>    can be SIGKILL'ed right before set/clear_tsk_thread_flag().
>
> However, now that uprobes uses user_enable_single_step(current)
> we can't simply remove update_debugctlmsr(). So this patch adds
> the additional "task == current" check and disables irqs to avoid
> the race with interrupts/preemption.
>
> Unfortunately this patch doesn't solve the last problem, we need
> another fix. Probably we should teach ptrace_stop() to set/clear
> single/block stepping after resume.
>
> And afaics there is yet another problem: perf can play with
> MSR_IA32_DEBUGCTLMSR from nmi, this obviously means that even
> __switch_to_xtra() has problems.
>
> Signed-off-by: Oleg Nesterov <o...@redhat.com>
> ---
>  arch/x86/kernel/step.c |   14 +++++++++++++-
>  1 files changed, 13 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kernel/step.c b/arch/x86/kernel/step.c
> index 7a51498..f89cdc6 100644
> --- a/arch/x86/kernel/step.c
> +++ b/arch/x86/kernel/step.c
> @@ -161,6 +161,16 @@ static void set_task_blockstep(struct task_struct *task, bool on)
>  {
>  	unsigned long debugctl;
>
> +	/*
> +	 * Ensure irq/preemption can't change debugctl in between.
> +	 * Note also that both TIF_BLOCKSTEP and debugctl should
> +	 * be changed atomically wrt preemption.
> +	 * FIXME: this means that set/clear TIF_BLOCKSTEP is simply
> +	 * wrong if task != current, SIGKILL can wakeup the stopped
> +	 * tracee and set/clear can play with the running task, this
> +	 * can confuse the next __switch_to_xtra().
> +	 */
> +	local_irq_disable();
>  	debugctl = get_debugctlmsr();
>  	if (on) {
>  		debugctl |= DEBUGCTLMSR_BTF;
> @@ -169,7 +179,9 @@ static void set_task_blockstep(struct task_struct *task, bool on)
>  		debugctl &= ~DEBUGCTLMSR_BTF;
>  		clear_tsk_thread_flag(task, TIF_BLOCKSTEP);
>  	}
> -	update_debugctlmsr(debugctl);
> +	if (task == current)
> +		update_debugctlmsr(debugctl);
> +	local_irq_enable();
>  }
>
>  /*
>
The changes look simple and neat, but I would prefer that someone with better x86 knowledge comment on this.

--
Thanks and Regards
Srikar