On 04/02/2015 02:31 PM, Ingo Molnar wrote:
>
> * Denys Vlasenko <dvlas...@redhat.com> wrote:
>
>> On 04/02/2015 01:14 PM, Brian Gerst wrote:
>>>>>> So I merged this as it's an obvious bugfix, but in hindsight I'm
>>>>>> really uneasy about the whole opportunistic SYSRET concept: it appears
>>>>>> that the chance that %rcx matches return-%rip is astronomical - this
>>>>>> is why this bug wasn't noticed live so far.
>>>>>>
>>>>>> So should we really be doing this?
>>>>>
>>>>> Andy does this not for the off-chance that userspace's RCX is equal
>>>>> to return address and R11 == RFLAGS. The chances of that are
>>>>> astronomically small.
>>>>>
>>>>> This code path triggers when ptrace/audit/seccomp is active. Instead
>>>>> of torturing ourselves trying to not divert into IRET return, now
>>>>> code is steered that way. But then immediately before actual IRET,
>>>>> we check again: "do we really need IRET?" IOW "did ptrace really
>>>>> touch pt_regs->ss? ->flags? ->rip? ->rcx?" which in vast majority of
>>>>> cases will not be true.
>>>>
>>>> I keep forgetting about that, my test systems have the audit muck
>>>> turned off ;-)
>>>>
>>>> Fair enough - and it's sensible to share the IRET path between
>>>> interrupts and complex-return system calls, even though the check
>>>> is unnecessary overhead for the pure interrupt return path...
>>>
>>>
>>> Maybe we could reintroduce TIF_IRET for this purpose instead of
>>> (ab)using TIF_NOTIFY_RESUME. Then we would only do the opportunistic
>>> check for those cases (ptrace, audit, exec, sigreturn, etc.), and skip
>>> it for interrupts.
>>
>> The very first check in the existing code, pt_regs->cx ==
>> pt_regs->ip, will fail for interrupt returns.
>>
>> You hardly can save anything by placing a (ti->flags &
>> TIF_TRY_SYSRET) check in front of it, it's almost as expensive.
>
> Well, what I was thinking of was to have a pure irq (well, async
> context) return path, not shared with the weird-syscall-IRET return
> path at all ...
>
> It would be open coded, not obfuscated via macros.
>
> That way AFAICS the upsides are:
>
> - it's easier to read (and maintain) what goes on in which case.
>   '*intr*' labels would truly identify interrupt return related
>   processing, for a change!

Re labels: I fully agree they need cleanup (mass rename).
Something along the lines of

int_ret_from_sys_call   -> return_from_syscall
int_with_check          -> sysret_check_workmask_in_edi
int_careful             -> sysret_check_NEED_RESCHED
int_very_careful        -> sysret_check_SYSCALL_EXIT
int_signal              -> sysret_check_DO_NOTIFY_MASK
int_restore_rest        -> sysret_next_check

ret_from_intr           -> return_from_intr
retint_with_reschedule  -> intr_check_WORK_MASK
retint_check            -> intr_check_workmask_in_edi
retint_careful          -> intr_check_NEED_RESCHED
retint_signal           -> intr_check_DO_NOTIFY_MASK
retint_swapgs           -> return_from_syscall_or_intr
irq_return_via_sysret   -> return_via_sysret
retint_kernel           -> intr_check_preempt
restore_args            -> restore_c_regs
irq_return              -> return_via_iret

and then your proposal can be rephrased as "let's stop merging
sysret and intr code paths at retint_swapgs".

Makes sense. It would entail some code duplication,
but the code will be easier to maintain.

> - we can optimize in a more directed fashion - like here
>
> ... while the downsides are:
>
> - more code
> - a (small) chance of a fix going to one path while not the other.
>
> How much extra code would it be?

A screenful or two.
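
For readers following the thread, the "do we really need IRET?" check
quoted above can be sketched in C roughly as below. This is only an
illustration under stated assumptions: the real check is written in
assembly and may test more conditions than shown here. The names
regs_sketch, SKETCH_USER_SS and sketch_can_sysret are hypothetical and
do not exist in the kernel; only the compared fields (->ip, ->cx,
->flags, r11, ->ss) come from the discussion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the kernel's struct pt_regs. */
struct regs_sketch {
	uint64_t ip;    /* user return address */
	uint64_t cx;    /* saved RCX */
	uint64_t r11;   /* saved R11 */
	uint64_t flags; /* saved RFLAGS */
	uint64_t ss;    /* saved stack segment selector */
};

/* Assumed user SS selector, for illustration only (not from the thread). */
#define SKETCH_USER_SS 0x2bULL

/*
 * SYSRET loads RIP from RCX and RFLAGS from R11, so it can stand in
 * for IRET only if the saved frame still matches those registers,
 * i.e. only if ptrace and friends did not modify it.
 */
bool sketch_can_sysret(const struct regs_sketch *r)
{
	if (r->cx != r->ip)          /* always fails for interrupt returns */
		return false;
	if (r->r11 != r->flags)
		return false;
	if (r->ss != SKETCH_USER_SS)
		return false;
	return true;
}
```

The first comparison is the one that rejects interrupt returns almost
immediately, which is why an extra flag test in front of it would save
very little.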