> On Jul 17, 2018, at 4:04 PM, Andy Lutomirski <[email protected]> wrote:
> 
> 
> I think you've introduced a minor-ish performance regression due to
> changing the old (admittedly terribly documented) control flow a bit.
> Before, if real_prev == next, we would skip:
> 
> load_mm_cr4(next);
> switch_ldt(real_prev, next);
> 
> Now we don't any more.  I think you should reinstate that
> optimization.  It's probably as simple as wrapping them in an if
> (real_priv != next) with a comment like /* Remote changes that would
> require a cr4 or ldt reload will unconditionally send an IPI even to
> lazy CPUs.  So, if we aren't changing our mm, we don't need to refresh
> cr4 or the ldt */

Looks like switch_ldt already skips reloading the LDT when prev equals
next, or when they simply have the same LDT values:

        if (unlikely((unsigned long)prev->context.ldt |
                     (unsigned long)next->context.ldt))
                load_mm_ldt(next);

It appears that the cr4 bits have a similar optimization:

static inline void cr4_set_bits(unsigned long mask)
{
        unsigned long cr4, flags;

        local_irq_save(flags);
        cr4 = this_cpu_read(cpu_tlbstate.cr4);
        if ((cr4 | mask) != cr4)
                __cr4_set(cr4 | mask);
        local_irq_restore(flags);
}

> 
> Hmm.  load_mm_cr4() should bypass itself when mm == &init_mm.  Want to
> fix that part or should I?
> 
Looks like there might not be anything to do here, after all.

On to the lazy TLB mm_struct refcounting stuff :)

Reply via email to