On Fri, Jan 03, 2020 at 01:08:20PM +0100, Jan Beulich wrote:
> On 24.12.2019 13:44, Roger Pau Monne wrote:
> > @@ -227,14 +233,47 @@ void flush_area_mask(const cpumask_t *mask, const 
> > void *va, unsigned int flags)
> >      if ( (flags & ~FLUSH_ORDER_MASK) &&
> >           !cpumask_subset(mask, cpumask_of(cpu)) )
> >      {
> > +        bool cpus_locked = false;
> > +
> >          spin_lock(&flush_lock);
> >          cpumask_and(&flush_cpumask, mask, &cpu_online_map);
> >          cpumask_clear_cpu(cpu, &flush_cpumask);
> >          flush_va      = va;
> >          flush_flags   = flags;
> > -        send_IPI_mask(&flush_cpumask, INVALIDATE_TLB_VECTOR);
> > +
> > +        /*
> > +         * Prevent any CPU hot{un}plug while sending the IPIs if we are to 
> > use
> > +         * a shorthand, also refuse to use a shorthand if not all CPUs are
> > +         * online or have been parked.
> > +         */
> > +        if ( system_state > SYS_STATE_smp_boot && !cpu_overflow &&
> > +             (cpus_locked = get_cpu_maps()) &&
> > +             (park_offline_cpus ||
> 
> Why is it relevant whether we park offline CPUs, or whether we've
> even brought up all of the ones a system has? An IPI, in particular
> a broadcast one, shouldn't have any issue getting delivered if some
> of the nominal recipients don't listen, should it? (The use of
> cpu_online_map that was already there above is a sign - but not a
> proof, as it may itself be buggy - that the set of online CPUs
> fluctuating behind this function's back ought to not be a problem.)

I've tried it myself, and if not all CPUs are onlined when the
shorthand is used the box would just reboot. This matches the
description at:

https://lwn.net/Articles/793065/

Of the Linux shorthand implementation.

> Further a question on lock nesting: Since the commit message
> doesn't say anything in this regard, did you check there are no
> TLB flush invocations with the get_cpu_maps() lock held?

The CPU maps lock is a recursive one, so it should be fine to attempt
a TLB flush with the lock already held.

> Even if
> you did and even if there are none, I think the function should
> then get a comment attached to the effect of this lock order
> inversion risk. (For example, it isn't obvious to me that no user
> of stop_machine() would ever want to do any kind of TLB flushing.)
> 
> Overall I wonder whether your goal couldn't be achieved without
> the extra locking and without the special conditions.

Hm, this so far has worked fine on all the boxes that I've tried.
I'm happy to change it to a simpler approach, but I think the
conditions and locking are required for this to work correctly.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Reply via email to