On Tue, Jul 25, 2017 at 08:53:20PM +0200, Peter Zijlstra wrote:
> On Tue, Jul 25, 2017 at 10:17:01AM -0700, Paul E. McKenney wrote:
> 
> > > munmap() TLB invalidate is limited to those CPUs that actually ran
> > > threads of their process, while this is machine wide.
> > 
> > Or those CPUs running threads of any process mapping the underlying file
> > or whatever.
> 
> That doesn't sound right. munmap() of a shared file only invalidates
> this process's map of it.
> 
> Swapping a file page otoh will indeed touch the union of cpumasks over
> all processes mapping that page.

There are a lot of variations, to be sure.  For whatever it is worth,
the original patch that started this uses mprotect():

https://github.com/msullivan/userspace-rcu/commit/04656b468d418efbc5d934ab07954eb8395a7ab0
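
The core of that trick is small enough to sketch here.  This is
illustrative C rather than the commit's exact code: the idea is to
toggle a dummy page's protections so that the resulting TLB-shootdown
IPIs act as full memory barriers on every CPU currently running a
thread of the process.

#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

static void *dummy_page;

static void setup_dummy_page(void)
{
	dummy_page = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (dummy_page == MAP_FAILED)
		abort();
	*(volatile char *)dummy_page = 1; /* ensure TLB entries exist */
}

static void force_mb_all_threads(void)
{
	/* Reducing permissions forces a TLB shootdown, IPIing every
	 * CPU running a thread of this process; restoring them leaves
	 * the page usable for the next call. */
	if (mprotect(dummy_page, getpagesize(), PROT_READ))
		abort();
	if (mprotect(dummy_page, getpagesize(), PROT_READ | PROT_WRITE))
		abort();
}

int main(void)
{
	setup_dummy_page();
	force_mb_all_threads();
	return 0;
}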

> > And in either case, this can span the whole machine.  Plus
> > there are a number of other ways for users to do on-demand full-system
> > IPIs, including any number of ways to wake up large numbers of CPUs,
> > including from unrelated processes.
> 
> Which are those? I thought we significantly reduced those with the nohz
> full work. Most IPI uses now first check if a CPU actually needs the IPI
> before sending it IIRC.

If the task being awakened is higher priority than the task currently
running on a given CPU, that CPU still gets an IPI, right?  Or am I
completely confused?
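
The path I have in mind is roughly the following, paraphrased and
simplified from the scheduler's resched path (struct rq and cpu_of()
are scheduler-internal; this is not the exact kernel code):

/* Simplified paraphrase: if the runqueue needing a reschedule does
 * not belong to the current CPU, the remote CPU gets kicked with a
 * rescheduling IPI. */
static void resched_curr_sketch(struct rq *rq)
{
	int cpu = cpu_of(rq);

	if (cpu == smp_processor_id()) {
		set_tsk_need_resched(rq->curr);
		return;
	}
	smp_send_reschedule(cpu);	/* the cross-CPU IPI in question */
}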

> > But I do plan to add another alternative that is limited to threads of
> > the running process.  I will be carrying both versions to enable those
> > who have been bugging me about this to do testing.
> 
> Sending IPIs to mm_cpumask() might be better than expedited, but I'm
> still hesitant. Just because people want it doesn't mean it's a good
> idea. We need to weigh this against the potential for abuse.
> 
> People want userspace preempt disable, but no matter how hard they
> want it, they're not getting it, because it's a completely crap idea.
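
For concreteness, I take it that the mm_cpumask() approach would look
something like the following sketch, where ipi_mb() is a made-up name
rather than an existing kernel symbol:

#include <linux/mm_types.h>
#include <linux/sched.h>
#include <linux/smp.h>

/* Hypothetical helper: the IPI itself orders, but be explicit. */
static void ipi_mb(void *info)
{
	smp_mb();
}

/* Sketch: IPI only the CPUs that may be running threads of
 * current->mm, rather than every CPU in the system as expedited
 * grace periods can. */
static void membarrier_private_sketch(void)
{
	preempt_disable();
	smp_call_function_many(mm_cpumask(current->mm), ipi_mb, NULL, 1);
	preempt_enable();
}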

Unlike userspace preempt disable, in this case we get the abuse anyway
via existing mechanisms, which are in fact already being abused.  If we
provide a mechanism dedicated to this purpose, we at least gain the
potential to handle the abuse, for example:

o       "Defanging" sys_membarrier() on systems that are sensitive to
        latency.  For example, this patch can be defanged by booting
        with the rcupdate.rcu_normal=1 kernel boot parameter, which
        causes requests for expedited grace periods to instead use
        normal grace periods.

o       Detecting and responding to abuse.  For example, perhaps if there
        are more than (say) 50 expedited sys_membarrier() calls within a
        given jiffy, the excess calls fall back to normal (non-expedited)
        grace periods; see the sketch following this list.

o       Batching optimizations allow large numbers of concurrent requests
        to be handled with fewer grace periods -- and both normal and
        expedited grace periods already do exactly this.
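
As a purely illustrative sketch of the second item, with all names
made up, the throttling could be as simple as:

#include <linux/atomic.h>
#include <linux/compiler.h>
#include <linux/jiffies.h>
#include <linux/types.h>

#define MEMBARRIER_EXPEDITED_PER_JIFFY	50

static atomic_t expedited_count = ATOMIC_INIT(0);
static unsigned long expedited_window;	/* jiffy the count applies to */

static bool membarrier_may_expedite(void)
{
	unsigned long now = jiffies;

	/* New jiffy: reset the budget.  A racy reset is harmless here
	 * because this is a heuristic throttle, not an exact limit. */
	if (READ_ONCE(expedited_window) != now) {
		WRITE_ONCE(expedited_window, now);
		atomic_set(&expedited_count, 0);
	}
	return atomic_inc_return(&expedited_count) <=
	       MEMBARRIER_EXPEDITED_PER_JIFFY;
}

A caller seeing "false" would then fall back to synchronize_rcu()
instead of synchronize_rcu_expedited().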

This horse is already out, so trying to shut the gate won't be effective.

                                                        Thanx, Paul
