On Mon, Jan 28, 2019 at 01:56:24PM -0800, Alexei Starovoitov wrote:
> On Mon, Jan 28, 2019 at 10:24:08AM +0100, Peter Zijlstra wrote:

> > Ah, but the loop won't be in the BPF program itself. The BPF program
> > would only have had the BPF_SPIN_LOCK instruction, the JIT them emits
> > code similar to queued_spin_lock()/queued_spin_unlock() (or calls to
> > out-of-line versions of them).
> 
> As I said we considered exactly that and such approach has a lot of downsides
> comparing with the helper approach.
> Pretty much every time new feature is added we're evaluating whether it
> should be new instruction or new helper. 99% of the time we go with new 
> helper.

Ah; it seems I'm confused on helper vs instruction. As in, I've no idea
what a helper is.

> > There isn't anything that mandates the JIT uses the exact same locking
> > routines the interpreter does, is there?
> 
> sure. This bpf_spin_lock() helper can be optimized whichever way the kernel 
> wants.
> Like bpf_map_lookup_elem() call is _inlined_ by the verifier for certain map 
> types.
> JITs don't even need to do anything. It looks like function call from bpf prog
> point of view, but in JITed code it is a sequence of native instructions.
> 
> Say tomorrow we find out that bpf_prog->bpf_spin_lock()->queued_spin_lock()
> takes too much time then we can inline fast path of queued_spin_lock
> directly into bpf prog and save function call cost.

OK, so then the JIT can optimize helpers. Would it not make sense to
have the simple test-and-set spinlock in the generic code and have the
JITs use arch_spinlock_t where appropriate?

Reply via email to