On Mon, Jan 28, 2019 at 01:56:24PM -0800, Alexei Starovoitov wrote:
> On Mon, Jan 28, 2019 at 10:24:08AM +0100, Peter Zijlstra wrote:
> > Ah, but the loop won't be in the BPF program itself. The BPF program
> > would only have had the BPF_SPIN_LOCK instruction, the JIT then emits
> > code similar to queued_spin_lock()/queued_spin_unlock() (or calls to
> > out-of-line versions of them).
>
> As I said we considered exactly that and such approach has a lot of downsides
> comparing with the helper approach.
> Pretty much every time new feature is added we're evaluating whether it
> should be new instruction or new helper. 99% of the time we go with new
> helper.

Ah; it seems I'm confused on helper vs instruction. As in, I've no idea
what a helper is.

> > There isn't anything that mandates the JIT uses the exact same locking
> > routines the interpreter does, is there?
>
> sure. This bpf_spin_lock() helper can be optimized whichever way the kernel
> wants.
> Like bpf_map_lookup_elem() call is _inlined_ by the verifier for certain map
> types.
> JITs don't even need to do anything. It looks like function call from bpf prog
> point of view, but in JITed code it is a sequence of native instructions.
>
> Say tomorrow we find out that bpf_prog->bpf_spin_lock()->queued_spin_lock()
> takes too much time then we can inline fast path of queued_spin_lock
> directly into bpf prog and save function call cost.

OK, so then the JIT can optimize helpers.

Would it not make sense to have the simple test-and-set spinlock in the
generic code and have the JITs use arch_spinlock_t where appropriate?
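
For concreteness, a "simple test-and-set spinlock" of the kind meant
above could look roughly like the sketch below. This is a userspace C11
illustration only, not the kernel or BPF implementation: struct tas_lock
and the tas_lock_*() functions are hypothetical names made up for this
example, and real generic code would also have to deal with preemption,
IRQs and cpu_relax()-style backoff.

```c
#include <stdatomic.h>

/* Hypothetical lock type for illustration; not a kernel type. */
struct tas_lock {
	atomic_flag locked;
};

static inline void tas_lock_init(struct tas_lock *l)
{
	/* Start in the unlocked (clear) state. */
	atomic_flag_clear_explicit(&l->locked, memory_order_relaxed);
}

/* Returns 1 if the lock was taken, 0 if it was already held. */
static inline int tas_lock_try(struct tas_lock *l)
{
	return !atomic_flag_test_and_set_explicit(&l->locked,
						  memory_order_acquire);
}

static inline void tas_lock_acquire(struct tas_lock *l)
{
	/* Spin until test-and-set observes the flag as previously clear. */
	while (!tas_lock_try(l))
		; /* busy-wait */
}

static inline void tas_lock_release(struct tas_lock *l)
{
	/* Release ordering publishes the critical section's writes. */
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

The idea would be that generic code carries something this small, and a
JIT on an architecture with a better native lock emits that instead.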