Re: [PATCH bpf-next 2/3] tools, perf: use smp_{rmb,mb} barriers instead of {rmb,mb}

Peter Zijlstra Thu, 18 Oct 2018 01:15:02 -0700

On Thu, Oct 18, 2018 at 01:10:15AM +0200, Daniel Borkmann wrote:

> Wouldn't this then also allow the kernel side to use smp_store_release()
> when it updates the head? We'd be pretty much at the model as described
> in Documentation/core-api/circular-buffers.rst.
> 
> Meaning, rough pseudo-code diff would look as:
> 
> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> index 5d3cf40..3d96275 100644
> --- a/kernel/events/ring_buffer.c
> +++ b/kernel/events/ring_buffer.c
> @@ -84,8 +84,9 @@ static void perf_output_put_handle(struct perf_output_handle *handle)
>        *
>        * See perf_output_begin().
>        */
> -     smp_wmb(); /* B, matches C */
> -     rb->user_page->data_head = head;
> +
> +     /* B, matches C */
> +     smp_store_release(&rb->user_page->data_head, head);


Yes, this would be correct.

The reason we didn't do this is because smp_store_release() ends up
being smp_mb() + WRITE_ONCE() for a fair number of platforms, even if
they have a cheaper smp_wmb(). Most notably ARM.

(ARM64 OTOH would like to have smp_store_release() there I imagine;
while x86 doesn't care either way).
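
For illustration only, the two ways of publishing the head can be sketched
in userspace C11 atomics. This is not the kernel code: struct mock_rb,
MOCK_SIZE, publish_fence() and publish_release() are made-up names, the
space check against data_tail is left out, and
atomic_thread_fence(memory_order_release) is only an approximation of
smp_wmb() (it is somewhat stronger, since it also orders earlier loads).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MOCK_SIZE 8u	/* hypothetical buffer size */

static_assert((MOCK_SIZE & (MOCK_SIZE - 1)) == 0, "size must be a power of two");

/* Hypothetical stand-in for the ring buffer and its data_head field. */
struct mock_rb {
	_Atomic uint64_t data_head;
	unsigned char data[MOCK_SIZE];
};

/* Current shape: write data, barrier, then a plain store of the new head. */
static void publish_fence(struct mock_rb *rb, unsigned char byte)
{
	uint64_t head = atomic_load_explicit(&rb->data_head, memory_order_relaxed);

	rb->data[head & (MOCK_SIZE - 1)] = byte;
	atomic_thread_fence(memory_order_release);	/* approximates smp_wmb() */
	atomic_store_explicit(&rb->data_head, head + 1, memory_order_relaxed);
}

/* Proposed shape: write data, then a single store-release of the new head. */
static void publish_release(struct mock_rb *rb, unsigned char byte)
{
	uint64_t head = atomic_load_explicit(&rb->data_head, memory_order_relaxed);

	rb->data[head & (MOCK_SIZE - 1)] = byte;
	atomic_store_explicit(&rb->data_head, head + 1, memory_order_release);
}
```

Both variants give the consumer the same guarantee (data visible before the
new head); the difference discussed above is only in what each one costs on
a given architecture.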

A similar concern exists for the smp_load_acquire() I proposed for the
userspace side: ARM would have to resort to smp_mb() in that situation,
instead of the cheaper smp_rmb().

The smp_store_release() on the userspace side will actually be of equal
cost or cheaper, since it already has an smp_mb(). Most notably, x86 can
avoid the barrier entirely, because TSO doesn't allow the LOAD-STORE
reorder (it only allows the STORE-LOAD reorder). And PowerPC can use
LWSYNC instead of SYNC.
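
As a rough sketch of the userspace side described above, again in C11
atomics rather than the real tools/perf code: struct mock_user_page and
mock_consume() are hypothetical names, the buffer size is assumed to be a
power of two, and 'out' is assumed to have room for 'size' bytes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the head/tail fields of the perf user page. */
struct mock_user_page {
	_Atomic uint64_t data_head;	/* advanced by the producer (kernel) */
	_Atomic uint64_t data_tail;	/* advanced by the consumer (userspace) */
};

/* Copy out everything published so far; returns the number of bytes consumed. */
static size_t mock_consume(struct mock_user_page *pg, const unsigned char *data,
			   size_t size, unsigned char *out)
{
	assert(size && (size & (size - 1)) == 0);	/* power-of-two size assumed */

	/* Acquire: the data reads below cannot be hoisted above this load. */
	uint64_t head = atomic_load_explicit(&pg->data_head, memory_order_acquire);
	uint64_t tail = atomic_load_explicit(&pg->data_tail, memory_order_relaxed);
	size_t n = 0;

	while (tail != head)
		out[n++] = data[tail++ & (size - 1)];

	/* Release: the data reads above complete before the tail is published. */
	atomic_store_explicit(&pg->data_tail, tail, memory_order_release);
	return n;
}
```

The store-release on data_tail is the LOAD-STORE ordering mentioned above:
the loads from the data area must not be reordered after the store that
hands the space back to the producer.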
