On Thu, 20 Aug 2020, Andrii Nakryiko wrote:

> Add a set of APIs to the perf_buffer manager to allow applications to
> integrate perf buffer polling into existing epoll-based infrastructure. One
> example is applications that already use libevent and want to plug in
> perf_buffer polling, instead of relying on perf_buffer__poll() and wasting an
> extra thread to do it. But perf_buffer is still extremely useful for setting
> up and consuming perf buffer rings even in such use cases.
> 
> So to accommodate such new use cases, add three new APIs:
>   - perf_buffer__buffer_cnt() returns number of per-CPU buffers maintained by
>     given instance of perf_buffer manager;
>   - perf_buffer__buffer_fd() returns FD of perf_event corresponding to
>     a specified per-CPU buffer; this FD is then polled independently;
>   - perf_buffer__consume_buffer() consumes data from single per-CPU buffer,
>     identified by its slot index.
> 
> These APIs allow for great flexibility, but do not sacrifice general usability
> of perf_buffer.
>

This is great! If I understand correctly, you're supporting the
retrieval of the individual per-cpu buffer fds so that they can be
inserted into another epoll set.  I've been exploring another possibility - 
hierarchical epoll, where the top-level perf_buffer epoll_fd field is used 
rather than the individual per-cpu buffers.  In that context, would an 
interface to return the perf_buffer epoll_fd make sense too? i.e.

int perf_buffer__fd(const struct perf_buffer *pb);

?

When events occur for the perf_buffer__fd, it seems we can simply call
perf_buffer__poll(pb, ...) to handle them.  That approach _appears_ to
work, though I do see occasional event loss.  Is that method legit too,
or am I missing something?
 
> Also exercise and check new APIs in perf_buffer selftest.
> 
> Signed-off-by: Andrii Nakryiko <andr...@fb.com>

A few questions around the test below, but

Reviewed-by: Alan Maguire <alan.magu...@oracle.com>


> ---
>  tools/lib/bpf/libbpf.c                        | 51 ++++++++++++++-
>  tools/lib/bpf/libbpf.h                        |  3 +
>  tools/lib/bpf/libbpf.map                      |  7 +++
>  .../selftests/bpf/prog_tests/perf_buffer.c    | 62 +++++++++++++++----
>  4 files changed, 111 insertions(+), 12 deletions(-)
> 
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 0bc1fd813408..a6359d49aa9d 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -9390,6 +9390,55 @@ int perf_buffer__poll(struct perf_buffer *pb, int timeout_ms)
>       return cnt < 0 ? -errno : cnt;
>  }
>  
> +/* Return number of PERF_EVENT_ARRAY map slots set up by this perf_buffer
> + * manager.
> + */
> +size_t perf_buffer__buffer_cnt(const struct perf_buffer *pb)
> +{
> +     return pb->cpu_cnt;
> +}
> +
> +/*
> + * Return perf_event FD of a ring buffer in *buf_idx* slot of
> + * PERF_EVENT_ARRAY BPF map. This FD can be polled for new data using
> + * select()/poll()/epoll() Linux syscalls.
> + */
> +int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx)
> +{
> +     struct perf_cpu_buf *cpu_buf;
> +
> +     if (buf_idx >= pb->cpu_cnt)
> +             return -EINVAL;
> +
> +     cpu_buf = pb->cpu_bufs[buf_idx];
> +     if (!cpu_buf)
> +             return -ENOENT;
> +
> +     return cpu_buf->fd;
> +}
> +
> +/*
> + * Consume data from perf ring buffer corresponding to slot *buf_idx* in
> + * PERF_EVENT_ARRAY BPF map without waiting/polling. If there is no data to
> + * consume, do nothing and return success.
> + * Returns:
> + *   - 0 on success;
> + *   - <0 on failure.
> + */
> +int perf_buffer__consume_buffer(struct perf_buffer *pb, size_t buf_idx)
> +{
> +     struct perf_cpu_buf *cpu_buf;
> +
> +     if (buf_idx >= pb->cpu_cnt)
> +             return -EINVAL;
> +
> +     cpu_buf = pb->cpu_bufs[buf_idx];
> +     if (!cpu_buf)
> +             return -ENOENT;
> +
> +     return perf_buffer__process_records(pb, cpu_buf);
> +}
> +
>  int perf_buffer__consume(struct perf_buffer *pb)
>  {
>       int i, err;
> @@ -9402,7 +9451,7 @@ int perf_buffer__consume(struct perf_buffer *pb)
>  
>               err = perf_buffer__process_records(pb, cpu_buf);
>               if (err) {
> -                     pr_warn("error while processing records: %d\n", err);
> +                     pr_warn("perf_buffer: failed to process records in buffer #%d: %d\n", i, err);
>                       return err;
>               }
>       }
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index 5ecb4069a9f0..15e02dcda2c7 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -590,6 +590,9 @@ perf_buffer__new_raw(int map_fd, size_t page_cnt,
>  LIBBPF_API void perf_buffer__free(struct perf_buffer *pb);
>  LIBBPF_API int perf_buffer__poll(struct perf_buffer *pb, int timeout_ms);
>  LIBBPF_API int perf_buffer__consume(struct perf_buffer *pb);
> +LIBBPF_API int perf_buffer__consume_buffer(struct perf_buffer *pb, size_t buf_idx);
> +LIBBPF_API size_t perf_buffer__buffer_cnt(const struct perf_buffer *pb);
> +LIBBPF_API int perf_buffer__buffer_fd(const struct perf_buffer *pb, size_t buf_idx);
>  
>  typedef enum bpf_perf_event_ret
>       (*bpf_perf_event_print_t)(struct perf_event_header *hdr,
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index e35bd6cdbdbf..77466958310a 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -299,3 +299,10 @@ LIBBPF_0.1.0 {
>               btf__set_fd;
>               btf__set_pointer_size;
>  } LIBBPF_0.0.9;
> +
> +LIBBPF_0.2.0 {
> +     global:
> +             perf_buffer__buffer_cnt;
> +             perf_buffer__buffer_fd;
> +             perf_buffer__consume_buffer;
> +} LIBBPF_0.1.0;
> diff --git a/tools/testing/selftests/bpf/prog_tests/perf_buffer.c b/tools/testing/selftests/bpf/prog_tests/perf_buffer.c
> index c33ec180b3f2..add224ce17af 100644
> --- a/tools/testing/selftests/bpf/prog_tests/perf_buffer.c
> +++ b/tools/testing/selftests/bpf/prog_tests/perf_buffer.c
> @@ -7,6 +7,8 @@
>  #include "test_perf_buffer.skel.h"
>  #include "bpf/libbpf_internal.h"
>  
> +static int duration;
> +
>  /* AddressSanitizer sometimes crashes due to data dereference below, due to
>   * this being mmap()'ed memory. Disable instrumentation with
>   * no_sanitize_address attribute
> @@ -24,13 +26,31 @@ static void on_sample(void *ctx, int cpu, void *data, __u32 size)
>       CPU_SET(cpu, cpu_seen);
>  }
>  
> +int trigger_on_cpu(int cpu)
> +{
> +     cpu_set_t cpu_set;
> +     int err;
> +
> +     CPU_ZERO(&cpu_set);
> +     CPU_SET(cpu, &cpu_set);
> +
> +     err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set), &cpu_set);
> +     if (err && CHECK(err, "set_affinity", "cpu #%d, err %d\n", cpu, err))
> +             return err;
> +
> +     usleep(1);
> +
> +     return 0;
> +}
> +
>  void test_perf_buffer(void)
>  {
> -     int err, on_len, nr_on_cpus = 0,  nr_cpus, i, duration = 0;
> +     int err, on_len, nr_on_cpus = 0, nr_cpus, i;
>       struct perf_buffer_opts pb_opts = {};
>       struct test_perf_buffer *skel;
> -     cpu_set_t cpu_set, cpu_seen;
> +     cpu_set_t cpu_seen;
>       struct perf_buffer *pb;
> +     int last_fd = -1, fd;
>       bool *online;
>  
>       nr_cpus = libbpf_num_possible_cpus();
> @@ -71,16 +91,8 @@ void test_perf_buffer(void)
>                       continue;
>               }
>  
> -             CPU_ZERO(&cpu_set);
> -             CPU_SET(i, &cpu_set);
> -
> -             err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set),
> -                                          &cpu_set);
> -             if (err && CHECK(err, "set_affinity", "cpu #%d, err %d\n",
> -                              i, err))
> +             if (trigger_on_cpu(i))
>                       goto out_close;
> -
> -             usleep(1);
>       }
>  
>       /* read perf buffer */
> @@ -92,6 +104,34 @@ void test_perf_buffer(void)
>                 "expect %d, seen %d\n", nr_on_cpus, CPU_COUNT(&cpu_seen)))
>               goto out_free_pb;
>  
> +     if (CHECK(perf_buffer__buffer_cnt(pb) != nr_cpus, "buf_cnt",
> +               "got %zu, expected %d\n", perf_buffer__buffer_cnt(pb), nr_cpus))
> +             goto out_close;
> +
> +     for (i = 0; i < nr_cpus; i++) {
> +             if (i >= on_len || !online[i])
> +                     continue;
> +
> +             fd = perf_buffer__buffer_fd(pb, i);
> +             CHECK(last_fd == fd, "fd_check", "last fd %d == fd %d\n", last_fd, fd);
> +             last_fd = fd;
> +

I'm not sure why you're testing this way - shouldn't it just be a
verification of whether we get an unexpected error code rather
than a valid fd?

> +             err = perf_buffer__consume_buffer(pb, i);
> +             if (CHECK(err, "drain_buf", "cpu %d, err %d\n", i, err))
> +                     goto out_close;
> +

I think I'm a bit lost in what processes what here. The first
perf_buffer__poll() should handle the processing of the records
associated with the first round of per-cpu triggering.
Is the above perf_buffer__consume_buffer() checking the
"no data, return success" case? If that's right, should we do
something to explicitly check that it indeed was a no-op, like CHECK()ing
CPU_ISSET(i, &cpu_seen) to ensure the on_sample() handler wasn't
called? The "drain_buf" description makes me think I'm misreading
this and we're draining additional events, so I wanted to check
what's going on here to make sure.

Thanks!

Alan
