On Wed, 8 Feb 2017 15:41:20 -0800
Tom Herbert <t...@herbertland.com> wrote:

> +static inline int __xdp_run_one_hook(struct xdp_hook *hook,
> +                                  struct xdp_buff *xdp)
> +{
> +     void *priv = rcu_dereference(hook->priv);
> +
> +     if (hook->is_bpf) {
> +             /* Run BPF programs directly do avoid one layer of
> +              * indirection.
> +              */
> +             return BPF_PROG_RUN((struct bpf_prog *)priv, (void *)xdp);
> +     } else {
> +             return hook->hookfn(priv, xdp);
> +     }
> +}
> +
> +/* Core function to run the XDP hooks. This must be as fast as possible */
> +static inline int __xdp_hook_run(struct xdp_hook_set *hook_set,
> +                              struct xdp_buff *xdp,
> +                              struct xdp_hook **last_hook)
> +{
> +     struct xdp_hook *hook;
> +     int i, ret;
> +
> +     if (unlikely(!hook_set))
> +             return XDP_PASS;
> +
> +     hook = &hook_set->hooks[0];
> +     ret = __xdp_run_one_hook(hook, xdp);
> +     *last_hook = hook;
> +
> +     for (i = 1; i < hook_set->num; i++) {
> +             if (ret != XDP_PASS)
> +                     break;
> +             hook = &hook_set->hooks[i];
> +             ret = __xdp_run_one_hook(hook, xdp);
> +     }
> +
> +     return ret;
> +}

There is one basic problem with this approach: there is no bulking and
no reuse of the instruction cache.  There is no revolution in this
approach.  We will end up with the same known performance problems as
more hook users get added.

Calling N hooks for every packet will just end up flushing the
instruction cache (like the issues we have today).

Instead, take N packets and then call the hooks in turn (storing the
action verdicts in a packet vector).  Such an architecture would be in
line with what VPP, Snabb and DPDK are doing.  It optimizes icache usage
and opens up for smarter prefetching of lookup tables.  Imagine having
hook-1 identify the lookup bucket and start a prefetch, hook-2 access
the bucket and prefetch the table data, and hook-3 read the data.  This
is what DPDK is doing, see [1], and VPP does similar tricks to scale to
large route lookup tables.
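
To make the "N packets, hooks in turn" idea concrete, here is a rough
userspace sketch.  It is not the patch's API: struct pkt, struct hook,
enum verdict, BATCH_MAX and the two sample hooks are hypothetical
stand-ins I made up for illustration.  The point is only the loop order:
the outer loop walks the hooks, the inner loop walks the packet vector,
so each hook's code stays hot in the icache for the whole batch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel definitions. */
enum verdict { V_ABORTED = 0, V_DROP = 1, V_PASS = 2 };

struct pkt {
	unsigned char *data;
	size_t len;
};

typedef enum verdict (*hook_fn)(void *priv, struct pkt *p);

struct hook {
	hook_fn fn;
	void *priv;
};

#define BATCH_MAX 64	/* assumed upper bound on the packet vector size */

/* Run each hook over the whole batch before moving on to the next hook.
 * verdicts[i] ends up holding the verdict for pkts[i]; a packet whose
 * verdict is no longer V_PASS is skipped by all later hooks.
 */
void run_hooks_batched(const struct hook *hooks, size_t nhooks,
		       struct pkt *pkts, enum verdict *verdicts, size_t n)
{
	size_t h, i;

	assert(n <= BATCH_MAX);

	for (i = 0; i < n; i++)
		verdicts[i] = V_PASS;

	for (h = 0; h < nhooks; h++) {
		for (i = 0; i < n; i++) {
			if (verdicts[i] != V_PASS)
				continue;
			verdicts[i] = hooks[h].fn(hooks[h].priv, &pkts[i]);
		}
	}
}

/* Sample hook: drop packets shorter than *(size_t *)priv bytes. */
enum verdict drop_short(void *priv, struct pkt *p)
{
	return p->len < *(size_t *)priv ? V_DROP : V_PASS;
}

/* Sample hook: count the packets that reach it, always pass. */
enum verdict count_pkts(void *priv, struct pkt *p)
{
	(void)p;
	(*(unsigned long *)priv)++;
	return V_PASS;
}
```

With drop_short followed by count_pkts, a batch of three packets where
one is too short yields one V_DROP verdict, and the counter only sees
the two survivors.  A real implementation would also need to decide
what to do with the per-packet "last hook" that the patch tracks; this
sketch ignores that.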

[1] http://dpdk.org/doc/guides/prog_guide/packet_framework.html#figure-figure35
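
The staged prefetch idea (identify bucket, touch bucket, read data) can
be sketched the same way.  Again this is only an illustration under my
own assumptions, not DPDK or VPP code: the table layout (one entry
pointer per bucket), VEC_MAX, TBL_BUCKETS, bucket_of() and
lookup_vector() are all made up, and __builtin_prefetch() is a
GCC/Clang builtin.  Each stage runs over the whole vector, so the
memory fetch started for packet i has time to complete while the other
packets are being handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_MAX     32	/* hypothetical packet vector size */
#define TBL_BUCKETS 256	/* hypothetical table size, power of two */

struct entry {
	uint32_t key;
	uint32_t value;
};

struct bucket {
	struct entry *e;	/* NULL when the bucket is empty */
};

/* Simple multiplicative hash, masked to the table size. */
static inline size_t bucket_of(uint32_t key)
{
	return (key * 2654435761u) & (TBL_BUCKETS - 1);
}

/* Look up n keys; out[i] gets the matching value, or 0 on a miss. */
void lookup_vector(struct bucket *tbl, const uint32_t *keys,
		   uint32_t *out, size_t n)
{
	struct bucket *b[VEC_MAX];
	struct entry *e[VEC_MAX];
	size_t i;

	assert(n <= VEC_MAX);

	/* Stage 1: identify the bucket and start fetching it. */
	for (i = 0; i < n; i++) {
		b[i] = &tbl[bucket_of(keys[i])];
		__builtin_prefetch(b[i]);
	}

	/* Stage 2: read the bucket, start fetching the entry behind it. */
	for (i = 0; i < n; i++) {
		e[i] = b[i]->e;
		if (e[i])
			__builtin_prefetch(e[i]);
	}

	/* Stage 3: read the entry data and compare keys. */
	for (i = 0; i < n; i++)
		out[i] = (e[i] && e[i]->key == keys[i]) ? e[i]->value : 0;
}
```

In the hook model, each stage would correspond to one hook called over
the packet vector.  Whether the prefetches actually pay off depends on
the table size and the batch size, which this sketch does not measure.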

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
