Jonathan Lemon <jonathan.le...@gmail.com> writes: > On 6 Jun 2019, at 14:14, Toke Høiland-Jørgensen wrote: > >> Jonathan Lemon <jonathan.le...@gmail.com> writes: >> >>> On 6 Jun 2019, at 12:24, Daniel Borkmann wrote: >>> >>>> On 06/06/2019 08:15 PM, Jonathan Lemon wrote: >>>>> On 6 Jun 2019, at 9:15, Toke Høiland-Jørgensen wrote: >>>>>> Alexei Starovoitov <alexei.starovoi...@gmail.com> writes: >>>>>>> On Thu, Jun 6, 2019 at 8:51 AM Daniel Borkmann >>>>>>> <dan...@iogearbox.net> wrote: >>>>>>>> On 06/06/2019 03:24 PM, Toke Høiland-Jørgensen wrote: >>>>>>>>> From: Toke Høiland-Jørgensen <t...@redhat.com> >>>>>>>>> >>>>>>>>> The bpf_redirect_map() helper used by XDP programs doesn't return >>>>>>>>> any >>>>>>>>> indication of whether it can successfully redirect to the map >>>>>>>>> index it was >>>>>>>>> given. Instead, BPF programs have to track this themselves, >>>>>>>>> leading to >>>>>>>>> programs using duplicate maps to track which entries are >>>>>>>>> populated in the >>>>>>>>> devmap. >>>>>>>>> >>>>>>>>> This patch adds a flag to the XDP version of the >>>>>>>>> bpf_redirect_map() helper, >>>>>>>>> which makes the helper do a lookup in the map when called, and >>>>>>>>> return >>>>>>>>> XDP_PASS if there is no value at the provided index. >>>>>>>>> >>>>>>>>> With this, a BPF program can check the return code from the >>>>>>>>> helper call and >>>>>>>>> react if it is XDP_PASS (by, for instance, substituting a >>>>>>>>> different >>>>>>>>> redirect). This works for any type of map used for redirect. >>>>>>>>> >>>>>>>>> Signed-off-by: Toke Høiland-Jørgensen <t...@redhat.com> >>>>>>>>> --- >>>>>>>>> include/uapi/linux/bpf.h | 8 ++++++++ >>>>>>>>> net/core/filter.c | 10 +++++++++- >>>>>>>>> 2 files changed, 17 insertions(+), 1 deletion(-) >>>>>>>>> >>>>>>>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h >>>>>>>>> index 7c6aef253173..d57df4f0b837 100644 >>>>>>>>> --- a/include/uapi/linux/bpf.h >>>>>>>>> +++ b/include/uapi/linux/bpf.h >>>>>>>>> @@ -3098,6 +3098,14 @@ enum xdp_action { >>>>>>>>> XDP_REDIRECT, >>>>>>>>> }; >>>>>>>>> >>>>>>>>> +/* Flags for bpf_xdp_redirect_map helper */ >>>>>>>>> + >>>>>>>>> +/* If set, the help will check if the entry exists in the map >>>>>>>>> and return >>>>>>>>> + * XDP_PASS if it doesn't. >>>>>>>>> + */ >>>>>>>>> +#define XDP_REDIRECT_F_PASS_ON_INVALID BIT(0) >>>>>>>>> +#define XDP_REDIRECT_ALL_FLAGS XDP_REDIRECT_F_PASS_ON_INVALID >>>>>>>>> + >>>>>>>>> /* user accessible metadata for XDP packet hook >>>>>>>>> * new fields must be added to the end of this structure >>>>>>>>> */ >>>>>>>>> diff --git a/net/core/filter.c b/net/core/filter.c >>>>>>>>> index 55bfc941d17a..2e532a9b2605 100644 >>>>>>>>> --- a/net/core/filter.c >>>>>>>>> +++ b/net/core/filter.c >>>>>>>>> @@ -3755,9 +3755,17 @@ BPF_CALL_3(bpf_xdp_redirect_map, struct >>>>>>>>> bpf_map *, map, u32, ifindex, >>>>>>>>> { >>>>>>>>> struct bpf_redirect_info *ri = >>>>>>>>> this_cpu_ptr(&bpf_redirect_info); >>>>>>>>> >>>>>>>>> - if (unlikely(flags)) >>>>>>>>> + if (unlikely(flags & ~XDP_REDIRECT_ALL_FLAGS)) >>>>>>>>> return XDP_ABORTED; >>>>>>>>> >>>>>>>>> + if (flags & XDP_REDIRECT_F_PASS_ON_INVALID) { >>>>>>>>> + void *val; >>>>>>>>> + >>>>>>>>> + val = __xdp_map_lookup_elem(map, >>>>>>>>> ifindex); >>>>>>>>> + if (unlikely(!val)) >>>>>>>>> + return XDP_PASS; >>>>>>>> >>>>>>>> Generally looks good to me, also the second part with the flag. >>>>>>>> Given we store into >>>>>>>> the per-CPU scratch space and function like xdp_do_redirect() pick >>>>>>>> this up again, we >>>>>>>> could even propagate val onwards and save a second lookup on the >>>>>>>> /same/ element (which >>>>>>>> also avoids a race if the val was dropped from the map in the >>>>>>>> meantime). Given this >>>>>>>> should all still be within RCU it should work. Perhaps it even >>>>>>>> makes sense to do the >>>>>>>> lookup unconditionally inside bpf_xdp_redirect_map() helper iff we >>>>>>>> manage to do it >>>>>>>> only once anyway? >>>>>>> >>>>>>> +1 >>>>>>> >>>>>>> also I don't think we really need a new flag here. >>>>>>> Yes, it could be considered an uapi change, but it >>>>>>> looks more like bugfix in uapi to me. >>>>>>> Since original behavior was so clunky to use. >>>>>> >>>>>> Hmm, the problem with this is that eBPF programs generally do >>>>>> something >>>>>> like: >>>>>> >>>>>> return bpf_redirect_map(map, idx, 0); >>>>>> >>>>>> after having already modified the packet headers. This will get them >>>>>> a >>>>>> return code of XDP_REDIRECT, and the lookup will then subsequently >>>>>> fail, >>>>>> which returns in XDP_ABORTED in the driver, which you can catch with >>>>>> tracing. >>>>>> >>>>>> However, if we just change it to XDP_PASS, the packet will go up the >>>>>> stack, but because it has already been modified the stack will drop >>>>>> it, >>>>>> more or less invisibly. >>>>>> >>>>>> So the question becomes, is that behaviour change really OK? >>>>> >>>>> Another option would be treating the flags (or the lower bits of >>>>> flags) >>>>> as the default xdp action taken if the lookup fails. 0 just happens >>>>> to >>>>> map to XDP_ABORTED, which gives the initial behavior. Then the new >>>>> behavior >>>>> would be: >>>>> >>>>> return bpf_redirect_map(map, index, XDP_PASS); >>>> >>>> Makes sense, that should work, but as default (flags == 0), you'd have >>>> to return XDP_REDIRECT to stay consistent with existing behavior. >>> >>> Right - I was thinking something along the lines of: >>> >>> val = __xdp_map_lookup_elem(map, ifindex); >>> if (unlikely(!val)) >>> return (flags & 3); >>> ... >>> return XDP_REDIRECT; >>> >>> >>> Stated another way, if the map lookup succeeds, return REDIRECT, >>> otherwise >>> return one (ABORT, DROP, PASS, TX). >> >> But then we're still changing UAPI on flags==0? > > I believe your point (and Daniel's) is that for flags==0, it should always > return REDIRECT, which is the current behavior? I'm not seeing why it > matters. > > Returning REDIRECT indicates something was stored in redirect_info, and > xdp_do_redirect() is called. This will fail the lookup (which was just done) > and return -EINVAL. Callers treat this as XDP_DROP. > > On the other hand, returning XDP_ABORTED bypasses the xdp_do_redirect() call > and all callsites treat this as DROP. The main difference seems to be the > tracing call - whether _trace_xdp_redirect_map_err or trace_xdp_exception gets > called. > > Is this really an UAPI breakage?
Well, that's what I'm trying to figure out :) It will mean that the xdp_redirect_map_err() tracepoint is no longer triggered, and anyone who counts the number of different return codes seen by the program (as we do in the XDP tutorial, for instance[0]) is going to see different values all of a sudden. So it kinda feels dodgy to change it, I'd say? As in, I'm not vehemently opposed, just trying to be extra cautious? >> Also, what would be the use case for this, wouldn't the program have >> to react explicitly in any case (to, e.g., not modify the packet if >> it decides to XDP_PASS)? > > How is that any different from using XDP_REDIRECT_F_PASS_ON_INVALID? My point is that it's not: If you have to check the return value anyway, we're not really gaining everything from making it possible to select what that return value is? -Toke [0] https://github.com/xdp-project/xdp-tutorial/blob/master/packet01-parsing/xdp_prog_kern.c#L94