On 6 Jun 2019, at 14:14, Toke Høiland-Jørgensen wrote:
> Jonathan Lemon <jonathan.le...@gmail.com> writes:
>
>> On 6 Jun 2019, at 12:24, Daniel Borkmann wrote:
>>
>>> On 06/06/2019 08:15 PM, Jonathan Lemon wrote:
>>>> On 6 Jun 2019, at 9:15, Toke Høiland-Jørgensen wrote:
>>>>> Alexei Starovoitov <alexei.starovoi...@gmail.com> writes:
>>>>>> On Thu, Jun 6, 2019 at 8:51 AM Daniel Borkmann
>>>>>> <dan...@iogearbox.net> wrote:
>>>>>>> On 06/06/2019 03:24 PM, Toke Høiland-Jørgensen wrote:
>>>>>>>> From: Toke Høiland-Jørgensen <t...@redhat.com>
>>>>>>>>
>>>>>>>> The bpf_redirect_map() helper used by XDP programs doesn't return
>>>>>>>> any
>>>>>>>> indication of whether it can successfully redirect to the map
>>>>>>>> index it was
>>>>>>>> given. Instead, BPF programs have to track this themselves,
>>>>>>>> leading to
>>>>>>>> programs using duplicate maps to track which entries are
>>>>>>>> populated in the
>>>>>>>> devmap.
>>>>>>>>
>>>>>>>> This patch adds a flag to the XDP version of the
>>>>>>>> bpf_redirect_map() helper,
>>>>>>>> which makes the helper do a lookup in the map when called, and
>>>>>>>> return
>>>>>>>> XDP_PASS if there is no value at the provided index.
>>>>>>>>
>>>>>>>> With this, a BPF program can check the return code from the
>>>>>>>> helper call and
>>>>>>>> react if it is XDP_PASS (by, for instance, substituting a
>>>>>>>> different
>>>>>>>> redirect). This works for any type of map used for redirect.
>>>>>>>>
>>>>>>>> Signed-off-by: Toke Høiland-Jørgensen <t...@redhat.com>
>>>>>>>> ---
>>>>>>>> include/uapi/linux/bpf.h | 8 ++++++++
>>>>>>>> net/core/filter.c | 10 +++++++++-
>>>>>>>> 2 files changed, 17 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>>>>>>> index 7c6aef253173..d57df4f0b837 100644
>>>>>>>> --- a/include/uapi/linux/bpf.h
>>>>>>>> +++ b/include/uapi/linux/bpf.h
>>>>>>>> @@ -3098,6 +3098,14 @@ enum xdp_action {
>>>>>>>> XDP_REDIRECT,
>>>>>>>> };
>>>>>>>>
>>>>>>>> +/* Flags for bpf_xdp_redirect_map helper */
>>>>>>>> +
>>>>>>>> +/* If set, the help will check if the entry exists in the map
>>>>>>>> and return
>>>>>>>> + * XDP_PASS if it doesn't.
>>>>>>>> + */
>>>>>>>> +#define XDP_REDIRECT_F_PASS_ON_INVALID BIT(0)
>>>>>>>> +#define XDP_REDIRECT_ALL_FLAGS XDP_REDIRECT_F_PASS_ON_INVALID
>>>>>>>> +
>>>>>>>> /* user accessible metadata for XDP packet hook
>>>>>>>> * new fields must be added to the end of this structure
>>>>>>>> */
>>>>>>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>>>>>>> index 55bfc941d17a..2e532a9b2605 100644
>>>>>>>> --- a/net/core/filter.c
>>>>>>>> +++ b/net/core/filter.c
>>>>>>>> @@ -3755,9 +3755,17 @@ BPF_CALL_3(bpf_xdp_redirect_map, struct
>>>>>>>> bpf_map *, map, u32, ifindex,
>>>>>>>> {
>>>>>>>> struct bpf_redirect_info *ri =
>>>>>>>> this_cpu_ptr(&bpf_redirect_info);
>>>>>>>>
>>>>>>>> - if (unlikely(flags))
>>>>>>>> + if (unlikely(flags & ~XDP_REDIRECT_ALL_FLAGS))
>>>>>>>> return XDP_ABORTED;
>>>>>>>>
>>>>>>>> + if (flags & XDP_REDIRECT_F_PASS_ON_INVALID) {
>>>>>>>> + void *val;
>>>>>>>> +
>>>>>>>> + val = __xdp_map_lookup_elem(map,
>>>>>>>> ifindex);
>>>>>>>> + if (unlikely(!val))
>>>>>>>> + return XDP_PASS;
>>>>>>>
>>>>>>> Generally looks good to me, also the second part with the flag.
>>>>>>> Given we store into
>>>>>>> the per-CPU scratch space and function like xdp_do_redirect() pick
>>>>>>> this up again, we
>>>>>>> could even propagate val onwards and save a second lookup on the
>>>>>>> /same/ element (which
>>>>>>> also avoids a race if the val was dropped from the map in the
>>>>>>> meantime). Given this
>>>>>>> should all still be within RCU it should work. Perhaps it even
>>>>>>> makes sense to do the
>>>>>>> lookup unconditionally inside bpf_xdp_redirect_map() helper iff we
>>>>>>> manage to do it
>>>>>>> only once anyway?
>>>>>>
>>>>>> +1
>>>>>>
>>>>>> also I don't think we really need a new flag here.
>>>>>> Yes, it could be considered an uapi change, but it
>>>>>> looks more like bugfix in uapi to me.
>>>>>> Since original behavior was so clunky to use.
>>>>>
>>>>> Hmm, the problem with this is that eBPF programs generally do
>>>>> something
>>>>> like:
>>>>>
>>>>> return bpf_redirect_map(map, idx, 0);
>>>>>
>>>>> after having already modified the packet headers. This will get them
>>>>> a
>>>>> return code of XDP_REDIRECT, and the lookup will then subsequently
>>>>> fail,
>>>>> which returns in XDP_ABORTED in the driver, which you can catch with
>>>>> tracing.
>>>>>
>>>>> However, if we just change it to XDP_PASS, the packet will go up the
>>>>> stack, but because it has already been modified the stack will drop
>>>>> it,
>>>>> more or less invisibly.
>>>>>
>>>>> So the question becomes, is that behaviour change really OK?
>>>>
>>>> Another option would be treating the flags (or the lower bits of
>>>> flags)
>>>> as the default xdp action taken if the lookup fails. 0 just happens
>>>> to
>>>> map to XDP_ABORTED, which gives the initial behavior. Then the new
>>>> behavior
>>>> would be:
>>>>
>>>> return bpf_redirect_map(map, index, XDP_PASS);
>>>
>>> Makes sense, that should work, but as default (flags == 0), you'd have
>>> to return XDP_REDIRECT to stay consistent with existing behavior.
>>
>> Right - I was thinking something along the lines of:
>>
>> val = __xdp_map_lookup_elem(map, ifindex);
>> if (unlikely(!val))
>> return (flags & 3);
>> ...
>> return XDP_REDIRECT;
>>
>>
>> Stated another way, if the map lookup succeeds, return REDIRECT,
>> otherwise
>> return one (ABORT, DROP, PASS, TX).
>
> But then we're still changing UAPI on flags==0?
I believe your point (and Daniel's) is that for flags==0, it should always
return REDIRECT, which is the current behavior? I'm not seeing why it
matters.
Returning REDIRECT indicates something was stored in redirect_info, and
xdp_do_redirect() is called. This will fail the lookup (which was just done)
and return -EINVAL. Callers treat this as XDP_DROP.
On the other hand, returning XDP_ABORTED bypasses the xdp_do_redirect() call
and all callsites treat this as DROP. The main difference seems to be the
tracing call - whether _trace_xdp_redirect_map_err or trace_xdp_exception gets
called.
Is this really an UAPI breakage?
> Also, what would be the
> use case for this, wouldn't the program have to react explicitly in any
> case (to, e.g., not modify the packet if it decides to XDP_PASS)?
How is that any different from using XDP_REDIRECT_F_PASS_ON_INVALID?
--
Jonathan
>
> -Toke