On Tue, Jun 19, 2018 at 02:16:53PM -0600, David Ahern wrote: > On 6/19/18 10:36 AM, Martin KaFai Lau wrote: > > On Tue, Jun 19, 2018 at 09:34:28AM -0600, David Ahern wrote: > >> On 6/19/18 9:25 AM, Martin KaFai Lau wrote: > >>> On Mon, Jun 18, 2018 at 03:35:25PM -0600, David Ahern wrote: > >>>> On 6/18/18 2:55 PM, Martin KaFai Lau wrote: > >>>>>> /* rc > 0 case */ > >>>>>> switch(rc) { > >>>>>> case BPF_FIB_LKUP_RET_BLACKHOLE: > >>>>>> case BPF_FIB_LKUP_RET_UNREACHABLE: > >>>>>> case BPF_FIB_LKUP_RET_PROHIBIT: > >>>>>> return XDP_DROP; > >>>>>> } > >>>>>> > >>>>>> For the others it becomes a question of do we share why the stack needs > >>>>>> to be involved? Maybe the program wants to collect stats to show > >>>>>> traffic > >>>>>> patterns that can be improved (BPF_FIB_LKUP_RET_FRAG_NEEDED) or support > >>>>>> in the kernel needs to be improved (BPF_FIB_LKUP_RET_UNSUPP_LWT) or an > >>>>>> interface is misconfigured (BPF_FIB_LKUP_RET_FWD_DISABLED). > >>>>> Thanks for the explanation. > >>>>> > >>>>> Agree on the bpf able to collect stats will be useful. > >>>>> > >>>>> I am wondering, if a new BPF_FIB_LKUP_RET_XYZ is added later, > >>>>> how may the old xdp_prog work/not-work? As of now, the return value > >>>>> is straight forward, FWD, PASS (to stack) or DROP (error). > >>>>> With this change, the xdp_prog needs to match/switch() the > >>>>> BPF_FIB_LKUP_RET_* to at least PASS and DROP. > >>>> > >>>> IMO, programs should only call XDP_DROP for known reasons - like the 3 > >>>> above. Anything else punt to the stack. > >>>> > >>>> If a new RET_XYZ comes along: > >>>> 1. the new XYZ is a new ACL response where the packet is to be dropped. > >>>> If the program does not understand XYZ and punts to the stack > >>>> (recommendation), then a second lookup is done during normal packet > >>>> processing and the stack drops it. > >>>> > >>>> 2. the new XYZ is a new path in the kernel that is unsupported with > >>>> respect to XDP forwarding, nothing new for the program to do. > >>>> > >>>> Either way I would expect stats on BPF_FIB_LKUP_RET_* to give a hint to > >>>> the program writer. > >>>> > >>>> Worst case of punting packets to the stack for any rc != 0 means the > >>>> stack is doing 2 lookups - 1 in XDP based on its lookup parameters and 1 > >>>> in normal stack processing - to handle the packet. > >>> Instead of having the xdp_prog to follow the meaning of what RET_SYZ is, > >>> should the bpf_*_fib_lookup() return value be kept as is such that > >>> the xdp_prog is clear what to do. The reason can be returned in > >>> the 'struct bpf_fib_lookup'. The number of reasons can be extended. > >>> If the xdp_prog does not understand a reason, it still will not > >>> affect its decision because the return value is clear. > >>> I think the situation here is similar to regular syscall which usually > >>> uses -1 to clearly states error and errno to spells out the reason. > >>> > >> > >> I did consider returning the status in struct bpf_fib_lookup. However, > >> it is 64 bytes and can not be extended without a big performance > >> penalty, so the only option there is to make an existing entry a union > >> the most logical of which is the ifindex. It seemed odd to me to have > >> the result by hidden in the struct as a union on ifindex and returning > >> the egress index from the function: > >> > >> @@ -2625,7 +2636,11 @@ struct bpf_fib_lookup { > >> > >> /* total length of packet from network header - used for MTU > >> check */ > >> __u16 tot_len; > >> - __u32 ifindex; /* L3 device index for lookup */ > >> + > >> + union { > >> + __u32 ifindex; /* input: L3 device index for lookup */ > >> + __u32 result; /* output: one of BPF_FIB_LKUP_RET_* */ > >> + }; > >> > >> > >> It seemed more natural to have ifindex stay ifindex and only change > >> value on return: > >> > >> @@ -2625,7 +2639,11 @@ struct bpf_fib_lookup { > >> > >> /* total length of packet from network header - used for MTU check */ > >> __u16 tot_len; > >> - __u32 ifindex; /* L3 device index for lookup */ > >> + > >> + /* input: L3 device index for lookup > >> + * output: nexthop device index from FIB lookup > >> + */ > >> + __u32 ifindex; > >> > >> union { > >> /* inputs to lookup */ > >> > >> > >> From a program's perspective: > >> > >> rc < 0 -- program is passing incorrect data > >> rc == 0 -- packet can be forwarded > >> rc > 0 -- packet can not be forwarded. > >> > >> BPF programs are not required to track the LKUP_RET values any more than > >> a function returning multiple negative values - the caller just checks > >> rc < 0 means failure. If the program cares it can look at specific > >> values of rc to see the specific value. > >> > >> The same applies with the LKUP_RET values - they are there to provide > >> insight into why the packet is not forwarded directly if the program > >> cares to know why. > > hmm...ic. My concern is, the prog can interpret rc > 0 (in this patch) to be > > drop vs pass (although we can advise them in bpf.h to always pass if it does > > not understand a rc but it is not a strong contract), it may catch people > > a surprise if a xdp_prog suddenly drops everything when running in a > > newer kernel where the upper stack can actually handle it. > > > > while the current behavior (i.e. before this patch, rc == 0) is always pass > > to the stack. > > > > I think at least comments should be put in the enum such that > > the xdp/tc_prog should expect the enum could be extended later, so > > the suggested behavior should be a pass for unknown LKUP_RET and let > > the stack to decide. > > > > All APIs with enums have the inherent quality that more can be added. > Nothing about rc > 0 says it is ok to drop the packet and nothing in the > documentation says it is ok to drop the packet. The program author needs > to look at the extra information provided by the rc. Specific values are > a hint that yes the packet can be dropped; others merely say 'packet > needs help from the stack' with a reason why it needs help. Right, I understand there are values that hint drop (as your earlier example) and there are values that hint pass to stack. and either set of these values could be extended later.
> > My intention is to allow the XDP program to understand FIB based ACLs. > To that end a fib lookup result of blackhole, unreachable and prohibit > needs to be returned to the xdp program with the effective summary of > "can drop in the xdp program". > > If the other return codes are going to cause confusion then less shorten > the list: > > enum { > BPF_FIB_LKUP_RET_SUCCESS, /* packet is to be forwareded */ > BPF_FIB_LKUP_RET_CAN_DROP, /* XDP program can drop the packet */ > BPF_FIB_LKUP_RET_NEED_STACK, /* packet needs full stack assist */ > }; > > But, that still does not solve the problem of rc > 0 means xdp program > can drop the packet, but then that is not the intention of rc > 0. The > intention is only "here's more information about why this packet can not > be forwarded at this layer" ok. Please spell out this intention in bpf.h so that the writer will not default to DROP for the reason that it does not understand.