On 10/17/18 10:26 AM, Alexei Starovoitov wrote:
> On Tue, Oct 16, 2018 at 10:56:05PM -0700, Song Liu wrote:
>> BPF programs of BPF_PROG_TYPE_CGROUP_SKB need to access headers in the
>> skb. This patch enables direct access of skb for these programs.
>
> The lack of direct packet access in CGROUP_SKB progs was
> an unpleasant surprise to me, so thank you for fixing it,
> but there are a few issues with the patch. See below.
>
>> In __cgroup_bpf_run_filter_skb(), bpf_compute_data_pointers() is called
>> to compute proper data_end for the BPF program.
>>
>> Signed-off-by: Song Liu <songliubrav...@fb.com>
>> ---
>>  kernel/bpf/cgroup.c |  4 ++++
>>  net/core/filter.c   | 26 +++++++++++++++++++++++++-
>>  2 files changed, 29 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
>> index 00f6ed2e4f9a..340d496f35bd 100644
>> --- a/kernel/bpf/cgroup.c
>> +++ b/kernel/bpf/cgroup.c
>> @@ -566,6 +566,10 @@ int __cgroup_bpf_run_filter_skb(struct sock *sk,
>>  	save_sk = skb->sk;
>>  	skb->sk = sk;
>>  	__skb_push(skb, offset);
>> +
>> +	/* compute pointers for the bpf prog */
>> +	bpf_compute_data_pointers(skb);
>> +
>>  	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[type], skb,
>>  				 bpf_prog_run_save_cb);
>>  	__skb_pull(skb, offset);
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index 1a3ac6c46873..8b5a502e241f 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -5346,6 +5346,30 @@ static bool sk_filter_is_valid_access(int off, int size,
>>  	return bpf_skb_is_valid_access(off, size, type, prog, info);
>>  }
>>
>> +static bool cg_skb_is_valid_access(int off, int size,
>> +				   enum bpf_access_type type,
>> +				   const struct bpf_prog *prog,
>> +				   struct bpf_insn_access_aux *info)
>> +{
>> +	if (type == BPF_WRITE)
>> +		return false;
>
> this disables writes into cb[0..4] that were allowed for cgroup_inet_* before.
> One can argue that this may break existing progs,
> but looking at the place where BPF_CGROUP_RUN_PROG_INET_INGRESS is called
> it seems it's actually not correct in all cases to access cb there.
> Just a few lines down we call bpf_prog_run_save_cb() which saves/restores
> these 24 bytes.
> So we have two options: either add save/restore for INET_INGRESS only
> or disable read and write access to cb[0..4] for CGROUP_SKB progs.
> I prefer the former.
>
>> +
>> +	switch (off) {
>> +	case bpf_ctx_range(struct __sk_buff, len):
>> +		break;
>> +	case bpf_ctx_range(struct __sk_buff, data):
>> +		info->reg_type = PTR_TO_PACKET;
>> +		break;
>> +	case bpf_ctx_range(struct __sk_buff, data_end):
>> +		info->reg_type = PTR_TO_PACKET_END;
>> +		break;
>> +	default:
>> +		return false;
>> +	}
>
> this also enables access to a range of fields family..local_port.
> It's ok to do for egress, but not for ingress unless we
> add code similar to the bottom of sk_filter_trim_cap() that
> inits skb->sk.
>
> above change also allows access to data_meta and flow_keys
> which is not correct.
>
> Considering all that I'm proposing to fix INET_INGRESS call site
> similar to code below it in sk_filter_trim_cap().
> In particular to do:
> struct sock *save_sk = skb->sk;
> skb->sk = sk;
> save and clear cb
> BPF_CGROUP_RUN_PROG_INET_INGRESS
> restore cb
> skb->sk = save_sk;
>
> all of above can probably be inside BPF_CGROUP_RUN_PROG_INET_INGRESS macro.
> Then in this cg_skb_is_valid_access() allow access to data/data_end
> and family..local_port range as well,
> while disallowing access to flow_keys and data_meta.
>
> In patch 2 we gotta have tests for all these fields.
>
> Thoughts?

Chatted with Song offline. I completely misread 'return false' in the above
as 'break'. The patch actually disables access to pkt_type, mark,
queue_mapping and so on, which is not correct either.
Since tests were not failing, we really need to improve this aspect
of test coverage in test_verifier.c.

Also I missed that __cgroup_bpf_run_filter_skb() already does
save_sk = skb->sk;
skb->sk = sk;
and bpf_prog_run_save_cb().
So there is no issue in the existing code. That was a false alarm.

Revising the proposal...
I think cg_skb_is_valid_access() can be made similar to lwt_is_valid_access(),
allowing writes into mark, priority, cb[0..4] and reads of data/data_end.
In addition it's also ok to allow the family..local_port range
(unlike lwt, where sk may not be present),
with no access to data_meta and flow_keys.
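
To make the revised access rules concrete, here is a rough userspace sketch. It is not the actual patch, and none of the names in it are kernel names. struct mock_sk_buff is a hypothetical, trimmed-down stand-in for struct __sk_buff that does not match the real layout. The MOCK_* enums and cg_skb_access_sketch() are likewise invented for illustration. The sketch only encodes the policy proposed above: writes limited to mark, priority and cb[0..4]; data/data_end readable as packet pointers; family..local_port readable; data_meta and flow_keys rejected. It skips the size and alignment checks a real is_valid_access callback would presumably delegate to bpf_skb_is_valid_access().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct __sk_buff. It lists only
 * fields named in this thread and does NOT match the kernel layout. */
struct mock_sk_buff {
	uint32_t len;
	uint32_t pkt_type;
	uint32_t mark;
	uint32_t queue_mapping;
	uint32_t priority;
	uint32_t cb[5];
	uint32_t data;
	uint32_t data_end;
	uint32_t family;
	uint32_t remote_ip4;
	uint32_t local_ip4;
	uint32_t remote_ip6[4];
	uint32_t local_ip6[4];
	uint32_t remote_port;
	uint32_t local_port;
	uint32_t data_meta;
	uint32_t flow_keys;	/* simplified: a plain u32 in this mock */
};

/* Illustrative stand-ins for enum bpf_access_type and the register types. */
enum mock_access { MOCK_READ, MOCK_WRITE };
enum mock_reg { MOCK_SCALAR, MOCK_PTR_TO_PACKET, MOCK_PTR_TO_PACKET_END };

#define F_START(f)	offsetof(struct mock_sk_buff, f)
#define F_END(f)	(F_START(f) + sizeof(((struct mock_sk_buff *)0)->f))
#define IN_FIELD(off, f) ((off) >= F_START(f) && (off) < F_END(f))

/* Sketch of the proposed policy; size/alignment checks are omitted. */
static bool cg_skb_access_sketch(size_t off, enum mock_access type,
				 enum mock_reg *reg)
{
	*reg = MOCK_SCALAR;

	if (off >= sizeof(struct mock_sk_buff))
		return false;

	/* No access at all to data_meta and flow_keys. */
	if (IN_FIELD(off, data_meta) || IN_FIELD(off, flow_keys))
		return false;

	/* Writes are limited to mark, priority and cb[0..4]. */
	if (type == MOCK_WRITE)
		return IN_FIELD(off, mark) || IN_FIELD(off, priority) ||
		       IN_FIELD(off, cb);

	/* Reads: data and data_end are tracked as packet pointers. */
	if (IN_FIELD(off, data))
		*reg = MOCK_PTR_TO_PACKET;
	else if (IN_FIELD(off, data_end))
		*reg = MOCK_PTR_TO_PACKET_END;

	/* Everything else in the mock, including family..local_port,
	 * stays a readable scalar. */
	return true;
}
```

Checks of this shape (write to mark accepted, write to data rejected, read of data_meta rejected, read of local_port accepted) are also the kind of per-field cases that test_verifier.c would need in order to catch the misreading discussed above.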