On Sat, Jan 2, 2016 at 4:28 PM, Daniel Borkmann <dan...@iogearbox.net> wrote:
> On 12/29/2015 06:29 PM, Craig Gallek wrote:
>>
>> From: Craig Gallek <kr...@google.com>
>>
>> Expose socket options for setting a classic or extended BPF program
>> for use when selecting sockets in an SO_REUSEPORT group. These options
>> can be used on the first socket to belong to a group before bind or
>> on any socket in the group after bind.
>>
>> This change includes refactoring of the existing sk_filter code to
>> allow reuse of the existing BPF filter validation checks.
>>
>> Signed-off-by: Craig Gallek <kr...@google.com>
>
> [...]
>>
>> diff --git a/include/linux/filter.h b/include/linux/filter.h
>> index 4165e9a..3561d3a 100644
>> --- a/include/linux/filter.h
>> +++ b/include/linux/filter.h
>> @@ -447,6 +447,8 @@ void bpf_prog_destroy(struct bpf_prog *fp);
>>
>> int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk);
>> int sk_attach_bpf(u32 ufd, struct sock *sk);
>> +int reuseport_attach_filter(struct sock_fprog *fprog, struct sock *sk);
>> +int reuseport_attach_bpf(u32 ufd, struct sock *sk);
>
>
> Maybe for consistency this should be sk_* prefixed as well due to its
> relation to sockets?)

ACK -> v3

> You currently don't charge the BPF program against the optmem limits, but
> just test if the size of a given program would surpass the current
> sysctl_optmem_max. Ok, after all, this would block anything beyond 2560
> insns by default. Is there a reason it's not charged for real? Due to the
> sysctl_optmem_max default being too small?
>
> Btw, in case of an eBPF fd, we already charged it to the user's
> RLIMIT_MEMLOCK, not sure if blocking it here after program already got an
> fd makes much sense. I'm fine if you want to leave it for now and refine
> this later, though.

I considered a couple of options, but I wasn't able to come up with a
straightforward policy. Which socket should the filter memory be
charged to? If it's only one of them, you run the risk of
under-accounting when that socket leaves the group. If you charge all
of them, you end up over-accounting and introduce all kinds of partial
failure paths. Another alternative would be to mirror one of the sk
memory properties (omem?) in the sock_reuseport structure. That is,
have an accounting variable similar to sk_omem_alloc (with a max of
sysctl_optmem_max) which is changed each time the reuseport bpf
program is manipulated. I'd definitely be interested in ideas for a
follow-on patch if you have any.

> Nit: return bpf_prepare_filter(prog, NULL);

ACK -> v3

> Rest of BPF bits look good to me.

Thank you!
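
For the follow-on discussion, here is a rough user-space sketch of the "mirror omem in the group" alternative mentioned above. It is not kernel code and not part of the posted patch: struct reuseport_acct, reuseport_acct_replace() and CBPF_INSN_SIZE are hypothetical names used only for illustration. It assumes a cap of 20480 bytes (the usual 64-bit sysctl_optmem_max default) and 8 bytes per classic BPF instruction, which is where the 2560 insns figure comes from if the program header is ignored.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed size of one classic BPF instruction (struct sock_filter). */
#define CBPF_INSN_SIZE 8u

/* Hypothetical per-group accounting state; not in the posted patch. */
struct reuseport_acct {
	size_t omem_alloc;	/* bytes currently charged to the group */
	size_t omem_max;	/* cap, mirroring sysctl_optmem_max */
};

/*
 * Swap the group's program charge from old_bytes to new_bytes.
 * Pass old_bytes == 0 on first attach and new_bytes == 0 on detach.
 * Returns false and leaves the charge untouched if the caller's
 * old_bytes is inconsistent or the new total would exceed the cap.
 */
static bool reuseport_acct_replace(struct reuseport_acct *acct,
				   size_t old_bytes, size_t new_bytes)
{
	size_t base;

	if (old_bytes > acct->omem_alloc)
		return false;

	base = acct->omem_alloc - old_bytes;

	/* Written this way so base + new_bytes cannot overflow. */
	if (new_bytes > acct->omem_max || base > acct->omem_max - new_bytes)
		return false;

	acct->omem_alloc = base + new_bytes;
	return true;
}
```

With omem_max set to 20480, attaching a 2560-instruction program (2560 * CBPF_INSN_SIZE bytes) succeeds and replacing it with a 2561-instruction one is refused. Because the charge lives with the group rather than with any member socket, a socket leaving the group does not change the accounting, which sidesteps the under- and over-accounting cases described above.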