2018-11-26 11:08 UTC-0800 ~ Vlad Dumitrescu <v...@dumitrescu.ro> > On Fri, Nov 23, 2018 at 2:51 AM Quentin Monnet > <quentin.mon...@netronome.com> wrote: >> >> 2018-11-21 09:28 UTC-0800 ~ Stanislav Fomichev <s...@fomichev.me> >>> On 11/21, Quentin Monnet wrote: >>>> 2018-11-20 15:26 UTC-0800 ~ Stanislav Fomichev <s...@fomichev.me> >>>>> On 11/20, Alexei Starovoitov wrote: >>>>>> On Wed, Nov 21, 2018 at 12:18:57AM +0100, Daniel Borkmann wrote: >>>>>>> On 11/21/2018 12:04 AM, Alexei Starovoitov wrote: >>>>>>>> On Tue, Nov 20, 2018 at 01:19:05PM -0800, Stanislav Fomichev wrote: >>>>>>>>> On 11/20, Alexei Starovoitov wrote: >>>>>>>>>> On Mon, Nov 19, 2018 at 04:46:25PM -0800, Stanislav Fomichev wrote: >>>>>>>>>>> [Recent commit 23499442c319 ("bpf: libbpf: retry map creation >>>>>>>>>>> without >>>>>>>>>>> the name") fixed this issue for maps, let's do the same for >>>>>>>>>>> programs.] >>>>>>>>>>> >>>>>>>>>>> Since commit 88cda1c9da02 ("bpf: libbpf: Provide basic API support >>>>>>>>>>> to specify BPF obj name"), libbpf unconditionally sets >>>>>>>>>>> bpf_attr->name >>>>>>>>>>> for programs. Pre v4.14 kernels don't know about programs names and >>>>>>>>>>> return an error about unexpected non-zero data. Retry sys_bpf >>>>>>>>>>> without >>>>>>>>>>> a program name to cover older kernels. >>>>>>>>>>> >>>>>>>>>>> Signed-off-by: Stanislav Fomichev <s...@google.com> >>>>>>>>>>> --- >>>>>>>>>>> tools/lib/bpf/bpf.c | 10 ++++++++++ >>>>>>>>>>> 1 file changed, 10 insertions(+) >>>>>>>>>>> >>>>>>>>>>> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c >>>>>>>>>>> index 961e1b9fc592..cbe9d757c646 100644 >>>>>>>>>>> --- a/tools/lib/bpf/bpf.c >>>>>>>>>>> +++ b/tools/lib/bpf/bpf.c >>>>>>>>>>> @@ -212,6 +212,16 @@ int bpf_load_program_xattr(const struct >>>>>>>>>>> bpf_load_program_attr *load_attr, >>>>>>>>>>> if (fd >= 0 || !log_buf || !log_buf_sz) >>>>>>>>>>> return fd; >>>>>>>>>>> >>>>>>>>>>> + if (fd < 0 && errno == E2BIG && load_attr->name) { >>>>>>>>>>> + /* Retry the same syscall, but without the name. >>>>>>>>>>> + * Pre v4.14 kernels don't support prog names. >>>>>>>>>>> + */ >>>>>>>>>> >>>>>>>>>> I'm afraid that will put unnecessary stress on the kernel. >>>>>>>>>> This check needs to be tighter. >>>>>>>>>> Like E2BIG and anything in the log_buf probably means that >>>>>>>>>> E2BIG came from the verifier and nothing to do with prog_name. >>>>>>>>>> Asking kernel to repeat is an unnecessary work. >>>>>>>>>> >>>>>>>>>> In general we need to think beyond this single prog_name field. >>>>>>>>>> There are bunch of other fields in bpf_load_program_xattr() and >>>>>>>>>> older kernels >>>>>>>>>> won't support them. Are we going to zero them out one by one >>>>>>>>>> and retry? I don't think that would be practical. >>>>>>>>> I general, we don't want to zero anything out. However, >>>>>>>>> for this particular problem the rationale is the following: >>>>>>>>> In commit 88cda1c9da02 we started unconditionally setting >>>>>>>>> {prog,map}->name >>>>>>>>> from the 'higher' libbpfc layer which breaks users on the older >>>>>>>>> kernels. >>>>>>>>> >>>>>>>>>> Also libbpf silently ignoring prog_name is not great for debugging. >>>>>>>>>> A warning is needed. >>>>>>>>>> But it cannot be done out of lib/bpf/bpf.c, since it's a set of >>>>>>>>>> syscall >>>>>>>>>> wrappers. >>>>>>>>>> Imo such "old kernel -> lets retry" feature should probably be done >>>>>>>>>> at lib/bpf/libbpf.c level. inside load_program(). >>>>>>>>> For maps bpftools calls bpf_create_map_xattr directly, that's why >>>>>>>>> for maps I did the retry on the lower level (and why for programs I >>>>>>>>> initially >>>>>>>>> thought about doing the same). However, in this case maybe asking >>>>>>>>> user to omit 'name' argument might be a better option. >>>>>>>>> >>>>>>>>> For program names, I agree, we might think about doing it on the >>>>>>>>> higher >>>>>>>>> level (although I'm not sure whether we want to have different API >>>>>>>>> expectations, i.e. bpf_create_map_xattr ignoring the name and >>>>>>>>> bpf_load_program_xattr not ignoring the name). >>>>>>>>> >>>>>>>>> So given that rationale above, what do you think is the best way to >>>>>>>>> move forward? >>>>>>>>> 1. Same patch, but tighten the retry check inside >>>>>>>>> bpf_load_program_xattr ? >>>>>>>>> 2. Move this retry logic into load_program and have different handling >>>>>>>>> for bpf_create_map_xattr vs bpf_load_program_xattr ? >>>>>>>>> 3. Do 2 and move the retry check for maps from bpf_create_map_xattr >>>>>>>>> into bpf_object__create_maps ? >>>>>>>>> >>>>>>>>> (I'm slightly leaning towards #3) >>>>>>>> >>>>>>>> me too. I think it's cleaner for maps to do it in >>>>>>>> bpf_object__create_maps(). >>>>>>>> Originally bpf.c was envisioned to be a thin layer on top of bpf >>>>>>>> syscall. >>>>>>>> Whereas 'smart bits' would go into libbpf.c >>>>>>> >>>>>>> Can't we create in bpf_object__load() a small helper >>>>>>> bpf_object__probe_caps() >>>>>>> which would figure this out _once_ upon start with a few things to >>>>>>> probe for >>>>>>> availability in the underlying kernel for maps and programs? E.g. >>>>>>> programs >>>>>>> it could try to inject a tiny 'r0 = 0; exit' snippet where we figure out >>>>>>> things like prog name support etc. Given underlying kernel doesn't >>>>>>> change, we >>>>>>> would only try this once and it doesn't require fallback every time. >>>>>> >>>>>> +1. great idea! >>>>> Sounds good, let me try to do it. >>>>> >>>>> It sounds more like a recent LPC proposal/idea to have some sys_bpf option >>>>> to query BPF features. This new bpf_object__probe_caps can probably query >>>>> that in the future if we eventually add support for it. >>>>> >>>> >>>> Hi, >>>> >>>> LPC proposal indeed. I've been working on implementing this kind of >>>> probes in bpftool. I don't probe name support for now (but I can >>>> certainly add it), but I detect supported program types, map types, >>>> header functions, and a couple of other parameters. The idea (initially >>>> from Daniel) was to dump "#define" declarations that could later be >>>> included in a header file and used for a BPF project (or alternatively, >>>> JSON output). >>> Oh, nice, I didn't know someone was already working on it! >>> >>>> I felt like bpftool was maybe a better place to do it, as the set of >>>> probes may grow large (all types, helpers, etc). It might have >>>> consequences on the running system: for example, if I don't raise the >>>> rlimit in bpftool before starting the probes, a good half of them fail >>>> because of consecutive program creation and reclaim delay for locked >>>> memory usage. >>> Should we aim for something like build system feature checks? Where we >>> preserve these probe results between bpftool/libbpf invocations so we >>> don't re-run them again? >>> >>>> So should we start adding probes to libbpf and should I move mine to the >>>> lib as well, or should the one in the v3 of this series be directed to >>>> something like bpftool instead? >>> We need them (well, at least the name checks) for libbpf because that's >>> what we link against and what we use to load the programs, bpftool is >>> less of an issue right now. But my patch was mostly a hackish solution >>> until we get the real feature checks :-) >> >> Hi Stanislav, >> Apologies for the delayed answer, I have been travelling. >> >> I don't know if the probes should be "shared" with libbpf. What I had in >> mind was rather to run them with bpftool and to produce something like a >> header containing the results of the probes. Then this header could be >> included in whatever software wants to manage BPF objects, and that >> software can then decide to call (or not to call) libbpf functions in >> one way or another, depending on the features supported by the system. >> It means you have to recompile your program for the target system, or at >> least to load probe results as a configuration file somehow. > > Will this work for what Stanislav is trying to solve? The 'whatever > software wants to manage BPF objects' (WS) has no option to choose > whether the map and program names are passed to kernel. The libbpf API > used by WS, in this case, is bpf_object__load(), which internally > decides when names are passed.
Hm that's correct. I had in mind something like bpf_load_program_xattr(), but it's true that with bpf_object__load() you do not get to choose if you pass the name or not. Well I suppose we'll keep some probes in libbpf, bpftool might not be a solution in that case indeed. Thanks, Quentin