On 2015/5/5 11:02, Alexei Starovoitov wrote:
> On 5/2/15 12:19 AM, Wang Nan wrote:
>>
>> I'd like to do the following work in the next version (based on my
>> experience and feedback):
>>
>> 1. Safely clean up kprobe points after unloading;
>>
>> 2. Add subcommand space to 'perf bpf'. Current stuff should reside in
>>    'perf bpf load';
>>
>> 3. Extract eBPF ELF walking and collecting work to a separate library to
>>    help others.
>
> that's a good list.
>
> The feedback for existing patches:
> patch 18 - since we're creating a generic library for bpf elf
> loading it would be great to do the following:
> first try to load with
> attr.log_buf = NULL;
> attr.log_level = 0;
> then only if it fails, allocate a buffer and repeat with log_level = 1.
> The reason is that it's better to have fast program loading by default
> without any verbosity emitted by verifier.
>

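A minimal sketch of the quiet-then-verbose load suggested for patch 18, in plain C. The real call would be bpf(BPF_PROG_LOAD, ...) with attr.log_buf and attr.log_level; here it is hidden behind a hypothetical callback type so the retry logic can be read and exercised without a kernel. Every name in the block (prog_load_fn, load_quiet_then_verbose, stub_load, LOG_BUF_SIZE) is an assumption made for illustration and is not taken from the patches.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the BPF_PROG_LOAD call: returns an fd >= 0 on
 * success or -1 on failure, and may write verifier text into log_buf when
 * log_level > 0. */
typedef int (*prog_load_fn)(const void *insns, size_t insn_cnt,
                            char *log_buf, size_t log_size, int log_level);

#define LOG_BUF_SIZE 65536 /* assumed size, not a kernel constant */

/* Try a fast load without any verifier log. Only if that fails, allocate a
 * buffer and repeat with log_level = 1. On final failure the log is handed
 * to the caller through *log_out (caller frees it). */
static int load_quiet_then_verbose(prog_load_fn load, const void *insns,
                                   size_t insn_cnt, char **log_out)
{
    char *log;
    int fd;

    if (log_out)
        *log_out = NULL;

    /* Fast path: log_buf = NULL, log_level = 0. */
    fd = load(insns, insn_cnt, NULL, 0, 0);
    if (fd >= 0)
        return fd;

    /* Slow path: repeat the load with a log buffer attached. */
    log = calloc(1, LOG_BUF_SIZE);
    if (!log)
        return -1;

    fd = load(insns, insn_cnt, log, LOG_BUF_SIZE, 1);
    if (fd >= 0) {
        free(log);
        return fd;
    }

    if (log_out)
        *log_out = log;
    else
        free(log);
    return -1;
}

/* Stub loader used only to exercise the logic above: it "rejects" empty
 * programs and counts how often it was called. */
static int stub_calls;

static int stub_load(const void *insns, size_t insn_cnt,
                     char *log_buf, size_t log_size, int log_level)
{
    (void)insns;
    stub_calls++;
    if (insn_cnt > 0)
        return 3; /* pretend file descriptor */
    if (log_level > 0 && log_buf && log_size > 0)
        snprintf(log_buf, log_size, "rejected: empty program");
    return -1;
}
```

With the stub, a valid program costs exactly one load call and no allocation; only a rejected program pays for the second, verbose attempt.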
Will do.

> patch 19 - I think it's unnecessary.
> verifier already dumps it. so this '-v' flag can be translated into
> verbose loading.
> There is also .s output from llvm for those interested in bpf asm
> instructions.
>

That's great. Could you please add a description of 'llvm -s' to your
README or comments? Dumping eBPF instructions cost me a lot of time, so I
decided to add it to perf...

>> My colleague He Kuang is working on variable access. Probing inside
>> a function body and accessing its local variables will be supported
>> like this:
>>
>> SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara"
>> int prog(struct pt_regs *ctx, unsigned long vara) {
>>     // vara is the value of localvara of function func_name
>> }
>
> that would be great. I'm not sure though how you can achieve that
> without changing C front-end ?

It's not very difficult. He is trying to generate the loader of vara as a
prologue, then paste the prologue and the main eBPF program together.
From the viewpoint of the kernel bpf verifier, there is only one param (ctx);
the prologue fetches the value of vara and puts it into the proper
register, then the main program runs.

Another possible solution is to change the protocol between kprobe and the
eBPF program: make kprobes call the fetchers and pass them to the eBPF
program as a second param (grouping all varx together). A prologue may still
be needed in this case to load each param into the correct register.

> This type of feature is exactly the reason why we're trying to write
> our front-end.
> In general there are two ways to achieve 'restricted C' language:
> - start from clang and chop all features that are not supported.
>   I believe Jovi already tried to do that and it became very difficult.
> - start from simple front-end with minimal C and add all things one by
>   one. That's what we're trying to do. So far we have most of normal
>   syntax.
>   The problem with our approach is that we cannot easily do
>   #include of existing .h files. We're working on that.
>   It's too experimental still. Maybe we will drop it and go back to
>   first approach.
>
> The reason for extending front-end is your example above, where
> the user would want to write:
> int prog(struct pt_regs *ctx, unsigned long vara) {
> // use 'vara'
> but generated BPF should have only one 'ctx' pointer, since that's
> the only thing that verifier will accept. bpf/core and JITs expect
> only one argument, etc.
> So this func definition + 'vara' access can be compiled as ctx->si
> (if vara is actually in register) or
> bpf_probe_read(ctx->bp + magic_offset_from_debug_info)
> (if vara is on stack)
> or it can also be done via store_trace_args() but that will be slower
> and requires hacking kernel, whereas ctx->... style is pure userspace.
> Lots of things to brainstorm. So please share your progress soon.
>
>> And I want to discuss with you and others about:
>>
>> 1. How to make eBPF output its tracing and aggregation results to perf?
>
> well, the output of bpf program is a data stored in maps. Each program
> needs a corresponding user space reader/printer/sorter of this data.
> Like tracex2 prints this data as histogram and tracex3 prints it as
> heatmap. We can standardize few things like this, but ideally we
> keep it up to user. So that user can write single file that consists
> of functions that are loaded as bpf into kernel and other functions
> that are executed in user space. llvm can jit first set to bpf and
> second set to x86. That's distant future though.
> So far samples/bpf/ style of kern.c+user.c worked quite well.
>

Well, it looks like in your design BPF programs are used to produce
aggregation results. On my side, I also want them to act as trace filters.

Could you please consider the following problem?

We find that several __lock_page() calls last a very long time.
We are going to find the corresponding __unlock_page() so we can know what
blocks them. We want to insert eBPF programs before io_schedule() in
__lock_page(), and also add an eBPF program at the entry of __unlock_page(),
so we can compute the interval between page locking and unlocking. If the
interval is longer than a threshold, let __unlock_page() trigger a perf
sample so we get its call stack. In this case, the eBPF program acts as
a trace filter.

Thank you.
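The lock/unlock filter described above can be modelled in plain C before it is written as an eBPF program. This sketch covers only the decision logic: a fixed array stands in for a map keyed by page address, and timestamps are passed in by the caller instead of being read in the kernel. The names on_lock_wait() and on_unlock(), the table size, and the threshold parameter are all hypothetical, and keeping the earliest waiter's timestamp per page is one possible choice, not something the design above prescribes.

```c
#include <stddef.h>
#include <stdint.h>

#define SLOTS 256 /* assumed capacity of the stand-in "map" */

struct slot {
    uint64_t page;     /* key: address of the page being waited on */
    uint64_t start_ns; /* time the first waiter reached io_schedule() */
    int used;
};

static struct slot table[SLOTS];

/* Linear lookup by page address; returns NULL when the page is not tracked. */
static struct slot *lookup(uint64_t page)
{
    size_t i;

    for (i = 0; i < SLOTS; i++)
        if (table[i].used && table[i].page == page)
            return &table[i];
    return NULL;
}

/* Models the program placed before io_schedule() in __lock_page():
 * remember when waiting on this page started. If the page is already
 * tracked, the earliest timestamp is kept. */
static void on_lock_wait(uint64_t page, uint64_t now_ns)
{
    size_t i;

    if (lookup(page))
        return;
    for (i = 0; i < SLOTS; i++) {
        if (!table[i].used) {
            table[i].used = 1;
            table[i].page = page;
            table[i].start_ns = now_ns;
            return;
        }
    }
    /* Table full: the event is dropped. */
}

/* Models the program at the entry of __unlock_page(): returns 1 when the
 * wait exceeded threshold_ns (the caller would then take a perf sample),
 * 0 otherwise. The entry is removed either way. */
static int on_unlock(uint64_t page, uint64_t now_ns, uint64_t threshold_ns)
{
    struct slot *s = lookup(page);
    uint64_t delta;

    if (!s)
        return 0;
    delta = now_ns - s->start_ns;
    s->used = 0;
    return delta > threshold_ns;
}
```

The return value of on_unlock() plays the role of the filter: only unlocks that end an over-threshold wait would produce a sample and a call stack.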