On Thu, Apr 3, 2014 at 6:21 PM, Jovi Zhangwei <jovi.zhang...@gmail.com> wrote:
> Hi Alexei,
>
> We talked a lot on ktap and ebpf integration in these days,
> Now I think we can put into deeply to thinking out some
> technical issues in there.
>
> Firstly, I want to make sure you are support this ktap and
> ebpf integration direction, I aware you have ongoing 'bpf filter'
> patch set work, which actually overlapping with ktap integration
> efforts (IMO the interface should be unified and simple for user,
> so I think filter debugfs file is not a good interface), so please let
> me know your answer about this.

I think the more choices users have the better.
I'll continue with C based filters and you can continue with ktap syntax.
That's ok. We can share all kernel pieces.
Like:
1. user: C -> llvm -> obj_file
   kernel: obj_file -> ibpf_verifier -> ibpf execution engine
2. user: ktap language -> ktap_compiler -> obj_file
   kernel: obj_file -> ibpf_verifier -> ibpf execution engine

> If the answer is yes, then we can go through ebpf core
> improvement, for example:

In the architecture I'm proposing there are three main pieces:
- user facing language and userspace compiler into ibpf instruction set,
  stored in an object file format like ELF or something simpler
- in-kernel loader of that object file, license and instruction verifier
- ibpf execution engine

The ibpf execution engine can do all requested features already.
It's a matter of getting the loader and verifier to accept them.
For example:

> - support global variable access

From the execution engine point of view a global or stack variable makes
no difference. It's a 'ld rY, word ptr [rX]' instruction,
where register rX is pointing to the stack or to some memory location.
In my old patch set the 'verifier' was proving correctness of
stack and table accesses only, since I didn't see the need for global
pointers yet, but we can add it.

> this is mandatory for dynamic tracing, otherwise, there have
> no possible to run a simple script like get function execution
> time.

I don't understand the correlation between measuring function execution
time and global variables.
I think userspace should be measuring script execution time.
Time sampling within the kernel can be done from an ibpf program by calling
ktime_get().

> - support timer in kernel
> The final solution must need to support kernel timer for profiling,
> and sampling stack.

We can let programs be executed in the kernel by timer events, but I think
it's a userspace task.
If userspace can do it without hurting performance, it probably should do it.

For example, to do systemtap 'iotop.stp', which looks like:

probe vfs.read.return {
  reads[execname()] += bytes_read
}

probe vfs.write.return {
  writes[execname()] += bytes_written
}

# print top 10 IO processes every 5 seconds
probe timer.s(5) {
  foreach (name in writes)
    total_io[name] += writes[name]
  foreach (name in reads)
    total_io[name] += reads[name]
  printf ("%16s\t%10s\t%10s\n", "Process", "KB Read", "KB Written")
  ...
}

the first two probe functions belong in the kernel as two independent ibpf
programs that access the 'reads' and 'writes' tables, and 'timer.s' really
belongs in userspace. Every 5 seconds it can access the 'reads' and 'writes'
tables, sort them, print them, etc.

The important concept here is a user/kernel shared table.
An ibpf program can read/write to it from the kernel.
The userspace component can read/write it in parallel.
Back in September I posted patches for this style of table access via netlink.

Note that an ibpf program doesn't own memory. It can call 'bpf_table_update'
to store a key/value pair into a kernel table.
Think of it as a small in-kernel database that an ibpf program can store data
to and user space can read/write data at the same time.

> - support register multi-event in one script

I think it should be clear now that it's already supported.
One ibpf program == one function.
An object file may contain multiple programs that attach to different kprobe
events and store key/value pairs into the same or different tables.
From the verifier point of view these two programs are disjoint.
They cannot call each other. The verifier checks them independently.

> - support trace_end

If you mean the final print out of everything, then it's a userspace task.

Thanks
Alexei
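
The split described above for 'iotop.stp' (two in-kernel probe programs that
only update tables, plus a userspace reader that aggregates and sorts) can be
modelled in a few lines. This is only a toy sketch in Python, not the
interface from the patches: `SharedTable`, `on_vfs_read_return`,
`on_vfs_write_return` and `top_io` are hypothetical names, and the `update`
method merely stands in for the 'bpf_table_update' call mentioned in the mail,
whose real signature is not given there.

```python
import threading


class SharedTable:
    """Toy stand-in for a kernel-owned key/value table (hypothetical model)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def update(self, key, value):
        # Models the 'bpf_table_update' idea: store one key/value pair.
        with self._lock:
            self._data[key] = value

    def lookup(self, key, default=0):
        with self._lock:
            return self._data.get(key, default)

    def snapshot(self):
        # What a userspace reader would see at one point in time.
        with self._lock:
            return dict(self._data)


# The tables live outside the "programs"; the programs own no memory.
reads = SharedTable()
writes = SharedTable()


def on_vfs_read_return(execname, bytes_read):
    # "Kernel side" program 1: only touches the 'reads' table.
    # Note: lookup followed by update is not atomic in this toy model.
    reads.update(execname, reads.lookup(execname) + bytes_read)


def on_vfs_write_return(execname, bytes_written):
    # "Kernel side" program 2: independent of program 1, only touches 'writes'.
    writes.update(execname, writes.lookup(execname) + bytes_written)


def top_io(n=10):
    # "Userspace side": what the timer.s(5) probe did, done outside the kernel.
    r, w = reads.snapshot(), writes.snapshot()
    total = {name: r.get(name, 0) + w.get(name, 0) for name in set(r) | set(w)}
    return sorted(total.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

In this model a userspace loop would call `top_io()` every 5 seconds and print
the result, while the two handlers never call each other and share nothing
except the tables.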