On Wed, Oct 17, 2018 at 09:11:40AM -0300, Arnaldo Carvalho de Melo wrote:
> Adding Alexey, Jiri and Namhyung as they worked/are working on
> multithreading 'perf record'.
>
> On Tue, Oct 16, 2018 at 11:43:11PM -0700, Song Liu wrote:
> > On Tue, Oct 16, 2018 at 4:43 PM David Ahern <dsah...@gmail.com> wrote:
> > > On 10/15/18 4:33 PM, Song Liu wrote:
> > > > I am working with Alexei on the idea of fetching BPF program
> > > > information via the BPF_OBJ_GET_INFO_BY_FD cmd. I added
> > > > PERF_RECORD_BPF_EVENT to perf_event_type, and dumped these events
> > > > to the perf event ring buffer.
> > > >
> > > > I found that perf will not process events until the end of perf-record:
> > > >
> > > > root@virt-test:~# ~/perf record -ag -- sleep 10
> > > > ...... 10 seconds later
> > > > [ perf record: Woken up 34 times to write data ]
> > > > machine__process_bpf_event: prog_id 6 loaded
> > > > machine__process_bpf_event: prog_id 6 unloaded
> > > > [ perf record: Captured and wrote 9.337 MB perf.data (93178 samples) ]
> > > >
> > > > In this example, the bpf program was loaded and then unloaded in
> > > > another terminal. When machine__process_bpf_event() processes
> > > > the load event, the bpf program is already unloaded. Therefore,
> > > > machine__process_bpf_event() will not be able to get information
> > > > about the program via the BPF_OBJ_GET_INFO_BY_FD cmd.
> > > >
> > > > To solve this problem, we will need to run BPF_OBJ_GET_INFO_BY_FD
> > > > as soon as perf gets the event from the kernel. I looked around the
> > > > perf code for a while, but I haven't found a good example where some
> > > > events are processed before the end of perf-record. Could you
> > > > please help me with this?
> > >
> > > perf record does not process events as they are generated. Its sole job
> > > is pushing data from the maps to a file as fast as possible, meaning in
> > > bulk based on current read and write locations.
> > >
> > > Adding code to process events will add significant overhead to the
> > > record command and will not really solve your race problem.
> >
> > I agree that processing events while recording has significant overhead.
> > In this case, perf user space needs to know details about the jited BPF
> > program. It is impossible to pass all these details to user space through
> > the relatively stable ring_buffer API. Therefore, some processing of the
> > data is necessary (get bpf prog_id from ring buffer, and then fetch program
> > details via BPF_OBJ_GET_INFO_BY_FD).
> >
> > I have some ideas on processing important data with relatively low overhead.
> > Let me try to implement it.
>
> Well, you could have a separate thread processing just those kinds of
> events, associate it with a dummy event where you only ask for
> PERF_RECORD_BPF_EVENTs.
>
> Here is how to set up the PERF_TYPE_SOFTWARE/PERF_COUNT_SW_DUMMY
> perf_event_attr:
>
> [root@seventh ~]# perf record -vv -e dummy sleep 01
> ------------------------------------------------------------
> perf_event_attr:
>   type                             1
>   size                             112
>   config                           0x9
>   { sample_period, sample_freq }   4000
>   sample_type                      IP|TID|TIME|PERIOD
>   disabled                         1
>   inherit                          1

These you would have disabled, no need for
PERF_RECORD_{MMAP*,COMM,FORK,EXIT}, just PERF_RECORD_BPF_EVENT:

>   mmap                             1
>   comm                             1
>   task                             1
>   mmap2                            1
>   comm_exec                        1
>   freq                             1
>   enable_on_exec                   1
>   sample_id_all                    1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid 12046  cpu 0  group_fd -1  flags 0x8 = 4
> sys_perf_event_open: pid 12046  cpu 1  group_fd -1  flags 0x8 = 5
> sys_perf_event_open: pid 12046  cpu 2  group_fd -1  flags 0x8 = 6
> sys_perf_event_open: pid 12046  cpu 3  group_fd -1  flags 0x8 = 8
> mmap size 528384B
> perf event ring buffer mmapped per cpu
> Synthesizing TSC conversion information
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.014 MB perf.data ]
> [root@seventh ~]#
>
> [root@seventh ~]# perf evlist -v
> dummy: type: 1, size: 112, config: 0x9, { sample_period, sample_freq }: 4000,
> sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1,
> freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1,
> mmap2: 1, comm_exec: 1
> [root@seventh ~]#
>
> There is work ongoing in dumping one file per cpu and then, at post
> processing time, merging all those files to get ordering, so one more
> file, for these VIP events that require per-event processing, would be
> ordered at that time with all the other per-cpu files.
>
> - Arnaldo
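
For reference, the dummy-event setup discussed above could look roughly
like the sketch below. It is only an illustration, not code from the perf
tree: the helper names dummy_attr_init() and dummy_event_open() are made
up here, and since PERF_RECORD_BPF_EVENT is still only a proposal in this
thread, the sketch does not set any bit for it and just marks where such
a bit would presumably go.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <linux/perf_event.h>

/*
 * Hypothetical helper: fill in an attr for a PERF_TYPE_SOFTWARE /
 * PERF_COUNT_SW_DUMMY event that requests none of the
 * PERF_RECORD_{MMAP*,COMM,FORK,EXIT} side-band records.
 */
void dummy_attr_init(struct perf_event_attr *attr)
{
	assert(attr != NULL);
	memset(attr, 0, sizeof(*attr));

	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_SOFTWARE;
	attr->config = PERF_COUNT_SW_DUMMY;

	/* With sample_id_all, non-sample records carry TID and TIME too. */
	attr->sample_type = PERF_SAMPLE_TID | PERF_SAMPLE_TIME;
	attr->sample_id_all = 1;
	attr->exclude_guest = 1;

	/*
	 * mmap, comm, task, mmap2 and comm_exec are left at 0 on purpose.
	 * ASSUMPTION: the proposed PERF_RECORD_BPF_EVENT would need some
	 * way to be requested here; nothing for it is set in this sketch.
	 */
}

/*
 * Hypothetical helper: open the event for one pid/cpu pair. glibc has
 * no perf_event_open() wrapper, so the raw syscall is used. Returns a
 * file descriptor, or -1 with errno set.
 */
long dummy_event_open(struct perf_event_attr *attr, pid_t pid, int cpu)
{
	return syscall(__NR_perf_event_open, attr, pid, cpu,
		       -1 /* group_fd */, PERF_FLAG_FD_CLOEXEC);
}
```

A dedicated thread would then call dummy_event_open() once per cpu, mmap
each returned fd and read just those ring buffers, leaving the bulk
sample path of 'perf record' untouched.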