On Mon, Sep 1, 2025 at 12:57 AM Namhyung Kim <namhy...@kernel.org> wrote: > > Hello, > > On Sun, Aug 31, 2025 at 09:17:34PM -0700, Yonghong Song wrote: > > > > > > On 8/29/25 10:19 PM, Ian Rogers wrote: > > > On Thu, Jun 5, 2025 at 2:00 PM <patchwork-bot+netdev...@kernel.org> wrote: > > > > Hello: > > > > > > > > This series was applied to bpf/bpf-next.git (master) > > > > by Andrii Nakryiko <and...@kernel.org>: > > > > > > > > On Tue, 3 Jun 2025 13:37:00 -0700 you wrote: > > > > > The BTF dumper code currently displays arrays of characters as just > > > > > that - > > > > > arrays, with each character formatted individually. Sometimes this is > > > > > what > > > > > makes sense, but it's nice to be able to treat that array as a string. > > > > > > > > > > This change adds a special case to the btf_dump functionality to allow > > > > > 0-terminated arrays of single-byte integer values to be printed as > > > > > character strings. Characters for which isprint() returns false are > > > > > printed as hex-escaped values. This is enabled when the new > > > > > ".emit_strings" > > > > > is set to 1 in the btf_dump_type_data_opts structure. > > > > > > > > > > [...] > > > > Here is the summary with links: > > > > - [v3,1/2] libbpf: add support for printing BTF character arrays as > > > > strings > > > > https://git.kernel.org/bpf/bpf-next/c/87c9c79a02b4 > > > > - [v3,2/2] Tests for the ".emit_strings" functionality in the BTF > > > > dumper. > > > > https://git.kernel.org/bpf/bpf-next/c/a570f386f3d1 > > > > > > > > You are awesome, thank you! > > > I believe this patch is responsible for segvs occurring in v6.17 in > > > various perf tests when the perf tests run in parallel. There's lots > > > > Could you share the command line to reproduce this failure? > > This will help debugging. Thanks! > > My reproducer is below: > > terminal 1: run perf trace in a loop. > > $ while true; do sudo ./perf trace true; done > > terminal 2: run perf record in a loop until hit the segfault. > > $ while true; do sudo ./perf record true || break; done > ... > perf: Segmentation fault > #0 0x560b2db790e4 in dump_stack debug.c:366 > #1 0x560b2db7915a in sighandler_dump_stack debug.c:378 > #2 0x560b2d973b1b in sigsegv_handler builtin-record.c:722 > #3 0x7f975f249df0 in __restore_rt libc_sigaction.c:0 > #4 0x560b2dca1ee6 in snprintf_hex bpf-event.c:39 > #5 0x560b2dca2306 in synthesize_bpf_prog_name bpf-event.c:144 > #6 0x560b2dca2d92 in bpf_metadata_create bpf-event.c:401 > #7 0x560b2dca3838 in perf_event__synthesize_one_bpf_prog bpf-event.c:673 > #8 0x560b2dca3dd5 in perf_event__synthesize_bpf_events bpf-event.c:798 > #9 0x560b2d977ef5 in record__synthesize builtin-record.c:2131 > #10 0x560b2d9797c1 in __cmd_record builtin-record.c:2581 > #11 0x560b2d97db30 in cmd_record builtin-record.c:4376 > #12 0x560b2da0672e in run_builtin perf.c:349 > #13 0x560b2da069c6 in handle_internal_command perf.c:401 > #14 0x560b2da06b1f in run_argv perf.c:448 > #15 0x560b2da06e68 in main perf.c:555 > #16 0x7f975f233ca8 in __libc_start_call_main libc_start_call_main.h:74 > #17 0x7f975f233d65 in __libc_start_main_alias_2 libc-start.c:128 > #18 0x560b2d959b11 in _start perf[4cb11] > > > I manually ran it with gdb to get some more hints. > > Thread 1 "perf" received signal SIGSEGV, Segmentation fault. > 0x00005555558e8ee6 in snprintf_hex (buf=0x5555562c1d79 "", size=503, > data=0x40 <error: Cannot access memory at address 0x40>, len=8) > at util/bpf-event.c:39 > 39 ret += snprintf(buf + ret, size - ret, "%02x", > data[i]); > > The data is bpf_prog_info->prog_tags and it's called from > synthesize_bpf_prog_name(). > > (gdb) bt > #0 0x00005555558e8ee6 in snprintf_hex (buf=0x5555562c1d79 "", size=503, > data=0x40 <error: Cannot access memory at address 0x40>, > len=8) at util/bpf-event.c:39 > #1 0x00005555558e9306 in synthesize_bpf_prog_name (buf=0x5555562c1d70 > "bpf_prog_", size=512, info=0x55555665e400, btf=0x5555562c5630, > sub_id=0) at util/bpf-event.c:144 > #2 0x00005555558e9db5 in bpf_metadata_create (info=0x55555665e400) at > util/bpf-event.c:403 > #3 0x00005555558ea85b in perf_event__synthesize_one_bpf_prog > (session=0x555556178510, > process=0x5555555ba7ab <process_synthesized_event>, > machine=0x555556178728, fd=25, event=0x5555561b73a0, > opts=0x5555560d33a8 <record+328>) at util/bpf-event.c:674 > #4 0x00005555558eadf8 in perf_event__synthesize_bpf_events > (session=0x555556178510, > process=0x5555555ba7ab <process_synthesized_event>, > machine=0x555556178728, opts=0x5555560d33a8 <record+328>) > at util/bpf-event.c:799 > #5 0x00005555555beef5 in record__synthesize (rec=0x5555560d3260 <record>, > tail=false) at builtin-record.c:2131 > #6 0x00005555555c07c1 in __cmd_record (rec=0x5555560d3260 <record>, > argc=1, argv=0x7fffffffe2e0) at builtin-record.c:2581 > #7 0x00005555555c4b30 in cmd_record (argc=1, argv=0x7fffffffe2e0) at > builtin-record.c:4376 > #8 0x000055555564d72e in run_builtin (p=0x5555560d63c0 <commands+288>, > argc=6, argv=0x7fffffffe2e0) at perf.c:349 > #9 0x000055555564d9c6 in handle_internal_command (argc=6, > argv=0x7fffffffe2e0) at perf.c:401 > #10 0x000055555564db1f in run_argv (argcp=0x7fffffffe0dc, > argv=0x7fffffffe0d0) at perf.c:445 > #11 0x000055555564de68 in main (argc=6, argv=0x7fffffffe2e0) at perf.c:553 > > I seems bpf_prog_info is broken for some reason. > > (gdb) up > #1 0x00005555558e9306 in synthesize_bpf_prog_name (buf=0x5555563305b0 > "bpf_prog_", size=512, info=0x55555664e1d0, btf=0x55555637ad40, > sub_id=0) at util/bpf-event.c:144 > 144 name_len += snprintf_hex(buf + name_len, size - name_len, > > (gdb) p *info > $1 = {type = 68, id = 80, tag = "\\\000\000\000\214\000\000", > jited_prog_len = 152, xlated_prog_len = 164, > jited_prog_insns = 824633721012, xlated_prog_insns = 1185410973912, > load_time = 1305670058276, created_by_uid = 352, > nr_map_ids = 364, map_ids = 1975684956608, name = > "\330\001\000\000\350\001\000\000$\002\000\0004\002\000", ifindex = 576, > gpl_compatible = 0, netns_dev = 2697239462496, netns_ino = 2834678416000, > nr_jited_ksyms = 756, nr_jited_func_lens = 768, > jited_ksyms = 3418793968396, jited_func_lens = 3573412791092, btf_id = > 844, func_info_rec_size = 880, func_info = 3934190044028, > nr_func_info = 928, nr_line_info = 952, line_info = 4294967296988, > jited_line_info = 4449586119680, nr_jited_line_info = 1060, > line_info_rec_size = 1076, jited_line_info_rec_size = 1092, nr_prog_tags > = 1108, prog_tags = 4861902980192, > run_time_ns = 5085241279632, run_cnt = 5257039971512, recursion_misses = > 5360119186644, verified_insns = 1264, > attach_btf_obj_id = 1288, attach_btf_id = 1312}
Thanks Namhyung! So it looks like my "fix" was breaking the tools/perf build feature test for btf_dump_type_data_opts opts.emit_strings and that was avoiding this code. Having terminal 1 run perf trace is going to be loading/unloading a BPF program for system call augmentation. This must be creating the race condition that is causing perf record to segv when it is inspecting the bpf_prog_info. The cast in: https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/bpf-event.c#n135 ``` static int synthesize_bpf_prog_name(char *buf, int size, struct bpf_prog_info *info, struct btf *btf, u32 sub_id) { u8 (*prog_tags)[BPF_TAG_SIZE] = (void *)(uintptr_t)(info->prog_tags); ``` looks concerning given the bad address comes from: https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/bpf-event.c#n144 ``` name_len += snprintf_hex(buf + name_len, size - name_len, prog_tags[sub_id], BPF_TAG_SIZE); ``` Checking git blame this code has existed since 2019, commit 7b612e291a5a ("perf tools: Synthesize PERF_RECORD_* for loaded BPF programs"): http://lkml.kernel.org/r/20190117161521.1341602-8-songliubrav...@fb.com it was refactored in 2019 to a single memory allocation commit ("a742258af131 perf bpf: Synthesize bpf events with bpf_program__get_prog_info_linear()") http://lkml.kernel.org/r/20190312053051.2690567-5-songliubrav...@fb.com There seems like a potential race here: https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/bpf-utils.c#n123 ``` /* step 1: get array dimensions */ err = bpf_obj_get_info_by_fd(fd, &info, &info_len); ``` and later: https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/bpf-utils.c#n185 ``` /* step 5: call syscall again to get required arrays */ err = bpf_obj_get_info_by_fd(fd, &info_linear->info, &info_len); ``` There's a verification step that looks to cover issues with the race. I thought making those warnings fatal may help, but no: ``` --- a/tools/perf/util/bpf-utils.c +++ b/tools/perf/util/bpf-utils.c @@ -202,14 +202,20 @@ get_bpf_prog_info_linear(int fd, __u64 arrays) v1 = bpf_prog_info_read_offset_u32(&info, desc->count_offset); v2 = bpf_prog_info_read_offset_u32(&info_linear->info, desc->count_offset); - if (v1 != v2) + if (v1 != v2) { pr_warning("%s: mismatch in element count\n", __func__); + free(info_linear); + return ERR_PTR(-EFAULT); + } v1 = bpf_prog_info_read_offset_u32(&info, desc->size_offset); v2 = bpf_prog_info_read_offset_u32(&info_linear->info, desc->size_offset); - if (v1 != v2) + if (v1 != v2) { pr_warning("%s: mismatch in rec size\n", __func__); + free(info_linear); + return ERR_PTR(-EFAULT); + } } /* step 7: update info_len and data_len */ ``` Fwiw, the address of "data=0x40" in the stack trace makes it looks like an offset has been applied to NULL. 0x40 is 64 which corresponds with "name" info a bpf_prog_info by way of pahole: ``` struct bpf_prog_info { __u32 type; /* 0 4 */ /* --- cacheline 1 boundary (64 bytes) --- */ ... char name[16]; /* 64 16 */ ``` I feel we're relatively close to discovering a proper fix for the issue, if others could lend a hand as I'm not overly familiar with the BPF code. I'm wondering if the second bpf_obj_get_info_by_fd could be filling in offsets relative to NULL rather than returning an error, but this would be (I believe) a kernel issue :-( Thanks, Ian