On 07/04/2018 09:33 AM, Peter Robinson wrote:
> On Tue, Jun 26, 2018 at 1:52 PM, Daniel Borkmann <dan...@iogearbox.net> wrote:
>> On 06/26/2018 02:23 PM, Peter Robinson wrote:
>>>>>> On 06/24/2018 11:24 AM, Peter Robinson wrote:
>>>>>>>>> I'm seeing this netlink/sk_filter_trim_cap crash on ARMv7 across quite
>>>>>>>>> a few ARMv7 platforms on Fedora with 4.18rc1. I've tested RPi2/RPi3
>>>>>>>>> (doesn't happen on aarch64), Allwinner H3, BeagleBone and a few
>>>>>>>>> others, both LPAE and normal kernels.
>>>>>>
>>>>>> So this is arm32, right?
>>>>>
>>>>> Correct.
>>>>>
>>>>>>>>> I'm a bit out of my depth in this part of the kernel but I'm wondering
>>>>>>>>> if it's known; I couldn't find anything that looked obvious on a few
>>>>>>>>> mailing lists.
>>>>>>>>>
>>>>>>>>> Peter
>>>>>>>>
>>>>>>>> Hi Peter
>>>>>>>>
>>>>>>>> Could you provide symbolic information?
>>>>>>>
>>>>>>> I passed it through scripts/decode_stacktrace.sh; is that what you were
>>>>>>> after?
>>>>>>>
>>>>>>> [ 8.673880] Internal error: Oops: a06 [#10] SMP ARM
>>>>>>> [ 8.673949] ---[ end trace 049df4786ea3140a ]---
>>>>>>> [ 8.678754] Modules linked in:
>>>>>>> [ 8.678766] CPU: 1 PID: 206 Comm: systemd-udevd Tainted: G D
>>>>>>> 4.18.0-0.rc1.git0.1.fc29.armv7hl+lpae #1
>>>>>>> [ 8.678769] Hardware name: Allwinner sun8i Family
>>>>>>> [ 8.678781] PC is at sk_filter_trim_cap ()
>>>>>>> [ 8.678790] LR is at (null)
>>>>>>> [ 8.709463] pc : lr : psr: 60000013 ()
>>>>>>> [ 8.715722] sp : c996bd60 ip : 00000000 fp : 00000000
>>>>>>> [ 8.720939] r10: ee79dc00 r9 : c12c9f80 r8 : 00000000
>>>>>>> [ 8.726157] r7 : 00000000 r6 : 00000001 r5 : f1648000 r4 :
>>>>>>> 00000000
>>>>>>> [ 8.732674] r3 : 00000007 r2 : 00000000 r1 : 00000000 r0 :
>>>>>>> 00000000
>>>>>>> [ 8.739193] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM
>>>>>>> Segment user
>>>>>>> [ 8.746318] Control: 30c5387d Table: 6e7bc880 DAC: ffe75ece
>>>>>>> [ 8.752055] Process systemd-udevd (pid: 206, stack limit =
>>>>>>> 0x(ptrval))
>>>>>>> [ 8.758574] Stack: (0xc996bd60 to 0xc996c000)
>>>>>>
>>>>>> Do you have BPF JIT enabled or disabled? Does it happen with it disabled?
>>>>>
>>>>> Enabled. I can test with it disabled; the BPF config bits are:
>>>>> CONFIG_BPF_EVENTS=y
>>>>> # CONFIG_BPFILTER is not set
>>>>> CONFIG_BPF_JIT_ALWAYS_ON=y
>>>>> CONFIG_BPF_JIT=y
>>>>> CONFIG_BPF_STREAM_PARSER=y
>>>>> CONFIG_BPF_SYSCALL=y
>>>>> CONFIG_BPF=y
>>>>> CONFIG_CGROUP_BPF=y
>>>>> CONFIG_HAVE_EBPF_JIT=y
>>>>> CONFIG_IPV6_SEG6_BPF=y
>>>>> CONFIG_LWTUNNEL_BPF=y
>>>>> # CONFIG_NBPFAXI_DMA is not set
>>>>> CONFIG_NET_ACT_BPF=m
>>>>> CONFIG_NET_CLS_BPF=m
>>>>> CONFIG_NETFILTER_XT_MATCH_BPF=m
>>>>> # CONFIG_TEST_BPF is not set
>>>>>
>>>>>> I can see one bug, but your stack trace seems unrelated.
>>>>>>
>>>>>> Anyway, could you try with this?
>>>>>
>>>>> Build in progress.
>>>>>
>>>>>> diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
>>>>>> index 6e8b716..f6a62ae 100644
>>>>>> --- a/arch/arm/net/bpf_jit_32.c
>>>>>> +++ b/arch/arm/net/bpf_jit_32.c
>>>>>> @@ -1844,7 +1844,7 @@ struct bpf_prog *bpf_int_jit_compile(struct
>>>>>> bpf_prog *prog)
>>>>>>   /* there are 2 passes here */
>>>>>>   bpf_jit_dump(prog->len, image_size, 2, ctx.target);
>>>>>>
>>>>>> - set_memory_ro((unsigned long)header, header->pages);
>>>>>> + bpf_jit_binary_lock_ro(header);
>>>>>>   prog->bpf_func = (void *)ctx.target;
>>>>>>   prog->jited = 1;
>>>>>>   prog->jited_len = image_size;
>>>>
>>>> So with that and the other fix there was no improvement; with those
>>>> and the BPF JIT disabled it works. I'm not sure if the two patches
>>>> have any effect with the JIT disabled, though.
>>>>
>>>> I'll look at the other patches shortly; there's been some other issue
>>>> introduced between rc1 and rc2 which I have to work out before I can
>>>> test those, though.
>>>
>>> Quick update: with Linus's head as of yesterday, basically rc2 plus
>>> davem's network fixes, it works if the JIT is disabled, i.e.:
>>> # CONFIG_BPF_JIT_ALWAYS_ON is not set
>>> # CONFIG_BPF_JIT is not set
>>>
>>> If I enable it, the boot breaks even worse than the errors above in
>>> that I get no console output at all, even with earlycon, so we've gone
>>> backwards since rc1 somehow.
>>>
>>> I'll try with the above two reverted unless you have any other suggestions.
>>
>> Ok, thanks, let's do that!
>>
>> I'm still working on fixes meanwhile, should have something by end of day.
>
> Sorry for the delay on this from my end. I noticed there were some bpf
> bits in the last net fixes pull request that landed Monday, so I built
> a kernel with the JIT re-enabled. It seems to have improved in that the
> completely dead, no-output boot is gone, but the original problem that
> arrived in the merge window still persists:

Okay, thanks a lot! On top of that tree, could you try with the below
applied to check whether it fixes the issue?

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index f6a62ae..45e6b49 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -234,11 +234,11 @@ static void jit_fill_hole(void *area, unsigned int size)
 #define SCRATCH_SIZE 80
 
 /* total stack size used in JITed code */
-#define _STACK_SIZE (ctx->prog->aux->stack_depth + SCRATCH_SIZE)
+#define _STACK_SIZE (ctx->prog->aux->stack_depth + SCRATCH_SIZE + 4)
 #define STACK_SIZE ALIGN(_STACK_SIZE, STACK_ALIGNMENT)
 
 /* Get the offset of eBPF REGISTERs stored on scratch space. */
-#define STACK_VAR(off) (STACK_SIZE - off)
+#define STACK_VAR(off) (STACK_SIZE - 4 - off)
 
 #if __LINUX_ARM_ARCH__ < 7
 