Hi Cascardo,
Thanks for reporting this.
Thadeu Lima de Souza Cascardo wrote:
> Hi, there.
> We have been investigating an issue we have observed on POWER8 POWERNV systems.
> When running the kernel selftest reuseport_bpf_cpu after a CPU hotplug, we see
> crashes in different forms. [1]
Just to re-confirm: you are only seeing this on P8 powernv, and not in a
P8 guest/LPAR? I haven't been able to reproduce this on a firestone --
can you share more details about your POWER8 machine?
Also, do you only see this with Ubuntu kernels, or are you also able to
reproduce this with the upstream tree?
> I managed to get xmon on that trap, and did some debugging. [2] I tried to dump
> the BPF JIT code, and it looks different when dumped from CPU#0 and CPU#0x9f
> (the one that was hotplugged, offlined, then onlined).
Next time you reproduce this, can you try dumping the SLBs for the CPUs
(command 'u' in xmon)?
> Here is my partial analysis [3]. Basically, the BPF JIT fills a page with
> invalid instructions (traps, in the ppc64 case), and puts the BPF program at a
> random offset in the page. In the case of the hotplugged CPU, which was the one
> that compiled the program, the page had the expected contents (the BPF program
> started at the offset used to run the program). On the other CPU (in many
> cases, CPU #0), the same memory address/page had different contents, with the
> program starting at a different offset.
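To make the layout described above concrete, here is a minimal user-space
sketch in Python. It is only a model, not the kernel's JIT allocator: the
page size, the 4-byte header, the trap and nop encodings, and the names
alloc_jit_image and word_at are all assumptions made for illustration.

```python
import random
import struct

PAGE_SIZE = 4096      # assumed page size for this model
HEADER_SIZE = 4       # assumed size of a header placed before the image
ALIGNMENT = 4         # ppc64 instructions are 4 bytes wide
TRAP = 0x7FE00008     # assumed ppc64 trap encoding, used as filler
NOP = 0x60000000      # ppc64 nop, stands in for real JITed instructions


def alloc_jit_image(program: bytes, rng: random.Random):
    """Return (page, start): a trap-filled buffer and the program's byte offset."""
    needed = HEADER_SIZE + len(program)
    size = -(-needed // PAGE_SIZE) * PAGE_SIZE        # round up to whole pages
    page = bytearray(struct.pack(">I", TRAP) * (size // 4))
    hole = size - needed                              # slack available for the start
    # Pick a random start inside the slack, aligned down to an instruction boundary.
    start = HEADER_SIZE + (rng.randrange(hole + 1) & ~(ALIGNMENT - 1))
    page[start:start + len(program)] = program
    return page, start


def word_at(page: bytearray, offset: int) -> int:
    """Read one big-endian 32-bit instruction word from the buffer."""
    return struct.unpack_from(">I", page, offset)[0]


if __name__ == "__main__":
    program = struct.pack(">I", NOP) * 8
    # Two independent placements model the two views seen from different CPUs.
    view_a, start_a = alloc_jit_image(program, random.Random(1))
    view_b, start_b = alloc_jit_image(program, random.Random(2))
    print(f"view A: program at {start_a:#x}, view B: program at {start_b:#x}")
    print(f"view B word at A's entry point: {word_at(view_b, start_a):#010x}")
```

In this model, entering one buffer at the other buffer's start offset lands on
a filler trap or in the middle of the program rather than on its first
instruction, which is the kind of mismatch the analysis above describes.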
From [3], I think fp->aux->jit_data can be NULL if there are subprogs.
But, I find it interesting that you don't always see the correct
bpf_func, as reported in comment #25. Can you also try dumping the full
bpf_prog structure (prog/fp) from xmon?
> Is this a case of a bug in the micro-architecture or the firmware when
> doing the hotplug? Can someone chime in?
It's possible that something is going wrong when offlining the CPU. Can
you try booting the kernel with 'powersave=off' and see if the problem
goes away?
> Notice that we can't reproduce the same issue on a POWER9 system.
> Thanks.
> Cascardo.
> [1] https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1927076
> [2] https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1927076/comments/29
> [3] https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1927076/comments/30
- Naveen