On Wed, 4 Nov 2015 14:54:51 +0100 Laurent Vivier <lviv...@redhat.com> wrote:

> On 04/11/2015 13:34, Hari Bathini wrote:
> > On 10/16/2015 12:30 AM, Laurent Vivier wrote:
> >> On kexec, all secondary offline CPUs are onlined before
> >> starting the new kernel, this is not done in the case of kdump.
> >>
> >> If kdump is configured and a kernel crash occurs whereas
> >> some secondaries CPUs are offline (SMT=off),
> >> the new kernel is not able to start them and displays some
> >> "Processor X is stuck.".
> >>
> >> Starting with POWER8, subcore logic relies on all threads of
> >> core being booted. So, on startup kernel tries to start all
> >> threads, and asks OPAL (or RTAS) to start all CPUs (including
> >> threads). If a CPU has been offlined by the previous kernel,
> >> it has not been returned to OPAL, and thus OPAL cannot restart
> >> it: this CPU has been lost...
> >>
> >> Signed-off-by: Laurent Vivier<lviv...@redhat.com>
> >
> >
> > Hi Laurent,
>
> Hi Hari,
>
> > Sorry for jumping too late into this.
>
> better late than never :)
>
> > Are you seeing this issue even with the below patches:
> >
> > pseries:
> > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c1caae3de46a072d0855729aed6e793e536a4a55

Unfortunately, this is unlikely to be relevant - this fixes a failure
while setting up the kexec. The problem we see occurs once we've
booted the second kernel and it's attempting to bring up secondary CPUs.

> > opal/powernv:
> > https://github.com/open-power/skiboot/commit/9ee56b5
>
> Very interesting. Is there a way to have a firmware with the fix ?

From Laurent's analysis of the crash, I don't think this will be
relevant either, but I'm not sure. It would be very interesting to
know which (if any) released firmwares include this patch so we can
test it.

-- 
David Gibson <dgib...@redhat.com>
Senior Software Engineer, Virtualization, Red Hat