------- Comment From mahesh.salgaon...@in.ibm.com 2017-08-17 04:48 EDT-------
(In reply to comment #30)
> This bug is awaiting verification that the kernel in -proposed solves the
> problem. Please test the kernel and update this bug with the results. If the
> problem is solved, change the tag 'verification-needed-zesty' to
> 'verification-done-zesty'. If the problem still exists, change the tag
> 'verification-needed-zesty' to 'verification-failed-zesty'.
>
> If verification is not done by 5 working days from today, this fix will be
> dropped from the source code, and this bug will be closed.
>
> See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to
> enable and use -proposed. Thank you!
Tested with the zesty-proposed kernel (Selected version '4.10.0.33.33'
(Ubuntu:17.04/zesty-proposed [ppc64el]) for 'linux-generic') and verified
that the problem is solved. Below are the test results:

-----------------------------------Test Results-------------------------------------------------
p8wookie login:
Ubuntu 17.04 p8wookie hvc0

p8wookie login: root
Password:
Last login: Thu Aug 17 03:30:30 CDT 2017 from 9.84.221.193 on pts/0
Welcome to Ubuntu 17.04 (GNU/Linux 4.10.0-33-generic ppc64le)

 * Documentation: https://help.ubuntu.com
 * Management: https://landscape.canonical.com
 * Support: https://ubuntu.com/advantage
root@p8wookie:~#
root@p8wookie:~# uname -a
Linux p8wookie 4.10.0-33-generic #37-Ubuntu SMP Fri Aug 11 10:53:58 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux
root@p8wookie:~#
root@p8wookie:~# dmesg -C
root@p8wookie:~# dmesg
root@p8wookie:~# cd skiboot/external/xscom-utils/
root@p8wookie:~/skiboot/external/xscom-utils# for i in /sys/devices/system/cpu/cpu*/cpuidle/state1/disable; do echo 1 > $i; done
root@p8wookie:~/skiboot/external/xscom-utils# for i in /sys/devices/system/cpu/cpu*/cpuidle/state2/disable; do echo 1 > $i; done
root@p8wookie:~/skiboot/external/xscom-utils# ./putscom -c 00000000 1c013100 0000000000100000 0000000000100008
root@p8wookie:~/skiboot/external/xscom-utils#
root@p8wookie:~/skiboot/external/xscom-utils# dmesg
[ 419.675063] Harmless Hypervisor Maintenance interrupt [Recovered]
[ 419.675066] Error detail: Processor Recovery done
[ 419.675069] HMER: 2040000000000000
[ 419.675072] Harmless Hypervisor Maintenance interrupt [Recovered]
[ 419.675074] Error detail: Processor Recovery done
[ 419.675077] HMER: 2040000000000000
[ 419.675080] Harmless Hypervisor Maintenance interrupt [Recovered]
[ 419.675082] Error detail: Processor Recovery done
[ 419.675084] HMER: 2040000000000000
[ 419.675087] Harmless Hypervisor Maintenance interrupt [Recovered]
[ 419.675089] Error detail: Processor Recovery done
[ 419.675092] HMER: 2040000000000000
[ 419.675094] Harmless Hypervisor Maintenance interrupt [Recovered]
[ 419.675096] Error detail: Processor Recovery done
[ 419.675098] HMER: 2040000000000000
[ 419.675101] Harmless Hypervisor Maintenance interrupt [Recovered]
[ 419.675103] Error detail: Processor Recovery done
[ 419.675105] HMER: 2040000000000000
[ 419.675107] Harmless Hypervisor Maintenance interrupt [Recovered]
[ 419.675109] Error detail: Processor Recovery done
[ 419.675111] HMER: 2040000000000000
[ 419.675113] Harmless Hypervisor Maintenance interrupt [Recovered]
[ 419.675115] Error detail: Processor Recovery done
[ 419.675116] HMER: 2040000000000000
root@p8wookie:~/skiboot/external/xscom-utils#
root@p8wookie:~/skiboot/external/xscom-utils# dmesg -C
root@p8wookie:~/skiboot/external/xscom-utils# ./putscom -c 00000000 1c013281 0003080000000000 0000080000000000
root@p8wookie:~/skiboot/external/xscom-utils# dmesg
[ 896.268510] Severe Hypervisor Maintenance interrupt [Recovered]
[ 896.268513] Error detail: Timer facility experienced an error
[ 896.268515] HMER: 0840000000000000
[ 896.268518] TFMR: 4d12000980a54000
root@p8wookie:~/skiboot/external/xscom-utils# dmesg -C
root@p8wookie:~/skiboot/external/xscom-utils# ./putscom -c 00000001 14013281 0003080000000000 0000080000000000
root@p8wookie:~/skiboot/external/xscom-utils# dmesg
[ 940.755458] Severe Hypervisor Maintenance interrupt [Recovered]
[ 940.755463] Error detail: Timer facility experienced an error
[ 940.755466] HMER: 0840000000000000
[ 940.755468] TFMR: 4d12000980a44000
root@p8wookie:~/skiboot/external/xscom-utils#
-----------------------------------Test Results-------------------------------------------------

Please change the tag 'verification-needed-zesty' to 'verification-done-zesty'.

** Tags removed: verification-needed-zesty
** Tags added: verification-done-zesty
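The dmesg output above can be checked mechanically. The Python sketch below is not part of the original report. It groups each "... Hypervisor Maintenance interrupt [...]" record with the Error detail, HMER and TFMR lines that follow it. It then lists the HMER bits that are set, using the POWER MSB-0 convention (bit 0 is the most significant bit). Bits 2 and 4 are named after the Error detail text logged alongside them in this run. The name for bit 9 is an assumption taken from skiboot's processor.h and should be checked against the firmware headers. The sample string is copied from the first timer-facility injection above.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


# POWER registers use MSB-0 ("IBM") bit numbering: bit 0 is the most
# significant bit of the 64-bit value, so bit n has mask 1 << (63 - n).
def ppc_bit(n: int) -> int:
    return 1 << (63 - n)


# Bits 2 and 4 are named after the "Error detail" text logged with them above.
# Bit 9 is an ASSUMPTION (skiboot processor.h: XSCOM done); verify it locally.
HMER_BITS = {
    2: "processor recovery done",
    4: "timer facility (TFAC) error",
    9: "xscom done (assumed)",
}

HEADER = re.compile(r"(\w+) Hypervisor Maintenance interrupt \[(\w+)\]")
FIELD = re.compile(r"(Error detail|HMER|TFMR): (.+)$")


@dataclass
class HmiEvent:
    severity: str                 # "Harmless", "Severe", ...
    disposition: str              # "Recovered", ...
    detail: str = ""
    hmer: Optional[int] = None
    tfmr: Optional[int] = None


def set_bits_msb0(value: int) -> List[int]:
    """MSB-0 positions of the bits set in a 64-bit register value."""
    return [n for n in range(64) if value & ppc_bit(n)]


def decode_hmer(value: int) -> List[str]:
    return [HMER_BITS.get(n, f"bit {n} (unnamed)") for n in set_bits_msb0(value)]


def parse_hmi_events(dmesg_text: str) -> List[HmiEvent]:
    """Group each HMI header line with the detail/HMER/TFMR lines after it."""
    events: List[HmiEvent] = []
    for line in dmesg_text.splitlines():
        header = HEADER.search(line)
        if header:
            events.append(HmiEvent(header.group(1), header.group(2)))
            continue
        field = FIELD.search(line)
        if field is None or not events:
            continue
        key, val = field.group(1), field.group(2).strip()
        if key == "Error detail":
            events[-1].detail = val
        elif key == "HMER":
            events[-1].hmer = int(val, 16)
        else:  # TFMR
            events[-1].tfmr = int(val, 16)
    return events


# Copied from the first timer-facility injection in the log above.
SAMPLE = """\
[ 896.268510] Severe Hypervisor Maintenance interrupt [Recovered]
[ 896.268513] Error detail: Timer facility experienced an error
[ 896.268515] HMER: 0840000000000000
[ 896.268518] TFMR: 4d12000980a54000
"""

if __name__ == "__main__":
    for ev in parse_hmi_events(SAMPLE):
        bits = decode_hmer(ev.hmer) if ev.hmer is not None else []
        print(f"{ev.severity}/{ev.disposition}: {ev.detail}; HMER bits: {bits}")
```

Fed the first injection's output, the same parser groups eight Harmless/Recovered records, each with HMER 2040000000000000 (MSB-0 bits 2 and 9 set). Both timer-facility injections decode to bits 4 and 9.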
https://bugs.launchpad.net/bugs/1684054

Title:
[LTCTest][Opal][FW860.20] HMI recoverable errors failed to recover and system goes to dump state.

Status in The Ubuntu-power-systems project:
In Progress
Status in linux package in Ubuntu:
Fix Released
Status in linux source package in Zesty:
Fix Committed

Bug description:
== Comment: #0 - Pridhiviraj Paidipeddi <ppaid...@in.ibm.com> - 2017-04-17 06:08:41 ==
---Problem Description---
HMI recoverable error injection tests lead to a system checkstop followed by a system dump with Ubuntu 17.04 OS and kernel 4.10.0-19-generic ppc64le.

Contact Information = ppaid...@in.ibm.com

---uname output---
#21-Ubuntu SMP Thu Apr 6 17:03:05 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = PowerNV 8284-22A

---System Hang---
The system is in the dumping state. After the dump finishes, the system will IPL to the OS again.

---Debugger---
A debugger is not configured

== Comment: #3 - Pridhiviraj Paidipeddi <ppaid...@in.ibm.com> - 2017-04-17 06:12:51 ==
# uname -a
#21-Ubuntu SMP Thu Apr 6 17:03:05 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux
# cat /etc/os-release
NAME="Ubuntu"
VERSION="17.04 (Zesty Zapus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 17.04"
VERSION_ID="17.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=zesty
UBUNTU_CODENAME=zesty
root@p8wookie:~#

== Comment: #4 - Kevin W. Rudd <ru...@us.ibm.com> - 2017-04-17 11:10:22 ==

== Comment: #5 - MAHESH J. SALGAONKAR <mahesh.salgaon...@in.ibm.com> - 2017-04-17 13:34:03 ==
It looks like the commit below is the culprit:
=======================================
commit 2337d207288f163e10bd8d4d7eeb0c1c75046a0c
Author: Nicholas Piggin <npig...@gmail.com>
Date: Fri Jan 27 14:24:33 2017 +1000

    powerpc/64: CONFIG_RELOCATABLE support for hmi interrupts

    The branch from hmi_exception_early to hmi_exception_realmode must use a
    "relocatable-style" branch, because it is branching from unrelocated
    exception code to beyond __end_interrupts.

    Signed-off-by: Nicholas Piggin <npig...@gmail.com>
    Signed-off-by: Michael Ellerman <m...@ellerman.id.au>
=======================================

With the above commit, hmi_exception_realmode() is now called using bctrl, which ends up corrupting the TOC (r2) value, and further accesses using the new r2 result in unpredictable behaviour.

----------------------------------------
c000000000025f50 <hmi_exception_realmode>:
c000000000025f50:   3a 01 4c 3c   addis   r2,r12,314
c000000000025f54:   b0 01 42 38   addi    r2,r2,432
c000000000025f58:   a6 02 08 7c   mflr    r0
-----------------------------------------

With the above commit, the hmi_exception_early() code jumps to c000000000025f50 (hmi_exception_realmode+0x0), which then sets up a new value for r2. If we revert the above commit, the code jumps to c000000000025f58 (hmi_exception_realmode+0x8) and the HMI handler works fine.

After reverting the above patch I don't see this issue anymore. I have rebuilt the Ubuntu kernel after reverting the above patch and you can find the kernel rpm at:

Can you please retry your tests with the above kernel and see if the issue still persists?

== Comment: #6 - MAHESH J. SALGAONKAR <mahesh.salgaon...@in.ibm.com> - 2017-04-17 23:02:31 ==
Spoke to Michael Ellerman this morning.
He helped me to identify the root cause and a fix patch below:

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 857bf7c5b946..7cfeb8768587 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -982,7 +982,7 @@ TRAMP_REAL_BEGIN(hmi_exception_early)
 	EXCEPTION_PROLOG_COMMON_2(PACA_EXGEN)
 	EXCEPTION_PROLOG_COMMON_3(0xe60)
 	addi	r3,r1,STACK_FRAME_OVERHEAD
-	BRANCH_LINK_TO_FAR(r4, hmi_exception_realmode)
+	BRANCH_LINK_TO_FAR(r12, hmi_exception_realmode)
 	/* Windup the stack. */
 	/* Move original HSRR0 and HSRR1 into the respective regs */
 	ld	r9,_MSR(r1)

== Comment: #7 - Pridhiviraj Paidipeddi <ppaid...@in.ibm.com> - 2017-04-18 01:52:03 ==

== Comment: #8 - Pridhiviraj Paidipeddi <ppaid...@in.ibm.com> - 2017-04-18 01:53:57 ==
Hi Mahesh
Tested all the HMI recoverable errors on the patched kernel below and attached the corresponding execution logs. All tests are working fine.

#21 SMP Mon Apr 17 12:58:30 EDT 2017 ppc64le ppc64le ppc64le GNU/Linux

Thanks

== Comment: #9 - MAHESH J. SALGAONKAR <mahesh.salgaon...@in.ibm.com> - 2017-04-18 06:07:56 ==
(In reply to comment #8)
> Hi Mahesh
> Tested all the HMI recoverable errors on the patched kernel below and attached
> the corresponding execution logs. All tests are working fine.
>
> Linux p8wookie 4.10.0-19.bz153487-generic #21 SMP Mon Apr 17 12:58:30 EDT
> 2017 ppc64le ppc64le ppc64le GNU/Linux
>
>
> Thanks

Thanks. Michael has posted a fix for this upstream.

http://patchwork.ozlabs.org/patch/751647/

I will rebuild the new Ubuntu kernel with the above patch.
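The diagnosis in comment #5 and the one-register fix in comment #6 both follow from how the ppc64le ELFv2 ABI sets up the TOC. A function's global entry point (hmi_exception_realmode+0x0 here) recomputes r2 from r12. It uses an addis/addi pair whose immediates encode the distance from the function to the TOC, so the result is correct only when r12 holds the function's own address. The local entry point (+0x8) skips that pair and keeps the caller's r2. As we read it, BRANCH_LINK_TO_FAR(reg, label) loads the target address into reg, moves it to CTR and executes bctrl. Passing r12 instead of r4 therefore leaves r12 equal to the entry address when the prologue runs. The Python model below is an illustration, not kernel code. It evaluates that prologue with the immediates from the disassembly. The "stale" r12 value used to show a bad TOC is hypothetical.

```python
# Model of the ELFv2 global-entry prologue from the disassembly in comment #5:
#   c000000000025f50: addis r2,r12,314
#   c000000000025f54: addi  r2,r2,432
MASK64 = (1 << 64) - 1

GLOBAL_ENTRY = 0xC000000000025F50  # hmi_exception_realmode+0x0
LOCAL_ENTRY = GLOBAL_ENTRY + 8     # hmi_exception_realmode+0x8, skips the TOC setup


def sext16(imm: int) -> int:
    """Sign-extend a 16-bit immediate, as addi/addis do."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def addis(ra: int, imm: int) -> int:
    """rD = rA + (SIMM << 16), truncated to 64 bits."""
    return (ra + (sext16(imm) << 16)) & MASK64


def addi(ra: int, imm: int) -> int:
    """rD = rA + SIMM, truncated to 64 bits (rA is r2 here, not r0)."""
    return (ra + sext16(imm)) & MASK64


def toc_from_global_entry(r12: int) -> int:
    """r2 as computed by the two-instruction prologue for a given r12."""
    return addi(addis(r12, 314), 432)


if __name__ == "__main__":
    # Correct call: r12 holds the global entry address (what the r12 fix ensures).
    good = toc_from_global_entry(GLOBAL_ENTRY)
    print("r12 = entry -> r2 =", hex(good))  # r2 = 0xc0000000013c6100
    # Hypothetical stale r12 (any value other than the entry address):
    bad = toc_from_global_entry(0x0)
    print("r12 = 0x0   -> r2 =", hex(bad))
    # r2 is off by exactly (r12 - GLOBAL_ENTRY), so every TOC-relative access is wrong.
    print("TOC error:", hex((good - bad) & MASK64))
```

Jumping to LOCAL_ENTRY instead, as the pre-2337d207 code did, never executes these two instructions. That is why the reverted kernel kept a valid r2 even though r12 was not set up.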
== Comment: #12 - Pridhiviraj Paidipeddi <ppaid...@in.ibm.com> - 2017-04-18 09:27:59 ==
(In reply to comment #11)
> >
> > https://git.kernel.org/powerpc/c/be5c5e843c4afa1c8397cb740b6032
>
> I have built new kernel with above patch and you can find it below path
>
> :/home2/mahesh/u2/bz153487v2/linux-image-4.10.0-19.bz153487v2-generic_4.10.0-19.bz153487v2.21_ppc64el.deb

Tested with this new patched kernel; all tests are working fine.

Linux p8wookie 4.10.0-19.bz153487v2-generic #21 SMP Tue Apr 18 07:43:13 EDT 2017 ppc64le ppc64le ppc64le GNU/Linux

Will attach the full execution logs here.

== Comment: #13 - Pridhiviraj Paidipeddi <ppaid...@in.ibm.com> - 2017-04-18 09:29:43 ==

== Comment: #14 - MAHESH J. SALGAONKAR <mahesh.salgaon...@in.ibm.com> - 2017-04-19 03:52:18 ==
(In reply to comment #12)
> (In reply to comment #11)
> > >
> > > https://git.kernel.org/powerpc/c/be5c5e843c4afa1c8397cb740b6032
> >

Thanks for testing. We need to mirror this to Ubuntu for fix patch inclusion.

>
> Linux p8wookie 4.10.0-19.bz153487v2-generic #21 SMP Tue Apr 18 07:43:13 EDT
> 2017 ppc64le ppc64le ppc64le GNU/Linux
>
> Will attach the full execution logs here.