Re: [PATCH] powerpc/mce: Remove per cpu variables from MCE handlers

2020-12-08 Thread Mahesh Jagannath Salgaonkar
On 12/8/20 4:16 PM, Ganesh wrote: > > On 12/8/20 4:01 PM, Michael Ellerman wrote: >> Ganesh Goudar writes: >>> diff --git a/arch/powerpc/include/asm/paca.h >>> b/arch/powerpc/include/asm/paca.h >>> index 9454d29ff4b4..4769954efa7d 100644 >>> --- a/arch/powerpc/include/asm/paca.h >>> +++ b/arch/po

Re: [PATCH v2] powernv/elog: Fix the race while processing OPAL error log event.

2020-10-05 Thread Mahesh Jagannath Salgaonkar
On 10/5/20 4:17 PM, Ananth N Mavinakayanahalli wrote: > On 10/5/20 9:42 AM, Mahesh Salgaonkar wrote: >> Every error log reported by OPAL is exported to userspace through a sysfs >> interface and notified using kobject_uevent(). The userspace daemon >> (opal_errd) then reads the error log and acknow

Re: [PATCH v2] powernv/elog: Fix the race while processing OPAL error log event.

2020-10-05 Thread Mahesh Jagannath Salgaonkar
On 10/6/20 5:55 AM, Oliver O'Halloran wrote: > On Mon, Oct 5, 2020 at 3:12 PM Mahesh Salgaonkar wrote: >> >> Every error log reported by OPAL is exported to userspace through a sysfs >> interface and notified using kobject_uevent(). The userspace daemon >> (opal_errd) then reads the error log and

Re: Injecting SLB miltihit crashes kernel 5.9.0-rc5

2020-09-16 Thread Mahesh Jagannath Salgaonkar
On 9/15/20 2:13 PM, Michal Suchánek wrote: > Hello, > > Using the SLB mutihit injection test module (which I did not write so I > do not want to post it here) to verify updates on my 5.3 frankernekernel > I found that the kernel crashes with Oops: kernel bad access. > > I tested on latest upstrea

Re: [PATCH v5 02/31] powerpc/fadump: move internal code to a new file

2019-09-04 Thread Mahesh Jagannath Salgaonkar
On 9/3/19 9:35 PM, Hari Bathini wrote: > > > On 03/09/19 4:39 PM, Michael Ellerman wrote: >> Hari Bathini writes: >>> Make way for refactoring platform specific FADump code by moving code >>> that could be referenced from multiple places to fadump-common.c file. >>> >>> Signed-off-by: Hari Bathi

Re: [PATCH v4 11/25] powernv/fadump: register kernel metadata address with opal

2019-08-14 Thread Mahesh Jagannath Salgaonkar
On 8/14/19 12:36 PM, Hari Bathini wrote: > > > On 13/08/19 4:11 PM, Mahesh J Salgaonkar wrote: >> On 2019-07-16 17:03:15 Tue, Hari Bathini wrote: >>> OPAL allows registering address with it in the first kernel and >>> retrieving it after MPIPL. Setup kernel metadata and register its >>> address w

Re: [PATCH v9 6/7] powerpc/mce: Handle UE event for memcpy_mcsafe

2019-08-14 Thread Mahesh Jagannath Salgaonkar
On 8/12/19 2:52 PM, Santosh Sivaraj wrote: > If we take a UE on one of the instructions with a fixup entry, set nip > to continue execution at the fixup entry. Stop processing the event > further or print it. > > Co-developed-by: Reza Arbab > Signed-off-by: Reza Arbab > Cc: Mahesh Salgaonkar >

Re: [PATCH v9 1/7] powerpc/mce: Schedule work from irq_work

2019-08-12 Thread Mahesh Jagannath Salgaonkar
On 8/12/19 2:52 PM, Santosh Sivaraj wrote: > schedule_work() cannot be called from MCE exception context as MCE can > interrupt even in interrupt disabled context. > > fixes: 733e4a4c ("powerpc/mce: hookup memory_failure for UE errors") > Suggested-by: Mahesh Salgaonkar > Signed-off-by: Santosh S

Re: [PATCH v4 03/25] powerpc/fadump: Improve fadump documentation

2019-08-11 Thread Mahesh Jagannath Salgaonkar
On 7/16/19 5:02 PM, Hari Bathini wrote: > The figures depicting FADump's (Firmware-Assisted Dump) memory layout > are missing some finer details like different memory regions and what > they represent. Improve the documentation by updating those details. > > Signed-off-by: Hari Bathini > --- > D

Re: [PATCH v8 3/7] powerpc/mce: Fix MCE handling for huge pages

2019-08-09 Thread Mahesh Jagannath Salgaonkar
On 8/7/19 8:26 PM, Santosh Sivaraj wrote: > From: Balbir Singh > > The current code would fail on huge pages addresses, since the shift would > be incorrect. Use the correct page shift value returned by > __find_linux_pte() to get the correct physical address. The code is more > generic and can h

Re: [PATCH v8 1/7] powerpc/mce: Schedule work from irq_work

2019-08-09 Thread Mahesh Jagannath Salgaonkar
On 8/7/19 8:26 PM, Santosh Sivaraj wrote: > schedule_work() cannot be called from MCE exception context as MCE can > interrupt even in interrupt disabled context. > > fixes: 733e4a4c ("powerpc/mce: hookup memory_failure for UE errors") > Signed-off-by: Santosh Sivaraj > --- > arch/powerpc/kernel

Re: [PATCH] powerpc/fadump: sysfs for fadump memory reservation

2019-08-06 Thread Mahesh Jagannath Salgaonkar
On 8/6/19 8:42 AM, Sourabh Jain wrote: > Add a sys interface to allow querying the memory reserved by fadump > for saving the crash dump. > > Signed-off-by: Sourabh Jain Looks good to me. Reviewed-by: Mahesh Salgaonkar Thanks, -Mahesh. > --- > Documentation/powerpc/firmware-assisted-dump.rs

Re: [PATCH 2/2] powerpc: avoid adjusting memory_limit for capture kernel memory reservation

2019-07-23 Thread Mahesh Jagannath Salgaonkar
On 7/22/19 11:19 PM, Michal Suchánek wrote: > On Fri, 28 Jun 2019 00:51:19 +0530 > Hari Bathini wrote: > >> Currently, if memory_limit is specified and it overlaps with memory to >> be reserved for capture kernel, memory_limit is adjusted to accommodate >> capture kernel. With memory reservation

Re: [v3 4/7] powerpc/mce: Handle UE event for memcpy_mcsafe

2019-07-08 Thread Mahesh Jagannath Salgaonkar
On 7/6/19 3:23 PM, Nicholas Piggin wrote: > Santosh Sivaraj's on July 6, 2019 7:26 am: >> If we take a UE on one of the instructions with a fixup entry, set nip >> to continue exucution at the fixup entry. Stop processing the event >> further or print it. > > Minor nit, but can you instead a field

Re: [v2 09/12] powerpc/mce: Enable MCE notifiers in external modules

2019-07-02 Thread Mahesh Jagannath Salgaonkar
On 7/2/19 11:47 AM, Nicholas Piggin wrote: > Santosh Sivaraj's on July 2, 2019 3:19 pm: >> From: Reza Arbab >> >> Signed-off-by: Reza Arbab >> --- >> arch/powerpc/kernel/exceptions-64s.S | 6 ++ >> arch/powerpc/kernel/mce.c| 2 ++ >> 2 files changed, 8 insertions(+) >> >> diff --

Re: [PATCH 05/13] powerpc/mce: Allow notifier callback to handle MCE

2019-06-23 Thread Mahesh Jagannath Salgaonkar
On 6/23/19 7:44 AM, Reza Arbab wrote: > Hi Mahesh, > > On Fri, Jun 21, 2019 at 12:35:08PM +0530, Mahesh Jagannath Salgaonkar > wrote: >> On 6/21/19 6:27 AM, Santosh Sivaraj wrote: >>> -    blocking_notifier_call_chain(&mce_notifier_list, 0, &evt); >>

Re: [PATCH 05/13] powerpc/mce: Allow notifier callback to handle MCE

2019-06-21 Thread Mahesh Jagannath Salgaonkar
On 6/21/19 6:27 AM, Santosh Sivaraj wrote: > From: Reza Arbab > > If a notifier returns NOTIFY_STOP, consider the MCE handled, just as we > do when machine_check_early() returns 1. > > Signed-off-by: Reza Arbab > --- > arch/powerpc/include/asm/asm-prototypes.h | 2 +- > arch/powerpc/kernel/ex

Re: [PATCH v2 43/52] powerpc/64s/exception: machine check early only runs in HV mode

2019-06-20 Thread Mahesh Jagannath Salgaonkar
On 6/20/19 3:46 PM, Nicholas Piggin wrote: > Mahesh J Salgaonkar's on June 20, 2019 7:53 pm: >> On 2019-06-20 15:14:50 Thu, Nicholas Piggin wrote: >>> machine_check_common_early and machine_check_handle_early only run in >>> HVMODE. Remove dead code. >> >> That's not true. For pseries guest with FW

Re: [PATCH v2 42/52] powerpc/64s/exception: machine check fwnmi does not trigger when in HV mode

2019-06-20 Thread Mahesh Jagannath Salgaonkar
On 6/20/19 10:44 AM, Nicholas Piggin wrote: > Remove dead code. > > Signed-off-by: Nicholas Piggin > --- > arch/powerpc/kernel/exceptions-64s.S | 3 --- > 1 file changed, 3 deletions(-) > > diff --git a/arch/powerpc/kernel/exceptions-64s.S > b/arch/powerpc/kernel/exceptions-64s.S > index 286bd

Re: [RFC PATCH 2/3] powernv/mce: Print correct severity for mce error.

2019-03-29 Thread Mahesh Jagannath Salgaonkar
On 3/29/19 5:53 AM, Michael Ellerman wrote: > Mahesh J Salgaonkar writes: >> diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h >> index 8d0b1c24c636..314ed3f13d59 100644 >> --- a/arch/powerpc/include/asm/mce.h >> +++ b/arch/powerpc/include/asm/mce.h >> @@ -110,17 +110,18

Re: [RFC PATCH 1/3] powernv/mce: reduce mce console logs to lesser lines.

2019-03-29 Thread Mahesh Jagannath Salgaonkar
On 3/29/19 5:50 AM, Michael Ellerman wrote: > Hi Mahesh, > > Thanks for doing this series. > > Mahesh J Salgaonkar writes: >> From: Mahesh Salgaonkar >> >> Also add cpu number while displaying mce log. This will help cleaner logs >> when mce hits on multiple cpus simultaneously. > > Can you in

Re: Disable kcov for slb routines.

2019-03-14 Thread Mahesh Jagannath Salgaonkar
On 3/14/19 5:13 PM, Michael Ellerman wrote: > On Mon, 2019-03-04 at 08:25:51 UTC, Mahesh J Salgaonkar wrote: >> From: Mahesh Salgaonkar >> >> The kcov instrumentation inside SLB routines causes duplicate SLB entries >> to be added resulting into SLB multihit machine checks. >> Disable kcov instrum

Re: [PATCH] powerpc/fadump: re-register firmware-assisted dump if already registered

2018-09-18 Thread Mahesh Jagannath Salgaonkar
On 09/14/2018 07:36 PM, Hari Bathini wrote: > Firmware-Assisted Dump (FADump) needs to be registered again after any > memory hot add/remove operation to update the crash memory ranges. But > currently, the kernel returns '-EEXIST' if we try to register without > uregistering it first. This could e

Re: [PATCH v8 5/5] powernv/pseries: consolidate code for mce early handling.

2018-08-27 Thread Mahesh Jagannath Salgaonkar
On 08/23/2018 02:32 PM, Nicholas Piggin wrote: > On Thu, 23 Aug 2018 14:13:13 +0530 > Mahesh Jagannath Salgaonkar wrote: > >> On 08/20/2018 05:04 PM, Nicholas Piggin wrote: >>> On Sun, 19 Aug 2018 22:38:39 +0530 >>> Mahesh J Salgaonkar wrote: >>> >

Re: [RESEND PATCH v2] powerpc/mce: Fix SLB rebolting during MCE recovery path.

2018-08-23 Thread Mahesh Jagannath Salgaonkar
On 08/23/2018 05:35 PM, Michael Ellerman wrote: > Mahesh Jagannath Salgaonkar writes: > >> On 08/23/2018 12:14 PM, Michael Ellerman wrote: >>> Mahesh J Salgaonkar writes: >>> >>>> From: Mahesh Salgaonkar >>>> >>>> With t

Re: [PATCH v8 5/5] powernv/pseries: consolidate code for mce early handling.

2018-08-23 Thread Mahesh Jagannath Salgaonkar
On 08/20/2018 05:04 PM, Nicholas Piggin wrote: > On Sun, 19 Aug 2018 22:38:39 +0530 > Mahesh J Salgaonkar wrote: > >> From: Mahesh Salgaonkar >> >> Now that other platforms also implements real mode mce handler, >> lets consolidate the code by sharing existing powernv machine check >> early code

Re: [RESEND PATCH v2] powerpc/mce: Fix SLB rebolting during MCE recovery path.

2018-08-23 Thread Mahesh Jagannath Salgaonkar
On 08/23/2018 12:14 PM, Michael Ellerman wrote: > Mahesh J Salgaonkar writes: > >> From: Mahesh Salgaonkar >> >> With the powerpc next commit e7e81847478 (powerpc/mce: Fix SLB rebolting >> during MCE recovery path.), > > That commit description is wrong, I'll fix it up. Ouch.. My bad.. :-( >

Re: [PATCH v2] poewrpc/mce: Fix SLB rebolting during MCE recovery path.

2018-08-22 Thread Mahesh Jagannath Salgaonkar
On 08/23/2018 10:26 AM, Mahesh J Salgaonkar wrote: > From: Mahesh Salgaonkar > > With the powrpc next commit e7e81847478 (poewrpc/mce: Fix SLB rebolting > during MCE recovery path.), the SLB error recovery is broken. The new > change now does not add index value to RB[52-63] that selects the SLB

Re: [PATCH] poewrpc/mce: Fix SLB rebolting during MCE recovery path.

2018-08-22 Thread Mahesh Jagannath Salgaonkar
On 08/21/2018 03:57 PM, Nicholas Piggin wrote: > On Fri, 17 Aug 2018 14:51:47 +0530 > Mahesh J Salgaonkar wrote: > >> From: Mahesh Salgaonkar >> >> With the powrpc next commit e7e81847478 (poewrpc/mce: Fix SLB rebolting >> during MCE recovery path.), the SLB error recovery is broken. The >> comm

Re: [PATCH v7 4/9] powerpc/pseries: Define MCE error event section.

2018-08-17 Thread Mahesh Jagannath Salgaonkar
On 08/16/2018 09:44 AM, Michael Ellerman wrote: > Mahesh Jagannath Salgaonkar writes: >> On 08/08/2018 08:12 PM, Michael Ellerman wrote: > ... >>> >>>> + union { >>>> + struct { >>>> + uint8_t ue_err_type; >>

Re: [PATCH v7 7/9] powerpc/pseries: Dump the SLB contents on SLB MCE errors.

2018-08-14 Thread Mahesh Jagannath Salgaonkar
On 08/13/2018 07:57 PM, Nicholas Piggin wrote: > On Mon, 13 Aug 2018 09:47:04 +0530 > Mahesh Jagannath Salgaonkar wrote: > >> On 08/11/2018 10:03 AM, Nicholas Piggin wrote: >>> On Tue, 07 Aug 2018 19:47:39 +0530 >>> Mahesh J Salgaonkar wrote: >>> >

Re: [PATCH v2 1/2] powerpc/64s: move machine check SLB flushing to mm/slb.c

2018-08-12 Thread Mahesh Jagannath Salgaonkar
On 08/10/2018 12:12 PM, Nicholas Piggin wrote: > The machine check code that flushes and restores bolted segments in > real mode belongs in mm/slb.c. This will also be used by pseries > machine check and idle code in future changes. > > Signed-off-by: Nicholas Piggin > > Since v1: > - Restore th

Re: [PATCH v7 7/9] powerpc/pseries: Dump the SLB contents on SLB MCE errors.

2018-08-12 Thread Mahesh Jagannath Salgaonkar
On 08/11/2018 10:03 AM, Nicholas Piggin wrote: > On Tue, 07 Aug 2018 19:47:39 +0530 > Mahesh J Salgaonkar wrote: > >> From: Mahesh Salgaonkar >> >> If we get a machine check exceptions due to SLB errors then dump the >> current SLB contents which will be very much helpful in debugging the >> roo

Re: [PATCH v7 7/9] powerpc/pseries: Dump the SLB contents on SLB MCE errors.

2018-08-10 Thread Mahesh Jagannath Salgaonkar
On 08/10/2018 04:02 PM, Mahesh Jagannath Salgaonkar wrote: > On 08/09/2018 06:35 AM, Michael Ellerman wrote: >> Mahesh J Salgaonkar writes: >> >>> diff --git a/arch/powerpc/include/asm/paca.h >>> b/arch/powerpc/include/asm/paca.h >>> index 7f22929ce915..

Re: [PATCH v7 7/9] powerpc/pseries: Dump the SLB contents on SLB MCE errors.

2018-08-10 Thread Mahesh Jagannath Salgaonkar
On 08/09/2018 06:35 AM, Michael Ellerman wrote: > Mahesh J Salgaonkar writes: > >> diff --git a/arch/powerpc/include/asm/paca.h >> b/arch/powerpc/include/asm/paca.h >> index 7f22929ce915..233d25ff6f64 100644 >> --- a/arch/powerpc/include/asm/paca.h >> +++ b/arch/powerpc/include/asm/paca.h >> @@

Re: [PATCH v7 5/9] powerpc/pseries: flush SLB contents on SLB MCE errors.

2018-08-10 Thread Mahesh Jagannath Salgaonkar
On 08/08/2018 02:34 PM, Nicholas Piggin wrote: > On Tue, 07 Aug 2018 19:47:14 +0530 > Mahesh J Salgaonkar wrote: > >> From: Mahesh Salgaonkar >> >> On pseries, as of today system crashes if we get a machine check >> exceptions due to SLB errors. These are soft errors and can be fixed by >> flush

Re: [PATCH v7 5/9] powerpc/pseries: flush SLB contents on SLB MCE errors.

2018-08-10 Thread Mahesh Jagannath Salgaonkar
On 08/07/2018 10:24 PM, Michal Suchánek wrote: > Hello, > > > On Tue, 07 Aug 2018 19:47:14 +0530 > "Mahesh J Salgaonkar" wrote: > >> From: Mahesh Salgaonkar >> >> On pseries, as of today system crashes if we get a machine check >> exceptions due to SLB errors. These are soft errors and can be

Re: [PATCH v7 4/9] powerpc/pseries: Define MCE error event section.

2018-08-10 Thread Mahesh Jagannath Salgaonkar
On 08/08/2018 08:12 PM, Michael Ellerman wrote: > Hi Mahesh, > > A few nitpicks. > > Mahesh J Salgaonkar writes: >> From: Mahesh Salgaonkar >> >> On pseries, the machine check error details are part of RTAS extended >> event log passed under Machine check exception section. This patch adds >> t

Re: [PATCH v2 2/2] powerpc/fadump: merge adjacent memory ranges to reduce PT_LOAD segements

2018-08-08 Thread Mahesh Jagannath Salgaonkar
On 08/07/2018 02:12 AM, Hari Bathini wrote: > With dynamic memory allocation support for crash memory ranges array, > there is no hard limit on the no. of crash memory ranges kernel could > export, but program headers count could overflow in the /proc/vmcore > ELF file while exporting each memory r

Re: [PATCH v2 1/2] powerpc/fadump: handle crash memory ranges array index overflow

2018-08-08 Thread Mahesh Jagannath Salgaonkar
On 08/07/2018 02:12 AM, Hari Bathini wrote: > Crash memory ranges is an array of memory ranges of the crashing kernel > to be exported as a dump via /proc/vmcore file. The size of the array > is set based on INIT_MEMBLOCK_REGIONS, which works alright in most cases > where memblock memory regions co

Re: [PATCH] powerpc/fadump: handle crash memory ranges array overflow

2018-08-05 Thread Mahesh Jagannath Salgaonkar
On 07/31/2018 07:26 PM, Hari Bathini wrote: > Crash memory ranges is an array of memory ranges of the crashing kernel > to be exported as a dump via /proc/vmcore file. The size of the array > is set based on INIT_MEMBLOCK_REGIONS, which works alright in most cases > where memblock memory regions co

Re: [PATCH v6 5/8] powerpc/pseries: flush SLB contents on SLB MCE errors.

2018-08-01 Thread Mahesh Jagannath Salgaonkar
On 08/01/2018 11:28 AM, Nicholas Piggin wrote: > On Wed, 04 Jul 2018 23:28:21 +0530 > Mahesh J Salgaonkar wrote: > >> From: Mahesh Salgaonkar >> >> On pseries, as of today system crashes if we get a machine check >> exceptions due to SLB errors. These are soft errors and can be fixed by >> flush

Re: [RFC PATCH v6 0/4] powerpc/fadump: Improvements and fixes for firmware-assisted dump.

2018-07-18 Thread Mahesh Jagannath Salgaonkar
On 07/17/2018 05:22 PM, Michal Hocko wrote: > On Tue 17-07-18 16:58:10, Mahesh Jagannath Salgaonkar wrote: >> On 07/16/2018 01:56 PM, Michal Hocko wrote: >>> On Mon 16-07-18 11:32:56, Mahesh J Salgaonkar wrote: >>>> One of the primary issues with Firmware Assisted

Re: [RFC PATCH v6 0/4] powerpc/fadump: Improvements and fixes for firmware-assisted dump.

2018-07-17 Thread Mahesh Jagannath Salgaonkar
On 07/16/2018 01:56 PM, Michal Hocko wrote: > On Mon 16-07-18 11:32:56, Mahesh J Salgaonkar wrote: >> One of the primary issues with Firmware Assisted Dump (fadump) on Power >> is that it needs a large amount of memory to be reserved. This reserved >> memory is used for saving the contents of old c

Re: [PATCH v5 2/7] powerpc/pseries: Defer the logging of rtas error to irq work queue.

2018-07-03 Thread Mahesh Jagannath Salgaonkar
On 07/03/2018 08:55 AM, Nicholas Piggin wrote: > On Mon, 02 Jul 2018 11:16:29 +0530 > Mahesh J Salgaonkar wrote: > >> From: Mahesh Salgaonkar >> >> rtas_log_buf is a buffer to hold RTAS event data that are communicated >> to kernel by hypervisor. This buffer is then used to pass RTAS event >> da

Re: [PATCH v5 5/7] powerpc/pseries: flush SLB contents on SLB MCE errors.

2018-07-03 Thread Mahesh Jagannath Salgaonkar
On 07/03/2018 03:38 AM, Nicholas Piggin wrote: > On Mon, 02 Jul 2018 11:17:06 +0530 > Mahesh J Salgaonkar wrote: > >> From: Mahesh Salgaonkar >> >> On pseries, as of today system crashes if we get a machine check >> exceptions due to SLB errors. These are soft errors and can be fixed by >> flush

Re: [PATCH v4 1/6] powerpc/pseries: Defer the logging of rtas error to irq work queue.

2018-06-28 Thread Mahesh Jagannath Salgaonkar
On 06/29/2018 02:35 AM, kbuild test robot wrote: > Hi Mahesh, > > Thank you for the patch! Yet something to improve: > > [auto build test ERROR on powerpc/next] > [also build test ERROR on v4.18-rc2 next-20180628] > [if your patch is applied to the wrong git tree, please drop us a note to > help

Re: [PATCH v4 1/6] powerpc/pseries: Defer the logging of rtas error to irq work queue.

2018-06-28 Thread Mahesh Jagannath Salgaonkar
On 06/28/2018 06:49 PM, Laurent Dufour wrote: > On 28/06/2018 13:10, Mahesh J Salgaonkar wrote: >> From: Mahesh Salgaonkar >> >> rtas_log_buf is a buffer to hold RTAS event data that are communicated >> to kernel by hypervisor. This buffer is then used to pass RTAS event >> data to user through pr

Re: [v3 PATCH 4/5] powerpc/pseries: Dump and flush SLB contents on SLB MCE errors.

2018-06-12 Thread Mahesh Jagannath Salgaonkar
On 06/12/2018 07:17 PM, Michael Ellerman wrote: > Mahesh J Salgaonkar writes: >> diff --git a/arch/powerpc/platforms/pseries/ras.c >> b/arch/powerpc/platforms/pseries/ras.c >> index 2edc673be137..e56759d92356 100644 >> --- a/arch/powerpc/platforms/pseries/ras.c >> +++ b/arch/powerpc/platforms/pse

Re: [v3 PATCH 2/5] powerpc/pseries: Fix endainness while restoring of r3 in MCE handler.

2018-06-08 Thread Mahesh Jagannath Salgaonkar
On 06/08/2018 12:20 PM, Michael Ellerman wrote: > Mahesh J Salgaonkar writes: >> From: Mahesh Salgaonkar >> >> During Machine Check interrupt on pseries platform, register r3 points >> RTAS extended event log passed by hypervisor. Since hypervisor uses r3 >> to pass pointer to rtas log, it stores

Re: [v3 PATCH 5/5] powerpc/pseries: Display machine check error details.

2018-06-07 Thread Mahesh Jagannath Salgaonkar
On 06/08/2018 07:21 AM, Nicholas Piggin wrote: > On Thu, 07 Jun 2018 22:59:04 +0530 > Mahesh J Salgaonkar wrote: > >> From: Mahesh Salgaonkar >> >> Extract the MCE error details from RTAS extended log and display it to >> console. >> >> With this patch you should now see mce logs like below: >>

Re: [v3 PATCH 4/5] powerpc/pseries: Dump and flush SLB contents on SLB MCE errors.

2018-06-07 Thread Mahesh Jagannath Salgaonkar
On 06/08/2018 07:18 AM, Nicholas Piggin wrote: > On Thu, 07 Jun 2018 22:58:55 +0530 > Mahesh J Salgaonkar wrote: > >> From: Mahesh Salgaonkar >> >> If we get a machine check exceptions due to SLB errors then dump the >> current SLB contents which will be very much helpful in debugging the >> roo

Re: [v3 PATCH 1/5] powerpc/pseries: convert rtas_log_buf to linear allocation.

2018-06-07 Thread Mahesh Jagannath Salgaonkar
On 06/08/2018 07:01 AM, Nicholas Piggin wrote: > On Thu, 07 Jun 2018 22:58:11 +0530 > Mahesh J Salgaonkar wrote: > >> From: Mahesh Salgaonkar >> >> rtas_log_buf is a buffer to hold RTAS event data that are communicated >> to kernel by hypervisor. This buffer is then used to pass RTAS event >> da

Re: [v2 PATCH 0/5] powerpc/pseries: Machien check handler improvements.

2018-06-07 Thread Mahesh Jagannath Salgaonkar
On 06/07/2018 04:15 PM, Nicholas Piggin wrote: > On Thu, 07 Jun 2018 15:36:25 +0530 > Mahesh J Salgaonkar wrote: > >> This patch series includes some improvement to Machine check handler >> for pseries. Patch 1 fixes an issue where machine check handler crashes >> kernel while accessing vmalloc-e

Re: [PATCH v5 1/4] powerpc/fadump: un-register fadump on kexec path.

2018-04-26 Thread Mahesh Jagannath Salgaonkar
On 04/26/2018 07:10 PM, Nicholas Piggin wrote: > On Thu, 26 Apr 2018 18:35:10 +0530 > Mahesh Jagannath Salgaonkar wrote: > >> On 04/26/2018 06:28 PM, Nicholas Piggin wrote: >>> On Thu, 26 Apr 2018 17:12:03 +0530 >>> Mahesh J Salgaonkar wrote: >

Re: [PATCH v5 1/4] powerpc/fadump: un-register fadump on kexec path.

2018-04-26 Thread Mahesh Jagannath Salgaonkar
On 04/26/2018 06:28 PM, Nicholas Piggin wrote: > On Thu, 26 Apr 2018 17:12:03 +0530 > Mahesh J Salgaonkar wrote: > >> From: Mahesh Salgaonkar >> >> otherwise the fadump registration in new kexec-ed kernel complains that >> fadump is already registered. This makes new kernel to continue using >>

Re: [PATCH] powerpc/mce: Fix a bug where mce loops on memory UE.

2018-04-23 Thread Mahesh Jagannath Salgaonkar
On 04/23/2018 04:44 PM, Balbir Singh wrote: > On Mon, Apr 23, 2018 at 8:33 PM, Mahesh Jagannath Salgaonkar > wrote: >> On 04/23/2018 12:21 PM, Balbir Singh wrote: >>> On Mon, Apr 23, 2018 at 2:59 PM, Mahesh J Salgaonkar >>> wrote: >>>> From: Mahesh Salga

Re: [PATCH] powerpc/mce: Fix a bug where mce loops on memory UE.

2018-04-23 Thread Mahesh Jagannath Salgaonkar
On 04/23/2018 12:21 PM, Balbir Singh wrote: > On Mon, Apr 23, 2018 at 2:59 PM, Mahesh J Salgaonkar > wrote: >> From: Mahesh Salgaonkar >> >> The current code extracts the physical address for UE errors and then >> hooks it up into memory failure infrastructure. On successful extraction >> of phys

Re: [PATCH v4 3/7] powerpc/fadump: un-register fadump on kexec path.

2018-04-22 Thread Mahesh Jagannath Salgaonkar
On 04/22/2018 07:28 AM, Nicholas Piggin wrote: > On Fri, 20 Apr 2018 10:34:35 +0530 > Mahesh J Salgaonkar wrote: > >> From: Mahesh Salgaonkar >> >> otherwise the fadump registration in new kexec-ed kernel complains that >> fadump is already registered. This makes new kernel to continue using >>

Re: [PATCH v4 1/7] powerpc/fadump: Move the metadata region to start of the reserved area.

2018-04-22 Thread Mahesh Jagannath Salgaonkar
On 04/22/2018 07:28 AM, Nicholas Piggin wrote: > On Fri, 20 Apr 2018 10:34:18 +0530 > Mahesh J Salgaonkar wrote: > >> From: Mahesh Salgaonkar >> >> Currently the metadata region that holds crash info structure and ELF core >> header is placed towards the end of reserved memory area. This patch p

Re: [PATCH v2 2/2] powerpc/fadump: Do not use hugepages when fadump is active

2018-04-11 Thread Mahesh Jagannath Salgaonkar
On 04/10/2018 07:11 PM, Hari Bathini wrote: > FADump capture kernel boots in restricted memory environment preserving > the context of previous kernel to save vmcore. Supporting hugepages in > such environment makes things unnecessarily complicated, as hugepages > need memory set aside for them. Th

Re: [PATCH v3 1/7] powerpc/fadump: Move the metadata region to start of the reserved area.

2018-04-04 Thread Mahesh Jagannath Salgaonkar
On 04/04/2018 12:56 AM, Hari Bathini wrote: > Mahesh, I think we should explicitly document that production and > capture kernel > versions should be same. For changes like below, older/newer production > kernel vs > capture kernel is bound to fail. Of course, production and capture > kernel versio

Re: [PATCH v3 4/7] powerpc/fadump: exclude memory holes while reserving memory in second kernel.

2018-04-03 Thread Mahesh Jagannath Salgaonkar
On 04/03/2018 03:21 PM, Hari Bathini wrote: > > > On Monday 02 April 2018 12:00 PM, Mahesh J Salgaonkar wrote: >> From: Mahesh Salgaonkar >> >> The second kernel, during early boot after the crash, reserves rest of >> the memory above boot memory size to make sure it does not touch any >> of the

Re: [PATCH v3 6/7] powerpc/fadump: Do not allow hot-remove memory from fadump reserved area.

2018-04-03 Thread Mahesh Jagannath Salgaonkar
On 04/03/2018 08:48 AM, Pingfan Liu wrote: > I think CMA has protected us from hot-remove, so this patch is not necessary. Yes, but only if the memory from declared CMA region is allocated using cma_alloc(). The rest of the memory inside CMA region which hasn't been cma_allocat-ed can still be hot

Re: [PATCH] powernv: Avoid calling trace tlbie in kexec path.

2017-11-23 Thread Mahesh Jagannath Salgaonkar
On 11/23/2017 04:26 AM, Balbir Singh wrote: > On Thu, Nov 23, 2017 at 4:32 AM, Mahesh J Salgaonkar > wrote: >> From: Mahesh Salgaonkar >> >> Rebooting into a new kernel with kexec fails in trace_tlbie() which is >> called from native_hpte_clear(). This happens if the running kernel has >> CONFIG_

Re: [PATCH] powernv: Avoid calling trace tlbie in kexec path.

2017-11-23 Thread Mahesh Jagannath Salgaonkar
On 11/23/2017 12:37 AM, Naveen N. Rao wrote: > Mahesh J Salgaonkar wrote: >> From: Mahesh Salgaonkar >> >> Rebooting into a new kernel with kexec fails in trace_tlbie() which is >> called from native_hpte_clear(). This happens if the running kernel has >> CONFIG_LOCKDEP enabled. With lockdep enabl

Re: [rfc 2/3] powerpc/mce: Extract physical_address for UE errors

2017-09-07 Thread Mahesh Jagannath Salgaonkar
On 09/05/2017 09:45 AM, Balbir Singh wrote: > Walk the page table for NIP and extract the instruction. Then > use the instruction to find the effective address via analyse_instr(). > > We might have page table walking races, but we expect them to > be rare, the physical address extraction is best

Re: [PATCH v2 2/3] powerpc/powernv: machine check use kernel crash path

2017-07-20 Thread Mahesh Jagannath Salgaonkar
On 07/19/2017 12:29 PM, Nicholas Piggin wrote: > There are quite a few machine check exceptions that can be caused by > kernel bugs. To make debugging easier, use the kernel crash path in > cases of synchronous machine checks that occur in kernel mode, if that > would not result in the machine goin

Re: [PATCH v2 1/3] powerpc/powernv: handle the platform error reboot in ppc_md.restart

2017-07-19 Thread Mahesh Jagannath Salgaonkar
On 07/19/2017 12:29 PM, Nicholas Piggin wrote: > Unrecovered MCE and HMI errors are sent through a special restart OPAL > call to log the platform error. The downside is that they don't go > through normal Linux crash paths, so they don't give much information > to the Linux console. > > Change th

Re: [PATCH 1/4] powerpc/powernv: handle the platform error reboot in ppc_md.restart

2017-07-09 Thread Mahesh Jagannath Salgaonkar
On 07/06/2017 11:26 PM, Nicholas Piggin wrote: > On Wed, 5 Jul 2017 14:04:19 +1000 > Nicholas Piggin wrote: > >> Unrecovered MCE and HMI errors are sent through a special restart >> OPAL call to log the platform error. The downside is that they don't >> go through normal crash paths, so they don

Re: [PATCH v2 1/3] powerpc: do not call ppc_md.panic in fadump panic notifier

2017-07-04 Thread Mahesh Jagannath Salgaonkar
On 07/05/2017 09:26 AM, Nicholas Piggin wrote: > If fadump is not registered, and no other crash or debug handlers are > registered, the powerpc panic handler stops the guest before the generic > panic code can push out debug information to the console. > > Currently, system reset injection causes

Re: [PATCH 1/3] powerpc: do not call ppc_md.panic in panic notifier if fadump not used

2017-07-04 Thread Mahesh Jagannath Salgaonkar
On 07/04/2017 03:39 PM, Nicholas Piggin wrote: > If fadump is not registered, and no other crash or debug handlers are > registered, the powerpc panic handler stops the guest before the generic > panic code can push out debug information to the console. > > Without this patch, system reset injecti

Re: [PATCH v2] powerpc/fadump: return error when fadump registration fails

2017-05-29 Thread Mahesh Jagannath Salgaonkar
On 05/27/2017 09:16 PM, Michal Suchanek wrote: > - log an error message when registration fails and no error code listed > in the switch is returned > - translate the hv error code to posix error code and return it from > fw_register > - return the posix error code from fw_register to the proc

Re: [PATCH 2/2] powerpc/fadump: avoid holes in boot memory area when fadump is registered

2017-05-04 Thread Mahesh Jagannath Salgaonkar
On 05/04/2017 11:24 PM, Hari Bathini wrote: > To register fadump, boot memory area - the size of low memory chunk that > is required for a kernel to boot successfully when booted with restricted > memory, is assumed to have no holes. But this memory area is currently > not protected from hot-remove

Re: [PATCH 1/2] powerpc/fadump: avoid duplicates in crash memory ranges

2017-05-04 Thread Mahesh Jagannath Salgaonkar
On 05/04/2017 11:23 PM, Hari Bathini wrote: > fadump sets up crash memory ranges to be used for creating PT_LOAD > program headers in elfcore header. Memory chunk RMA_START through > boot memory area size is added as the first memory range because > firmware, at the time of crash, moves this memory

Re: [PATCH v4 2/3] powerpc/fadump: Use the correct VMCOREINFO_NOTE_SIZE for phdr

2017-04-27 Thread Mahesh Jagannath Salgaonkar
On 04/26/2017 12:41 PM, Dave Young wrote: > Ccing ppc list > On 04/20/17 at 07:39pm, Xunlei Pang wrote: >> vmcoreinfo_max_size stands for the vmcoreinfo_data, the >> correct one we should use is vmcoreinfo_note whose total >> size is VMCOREINFO_NOTE_SIZE. >> >> Like explained in commit 77019967f06b

Re: [PATCH v2] powerpc/book3s: mce: Move add_taint() later in virtual mode.

2017-04-24 Thread Mahesh Jagannath Salgaonkar
On 04/21/2017 09:37 AM, Michael Ellerman wrote: > Daniel Axtens writes: >>> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c >>> index a1475e6..b23b323 100644 >>> --- a/arch/powerpc/kernel/mce.c >>> +++ b/arch/powerpc/kernel/mce.c >>> @@ -221,6 +221,8 @@ static void machine_check

Re: [PATCH 2/2] powerpc/book3s: mce: Use add_taint_no_warn() in machine_check_early().

2017-04-17 Thread Mahesh Jagannath Salgaonkar
On 04/17/2017 04:09 PM, Daniel Axtens wrote: > Hi Mahesh, > >> Fixes: 27ea2c420cad powerpc: Set the correct kernel taint on machine check >> errors. > > I notice this Fixes a commit I introduced. Please could you cc me when > you do this? I am likely to miss it otherwise, especially since I have

Re: [PATCH v7 2/3] powerpc/powernv: Introduce a machine check hook for Guest MCEs.

2017-04-06 Thread Mahesh Jagannath Salgaonkar
On 04/06/2017 10:52 AM, David Gibson wrote: > On Thu, Apr 06, 2017 at 02:17:22AM +0530, Mahesh J Salgaonkar wrote: >> From: Mahesh Salgaonkar >> >> This patch introduces a mce hook which is invoked at the time of guest >> exit to facilitate the host-side handling of machine check exception >> befo

Re: [PATCH 2/2] powerpc/book3s: Display task info for MCE error in user mode.

2017-03-30 Thread Mahesh Jagannath Salgaonkar
On 03/30/2017 05:39 AM, Nicholas Piggin wrote: > On Tue, 28 Mar 2017 19:15:28 +0530 > Mahesh J Salgaonkar wrote: > >> From: Mahesh Salgaonkar >> >> For MCE that hit while in user mode MSR(HV=1,PR=1), print the task info on the >> console MCE error log. This will help to identify application that

Re: [PATCH 4/8] powerpc/64s: fix POWER9 machine check handler from stop state

2017-03-19 Thread Mahesh Jagannath Salgaonkar
On 03/16/2017 06:49 PM, Gautham R Shenoy wrote: > Hi, > > On Thu, Mar 16, 2017 at 11:05:20PM +1000, Nicholas Piggin wrote: >> On Thu, 16 Mar 2017 18:10:48 +0530 >> Mahesh Jagannath Salgaonkar wrote: >> >>> On 03/14/2017 02:53 PM, Nicholas Piggin wrote: >

Re: [PATCH 4/8] powerpc/64s: fix POWER9 machine check handler from stop state

2017-03-16 Thread Mahesh Jagannath Salgaonkar
On 03/14/2017 02:53 PM, Nicholas Piggin wrote: > The ISA specifies power save wakeup can cause a machine check interrupt. > The machine check handler currently has code to handle that for POWER8, > but POWER9 crashes when trying to execute the P8 style sleep > instructions. > > So queue up the mac

Re: [PATCH 3/3] powerpc/64s: POWER9 machine check handler

2017-02-27 Thread Mahesh Jagannath Salgaonkar
On 02/28/2017 07:30 AM, Nicholas Piggin wrote: > Add POWER9 machine check handler. There are several new types of errors > added, so logging messages for those are also added. > > This doesn't attempt to reuse any of the P7/8 defines or functions, > because that becomes too complex. The better opt

Re: [PATCH 1/3] powerpc/64s: fix handling of non-synchronous machine checks

2017-02-27 Thread Mahesh Jagannath Salgaonkar
On 02/28/2017 07:30 AM, Nicholas Piggin wrote: > A synchronous machine check is an exception raised by the attempt to > execute the current instruction. If the error can't be corrected, it > can make sense to SIGBUS the currently running process. > > In other cases, the error condition is not rela

Re: [RFC PATCH 1/7] powerpc/book3s: Move machine check event structure to opal-api.h

2017-02-20 Thread Mahesh Jagannath Salgaonkar
On 02/21/2017 08:05 AM, Nicholas Piggin wrote: > On Tue, 21 Feb 2017 07:21:56 +0530 > Mahesh J Salgaonkar wrote: > >> +enum MCE_TlbErrorType { >> +MCE_TLB_ERROR_INDETERMINATE = 0, >> +MCE_TLB_ERROR_PARITY = 1, >> +MCE_TLB_ERROR_MULTIHIT = 2, >> +MCE_TLB_ERROR_TLBIEL_PROG_ERROR = 3

Re: [RFC PATCH 5/7] powerpc/book3s: Don't turn on the MSR[ME] bit until opal processes the reason.

2017-02-20 Thread Mahesh Jagannath Salgaonkar
On 02/21/2017 08:17 AM, Nicholas Piggin wrote: > On Tue, 21 Feb 2017 07:22:56 +0530 > Mahesh J Salgaonkar wrote: > >> From: Mahesh Salgaonkar >> >> Delay it until we are done with machine_check_early() call. Turn on MSR[ME] >> once opal is done with processing MCE. > > Why? This seems like quit

Re: [PATCH v1 1/2] fadump: reduce memory consumption for capture kernel

2017-01-30 Thread Mahesh Jagannath Salgaonkar
On 01/30/2017 10:14 PM, Hari Bathini wrote: > In case of fadump, capture (fadump) kernel boots like a normal kernel. > While this has its advantages, the capture kernel would initialize all > the components like normal kernel, which may not necessarily be needed > for a typical dump capture kernel.

Re: [PATCH v5 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled

2017-01-17 Thread Mahesh Jagannath Salgaonkar
On 01/16/2017 10:05 AM, Paul Mackerras wrote: > On Fri, Jan 13, 2017 at 04:51:45PM +0530, Aravinda Prasad wrote: >> Enhance KVM to cause a guest exit with KVM_EXIT_NMI >> exit reason upon a machine check exception (MCE) in >> the guest address space if the KVM_CAP_PPC_FWNMI >> capability is enabled

Re: [PATCH v4 4/5] powerpc/fadump: reuse crashkernel parameter for fadump memory reservation

2017-01-13 Thread Mahesh Jagannath Salgaonkar
On 01/05/2017 11:02 PM, Hari Bathini wrote: > fadump supports specifying memory to reserve for fadump's crash kernel > with fadump_reserve_mem kernel parameter. This parameter currently > supports passing a fixed memory size, like fadump_reserve_mem= > only. This patch aims to add support for other

Re: [PATCH v4 3/5] powerpc/fadump: remove dependency with CONFIG_KEXEC

2017-01-13 Thread Mahesh Jagannath Salgaonkar
On 01/05/2017 11:02 PM, Hari Bathini wrote: > Now that crashkernel parameter parsing and vmcoreinfo related code is > moved under CONFIG_CRASH_CORE instead of CONFIG_KEXEC_CORE, remove > dependency with CONFIG_KEXEC for CONFIG_FA_DUMP. While here, get rid > of definitions of fadump_append_elf_note(

Re: [PATCH] powerpc/64s: relocation, register save fixes for system reset interrupt

2016-11-01 Thread Mahesh Jagannath Salgaonkar
On 10/13/2016 07:47 AM, Nicholas Piggin wrote: > This patch does a couple of things. First of all, powernv immediately > explodes when running a relocated kernel, because the system reset > exception for handling sleeps does not do correct relocated branches. > > Secondly, the sleep handling code

Re: [PATCH] powerpc/pseries: Use H_CLEAR_HPT to clear MMU hash table during kexec

2016-10-26 Thread Mahesh Jagannath Salgaonkar
On 10/01/2016 04:11 PM, Anton Blanchard wrote: > From: Anton Blanchard > > An hcall was recently added that does exactly what we need > during kexec - it clears the entire MMU hash table, ignoring any > VRMA mappings. > > Try it and fall back to the old method if we get a failure. > > On a POWE

Re: [PATCH] powerpc/fadump: Fix the race in crash_fadump().

2016-10-12 Thread Mahesh Jagannath Salgaonkar
On 10/10/2016 04:22 PM, Michael Ellerman wrote: > Mahesh J Salgaonkar writes: > >> From: Mahesh Salgaonkar >> >> There are chances that multiple CPUs can call crash_fadump() simultaneously >> and would start duplicating same info to vmcoreinfo ELF note section. This >> causes makedumpfile to fai

Re: [PATCH v2 4/5] powerpc/fadump: Make ELF eflags depend on endian

2016-09-08 Thread Mahesh Jagannath Salgaonkar
On 09/08/2016 12:30 PM, Mahesh Jagannath Salgaonkar wrote: > On 09/06/2016 11:02 AM, Daniel Axtens wrote: >> Firmware Assisted Dump is a facility to dump kernel core with assistance >> from firmware. As part of this process the kernel ELF version is >> stored. >> >

Re: [PATCH v2 4/5] powerpc/fadump: Make ELF eflags depend on endian

2016-09-08 Thread Mahesh Jagannath Salgaonkar
On 09/06/2016 11:02 AM, Daniel Axtens wrote: > Firmware Assisted Dump is a facility to dump kernel core with assistance > from firmware. As part of this process the kernel ELF version is > stored. > > Currently, fadump.h defines this to 0 if it is not already defined. This > clashes with a define

Re: [PATCH 1/2] powerpc/pseries: PACA save area fix for general exception vs MCE

2016-08-10 Thread Mahesh Jagannath Salgaonkar
On 08/10/2016 04:18 PM, Nicholas Piggin wrote: > MCE must not use PACA_EXGEN. When a general exception enables MSR_RI, > that means SPRN_SRR[01] and SPRN_SPRG are no longer used. However the > PACA save area is still in use. > --- > arch/powerpc/kernel/exceptions-64s.S | 8 > 1 file chang

Re: [RESEND PATCH v3 2/2] powernv: Fix MCE handler to avoid trashing CR0/CR1 registers.

2016-08-08 Thread Mahesh Jagannath Salgaonkar
On 08/08/2016 02:28 PM, Michael Ellerman wrote: > Mahesh J Salgaonkar writes: > >> From: Mahesh Salgaonkar >> >> The current implementation of MCE early handling modifies CR0/1 registers >> without saving its old values. Fix this by moving early check for >> powersaving mode to machine_check_han

Re: [PATCH] powernv: Load correct TOC pointer while waking up from winkle.

2016-08-05 Thread Mahesh Jagannath Salgaonkar
On 08/06/2016 04:08 AM, Benjamin Herrenschmidt wrote: > On Fri, 2016-08-05 at 19:13 +0530, Mahesh J Salgaonkar wrote: >> From: Mahesh Salgaonkar >> >> The function pnv_restore_hyp_resource() loads the TOC into r2 from >> the invalid PACA pointer before fixing r13 value. This does not affect >> POWER

Re: [PATCH] powerpc/book3s: Fix MCE console messages for unrecoverable MCE.

2016-08-04 Thread Mahesh Jagannath Salgaonkar
On 08/04/2016 03:27 PM, Michael Ellerman wrote: > Mahesh J Salgaonkar writes: > >> From: Mahesh Salgaonkar >> >> When machine check occurs with MSR(RI=0), it means MC interrupt is >> unrecoverable and kernel goes down to panic path. But the console >> message still shows it as recovered. This pa

Re: [PATCH] powerpc/book3s: Fix MCE console messages for unrecoverable MCE.

2016-08-04 Thread Mahesh Jagannath Salgaonkar
On 08/04/2016 01:35 PM, Greg KH wrote: > On Thu, Aug 04, 2016 at 10:16:48AM +0530, Mahesh J Salgaonkar wrote: >> From: Mahesh Salgaonkar >> >> When machine check occurs with MSR(RI=0), it means MC interrupt is >> unrecoverable and kernel goes down to panic path. But the console >> message still sh
