On 12/8/20 4:16 PM, Ganesh wrote:
>
> On 12/8/20 4:01 PM, Michael Ellerman wrote:
>> Ganesh Goudar writes:
>>> diff --git a/arch/powerpc/include/asm/paca.h
>>> b/arch/powerpc/include/asm/paca.h
>>> index 9454d29ff4b4..4769954efa7d 100644
>>> --- a/arch/powerpc/include/asm/paca.h
>>> +++ b/arch/po
On 10/5/20 4:17 PM, Ananth N Mavinakayanahalli wrote:
> On 10/5/20 9:42 AM, Mahesh Salgaonkar wrote:
>> Every error log reported by OPAL is exported to userspace through a sysfs
>> interface and notified using kobject_uevent(). The userspace daemon
>> (opal_errd) then reads the error log and acknow
On 10/6/20 5:55 AM, Oliver O'Halloran wrote:
> On Mon, Oct 5, 2020 at 3:12 PM Mahesh Salgaonkar wrote:
>>
>> Every error log reported by OPAL is exported to userspace through a sysfs
>> interface and notified using kobject_uevent(). The userspace daemon
>> (opal_errd) then reads the error log and
On 9/15/20 2:13 PM, Michal Suchánek wrote:
> Hello,
>
> Using the SLB multihit injection test module (which I did not write so I
> do not want to post it here) to verify updates on my 5.3 frankenkernel
> I found that the kernel crashes with Oops: kernel bad access.
>
> I tested on latest upstrea
On 9/3/19 9:35 PM, Hari Bathini wrote:
>
>
> On 03/09/19 4:39 PM, Michael Ellerman wrote:
>> Hari Bathini writes:
>>> Make way for refactoring platform specific FADump code by moving code
>>> that could be referenced from multiple places to fadump-common.c file.
>>>
>>> Signed-off-by: Hari Bathi
On 8/14/19 12:36 PM, Hari Bathini wrote:
>
>
> On 13/08/19 4:11 PM, Mahesh J Salgaonkar wrote:
>> On 2019-07-16 17:03:15 Tue, Hari Bathini wrote:
>>> OPAL allows registering address with it in the first kernel and
>>> retrieving it after MPIPL. Setup kernel metadata and register its
>>> address w
On 8/12/19 2:52 PM, Santosh Sivaraj wrote:
> If we take a UE on one of the instructions with a fixup entry, set nip
> to continue execution at the fixup entry. Stop processing the event
> further or print it.
>
> Co-developed-by: Reza Arbab
> Signed-off-by: Reza Arbab
> Cc: Mahesh Salgaonkar
>
On 8/12/19 2:52 PM, Santosh Sivaraj wrote:
> schedule_work() cannot be called from MCE exception context as MCE can
> interrupt even in interrupt disabled context.
>
> fixes: 733e4a4c ("powerpc/mce: hookup memory_failure for UE errors")
> Suggested-by: Mahesh Salgaonkar
> Signed-off-by: Santosh S
On 7/16/19 5:02 PM, Hari Bathini wrote:
> The figures depicting FADump's (Firmware-Assisted Dump) memory layout
> are missing some finer details like different memory regions and what
> they represent. Improve the documentation by updating those details.
>
> Signed-off-by: Hari Bathini
> ---
> D
On 8/7/19 8:26 PM, Santosh Sivaraj wrote:
> From: Balbir Singh
>
> The current code would fail on huge pages addresses, since the shift would
> be incorrect. Use the correct page shift value returned by
> __find_linux_pte() to get the correct physical address. The code is more
> generic and can h
On 8/7/19 8:26 PM, Santosh Sivaraj wrote:
> schedule_work() cannot be called from MCE exception context as MCE can
> interrupt even in interrupt disabled context.
>
> fixes: 733e4a4c ("powerpc/mce: hookup memory_failure for UE errors")
> Signed-off-by: Santosh Sivaraj
> ---
> arch/powerpc/kernel
On 8/6/19 8:42 AM, Sourabh Jain wrote:
> Add a sys interface to allow querying the memory reserved by fadump
> for saving the crash dump.
>
> Signed-off-by: Sourabh Jain
Looks good to me.
Reviewed-by: Mahesh Salgaonkar
Thanks,
-Mahesh.
> ---
> Documentation/powerpc/firmware-assisted-dump.rs
On 7/22/19 11:19 PM, Michal Suchánek wrote:
> On Fri, 28 Jun 2019 00:51:19 +0530
> Hari Bathini wrote:
>
>> Currently, if memory_limit is specified and it overlaps with memory to
>> be reserved for capture kernel, memory_limit is adjusted to accommodate
>> capture kernel. With memory reservation
On 7/6/19 3:23 PM, Nicholas Piggin wrote:
> Santosh Sivaraj's on July 6, 2019 7:26 am:
>> If we take a UE on one of the instructions with a fixup entry, set nip
>> to continue execution at the fixup entry. Stop processing the event
>> further or print it.
>
> Minor nit, but can you instead a field
On 7/2/19 11:47 AM, Nicholas Piggin wrote:
> Santosh Sivaraj's on July 2, 2019 3:19 pm:
>> From: Reza Arbab
>>
>> Signed-off-by: Reza Arbab
>> ---
>> arch/powerpc/kernel/exceptions-64s.S | 6 ++
>> arch/powerpc/kernel/mce.c| 2 ++
>> 2 files changed, 8 insertions(+)
>>
>> diff --
On 6/23/19 7:44 AM, Reza Arbab wrote:
> Hi Mahesh,
>
> On Fri, Jun 21, 2019 at 12:35:08PM +0530, Mahesh Jagannath Salgaonkar
> wrote:
>> On 6/21/19 6:27 AM, Santosh Sivaraj wrote:
>>> - blocking_notifier_call_chain(&mce_notifier_list, 0, &evt);
>>
On 6/21/19 6:27 AM, Santosh Sivaraj wrote:
> From: Reza Arbab
>
> If a notifier returns NOTIFY_STOP, consider the MCE handled, just as we
> do when machine_check_early() returns 1.
>
> Signed-off-by: Reza Arbab
> ---
> arch/powerpc/include/asm/asm-prototypes.h | 2 +-
> arch/powerpc/kernel/ex
On 6/20/19 3:46 PM, Nicholas Piggin wrote:
> Mahesh J Salgaonkar's on June 20, 2019 7:53 pm:
>> On 2019-06-20 15:14:50 Thu, Nicholas Piggin wrote:
>>> machine_check_common_early and machine_check_handle_early only run in
>>> HVMODE. Remove dead code.
>>
>> That's not true. For pseries guest with FW
On 6/20/19 10:44 AM, Nicholas Piggin wrote:
> Remove dead code.
>
> Signed-off-by: Nicholas Piggin
> ---
> arch/powerpc/kernel/exceptions-64s.S | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff --git a/arch/powerpc/kernel/exceptions-64s.S
> b/arch/powerpc/kernel/exceptions-64s.S
> index 286bd
On 3/29/19 5:53 AM, Michael Ellerman wrote:
> Mahesh J Salgaonkar writes:
>> diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
>> index 8d0b1c24c636..314ed3f13d59 100644
>> --- a/arch/powerpc/include/asm/mce.h
>> +++ b/arch/powerpc/include/asm/mce.h
>> @@ -110,17 +110,18
On 3/29/19 5:50 AM, Michael Ellerman wrote:
> Hi Mahesh,
>
> Thanks for doing this series.
>
> Mahesh J Salgaonkar writes:
>> From: Mahesh Salgaonkar
>>
>> Also add cpu number while displaying mce log. This will help cleaner logs
>> when mce hits on multiple cpus simultaneously.
>
> Can you in
On 3/14/19 5:13 PM, Michael Ellerman wrote:
> On Mon, 2019-03-04 at 08:25:51 UTC, Mahesh J Salgaonkar wrote:
>> From: Mahesh Salgaonkar
>>
>> The kcov instrumentation inside SLB routines causes duplicate SLB entries
>> to be added resulting into SLB multihit machine checks.
>> Disable kcov instrum
On 09/14/2018 07:36 PM, Hari Bathini wrote:
> Firmware-Assisted Dump (FADump) needs to be registered again after any
> memory hot add/remove operation to update the crash memory ranges. But
> currently, the kernel returns '-EEXIST' if we try to register without
> unregistering it first. This could e
On 08/23/2018 02:32 PM, Nicholas Piggin wrote:
> On Thu, 23 Aug 2018 14:13:13 +0530
> Mahesh Jagannath Salgaonkar wrote:
>
>> On 08/20/2018 05:04 PM, Nicholas Piggin wrote:
>>> On Sun, 19 Aug 2018 22:38:39 +0530
>>> Mahesh J Salgaonkar wrote:
>>>
>
On 08/23/2018 05:35 PM, Michael Ellerman wrote:
> Mahesh Jagannath Salgaonkar writes:
>
>> On 08/23/2018 12:14 PM, Michael Ellerman wrote:
>>> Mahesh J Salgaonkar writes:
>>>
>>>> From: Mahesh Salgaonkar
>>>>
>>>> With t
On 08/20/2018 05:04 PM, Nicholas Piggin wrote:
> On Sun, 19 Aug 2018 22:38:39 +0530
> Mahesh J Salgaonkar wrote:
>
>> From: Mahesh Salgaonkar
>>
>> Now that other platforms also implement real mode mce handler,
>> let's consolidate the code by sharing existing powernv machine check
>> early code
On 08/23/2018 12:14 PM, Michael Ellerman wrote:
> Mahesh J Salgaonkar writes:
>
>> From: Mahesh Salgaonkar
>>
>> With the powerpc next commit e7e81847478 (powerpc/mce: Fix SLB rebolting
>> during MCE recovery path.),
>
> That commit description is wrong, I'll fix it up.
Ouch.. My bad.. :-(
>
On 08/23/2018 10:26 AM, Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar
>
> With the powerpc next commit e7e81847478 (powerpc/mce: Fix SLB rebolting
> during MCE recovery path.), the SLB error recovery is broken. The new
> change now does not add index value to RB[52-63] that selects the SLB
On 08/21/2018 03:57 PM, Nicholas Piggin wrote:
> On Fri, 17 Aug 2018 14:51:47 +0530
> Mahesh J Salgaonkar wrote:
>
>> From: Mahesh Salgaonkar
>>
>> With the powerpc next commit e7e81847478 (powerpc/mce: Fix SLB rebolting
>> during MCE recovery path.), the SLB error recovery is broken. The
>> comm
On 08/16/2018 09:44 AM, Michael Ellerman wrote:
> Mahesh Jagannath Salgaonkar writes:
>> On 08/08/2018 08:12 PM, Michael Ellerman wrote:
> ...
>>>
>>>> + union {
>>>> + struct {
>>>> + uint8_t ue_err_type;
>>
On 08/13/2018 07:57 PM, Nicholas Piggin wrote:
> On Mon, 13 Aug 2018 09:47:04 +0530
> Mahesh Jagannath Salgaonkar wrote:
>
>> On 08/11/2018 10:03 AM, Nicholas Piggin wrote:
>>> On Tue, 07 Aug 2018 19:47:39 +0530
>>> Mahesh J Salgaonkar wrote:
>>>
>
On 08/10/2018 12:12 PM, Nicholas Piggin wrote:
> The machine check code that flushes and restores bolted segments in
> real mode belongs in mm/slb.c. This will also be used by pseries
> machine check and idle code in future changes.
>
> Signed-off-by: Nicholas Piggin
>
> Since v1:
> - Restore th
On 08/11/2018 10:03 AM, Nicholas Piggin wrote:
> On Tue, 07 Aug 2018 19:47:39 +0530
> Mahesh J Salgaonkar wrote:
>
>> From: Mahesh Salgaonkar
>>
>> If we get a machine check exception due to SLB errors then dump the
>> current SLB contents which will be very much helpful in debugging the
>> roo
On 08/10/2018 04:02 PM, Mahesh Jagannath Salgaonkar wrote:
> On 08/09/2018 06:35 AM, Michael Ellerman wrote:
>> Mahesh J Salgaonkar writes:
>>
>>> diff --git a/arch/powerpc/include/asm/paca.h
>>> b/arch/powerpc/include/asm/paca.h
>>> index 7f22929ce915..
On 08/09/2018 06:35 AM, Michael Ellerman wrote:
> Mahesh J Salgaonkar writes:
>
>> diff --git a/arch/powerpc/include/asm/paca.h
>> b/arch/powerpc/include/asm/paca.h
>> index 7f22929ce915..233d25ff6f64 100644
>> --- a/arch/powerpc/include/asm/paca.h
>> +++ b/arch/powerpc/include/asm/paca.h
>> @@
On 08/08/2018 02:34 PM, Nicholas Piggin wrote:
> On Tue, 07 Aug 2018 19:47:14 +0530
> Mahesh J Salgaonkar wrote:
>
>> From: Mahesh Salgaonkar
>>
>> On pseries, as of today system crashes if we get a machine check
>> exception due to SLB errors. These are soft errors and can be fixed by
>> flush
On 08/07/2018 10:24 PM, Michal Suchánek wrote:
> Hello,
>
>
> On Tue, 07 Aug 2018 19:47:14 +0530
> "Mahesh J Salgaonkar" wrote:
>
>> From: Mahesh Salgaonkar
>>
>> On pseries, as of today system crashes if we get a machine check
>> exception due to SLB errors. These are soft errors and can be
On 08/08/2018 08:12 PM, Michael Ellerman wrote:
> Hi Mahesh,
>
> A few nitpicks.
>
> Mahesh J Salgaonkar writes:
>> From: Mahesh Salgaonkar
>>
>> On pseries, the machine check error details are part of RTAS extended
>> event log passed under Machine check exception section. This patch adds
>> t
On 08/07/2018 02:12 AM, Hari Bathini wrote:
> With dynamic memory allocation support for crash memory ranges array,
> there is no hard limit on the no. of crash memory ranges kernel could
> export, but program headers count could overflow in the /proc/vmcore
> ELF file while exporting each memory r
On 08/07/2018 02:12 AM, Hari Bathini wrote:
> Crash memory ranges is an array of memory ranges of the crashing kernel
> to be exported as a dump via /proc/vmcore file. The size of the array
> is set based on INIT_MEMBLOCK_REGIONS, which works alright in most cases
> where memblock memory regions co
On 07/31/2018 07:26 PM, Hari Bathini wrote:
> Crash memory ranges is an array of memory ranges of the crashing kernel
> to be exported as a dump via /proc/vmcore file. The size of the array
> is set based on INIT_MEMBLOCK_REGIONS, which works alright in most cases
> where memblock memory regions co
On 08/01/2018 11:28 AM, Nicholas Piggin wrote:
> On Wed, 04 Jul 2018 23:28:21 +0530
> Mahesh J Salgaonkar wrote:
>
>> From: Mahesh Salgaonkar
>>
>> On pseries, as of today system crashes if we get a machine check
>> exception due to SLB errors. These are soft errors and can be fixed by
>> flush
On 07/17/2018 05:22 PM, Michal Hocko wrote:
> On Tue 17-07-18 16:58:10, Mahesh Jagannath Salgaonkar wrote:
>> On 07/16/2018 01:56 PM, Michal Hocko wrote:
>>> On Mon 16-07-18 11:32:56, Mahesh J Salgaonkar wrote:
>>>> One of the primary issues with Firmware Assisted
On 07/16/2018 01:56 PM, Michal Hocko wrote:
> On Mon 16-07-18 11:32:56, Mahesh J Salgaonkar wrote:
>> One of the primary issues with Firmware Assisted Dump (fadump) on Power
>> is that it needs a large amount of memory to be reserved. This reserved
>> memory is used for saving the contents of old c
On 07/03/2018 08:55 AM, Nicholas Piggin wrote:
> On Mon, 02 Jul 2018 11:16:29 +0530
> Mahesh J Salgaonkar wrote:
>
>> From: Mahesh Salgaonkar
>>
>> rtas_log_buf is a buffer to hold RTAS event data that are communicated
>> to kernel by hypervisor. This buffer is then used to pass RTAS event
>> da
On 07/03/2018 03:38 AM, Nicholas Piggin wrote:
> On Mon, 02 Jul 2018 11:17:06 +0530
> Mahesh J Salgaonkar wrote:
>
>> From: Mahesh Salgaonkar
>>
>> On pseries, as of today system crashes if we get a machine check
>> exception due to SLB errors. These are soft errors and can be fixed by
>> flush
On 06/29/2018 02:35 AM, kbuild test robot wrote:
> Hi Mahesh,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on powerpc/next]
> [also build test ERROR on v4.18-rc2 next-20180628]
> [if your patch is applied to the wrong git tree, please drop us a note to
> help
On 06/28/2018 06:49 PM, Laurent Dufour wrote:
> On 28/06/2018 13:10, Mahesh J Salgaonkar wrote:
>> From: Mahesh Salgaonkar
>>
>> rtas_log_buf is a buffer to hold RTAS event data that are communicated
>> to kernel by hypervisor. This buffer is then used to pass RTAS event
>> data to user through pr
On 06/12/2018 07:17 PM, Michael Ellerman wrote:
> Mahesh J Salgaonkar writes:
>> diff --git a/arch/powerpc/platforms/pseries/ras.c
>> b/arch/powerpc/platforms/pseries/ras.c
>> index 2edc673be137..e56759d92356 100644
>> --- a/arch/powerpc/platforms/pseries/ras.c
>> +++ b/arch/powerpc/platforms/pse
On 06/08/2018 12:20 PM, Michael Ellerman wrote:
> Mahesh J Salgaonkar writes:
>> From: Mahesh Salgaonkar
>>
>> During Machine Check interrupt on pseries platform, register r3 points
>> to RTAS extended event log passed by hypervisor. Since hypervisor uses r3
>> to pass pointer to rtas log, it stores
On 06/08/2018 07:21 AM, Nicholas Piggin wrote:
> On Thu, 07 Jun 2018 22:59:04 +0530
> Mahesh J Salgaonkar wrote:
>
>> From: Mahesh Salgaonkar
>>
>> Extract the MCE error details from RTAS extended log and display it to
>> console.
>>
>> With this patch you should now see mce logs like below:
>>
On 06/08/2018 07:18 AM, Nicholas Piggin wrote:
> On Thu, 07 Jun 2018 22:58:55 +0530
> Mahesh J Salgaonkar wrote:
>
>> From: Mahesh Salgaonkar
>>
>> If we get a machine check exception due to SLB errors then dump the
>> current SLB contents which will be very much helpful in debugging the
>> roo
On 06/08/2018 07:01 AM, Nicholas Piggin wrote:
> On Thu, 07 Jun 2018 22:58:11 +0530
> Mahesh J Salgaonkar wrote:
>
>> From: Mahesh Salgaonkar
>>
>> rtas_log_buf is a buffer to hold RTAS event data that are communicated
>> to kernel by hypervisor. This buffer is then used to pass RTAS event
>> da
On 06/07/2018 04:15 PM, Nicholas Piggin wrote:
> On Thu, 07 Jun 2018 15:36:25 +0530
> Mahesh J Salgaonkar wrote:
>
>> This patch series includes some improvement to Machine check handler
>> for pseries. Patch 1 fixes an issue where machine check handler crashes
>> kernel while accessing vmalloc-e
On 04/26/2018 07:10 PM, Nicholas Piggin wrote:
> On Thu, 26 Apr 2018 18:35:10 +0530
> Mahesh Jagannath Salgaonkar wrote:
>
>> On 04/26/2018 06:28 PM, Nicholas Piggin wrote:
>>> On Thu, 26 Apr 2018 17:12:03 +0530
>>> Mahesh J Salgaonkar wrote:
>
On 04/26/2018 06:28 PM, Nicholas Piggin wrote:
> On Thu, 26 Apr 2018 17:12:03 +0530
> Mahesh J Salgaonkar wrote:
>
>> From: Mahesh Salgaonkar
>>
>> otherwise the fadump registration in new kexec-ed kernel complains that
>> fadump is already registered. This makes new kernel to continue using
>>
On 04/23/2018 04:44 PM, Balbir Singh wrote:
> On Mon, Apr 23, 2018 at 8:33 PM, Mahesh Jagannath Salgaonkar
> wrote:
>> On 04/23/2018 12:21 PM, Balbir Singh wrote:
>>> On Mon, Apr 23, 2018 at 2:59 PM, Mahesh J Salgaonkar
>>> wrote:
>>>> From: Mahesh Salga
On 04/23/2018 12:21 PM, Balbir Singh wrote:
> On Mon, Apr 23, 2018 at 2:59 PM, Mahesh J Salgaonkar
> wrote:
>> From: Mahesh Salgaonkar
>>
>> The current code extracts the physical address for UE errors and then
>> hooks it up into memory failure infrastructure. On successful extraction
>> of phys
On 04/22/2018 07:28 AM, Nicholas Piggin wrote:
> On Fri, 20 Apr 2018 10:34:35 +0530
> Mahesh J Salgaonkar wrote:
>
>> From: Mahesh Salgaonkar
>>
>> otherwise the fadump registration in new kexec-ed kernel complains that
>> fadump is already registered. This makes new kernel to continue using
>>
On 04/22/2018 07:28 AM, Nicholas Piggin wrote:
> On Fri, 20 Apr 2018 10:34:18 +0530
> Mahesh J Salgaonkar wrote:
>
>> From: Mahesh Salgaonkar
>>
>> Currently the metadata region that holds crash info structure and ELF core
>> header is placed towards the end of reserved memory area. This patch p
On 04/10/2018 07:11 PM, Hari Bathini wrote:
> FADump capture kernel boots in restricted memory environment preserving
> the context of previous kernel to save vmcore. Supporting hugepages in
> such environment makes things unnecessarily complicated, as hugepages
> need memory set aside for them. Th
On 04/04/2018 12:56 AM, Hari Bathini wrote:
> Mahesh, I think we should explicitly document that production and
> capture kernel
> versions should be same. For changes like below, older/newer production
> kernel vs
> capture kernel is bound to fail. Of course, production and capture
> kernel versio
On 04/03/2018 03:21 PM, Hari Bathini wrote:
>
>
> On Monday 02 April 2018 12:00 PM, Mahesh J Salgaonkar wrote:
>> From: Mahesh Salgaonkar
>>
>> The second kernel, during early boot after the crash, reserves rest of
>> the memory above boot memory size to make sure it does not touch any
>> of the
On 04/03/2018 08:48 AM, Pingfan Liu wrote:
> I think CMA has protected us from hot-remove, so this patch is not necessary.
Yes, but only if the memory from declared CMA region is allocated using
cma_alloc(). The rest of the memory inside CMA region which hasn't been
cma_allocat-ed can still be hot
On 11/23/2017 04:26 AM, Balbir Singh wrote:
> On Thu, Nov 23, 2017 at 4:32 AM, Mahesh J Salgaonkar
> wrote:
>> From: Mahesh Salgaonkar
>>
>> Rebooting into a new kernel with kexec fails in trace_tlbie() which is
>> called from native_hpte_clear(). This happens if the running kernel has
>> CONFIG_
On 11/23/2017 12:37 AM, Naveen N. Rao wrote:
> Mahesh J Salgaonkar wrote:
>> From: Mahesh Salgaonkar
>>
>> Rebooting into a new kernel with kexec fails in trace_tlbie() which is
>> called from native_hpte_clear(). This happens if the running kernel has
>> CONFIG_LOCKDEP enabled. With lockdep enabl
On 09/05/2017 09:45 AM, Balbir Singh wrote:
> Walk the page table for NIP and extract the instruction. Then
> use the instruction to find the effective address via analyse_instr().
>
> We might have page table walking races, but we expect them to
> be rare, the physical address extraction is best
On 07/19/2017 12:29 PM, Nicholas Piggin wrote:
> There are quite a few machine check exceptions that can be caused by
> kernel bugs. To make debugging easier, use the kernel crash path in
> cases of synchronous machine checks that occur in kernel mode, if that
> would not result in the machine goin
On 07/19/2017 12:29 PM, Nicholas Piggin wrote:
> Unrecovered MCE and HMI errors are sent through a special restart OPAL
> call to log the platform error. The downside is that they don't go
> through normal Linux crash paths, so they don't give much information
> to the Linux console.
>
> Change th
On 07/06/2017 11:26 PM, Nicholas Piggin wrote:
> On Wed, 5 Jul 2017 14:04:19 +1000
> Nicholas Piggin wrote:
>
>> Unrecovered MCE and HMI errors are sent through a special restart
>> OPAL call to log the platform error. The downside is that they don't
>> go through normal crash paths, so they don
On 07/05/2017 09:26 AM, Nicholas Piggin wrote:
> If fadump is not registered, and no other crash or debug handlers are
> registered, the powerpc panic handler stops the guest before the generic
> panic code can push out debug information to the console.
>
> Currently, system reset injection causes
On 07/04/2017 03:39 PM, Nicholas Piggin wrote:
> If fadump is not registered, and no other crash or debug handlers are
> registered, the powerpc panic handler stops the guest before the generic
> panic code can push out debug information to the console.
>
> Without this patch, system reset injecti
On 05/27/2017 09:16 PM, Michal Suchanek wrote:
> - log an error message when registration fails and no error code listed
> in the switch is returned
> - translate the hv error code to posix error code and return it from
> fw_register
> - return the posix error code from fw_register to the proc
On 05/04/2017 11:24 PM, Hari Bathini wrote:
> To register fadump, boot memory area - the size of low memory chunk that
> is required for a kernel to boot successfully when booted with restricted
> memory, is assumed to have no holes. But this memory area is currently
> not protected from hot-remove
On 05/04/2017 11:23 PM, Hari Bathini wrote:
> fadump sets up crash memory ranges to be used for creating PT_LOAD
> program headers in elfcore header. Memory chunk RMA_START through
> boot memory area size is added as the first memory range because
> firmware, at the time of crash, moves this memory
On 04/26/2017 12:41 PM, Dave Young wrote:
> Ccing ppc list
> On 04/20/17 at 07:39pm, Xunlei Pang wrote:
>> vmcoreinfo_max_size stands for the vmcoreinfo_data, the
>> correct one we should use is vmcoreinfo_note whose total
>> size is VMCOREINFO_NOTE_SIZE.
>>
>> Like explained in commit 77019967f06b
On 04/21/2017 09:37 AM, Michael Ellerman wrote:
> Daniel Axtens writes:
>>> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
>>> index a1475e6..b23b323 100644
>>> --- a/arch/powerpc/kernel/mce.c
>>> +++ b/arch/powerpc/kernel/mce.c
>>> @@ -221,6 +221,8 @@ static void machine_check
On 04/17/2017 04:09 PM, Daniel Axtens wrote:
> Hi Mahesh,
>
>> Fixes: 27ea2c420cad powerpc: Set the correct kernel taint on machine check
>> errors.
>
> I notice this Fixes a commit I introduced. Please could you cc me when
> you do this? I am likely to miss it otherwise, especially since I have
On 04/06/2017 10:52 AM, David Gibson wrote:
> On Thu, Apr 06, 2017 at 02:17:22AM +0530, Mahesh J Salgaonkar wrote:
>> From: Mahesh Salgaonkar
>>
>> This patch introduces a mce hook which is invoked at the time of guest
>> exit to facilitate the host-side handling of machine check exception
>> befo
On 03/30/2017 05:39 AM, Nicholas Piggin wrote:
> On Tue, 28 Mar 2017 19:15:28 +0530
> Mahesh J Salgaonkar wrote:
>
>> From: Mahesh Salgaonkar
>>
>> For MCE that hit while in user mode MSR(HV=1,PR=1), print the task info on the
>> console MCE error log. This will help to identify application that
On 03/16/2017 06:49 PM, Gautham R Shenoy wrote:
> Hi,
>
> On Thu, Mar 16, 2017 at 11:05:20PM +1000, Nicholas Piggin wrote:
>> On Thu, 16 Mar 2017 18:10:48 +0530
>> Mahesh Jagannath Salgaonkar wrote:
>>
>>> On 03/14/2017 02:53 PM, Nicholas Piggin wrote:
>
On 03/14/2017 02:53 PM, Nicholas Piggin wrote:
> The ISA specifies power save wakeup can cause a machine check interrupt.
> The machine check handler currently has code to handle that for POWER8,
> but POWER9 crashes when trying to execute the P8 style sleep
> instructions.
>
> So queue up the mac
On 02/28/2017 07:30 AM, Nicholas Piggin wrote:
> Add POWER9 machine check handler. There are several new types of errors
> added, so logging messages for those are also added.
>
> This doesn't attempt to reuse any of the P7/8 defines or functions,
> because that becomes too complex. The better opt
On 02/28/2017 07:30 AM, Nicholas Piggin wrote:
> A synchronous machine check is an exception raised by the attempt to
> execute the current instruction. If the error can't be corrected, it
> can make sense to SIGBUS the currently running process.
>
> In other cases, the error condition is not rela
On 02/21/2017 08:05 AM, Nicholas Piggin wrote:
> On Tue, 21 Feb 2017 07:21:56 +0530
> Mahesh J Salgaonkar wrote:
>
>> +enum MCE_TlbErrorType {
>> +MCE_TLB_ERROR_INDETERMINATE = 0,
>> +MCE_TLB_ERROR_PARITY = 1,
>> +MCE_TLB_ERROR_MULTIHIT = 2,
>> +MCE_TLB_ERROR_TLBIEL_PROG_ERROR = 3
On 02/21/2017 08:17 AM, Nicholas Piggin wrote:
> On Tue, 21 Feb 2017 07:22:56 +0530
> Mahesh J Salgaonkar wrote:
>
>> From: Mahesh Salgaonkar
>>
>> Delay it until we are done with machine_check_early() call. Turn on MSR[ME]
>> once opal is done with processing MCE.
>
> Why? This seems like quit
On 01/30/2017 10:14 PM, Hari Bathini wrote:
> In case of fadump, capture (fadump) kernel boots like a normal kernel.
> While this has its advantages, the capture kernel would initialize all
> the components like normal kernel, which may not necessarily be needed
> for a typical dump capture kernel.
On 01/16/2017 10:05 AM, Paul Mackerras wrote:
> On Fri, Jan 13, 2017 at 04:51:45PM +0530, Aravinda Prasad wrote:
>> Enhance KVM to cause a guest exit with KVM_EXIT_NMI
>> exit reason upon a machine check exception (MCE) in
>> the guest address space if the KVM_CAP_PPC_FWNMI
>> capability is enabled
On 01/05/2017 11:02 PM, Hari Bathini wrote:
> fadump supports specifying memory to reserve for fadump's crash kernel
> with fadump_reserve_mem kernel parameter. This parameter currently
> supports passing a fixed memory size, like fadump_reserve_mem=
> only. This patch aims to add support for other
On 01/05/2017 11:02 PM, Hari Bathini wrote:
> Now that crashkernel parameter parsing and vmcoreinfo related code is
> moved under CONFIG_CRASH_CORE instead of CONFIG_KEXEC_CORE, remove
> dependency with CONFIG_KEXEC for CONFIG_FA_DUMP. While here, get rid
> of definitions of fadump_append_elf_note(
On 10/13/2016 07:47 AM, Nicholas Piggin wrote:
> This patch does a couple of things. First of all, powernv immediately
> explodes when running a relocated kernel, because the system reset
> exception for handling sleeps does not do correct relocated branches.
>
> Secondly, the sleep handling code
On 10/01/2016 04:11 PM, Anton Blanchard wrote:
> From: Anton Blanchard
>
> An hcall was recently added that does exactly what we need
> during kexec - it clears the entire MMU hash table, ignoring any
> VRMA mappings.
>
> Try it and fall back to the old method if we get a failure.
>
> On a POWE
On 10/10/2016 04:22 PM, Michael Ellerman wrote:
> Mahesh J Salgaonkar writes:
>
>> From: Mahesh Salgaonkar
>>
>> There are chances that multiple CPUs can call crash_fadump() simultaneously
>> and would start duplicating same info to vmcoreinfo ELF note section. This
>> causes makedumpfile to fai
On 09/08/2016 12:30 PM, Mahesh Jagannath Salgaonkar wrote:
> On 09/06/2016 11:02 AM, Daniel Axtens wrote:
>> Firmware Assisted Dump is a facility to dump kernel core with assistance
>> from firmware. As part of this process the kernel ELF version is
>> stored.
>>
>
On 09/06/2016 11:02 AM, Daniel Axtens wrote:
> Firmware Assisted Dump is a facility to dump kernel core with assistance
> from firmware. As part of this process the kernel ELF version is
> stored.
>
> Currently, fadump.h defines this to 0 if it is not already defined. This
> clashes with a define
On 08/10/2016 04:18 PM, Nicholas Piggin wrote:
> MCE must not use PACA_EXGEN. When a general exception enables MSR_RI,
> that means SPRN_SRR[01] and SPRN_SPRG are no longer used. However the
> PACA save area is still in use.
> ---
> arch/powerpc/kernel/exceptions-64s.S | 8
> 1 file chang
On 08/08/2016 02:28 PM, Michael Ellerman wrote:
> Mahesh J Salgaonkar writes:
>
>> From: Mahesh Salgaonkar
>>
>> The current implementation of MCE early handling modifies CR0/1 registers
>> without saving its old values. Fix this by moving early check for
>> powersaving mode to machine_check_han
On 08/06/2016 04:08 AM, Benjamin Herrenschmidt wrote:
> On Fri, 2016-08-05 at 19:13 +0530, Mahesh J Salgaonkar wrote:
>> From: Mahesh Salgaonkar
>>
>> The function pnv_restore_hyp_resource() loads the TOC into r2 from
>> the invalid PACA pointer before fixing r13 value. This does not affect
>> POWER
On 08/04/2016 03:27 PM, Michael Ellerman wrote:
> Mahesh J Salgaonkar writes:
>
>> From: Mahesh Salgaonkar
>>
>> When machine check occurs with MSR(RI=0), it means MC interrupt is
>> unrecoverable and kernel goes down to panic path. But the console
>> message still shows it as recovered. This pa
On 08/04/2016 01:35 PM, Greg KH wrote:
> On Thu, Aug 04, 2016 at 10:16:48AM +0530, Mahesh J Salgaonkar wrote:
>> From: Mahesh Salgaonkar
>>
>> When machine check occurs with MSR(RI=0), it means MC interrupt is
>> unrecoverable and kernel goes down to panic path. But the console
>> message still sh