On 3/14/20 9:18 AM, Nicholas Piggin wrote:
Ganesh Goudar's on March 14, 2020 12:04 am:
MCE handling on pSeries platform fails as recent rework to use common
code for pSeries and PowerNV in machine check error handling tries to
access per-cpu variables in realmode. The per-cpu variables m
On 3/17/20 3:31 PM, Nicholas Piggin wrote:
Ganesh's on March 16, 2020 9:47 pm:
On 3/14/20 9:18 AM, Nicholas Piggin wrote:
Ganesh Goudar's on March 14, 2020 12:04 am:
MCE handling on pSeries platform fails as recent rework to use common
code for pSeries and PowerNV in machine c
On 3/20/20 8:11 AM, Nicholas Piggin wrote:
Ganesh's on March 18, 2020 12:35 am:
On 3/17/20 3:31 PM, Nicholas Piggin wrote:
Ganesh's on March 16, 2020 9:47 pm:
On 3/14/20 9:18 AM, Nicholas Piggin wrote:
Ganesh Goudar's on March 14, 2020 12:04 am:
MCE handling on pSeries pl
On 3/20/20 8:58 PM, Nicholas Piggin wrote:
rtas_call allocates and uses memory in failure paths, which is
not safe for RMA. It also calls local_irq_save() which may not be safe
in all real mode contexts.
Particularly machine check may run with interrupts not "reconciled",
and it may have hit w
On 3/24/20 10:57 AM, Michael Ellerman wrote:
Ganesh Goudar writes:
If we hit UE at an instruction with a fixup entry, flag to
ignore the event and set nip to continue execution at the
fixup entry.
You don't explain why we would want to do that. Or what the consequences
are if we *don&
On 3/20/20 4:31 PM, Ganesh Goudar wrote:
MCE handling on pSeries platform fails as recent rework to use common
code for pSeries and PowerNV in machine check error handling tries to
access per-cpu variables in realmode. The per-cpu variables may be
outside the RMO region on pSeries platform and
On 4/3/20 7:38 AM, Nicholas Piggin wrote:
Ganesh Goudar's on March 30, 2020 5:12 pm:
From: Santosh S
Introduce a notification chain that lets interested code know about uncorrected memory
errors (UE). This would help prospective users in the pmem or nvdimm subsystems
to track bad blocks for better handli
On 3/30/20 12:42 PM, Ganesh Goudar wrote:
From: Santosh S
Introduce a notification chain that lets interested code know about uncorrected memory
errors (UE). This would help prospective users in the pmem or nvdimm subsystems
to track bad blocks for better handling of persistent memory allocations.
Signed-off-by
On 8/17/22 11:28, Michael Ellerman wrote:
Sachin Sant writes:
Following crash is seen while running powerpc/mce subtest on
a Power10 LPAR.
1..1
# selftests: powerpc/mce: inject-ra-err
[ 155.240591] BUG: Unable to handle kernel data access on read at
0xc00e00022d55b503
[ 155.240618] Faultin
On 8/22/22 11:19, Michael Ellerman wrote:
So I guess the compiler has decided not to inline it (why?!), and it is
not marked noinstr, so it gets KASAN instrumentation which crashes in
real mode.
We'll have to make sure everything get_pseries_errorlog() is either
forced inline, or marked noinstr
On 8/22/22 11:01, Sachin Sant wrote:
On 19-Aug-2022, at 10:12 AM, Ganesh wrote
We'll have to make sure everything get_pseries_errorlog() is either
forced inline, or marked noinstr.
Making the following functions always_inline and noinstr is fixing the issue.
__always_i
On 9/2/22 05:49, Jason Gunthorpe wrote:
On Tue, Aug 16, 2022 at 08:57:13AM +0530, Ganesh Goudar wrote:
Hi,
EEH recovery is currently serialized, and these patches shorten
the time taken for EEH recovery by making the recovery run
in parallel. The original author of these patches is Sam
On 9/7/22 09:49, Nicholas Piggin wrote:
On Mon Sep 5, 2022 at 4:38 PM AEST, Ganesh Goudar wrote:
Part of machine check error handling is done in realmode.
As of now, instrumentation is not possible for any code that
runs in realmode.
When MCE is injected on KASAN enabled kernel, crash is
On 9/17/20 5:59 PM, Michal Suchánek wrote:
Hello,
On Wed, Sep 16, 2020 at 10:52:25PM +0530, Ganesh Goudar wrote:
This patch series fixes mce handling for pseries, provides debugfs
interface for mce injection and adds selftest to test mce handling
on pseries/powernv machines running in hash mmu
On 9/17/20 5:50 PM, Michal Suchánek wrote:
Hello,
On Wed, Sep 16, 2020 at 10:52:26PM +0530, Ganesh Goudar wrote:
Use of nmi_enter/exit in real mode handler causes the kernel to panic
and reboot on injecting slb multihit on pseries machine running in hash
mmu mode, As these calls try to
On 9/17/20 5:53 PM, Michal Suchánek wrote:
Hello,
On Wed, Sep 16, 2020 at 10:52:27PM +0530, Ganesh Goudar wrote:
To test machine check handling, add debugfs interface to inject
slb multihit errors.
To inject slb multihit:
#echo 1 > /sys/kernel/debug/powerpc/mce_error_inj
On 9/18/20 12:10 PM, Michael Ellerman wrote:
Hi Ganesh,
Ganesh Goudar writes:
To test machine check handling, add debugfs interface to inject
slb multihit errors.
To inject slb multihit:
#echo 1 > /sys/kernel/debug/powerpc/mce_error_inject/inject_slb_multihit
Rather than creating a
On 9/26/20 1:27 AM, Kees Cook wrote:
On Fri, Sep 25, 2020 at 04:01:22PM +0530, Ganesh Goudar wrote:
Add support to inject slb multihit errors, to test machine
check handling.
Thank you for more tests in here!
Based on work by Mahesh Salgaonkar and Michal Suchánek.
Cc: Mahesh Salgaonkar
On 9/26/20 1:29 AM, Kees Cook wrote:
On Fri, Sep 25, 2020 at 04:01:23PM +0530, Ganesh Goudar wrote:
Add PPC_SLB_MULTIHIT to lkdtm selftest framework.
Signed-off-by: Ganesh Goudar
---
tools/testing/selftests/lkdtm/tests.txt | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing
On 10/1/20 11:21 PM, Ganesh Goudar wrote:
Use of nmi_enter/exit in real mode handler causes the kernel to panic
and reboot on injecting slb multihit on pseries machine running in hash
mmu mode, as these calls try to access memory outside RMO region in
real mode handler where translation is
On 10/16/20 5:02 PM, Michael Ellerman wrote:
On Fri, 9 Oct 2020 12:10:03 +0530, Ganesh Goudar wrote:
This patch series fixes mce handling for pseries, Adds LKDTM test
for SLB multihit recovery and enables selftest for the same,
basically to test MCE handling on pseries/powernv machines running
On 7/24/20 12:09 PM, Ganesh Goudar wrote:
When an UE or memory error exception is encountered the MCE handler
tries to find the pfn using addr_to_pfn() which takes effective
address as an argument, later pfn is used to poison the page where
memory error occurred, recent rework in this area made
On 4/17/21 6:06 PM, Michael Ellerman wrote:
Ganesh Goudar writes:
The error type is ICACHE and DCACHE, for case MCE_ERROR_TYPE_ICACHE.
Do you mean "is ICACHE not DCACHE" ?
Right :), Should I send v2 ?
cheers
Signed-off-by: Ganesh Goudar
---
arch/powerpc/platforms/pseries
On 4/20/21 12:54 PM, Santosh Sivaraj wrote:
Hi Ganesh,
Ganesh Goudar writes:
When we hit a UE while using machine check safe copy routines,
the ignore_event flag is set and the event is ignored by the mce handler.
The flag is also saved for deferred handling and printing of
mce event
On 4/7/21 10:28 AM, Ganesh Goudar wrote:
When we hit a UE while using machine check safe copy routines,
the ignore_event flag is set and the event is ignored by the mce handler.
The flag is also saved for deferred handling and printing of
mce event information. But as of now saving of this flag is
On 4/22/21 11:31 AM, Ganesh wrote:
On 4/7/21 10:28 AM, Ganesh Goudar wrote:
When we hit a UE while using machine check safe copy routines,
the ignore_event flag is set and the event is ignored by the mce handler.
The flag is also saved for deferred handling and printing of
mce event information
On 7/21/20 3:38 PM, Nicholas Piggin wrote:
Excerpts from Ganesh Goudar's message of July 20, 2020 6:03 pm:
When an UE or memory error exception is encountered the MCE handler
tries to find the pfn using addr_to_pfn() which takes effective
address as an argument, later pfn is used to p
On 8/24/21 12:09 PM, Michael Ellerman wrote:
Hi Ganesh,
Some comments below ...
Ganesh Goudar writes:
Add support to parse and log control memory access
error for pseries.
Signed-off-by: Ganesh Goudar
---
v2: No changes in this patch.
---
arch/powerpc/platforms/pseries/ras.c | 21
On 8/24/21 6:18 PM, Michael Ellerman wrote:
Ganesh Goudar writes:
Add test for real address or control memory address access
error handling, using NX-GZIP engine.
The error is injected by accessing the control memory address
using illegal instruction, on successful handling the process
On 8/25/21 2:54 AM, Segher Boessenkool wrote:
On Tue, Aug 24, 2021 at 04:39:57PM +1000, Michael Ellerman wrote:
+ case MC_ERROR_CTRL_MEM_ACCESS_PTABLE_WALK:
+ mce_err.u.ra_error_type =
+ MCE_RA_ERROR_PAGE_TABLE_WALK_LOAD_STORE_FO
On 8/26/21 8:57 AM, Michael Ellerman wrote:
Ganesh writes:
On 8/24/21 6:18 PM, Michael Ellerman wrote:
Ganesh Goudar writes:
Add test for real address or control memory address access
error handling, using NX-GZIP engine.
The error is injected by accessing the control memory address
On 9/6/21 6:03 PM, Michael Ellerman wrote:
Ganesh Goudar writes:
We queue an irq work for deferred processing of mce event
in realmode mce handler, where translation is disabled.
Queuing of the work may result in accessing memory outside
RMO region, such access needs the translation to be
On 9/8/21 11:10 AM, Michael Ellerman wrote:
Ganesh writes:
On 9/6/21 6:03 PM, Michael Ellerman wrote:
Ganesh Goudar writes
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
CPU: 5 PID: 1883 Comm: insmod Tainted: GOE 5.14.0
On 9/17/21 12:09 PM, Daniel Axtens wrote:
Hi Ganesh,
We queue an irq work for deferred processing of mce event
in realmode mce handler, where translation is disabled.
Queuing of the work may result in accessing memory outside
RMO region, such access needs the translation to be enabled
for an
On 8/6/21 6:53 PM, Ganesh Goudar wrote:
Check if the event info is valid before printing the
event information. When a fwnmi enabled nested kvm guest
hits a machine check exception, L0 and L2 would generate
machine check event info, but L1 would not generate any
machine check event info as it
On 9/22/21 7:32 AM, Nicholas Piggin wrote:
The machine check handler is not considered NMI on 64s. The early
handler is the true NMI handler, and then it schedules the
machine_check_exception handler to run when interrupts are enabled.
This works fine except the case of an unrecoverable MCE, wh
On 9/6/21 14:13, Ganesh Goudar wrote:
Add support to parse and log control memory access
error for pseries. These changes are made according to
PAPR v2.11 10.3.2.2.12.
Signed-off-by: Ganesh Goudar
---
v3: Modify the commit log to mention the document according
to which changes are made
real mode?
If not, should we rename it as part of this patch?
patch 2/2, refactors this.
-
- /*
-* Queue irq work to log this rtas event later.
-* irq_work_queue uses per-cpu variables, so do this in virt
-* mode as well.
-*/
- irq_work_queue(&mce_errlog_process_work);
-
- mtmsr(msr);
-
return disposition;
}
Thanks for the review :) .
Ganesh
On 11/8/21 19:49, Nicholas Piggin wrote:
Excerpts from Ganesh Goudar's message of November 8, 2021 6:38 pm:
In realmode mce handler we use irq_work_queue() to defer
the processing of mce events, irq_work_queue() can only
be called when translation is enabled because it touches
memory ou
On 10/19/20 6:45 PM, Michal Suchánek wrote:
On Mon, Oct 19, 2020 at 09:59:57PM +1100, Michael Ellerman wrote:
Hi Ganesh,
Some comments below ...
Ganesh Goudar writes:
To check machine check handling, add support to inject slb
multihit errors.
Cc: Kees Cook
Reviewed-by: Michal Suchánek
On 12/8/20 4:01 PM, Michael Ellerman wrote:
Ganesh Goudar writes:
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 9454d29ff4b4..4769954efa7d 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -273,6 +274,17 @@ struct
On 1/19/21 9:28 AM, Nicholas Piggin wrote:
Excerpts from Ganesh Goudar's message of January 15, 2021 10:58 pm:
Access to per-cpu variables requires translation to be enabled on
pseries machine running in hash mmu mode, Since part of MCE handler
runs in realmode and part of MCE handling co
On 1/25/21 2:54 PM, Christophe Leroy wrote:
Le 22/01/2021 à 13:32, Ganesh Goudar a écrit :
Access to per-cpu variables requires translation to be enabled on
pseries machine running in hash mmu mode, Since part of MCE handler
runs in realmode and part of MCE handling code is shared between
Hi mpe, Any comments on this patchset?
On 8/5/21 2:50 PM, Ganesh Goudar wrote:
Add support to parse and log control memory access
error for pseries.
Signed-off-by: Ganesh Goudar
---
v2: No changes in this patch.
---
arch/powerpc/platforms/pseries/ras.c | 21 +
1 file
On 11/24/21 18:33, Nicholas Piggin wrote:
Excerpts from Ganesh Goudar's message of November 24, 2021 7:54 pm:
In realmode mce handler we use irq_work_queue() to defer
the processing of mce events, irq_work_queue() can only
be called when translation is enabled because it touches
memory ou
On 11/24/21 18:40, Nicholas Piggin wrote:
Excerpts from Ganesh Goudar's message of November 24, 2021 7:55 pm:
Now that we are no longer switching on the mmu in realmode
mce handler, Revert the commit 4ff753feab02("powerpc/pseries:
Avoid using addr_to_pfn in real mode") p
On 1/7/22 19:44, Ganesh Goudar wrote:
Add support to parse and log control memory access
error for pseries. These changes are made according to
PAPR v2.11 10.3.2.2.12.
Signed-off-by: Ganesh Goudar
---
arch/powerpc/platforms/pseries/ras.c | 36
1 file changed
Add support to hwpoison the pages upon hitting machine check
exception.
This patch queues the address where UE is hit to percpu array
and schedules work to plumb it into memory poison infrastructure.
Reviewed-by: Mahesh Salgaonkar
Signed-off-by: Ganesh Goudar
---
arch/powerpc/include/asm
48 2f8b0063 380b0001
---[ end trace 46fd63f36bbdd940 ]---
Fixes: 9ca766f9891d ("powerpc/64s/pseries: machine check convert to use common
event code")
Signed-off-by: Ganesh Goudar
---
arch/powerpc/kernel/exceptions-64s.S | 12
arch/powerpc/platforms/pseries/pseries.h
If we hit UE at an instruction with a fixup entry, flag to
ignore the event and set nip to continue execution at the
fixup entry.
For powernv these changes were already made by commit
895e3dceeb97 ("powerpc/mce: Handle UE event for memcpy_mcsafe")
Signed-off-by: Ganesh Goudar
---
ar
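The fixup-entry behaviour described in the commit message above can be pictured with a small userspace sketch in C. This is only an illustration under stated assumptions: `fixup_entry`, `fake_regs`, `fake_mce_err`, `find_fixup()` and `recover_with_fixup()` are hypothetical names invented here, and the table is a plain array searched linearly. It is not the kernel's exception table code or the patch itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the kernel's real structures. */
struct fixup_entry {
	uint64_t insn;   /* address of an instruction that may fault */
	uint64_t fixup;  /* address to resume at if it does */
};

struct fake_regs {
	uint64_t nip;    /* next instruction pointer */
};

struct fake_mce_err {
	bool ignore_event;
};

/* Linear search; returns NULL when the address has no fixup entry. */
static const struct fixup_entry *
find_fixup(const struct fixup_entry *table, size_t n, uint64_t addr)
{
	for (size_t i = 0; i < n; i++) {
		if (table[i].insn == addr)
			return &table[i];
	}
	return NULL;
}

/*
 * If the faulting instruction has a fixup entry, flag the event to be
 * ignored and redirect nip to the fixup address. Returns true when nip
 * was redirected, false when nothing was changed.
 */
static bool recover_with_fixup(const struct fixup_entry *table, size_t n,
			       struct fake_regs *regs,
			       struct fake_mce_err *err)
{
	const struct fixup_entry *entry = find_fixup(table, n, regs->nip);

	if (!entry)
		return false;

	err->ignore_event = true;
	regs->nip = entry->fixup;
	return true;
}
```

When no entry matches, the sketch leaves both `nip` and the flag untouched, so the caller can fall through to its normal error handling.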
48 2f8b0063 380b0001
---[ end trace 46fd63f36bbdd940 ]---
Fixes: 9ca766f9891d ("powerpc/64s/pseries: machine check convert to use common
event code")
Reviewed-by: Mahesh Salgaonkar
Reviewed-by: Nicholas Piggin
Signed-off-by: Ganesh Goudar
---
v2: Avoid asm code to switch to virtual mo
eviewed-by: Santosh S
Signed-off-by: Ganesh Goudar
---
V2: Fixes a trivial checkpatch error in commit msg
---
arch/powerpc/platforms/pseries/ras.c | 8
1 file changed, 8 insertions(+)
diff --git a/arch/powerpc/platforms/pseries/ras.c
b/arch/powerpc/platforms/pseries/ras.c
index 5d
eviewed-by: Santosh S
Signed-off-by: Ganesh Goudar
---
V2: Fixes a trivial checkpatch error in commit msg.
V3: Use proper subject prefix.
---
arch/powerpc/platforms/pseries/ras.c | 8
1 file changed, 8 insertions(+)
diff --git a/arch/powerpc/platforms/pseries/ras.c
b/arch/powerpc/platfor
t for memcpy_mcsafe")
Reviewed-by: Mahesh Salgaonkar
Reviewed-by: Santosh S
Signed-off-by: Ganesh Goudar
---
V2: Fixes a trivial checkpatch error in commit msg.
V3: Use proper subject prefix.
V4: Rephrase the commit message.
Define a common function to update nip with fixup address.
From: Santosh S
Introduce a notification chain that lets interested code know about uncorrected memory
errors (UE). This would help prospective users in the pmem or nvdimm subsystems
to track bad blocks for better handling of persistent memory allocations.
Signed-off-by: Santosh S
Signed-off-by: Ganesh Goudar
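As a rough illustration of the notification-chain idea in the message above, here is a minimal userspace sketch in C. All names (`ue_register_listener()`, `ue_notify()`, `bad_block_log`, `record_bad_block()`) are hypothetical and chosen for this example. The sketch uses a fixed-size array and no locking, and it does not use or reproduce the kernel's notifier API.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_UE_LISTENERS 8

/* Callback invoked with the physical address of an uncorrected error. */
typedef void (*ue_listener_fn)(uint64_t paddr, void *ctx);

struct ue_listener {
	ue_listener_fn fn;
	void *ctx;
};

static struct ue_listener ue_chain[MAX_UE_LISTENERS];
static size_t ue_chain_len;

/* Add a listener to the chain; returns 0 on success, -1 on failure. */
static int ue_register_listener(ue_listener_fn fn, void *ctx)
{
	if (!fn || ue_chain_len == MAX_UE_LISTENERS)
		return -1;

	ue_chain[ue_chain_len].fn = fn;
	ue_chain[ue_chain_len].ctx = ctx;
	ue_chain_len++;
	return 0;
}

/* Call every registered listener in order; returns how many were called. */
static size_t ue_notify(uint64_t paddr)
{
	for (size_t i = 0; i < ue_chain_len; i++)
		ue_chain[i].fn(paddr, ue_chain[i].ctx);
	return ue_chain_len;
}

/* Example listener: remembers the most recent bad address and a count. */
struct bad_block_log {
	uint64_t last_addr;
	unsigned int count;
};

static void record_bad_block(uint64_t paddr, void *ctx)
{
	struct bad_block_log *log = ctx;

	log->last_addr = paddr;
	log->count++;
}
```

A user such as a persistent-memory driver would register one listener at start-up and update its own bad-block bookkeeping from the callback. The error-handling side only needs to call `ue_notify()`.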
mce_handle_ierror() and mce_handle_derror() has some duplicate
code to recover from the recoverable MCE errors and to get the
MCE error sub-type while generating MCE error info, Add helper
functions to remove it.
Signed-off-by: Ganesh Goudar
---
arch/powerpc/kernel/mce_power.c | 136
thereby
avoid poisoning the memory in host.
Reviewed-by: Mahesh Salgaonkar
Signed-off-by: Ganesh Goudar
---
arch/powerpc/kernel/mce_power.c | 10 +-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c
index
Hi All,
I sent this 6-7 hours ago, but the
mail did not appear in the Aug 2009 archives, so I'm sending
it again. Sorry for this! Thanks in advance.
I'm working on MPC860 with Linux Kernel 2.4.18.
As I'm fine tuning the FEC(Fast Ethernet Controller) driver,
I came a
Hi all,
I'm working on MPC860 with Linux Kernel 2.4.18.
As I'm fine tuning the FEC(Fast Ethernet Controller) driver,
I came across the receive side processing of the ethernet frames
where in the Rx BD rings are preallocated with the buffers and each time
a new frame is received, the whole
he fec interrupts will not get updated there(initially it used to)
again it resumes after some 45-60 seconds and the sequence repeats.
Dunno what's happening within the FEC if configured in bridge mode.
Any clue on this? Thanks a lakh in advance.
--Ganesh
On Friday 28 August 2009 1
vram, call kmsg_dump()
before carrying out fadump or kdump.
Fixes: 4388c9b3a6ee ("powerpc: Do not send system reset request through the
oops path")
Reviewed-by: Mahesh Salgaonkar
Signed-off-by: Ganesh Goudar
---
arch/powerpc/kernel/traps.c | 1 +
1 file changed, 1 insertion(+)
diff
wed-by: Mahesh Salgaonkar
Reviewed-by: Nicholas Piggin
Signed-off-by: Ganesh Goudar
---
V2: Rephrasing the commit message
---
arch/powerpc/kernel/traps.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 11caa0291254..82f43535e686
, Please comment.
Thanks.
V2:
* Since we now have event list per phb, Have per phb event list lock.
* Appropriate names given to the locks.
* Remove stale comments (few more to be removed).
* Initialize event_id to 0 instead of 1.
* And some cosmetic changes.
Ganesh Goudar (3):
powerpc/eeh
blocking may be required. Care must be taken when ordering these locks
against the PCI rescan/remove lock and the device locks to avoid
deadlocking.
Signed-off-by: Ganesh Goudar
---
arch/powerpc/include/asm/eeh.h | 12 +-
arch/powerpc/kernel/eeh.c| 112
Based on the original work from Sam Bobroff.
Give a unique ID to each recovery event, to ease log parsing
and prepare for parallel recovery.
Also add some new messages with a very simple format that may
be useful to log-parsers.
Signed-off-by: Ganesh Goudar
---
arch/powerpc/include/asm
n the constraint, above, the driver handlers are called by
traversing the tree of affected PEs from the top, stopping to call
handlers (in parallel) when a PE with devices is discovered. When the
calls for that PE are complete, traversal continues at each child PE.
Signed-off-by: Ganesh Goudar
---
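The top-down traversal described in the snippet above can be sketched in plain C. This is a simplified illustration, not the patch's code: `pe_node`, `walk_pe_tree()`, `visit_log` and `log_visit()` are hypothetical names, and the per-device handler calls that the description says run in parallel are reduced here to a single sequential callback per PE.

```c
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical, simplified PE node; not the kernel's struct eeh_pe. */
struct pe_node {
	int id;
	int ndevices;                           /* devices attached to this PE */
	size_t nchildren;
	struct pe_node *children[MAX_CHILDREN];
};

typedef void (*pe_handler_fn)(const struct pe_node *pe, void *ctx);

/*
 * Walk the tree from the top. When a PE with devices is found, call the
 * handler for it and only then continue into its children. The real design
 * is described as running the handlers for one PE in parallel; this sketch
 * calls a single handler sequentially to keep the ordering visible.
 */
static void walk_pe_tree(const struct pe_node *pe, pe_handler_fn handler,
			 void *ctx)
{
	if (!pe)
		return;

	if (pe->ndevices > 0)
		handler(pe, ctx);

	for (size_t i = 0; i < pe->nchildren; i++)
		walk_pe_tree(pe->children[i], handler, ctx);
}

/* Example handler: records the order in which PEs were handled. */
struct visit_log {
	int ids[16];
	size_t n;
};

static void log_visit(const struct pe_node *pe, void *ctx)
{
	struct visit_log *log = ctx;

	if (log->n < 16)
		log->ids[log->n++] = pe->id;
}
```

PEs without devices are passed over but their children are still visited, so a parent PE is always handled before any PE below it.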
, Please comment.
Thanks.
Ganesh Goudar (3):
powerpc/eeh: Synchronization for safety
powerpc/eeh: Provide a unique ID for each EEH recovery
powerpc/eeh: Asynchronous recovery
arch/powerpc/include/asm/eeh.h | 7 +-
arch/powerpc/include/asm/eeh_event.h | 10
blocking may be required. Care must be taken when ordering these locks
against the PCI rescan/remove lock and the device locks to avoid
deadlocking.
Signed-off-by: Ganesh Goudar
---
arch/powerpc/include/asm/eeh.h | 6 +-
arch/powerpc/kernel/eeh.c| 112
n the constraint, above, the driver handlers are called by
traversing the tree of affected PEs from the top, stopping to call
handlers (in parallel) when a PE with devices is discovered. When the
calls for that PE are complete, traversal continues at each child PE.
Signed-off-by: Ganesh Goudar
---
Based on the original work from Sam Bobroff.
Give a unique ID to each recovery event, to ease log parsing
and prepare for parallel recovery.
Also add some new messages with a very simple format that may
be useful to log-parsers.
Signed-off-by: Ganesh Goudar
---
arch/powerpc/include/asm
KASAN
instrumentation.
Signed-off-by: Ganesh Goudar
---
arch/powerpc/include/asm/interrupt.h | 2 +-
arch/powerpc/include/asm/rtas.h | 4 ++--
arch/powerpc/kernel/rtas.c | 4 ++--
3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/include/asm/interrupt.h
b
KASAN
instrumentation.
Signed-off-by: Ganesh Goudar
---
v2: Force inline few more functions.
---
arch/powerpc/include/asm/hw_irq.h| 8
arch/powerpc/include/asm/interrupt.h | 2 +-
arch/powerpc/include/asm/rtas.h | 4 ++--
arch/powerpc/kernel/rtas.c | 4 ++--
4 files
KASAN
instrumentation.
Signed-off-by: Ganesh Goudar
---
v2: Force inline few more functions.
v3: Adding noinstr to few functions instead of __always_inline.
---
arch/powerpc/include/asm/hw_irq.h| 8
arch/powerpc/include/asm/interrupt.h | 2 +-
arch/powerpc/include/asm/rtas.h | 4
y. On powernv the improvement
is not so significant.
Ganesh Goudar (1):
powerpc/eeh: Enable PHBs to recovery in parallel
arch/powerpc/include/asm/eeh_event.h | 7 +
arch/powerpc/include/asm/pci-bridge.h | 4 +++
arch/powerpc/kernel/eeh_driver.c | 27 +--
arch/powerpc/k
.
Signed-off-by: Ganesh Goudar
---
arch/powerpc/include/asm/eeh_event.h | 7 +
arch/powerpc/include/asm/pci-bridge.h | 4 +++
arch/powerpc/kernel/eeh_driver.c | 27 +--
arch/powerpc/kernel/eeh_event.c | 38 ++-
arch/powerpc/kernel/eeh_pe.c
/Store (foreign/control
memory) [Not recovered]
MCE: CPU24: PID: 1589811 Comm: inject-ra-err NIP: [1e48]
MCE: CPU24: Initiator CPU
MCE: CPU24: Unknown
RTAS: event: 5, Type: Platform Error (224), Severity: 3
Signed-off-by: Ganesh Goudar
Reviewed-by: Mahesh Salgaonkar
---
V2
NIP: [1e48]
MCE: CPU24: Initiator CPU
MCE: CPU24: Unknown
RTAS: event: 5, Type: Platform Error (224), Severity: 3
Signed-off-by: Ganesh Goudar
Reviewed-by: Mahesh Salgaonkar
---
V3: Rephrasing the commit message.
---
arch/powerpc/kernel/mce.c | 10 +++---
1 file changed, 7
ent failure)'
To fix the issue, set channel state to permanent failure after
notifying the drivers.
Fixes: 38ddc011478e ("powerpc/eeh: Make permanently failed devices
non-actionable")
Suggested-by: Mahesh Salgaonkar
Signed-off-by: Ganesh Goudar
---
arch/powerpc/kernel/eeh_driver.
like failover.
Permanently disable the device if the presence check
fails.
Signed-off-by: Ganesh Goudar
---
arch/powerpc/kernel/eeh.c| 4 +++-
arch/powerpc/kernel/eeh_driver.c | 8 +++-
2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/kernel/eeh.c b/arch/po
nt if the state is not moved to permanent failure state.
Signed-off-by: Ganesh Goudar
---
V2:
* Elaborate the commit message.
* Fix formatting issues in commit message and comments.
---
arch/powerpc/kernel/eeh.c| 11 ++-
arch/powerpc/kernel/eeh_driver.c | 13 +++--
2 files ch
on pseries machine running in hash
mmu mode.
Fixes: 116ac378bb3f ("powerpc/64s: machine check interrupt update NMI
accounting")
Signed-off-by: Ganesh Goudar
---
arch/powerpc/kernel/mce.c | 7 ++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/mc
Add selftest to check if the system recovers from slb multihit
errors.
Signed-off-by: Ganesh Goudar
---
tools/testing/selftests/powerpc/Makefile | 3 ++-
tools/testing/selftests/powerpc/mces/Makefile| 6 ++
tools/testing/selftests/powerpc/mces/slb_multihit.sh | 9
possible.
Ganesh Goudar (3):
powerpc/mce: remove nmi_enter/exit from real mode handler
powerpc/mce: Add debugfs interface to inject MCE
selftest/powerpc: Add slb multihit selftest
arch/powerpc/Kconfig.debug| 9 ++
arch/powerpc/kernel/mce.c | 7
To test machine check handling, add debugfs interface to inject
slb multihit errors.
To inject slb multihit:
#echo 1 > /sys/kernel/debug/powerpc/mce_error_inject/inject_slb_multihit
Signed-off-by: Ganesh Goudar
Signed-off-by: Mahesh Salgaonkar
---
arch/powerpc/Kconfig.debug |
Add PPC_SLB_MULTIHIT to lkdtm selftest framework.
Signed-off-by: Ganesh Goudar
---
tools/testing/selftests/lkdtm/tests.txt | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/lkdtm/tests.txt
b/tools/testing/selftests/lkdtm/tests.txt
index 9d266e79c6a2..7eb3cf91c89e
Add support to inject slb multihit errors, to test machine
check handling.
Based on work by Mahesh Salgaonkar and Michal Suchánek.
Cc: Mahesh Salgaonkar
Cc: Michal Suchánek
Signed-off-by: Ganesh Goudar
---
drivers/misc/lkdtm/Makefile | 4 ++
drivers/misc/lkdtm/core.c| 3 +
drivers
on pseries machine running in hash
mmu mode.
Fixes: 116ac378bb3f ("powerpc/64s: machine check interrupt update NMI
accounting")
Signed-off-by: Ganesh Goudar
---
arch/powerpc/kernel/mce.c | 10 ++
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/kern
.
* Fix build errors and remove unused variables.
* Integrate error injection code into LKDTM.
* Add support to inject multihit in paca.
Ganesh Goudar (3):
powerpc/mce: remove nmi_enter/exit from real mode handler
lkdtm/powerpc: Add SLB multihit test
selftests/lkdtm: Enable selftest for SLB
support to inject multihit in paca.
Ganesh Goudar (2):
powerpc/mce: remove nmi_enter/exit from real mode handler
lkdtm/powerpc: Add SLB multihit test
arch/powerpc/kernel/mce.c | 10 +-
drivers/misc/lkdtm/Makefile | 1 +
drivers/misc/lkdtm/core.c | 3
on pseries machine running in hash
mmu mode.
Fixes: 116ac378bb3f ("powerpc/64s: machine check interrupt update NMI
accounting")
Signed-off-by: Ganesh Goudar
---
arch/powerpc/kernel/mce.c | 10 ++
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/kern
To check machine check handling, add support to inject slb
multihit errors.
Reviewed-by: Michal Suchánek
Co-developed-by: Mahesh Salgaonkar
Signed-off-by: Mahesh Salgaonkar
Signed-off-by: Ganesh Goudar
---
drivers/misc/lkdtm/Makefile | 1 +
drivers/misc/lkdtm/core.c
nesting is supported.
* Fix build errors and remove unused variables.
* Integrate error injection code into LKDTM.
* Add support to inject multihit in paca.
Ganesh Goudar (2):
powerpc/mce: remove nmi_enter/exit from real mode handler
lkdtm/powerpc: Add SLB multihit test
arch/powerpc/kernel
on pseries machine running in hash
mmu mode.
Fixes: 116ac378bb3f ("powerpc/64s: machine check interrupt update NMI
accounting")
Signed-off-by: Ganesh Goudar
---
arch/powerpc/kernel/mce.c | 7 +++
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/kernel/mc
To check machine check handling, add support to inject slb
multihit errors.
Cc: Kees Cook
Reviewed-by: Michal Suchánek
Co-developed-by: Mahesh Salgaonkar
Signed-off-by: Mahesh Salgaonkar
Signed-off-by: Ganesh Goudar
---
drivers/misc/lkdtm/Makefile | 1 +
drivers/misc/lkdtm
machine_check_log_err() is not getting called for all
unrecoverable errors, so some errors are never logged.
Raise irq work in save_mce_event() for unrecoverable errors,
so that we log the error from the MCE event handling block in
the timer handler.
Signed-off-by: Ganesh Goudar
---
arch/powerpc
] memcpy+0x88/0x90
[ 512.972456] MCE: CPU1: Initiator CPU
[ 512.972534] MCE: CPU1: Unknown
Signed-off-by: Ganesh Goudar
---
arch/powerpc/kernel/mce.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index 11f0cae086ed
The error type is ICACHE, not DCACHE, for case MCE_ERROR_TYPE_ICACHE.
Signed-off-by: Ganesh Goudar
---
arch/powerpc/platforms/pseries/ras.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/platforms/pseries/ras.c
b/arch/powerpc/platforms/pseries/ras.c
index
fatal as it may try to access
memory outside RMO region.
To fix this use addr_to_pfn after switching to virtual mode.
Signed-off-by: Ganesh Goudar
---
V2: Leave bare metal code and save_mce_event as is.
---
arch/powerpc/platforms/pseries/ras.c | 20 +++-
1 file changed, 11
] 79291f24 790af00e 78e70020 7d095214 <7c69502a> 2fa3 419e011c
70690040
[ 485.128152] ---[ end trace d34b27e29ae0e340 ]---
Signed-off-by: Ganesh Goudar
---
V2: Leave bare metal code and save_mce_event as is.
V3: Have separate functions for realmode and virtual mode handling.
---
arch/p
vert to use common
event code")
Signed-off-by: Ganesh Goudar
---
V2: Leave bare metal code and save_mce_event as is.
V3: Have separate functions for realmode and virtual mode handling.
V4: Fix build warning, rephrase commit message.
---
arch/powerpc/platforms/pse