[PATCH 1/1] MAINTAINERS: Remove self

2020-06-29 Thread Sam Bobroff
I'm sorry to say I can no longer maintain this position. Signed-off-by: Sam Bobroff --- MAINTAINERS | 1 - 1 file changed, 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 496fd4eafb68..7e954e4a29e1 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -13187,7 +13187,6 @@ F: tool

[PATCH RFC 1/1] powerpc/eeh: PE info tree via debugfs and syslog

2020-06-23 Thread Sam Bobroff
/eeh_pe_tree Signed-off-by: Sam Bobroff --- Here's some debug code I've been using for a long time while working on EEH. I haven't posted it before because it wasn't possible to make the code safe enough (to avoid either NULL or LIST_POISON), but with the recent safety w

[PATCH RFC 1/1] powerpc/eeh: Asynchronous recovery

2020-06-23 Thread Sam Bobroff
alled by traversing the tree of affected PEs from the top, stopping to call handlers (in parallel) when a PE with devices is discovered. When the calls for that PE are complete, traversal continues at each child PE. Signed-off-by: Sam Bobroff --- This patch should be applied on top of both: "p
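The traversal order described in this snippet can be sketched in userspace C. This is only an illustration under stated assumptions: `struct demo_pe`, `demo_recover_walk()` and `demo_count_handler()` are hypothetical names, not the kernel's EEH API, and the handlers run sequentially here rather than in parallel as the patch describes.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_CHILDREN 4

/* Hypothetical stand-in for a PE in the tree; not the kernel's struct eeh_pe. */
struct demo_pe {
    int ndevs;                                  /* devices attached to this PE */
    size_t nchild;                              /* number of child PEs */
    struct demo_pe *child[DEMO_MAX_CHILDREN];   /* child PEs */
};

/* Example handler state: total number of devices seen by the handler. */
static int demo_handled_devs;

/* Example handler: records how many devices were "recovered". */
static void demo_count_handler(struct demo_pe *pe)
{
    demo_handled_devs += pe->ndevs;
}

/*
 * Walk the tree from the top. When a PE with devices is found, call the
 * handler for it and wait for it to finish (trivially true here because
 * the call is synchronous), then continue at each child PE.
 */
static void demo_recover_walk(struct demo_pe *pe, void (*handler)(struct demo_pe *))
{
    assert(pe != NULL);
    if (pe->ndevs > 0)
        handler(pe);
    for (size_t i = 0; i < pe->nchild; i++)
        demo_recover_walk(pe->child[i], handler);
}
```

A parent's handler always completes before any of its children are visited, which is the ordering property the snippet relies on.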

[PATCH RFC 1/1] powerpc/eeh: Provide a unique ID for each EEH recovery

2020-06-23 Thread Sam Bobroff
Give a unique ID to each recovery event, to ease log parsing and prepare for parallel recovery. Also add some new messages with a very simple format that may be useful to log-parsers. Signed-off-by: Sam Bobroff --- This patch should be applied on top of my recent(ish) set: "powerp

Re: powerpc/pci: [PATCH 1/1 V3] PCIE PHB reset

2020-05-28 Thread Sam Bobroff
dump. PHB reset stop all PCI > transactions from normal kernel. We have tested the patch in several > environments: > - direct slot adapters > - adapters under the switch > - a VF adapter in PowerVM > - a VF adapter/adapter in KVM guest. > > Signed-off-by: Wen Xiong Loo

Re: powerpc/pci: [PATCH 1/1]: PCIE PHB reset

2020-05-13 Thread Sam Bobroff
On Thu, May 07, 2020 at 08:10:37AM -0500, wenxi...@linux.vnet.ibm.com wrote: > From: Wen Xiong > > Several device drivers hit EEH(Extended Error handling) when triggering > kdump on Pseries PowerVM. This patch implemented a reset of the PHBs > in pci general code. PHB reset stop all PCI transacti

Re: powerpc/pci: [PATCH 1/1]: PCIE PHB reset

2020-05-11 Thread Sam Bobroff
On Thu, May 07, 2020 at 08:10:37AM -0500, wenxi...@linux.vnet.ibm.com wrote: > From: Wen Xiong > > Several device drivers hit EEH(Extended Error handling) when triggering > kdump on Pseries PowerVM. This patch implemented a reset of the PHBs > in pci general code. PHB reset stop all PCI transacti

[PATCH v4 1/2] powerpc/eeh: fix pseries_eeh_configure_bridge()

2020-04-27 Thread Sam Bobroff
ative values. Signed-off-by: Sam Bobroff --- v4 - Just handle the error translation locally, as it's specific to the RTAS call, but log the unaltered code in case it's useful for debugging. arch/powerpc/platforms/pseries/eeh_pseries.c | 8 +++- 1 file changed, 7 insertions

[PATCH v4 2/2] powerpc/eeh: Release EEH device state synchronously

2020-04-27 Thread Sam Bobroff
h are called synchronously in the removal path. Signed-off-by: Sam Bobroff --- arch/powerpc/kernel/eeh.c | 31 +++ arch/powerpc/kernel/pci-hotplug.c | 2 -- 2 files changed, 31 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/k

[PATCH v4 0/2] powerpc/eeh: Release EEH device state synchronously

2020-04-27 Thread Sam Bobroff
) Patch set v1: Patch 1/4: powerpc/eeh: fix pseries_eeh_configure_bridge() Patch 2/4: powerpc/eeh: Release EEH device state synchronously Patch 3/4: powerpc/eeh: Remove workaround from eeh_add_device_late() Patch 4/4: powerpc/eeh: Clean up edev cleanup for VFs Sam Bobroff (2): powerpc/eeh: f

Re: [PATCH v3 1/3] powerpc/rtas: Export rtas_error_rc

2020-04-27 Thread Sam Bobroff
On Fri, Apr 24, 2020 at 11:07:43AM -0500, Nathan Lynch wrote: > Sam Bobroff writes: > > Export rtas_error_rc() so that it can be used by other users of > > rtas_call() (which is already exported). > > This will do the right thing for your ibm,configure-pe use case in pat

[PATCH v3 3/3] powerpc/eeh: Release EEH device state synchronously

2020-04-23 Thread Sam Bobroff
h are called synchronously in the removal path. Signed-off-by: Sam Bobroff --- arch/powerpc/kernel/eeh.c | 31 +++ arch/powerpc/kernel/pci-hotplug.c | 2 -- 2 files changed, 31 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/k

[PATCH v3 2/3] powerpc/eeh: fix pseries_eeh_configure_bridge()

2020-04-23 Thread Sam Bobroff
ative values. Signed-off-by: Sam Bobroff --- arch/powerpc/platforms/pseries/eeh_pseries.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c index 893ba3f562c4..9ea1c06a78cd 100644 --- a

[PATCH v3 0/3] powerpc/eeh: Release EEH device state synchronously

2020-04-23 Thread Sam Bobroff
rpc/eeh: Clean up edev cleanup for VFs Patch set v1: Patch 1/4: powerpc/eeh: fix pseries_eeh_configure_bridge() Patch 2/4: powerpc/eeh: Release EEH device state synchronously Patch 3/4: powerpc/eeh: Remove workaround from eeh_add_device_late() Patch 4/4: powerpc/eeh: Clean up edev cleanup for VFs Sam

[PATCH v3 1/3] powerpc/rtas: Export rtas_error_rc

2020-04-23 Thread Sam Bobroff
Export rtas_error_rc() so that it can be used by other users of rtas_call() (which is already exported). Signed-off-by: Sam Bobroff --- v3 * New in this version. arch/powerpc/include/asm/rtas.h | 1 + arch/powerpc/kernel/rtas.c | 3 ++- 2 files changed, 3 insertions(+), 1 deletion

Re: [PATCH v2 1/2] powerpc/eeh: fix pseries_eeh_configure_bridge()

2020-04-21 Thread Sam Bobroff
On Tue, Apr 21, 2020 at 06:33:36PM -0500, Nathan Lynch wrote: > Sam Bobroff writes: > > If a device is hot unplugged during EEH recovery, it's possible for the > > RTAS call to ibm,configure-pe in pseries_eeh_configure() to return > > parameter error (-3), however nega

[PATCH v2 2/2] powerpc/eeh: Release EEH device state synchronously

2020-04-19 Thread Sam Bobroff
h are called synchronously in the removal path. Signed-off-by: Sam Bobroff --- v2 - Added comment explaining why the add case can't be handled similarly to the remove case. arch/powerpc/kernel/eeh.c | 31 +++ arch/powerpc/kernel/pci-hotplug.c | 2 -- 2 fil

[PATCH v2 1/2] powerpc/eeh: fix pseries_eeh_configure_bridge()

2020-04-19 Thread Sam Bobroff
ative values. Signed-off-by: Sam Bobroff --- arch/powerpc/platforms/pseries/eeh_pseries.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c index 893ba3f562c4..c4ef03bec0de 100644 --- a

[PATCH v2 0/2] powerpc/eeh: Release EEH device state synchronously

2020-04-19 Thread Sam Bobroff
ev cleanup for VFs Patch set v1: Patch 1/4: powerpc/eeh: fix pseries_eeh_configure_bridge() Patch 2/4: powerpc/eeh: Release EEH device state synchronously Patch 3/4: powerpc/eeh: Remove workaround from eeh_add_device_late() Patch 4/4: powerpc/eeh: Clean up edev cleanup for VFs Sam Bobroff (2): po

Re: [PATCH 3/4] powerpc/eeh: Remove workaround from eeh_add_device_late()

2020-04-14 Thread Sam Bobroff
On Wed, Apr 08, 2020 at 04:53:36PM +1000, Oliver O'Halloran wrote: > On Wed, Apr 8, 2020 at 4:22 PM Sam Bobroff wrote: > > > > On Fri, Apr 03, 2020 at 05:08:32PM +1100, Oliver O'Halloran wrote: > > > On Mon, 2020-03-30 at 15:56 +1100, Sam Bobroff wrote: > >

Re: [PATCH] powerpc/powernv: Add a print indicating when an IODA PE is released

2020-04-09 Thread Sam Bobroff
On Wed, Apr 08, 2020 at 09:22:13PM +1000, Oliver O'Halloran wrote: > Quite useful to know in some cases. > > Signed-off-by: Oliver O'Halloran Agreed. Reviewed-by: Sam Bobroff > --- > arch/powerpc/platforms/powernv/pci-ioda.c | 2 ++ > 1 file changed, 2 insertio

Re: [PATCH 4/4] powerpc/eeh: Clean up edev cleanup for VFs

2020-04-07 Thread Sam Bobroff
On Fri, Apr 03, 2020 at 04:45:47PM +1100, Oliver O'Halloran wrote: > On Mon, 2020-03-30 at 15:56 +1100, Sam Bobroff wrote: > > Because the bus notifier calls eeh_rmv_from_parent_pe() (via > > eeh_remove_device()) when a VF is removed, the call in > > remove_sr

Re: [PATCH 3/4] powerpc/eeh: Remove workaround from eeh_add_device_late()

2020-04-07 Thread Sam Bobroff
On Fri, Apr 03, 2020 at 05:08:32PM +1100, Oliver O'Halloran wrote: > On Mon, 2020-03-30 at 15:56 +1100, Sam Bobroff wrote: > > When EEH device state was released asynchronously by the device > > release handler, it was possible for an outstanding reference to > > preve

Re: [PATCH 2/4] powerpc/eeh: Release EEH device state synchronously

2020-04-07 Thread Sam Bobroff
On Fri, Apr 03, 2020 at 03:51:18PM +1100, Oliver O'Halloran wrote: > On Mon, 2020-03-30 at 15:56 +1100, Sam Bobroff wrote: > > EEH device state is currently removed (by eeh_remove_device()) during > > the device release handler, which is invoked as the device's referenc

[PATCH v2 1/1] vfio-pci/nvlink2: Allow fallback to ibm,mmio-atsd[0]

2020-03-30 Thread Sam Bobroff
") Signed-off-by: Sam Bobroff --- Patch set v2: Patch 1/1: vfio-pci/nvlink2: Allow fallback to ibm,mmio-atsd[0] - Removed unnecessary warning. - Added Fixes tag. Patch set v1: Patch 1/1: vfio-pci/nvlink2: Allow fallback to ibm,mmio-atsd[0] drivers/vfio/pci/vfio_pci_nvlink2.c | 10 -

[PATCH RFC 1/1] powerpc/eeh: Synchronization for safety

2020-03-29 Thread Sam Bobroff
when ordering these locks against the PCI rescan/remove lock and the device locks to avoid deadlocking. Signed-off-by: Sam Bobroff --- Hello everyone, Here's an attempt to bring some safety to the interactions between the various moving parts involved in EEH recovery. It's based on top

[PATCH 4/4] powerpc/eeh: Clean up edev cleanup for VFs

2020-03-29 Thread Sam Bobroff
Because the bus notifier calls eeh_rmv_from_parent_pe() (via eeh_remove_device()) when a VF is removed, the call in remove_sriov_vf_pdns() is redundant. So remove the call. Signed-off-by: Sam Bobroff --- arch/powerpc/kernel/pci_dn.c | 9 + 1 file changed, 1 insertion(+), 8 deletions

[PATCH 3/4] powerpc/eeh: Remove workaround from eeh_add_device_late()

2020-03-29 Thread Sam Bobroff
that is no longer possible and the workaround is no longer necessary. Signed-off-by: Sam Bobroff --- arch/powerpc/kernel/eeh.c | 23 +-- 1 file changed, 1 insertion(+), 22 deletions(-) diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index c36c5a7

[PATCH 1/4] powerpc/eeh: fix pseries_eeh_configure_bridge()

2020-03-29 Thread Sam Bobroff
ative values. Signed-off-by: Sam Bobroff --- arch/powerpc/platforms/pseries/eeh_pseries.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c index 893ba3f562c4..c4ef03bec0de 100644 --- a

[PATCH 2/4] powerpc/eeh: Release EEH device state synchronously

2020-03-29 Thread Sam Bobroff
h are called synchronously in the removal path. Signed-off-by: Sam Bobroff --- arch/powerpc/kernel/eeh.c | 26 ++ arch/powerpc/kernel/pci-hotplug.c | 2 -- 2 files changed, 26 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/

[PATCH 0/4] powerpc/eeh: Release EEH device state synchronously

2020-03-29 Thread Sam Bobroff
not been able to hit them during testing. Cheers, Sam. Sam Bobroff (4): powerpc/eeh: fix pseries_eeh_configure_bridge() powerpc/eeh: Release EEH device state synchronously powerpc/eeh: Remove workaround from eeh_add_device_late() powerpc/eeh: Clean up edev cleanup for VFs

[PATCH 1/1] powerpc/eeh: fix deadlock handling dead PHB

2020-02-06 Thread Sam Bobroff
, incorrectly, processed more than once. Untangling this section can move the pe processing out of the loop and also outside the locked section, correcting both problems. Signed-off-by: Sam Bobroff --- I have only compile tested this fix, Frederic Barrat (who discovered it) has offered to test it (thanks

Re: [PATCH 1/1] vfio-pci/nvlink2: Allow fallback to ibm,mmio-atsd[0]

2020-02-06 Thread Sam Bobroff
On Fri, Feb 07, 2020 at 01:39:14PM +1100, Sam Bobroff wrote: > On Thu, Feb 06, 2020 at 03:23:03PM +1100, Alexey Kardashevskiy wrote: > > > > > > On 06/02/2020 14:17, Sam Bobroff wrote: > > > Older versions of skiboot only provide a single value in the device

Re: [PATCH 1/1] vfio-pci/nvlink2: Allow fallback to ibm,mmio-atsd[0]

2020-02-06 Thread Sam Bobroff
On Thu, Feb 06, 2020 at 03:23:03PM +1100, Alexey Kardashevskiy wrote: > > > On 06/02/2020 14:17, Sam Bobroff wrote: > > Older versions of skiboot only provide a single value in the device > > tree property "ibm,mmio-atsd", even when multiple Address Translation

Re: [PATCH 6/6] powerpc/eeh: Rework eeh_ops->probe()

2020-02-06 Thread Sam Bobroff
it does does and removes the last vestiges of the > early/late EEH probe split. Nice! Just one nit, below. Reviewed-by: Sam Bobroff > Signed-off-by: Oliver O'Halloran > --- > arch/powerpc/include/asm/eeh.h | 6 ++-- > arch/powerpc/kernel/eeh.c

Re: [PATCH 5/6] powerpc/eeh: Make early EEH init pseries specific

2020-02-06 Thread Sam Bobroff
gets called via the module init path (as rpaphp is loaded) -- I tried it and there was no deadlock. I don't think we have the lock in other situations but I haven't unravelled it all enough yet to tell, either. Regardless, good cleanup. Reviewed-by: Sam Bobroff >

Re: [PATCH 4/6] powerpc/eeh: Remove PHB check in probe

2020-02-05 Thread Sam Bobroff
ered how to test that block... and it's just dead code. Reviewed-by: Sam Bobroff > --- > arch/powerpc/kernel/eeh.c | 6 -- > 1 file changed, 6 deletions(-) > > diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c > index 9cb3370..a9e4ca7 100644 >

Re: [PATCH 3/6] powerpc/eeh: Do early EEH init only when required

2020-02-05 Thread Sam Bobroff
where the early EEH probe needs to be done. > > We can move the calls to eeh_add_device_tree_early() to the locations where > it's needed and remove it from the generic path. This is preparation for > making the early EEH probe pseries specific. > > Signed-off-by: Oliver

Re: [PATCH 2/6] powerpc/eeh: Remove eeh_add_device_tree_late()

2020-02-05 Thread Sam Bobroff
esult we can remove > eeh_add_device_tree_late(). > > Signed-off-by: Oliver O'Halloran ... with pcibios_bus_add_device() being called from pci_bus_add_devices(), in this case. Looks good. Reviewed-by: Sam Bobroff > --- > arch/powerpc/include/asm/eeh.h | 3 ---

Re: [PATCH 1/6] powerpc/eeh: Add sysfs files in late probe

2020-02-05 Thread Sam Bobroff
d sysfs files for devices that have failed to init, because bailing out in eeh_add_device_late() (or eeh_probe_device()) will now prevent eeh_sysfs_add_device() from being called. Nice cleanup. Reviewed-by: Sam Bobroff > Signed-off-by: Oliver O'Halloran > --- > arch/powerpc/include

[PATCH 1/1] vfio-pci/nvlink2: Allow fallback to ibm,mmio-atsd[0]

2020-02-05 Thread Sam Bobroff
to be able to assign a dedicated ATSD register to each NVLink2 device. However, ATSD registers can be shared among devices. This change allows vfio-pci to fall back to sharing the register at index 0 if necessary. Signed-off-by: Sam Bobroff --- drivers/vfio/pci/vfio_pci_nvlink2.c | 13 +++
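The fallback described here can be sketched as a small selection function. This is a minimal userspace illustration, not the code in `vfio_pci_nvlink2.c`: `demo_pick_atsd()` and its parameters are hypothetical, and the property is modelled as a plain array of values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Pick the register value for device 'index' from a property that holds
 * 'count' values. If there is no dedicated entry for this index, fall
 * back to sharing the entry at index 0.
 * Returns 0 on success, -1 if the property is empty.
 */
static int demo_pick_atsd(const uint64_t *vals, size_t count,
                          size_t index, uint64_t *out)
{
    assert(out != NULL);
    if (count == 0)
        return -1;
    *out = (index < count) ? vals[index] : vals[0];
    return 0;
}
```

With a single-entry property, every device index resolves to the same shared value, which matches the case of older firmware providing only one value.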

[PATCH 1/1] powerpc/eeh: differentiate duplicate detection message

2019-10-16 Thread Sam Bobroff
r EEH: eeh_dev_check_failure: Frozen PHB#0-PE#0 detected EEH: Recovering PHB#0-PE#0 Signed-off-by: Sam Bobroff --- arch/powerpc/kernel/eeh_driver.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index d9279d0

Re: [PATCH] powernv/eeh: Fix oops when probing cxl devices

2019-10-16 Thread Sam Bobroff
didn't > touch the pseries path. At least on pseries, if there's another > unexpected case where the pdn is NULL, we should catch it more easily > with the oops message. OK. I agree that it's not worth doing more. Reviewed-by: Sam Bobroff > arch/powerpc/platforms/p

Re: [PATCH] powerpc/eeh: Only dump stack once if an MMIO loop is detected

2019-10-15 Thread Sam Bobroff
spinning in a loop. This > results in a lot of spurious stack traces in the kernel log. > > Fix this by limiting it to printing one stack trace for each PE freeze. If > the driver is truly stuck the kernel's hung task detector is better suited > to reporting the problem anyway.

Re: [EXTERNAL] [RFC PATCH] powernv/eeh: Fix oops when probing cxl devices

2019-10-14 Thread Sam Bobroff
On Fri, Sep 27, 2019 at 02:45:10PM +0200, Frederic Barrat wrote: > Recent cleanup in the way EEH support is added to a device causes a > kernel oops when the cxl driver probes a device and creates virtual > devices discovered on the FPGA: > > BUG: Kernel NULL pointer dereference at 0x00a0

[PATCH RFC 14/15] powerpc/eeh: Sync eeh_force_recover_write()

2019-10-01 Thread Sam Bobroff
Synchronize access to eeh_pe. Signed-off-by: Sam Bobroff --- arch/powerpc/kernel/eeh.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index cba16ca0694a..26d9367c41a1 100644 --- a/arch/powerpc/kernel/eeh.c +++ b/arch

[PATCH RFC 05/15] powerpc/eeh: Sync eeh_pe_get_parent()

2019-10-01 Thread Sam Bobroff
Synchronize access to eeh_pe. Signed-off-by: Sam Bobroff --- arch/powerpc/kernel/eeh_pe.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c index b89ed46f14e6..0486d3c6ff20 100644 --- a/arch/powerpc

[PATCH RFC 15/15] powerpc/eeh: Sync pcibios_set_pcie_reset_state()

2019-10-01 Thread Sam Bobroff
Synchronize access to eeh_pe. Signed-off-by: Sam Bobroff --- arch/powerpc/kernel/eeh.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index 26d9367c41a1..c61bfaf4ca26 100644 --- a/arch/powerpc/kernel/eeh.c +++ b/arch/powerpc/kernel

[PATCH RFC 00/15] powerpc/eeh: Synchronize access to struct eeh_pe

2019-10-01 Thread Sam Bobroff
of PEs that have been removed from the PHB tree, but not yet freed and makes that list available in debugfs. Any PEs that remain orphans for very long are going to be the result of bugs. It's extra risk because it itself could contain bugs, but it could also be useful during debugging. Cheers,

[PATCH RFC 13/15] powerpc/eeh: Sync pnv_eeh_ei_write()

2019-10-01 Thread Sam Bobroff
Synchronize access to eeh_pe. Signed-off-by: Sam Bobroff --- arch/powerpc/platforms/powernv/eeh-powernv.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c index c56a796dd894..12367ed2083b 100644

[PATCH RFC 02/15] powerpc/eeh: Rename eeh_pe_get() to eeh_pe_find()

2019-10-01 Thread Sam Bobroff
There are now functions eeh_get_pe() and eeh_pe_get() which seems likely to cause confusion. Keep eeh_get_pe() because "get" is commonly used to refer to acquiring a reference (which it does), and rename eeh_pe_get() to eeh_pe_find() because it performs a search. Signed-off-by: S
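The naming convention argued for here ("get" acquires a reference, "find" performs a search) can be illustrated with a short userspace sketch. The type and both functions are hypothetical stand-ins chosen for illustration; they are not the kernel's `eeh_get_pe()` or `eeh_pe_find()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted object; not the kernel's struct eeh_pe. */
struct demo_pe {
    int addr;       /* address used as the search key */
    int refcount;   /* number of outstanding references */
};

/* "get": acquire a reference on an object the caller already has. */
static struct demo_pe *demo_pe_get(struct demo_pe *pe)
{
    if (pe)
        pe->refcount++;
    return pe;
}

/* "find": search a table by address; returns NULL when nothing matches. */
static struct demo_pe *demo_pe_find(struct demo_pe *table, size_t n, int addr)
{
    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (table[i].addr == addr)
            return &table[i];
    return NULL;
}
```

In this sketch the "find" function deliberately does not touch the reference count, so the two names describe two distinct operations.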

[PATCH RFC 03/15] powerpc/eeh: Track orphaned struct eeh_pe

2019-10-01 Thread Sam Bobroff
ing, so any PEs that stay longer will be the result of bugs. The list can be examined by reading from the "eeh_pe_debug" file in debugfs. Signed-off-by: Sam Bobroff --- arch/powerpc/include/asm/eeh.h | 4 +++ arch/powerpc/kernel/eeh.c | 21 ++ arch/power

[PATCH RFC 06/15] powerpc/eeh: Sync eeh_phb_pe_get()

2019-10-01 Thread Sam Bobroff
Synchronize access to eeh_pe. Signed-off-by: Sam Bobroff --- arch/powerpc/kernel/eeh_pe.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c index 0486d3c6ff20..e89a30de2e7e 100644 --- a/arch/powerpc/kernel

[PATCH RFC 12/15] powerpc/eeh: Sync eeh_pe_get_state()

2019-10-01 Thread Sam Bobroff
Synchronize access to eeh_pe. Signed-off-by: Sam Bobroff --- arch/powerpc/kernel/eeh.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index 171be70b34d8..cba16ca0694a 100644 --- a/arch/powerpc/kernel/eeh.c +++ b

[PATCH RFC 07/15] powerpc/eeh: Sync eeh_add_to_parent_pe() and eeh_rmv_from_parent_pe()

2019-10-01 Thread Sam Bobroff
Note that even though there is currently only one place where a PE can be removed from the parent/child tree (eeh_rmv_from_parent_pe()), it is still protected against concurrent removal in case that changes in the future. Signed-off-by: Sam Bobroff --- arch/powerpc/kernel/eeh_pe.c | 26

[PATCH RFC 11/15] powerpc/eeh: Sync eeh_dev_check_failure()

2019-10-01 Thread Sam Bobroff
Synchronize access to eeh_pe. Signed-off-by: Sam Bobroff --- arch/powerpc/kernel/eeh.c | 26 -- 1 file changed, 20 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index eb37cb384ff4..171be70b34d8 100644 --- a/arch/powerpc

[PATCH RFC 08/15] powerpc/eeh: Sync eeh_handle_normal_event()

2019-10-01 Thread Sam Bobroff
Synchronize access to eeh_pe. Signed-off-by: Sam Bobroff --- arch/powerpc/kernel/eeh_driver.c | 4 1 file changed, 4 insertions(+) diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index b3245d0cfb22..c9d73070793e 100644 --- a/arch/powerpc/kernel

[PATCH RFC 09/15] powerpw/eeh: Sync eeh_handle_special_event(), pnv_eeh_get_pe(), pnv_eeh_next_error()

2019-10-01 Thread Sam Bobroff
Synchronize access to eeh_pe. Signed-off-by: Sam Bobroff --- arch/powerpc/kernel/eeh_driver.c | 15 +--- arch/powerpc/platforms/powernv/eeh-powernv.c | 38 2 files changed, 43 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/kernel/eeh_driver.c b

[PATCH RFC 10/15] powerpc/eeh: Sync eeh_phb_check_failure()

2019-10-01 Thread Sam Bobroff
Synchronize access to eeh_pe. Signed-off-by: Sam Bobroff --- arch/powerpc/kernel/eeh.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index 7eb6ca1ab72b..eb37cb384ff4 100644 --- a/arch/powerpc/kernel/eeh.c +++ b

[PATCH RFC 04/15] powerpc/eeh: Sync eeh_pe_next(), eeh_pe_find() and early-out traversals

2019-10-01 Thread Sam Bobroff
set to NULL on removal (see eeh_rmv_from_parent_pe()) (PHB type PEs never have their parent set, but aren't a problem: they can't be removed). If this does occur, the traversal is terminated. This may leave the traversal incomplete, but that is preferable to crashing. Signed-off
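The early-out behaviour described in this snippet can be sketched as a pre-order "next node" function that stops when it meets a detached node. This is a simplified userspace illustration: `struct demo_node` and `demo_node_next()` are hypothetical, and the real `eeh_pe_next()` works on kernel list heads rather than the plain pointers assumed here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree node; a simplified stand-in, not struct eeh_pe. */
struct demo_node {
    struct demo_node *parent;
    struct demo_node *first_child;
    struct demo_node *next_sibling;
};

/*
 * Return the next node in a pre-order walk of the tree rooted at 'root',
 * or NULL when the walk is finished. If a non-root node has a NULL parent
 * (it was removed from the tree during the walk), terminate the traversal
 * instead of dereferencing NULL. The walk may then be incomplete.
 */
static struct demo_node *demo_node_next(struct demo_node *cur, struct demo_node *root)
{
    assert(cur != NULL && root != NULL);
    if (cur->first_child)
        return cur->first_child;
    while (cur != root) {
        if (!cur->parent)
            return NULL;            /* detached node: stop early */
        if (cur->next_sibling)
            return cur->next_sibling;
        cur = cur->parent;
    }
    return NULL;
}
```

The root is exempt from the parent check, mirroring the note that PHB type PEs never have a parent set and cannot be removed.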

[PATCH RFC 01/15] powerpc/eeh: Introduce refcounting for struct eeh_pe

2019-10-01 Thread Sam Bobroff
provides no additional synchronization of the other EEH state, it seems to be an effective way of providing the necessary safety with a very low risk of introducing deadlocks. Signed-off-by: Sam Bobroff --- arch/powerpc/include/asm/eeh.h | 7 arch/powerpc/kernel/eeh_pe.c | 70

Re: [PATCH v5 05/12] powerpc/eeh: EEH for pSeries hot plug

2019-09-22 Thread Sam Bobroff
On Thu, Sep 19, 2019 at 03:28:40PM -0500, Nathan Lynch wrote: > Hello Sam, > > Sam Bobroff writes: > > On PowerNV and pSeries, devices currently acquire EEH support from > > several different places: Boot-time devices from eeh_probe_devices() > > and eeh_addr_cach

Re: [PATCH 05/14] powerpc/eeh: Defer printing stack trace

2019-09-16 Thread Sam Bobroff
On Tue, Sep 17, 2019 at 11:45:14AM +1000, Oliver O'Halloran wrote: > On Tue, Sep 17, 2019 at 11:04 AM Sam Bobroff wrote: > > > > On Tue, Sep 03, 2019 at 08:15:56PM +1000, Oliver O'Halloran wrote: > > > Currently we print a stack trace in the event handler to he

Re: [PATCH 13/14] powerpc/eeh: Add a eeh_dev_break debugfs interface

2019-09-16 Thread Sam Bobroff
ks good to me. Tested with the previous patch. Tested-by: Sam Bobroff Reviewed-by: Sam Bobroff > --- > arch/powerpc/kernel/eeh.c | 139 +- > 1 file changed, 138 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/kernel/eeh.c b/arch/p

Re: [PATCH 12/14] powerpc/eeh: Add debugfs interface to run an EEH check

2019-09-16 Thread Sam Bobroff
ace. > > Signed-off-by: Oliver O'Halloran Looks good, and I tested it with the next patch and it seems to work. But I think you should make it clear that this does not work with the hardware "EEH error injection" facility accessible via debugfs in err_injct (that doesn

Re: [PATCH 11/14] powerpc/eeh: Set attention indicator while recovering

2019-09-16 Thread Sam Bobroff
> the device is present and only clear it if the device is fully recovered. > > Signed-off-by: Oliver O'Halloran Looks good, although I think it would be clearer if you could separate checking the slot from raising the alert. Reviewed-by: Sam Bobroff >

Re: [PATCH 07/14] powernv/eeh: Use generic code to handle hot resets

2019-09-16 Thread Sam Bobroff
On Tue, Sep 03, 2019 at 08:15:58PM +1000, Oliver O'Halloran wrote: > When we reset PCI devices managed by a hotplug driver the reset may > generate spurious hotplug events that cause the PCI device we're resetting > to be torn down accidentally. This is a problem for EEH (when the driver is > EEH awa

Re: [PATCH 06/14] powerpc/eeh: Remove stale CAPI comment

2019-09-16 Thread Sam Bobroff
On Tue, Sep 03, 2019 at 08:15:57PM +1000, Oliver O'Halloran wrote: > Support for switching CAPI cards into and out of CAPI mode was removed a > while ago. Drop the comment since it's no longer relevant. > > Cc: Andrew Donnellan > Signed-off-by: Oliver O'Halloran

Re: [PATCH 05/14] powerpc/eeh: Defer printing stack trace

2019-09-16 Thread Sam Bobroff
On Tue, Sep 03, 2019 at 08:15:56PM +1000, Oliver O'Halloran wrote: > Currently we print a stack trace in the event handler to help with > debugging EEH issues. In the case of surprise hot-unplug this is unneeded, > so we want to prevent printing the stack trace unless we know it's due to > an actual

Re: [PATCH 04/14] powerpc/eeh: Check slot presence state in eeh_handle_normal_event()

2019-09-16 Thread Sam Bobroff
On Tue, Sep 03, 2019 at 08:15:55PM +1000, Oliver O'Halloran wrote: > When a device is surprise removed while undergoing IO we will probably > get an EEH PE freeze due to MMIO timeouts and other errors. When a freeze > is detected we send a recovery event to the EEH worker thread which will > notify

Re: [PATCH 03/14] powerpc/eeh: Make permanently failed devices non-actionable

2019-09-16 Thread Sam Bobroff
considered un-actionable. > > Signed-off-by: Oliver O'Halloran Other than the typo, looks good (I think it should always have been like this): Reviewed-by: Sam Bobroff > --- > arch/powerpc/kernel/eeh_driver.c | 12 ++-- > 1 file changed, 10 insertions(+), 2 deleti

Re: [PATCH 02/14] powerpc/eeh: Fix race when freeing PDNs

2019-09-16 Thread Sam Bobroff
l, meaning the pci_dev is already gone, the release handler is already called, and the PDN can be removed there, or b) returns non-null and atomically increases the refcount and the release handler won't be called until after we've set the DEAD flag and released our reference. Looks g

Re: [PATCH 01/14] powerpc/eeh: Clean up EEH PEs after recovery finishes

2019-09-16 Thread Sam Bobroff
change is where EEH_PE_RECOVERING affects eeh_pe_reset_and_recover() (used when a PE is passed back from a guest to the host), but the test case doesn't seem to be any worse. Reviewed-by: Sam Bobroff > --- > Sam Bobroff is working on implementing proper refcounting for EEH PEs, >

[PATCH] powerpc/eeh: Fixup EEH for pSeries hotplug

2019-08-21 Thread Sam Bobroff
Signed-off-by: Sam Bobroff --- Let's move the test into eeh_add_device_tree_late(). Thanks, Sam. arch/powerpc/kernel/eeh.c | 2 ++ arch/powerpc/kernel/of_platform.c | 3 +-- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/k

Re: [PATCH 2/3] powerpc/pcidn: Make VF pci_dn management CONFIG_PCI_IOV specific

2019-08-21 Thread Sam Bobroff
hen CONFIG_PCI_IOV > is selected, and rename them to reflect their actual usage rather than > having them masquerade as generic code. > > Signed-off-by: Oliver O'Halloran Nice cleanup, Reviewed-by: Sam Bobroff > --- > arch/powerpc/include/asm/pci-bridge.h | 7

Re: [PATCH 3/3] powerpc/pcidn: Warn when sriov pci_dn management is used incorrectly

2019-08-21 Thread Sam Bobroff
d remove the dead > code that checks if the device is a VF. > > Signed-off-by: Oliver O'Halloran Looks good, but you might want to consider using WARN_ON_ONCE() just in case it gets hit a lot. Reviewed-by: Sam Bobroff > --- > arch/powerpc/kernel/pci_dn.c | 17 +++-

Re: [PATCH 1/3] powerpc/sriov: Remove VF eeh_dev state when disabling SR-IOV

2019-08-21 Thread Sam Bobroff
allbacks so > the EEH fallback path (which removes and re-probes PCI devices) > would be used. I gave this a quick test with some added instrumentation, and I can see that the new code is used during VF removal and it doesn't cause any new problems. I agree that even if it's difficult

[PATCH v5 11/12] powerpc/eeh: Remove unused return path from eeh_pe_dev_traverse()

2019-08-15 Thread Sam Bobroff
There are no users of the early-out return value from eeh_pe_dev_traverse(), so remove it. Signed-off-by: Sam Bobroff --- v5 * New in this version. arch/powerpc/include/asm/eeh.h | 6 +++--- arch/powerpc/kernel/eeh.c| 16 +--- arch/powerpc/kernel/eeh_driver.c | 26

[PATCH v5 09/12] powerpc/eeh: Convert log messages to eeh_edev_* macros

2019-08-15 Thread Sam Bobroff
n pnv_eeh_probe() is now generated slightly later, which will mean that it is no longer emitted for devices that aren't probed due to the initial checks. Signed-off-by: Sam Bobroff --- arch/powerpc/include/asm/ppc-pci.h | 5 -- arch/powerpc/kernel/eeh.c|

[PATCH v5 05/12] powerpc/eeh: EEH for pSeries hot plug

2019-08-15 Thread Sam Bobroff
was not previously possible (it was already possible on pSeries). Signed-off-by: Sam Bobroff --- arch/powerpc/kernel/eeh.c| 2 +- arch/powerpc/kernel/of_platform.c| 3 +- arch/powerpc/platforms/powernv/eeh-powernv.c | 39 +- arch/powerpc/platforms

[PATCH v5 03/12] powerpc/eeh: Improve debug messages around device addition

2019-08-15 Thread Sam Bobroff
Also remove useless comment. Signed-off-by: Sam Bobroff Reviewed-by: Alexey Kardashevskiy --- arch/powerpc/kernel/eeh.c| 2 +- arch/powerpc/platforms/powernv/eeh-powernv.c | 14 arch/powerpc/platforms/pseries/eeh_pseries.c | 23 +++- 3 files

[PATCH v5 10/12] powerpc/eeh: Fix crash when edev->pdev changes

2019-08-15 Thread Sam Bobroff
device() on it. Use this value to release the mutex, but also pass it through to the device driver's EEH handlers so that they always see the same device. Signed-off-by: Sam Bobroff --- v5 * New in this version. arch/powerpc/kernel/eeh_driver.c | 44 +--- 1 file

[PATCH v5 06/12] powerpc/eeh: Refactor around eeh_probe_devices()

2019-08-15 Thread Sam Bobroff
Note that previously on pSeries, useless EEH sysfs files were created for some devices that did not have EEH support and this change prevents them from being created. Signed-off-by: Sam Bobroff --- arch/powerpc/include/asm/eeh.h | 7 ++--- arch/powerpc/kernel/

[PATCH v5 08/12] powerpc/eeh: Introduce EEH edev logging macros

2019-08-15 Thread Sam Bobroff
"info" level function is used here, the others will be used in followup work. Signed-off-by: Sam Bobroff --- arch/powerpc/include/asm/eeh.h | 11 +++ arch/powerpc/kernel/eeh_driver.c | 17 - 2 files changed, 11 insertions(+), 17 deletions(-) diff --git a/arch/powe

[PATCH v5 12/12] powerpc/eeh: Slightly simplify eeh_add_to_parent_pe()

2019-08-15 Thread Sam Bobroff
Simplify some needlessly complicated boolean logic in eeh_add_to_parent_pe(). Signed-off-by: Sam Bobroff --- v5 * New in this version. arch/powerpc/kernel/eeh_pe.c | 52 +++- 1 file changed, 27 insertions(+), 25 deletions(-) diff --git a/arch/powerpc/kernel

[PATCH v5 07/12] powerpc/eeh: Add bdfn field to eeh_dev

2019-08-15 Thread Sam Bobroff
liver O'Halloran [SB: Re-wrapped commit message, fixed whitespace damage.] Signed-off-by: Sam Bobroff --- arch/powerpc/include/asm/eeh.h | 2 ++ arch/powerpc/include/asm/ppc-pci.h | 2 ++ arch/powerpc/kernel/eeh_dev.c | 2 ++ 3 files changed, 6 insertions(+) diff --git a/arch/powerpc/i

[PATCH v5 02/12] powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag

2019-08-15 Thread Sam Bobroff
lers to be incorrectly ignored). To remedy this, clear the flag at the beginning of recovery processing. The flag is still cleared at the end of recovery processing, although it is no longer really necessary. Also clear the flag during eeh_handle_special_event(), for the same reasons. Signed-off-by: S

[PATCH v5 04/12] powerpc/eeh: Initialize EEH address cache earlier

2019-08-15 Thread Sam Bobroff
step into a separate function and call it from a core_initcall (rather than a subsys initcall). This will allow future work to make use of the cache during boot time PCI scanning. Signed-off-by: Sam Bobroff Reviewed-by: Alexey Kardashevskiy --- arch/powerpc/include/asm/eeh.h | 3 +++ arch

[PATCH v5 01/12] powerpc/64: Adjust order in pcibios_init()

2019-08-15 Thread Sam Bobroff
resources before they were allocated. Signed-off-by: Sam Bobroff Reviewed-by: Alexey Kardashevskiy --- v5 - Complete rewrite of commit message based on more research. arch/powerpc/kernel/pci-common.c | 4 arch/powerpc/kernel/pci_32.c | 4 arch/powerpc/kernel/pci_64.c | 12

[PATCH v5 00/12]

2019-08-15 Thread Sam Bobroff
sages around device addition Patch 5/8: powerpc/eeh: Add eeh_show_enabled() Patch 6/8: powerpc/eeh: Initialize EEH address cache earlier Patch 7/8: powerpc/eeh: EEH for pSeries hot plug Patch 8/8: powerpc/eeh: Remove eeh_probe_devices() and eeh_addr_cache_build() Oliver O'Halloran (1): powerpc/e

[PATCH v4 9/9] powerpc/eeh: Convert log messages to eeh_edev_* macros

2019-08-06 Thread Sam Bobroff
n pnv_eeh_probe() is now generated slightly later, which will mean that it is no longer emitted for devices that aren't probed due to the initial checks. Signed-off-by: Sam Bobroff --- v4 - Fixed compile warning when compiling without CONFIG_IOV. arch/powerpc/include/asm/ppc-pci.h

[PATCH v4 6/9] powerpc/eeh: Refactor around eeh_probe_devices()

2019-08-06 Thread Sam Bobroff
Note that previously on pSeries, useless EEH sysfs files were created for some devices that did not have EEH support and this change prevents them from being created. Signed-off-by: Sam Bobroff --- arch/powerpc/include/asm/eeh.h | 7 ++--- arch/powerpc/kernel/

[PATCH v4 7/9] powerpc/eeh: Add bdfn field to eeh_dev

2019-08-06 Thread Sam Bobroff
liver O'Halloran [SB: Re-wrapped commit message, fixed whitespace damage.] Signed-off-by: Sam Bobroff --- arch/powerpc/include/asm/eeh.h | 2 ++ arch/powerpc/include/asm/ppc-pci.h | 2 ++ arch/powerpc/kernel/eeh_dev.c | 2 ++ 3 files changed, 6 insertions(+) diff --git a/arch/powerpc/i

[PATCH v4 2/9] powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag

2019-08-06 Thread Sam Bobroff
lers to be incorrectly ignored). To remedy this, clear the flag at the beginning of recovery processing. The flag is still cleared at the end of recovery processing, although it is no longer really necessary. Also clear the flag during eeh_handle_special_event(), for the same reasons. Signed-off-by: S

[PATCH v4 1/9] powerpc/64: Adjust order in pcibios_init()

2019-08-06 Thread Sam Bobroff
already the case) and at boot time, to support future work. Signed-off-by: Sam Bobroff Reviewed-by: Alexey Kardashevskiy --- arch/powerpc/kernel/pci-common.c | 4 arch/powerpc/kernel/pci_32.c | 4 arch/powerpc/kernel/pci_64.c | 12 +--- 3 files changed, 13 insertions

[PATCH v4 5/9] powerpc/eeh: EEH for pSeries hot plug

2019-08-06 Thread Sam Bobroff
was not previously possible (it was already possible on pSeries). Signed-off-by: Sam Bobroff --- arch/powerpc/kernel/eeh.c | 2 +- arch/powerpc/kernel/of_platform.c | 3 +- arch/powerpc/platforms/powernv/eeh-powernv.c | 39 +- arch/powerpc/platforms

[PATCH v4 8/9] powerpc/eeh: Introduce EEH edev logging macros

2019-08-06 Thread Sam Bobroff
"info" level function is used here, the others will be used in followup work. Signed-off-by: Sam Bobroff --- arch/powerpc/include/asm/eeh.h | 11 +++ arch/powerpc/kernel/eeh_driver.c | 17 - 2 files changed, 11 insertions(+), 17 deletions(-) diff --git a/arch/powe

[PATCH v4 0/9]

2019-08-06 Thread Sam Bobroff
address cache earlier Patch 7/8: powerpc/eeh: EEH for pSeries hot plug Patch 8/8: powerpc/eeh: Remove eeh_probe_devices() and eeh_addr_cache_build() Oliver O'Halloran (1): powerpc/eeh: Add bdfn field to eeh_dev Sam Bobroff (8): powerpc/64: Adjust order in pcibios_init() powerpc/eeh: Cle

[PATCH v4 3/9] powerpc/eeh: Improve debug messages around device addition

2019-08-06 Thread Sam Bobroff
Also remove useless comment. Signed-off-by: Sam Bobroff Reviewed-by: Alexey Kardashevskiy --- arch/powerpc/kernel/eeh.c | 2 +- arch/powerpc/platforms/powernv/eeh-powernv.c | 14 arch/powerpc/platforms/pseries/eeh_pseries.c | 23 +++- 3 files

[PATCH v4 4/9] powerpc/eeh: Initialize EEH address cache earlier

2019-08-06 Thread Sam Bobroff
step into a separate function and call it from a core_initcall (rather than a subsys initcall). This will allow future work to make use of the cache during boot time PCI scanning. Signed-off-by: Sam Bobroff Reviewed-by: Alexey Kardashevskiy --- arch/powerpc/include/asm/eeh.h | 3 +++ arch
