CXL accelerators are unfortunately not immune to failure. This patch set enables them to participate in the Extended Error Handling (EEH) process.

This series starts with a number of preparatory patches:

 - Patch 1 creates a kernel flag that allows us to confidently assert
   the hardware will not change when it's reset.

 - Patch 2 makes sure we don't touch the hardware when it has failed.

 - Patches 3-5 make the 'unplug' functions idempotent, so that if we
   get part way through recovery and then fail, being completely
   unplugged as part of removal doesn't cause us to oops out.

 - Patches 6 and 7 refactor init and teardown paths for the adapter
   and AFUs, so that they can be configured and deconfigured
   separately from their allocation and release.

Patch 8 enables EEH, both for the CXL card and for anything attached
to the virtual PHB. Only complete slot resets are supported.

Daniel Axtens (8):
  cxl: Allow the kernel to trust that an image won't change on PERST.
  cxl: Drop commands if the PCI channel is not in normal state
  cxl: Allocate and release the SPA with the AFU
  cxl: Make IRQ release idempotent
  cxl: Clean up adapter MMIO unmap path.
  cxl: Refactor adaptor init/teardown
  cxl: Refactor AFU init/teardown
  cxl: EEH support

 Documentation/ABI/testing/sysfs-class-cxl |  10 +
 drivers/misc/cxl/api.c                    |   7 +
 drivers/misc/cxl/cxl.h                    |  38 ++-
 drivers/misc/cxl/file.c                   |  20 ++
 drivers/misc/cxl/irq.c                    |   9 +
 drivers/misc/cxl/native.c                 | 100 +++++-
 drivers/misc/cxl/pci.c                    | 498 ++++++++++++++++++++++++------
 drivers/misc/cxl/sysfs.c                  |  26 ++
 include/misc/cxl.h                        |  10 +
 9 files changed, 602 insertions(+), 116 deletions(-)

-- 
2.1.4

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
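
For readers unfamiliar with the pattern, here is a rough userspace sketch of the two ideas behind patches 2-5: dropping commands when the channel is not in its normal state, and making release paths safe to call more than once. It is not code from the series. Every name in it (fake_adapter, fake_issue_command, fake_release_irq, fake_unmap, fake_teardown) is hypothetical, and malloc/free merely stand in for mapping and unmapping MMIO.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the cxl driver's types. */
enum channel_state { CHANNEL_NORMAL, CHANNEL_FROZEN, CHANNEL_PERM_FAILURE };

struct fake_adapter {
	enum channel_state state;
	void *mmio;        /* stands in for a mapped register region */
	int irq;           /* 0 means "no IRQ currently held" */
	int release_calls; /* counts real IRQ releases, for demonstration */
};

/* Drop the command unless the channel is in the normal state. */
static int fake_issue_command(struct fake_adapter *a)
{
	if (a->state != CHANNEL_NORMAL)
		return -1; /* do not touch failed hardware */
	return 0;      /* a real driver would write registers here */
}

/* Idempotent: a second call sees irq == 0 and does nothing. */
static void fake_release_irq(struct fake_adapter *a)
{
	if (a->irq == 0)
		return;
	a->irq = 0;
	a->release_calls++;
}

/* Idempotent: the pointer is cleared so it is never freed twice. */
static void fake_unmap(struct fake_adapter *a)
{
	if (a->mmio == NULL)
		return;
	free(a->mmio);
	a->mmio = NULL;
}

/* Safe to call after a partially completed recovery, or twice in a row. */
static void fake_teardown(struct fake_adapter *a)
{
	fake_release_irq(a);
	fake_unmap(a);
}
```

The point of clearing each resource marker as it is released is that a later full removal can run the same teardown path unconditionally, regardless of how far a failed recovery got.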