This series of patches aims to improve the reliability of EEH on the PowerNV platform. First of all, we had multiple duplicate states (flags) for PHB and PE, so we remove those duplicates to simplify the code. In addition, the PHB diag-data was corrupted in the frozen-PE case. To solve that problem, we introduce eeh_ops->event(): the EEH core notifies the (PowerNV) platform when a PE instance is created or destroyed, so that the platform can allocate or free the PHB diag-data backend. We then cache the PHB diag-data on the first call to eeh_ops->get_state() and dump it afterwards, which helps to get the correct PHB diag-data.
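To illustrate the lifecycle described above, here is a minimal userspace sketch, not the code from the patches. Everything in it is an assumption made for illustration: `struct pe`, `pe_notify()`, `pe_get_state()`, `fw_read_diag()` and `DIAG_SIZE` are hypothetical names, and the firmware read is replaced by a stand-in that fills the buffer with a call counter. It only shows the idea: allocate the diag-data backend on PE creation, capture the data once on the first frozen-state query, and free it on PE destruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define DIAG_SIZE 64	/* hypothetical size of the diag-data buffer */

/* Hypothetical event codes, standing in for what eeh_ops->event() receives */
enum pe_event { PE_EVENT_CREATE, PE_EVENT_DESTROY };

struct pe {
	unsigned char *diag;	/* per-PE diag-data backend */
	bool diag_cached;	/* set once diag-data has been captured */
	bool frozen;		/* simulated hardware state */
};

/* Stand-in for the firmware call: each read fills the buffer with a new value */
static unsigned int fw_reads;

static void fw_read_diag(unsigned char *buf, size_t len)
{
	memset(buf, (int)++fw_reads, len);
}

/* Allocate or free the diag-data backend when a PE is created or destroyed */
int pe_notify(struct pe *pe, enum pe_event ev)
{
	assert(pe != NULL);

	switch (ev) {
	case PE_EVENT_CREATE:
		pe->diag = calloc(1, DIAG_SIZE);
		pe->diag_cached = false;
		return pe->diag ? 0 : -1;
	case PE_EVENT_DESTROY:
		free(pe->diag);
		pe->diag = NULL;
		pe->diag_cached = false;
		return 0;
	}
	return -1;
}

/* Report the frozen state; capture diag-data only on the first frozen query */
bool pe_get_state(struct pe *pe)
{
	assert(pe != NULL);

	if (pe->frozen && pe->diag && !pe->diag_cached) {
		fw_read_diag(pe->diag, DIAG_SIZE);
		pe->diag_cached = true;
	}
	return pe->frozen;
}
```

In this sketch, repeated calls to pe_get_state() on a frozen PE leave the cached buffer untouched, so a later dump sees the data captured at the first detection rather than whatever a subsequent read would return.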
With the patchset applied, we never dump PHB diag-data for INF errors. Instead, we just maintain statistics in /proc/powerpc/eeh_inf_err. Also, as Ben suggested, we changed the PHB diag-data dump format slightly so that each line carries multiple fields and lines whose fields are all zero are omitted.

v1 -> v2:
	* Amend commit logs
	* Support eeh_ops->event() and maintain PHB diag-data on a per-PE-instance basis
	* When dumping PHB diag-data, replace "-" with "00000000" and omit a line if all of its fields are zero

---
 arch/powerpc/include/asm/eeh.h               |   7 ++-
 arch/powerpc/kernel/eeh.c                    |  10 +---
 arch/powerpc/kernel/eeh_driver.c             |  10 ++--
 arch/powerpc/kernel/eeh_pe.c                 |  39 ++++++++++++-
 arch/powerpc/platforms/powernv/eeh-ioda.c    | 193 ++++++++++++++++++++++++++++++++++++-------------------------
 arch/powerpc/platforms/powernv/eeh-powernv.c |  74 +++++++++++++++++++-----
 arch/powerpc/platforms/powernv/pci.c         | 228 +++++++++++++++++++++++++++++++++++++++++------------------------------
 arch/powerpc/platforms/powernv/pci.h         |  11 ++--
 arch/powerpc/platforms/pseries/eeh_pseries.c |   3 +-
 9 files changed, 358 insertions(+), 217 deletions(-)

Thanks,
Gavin