After commit b9cae27728d1 ("EDAC/ghes: Scan the system once on driver
init") was applied, a NULL pointer dereference occurs in
ghes_edac_register() when CONFIG_DEBUG_TEST_DRIVER_REMOVE is enabled.
The error is caused by the NULL ghes_hw.dimms pointer being used in the
mci_for_each_dimm() loop of ghes_edac_register().

The error occurs when all previously initialized ghes instances have
been removed and a new ghes instance is then probed. In this case,
ghes_refcount is 0 and both ghes_hw.dimms and mci have already been
freed. ghes_hw.dimms stays NULL because ghes_scan_system() does not
call enumerate_dimms() again.

Fix this by moving the scanned flag to file scope as system_scanned and
clearing it in ghes_edac_unregister(), so that the next probe rescans
the system.

Suggested-by: Borislav Petkov <b...@suse.de>
Signed-off-by: Shiju Jose <shiju.j...@huawei.com>
---
 drivers/edac/ghes_edac.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index da60c29468a7..54ebc8afc6b1 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -55,6 +55,8 @@ static DEFINE_SPINLOCK(ghes_lock);
 static bool __read_mostly force_load;
 module_param(force_load, bool, 0);
 
+static bool system_scanned;
+
 /* Memory Device - Type 17 of SMBIOS spec */
 struct memdev_dmi_entry {
 	u8 type;
@@ -225,14 +227,12 @@ static void enumerate_dimms(const struct dmi_header *dh, void *arg)
 
 static void ghes_scan_system(void)
 {
-	static bool scanned;
-
-	if (scanned)
+	if (system_scanned)
 		return;
 
 	dmi_walk(enumerate_dimms, &ghes_hw);
 
-	scanned = true;
+	system_scanned = true;
 }
 
 void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
@@ -631,6 +631,8 @@ void ghes_edac_unregister(struct ghes *ghes)
 
 	mutex_lock(&ghes_reg_mutex);
 
+	system_scanned = false;
+
 	if (!refcount_dec_and_test(&ghes_refcount))
 		goto unlock;
 
-- 
2.17.1
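
For illustration only, the failure mode described in the commit message can be modelled in a few lines of userspace C. This is a rough sketch, not kernel code: fake_hw, scan_system(), register_instance() and unregister_instance() are hypothetical stand-ins for ghes_hw, ghes_scan_system(), ghes_edac_register() and ghes_edac_unregister(), and the reset_flag parameter exists only to compare the behaviour with and without the flag reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's hardware descriptor. */
struct fake_hw {
	int num_dimms;
	int *dimms;		/* NULL until scan_system() populates it */
};

static struct fake_hw hw;
static int refcount;		/* number of registered instances */
static bool system_scanned;	/* file scope, so unregister can clear it */

/* Populate hw once; later calls are no-ops while system_scanned is set. */
static void scan_system(void)
{
	if (system_scanned)
		return;

	hw.num_dimms = 2;	/* arbitrary value for the sketch */
	hw.dimms = calloc(hw.num_dimms, sizeof(*hw.dimms));
	system_scanned = hw.dimms != NULL;
}

/* Returns 0 on success, -1 where the real driver would hit a NULL table. */
static int register_instance(void)
{
	if (refcount > 0) {
		refcount++;
		return 0;
	}

	scan_system();
	if (!hw.dimms)
		return -1;

	refcount = 1;
	return 0;
}

/* reset_flag == true models the patched behaviour, false the old one. */
static void unregister_instance(bool reset_flag)
{
	assert(refcount > 0);

	if (reset_flag)
		system_scanned = false;

	if (--refcount > 0)
		return;

	/* Last instance gone: free the table, as the driver frees its data. */
	free(hw.dimms);
	hw.dimms = NULL;
	hw.num_dimms = 0;
}
```

Calling register_instance(), then unregister_instance(false), then register_instance() again makes the second registration return -1, because the stale flag suppresses the rescan. With unregister_instance(true) the second registration rescans and succeeds.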