On Fri, Apr 02, 2021 at 06:00:42PM +0200, William Roche wrote:
> Corrected Errors are not the best indicators for a failing DIMM

In the OS, errors reported through the different mechanisms are all we have.

> For the moment we will have the CE MCE handled by the MCE_HANDLED_CEC
> aware notifiers only when a page is off-lined, like it used to be.
> 
> Can we start with that small fix?

Sure, but please use two variables: an "err" one which catches find_elem()'s
return value and a "ret" one which cec_add_elem() itself returns, so
that there's no confusion like before:

---
diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index ddecf25b5dd4..b926c679cdaf 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -312,8 +312,8 @@ static bool sanity_check(struct ce_array *ca)
 static int cec_add_elem(u64 pfn)
 {
        struct ce_array *ca = &ce_arr;
+       int count, err, ret = 0;
        unsigned int to = 0;
-       int count, ret = 0;
 
        /*
         * We can be called very early on the identify_cpu() path where we are
@@ -330,8 +330,8 @@ static int cec_add_elem(u64 pfn)
        if (ca->n == MAX_ELEMS)
                WARN_ON(!del_lru_elem_unlocked(ca));
 
-       ret = find_elem(ca, pfn, &to);
-       if (ret < 0) {
+       err = find_elem(ca, pfn, &to);
+       if (err < 0) {
                /*
                 * Shift range [to-end] to make room for one more element.
                 */

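A minimal userspace sketch of the same split, for illustration only. toy_array, toy_find() and toy_add() are hypothetical stand-ins, not the kernel's ce_array, find_elem() or cec_add_elem(). The capacity, threshold and return convention (0 = recorded, 1 = threshold reached) are also assumptions. A lookup miss lives only in "err", so it cannot leak out as the function's own return value, which can happen when one variable does both jobs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_ELEMS 8   /* hypothetical capacity, not the kernel's value */
#define THRESHOLD 3   /* hypothetical action threshold */

/* Hypothetical stand-in for struct ce_array: sorted pfns plus per-pfn counts. */
struct toy_array {
	uint64_t pfn[MAX_ELEMS];
	unsigned int count[MAX_ELEMS];
	unsigned int n;
};

/*
 * Binary search for @pfn. Returns its index if present; otherwise returns
 * -1 and stores the insertion point in *to.
 */
static int toy_find(const struct toy_array *a, uint64_t pfn, unsigned int *to)
{
	unsigned int lo = 0, hi = a->n;

	while (lo < hi) {
		unsigned int mid = lo + (hi - lo) / 2;

		if (a->pfn[mid] == pfn) {
			*to = mid;
			return (int)mid;
		}
		if (a->pfn[mid] < pfn)
			lo = mid + 1;
		else
			hi = mid;
	}
	*to = lo;
	return -1;
}

/*
 * "err" holds the lookup helper's result; "ret" is only what this function
 * reports to its caller (hypothetical convention: 0 = recorded,
 * 1 = threshold reached).
 */
static int toy_add(struct toy_array *a, uint64_t pfn)
{
	unsigned int to = 0;
	int err, ret = 0;

	err = toy_find(a, pfn, &to);
	if (err < 0) {
		/* A lookup miss is not a failure of toy_add(): insert at 'to'. */
		assert(a->n < MAX_ELEMS);
		memmove(&a->pfn[to + 1], &a->pfn[to], (a->n - to) * sizeof(a->pfn[0]));
		memmove(&a->count[to + 1], &a->count[to], (a->n - to) * sizeof(a->count[0]));
		a->pfn[to] = pfn;
		a->count[to] = 0;
		a->n++;
	}

	/* Bump the count; only the threshold decides what the caller sees. */
	if (++a->count[to] >= THRESHOLD)
		ret = 1;

	return ret;
}
```

Because "err" is never returned, adding a brand-new pfn reports 0 just like bumping an existing one, and only reaching the threshold changes the result.
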
Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
