On Mon, Jun 24, 2019 at 03:08:57PM +0000, Robert Richter wrote:
> The conversion from the physical address mask to a grain (defined as
> granularity in bytes) is broken:
> 
>       e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK);
> 
> E.g., a physical address mask of ~0xfff should give a grain of 0x1000,
> instead the grain is wrong with the upper bits always set. We also
> remove the limitation to the page size as the granularity is unrelated
> to the page size used in the system. We fix this with:
> 
>       e->grain = ~mem_err->physical_addr_mask + 1;
> 
> Note: We need to adopt the grain_bits calculation as e->grain is now a
> power of 2 and no longer a bit mask. The formula is now the same as in
> edac_mc and can later be unified.

Please refrain from using "We", "I", or other personal pronouns in the
commit message and in the code comments below.

From Documentation/process/submitting-patches.rst:

 "Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
  instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
  to do frotz", as if you are giving orders to the codebase to change
  its behaviour."

Please fix all your other commit messages for the next submission.

> Signed-off-by: Robert Richter <rrich...@marvell.com>
> ---
>  drivers/edac/ghes_edac.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> index 7f19f1c672c3..d095d98d6a8d 100644
> --- a/drivers/edac/ghes_edac.c
> +++ b/drivers/edac/ghes_edac.c
> @@ -222,6 +222,7 @@ void ghes_edac_report_mem_error(int sev, struct 
> cper_sec_mem_err *mem_err)
>       /* Cleans the error report buffer */
>       memset(e, 0, sizeof (*e));
>       e->error_count = 1;
> +     e->grain = 1;
>       strcpy(e->label, "unknown label");
>       e->msg = pvt->msg;
>       e->other_detail = pvt->other_detail;
> @@ -317,7 +318,7 @@ void ghes_edac_report_mem_error(int sev, struct 
> cper_sec_mem_err *mem_err)
>  
>       /* Error grain */
>       if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK)
> -             e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK);
> +             e->grain = ~mem_err->physical_addr_mask + 1;

This assumes that ->physical_addr_mask is contiguous, but I don't trust
any firmware. I guess we can leave it like that for now until some
"inventive" firmware actually reports a non-contiguous mask.
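To illustrate the point about contiguity, here is a minimal userspace
sketch, not kernel code. It compares the old and new grain formulas and
shows one possible way to detect a non-contiguous mask. The helper names
(grain_old, grain_new, mask_is_contiguous) and the 4K PAGE_SHIFT are
assumptions made for the example only and are not part of the patch.

```c
#include <stdint.h>

/* Assumption for illustration: 4K pages, as on most configurations. */
#define PAGE_SHIFT 12
#define PAGE_MASK  (~((UINT64_C(1) << PAGE_SHIFT) - 1))

/* Old formula from the patch context: upper bits always end up set. */
static uint64_t grain_old(uint64_t physical_addr_mask)
{
	return ~(physical_addr_mask & ~PAGE_MASK);
}

/* New formula from the patch: grain in bytes, a power of 2 for sane masks. */
static uint64_t grain_new(uint64_t physical_addr_mask)
{
	return ~physical_addr_mask + 1;
}

/*
 * Hypothetical helper: a mask of the form ~(2^k - 1) yields a grain of
 * exactly 2^k, so the mask is contiguous if and only if the grain is a
 * non-zero power of 2. A mask of 0 yields a grain of 0 and is rejected.
 */
static int mask_is_contiguous(uint64_t physical_addr_mask)
{
	uint64_t grain = grain_new(physical_addr_mask);

	return grain != 0 && (grain & (grain - 1)) == 0;
}
```

With a mask of ~0xfff, grain_new() gives 0x1000 while grain_old() gives
all ones. A mask with a hole in it, e.g. ~0xfff with bit 16 cleared,
gives 0x11000, which is not a power of 2 and would be caught by such a
check.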

>  
>       /* Memory error location, mapped on e->location */
>       p = e->location;
> @@ -433,8 +434,15 @@ void ghes_edac_report_mem_error(int sev, struct 
> cper_sec_mem_err *mem_err)
>       if (p > pvt->other_detail)
>               *(p - 1) = '\0';
>  
> +     /*
> +      * We expect the hw to report a reasonable grain, fallback to
> +      * 1 byte granularity otherwise.
> +      */
> +     if (WARN_ON_ONCE(!e->grain))

Please move that WARN_ON_ONCE in the

        if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK)

branch above because you're presetting grain to 1 so the warn should be
close to where it could happen, i.e., when coming from the firmware.
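Roughly, the resulting flow could look like the following userspace
sketch. It is only an illustration under stated assumptions:
report_grain() is a hypothetical stand-in for the relevant part of
ghes_edac_report_mem_error(), VALID_PA_MASK stands in for
CPER_MEM_VALID_PA_MASK (the value used here is an assumption), and the
fprintf() stands in for WARN_ON_ONCE().

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in for CPER_MEM_VALID_PA_MASK; the value is an assumption. */
#define VALID_PA_MASK 0x0004u

/* Hypothetical helper mirroring the grain handling under review. */
static uint64_t report_grain(uint64_t validation_bits,
			     uint64_t physical_addr_mask)
{
	uint64_t grain = 1;	/* preset, as in the patch */

	if (validation_bits & VALID_PA_MASK) {
		grain = ~physical_addr_mask + 1;

		/* Check right where a bad value can come from firmware. */
		if (!grain) {
			fprintf(stderr, "bogus grain, using 1 byte\n");
			grain = 1;
		}
	}

	return grain;
}
```

This way the check only runs when the firmware actually supplied a
mask, and the preset of 1 covers the case where it did not.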

Thx.

-- 
Regards/Gruss,
    Boris.
