On Mon, Jun 24, 2019 at 03:08:57PM +0000, Robert Richter wrote:
> The conversion from the physical address mask to a grain (defined as
> granularity in bytes) is broken:
>
> 	e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK);
>
> E.g., a physical address mask of ~0xfff should give a grain of 0x1000,
> instead the grain is wrong with the upper bits always set. We also
> remove the limitation to the page size as the granularity is unrelated
> to the page size used in the system. We fix this with:
>
> 	e->grain = ~mem_err->physical_addr_mask + 1;
>
> Note: We need to adopt the grain_bits calculation as e->grain is now a
> power of 2 and no longer a bit mask. The formula is now the same as in
> edac_mc and can later be unified.
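
For illustration only, here is a minimal userspace sketch of the two formulas
quoted above. It is not part of the patch. The helper names grain_old() and
grain_new() are hypothetical, and the EX_PAGE_* macros assume 4 KiB pages and
only mimic the shape of the kernel's PAGE_MASK.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages; stand-ins for the kernel's PAGE_SIZE/PAGE_MASK. */
#define EX_PAGE_SIZE 0x1000ULL
#define EX_PAGE_MASK (~(EX_PAGE_SIZE - 1))

/* Old formula: clears the upper mask bits first, so inverting sets them all. */
static uint64_t grain_old(uint64_t physical_addr_mask)
{
	return ~(physical_addr_mask & ~EX_PAGE_MASK);
}

/*
 * New formula: for a contiguous mask such as ~0xfff this yields the
 * granularity in bytes as a power of 2. A mask of 0 wraps around to 0.
 */
static uint64_t grain_new(uint64_t physical_addr_mask)
{
	return ~physical_addr_mask + 1;
}
```

With a mask of ~0xfff, grain_old() returns all ones while grain_new() returns
0x1000. A mask of 0 makes grain_new() return 0, which is the case the patch
guards against by falling back to a grain of 1.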

Please refrain from using "We", "I" or other personal pronouns in a
commit message and in the code comments below.

From Documentation/process/submitting-patches.rst:

 "Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
  instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
  to do frotz", as if you are giving orders to the codebase to change
  its behaviour."

Please fix all your other commit messages for the next submission.

> Signed-off-by: Robert Richter <rrich...@marvell.com>
> ---
>  drivers/edac/ghes_edac.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> index 7f19f1c672c3..d095d98d6a8d 100644
> --- a/drivers/edac/ghes_edac.c
> +++ b/drivers/edac/ghes_edac.c
> @@ -222,6 +222,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
>  	/* Cleans the error report buffer */
>  	memset(e, 0, sizeof (*e));
>  	e->error_count = 1;
> +	e->grain = 1;
>  	strcpy(e->label, "unknown label");
>  	e->msg = pvt->msg;
>  	e->other_detail = pvt->other_detail;
> @@ -317,7 +318,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
>
>  	/* Error grain */
>  	if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK)
> -		e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK);
> +		e->grain = ~mem_err->physical_addr_mask + 1;

This is assuming that ->physical_addr_mask is contiguous but I don't
trust any firmware. I guess we can leave it like that for now until some
"inventive" firmware actually reports a non-contiguous mask.

>
>  	/* Memory error location, mapped on e->location */
>  	p = e->location;
> @@ -433,8 +434,15 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
>  	if (p > pvt->other_detail)
>  		*(p - 1) = '\0';
>
> +	/*
> +	 * We expect the hw to report a reasonable grain, fallback to
> +	 * 1 byte granularity otherwise.
> +	 */
> +	if (WARN_ON_ONCE(!e->grain))

Please move that WARN_ON_ONCE into the

	if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK)

branch above because you're presetting grain to 1, so the warn should be
close to where it could happen, i.e., when the value comes from the
firmware.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.