On Sat, Apr 20, 2019 at 2:13 AM Borislav Petkov <b...@alien8.de> wrote:
>
> On Fri, Apr 19, 2019 at 10:43:03PM -0700, Cong Wang wrote:
> > With this change, although not even compiled, mcelog should still
> > receive correctable memory errors like before, even when we have
> > CONFIG_RAS_CEC=y.
> >
> > Does this make any sense to you?
>
> Yes, the answer is in the mail you snipped. Did you read it?

I read it. Your entire response rests on the speculation that I don't
have CONFIG_X86_MCELOG_LEGACY=y, which is clearly a misunderstanding.

You didn't answer my question here: I asked whether the following
change (PoC only) makes sense:

@@ -567,12 +567,12 @@ static int mce_first_notifier(struct notifier_block *nb, unsigned long val,
 			      void *data)
 {
 	struct mce *m = (struct mce *)data;
+	bool consumed;

 	if (!m)
 		return NOTIFY_DONE;

-	if (cec_add_mce(m))
-		return NOTIFY_STOP;
+	consumed = cec_add_mce(m);

 	/* Emit the trace record: */
 	trace_mce_record(m);
@@ -581,7 +581,7 @@ static int mce_first_notifier(struct notifier_block *nb, unsigned long val,

 	mce_notify_irq();

-	return NOTIFY_DONE;
+	return consumed ? NOTIFY_STOP : NOTIFY_DONE;
 }

>
> Hint: disable CONFIG_RAS_CEC.

I knew from the beginning that disabling it cures the problem; please
save your own time by not repeating things we both already know. :)

Once again, I still don't think it is the right answer, which is also
why I keep looking for different solutions. I know you disagree, but
you never explain why. You speculated about CONFIG_X86_MCELOG_LEGACY,
which is a complete misunderstanding.

I brought up CONFIG_X86_MCELOG_LEGACY simply to show how we could break
mcelog _LOUDLY_ if we really decide to break it; currently it just
breaks silently. You misinterpreted that as if I understood CONFIG_RAS
as a replacement for CONFIG_X86_MCELOG_LEGACY, which is a very sad
misunderstanding.

Thanks.
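
For anyone following the thread, the control-flow difference in the PoC
can be sketched in userspace. This is only an illustration, not kernel
code: the stubs for cec_add_mce(), trace_mce_record() and
mce_notify_irq() are hypothetical stand-ins that merely count calls, the
notifier return codes use made-up values, and the lines of
mce_first_notifier() that the diff does not show are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the kernel's notifier return codes (illustrative values). */
enum { NOTIFY_DONE = 0, NOTIFY_STOP = 1 };

struct mce { int bank; };	/* hypothetical, heavily reduced record */

bool cec_consumes;		/* what the stubbed cec_add_mce() reports */
int trace_calls;		/* how often the trace stub ran */
int notify_calls;		/* how often the notify stub ran */

/* Stubs: they only record that they were called. */
static bool cec_add_mce(struct mce *m) { (void)m; return cec_consumes; }
static void trace_mce_record(struct mce *m) { (void)m; trace_calls++; }
static void mce_notify_irq(void) { notify_calls++; }

/* Flow before the PoC: a consumed error returns early. */
int first_notifier_current(struct mce *m)
{
	if (!m)
		return NOTIFY_DONE;

	if (cec_add_mce(m))
		return NOTIFY_STOP;

	trace_mce_record(m);
	mce_notify_irq();

	return NOTIFY_DONE;
}

/* Flow with the PoC: trace and notify always run, then the chain stops. */
int first_notifier_poc(struct mce *m)
{
	bool consumed;

	if (!m)
		return NOTIFY_DONE;

	consumed = cec_add_mce(m);

	trace_mce_record(m);
	mce_notify_irq();

	return consumed ? NOTIFY_STOP : NOTIFY_DONE;
}

/* Exercise both variants with an error that the CEC stub consumes. */
void check_flow(void)
{
	struct mce m = { .bank = 0 };

	cec_consumes = true;
	trace_calls = notify_calls = 0;

	assert(first_notifier_current(&m) == NOTIFY_STOP);
	assert(trace_calls == 0 && notify_calls == 0);

	assert(first_notifier_poc(&m) == NOTIFY_STOP);
	assert(trace_calls == 1 && notify_calls == 1);
}
```

In both variants a consumed error still yields NOTIFY_STOP, so later
notifiers on the chain are still skipped. The sketch only shows that
the trace record and mce_notify_irq() are no longer bypassed; it makes
no claim about what a real mcelog daemon ends up seeing.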