Re: [RFC EDAC/GHES] edac: lock module owner to avoid error report conflicts

2012-11-01 Thread Mauro Carvalho Chehab
On Fri, 2 Nov 2012 00:54:57 +0100, Borislav Petkov wrote: > On Thu, Nov 01, 2012 at 11:47:52PM +, Luck, Tony wrote: > > > Right, but at least in the csrow case, we still can compute back the > > > csrow even with the interleaving, after we know how it is done exactly > > > (on which address

Re: [RFC EDAC/GHES] edac: lock module owner to avoid error report conflicts

2012-11-01 Thread Mauro Carvalho Chehab
On Thu, 1 Nov 2012 21:09:07 + "Luck, Tony" wrote: > On Thu, 1 Nov 2012 20:55:09 +0100 > Borislav Petkov wrote: > > On Thu, Nov 01, 2012 at 09:47:21AM -0200, Mauro Carvalho Chehab wrote: > > > 1) when both APEI/GHES and sb_edac are loaded, error reports are > > > inconsistent: race i

Re: [RFC EDAC/GHES] edac: lock module owner to avoid error report conflicts

2012-11-01 Thread Borislav Petkov
On Thu, Nov 01, 2012 at 11:47:52PM +, Luck, Tony wrote: > > Right, but at least in the csrow case, we still can compute back the > > csrow even with the interleaving, after we know how it is done exactly > > (on which address bits, etc). I think this should be doable on Intel > > controllers to

RE: [RFC EDAC/GHES] edac: lock module owner to avoid error report conflicts

2012-11-01 Thread Luck, Tony
> Right, but at least in the csrow case, we still can compute back the > csrow even with the interleaving, after we know how it is done exactly > (on which address bits, etc). I think this should be doable on Intel > controllers too but I don't know. No. Architecturally all Intel provides is the p

Re: [RFC EDAC/GHES] edac: lock module owner to avoid error report conflicts

2012-11-01 Thread Borislav Petkov
On Thu, Nov 01, 2012 at 09:09:07PM +, Luck, Tony wrote: > > That is correct, unfortunately. That information is not available to > > software in all cases. Maybe APEI could be used for that DIMM location > > mapping through simple tables instead of letting it fumble the error > > handling path.

RE: [RFC EDAC/GHES] edac: lock module owner to avoid error report conflicts

2012-11-01 Thread Luck, Tony
> That is correct, unfortunately. That information is not available to > software in all cases. Maybe APEI could be used for that DIMM location > mapping through simple tables instead of letting it fumble the error > handling path. Not much hope for "simple"[1] tables. There is also a timings iss

Re: [RFC EDAC/GHES] edac: lock module owner to avoid error report conflicts

2012-11-01 Thread Borislav Petkov
On Thu, Nov 01, 2012 at 09:47:21AM -0200, Mauro Carvalho Chehab wrote: > 1) when both APEI/GHES and sb_edac are loaded, error reports are > inconsistent: race issues; bad APEI/MCE interface, etc. So, there's > currently a bug that needs to be fixed; That's correct. And we probably could add s

Re: [RFC EDAC/GHES] edac: lock module owner to avoid error report conflicts

2012-11-01 Thread Mauro Carvalho Chehab
On Thu, 1 Nov 2012 10:25:23 -0700, Tony Luck wrote: > On Thu, Nov 1, 2012 at 4:47 AM, Mauro Carvalho Chehab > wrote: > > Take a look at arch/x86/kernel/cpu/mcheck/mce-apei.c: > > > > void apei_mce_report_mem_error(int corrected, struct > > cper_sec_mem_err *mem_err) > > { > >

Re: [RFC EDAC/GHES] edac: lock module owner to avoid error report conflicts

2012-11-01 Thread Tony Luck
On Thu, Nov 1, 2012 at 4:47 AM, Mauro Carvalho Chehab wrote: > Take a look at arch/x86/kernel/cpu/mcheck/mce-apei.c: > > void apei_mce_report_mem_error(int corrected, struct cper_sec_mem_err > *mem_err) > { > struct mce m; > > /* Only corrected MC i

Re: [RFC EDAC/GHES] edac: lock module owner to avoid error report conflicts

2012-11-01 Thread Mauro Carvalho Chehab
On Thu, 1 Nov 2012 12:05:12 +0100, Borislav Petkov wrote: > + Tony. > > On Wed, Oct 31, 2012 at 11:58:15AM -0200, Mauro Carvalho Chehab wrote: > > There's a known bug that happens when apei/ghes is loaded together > > with an EDAC module: the same error is reported several times, > > as ghes ca

Re: [RFC EDAC/GHES] edac: lock module owner to avoid error report conflicts

2012-11-01 Thread Borislav Petkov
+ Tony. On Wed, Oct 31, 2012 at 11:58:15AM -0200, Mauro Carvalho Chehab wrote: > There's a known bug that happens when apei/ghes is loaded together > with an EDAC module: the same error is reported several times, > as ghes calls mcelog, which, in turn, calls edac. This is exactly why I think APEI i