Re: [PATCH v3 0/9] Extended H/W error log driver

2013-10-18 Thread Tony Luck
On Fri, Oct 18, 2013 at 2:20 AM, Borislav Petkov wrote:
> It looks ok to me so far, I'm guessing Tony you're picking this up or
> should I?

I'll pick it up. Thanks for all the Acks & Reviews.

-Tony

Re: [PATCH v3 0/9] Extended H/W error log driver

2013-10-18 Thread Borislav Petkov
On Fri, Oct 18, 2013 at 04:23:35AM -0400, Chen, Gong wrote:
> OK, this is the 3rd version. Hope it is the last one :-).

It looks ok to me so far, I'm guessing Tony you're picking this up or
should I?

> this version just updates some minors places and apply some Ack/Review
> information. In this v

[PATCH v3 0/9] Extended H/W error log driver

2013-10-18 Thread Chen, Gong
[PATCH v3 1/9] ACPI, APEI, CPER: Fix status check during error printing
[PATCH v3 2/9] ACPI, CPER: Update cper info
[PATCH v3 3/9] bitops: Introduce a more generic BITMASK macro
[PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform
[PATCH v3 5/9] DMI: Parse memory device (type 17) in

Re: [PATCH v2 0/9] Extended H/W error log driver

2013-10-17 Thread Borislav Petkov
On Thu, Oct 17, 2013 at 11:25:41AM -0400, Steven Rostedt wrote:
> On Thu, 17 Oct 2013 10:33:48 -0400
> Chen Gong wrote:
> >
> > > Gong, can you try moving the CREATE_TRACE_POINTS line to a new file -
> > > arch/x86/ras/ras.c and define it there and not anywhere else, i.e. move
> > > it away from

Re: [PATCH v2 0/9] Extended H/W error log driver

2013-10-17 Thread Steven Rostedt
On Thu, 17 Oct 2013 10:33:48 -0400
Chen Gong wrote:

> > Gong, can you try moving the CREATE_TRACE_POINTS line to a new file -
> > arch/x86/ras/ras.c and define it there and not anywhere else, i.e. move
> > it away from edac_mc.c. Does that help?
>
> In current kernel we haven't arch/x86/ras/ras

Re: [PATCH v2 0/9] Extended H/W error log driver

2013-10-17 Thread Chen Gong
, linux-a...@vger.kernel.org,
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2 0/9] Extended H/W error log driver
> User-Agent: Mutt/1.5.21 (2010-09-15)
>
> On Wed, Oct 16, 2013 at 08:00:38PM +0200, Borislav Petkov wrote:
> > Right, the only difference I can see is th

Re: Extended H/W error log driver

2013-10-17 Thread Borislav Petkov
On Thu, Oct 17, 2013 at 05:37:22PM +0530, Naveen N. Rao wrote:
> That's me raising both my hands :)

:-)

> If you feel so strongly about it. "Corrected Error" is an oxymoron.
> It's really just the hardware notifying us.

Yeah, but we can't write "We just corrected a single-bit flip in DIMM array

Re: Extended H/W error log driver

2013-10-17 Thread Naveen N. Rao
On 10/16/2013 12:53 AM, Borislav Petkov wrote:
> On Wed, Oct 16, 2013 at 12:40:40AM +0530, Naveen N. Rao wrote:
> > +2 ;)
>
> You're counting for 2 people, huh?

That's me raising both my hands :)

> :-)
>
> > While at it, I wonder if we're better off calling these "Hardware
> > events" rather than "Hardware e

Re: [PATCH v2 0/9] Extended H/W error log driver

2013-10-16 Thread Borislav Petkov
On Wed, Oct 16, 2013 at 08:00:38PM +0200, Borislav Petkov wrote:
> Right, the only difference I can see is that include/ras/ras_event.h
> doesn't have those below:
>
> #undef TRACE_INCLUDE_PATH
> #undef TRACE_INCLUDE_FILE
> #define TRACE_INCLUDE_PATH .
>
> Perhaps that is the problem?
>
> Gong,

Re: [PATCH v2 0/9] Extended H/W error log driver

2013-10-16 Thread Borislav Petkov
On Wed, Oct 16, 2013 at 12:56:46PM -0400, Steven Rostedt wrote:
> On Wed, 16 Oct 2013 18:05:50 +0200
> Borislav Petkov wrote:
> >
> > > For trace output format we still need further discussion. In the last
> > > patch(support trace interface) I have to reserve previous Kconfig
> > > format beca

Re: [PATCH v2 0/9] Extended H/W error log driver

2013-10-16 Thread Steven Rostedt
On Wed, 16 Oct 2013 18:05:50 +0200
Borislav Petkov wrote:

> > For trace output format we still need further discussion. In the last
> > patch(support trace interface) I have to reserve previous Kconfig
> > format because I find once I put trace_event interface in the module,
> > it will not wor

Re: [PATCH v2 0/9] Extended H/W error log driver

2013-10-16 Thread Joe Perches
On Wed, 2013-10-16 at 18:05 +0200, Borislav Petkov wrote:
> On Wed, Oct 16, 2013 at 10:55:57AM -0400, Chen, Gong wrote:
[]
> > After applying this patch series, when a memory corrected error happens,
> > we can get following information:
> >
> > dmesg output:
> >
> > [ 949.545817] {1}Hardware er

Re: [PATCH v2 0/9] Extended H/W error log driver

2013-10-16 Thread Borislav Petkov
On Wed, Oct 16, 2013 at 10:55:57AM -0400, Chen, Gong wrote:
> [PATCH v2 1/9] ACPI, APEI, CPER: Fix status check during error printing
> [PATCH v2 2/9] ACPI, CPER: Update cper info
> [PATCH v2 3/9] bitops: Introduce a more generic BITMASK macro
> [PATCH v2 4/9] ACPI, x86: Extended error log driver f

Re: [PATCH v2 0/9] Extended H/W error log driver

2013-10-16 Thread Chen Gong
[...]
>
> dmesg output format has been updated based on the suggestion from Boris.
> For trace output format we still need further discussion. In the last
> patch(support trace interface) I have to reserve previous Kconfig format
> because I find once I put trace_event interface in the module, it

[PATCH v2 0/9] Extended H/W error log driver

2013-10-16 Thread Chen, Gong
[PATCH v2 1/9] ACPI, APEI, CPER: Fix status check during error printing
[PATCH v2 2/9] ACPI, CPER: Update cper info
[PATCH v2 3/9] bitops: Introduce a more generic BITMASK macro
[PATCH v2 4/9] ACPI, x86: Extended error log driver for x86 platform
[PATCH v2 5/9] DMI: Parse memory device (type 17) in

Re: Extended H/W error log driver

2013-10-15 Thread Borislav Petkov
On Wed, Oct 16, 2013 at 12:40:40AM +0530, Naveen N. Rao wrote:
> +2 ;)

You're counting for 2 people, huh? :-)

> While at it, I wonder if we're better off calling these "Hardware
> events" rather than "Hardware errors".

Oh, please no. That's that euphemistic lying which serves no one.

And here's

Re: Extended H/W error log driver

2013-10-15 Thread Naveen N. Rao
On 2013/10/15 09:15AM, Tony Luck wrote:
> On Tue, Oct 15, 2013 at 2:28 AM, Borislav Petkov wrote:
> > We can even add a hint for the user like:
> >
> > "Above errors have been corrected by the hardware and require no
> > further action."
> >
> > Btw, this is valid for both dmesg and trace

Re: Extended H/W error log driver

2013-10-15 Thread Tony Luck
On Tue, Oct 15, 2013 at 2:28 AM, Borislav Petkov wrote:
> We can even add a hint for the user like:
>
> "Above errors have been corrected by the hardware and require no
> further action."
>
> Btw, this is valid for both dmesg and trace event output.
>
> Because from my experience so far p

Re: Extended H/W error log driver

2013-10-15 Thread Borislav Petkov
On Tue, Oct 15, 2013 at 12:07:31AM -0400, Chen Gong wrote:
> Some errors have multiple sub sections like below:
>
> [ 1442.070522] {2}[Hardware Error]: Hardware error from APEI Generic Hardware
> Error Source: 0
> [ 1442.070528] {2}[Hardware Error]: event severity: corrected
> [ 1442.070531] {2}[

Re: Extended H/W error log driver

2013-10-14 Thread Chen Gong
On Mon, Oct 14, 2013 at 12:55:33PM +0200, Borislav Petkov wrote:
> Date: Mon, 14 Oct 2013 12:55:33 +0200
> From: Borislav Petkov
> To: Chen Gong
> Cc: tony.l...@intel.com, linux-kernel@vger.kernel.org,
> linux-a...@vger.kernel.org
> Subject: Re: Extended H/W error log driver

Re: Extended H/W error log driver

2013-10-14 Thread Borislav Petkov
On Mon, Oct 14, 2013 at 02:49:40AM -0400, Chen Gong wrote:
> On Fri, Oct 11, 2013 at 10:04:27AM +0200, Borislav Petkov wrote:
> > > [56005.786154] {4}Hardware error detected on CPU0
> > > [56005.786159] {4}event severity: corrected
> > > [56005.786162] {4}sub_event[0], severity: corrected
> > > >

Re: Extended H/W error log driver

2013-10-14 Thread Chen Gong
On Fri, Oct 11, 2013 at 10:04:27AM +0200, Borislav Petkov wrote:
> Date: Fri, 11 Oct 2013 10:04:27 +0200
> From: Borislav Petkov
> To: "Chen, Gong"
> Cc: tony.l...@intel.com, linux-kernel@vger.kernel.org,
> linux-a...@vger.kernel.org
> Subject: Re: Extended H/W e

Re: Extended H/W error log driver

2013-10-11 Thread Borislav Petkov
On Fri, Oct 11, 2013 at 02:54:13PM +, Luck, Tony wrote:
> It's such a simple goal - I can't believe it took this long to get
> here :-)

Right, I'd guess some standard's body needed to be persuaded :-)

> > Btw, what's "Memriser1"?
>
> Each memory controller on this machine routes to a plug-in

RE: Extended H/W error log driver

2013-10-11 Thread Luck, Tony
>> [56005.785981] {3}physical_address: 0x000851fe
>> [56005.786027] {3}DIMM location: Memriser1 CHANNEL A DIMM 0
>
> Very good guys, I've been waiting for years for this to be possible,
> good job! :-)

It's such a simple goal - I can't believe it took this long to get here :-)

> Btw, what

Re: Extended H/W error log driver

2013-10-11 Thread Borislav Petkov
On Fri, Oct 11, 2013 at 02:32:38AM -0400, Chen, Gong wrote:
> [56005.785917] {3}Hardware error detected on CPU0
> [56005.785959] {3}event severity: corrected
> [56005.785975] {3}sub_event[0], severity: corrected
> [56005.785977] {3}section_type: memory error
> [56005.785981] {3}physical_address: 0x

Re: Extended H/W error log driver

2013-10-11 Thread Joe Perches
On Fri, 2013-10-11 at 02:32 -0400, Chen, Gong wrote:
> This patch series adds an enhanced MCA event logging driver provided by Intel.
[]
> dmesg output:
>
> [56005.785917] {3}Hardware error detected on CPU0
> [56005.785959] {3}event severity: corrected
> [56005.785975] {3}sub_event[0], severity: c

Extended H/W error log driver

2013-10-10 Thread Chen, Gong
[PATCH 1/8] ACPI, APEI, CPER: Fix status check during error printing
[PATCH 2/8] ACPI, CPER: Update cper info
[PATCH 3/8] ACPI, x86: Extended error log driver for x86 platform
[PATCH 4/8] DMI: Parse memory device (type 17) in SMBIOS
[PATCH 5/8] ACPI, APEI, CPER: Add UEFI 2.4 support for memory erro