Re: [PATCH 3/3] ie31200_edac: Add driver

2014-06-11 Thread Jason Baron
On 04/09/2014 05:33 PM, Luck, Tony wrote: >> Unfortunately, the box reporting the ue errors just went into transit (so >> that I can better examine this issue), so I will probably not be able to >> run this experiment on that specific box until next week. > > Do you have any other logs from this m

Re: [PATCH 3/3] ie31200_edac: Add driver

2014-04-10 Thread Borislav Petkov
On Wed, Apr 09, 2014 at 10:44:21PM +0000, Luck, Tony wrote: > Scenario: Your mission critical app is running (controlling a giant > laser cutter). Oops there is a memory error, and the bad data arrives > at the application causing it to swing the laser beam through 180 > degrees, destroying half of

Re: [PATCH 3/3] ie31200_edac: Add driver

2014-04-09 Thread Jason Baron
On 04/09/2014 06:44 PM, Luck, Tony wrote: >> So when the driver sees uncorrected errors, I'm also seeing them in my >> memory scanning program - so they correspond nicely. I didn't see anything >> logged in /var/log/mcelog, but I will update to the latest when possible. > I wonder if there are some

RE: [PATCH 3/3] ie31200_edac: Add driver

2014-04-09 Thread Luck, Tony
> So when the driver sees uncorrected errors, I'm also seeing them in my > memory scanning program - so they correspond nicely. I didn't see anything > logged in /var/log/mcelog, but I will update to the latest when possible. I wonder if there are some BIOS options to enable reporting via CMCI/MCE

Re: [PATCH 3/3] ie31200_edac: Add driver

2014-04-09 Thread Jason Baron
On 04/09/2014 05:33 PM, Luck, Tony wrote: >> Unfortunately, the box reporting the ue errors just went into transit (so >> that I can better examine this issue), so I will probably not be able to >> run this experiment on that specific box until next week. > > Do you have any other logs from this m

RE: [PATCH 3/3] ie31200_edac: Add driver

2014-04-09 Thread Luck, Tony
> Unfortunately, the box reporting the ue errors just went into transit (so > that I can better examine this issue), so I will probably not be able to > run this experiment on that specific box until next week. Do you have any other logs from this machine. Is there something logged in one (or mor

Re: [PATCH 3/3] ie31200_edac: Add driver

2014-04-09 Thread Borislav Petkov
On Wed, Apr 09, 2014 at 03:53:49PM -0400, Jason Baron wrote: > Unfortunately, the box reporting the ue errors just went into transit (so > that I can better examine this issue), so I will probably not be able to > run this experiment on that specific box until next week. > > However, I was able to

Re: [PATCH 3/3] ie31200_edac: Add driver

2014-04-09 Thread Jason Baron
On 04/09/2014 03:14 PM, Borislav Petkov wrote: > On Wed, Apr 09, 2014 at 02:57:19PM -0400, Jason Baron wrote: >> Right, so maybe the fact that its a desktop chipset means that it >> behaves differently and doesn't raise MCEs on memory errors. We have a >> bunch of these processors and we haven't ye

Re: [PATCH 3/3] ie31200_edac: Add driver

2014-04-09 Thread Borislav Petkov
On Wed, Apr 09, 2014 at 02:57:19PM -0400, Jason Baron wrote: > Right, so maybe the fact that its a desktop chipset means that it > behaves differently and doesn't raise MCEs on memory errors. We have a > bunch of these processors and we haven't yet seen an MCE raised on a > memory error. This can'

Re: [PATCH 3/3] ie31200_edac: Add driver

2014-04-09 Thread Jason Baron
On 04/09/2014 01:36 PM, Borislav Petkov wrote: > On Wed, Apr 09, 2014 at 05:17:53PM +0000, Luck, Tony wrote: >> The E3-12xx processors connect out to a different (desktop) chipset >> from the E5 (server parts). Perhaps that means the memory controller >> are different too??? > > You gotta love how

Re: [PATCH 3/3] ie31200_edac: Add driver

2014-04-09 Thread Jason Baron
On 04/09/2014 07:35 AM, Borislav Petkov wrote: > On Fri, Apr 04, 2014 at 09:14:04PM +0000, Jason Baron wrote: >> Add 'ie31200_edac' driver for the E3-1200 series of Intel processors. Driver >> is based on the following E3-1200 specs: >> >> http://www.intel.com/content/www/us/en/processors/xeon/xeon

RE: [PATCH 3/3] ie31200_edac: Add driver

2014-04-09 Thread Luck, Tony
>> Why not put it into sb_edac - it is small enough and if you're lucky, >> you might even share functionality? > > By quickly looking at the driver (sorry Jason, no proper review yet :( ) > it's a very different beast. Tony, any insights on why? The E3-12xx processors connect out to a different (

Re: [PATCH 3/3] ie31200_edac: Add driver

2014-04-09 Thread Borislav Petkov
On Wed, Apr 09, 2014 at 05:17:53PM +0000, Luck, Tony wrote: > The E3-12xx processors connect out to a different (desktop) chipset > from the E5 (server parts). Perhaps that means the memory controller > are different too??? You gotta love how Intel has a different memory controller for server and

Re: [PATCH 3/3] ie31200_edac: Add driver

2014-04-09 Thread Aristeu Rozanski
On Wed, Apr 09, 2014 at 01:35:52PM +0200, Borislav Petkov wrote: > Btw, remind me again why this isn't part of the sb_edac? AFAICT, the > e3-12xx thing is a Sandybridge, right? > > Why not put it into sb_edac - it is small enough and if you're lucky, > you might even share functionality? By quick

Re: [PATCH 3/3] ie31200_edac: Add driver

2014-04-09 Thread Borislav Petkov
On Fri, Apr 04, 2014 at 09:14:04PM +0000, Jason Baron wrote: > Add 'ie31200_edac' driver for the E3-1200 series of Intel processors. Driver > is based on the following E3-1200 specs: > > http://www.intel.com/content/www/us/en/processors/xeon/xeon-e3-1200-family-vol-2-datasheet.html > http://www.in

Re: [PATCH 3/3] ie31200_edac: Add driver

2014-04-09 Thread Borislav Petkov
On Tue, Apr 08, 2014 at 06:16:43PM -0400, Jason Baron wrote: > I also noticed that some EDAC drivers do a 'pci_dev_get()' in their > 'init_one' function, so I'm not clear if that's needed as well (I'm > hoping the MCH can't be removed at run-time :)). That'll be a fun stunt if it were possible. :-

Re: [PATCH 3/3] ie31200_edac: Add driver

2014-04-09 Thread Borislav Petkov
On Tue, Apr 08, 2014 at 11:03:08PM -0400, Jason Baron wrote: > Hmmm...as I said, I'm not getting any machine checks with ue errors. > I've got a fairly old kernel on the system atm, I will try loading a > newer kernel, to see if that makes any difference... Well, regardless of the kernel, if the m

Re: [PATCH 3/3] ie31200_edac: Add driver

2014-04-08 Thread Jason Baron
On 04/08/2014 06:34 PM, Luck, Tony wrote: >>> Btw, this driver is polling, AFAICT. Doesn't e3-12xx support the CMCI >>> interrupt which you can feed into this driver directly and thus not need >>> the polling at all? >> On the system with the ce and ue events that I'm testing on, I don't see >> 'MC

RE: [PATCH 3/3] ie31200_edac: Add driver

2014-04-08 Thread Luck, Tony
>> Btw, this driver is polling, AFAICT. Doesn't e3-12xx support the CMCI >> interrupt which you can feed into this driver directly and thus not need >> the polling at all? > > On the system with the ce and ue events that I'm testing on, I don't see > 'MCE' nudge above 0, in /proc/interrupts. So I t

Re: [PATCH 3/3] ie31200_edac: Add driver

2014-04-08 Thread Jason Baron
Hi, On 04/08/2014 05:09 AM, Borislav Petkov wrote: > On Fri, Apr 04, 2014 at 09:14:04PM +0000, Jason Baron wrote: >> Add 'ie31200_edac' driver for the E3-1200 series of Intel processors. Driver >> is based on the following E3-1200 specs: >> >> http://www.intel.com/content/www/us/en/processors/xeo

Re: [PATCH 3/3] ie31200_edac: Add driver

2014-04-08 Thread Borislav Petkov
On Fri, Apr 04, 2014 at 09:14:04PM +0000, Jason Baron wrote: > Add 'ie31200_edac' driver for the E3-1200 series of Intel processors. Driver > is based on the following E3-1200 specs: > > http://www.intel.com/content/www/us/en/processors/xeon/xeon-e3-1200-family-vol-2-datasheet.html > http://www.in