Re: [CentOS] debugging RAM issues

2012-03-14 Thread Les Mikesell
On Wed, Mar 14, 2012 at 2:35 PM, John R Pierce wrote: > On 03/14/12 12:16 PM, Les Mikesell wrote: >> If you were running software RAID1 on that box, don't trust anything >> on the drives now. Maybe even if you weren't, but it is especially >> weird when alternate reads randomly revive bad data t

Re: [CentOS] debugging RAM issues

2012-03-14 Thread Alan McKay
On Wed, Mar 14, 2012 at 3:16 PM, Les Mikesell wrote: > If you were running software RAID1 on that box, don't trust anything > on the drives now. Maybe even if you weren't, but it is especially > weird when alternate reads randomly revive bad data that you thought > had been fixed already. > > N

Re: [CentOS] debugging RAM issues

2012-03-14 Thread John R Pierce
On 03/14/12 12:16 PM, Les Mikesell wrote: > If you were running software RAID1 on that box, don't trust anything > on the drives now. Maybe even if you weren't, but it is especially > weird when alternate reads randomly revive bad data that you thought > had been fixed already. and the worst par

Re: [CentOS] debugging RAM issues

2012-03-14 Thread Les Mikesell
On Wed, Mar 14, 2012 at 1:43 PM, Alan McKay wrote: > Well I did exactly what I'd done 3 months ago and found a faulty RAM chip > this time > > My guess is that back then the chip was still functioning some of the time, > and happened to be fine just when I was doing the tests. > > This time I foun

Re: [CentOS] debugging RAM issues

2012-03-14 Thread Alan McKay
Well, I did exactly what I'd done 3 months ago and found a faulty RAM chip this time. My guess is that back then the chip was still functioning some of the time, and happened to be fine just when I was doing the tests. This time I found it fairly easily with a systematic approach.

Re: [CentOS] debugging RAM issues

2012-03-13 Thread Alan McKay
On Tue, Mar 13, 2012 at 2:15 PM, Scott Silva wrote:
> It could also be a power supply problem... Add memory load, and a bit of heat,
> and voltage drops a bit...
Problem is that even if I leave it unplugged for some time I can get the problem. And I have the heat sensors all graphed, and

Re: [CentOS] debugging RAM issues

2012-03-13 Thread Scott Silva
on 3/13/2012 11:07 AM Ross Walker spake the following: > On Mar 13, 2012, at 12:50 PM, Alan McKay wrote: > >> Back about 3 months ago I took this system down and removed all the RAM, >> and stuck individual chips into it and booted it, testing each chip on its >> own. At that time every single o

Re: [CentOS] debugging RAM issues

2012-03-13 Thread Alan McKay
On Tue, Mar 13, 2012 at 2:07 PM, Ross Walker wrote:
> It could be a bad physical RAM slot on the motherboard.
Oh dang, why didn't I think of that! I'll try that next.
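
For anyone following along, the chip-versus-slot question above can be worked through as a small bookkeeping exercise: run each DIMM in more than one slot, record pass/fail, and see whether the failures follow the DIMM or the slot. The sketch below is only an illustration of that reasoning, not anything posted in the thread. The function name `classify_faults`, the DIMM labels, and the slot numbers are all made up, and it assumes a failure is repeatable, which (as noted elsewhere in this thread) an intermittent chip may not be.

```python
from collections import defaultdict


def classify_faults(results):
    """Guess whether failures follow a DIMM or a slot.

    `results` maps (dimm_label, slot_label) -> True if the memory test
    passed for that DIMM seated alone in that slot. Labels are
    hypothetical; use whatever is printed on the modules and the board.
    """
    dimm_runs = defaultdict(list)
    slot_runs = defaultdict(list)
    for (dimm, slot), passed in results.items():
        dimm_runs[dimm].append(passed)
        slot_runs[slot].append(passed)

    # Only accuse something that was tried at least twice and never
    # passed; a single failure cannot tell a DIMM from a slot.
    bad_dimms = sorted(
        d for d, runs in dimm_runs.items() if len(runs) > 1 and not any(runs)
    )
    bad_slots = sorted(
        s for s, runs in slot_runs.items() if len(runs) > 1 and not any(runs)
    )
    return bad_dimms, bad_slots


if __name__ == "__main__":
    # Made-up example: DIMM "B" fails in both slots, DIMM "A" passes in both.
    runs = {
        ("A", 0): True,
        ("A", 1): True,
        ("B", 0): False,
        ("B", 1): False,
    }
    print(classify_faults(runs))
```

A clean pass only says the part worked during that run, so with an intermittent fault it is worth repeating the passes several times before trusting the result.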

Re: [CentOS] debugging RAM issues

2012-03-13 Thread Ross Walker
On Mar 13, 2012, at 12:50 PM, Alan McKay wrote: > Back about 3 months ago I took this system down and removed all the RAM, > and stuck individual chips into it and booted it, testing each chip on its > own. At that time every single one of them worked! But I'm about to try > this again to see

Re: [CentOS] debugging RAM issues

2012-03-13 Thread m . roth
Alan McKay wrote: > Hey folks, > > I have 1 system ( Sunfire x2250 running 5.7 ) that is having issues with > RAM, but I'm not sure how to debug it. And unfortunately it is not under > support anymore. Oy, as they say, vey. You still *might* be able to email Sun, er, Oracle support without payin

Re: [CentOS] debugging RAM issues

2012-03-13 Thread Les Mikesell
On Tue, Mar 13, 2012 at 11:50 AM, Alan McKay wrote: > > Back about 3 months ago I took this system down and removed all the RAM, > and stuck individual chips into it and booted it, testing each chip on its > own. At that time every single one of them worked! But I'm about to try > this again t

[CentOS] debugging RAM issues

2012-03-13 Thread Alan McKay
Hey folks, I have 1 system ( Sunfire x2250 running 5.7 ) that is having issues with RAM, but I'm not sure how to debug it. And unfortunately it is not under support anymore. I started the job about 4 months ago and when I came aboard the guy who handed stuff over to me told me this issue was on