On Friday, December 24, 2010 3:47:16 am Matthew D. Fuller wrote:
> On Wed, Dec 22, 2010 at 09:57:26AM -0500 I heard the voice of
> John Baldwin, and lo! it spake thus:
> >
> > You are getting corrected ECC errors in your RAM.
>
> Actually, don't
>
> > CPU 0 0 data cache
> > ADDR 236493c0
> > Data cache ECC error (syndrome 1c)
>
> > CPU 0 1 instruction cache
> > ADDR 2a1c9440
> > Instruction cache ECC error
>
> > CPU 0 2 bus unit
> > L2 cache ECC error
>
> > CPU 1 0 data cache
> > ADDR 23649640
> > Data cache ECC error (syndrome 1c)
>
> > CPU 1 1 instruction cache
> > ADDR 2a1c9440
> > Instruction cache ECC error
>
> > CPU 1 2 bus unit
> > L2 cache ECC error
>
> suggest CPU cache, not RAM?
>
> (that's actually a question; I don't know, but that's what a naive
> reading suggests...)
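
One way to read the quoted records side by side is to group them by bank name and CPU. The following is only a rough Python sketch, assuming each field sits on its own line as mcelog normally prints it; `group_records` and `SAMPLE` are made-up names for illustration and are not part of mcelog.

```python
import re
from collections import defaultdict

# Sample text copied from the records quoted above (quote markers removed).
SAMPLE = """\
CPU 0 0 data cache
ADDR 236493c0
Data cache ECC error (syndrome 1c)
CPU 0 1 instruction cache
ADDR 2a1c9440
Instruction cache ECC error
CPU 0 2 bus unit
L2 cache ECC error
CPU 1 0 data cache
ADDR 23649640
Data cache ECC error (syndrome 1c)
CPU 1 1 instruction cache
ADDR 2a1c9440
Instruction cache ECC error
CPU 1 2 bus unit
L2 cache ECC error
"""

# Record header: "CPU <cpu> <bank number> <bank name>".
RECORD = re.compile(r"^CPU (\d+) (\d+) (.+)$")
# Optional address line that follows a header.
ADDR = re.compile(r"^ADDR ([0-9a-f]+)$")


def group_records(text):
    """Return {bank name: {cpu: address or None}}.

    Hypothetical helper: a later record for the same CPU and bank
    overwrites an earlier one, so this is a summary, not a count.
    """
    banks = defaultdict(dict)
    current = None
    for line in text.splitlines():
        line = line.strip()
        header = RECORD.match(line)
        if header:
            cpu, name = int(header.group(1)), header.group(3)
            banks[name][cpu] = None  # no address seen yet for this record
            current = (name, cpu)
            continue
        addr = ADDR.match(line)
        if addr and current:
            name, cpu = current
            banks[name][cpu] = addr.group(1)
    return dict(banks)


if __name__ == "__main__":
    for name, per_cpu in group_records(SAMPLE).items():
        print(name, per_cpu)
```

Grouped this way, it is easy to see that both CPUs report the same instruction cache address while the data cache addresses differ. That does not settle the cache-versus-RAM question by itself.
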

Hmm, I don't know for certain.  My interpretation is that the CPU errors
were just secondary errors from a memory error like this one that was in
the middle of his reported errors.  It was also only reported on CPU 0 and
not CPU 1:

STATUS d000400000000863 MCGSTATUS 0
MCGCAP 105 APICID 0 SOCKETID 0
CPUID Vendor AMD Family 15 Model 67
HARDWARE ERROR. This is NOT a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge
MISC e00d0fff00000000 ADDR 2cac9678
  Northbridge RAM ECC error
  ECC syndrome = 1c
       bit33 = err cpu1
       bit46 = corrected ecc error
       bit59 = misc error valid
       bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
      generic read mem transaction
      memory access, level generic'

On Intel systems (which I am much more familiar with as far as machine
checks go), corrected ECC errors did not result in additional events in the
CPU caches themselves, but I don't know if AMD is different in this regard.

It could be that both CPUs and a DIMM are failing, but replacing a DIMM is
cheaper and simpler and you can always replace the CPUs later if CPU errors
continue.  Of course, I can't tell you which DIMM to replace from these
messages, but in this case since they are so easily reproducible, you could
probably swap them out one at a time to test.

-- 
John Baldwin