On Tuesday 21 September 2010, Stroller wrote:
> On 21 Sep 2010, at 18:37, Grant wrote:
> >>>> I'm getting a lot of machine check exception errors in dmesg on my
> >>>> hosted server. Running mcelog I get:
> >>>> ...
> >
> > They offered to take my machine down and do a memory test which they
> > said would take a number of hours. Is a memory test likely to help?
> > Did you suggest reseating or replacing RAM modules as opposed to a
> > memory test because it will result in less downtime?
>
> I suspect that your hosting provider are offering you this memory test
> because they don't want to go swapping out memory modules willy-nilly.
>
> How do they know that the problem is really memory, and not your operating
> system? If they take all this RAM out and put new RAM in, what do they do
> with the old RAM? They don't know if it's good or bad, so are they
> expected to just slap it in a server belonging to another customer, and
> stitch him up?
>
> A memory test is likely to identify bad RAM, if it is bad, so you should
> proceed with this. This is likely the best route to solving the problem.
>
Are you sure? This is ECC RAM. Does memtest report ECC-corrected errors?
I don't think so: with ECC, a correctable error is fixed in hardware
before the software reading the memory ever sees it. The MCE errors say:
we detected an error, the error was corrected, applications will not see
it, and everything marches on. The RAM is borked and must be replaced.
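
If you want to keep an eye on the corrected-error counts without taking
the box down, something like the following might do. It is only a rough
sketch: it assumes the kernel's EDAC driver for your chipset is loaded
and exposes per-controller ce_count and ue_count files under
/sys/devices/system/edac/mc, which may not be the case on your server.
The function name read_edac_counts is made up for illustration.

```python
from pathlib import Path


def read_edac_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} read from EDAC sysfs.

    Assumption: an EDAC driver is loaded, so each memory controller
    appears as mc0, mc1, ... with ce_count and ue_count files.
    """
    counts = {}
    for mc in sorted(Path(edac_root).glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            # Skip controllers whose counters are missing or unreadable.
            continue
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    for name, (ce, ue) in read_edac_counts().items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

A corrected count that keeps climbing between runs points the same way
as the mcelog output: the ECC logic is repeatedly fixing up errors.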