On Tuesday 21 September 2010 20:15:05 Stroller wrote:
> On 21 Sep 2010, at 18:37, Grant wrote:
> >>>> I'm getting a lot of machine check exception errors in dmesg on my
> >>>> hosted server.  Running mcelog I get:
> >>>> ...
> > 
> > They offered to take my machine down and do a memory test which they
> > said would take a number of hours.  Is a memory test likely to help?
> > Did you suggest reseating or replacing RAM modules as opposed to a
> > memory test because it will result in less downtime?
> 
> I suspect that your hosting provider are offering you this memory test
> because they don't want to go swapping out memory modules willy-nilly.
> 
> How do they know that the problem is really memory, and not your operating
> system? If they take all this RAM out and put new RAM in, what do they do
> with the old RAM? They don't know if it's good or bad, so are they
> expected to just slap it in a server belonging to another customer, and
> stitch him up?
> 
> A memory test is likely to identify bad RAM, if it is bad, so you should
> proceed with this. This is likely the best route to solving the problem.
> 
> I think that ideally, for you, they would move the system image onto a
> different known-good server with the same configuration. Then you cannot
> complain if the same problems start occurring again. If the problem is
> genuinely hardware then they won't. And the hosting provider is free to
> run diagnostics on your old machine.
> 
> But realistically, the memory test is likely to show up a bad RAM module,
> you'll get it replaced and be up and running within a few hours. Why would
> you refuse? If your system needed a guaranteed uptime you'd perhaps have
> to pay for a higher level of service than the fees you're paying at
> present.

I run memory tests overnight.  If a module is seriously borked then it will 
fail early in the test.  Reseating or replacing modules takes a few minutes 
instead of hours.

If they have spare machines (for development or testing) they can fit the 
suspect module(s) there and test them exhaustively before putting the good 
ones back into a customer's machine.
-- 
Regards,
Mick
