>> >> >>>> I'm getting a lot of machine check exception errors in dmesg on my
>> >> >>>> hosted server.  Running mcelog I get:
>> >> >>>> ...
>> >> >
>> >> > They offered to take my machine down and do a memory test which they
>> >> > said would take a number of hours.  Is a memory test likely to help?
>> >> > Did you suggest reseating or replacing RAM modules as opposed to a
>> >> > memory test because it will result in less downtime?
>> >>
>> >> I suspect that your hosting provider are offering you this memory test
>> >> because they don't want to go swapping out memory modules willy-nilly.
>> >>
>> >> How do they know that the problem is really memory, and not your
>> >> operating system? If they take all this RAM out and put new RAM in,
>> >> what do they do with the old RAM? They don't know if it's good or bad,
>> >> so are they expected to just slap it in a server belonging to another
>> >> customer, and stitch him up?
>> >>
>> >> A memory test is likely to identify bad RAM, if it is bad, so you should
>> >> proceed with this. This is likely the best route to solving the problem.
>> >>
>> >> I think that ideally, for you, they would move the system image onto a
>> >> different known-good server with the same configuration. Then you cannot
>> >> complain if the same problems start occurring again. If the problem is
>> >> genuinely hardware then they won't. And the hosting provider is free to
>> >> run diagnostics on your old machine.
>> >>
>> >> But realistically, the memory test is likely to show up a bad RAM
>> >> module, you'll get it replaced and be up and running within a few
>> >> hours. Why would you refuse? If your system needed a guaranteed uptime
>> >> you'd perhaps have to pay for a higher level of service than the fees
>> >> you're paying at present.
>> >
>> > I run memory tests overnight.  If a module is seriously borked then it
>> > will fail earlier.  Reseating/replacing takes a few minutes, instead of
>> > hours.
>> >
>> > If they have spare machines (for dev't or testing) they can fit the
>> > memory module(s) there and test them exhaustively, before they put the
>> > good ones back into a customer's machine.
>>
>> Thanks Mick and Stroller.  I'll see if they'll go for this.
>
> You're welcome.  Bear in mind though that a lot of hosters are just glorified
> resellers with an account in a bigger data centre.  In many cases they do not
> even have physical access to the machines.  Only the data centre techies do
> and they may be less willing to oblige and break procedure or routine, just
> because one end user out of hundreds/thousands complained about some memory
> errors.

Thanks Mick.  My host is big, with multiple data centers of its own.
They did exactly as I asked and I'm running on new RAM.  There was a
problem bringing my system back online, and the cause was purported to
be an unseated ethernet cable.  I handed over my root password as I
was requested to do, and then started to get paranoid.  I suppose I
shouldn't, though, because with physical access to my machine they
pretty much have full access anyway, right?

- Grant
