>> >> >>>> I'm getting a lot of machine check exception errors in dmesg on my
>> >> >>>> hosted server. Running mcelog I get:
>> >> >>>> ...
>> >> >
>> >> > They offered to take my machine down and do a memory test which they
>> >> > said would take a number of hours. Is a memory test likely to help?
>> >> > Did you suggest reseating or replacing RAM modules as opposed to a
>> >> > memory test because it will result in less downtime?
>> >>
>> >> I suspect that your hosting provider are offering you this memory test
>> >> because they don't want to go swapping out memory modules willy-nilly.
>> >>
>> >> How do they know that the problem is really memory, and not your
>> >> operating system? If they take all this RAM out and put new RAM in,
>> >> what do they do with the old RAM? They don't know if it's good or bad,
>> >> so are they expected to just slap it in a server belonging to another
>> >> customer, and stitch him up?
>> >>
>> >> A memory test is likely to identify bad RAM, if it is bad, so you should
>> >> proceed with this. This is likely the best route to solving the problem.
>> >>
>> >> I think that ideally, for you, they would move the system image onto a
>> >> different known-good server with the same configuration. Then you cannot
>> >> complain if the same problems start occurring again. If the problem is
>> >> genuinely hardware then they won't. And the hosting provider is free to
>> >> run diagnostics on your old machine.
>> >>
>> >> But realistically, the memory test is likely to show up a bad RAM
>> >> module, you'll get it replaced and be up and running within a few
>> >> hours. Why would you refuse? If your system needed a guaranteed uptime
>> >> you'd perhaps have to pay for a higher level of service than the fees
>> >> you're paying at present.
>> >
>> > I run memory tests overnight. If a module is seriously borked then it
>> > will fail earlier. Reseating/replacing takes a few minutes, instead of
>> > hours.
>> >
>> > If they have spare machines (for dev't or testing) they can fit the
>> > memory module(s) there and test them exhaustively, before they put the
>> > good ones back into a customer's machine.
>>
>> Thanks Mick and Stroller. I'll see if they'll go for this.
>
> You're welcome. Bear in mind though that a lot of hosters are just glorified
> resellers with an account in a bigger data centre. In many cases they do not
> even have physical access to the machines. Only the data centre techies do
> and they may be less willing to oblige and break procedure or routine, just
> because one end user out of hundreds/thousands complained about some memory
> errors.

Thanks Mick. My host is big, with multiple data centers of their own.
They did exactly as I asked and I'm now running on new RAM.

There was a problem bringing my system back online, and the cause was
purported to be an unseated Ethernet cable. I handed over my root
password as I was requested to do, and then started to get paranoid. I
suppose I shouldn't, though, because with physical access to my machine
they pretty much have full access anyway, right?

- Grant