Re: [Discuss] Server won't boot kernel. initramfs problem?

2013-02-24 Thread Bill Bogstad
On Sat, Feb 23, 2013 at 12:22 PM, John Abreau wrote:
> RAM going bad silently is an aggravating problem, and we often don't think
> to test the RAM when some mysterious error crops up. It would be great if
> Nagios was able to test RAM automatically.
>
> Is it possible to test RAM on a live system
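One partial answer to the live-system question is a user-space pattern test, in the style of tools such as memtester: allocate a buffer, write known patterns into it, and read them back. The sketch below is a minimal Python illustration under that assumption. It can only exercise the pages the kernel happens to give this process, and reads may be served from CPU cache rather than DRAM, so a clean result does not prove the RAM is good. The function names, patterns, and buffer size are illustrative choices, not anything from the thread.

```python
# Constant fill patterns: all zeros, all ones, and two alternating-bit patterns.
PATTERNS = (0x00, 0xFF, 0x55, 0xAA)


def count_mismatches(buf: bytearray, expected: bytes) -> int:
    """Count byte positions where the buffer no longer holds the expected value."""
    return sum(1 for got, want in zip(buf, expected) if got != want)


def pattern_test(size_bytes: int) -> int:
    """Write known patterns into one buffer, read them back, return mismatched bytes."""
    buf = bytearray(size_bytes)
    mismatches = 0
    for p in PATTERNS:
        expected = bytes([p]) * size_bytes
        buf[:] = expected  # write pass (same-length slice assignment is in place)
        if buf != expected:  # fast whole-buffer comparison first
            mismatches += count_mismatches(buf, expected)  # then locate the bad bytes
    # Address-style pass: each byte holds the low 8 bits of its own offset,
    # which can expose some addressing faults that constant fills miss.
    expected = (bytes(range(256)) * (size_bytes // 256 + 1))[:size_bytes]
    buf[:] = expected
    if buf != expected:
        mismatches += count_mismatches(buf, expected)
    return mismatches


if __name__ == "__main__":
    size = 16 * 1024 * 1024  # 16 MiB, an arbitrary size for illustration
    print("mismatched bytes:", pattern_test(size))
```

A dedicated tool can lock its buffer into RAM and run many more patterns; this sketch only shows the write-then-verify idea.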

Re: [Discuss] Server won't boot kernel. initramfs problem?

2013-02-24 Thread John Abreau
I recall hearing something a few years ago about memtest functionality being added to the Linux kernel. Seems to me that making this functionality visible to something like nagios would be an obvious goal.

On Feb 24, 2013, at 2:52 AM, Tom Metro wrote:
> Rich Pieri wrote:
>> John Abreau wrot

Re: [Discuss] Server won't boot kernel. initramfs problem?

2013-02-24 Thread John Abreau
Maybe not every 5 minutes the way most things are configured in nagios. But running it once a day, or even once a week, to allow nagios a chance to detect memory errors might be worth the overhead. It would be sufficient just to detect that bad RAM exists. You have to power off the server anyw
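If a periodic memory test records its result somewhere, Nagios only needs a cheap check that reads it. A minimal sketch, assuming a hypothetical weekly job that writes one line of the form `EPOCH_SECONDS ERROR_COUNT` to a hypothetical file `/var/lib/memcheck/last_result`. It uses the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and also warns when the result has gone stale, so a job that silently stopped running does not look like healthy RAM.

```python
import time
from typing import Tuple

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios plugin exit codes

# Hypothetical file where a weekly memory-test job writes "EPOCH_SECONDS ERROR_COUNT".
RESULT_FILE = "/var/lib/memcheck/last_result"
MAX_AGE = 8 * 24 * 3600  # one week plus a day of slack (assumed schedule)


def evaluate(line: str, now: float, max_age: float = MAX_AGE) -> Tuple[int, str]:
    """Map one recorded test result to a Nagios (exit code, status line) pair."""
    try:
        stamp_s, errors_s = line.split()
        stamp, errors = float(stamp_s), int(errors_s)
    except ValueError:
        return UNKNOWN, "MEMTEST UNKNOWN - unreadable result line"
    if errors > 0:
        return CRITICAL, f"MEMTEST CRITICAL - {errors} error(s) in last run"
    age_hours = (now - stamp) / 3600
    if now - stamp > max_age:
        return WARNING, f"MEMTEST WARNING - last run was {age_hours:.0f}h ago"
    return OK, f"MEMTEST OK - no errors, last run {age_hours:.0f}h ago"


def main() -> int:
    """Read the result file, print the status line, and return the exit code."""
    try:
        with open(RESULT_FILE) as f:
            code, message = evaluate(f.read(), time.time())
    except OSError as exc:
        code, message = UNKNOWN, f"MEMTEST UNKNOWN - {exc}"
    print(message)
    return code
```

To use it as a plugin, call `sys.exit(main())` under an `if __name__ == "__main__":` guard so Nagios sees the exit status; the check itself is cheap enough to run on the usual short interval even if the test behind it runs weekly.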

Re: [Discuss] Server won't boot kernel. initramfs problem?

2013-02-24 Thread Bill Bogstad
On Sun, Feb 24, 2013 at 3:39 PM, John Abreau wrote:
> I recall hearing something a few years ago about memtest functionality being
> added to the Linux kernel. Seems to me that making this functionality visible
> to something like nagios would be an obvious goal.

I decided to look into this a l

Re: [Discuss] Server won't boot kernel. initramfs problem?

2013-02-24 Thread John Abreau
I wonder if it could be automated? Perhaps a weekly or monthly cron job that temporarily sets grub to default to the memtest config, then reboots, runs the memtest and logs the results, and finally sets grub back to its previous config? I firmly believe that if a process can only be run manually,
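On GRUB 2 systems, the "boot memtest once, then go back" part may not need a config edit and restore at all: `grub-reboot` sets the default entry for the next boot only. It requires `GRUB_DEFAULT=saved` in `/etc/default/grub` (and a regenerated `grub.cfg`), and some distributions name the command `grub2-reboot`. The sketch below assumes a hypothetical menu entry title `Memory test (memtest86+)`; check your own `grub.cfg` for the real one. It covers only the one-shot boot selection; capturing results from a standalone memtest and rebooting back out of it are separate problems it does not solve.

```python
import subprocess
from typing import List

# Hypothetical GRUB menu entry title; look up the real one in /boot/grub/grub.cfg.
MEMTEST_ENTRY = "Memory test (memtest86+)"


def memtest_reboot_commands(entry: str = MEMTEST_ENTRY) -> List[List[str]]:
    """Commands that boot `entry` once, after which GRUB falls back to the saved default."""
    return [
        ["grub-reboot", entry],  # one-shot default for the next boot only
        ["shutdown", "-r", "now"],  # reboot into it
    ]


def run(dry_run: bool = True) -> List[List[str]]:
    """Print the commands (dry run) or execute them, stopping on the first failure."""
    cmds = memtest_reboot_commands()
    for cmd in cmds:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    run(dry_run=True)  # flip to False only on a machine you intend to reboot
```

A weekly or monthly cron entry could call this with `dry_run=False`; because `grub-reboot` is one-shot, a later ordinary reboot returns to the normal kernel without restoring anything.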

Re: [Discuss] Server won't boot kernel. initramfs problem?

2013-02-24 Thread Rich Pieri
On Sun, 24 Feb 2013 19:13:21 -0500 John Abreau wrote:
> I wonder if it could be automated? Perhaps a weekly or monthly cron
> job that temporarily sets grub to default to the memtest config, then
> reboots, runs the memtest and logs the results, and finally sets grub
> back to its previous config

Re: [Discuss] Server won't boot kernel. initramfs problem?

2013-02-24 Thread John Abreau
My understanding is that ECC RAM merely makes a server crash on a memory error, not detect the error and alert the sysadmin. Is that not the case? And no, I'm not looking to implement this in the near future, I just prefer automating routine sysadmin chores and relying on Nagios for routine system

[Discuss] Server won't boot kernel. initramfs problem?

2013-02-24 Thread Shirley Márquez Dúlcey
Incorrect. ECC RAM lets the server repair a single bit error and continue operating without interruption. (The error may be logged if the motherboard supports that and a suitable daemon is active. See the EDAC project: http://bluesmoke.sourceforge.net/ ) Parity memory will crash the server with a m
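The EDAC counters mentioned above can feed a Nagios check directly. A minimal sketch, assuming an EDAC driver for your memory controller is loaded and exposes the per-controller `ce_count` (corrected) and `ue_count` (uncorrected) files under `/sys/devices/system/edac/mc/mcN/`. It warns on any corrected error and goes critical on any uncorrected one. Those thresholds are an illustrative choice; the counters accumulate until reset or reboot, so a real deployment might alert on increases instead.

```python
import glob
import os
from typing import Tuple

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios plugin exit codes


def read_count(path: str) -> int:
    """Read one integer EDAC counter file."""
    with open(path) as f:
        return int(f.read().strip())


def check_edac(root: str = "/sys/devices/system/edac/mc") -> Tuple[int, str]:
    """Sum corrected/uncorrected error counts over all memory controllers."""
    controllers = sorted(glob.glob(os.path.join(root, "mc[0-9]*")))
    if not controllers:
        # No controllers usually means no EDAC driver is loaded for this board.
        return UNKNOWN, "EDAC UNKNOWN - no memory controllers under " + root
    ce = ue = 0
    for mc in controllers:
        ce += read_count(os.path.join(mc, "ce_count"))
        ue += read_count(os.path.join(mc, "ue_count"))
    detail = f"corrected={ce} uncorrected={ue}"
    if ue > 0:
        return CRITICAL, "EDAC CRITICAL - " + detail
    if ce > 0:
        return WARNING, "EDAC WARNING - " + detail
    return OK, "EDAC OK - " + detail


def main() -> int:
    """Print the status line and return the exit code for Nagios."""
    code, message = check_edac()
    print(message)
    return code
```

As with any plugin, a `sys.exit(main())` call under a `__main__` guard hands the status to Nagios. A rising corrected-error count is exactly the "bad RAM exists" signal, and it arrives without taking the server down for a memtest.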

Re: [Discuss] Server won't boot kernel. initramfs problem?

2013-02-24 Thread Rich Pieri
On Sun, 24 Feb 2013 19:34:19 -0500 John Abreau wrote:
> My understanding is that ECC RAM merely makes a server crash on a
> memory error, not detect the error and alert the sysadmin. Is that
> not the case?

It does neither. ECC (error-correcting code) RAM corrects single-bit errors automatically
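To illustrate why ECC can repair a single-bit error while a plain parity bit can only detect one, here is a Hamming(7,4) code: three check bits protect four data bits, and the pattern of failed checks (the syndrome) names the flipped bit. This is a teaching sketch, not what a DIMM implements. ECC memory commonly uses a wider SECDED code (for example 8 check bits over 64 data bits) that also detects, but cannot correct, double-bit errors; the Hamming(7,4) code below would silently miscorrect a double-bit error.

```python
from typing import List, Tuple


def hamming74_encode(d: List[int]) -> List[int]:
    """Encode 4 data bits; codeword positions 1..7 are p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c: List[int]) -> Tuple[List[int], int]:
    """Return (data bits, corrected position or 0); fixes any single-bit error."""
    c = list(c)
    syndrome = 0
    for pos, bit in enumerate(c, start=1):
        if bit:
            syndrome ^= pos  # XOR of positions of set bits is 0 for a valid codeword
    if syndrome:
        c[syndrome - 1] ^= 1  # a single flip makes the syndrome equal its position
    return [c[2], c[4], c[5], c[6]], syndrome


def parity_ok(bits: List[int]) -> bool:
    """Even parity: detects an odd number of flipped bits but cannot say which."""
    return sum(bits) % 2 == 0


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    word = hamming74_encode(data)
    word[4] ^= 1  # simulate a single-bit memory error at position 5
    print(hamming74_decode(word))  # ([1, 0, 1, 1], 5)

    stored = data + [sum(data) % 2]  # the same data with one even-parity bit
    stored[0] ^= 1  # flip one bit
    print(parity_ok(stored))  # False: error detected, location unknown
```

The contrast matches the thread: parity can only tell the system something is wrong (hence the crash), while the extra check bits give ECC enough information to put the bit back and keep running.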