On Fri, Jul 07, 2000 at 10:17:37AM -0700, Eric Councell wrote:
>
> I have ordered replacement memory for the machine, but I
> am curious as to any other possible causes.

I know you were asking for other suggestions, but there's a convenient
way (except that it requires a reboot) to test your memory under Linux.

You simply take any large tarfile of text files (such as the Linux
kernel), extract it into one directory, and then extract it again,
multiple times, into a second directory, doing a recursive diff after
every extract. If there are differences between the two directories,
then you've probably got a memory problem.

One annoyance is that to do the test right, you would want to disable
all memory caches from your BIOS. In other words, you would need to
schedule downtime for the reboot(s).

(It's even better if you can run the test with the different caches
enabled one by one. If the problem shows up when you have all caches
disabled, then you probably have a memory problem, although it could
still be other hardware. But if the problem shows up only with one
particular cache enabled, then you'll know you have a bad motherboard,
cache, or CPU.)

I've attached part of a linux-kernel thread from a while back that
describes the test with sample code. Doug Ledford suggests and discusses
the following script in the second email in the attached thread:

    #!/bin/sh
    cd /tmp
    tar xzf linux-2.1.123.tar.gz
    mv linux linux.save
    for i in 1 2 3 4 5 6 7 8 9 10
    do
      tar xzf linux-2.1.123.tar.gz
      diff -U 3 -rN linux.save linux
    done

(Note that with some kernels, you can get some ignorable errors
associated with tar extracts and permissions.)

Good luck.

-Mark Shewmaker
[EMAIL PROTECTED]

From: [EMAIL PROTECTED] (Rogier Wolff)
Date: Tue, 29 Sep 1998 10:17:48 +0200 (MEST)
To: [EMAIL PROTECTED] (Henrik Olsen)
Cc: [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: utility for testing ram?

Henrik Olsen wrote:
> On Mon, 28 Sep 1998, Ricardo Kleemann wrote:
>
> > Hi,
> >
> > Anyone know of a utility to extensively test a system's ram, in order to
> > determine whether the ram has any faults?
>
> The classic test is to compile the kernel repeatedly, as mentioned several
> times before, this will give the machine a thorough workout, including
> running at close to 100% cpu use for a long time, making for no cooldown
> in idling, which will make marginal components even more likely to fail.
>
> A simple script to do continuous testing would be:
>
> #!/bin/sh
> while true
> do
>   make clean
>   make
> done
>
> Start it running overnight, if it's still running when you wake up, your
> memory's likely to be ok.

No. Most likely gcc will crash and give an aborted message, and make will
abort the current build, but the make clean and the next make will
clear all traces of this going wrong....

Try this:

#!/bin/sh
t=0
while true
do
  make clean
  # Redirect stdout first, then stderr, so that both end up in the log.
  make > log.$t 2>&1
  t=`expr $t + 1`
done

The logs should end up all being identical.....

Roger.

-- 
| Most people would die sooner than think.... | [EMAIL PROTECTED]
| in fact, most do. -- Bertrand Russell       | phone: +31-15-2137555
We write Linux device drivers for any device you may have!
fax: ..-2138217

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

From: Doug Ledford <[EMAIL PROTECTED]>
Date: Mon, 28 Sep 1998 23:30:18 -0500
To: Henrik Olsen <[EMAIL PROTECTED]>
CC: Ricardo Kleemann <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
Subject: Re: utility for testing ram?

Henrik Olsen wrote:
>
> On Mon, 28 Sep 1998, Ricardo Kleemann wrote:
>
> > Hi,
> >
> > Anyone know of a utility to extensively test a system's ram, in order to
> > determine whether the ram has any faults?
>
> The classic test is to compile the kernel repeatedly, as mentioned several
> times before, this will give the machine a thorough workout, including
> running at close to 100% cpu use for a long time, making for no cooldown
> in idling, which will make marginal components even more likely to fail.
>
> A simple script to do continuous testing would be:
>
> #!/bin/sh
> while true
> do
>   make clean
>   make
> done
>
> Start it running overnight, if it's still running when you wake up, your
> memory's likely to be ok.

No, no, and NO!  If you want to test your RAM, you can't run some test that
is CPU power limited.  You'll never access your RAM here faster than the CPU
can compile the kernel, and I got news for people out there.  There ain't no
CPU yet that compiles a kernel faster than your RAM can read/write those
source code and object code pages.  This is a good CPU test, not a good RAM
test.  If your RAM fails during this test with Sig11's or whatever, then it
really wasn't marginal to begin with.  I know I've posted this test to the
list before, but without someone posting a better test, I still claim that
your best memory tester that exists is this script:

#!/bin/sh
cd /tmp
tar xzf linux-2.1.123.tar.gz
mv linux linux.save
for i in 1 2 3 4 5 6 7 8 9 10
do
  tar xzf linux-2.1.123.tar.gz
  diff -U 3 -rN linux.save linux
done

If that script spews anything to the screen, you've failed your memory test.
The only exception to this is if your disk sub-system doesn't use DMA, then
this test is not as good as it could be, but if your system uses DMA (such
as a decent SCSI controller, or DMA IDE) then this test will show bad RAM
much faster and more reliably than compiling a kernel.

-- 
Doug Ledford <[EMAIL PROTECTED]>
Opinions expressed are my own, but
they should be everybody's.

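For anyone who wants to leave the extract-and-diff test running unattended, here is a rough sketch of the same idea that keeps the diff output in files instead of printing it, and reports how many passes differed. It is not Doug's script: the function name ram_diff_test, its arguments, and the diff.N file names are made up for illustration, and unlike the original it removes the second tree before each re-extract so that every pass starts clean.

```shell
#!/bin/sh
# Sketch only: ram_diff_test and its arguments are hypothetical names.
# ram_diff_test TARBALL TOPDIR PASSES [WORKDIR]
#   TARBALL  gzip-compressed tar archive (absolute path)
#   TOPDIR   name of the top-level directory the archive unpacks to
#   PASSES   number of extract-and-compare passes
#   WORKDIR  scratch directory (defaults to /tmp)
# The body runs in a subshell, so the caller's directory is left alone.
ram_diff_test() (
    tarball=$1 topdir=$2 passes=$3 workdir=${4:-/tmp}
    failed=0
    pass=1

    cd "$workdir" || exit 2
    rm -rf "$topdir" "$topdir.save"

    # Reference copy, extracted once.
    tar xzf "$tarball" || exit 2
    mv "$topdir" "$topdir.save" || exit 2

    while [ "$pass" -le "$passes" ]
    do
        # Assumption: start each pass from an empty tree.
        rm -rf "$topdir"
        tar xzf "$tarball" || exit 2
        # Keep the diff output of a failing pass in diff.N for later.
        if diff -rN "$topdir.save" "$topdir" > "diff.$pass" 2>&1
        then
            rm -f "diff.$pass"
        else
            failed=$((failed + 1))
        fi
        pass=$((pass + 1))
    done

    echo "$failed of $passes passes differed"
    [ "$failed" -eq 0 ]
)
```

Something like `ram_diff_test /tmp/linux-2.1.123.tar.gz linux 10` would correspond to the ten passes above. A nonzero exit status, or any leftover diff.N file, means at least one pass did not match the reference copy.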
From: Matthew Kirkwood <[EMAIL PROTECTED]>
Date: Tue, 29 Sep 1998 09:31:32 +0100 (GMT)
To: Ricardo Kleemann <[EMAIL PROTECTED]>
cc: [EMAIL PROTECTED]
Subject: Re: utility for testing ram?

On Mon, 28 Sep 1998, Ricardo Kleemann wrote:

> Anyone know of a utility to extensively test a system's ram, in order to
> determine whether the ram has any faults?

It's called gcc :)

Seriously, a 24-hour repeated kernel compile will stress your memory
much harder than things like memtest86
(sunsite:/pub/linux/handware/somewhere) because the CPU, RAM, cache
and various other peripherals are working hard, and probably not
following some easily identifiable pattern.

( while true; do
  make dep clean zImage modules
done ) >/dev/null 2>/tmp/errlog &

and wait.  After a day or so, grep /tmp/errlog for "signal" (usually
11 or 6).  Short of a gcc error (relatively unlikely :) each of those
signals is a bit-flip.  (Don't rely on all of them being caught,
either.)

See http://bitwizard.nl/sig11/ for more information.

Matthew.

From: [EMAIL PROTECTED]
Date: Fri, 2 Oct 1998 01:45:15 +0800
To: [EMAIL PROTECTED]
CC: Doug Ledford <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
Subject: Re: utility for testing ram?

Thus spake Doug Ledford:

* I'll stand by my claim that my test will trounce a gcc compile test any day
* of the week for finding bad RAM %^) ... experience.
* Find a machine that fails my test on one out of every four passes, and I'll
* show you a machine that will compile kernels all day long without a hiccup
* (x86 arch anyway).

Doug:

It is amazing!  You are right.  I ran your tests on my four dual
PPro boxen, and three of them did just fine while one had console output.
No wonder I thought disk copies were corrupting data!  And the machine in
question (a SuperMicro P6DNE) compiled the kernel about seventy times in a
test with no problems whatsoever.

I am getting some parity EDO SIMMs first out from Net Express; the machine
that had the problem was the only one of the 4 that did not have ECC on
(two were SuperMicro P6DOF's, which are Orion and parity FPM only; one was
Intel Providence with EDO DIMMs, ECC and buffered).

[The 4 machines total 1Gig of RAM ....]

Thanks,

B.Y.

PS also testing your 5.1.0-p13~
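
For the compile-loop variants earlier in the thread, the checking step can be scripted as well. The following is only a sketch under assumptions: the function name check_build_logs is made up, it expects the log.0, log.1, ... files from Rogier's loop in one directory, and it optionally scans a stderr log such as Matthew's /tmp/errlog for lines containing "signal".

```shell
#!/bin/sh
# Sketch only: check_build_logs is a hypothetical helper name.
# check_build_logs DIR [ERRLOG]
#   DIR     directory holding log.0, log.1, ... from the logging loop
#   ERRLOG  optional stderr log to scan for "signal" lines
check_build_logs() (
    dir=$1 errlog=${2:-}
    bad=0

    # Every log should match the first one byte for byte.
    for f in "$dir"/log.*
    do
        [ -e "$f" ] || continue
        if ! cmp -s "$dir/log.0" "$f"
        then
            echo "differs from log.0: $f"
            bad=$((bad + 1))
        fi
    done

    # Count lines mentioning a signal (e.g. gcc dying on signal 11).
    if [ -n "$errlog" ] && [ -f "$errlog" ]
    then
        hits=$(grep -c signal "$errlog" || true)
        echo "signal lines in $errlog: $hits"
        [ "$hits" -eq 0 ] || bad=$((bad + 1))
    fi

    [ "$bad" -eq 0 ]
)
```

Stop the build loop before running the check, or ignore the newest log: the log of a build that is still in progress will be shorter than the others and will be reported as different.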