On Sun, Apr 13, 2008 at 05:55:36AM +0200, NN_il_Confusionario wrote:
> On Sat, Apr 12, 2008 at 11:23:48PM +0000, [EMAIL PROTECTED] wrote:
> > [EMAIL PROTECTED]:~$ free -b
> >              total       used       free     shared    buffers     cached
> > Mem:    1061478400  311463936  750014464          0  100552704  105132032
> > -/+ buffers/cache:  105779200  955699200
> > Swap:    699138048          0  699138048
> >
> > . .Detected 1495.263 MHz processor.
>
> By my standards this is a very modern and powerful box.
>
> If memtest86 (or memtest from the memtester package, if you cannot spare
> the box) and the check of the logs do not show anything, I would
> _temporarily_ try another kernel (a newer one from etch-and-a-half,
> backports and/or an older one from sarge; or even the suse kernel that
> was running fine before) to understand if a bug report against the
> current kernel in etch is needed

Further to my previous message, I gave it a try with 300M:

[EMAIL PROTECTED]:~$ sudo memtest 300M -l
memtest v. 2.93.1
(C) 2000 Charles Cazabon <[EMAIL PROTECTED]>
Original v.1 (C) 1999 Simon Kirby <[EMAIL PROTECTED]> <[EMAIL PROTECTED]>

Current limits:
  RLIMIT_RSS 0xffffffff
  RLIMIT_VMEM 0xffffffff
Raising limits...
Allocated 314572800 bytes...trying mlock...success. Starting tests...
Testing 314568704 bytes at 0xa51da000 (4088 bytes lost to page alignment).

Run 1:
  Test 1: Stuck Address: Testing...Passed.
  Test 2: Random value: Setting...Testing...Passed.
  Test 3: XOR comparison: Setting...Testing...Passed.
  Test 4: SUB comparison: Setting...Testing...Passed.
  Test 5: MUL comparison: Setting...Testing...Passed.
  Test 6: DIV comparison: Setting...Testing...Passed.
  Test 7: OR comparison: Setting...Testing...Passed.
  Test 8: AND comparison: Setting...Testing...Passed.
  Test 9: Sequential Increment: Setting...Testing...Passed.
  Test 10: Solid Bits: Testing...Passed.
  Test 11: Block Sequential: Testing... 15

While that was running, free showed a reasonable amount of memory still
in the buffer pool:

[EMAIL PROTECTED]:/var/log$ free -b
             total       used       free     shared    buffers     cached
Mem:    1061478400 1045397504   16080896          0  269635584  139096064
-/+ buffers/cache:  636665856  424812544
Swap:    699138048          0  699138048

So I interrupted the run and tried upping the memtest to 500M.....

Received signal 2 (Interrupt)
munlock'ed memory.
0 runs completed. 0 errors detected. Total runtime: 130 seconds.
Exiting...

[EMAIL PROTECTED]:~$ sudo memtest 500M -l
memtest v. 2.93.1
(C) 2000 Charles Cazabon <[EMAIL PROTECTED]>
Original v.1 (C) 1999 Simon Kirby <[EMAIL PROTECTED]> <[EMAIL PROTECTED]>

Current limits:
  RLIMIT_RSS 0xffffffff
  RLIMIT_VMEM 0xffffffff
Raising limits...
Allocated 524288000 bytes...trying mlock...

Message from [EMAIL PROTECTED] at Sun Apr 13 15:38:38 2008 ...
tuko kernel: Oops: 0002 [#1]

Message from [EMAIL PROTECTED] at Sun Apr 13 15:38:38 2008 ...
tuko kernel: EIP is at _spin_lock+0x1/0xf

Message from [EMAIL PROTECTED] at Sun Apr 13 15:38:38 2008 ...
tuko kernel: eax: 00000044 ebx: 00000000 ecx: 00000001 edx: e7893d98

Message from [EMAIL PROTECTED] at Sun Apr 13 15:38:38 2008 ...
tuko kernel: esi: e7893d98 edi: 00000025 ebp: 00000025 esp: dfa67f00

Message from [EMAIL PROTECTED] at Sun Apr 13 15:38:38 2008 ...
tuko kernel: ds: 007b es: 007b ss: 0068

Message from [EMAIL PROTECTED] at Sun Apr 13 15:38:38 2008 ...
tuko kernel: Process kswapd0 (pid: 122, ti=dfa66000 task=dff98550 task.ti=dfa66000)

Message from [EMAIL PROTECTED] at Sun Apr 13 15:38:38 2008 ...
tuko kernel: Stack: c015e27d e7893ca4 00000000 c016f31e 00000080 e7893ea4 e7887ab4 0001a004

Message from [EMAIL PROTECTED] at Sun Apr 13 15:38:38 2008 ...
tuko kernel: dfffeac0 00000088 000000d0 c0148ca8 00680100 00000000 00680100 00031357

Message from [EMAIL PROTECTED] at Sun Apr 13 15:38:38 2008 ...
tuko kernel: 00000080 00000000 00000000 c02ccec0 c02ccec0 00000003 c0149053 00000000

Message from [EMAIL PROTECTED] at Sun Apr 13 15:38:38 2008 ...
tuko kernel: Call Trace:

Message from [EMAIL PROTECTED] at Sun Apr 13 15:38:38 2008 ...
tuko kernel: Code: 05 90 ff 02 30 c9 89 c8 c3 89 c2 90 81 28 00 00 00 01 0f 94 c0 84 c0 b9 01 00 00 00 75 09 90 81 02 00 00 00 01 30 c9 89 c8 c3 90 <fe> 08 79 09 f3 90 80 38 00 7e f9 eb f2 c3 90 81 28 00 00 00 01

Message from [EMAIL PROTECTED] at Sun Apr 13 15:38:38 2008 ...
tuko kernel: EIP: [<c028091a>] _spin_lock+0x1/0xf SS:ESP 0068:dfa67f00

The same error, instantly... so I am guessing that a memory error would
have been detected more gracefully, and that this more likely indicates
something going seriously wrong when kswapd becomes active...

It looks like the system is still running, but any attempt to access the
hard drive gets stuck in an uninterruptible sleep.

At least the problem seems to be easily reproduced...

Regards,
DigbyT
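
A footnote on the free figures above, for anyone checking them: as far as I understand the classic procps layout, the "-/+ buffers/cache" row is just the Mem row with buffers and cached subtracted from "used" and added to "free". Here is a minimal sketch in Python that redoes that arithmetic for both outputs. The function name and the tuple layout are made up for illustration, and the numbers are copied by hand from the two `free -b` runs.

```python
def buffers_cache_row(used, free, buffers, cached):
    """Return the (used, free) pair for free's "-/+ buffers/cache" row.

    Assumption: buffers and page cache are counted as "used" on the Mem
    row but can be reclaimed by the kernel, so they move to the free side.
    """
    reclaimable = buffers + cached
    return used - reclaimable, free + reclaimable


# Mem rows from the two `free -b` outputs, as (used, free, buffers, cached)
# in bytes. These are hand-copied values, not parsed from a live system.
before = (311463936, 750014464, 100552704, 105132032)
during = (1045397504, 16080896, 269635584, 139096064)

for label, row in (("before memtest", before), ("during 300M memtest", during)):
    used, free = buffers_cache_row(*row)
    print(f"{label}: used={used} free={free}")
```

Both results agree with what free itself printed (105779200 / 955699200 before, 636665856 / 424812544 during), so the "reasonable amount of memory" reading is at least internally consistent.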