Windsor Dave L. (AdP/TEF7.1) wrote:
Hello Everyone,

I recently installed CentOS 5.5 x86_64 on a brand new ProLiant DL380 G7.  I have 
identical OS software running rock-solid on two other DL380 ProLiant servers, but they 
are G6 models, not G7.  On the G7, the installation went perfectly and the machine ran 
great for about 2 weeks, when it just seemed to "stop".  The system stopped 
responding on the network, and there was no video on the console (or remote console via 
iLO).  It would not reboot or cold boot through iLO; I actually had to hold the power 
button to turn it off and then press it again to power up.

This happened several times within a few days of each other.  Each time, there 
was no evidence in any logs of a problem - the system just seemed to stop or 
lock up.   We did have a CPU problem light appear on the front, so HP came in 
and replaced the one 4-core CPU.  Since then, it has run as long as two weeks, 
but still crashes randomly.  After the last reboot, I left the console in text 
mode on vt1, and when it crashed again this morning this was displayed on the 
screen:

CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff8100dc435cf0  CR3: 000000008a6ca000 CR4: 00000000000006e0
Process smbd (pid: 18970, threadinfo ffff81001529e000, task ffff81011f5347a0)
Stack:  ffff81011e4e71c0 0000000000000000 ffff8100cf12a015 ffffffff80009c41
 ffff81011e4e71c0 0000000100000000 000000030027ea9d ffff8100cf12a011
 ffff81011e4e71c0 ffff81010d9cf300 ffff81011e4e71c0 ffff8101044099c0
Call Trace:
 [<ffffffff80009c41>] __link_path_walk+0x3a6/0xf5b
 [<ffffffff8000ea4b>] link_path_walk+0x42/0xb2
 [<ffffffff8000cd72>] do_path_lookup+0x275/0x2f1
 [<ffffffff80012851>] getname+0x15b/0x1c2
 [<ffffffff800239d1>] __user_walk_fd+0x37/0x4c
 [<ffffffff80028905>] vfs_stat_fd+0x1b/0x4a
 [<ffffffff80039fa2>] fcntl_setlk+0x243/0x273
 [<ffffffff80023703>] sys_newstat+0x19/0x31
 [<ffffffff8005d229>] tracesys+0x71/0xe0
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0


Code: 00 00 00 00 00 00 00 00 70 4d 4f 9d 00 81 ff ff 98 e4 4b dc
RIP  [<ffff8100dc435cf0>]
 RSP <ffff81001529fd18>
CR2: ffff8100dc435cf0
 <0>Kernel panic - not syncing: Fatal exception


This suggests that something happened in a Samba process.  I have the Samba3x 
packages installed since we are beginning to introduce Win7 clients into our 
environment.
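
One detail in the dump above may help narrow this down: RIP and CR2 hold the same address (ffff8100dc435cf0), and that address lies outside the ffffffff80... range that every call-trace entry falls in. That pattern is what one would expect if the CPU faulted while trying to fetch an instruction from a non-code address, for example through a corrupted pointer. On its own it does not distinguish a hardware fault from a software bug. The following is only a minimal Python sketch for checking that pattern in a captured panic. The function names and the KERNEL_TEXT_START constant are illustrative assumptions, with the constant inferred solely from the call-trace addresses in this dump.

```python
import re

# Assumption: kernel text starts at this address. It is inferred only from
# the call-trace entries in the dump, which all begin with ffffffff80.
KERNEL_TEXT_START = 0xFFFFFFFF80000000


def fault_registers(oops_text):
    """Return (rip, cr2) parsed from an oops dump; a missing field is None."""
    def grab(pattern):
        match = re.search(pattern, oops_text)
        return int(match.group(1), 16) if match else None

    rip = grab(r"RIP\s+\[<([0-9a-f]{16})>\]")
    cr2 = grab(r"CR2:\s*([0-9a-f]{16})")
    return rip, cr2


def describe(oops_text):
    """Give a rough, hedged label for the fault pattern in the dump."""
    rip, cr2 = fault_registers(oops_text)
    if rip is None or cr2 is None:
        return "incomplete dump"
    if rip == cr2 and rip < KERNEL_TEXT_START:
        return "fault on instruction fetch outside kernel text"
    return "no instruction-fetch pattern detected"


# The two relevant lines copied from the panic output above.
SAMPLE = """\
CR2: ffff8100dc435cf0  CR3: 000000008a6ca000 CR4: 00000000000006e0
RIP  [<ffff8100dc435cf0>]
"""

print(describe(SAMPLE))
```

If several panics are captured over time, running each one through describe() would show whether the faulting address and pattern stay the same or vary from crash to crash.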

Googling "Kernel panic - not syncing: Fatal exception" and "CentOS" produced 
many hits, but nothing that seemed to exactly match my problem.  Since this is the only G7 server I 
have here right now, I can't reproduce the problem on another machine.  The G6s I have running the 
identical version of CentOS have no problems.

I am trying to determine if this is pointing to a hardware or software issue.  
Some of the Google results suggested using a Centosplus kernel - is this a good 
idea?

The server is an HP DL380 G7 Server with 4 GB RAM (1 DIMM 1333 MHz), one 4-core CPU (2133 
MHz), 4 built-in Broadcom "NetXtreme II BCM5709 II Gigabit Ethernet" NICs, and 
a P410 Smart Array Controller.  The P410 and the system BIOS have both been updated to 
the latest levels to see if that fixes the crashes, with no change.

Any idea where I should look next?

Run memtest for 48 hours - also check the temperature of the system - I have seen errors like these caused by overheating. HTH

Thanks for any help anyone can provide!

Best Regards,

Dave Windsor

Robert Bosch LLC
Team Leader, MES Database Infrastructure Group (AdP/TEF7.1)
4421 Highway 81 North
Anderson, SC 29621 USA
www.bosch.us

Tel: 1 (864) 260-8459
Fax: 1 (864) 260-8422
dave.wind...@us.bosch.com


_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
