During the check cycle in which the hang occurred, did any entries go DOWN? If so, what alerts do those entries have?
Dirk Bulinckx.

From: Servers Alive Discussion List [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: Monday, June 30, 2008 12:56 PM
To: Servers Alive Discussion List
Subject: RE: [SA-list] SA hanging

Not sure what you mean by the DOWNs entries. Will send you a copy of the full logfile + config file off list.

Ian
_________________________________
Ian K Gray
OEL IS - European Infrastructure Support
Tel: +44 1236 502661
Mob: +44 7881 518854
Ad eundum quo nemo ante iit

"Dirk" <[EMAIL PROTECTED]>
Sent by: Servers Alive Discussion List <[email protected]>
30/06/2008 11:45
Please respond to Servers Alive Discussion List <[email protected]>
To: Servers Alive Discussion List <[email protected]>
cc:
Subject: RE: [SA-list] SA hanging

On the DOWNs entries, what type of alerts do you have?

Dirk Bulinckx.

From: Servers Alive Discussion List [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: Monday, June 30, 2008 12:31 PM
To: Servers Alive Discussion List
Subject: [SA-list] SA hanging

We've been having problems for a while with SA intermittently freezing on a check - this appeared to start when we moved SA to a VM host, although in fairness we changed the version at the same time. It would happen quite frequently on perfmon checks of the VM host itself (reported at the time). I disabled those perfmon checks, but we still see occasional freezes of SA. It happened three times yesterday (unusually bad), coinciding with timeouts from the beta Notes db checker (650 email alerts! ahhhh!!). The SA log doesn't tell me anything - maybe it does to you?
Here's an extract from the time of the freeze:

Sunday, June 29, 2008 5:28:12 PM SNMPGet (3) 1.3.6.1.2.1.2.2.1.8.11 on 172.17.238.6
Sunday, June 29, 2008 5:28:13 PM SNMP took 31 ms(331)
Sunday, June 29, 2008 5:28:13 PM FRLH VzB 2ry
Sunday, June 29, 2008 5:28:13 PM SNMPGet (3) 1.3.6.1.2.1.2.2.1.8.2 on 172.17.238.7
Sunday, June 29, 2008 5:28:13 PM SNMP took 31 ms(332)
Sunday, June 29, 2008 5:28:13 PM PLWA VzB 1ry
Sunday, June 29, 2008 5:28:13 PM SNMPGet (3) 1.3.6.1.2.1.2.2.1.8.2 on 172.17.240.1
Sunday, June 29, 2008 5:28:13 PM SNMP took 46 ms(336)
Sunday, June 29, 2008 5:28:13 PM HUBU VzB 1ry
Sunday, June 29, 2008 5:28:13 PM SNMPGet (3) 1.3.6.1.2.1.2.2.1.8.3 on 172.17.242.1
Sunday, June 29, 2008 5:28:13 PM SNMP took 47 ms(346)
Sunday, June 29, 2008 5:28:14 PM TRIS VzB 1ry
Sunday, June 29, 2008 5:28:14 PM SNMPGet (3) 1.3.6.1.2.1.2.2.1.8.3 on 172.17.243.1
Sunday, June 29, 2008 5:28:14 PM SNMP took 79 ms(350)
Sunday, June 29, 2008 5:28:14 PM CZPG VzB 1ry
Sunday, June 29, 2008 5:28:14 PM SNMPGet (3) 1.3.6.1.2.1.2.2.1.8.2 on 172.17.244.1
Sunday, June 29, 2008 5:28:14 PM SNMP took 31 ms(337)
Sunday, June 29, 2008 5:28:14 PM DEDU VzB 1ry
Sunday, June 29, 2008 5:28:14 PM SNMPGet (3) 1.3.6.1.2.1.2.2.1.8.8 on 172.17.246.2
Sunday, June 29, 2008 5:28:14 PM SNMP took 15 ms(334)
Sunday, June 29, 2008 5:28:15 PM DEDU VzB 2ry
Sunday, June 29, 2008 5:28:15 PM SNMPGet (3) 1.3.6.1.2.1.2.2.1.8.2 on 172.17.246.3
Sunday, June 29, 2008 5:28:15 PM SNMP took 31 ms(335)
Sunday, June 29, 2008 5:28:15 PM GBFE VzB Solo
Sunday, June 29, 2008 5:28:15 PM SNMPGet (3) 1.3.6.1.2.1.2.2.1.8.8 on 172.17.255.1
Sunday, June 29, 2008 5:28:15 PM S
Sunday, June 29, 2008 5:54:48 PM Woodstone Servers Alive version 6.1.2249.3
Sunday, June 29, 2008 5:54:48 PM Running on Microsoft Windows Server 2003 Family Standard Edition (3790) Service Pack 2 (32 bits)
Sunday, June 29, 2008 5:54:48 PM Using C:\Documents and Settings\Default User\My Documents\Servers Alive\temp as temp directory for HTTP(S) checks
Sunday, June 29, 2008 5:54:48 PM Threading is set to TRUE
Sunday, June 29, 2008 5:54:48 PM Oracle Core40.dll/core35.dll/oracore8.dll/oracore9.dll/oracore10.dll library not available
Sunday, June 29, 2008 5:54:48 PM No SQL libs found!
Sunday, June 29, 2008 5:54:48 PM Netware library's not available
Sunday, June 29, 2008 5:54:48 PM DUN installed and available for SA
Sunday, June 29, 2008 5:54:58 PM Check cycle starts ( 1- 1)
Sunday, June 29, 2008 5:54:58 PM EDC Netbotz
Sunday, June 29, 2008 5:54:58 PM PING DEBUG: event start for ID= 0 pingID= 273

Checks normally run every 2 minutes - I have a secondary SA box checking the primary, and if it detects that a key output file has not been updated for more than 20 minutes, it reboots the primary box. Nothing in the Windows event logs. Currently running 6.1.2249.

Any suggestions for how to troubleshoot this?

Ian
_________________________________
Ian K Gray
OEL IS - European Infrastructure Support
Tel: +44 1236 502661
Mob: +44 7881 518854
Ad eundum quo nemo ante iit

______________________________________________________________________________
Any opinions expressed in this email are those of the individual and not necessarily of the Company. This email and any files transmitted with it, including replies and forwarded copies (which may contain alterations) subsequently transmitted from the Company are confidential and solely for the use of the intended recipient. It may contain material protected by legal privilege. If you are not the intended recipient or the person responsible for delivering to the intended recipient, be advised that you have received this email in error and that any use is strictly prohibited. Please notify the sender immediately of the error and delete any copies of this message.

Warning: Although the Company has taken reasonable precautions to ensure that no viruses are present in this e-mail, the Company cannot accept responsibility for any loss or damage arising from the use of this e-mail or attachments.
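
One way to narrow down where SA stops is to scan the log for jumps between consecutive timestamps; in the extract, the last entry before the freeze is stamped 5:28:15 PM and the next one is the restart at 5:54:48 PM. The following is only a rough sketch in Python, not anything shipped with Servers Alive. It assumes every log line starts with a timestamp shaped like the ones in the extract, and the names `parse_stamp`, `find_gaps` and the 5-minute threshold are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# English month names, so parsing does not depend on the machine's locale.
MONTHS = {name: number for number, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

# Assumed line prefix, e.g. "Sunday, June 29, 2008 5:28:15 PM ...".
STAMP = re.compile(
    r"^\w+, (\w+) (\d{1,2}), (\d{4}) (\d{1,2}):(\d{2}):(\d{2}) ([AP]M)")


def parse_stamp(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    match = STAMP.match(line)
    if match is None or match.group(1) not in MONTHS:
        return None
    month, day, year, hour, minute, second, half = match.groups()
    # Convert the 12-hour clock to 24-hour (12 AM -> 0, 12 PM -> 12).
    hour24 = int(hour) % 12 + (12 if half == "PM" else 0)
    return datetime(int(year), MONTHS[month], int(day),
                    hour24, int(minute), int(second))


def find_gaps(lines, threshold=timedelta(minutes=5)):
    """Yield (before, after, last_line) wherever the log goes quiet too long."""
    previous, previous_line = None, None
    for line in lines:
        current = parse_stamp(line)
        if current is None:
            continue  # skip lines without a recognisable timestamp
        if previous is not None and current - previous > threshold:
            yield previous, current, previous_line
        previous, previous_line = current, line


# A few lines copied from the extract above.
SAMPLE = [
    "Sunday, June 29, 2008 5:28:15 PM GBFE VzB Solo",
    "Sunday, June 29, 2008 5:28:15 PM SNMPGet (3) 1.3.6.1.2.1.2.2.1.8.8 on 172.17.255.1",
    "Sunday, June 29, 2008 5:28:15 PM S",
    "Sunday, June 29, 2008 5:54:48 PM Woodstone Servers Alive version 6.1.2249.3",
    "Sunday, June 29, 2008 5:54:58 PM Check cycle starts ( 1- 1)",
]

if __name__ == "__main__":
    for before, after, last_line in find_gaps(SAMPLE):
        print(f"gap of {after - before} after: {last_line}")
```

Run against the full logfile, this lists the last line written before each freeze, which makes it easier to see whether the hangs always follow the same kind of check.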
To unsubscribe send a message with UNSUBSCRIBE in the subject line to [email protected] If you use auto-responders (like out-of-the-office messages), make sure that they are not sent to the list nor to individual members. Doing so will cause you to be automatically removed from the list.
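
For readers wondering what the watchdog Ian describes might look like, here is a minimal sketch of the staleness test only: it reports whether a key output file has gone more than 20 minutes without being updated. It is not how his secondary SA box is actually configured; the function name `is_stale` is hypothetical, and treating a missing file as stale is an assumption. The reboot itself is left out on purpose.

```python
import os
import time

# Threshold from the thread: no update for more than 20 minutes.
STALE_AFTER_SECONDS = 20 * 60


def is_stale(path, max_age=STALE_AFTER_SECONDS, now=None):
    """Return True if the file at path was last modified more than max_age seconds ago.

    now can be passed in (seconds since the epoch) to make the check testable;
    by default the current time is used.
    """
    if now is None:
        now = time.time()
    try:
        age = now - os.path.getmtime(path)
    except OSError:
        # Assumption: a missing or unreadable file counts as stale too.
        return True
    return age > max_age
```

A scheduled job on the secondary box could call `is_stale` on the output file and trigger whatever recovery action is in place when it returns True. Logging the file's age at each run would also give a second record of when each freeze began.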
