Without a sniffer trace of what is happening, it will be difficult to say for sure whether there is a problem or not.
It's normal that, IF the remote systems don't respond to the ping, the check itself is slower. With the 15s timeout, the check can take up to 15s, but if the remote system does respond, the check may be done in 100ms.

Dirk Bulinckx.

-----Original Message-----
From: Servers Alive Discussion List [mailto:[EMAIL PROTECTED] On Behalf Of GLENN GASPAR
Sent: Friday, May 23, 2008 11:35 PM
To: Servers Alive Discussion List
Subject: RE: [SA-list] First line of check fails after some modifications

Well, never mind. As soon as I logged out of the server that SA is running on (I was logged in for about an hour), it started slowing down again.

Glenn

>>> "GLENN GASPAR" <[EMAIL PROTECTED]> 5/23/2008 4:25 PM >>>
Dirk,

I don't have the tools/know-how to run a sniffer... However, I observed something that is quite strange. Before restarting SA about an hour ago, I turned off the primary & alternate SMTP mail. I then restarted the SA service, and for about an hour now there hasn't been any slowdown. All the ping checks seem to be running at normal speed.

Glenn

>>> "Dirk" <[EMAIL PROTECTED]> 5/23/2008 2:25 PM >>>
This means that the frames are sent and are not coming back within the given timeout.

Example: 10 frames, 15 seconds timeout
=> frame 1 is sent and we wait a max of 1.5 seconds; if we get a response back from the pinged IP then we flag it as a GOOD frame, else as a BAD frame
=> frame 2 is sent ...
...
At the end we see how many GOOD frames we have and calculate the %.

So for some reason your pinged hosts start to fail. If you get that 0%, try running a sniffer (ethereal/wireshark/netmon/...) to see if the frames are sent and IF they come back too.

Dirk Bulinckx.

-----Original Message-----
From: Servers Alive Discussion List [mailto:[EMAIL PROTECTED] On Behalf Of GLENN GASPAR
Sent: Friday, May 23, 2008 9:00 PM
To: Servers Alive Discussion List
Subject: RE: [SA-list] First line of check fails after some modifications

It is taking more time...
Here's an excerpt of the log:

Friday, May 23, 2008 1:13:55 PM Atlanta Router 20.1 failed due to a successrate of only 0%
Friday, May 23, 2008 1:13:55 PM Houston Router 36.1 failed due to a successrate of only 0%

Both of these routers have a timeout value of 15 seconds and have "second knock" checked. Most if not all of the entries in the log after I restarted the server say "x failed due to a successrate of only 0%".

Glenn

>>> "Dirk" <[EMAIL PROTECTED]> 5/23/2008 1:36 PM >>>
It "seems" or it "is" taking more time? The roundtrip time is shown in the GUI, so you can tell from that value.

Dirk Bulinckx.

-----Original Message-----
From: Servers Alive Discussion List [mailto:[EMAIL PROTECTED] On Behalf Of GLENN GASPAR
Sent: Friday, May 23, 2008 8:30 PM
To: Servers Alive Discussion List
Subject: RE: [SA-list] First line of check fails after some modifications

It seems that the ping checks take more time than usual. We give our router checks a timeout value of 5, 10, or 15 seconds (depending on location). About an hour or so after the restart, the ping checks to the routers are so slow that it looks like they're timing out.

Glenn

>>> "Dirk" <[EMAIL PROTECTED]> 5/23/2008 12:36 PM >>>
Define "slow"

Dirk Bulinckx.

-----Original Message-----
From: Servers Alive Discussion List [mailto:[EMAIL PROTECTED] On Behalf Of GLENN GASPAR
Sent: Friday, May 23, 2008 7:05 PM
To: Servers Alive Discussion List
Subject: RE: [SA-list] First line of check fails after some modifications

Dirk,

Just want to add additional observations. I just restarted the server that runs SA and initially it ran fine. After the 1st cycle of checks, however, it seems to slow down for some reason (the checks came back OK, but it's really slow).

Glenn

>>> "Dirk" <[EMAIL PROTECTED]> 5/23/2008 11:05 AM >>>
What type of checks are you using (that give the 'false' downs)?
What does SA show as reason for the down?
What exact version of Servers Alive are you using?

Dirk Bulinckx.
-----Original Message-----
From: Servers Alive Discussion List [mailto:[EMAIL PROTECTED] On Behalf Of GLENN GASPAR
Sent: Friday, May 23, 2008 5:55 PM
To: Servers Alive Discussion List
Subject: [SA-list] First line of check fails after some modifications

Hello,

In the past we've noticed that every now and then Servers Alive would report that some of our routers (first line of checks) are down when in reality they are still up and running. We would restart SA and the problem would seem to go away.

However, this past Tuesday I made some SNMP check modifications, as some of our servers' Open Manager versions were updated, and I also added entries in the People group for email notifications. After I made these modifications, SA seems to act normally at first, and then after a few hours it reports that most if not all of our routers are down.

Any suggestions?

Thanks,
Glenn Gaspar

To unsubscribe send a message with UNSUBSCRIBE in the subject line to [email protected]
If you use auto-responders (like out-of-the-office messages), make sure that they are not sent to the list nor to individual members. Doing so will cause you to be automatically removed from the list.
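Editor's note: Dirk's description of the ping check in this thread (10 frames with a 15 second timeout, so each frame waits at most 1.5 seconds, and the GOOD frames are counted at the end) can be sketched as below. This is only an illustration of the arithmetic he describes, not Servers Alive's actual code. The function names are hypothetical, and treating a reply that arrives exactly at the per-frame limit as GOOD is an assumption.

```python
from typing import Optional, Sequence


def per_frame_wait(total_timeout_s: float, frames: int) -> float:
    """Split the check's total timeout evenly over the frames (assumed behaviour)."""
    if frames <= 0:
        raise ValueError("frames must be positive")
    return total_timeout_s / frames


def success_rate(reply_times_s: Sequence[Optional[float]],
                 total_timeout_s: float) -> float:
    """Return the percentage of GOOD frames.

    reply_times_s holds one entry per frame: the round-trip time in
    seconds, or None if no reply came back at all.
    """
    frames = len(reply_times_s)
    wait = per_frame_wait(total_timeout_s, frames)
    # A frame is GOOD only if a reply arrived within the per-frame wait.
    good = sum(1 for t in reply_times_s if t is not None and t <= wait)
    return 100.0 * good / frames


# 10 frames, 15 s timeout: each frame waits at most 1.5 s.
print(per_frame_wait(15, 10))
# No replies at all matches the "successrate of only 0%" log entries.
print(success_rate([None] * 10, 15))
```

With no replies, every frame runs out its full wait, so the whole check takes about the full 15 seconds. That is consistent with the slowdown reported whenever the routers show 0%.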
