Dirk,

     We turned on full logging and discovered that PING DEBUG stopped appearing 
in the log file right about the time the check cycle started slowing down 
considerably. Below is an excerpt of the log (I can send you more of it if 
necessary). Any ideas?

Thanks,
Glenn


Wednesday, May 28, 2008 10:24:19 PM NETWARE took 2469 ms(110)
Wednesday, May 28, 2008 10:24:19 PM Westlake Router 72.1
Wednesday, May 28, 2008 10:24:20 PM PING DEBUG: event start for ID= 0 pingID=  34
Wednesday, May 28, 2008 10:24:20 PM Westlake Router 72.1 OK with a successrate of 100% and an average roundtriptime of 104ms
Wednesday, May 28, 2008 10:24:20 PM PING DEBUG: REMOVE for ID= 0 pingID=  34
Wednesday, May 28, 2008 10:24:20 PM PING DEBUG: REMOVED for ID= 0 pingID=  34
Wednesday, May 28, 2008 10:24:20 PM Lake Charles VCM PIMS Server
Wednesday, May 28, 2008 10:24:20 PM Lake Charles chemstation server
Wednesday, May 28, 2008 10:24:20 PM Lake Charles Server WLADM1
Wednesday, May 28, 2008 10:24:21 PM PING DEBUG: event start for ID= 0 pingID=  73
Wednesday, May 28, 2008 10:24:21 PM Lake Charles VCM PIMS Server OK with a successrate of 100% and an average roundtriptime of 166ms
Wednesday, May 28, 2008 10:24:21 PM PING DEBUG: REMOVE for ID= 0 pingID=  73
Wednesday, May 28, 2008 10:24:21 PM PING DEBUG: REMOVED for ID= 0 pingID=  73
Wednesday, May 28, 2008 10:24:21 PM PING DEBUG: event start for ID= 1 pingID=  35
Wednesday, May 28, 2008 10:24:21 PM Lake Charles chemstation server OK with a successrate of 100% and an average roundtriptime of 170ms
Wednesday, May 28, 2008 10:24:21 PM PING DEBUG: REMOVE for ID= 1 pingID=  35
Wednesday, May 28, 2008 10:24:21 PM PING DEBUG: REMOVED for ID= 1 pingID=  35
Wednesday, May 28, 2008 10:24:35 PM NETWARE took 14391 ms(126)
Wednesday, May 28, 2008 10:24:35 PM Madison Router 100.1
Wednesday, May 28, 2008 10:25:06 PM Madison Router 100.1 OK with a successrate of 100% and an average roundtriptime of 35ms
Wednesday, May 28, 2008 10:25:06 PM Madison Server MDADM1
Wednesday, May 28, 2008 10:25:08 PM NETWARE took 2578 ms(106)
Wednesday, May 28, 2008 10:25:08 PM Madison Router 100.1 - 2nd check
Wednesday, May 28, 2008 10:25:39 PM Madison Router 100.1 - 2nd check OK with a successrate of 100% and an average roundtriptime of 35ms
Wednesday, May 28, 2008 10:25:39 PM Ross D/R Database Server
Wednesday, May 28, 2008 10:26:10 PM Ross D/R Database Server OK with a successrate of 100% and an average roundtriptime of 35ms
Wednesday, May 28, 2008 10:26:10 PM Ross D/R Application Server
Wednesday, May 28, 2008 10:26:41 PM Ross D/R Application Server OK with a successrate of 100% and an average roundtriptime of 36ms
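
For anyone who wants to measure the slowdown rather than eyeball it, here is a rough Python sketch that pairs each check's start line with its "OK with a successrate" line and reports the elapsed seconds. It is only an illustration: the function name check_durations and the regular expression are my own, it assumes each log entry sits on a single line in the timestamp format shown above, and it relies on English day and month names being parsed by strptime. It is not part of Servers Alive.

```python
import re
from datetime import datetime

# Timestamp layout as it appears in the excerpt, e.g.
# "Wednesday, May 28, 2008 10:24:19 PM <message>"
STAMP = re.compile(r"^(\w+, \w+ \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) (.*)$")
OK_MARK = " OK with a successrate"


def check_durations(lines):
    """Return (check name, seconds) pairs from start line to OK line."""
    started = {}
    durations = []
    for line in lines:
        match = STAMP.match(line.strip())
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%A, %B %d, %Y %I:%M:%S %p")
        message = match.group(2)
        # Skip debug lines and the "NETWARE took ..." timing lines.
        if message.startswith("PING DEBUG") or " took " in message:
            continue
        if OK_MARK in message:
            name = message.split(OK_MARK)[0]
            if name in started:
                elapsed = (stamp - started.pop(name)).total_seconds()
                durations.append((name, elapsed))
        else:
            # Any other line is treated as the start of a check.
            started[message] = stamp
    return durations


sample = [
    "Wednesday, May 28, 2008 10:24:19 PM Westlake Router 72.1",
    "Wednesday, May 28, 2008 10:24:20 PM PING DEBUG: event start for ID= 0 pingID=  34",
    "Wednesday, May 28, 2008 10:24:20 PM Westlake Router 72.1 OK with a successrate of 100% and an average roundtriptime of 104ms",
    "Wednesday, May 28, 2008 10:24:35 PM Madison Router 100.1",
    "Wednesday, May 28, 2008 10:25:06 PM Madison Router 100.1 OK with a successrate of 100% and an average roundtriptime of 35ms",
]

for name, seconds in check_durations(sample):
    print(f"{name}: {seconds:.0f} s")
```

On the sample lines this reports about 1 second for the Westlake router and 31 seconds for the Madison router, even though the Madison roundtrip time is only 35ms, which is the kind of gap that shows up once PING DEBUG disappears from the log.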


>>> "Dirk" <[EMAIL PROTECTED]> 5/23/2008 4:55 PM >>>
Without a sniffer trace of what is happening, it will be difficult to say for
sure whether there is a problem or not.

It's normal that IF the remote systems don't respond to the ping, the check
itself is slower.

With the 15s timeout, the check itself can take 15s, but if the remote system
does respond then the check is maybe done in 100ms.


Dirk Bulinckx. 

-----Original Message-----
From: Servers Alive Discussion List [mailto:[EMAIL PROTECTED] On Behalf Of
GLENN GASPAR
Sent: Friday, May 23, 2008 11:35 PM
To: Servers Alive Discussion List
Subject: RE: [SA-list] First line of check fails after some modifications

Well, never mind... As soon as I logged out of the server that SA is running on
(I was logged in for about an hour), it started slowing down again.

Glenn

>>> "GLENN GASPAR" <[EMAIL PROTECTED]> 5/23/2008 4:25 PM >>>
Dirk,

     I don't have the tools/know-how to run a sniffer... However, I observed
something that is quite strange...
Before restarting SA about an hour ago, I turned off the primary & alternate
SMTP mail. I then restarted the SA service and for about an hour now there
hasn't been any slowdown. All the ping checks seem to be running at normal
speed.

Glenn

>>> "Dirk" <[EMAIL PROTECTED]> 5/23/2008 2:25 PM >>>
This means that the frames are sent and are not coming back within the given
timeout.

Example:
        10 frames
        15 seconds timeout
=> frame 1 is sent and we wait a max of 1.5 seconds; if we get a response back
from the pinged IP then we flag it as a GOOD frame, else as a BAD frame
=> frame 2 is sent ...
...
at the end we see how many GOOD frames we have and calculate the %
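
The arithmetic in that example can be sketched in Python as below. This is only a model of the description above, not Servers Alive's actual code: the function name summarize_ping and its parameters are made up for illustration, and it assumes the frames are sent one after another with the timeout split evenly across them (15 s / 10 frames = 1.5 s each).

```python
def summarize_ping(replies_ms, frames=10, timeout_s=15.0):
    """Model one ping check.

    replies_ms holds one entry per frame: the reply time in ms,
    or None if no reply arrived. Returns (success rate in %,
    total seconds spent waiting).
    """
    if len(replies_ms) != frames:
        raise ValueError("need exactly one entry per frame")
    per_frame_ms = timeout_s / frames * 1000.0  # 1500 ms in the example
    good = 0
    waited_ms = 0.0
    for reply in replies_ms:
        if reply is not None and reply <= per_frame_ms:
            good += 1                  # GOOD frame: reply came back in time
            waited_ms += reply
        else:
            waited_ms += per_frame_ms  # BAD frame: waited the full slice
    return 100.0 * good / frames, waited_ms / 1000.0


# Host that never answers: 0% and the full 15 seconds are used up.
print(summarize_ping([None] * 10))

# Host that answers every frame in 104 ms: 100% in roughly one second.
print(summarize_ping([104] * 10))
```

Under these assumptions a host that never replies costs the whole 15 seconds and ends at 0%, while a responsive host finishes in about a second, which is why a run of failing checks makes the whole cycle look slow.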


So for some reason your pinged hosts start to fail. If you get that 0%, try
running a sniffer (ethereal/wireshark/netmon/...) to see if the frames are sent
and IF they come back too.


Dirk Bulinckx. 

-----Original Message-----
From: Servers Alive Discussion List [mailto:[EMAIL PROTECTED] On Behalf Of
GLENN GASPAR
Sent: Friday, May 23, 2008 9:00 PM
To: Servers Alive Discussion List
Subject: RE: [SA-list] First line of check fails after some modifications

It is taking more time...

Here's an excerpt of the log:

Friday, May 23, 2008 1:13:55 PM Atlanta Router 20.1 failed due to a successrate
of only 0%
Friday, May 23, 2008 1:13:55 PM Houston Router 36.1 failed due to a successrate
of only 0%

Both of these routers have 15 seconds in timeout value and have "second knock"
checked.

Most if not all of the entries in the log after I restarted the server are
saying "x failed due to a successrate of only 0%".

Glenn
>>> "Dirk" <[EMAIL PROTECTED]> 5/23/2008 1:36 PM >>>
it "seems" or it "is" taking more time?
The roundtrip time is in the GUI so you can see it by that value

Dirk Bulinckx. 

-----Original Message-----
From: Servers Alive Discussion List [mailto:[EMAIL PROTECTED] On Behalf Of
GLENN GASPAR
Sent: Friday, May 23, 2008 8:30 PM
To: Servers Alive Discussion List
Subject: RE: [SA-list] First line of check fails after some modifications

It seems that the ping checks take more time than usual. We give our
router checks a timeout value of 5, 10, or 15 seconds (depending on
location).

About an hour or so after the restart, the ping checks to the routers are
so slow that it looks like they're timing out.

Glenn
>>> "Dirk" <[EMAIL PROTECTED]> 5/23/2008 12:36 PM >>>
Define "slow"

Dirk Bulinckx. 

-----Original Message-----
From: Servers Alive Discussion List [mailto:[EMAIL PROTECTED] On Behalf Of
GLENN GASPAR
Sent: Friday, May 23, 2008 7:05 PM
To: Servers Alive Discussion List
Subject: RE: [SA-list] First line of check fails after some modifications

Dirk,

    Just want to add additional observations... I just restarted the server that
runs SA and initially it ran fine. After the 1st cycle of checks, however, it
seems to slow down for some reason (the checks came back OK, but it's
really slow).

Glenn

>>> "Dirk" <[EMAIL PROTECTED]> 5/23/2008 11:05 AM >>>
What type of checks are you using (that give the 'false' downs)?
What does SA show as reason for the down?
What exact version of Servers Alive are you using?


Dirk Bulinckx. 

-----Original Message-----
From: Servers Alive Discussion List [mailto:[EMAIL PROTECTED] On Behalf Of
GLENN GASPAR
Sent: Friday, May 23, 2008 5:55 PM
To: Servers Alive Discussion List
Subject: [SA-list] First line of check fails after some modifications

Hello,

      In the past we've noticed that every now and then Servers Alive would
report that some of our routers (first line of checks) are down when in reality
they are still up and running. We would restart SA and then the problem seemed
to go away.

      However, this past Tuesday I made some SNMP check modifications, as some
of our servers' Open Manager versions were updated, and I also added entries in
the People group for email notifications. After I made these modifications, it
seems that SA acts normally and then after a few hours reports that
most if not all of our routers are down. Any suggestions?

Thanks,
Glenn Gaspar

To unsubscribe send a message with UNSUBSCRIBE in the subject line to
[email protected] 
If you use auto-responders (like out-of-the-office messages), make sure that
they are not sent to the list nor to individual members.  Doing so will cause
you to be automatically removed from the list.

