So it's still doing the ping and logging it to dbgview. Are you also using DB logging? If so, could you check whether the DB is almost full? (A typical problem with Access or SQL Server Express.)
Dirk Bulinckx

----- Original Message -----
From: "GLENN GASPAR" <[EMAIL PROTECTED]>
To: "Servers Alive Discussion List" <[email protected]>
Sent: Thursday, May 29, 2008 11:20 PM
Subject: RE: [SA-list] First line of check fails after some modifications

We restarted Servers Alive and started dbgview a few hours ago. It started to slow down about half an hour ago. Here's an excerpt of the dbgview output:

[2872] Created new instance of wsPingThr!
[2872] thPingStart - exit loop forced after more then 30 seconds (16:07:13)
[2872] SA VB ping event START for www.ggc.com 316- 403
[2872] DEBUG: thpingstart pingEVENT 316-www.ggc.com
[2872] SA VB entry_thr_check_post start 316- 403
[2872] SA DEBUG start of alerting engine 316- 403
[2872] SA DEBUG end AlertEngines_DoAlerts: none defined for this entry 316- 403
[2872] SA DEBUG end of alerting engine 316- 403
[2872] SA DEBUG adapt icon 316- 403
[2872] SA DEBUG end entry_thr_check_post 316- 403
[2872] SA VB ping event END for www.ggc.com 316- 403

>>> "Dirk" <[EMAIL PROTECTED]> 5/29/2008 10:50 AM >>>
Not sure what I should see in this log.... If you run dbgview next to SA (http://beta.woodstone.nu/soft/temp/debugview.zip), does it still show the ping logging?

Dirk Bulinckx.

-----Original Message-----
From: Servers Alive Discussion List [mailto:[EMAIL PROTECTED]] On Behalf Of GLENN GASPAR
Sent: Thursday, May 29, 2008 5:40 PM
To: Servers Alive Discussion List
Subject: RE: [SA-list] First line of check fails after some modifications

Dirk,

We turned on full logging and discovered that PING DEBUG stopped appearing in the log file right about the time the check cycle started slowing down considerably. Below is an excerpt of the log. (I can send you more of it if necessary.) Any ideas?
Thanks,
Glenn

Wednesday, May 28, 2008 10:24:19 PM NETWARE took 2469 ms(110)
Wednesday, May 28, 2008 10:24:19 PM Westlake Router 72.1
Wednesday, May 28, 2008 10:24:20 PM PING DEBUG: event start for ID= 0 pingID= 34
Wednesday, May 28, 2008 10:24:20 PM Westlake Router 72.1 OK with a successrate of 100% and an average roundtriptime of 104ms
Wednesday, May 28, 2008 10:24:20 PM PING DEBUG: REMOVE for ID= 0 pingID= 34
Wednesday, May 28, 2008 10:24:20 PM PING DEBUG: REMOVED for ID= 0 pingID= 34
Wednesday, May 28, 2008 10:24:20 PM Lake Charles VCM PIMS Server
Wednesday, May 28, 2008 10:24:20 PM Lake Charles chemstation server
Wednesday, May 28, 2008 10:24:20 PM Lake Charles Server WLADM1
Wednesday, May 28, 2008 10:24:21 PM PING DEBUG: event start for ID= 0 pingID= 73
Wednesday, May 28, 2008 10:24:21 PM Lake Charles VCM PIMS Server OK with a successrate of 100% and an average roundtriptime of 166ms
Wednesday, May 28, 2008 10:24:21 PM PING DEBUG: REMOVE for ID= 0 pingID= 73
Wednesday, May 28, 2008 10:24:21 PM PING DEBUG: REMOVED for ID= 0 pingID= 73
Wednesday, May 28, 2008 10:24:21 PM PING DEBUG: event start for ID= 1 pingID= 35
Wednesday, May 28, 2008 10:24:21 PM Lake Charles chemstation server OK with a successrate of 100% and an average roundtriptime of 170ms
Wednesday, May 28, 2008 10:24:21 PM PING DEBUG: REMOVE for ID= 1 pingID= 35
Wednesday, May 28, 2008 10:24:21 PM PING DEBUG: REMOVED for ID= 1 pingID= 35
Wednesday, May 28, 2008 10:24:35 PM NETWARE took 14391 ms(126)
Wednesday, May 28, 2008 10:24:35 PM Madison Router 100.1
Wednesday, May 28, 2008 10:25:06 PM Madison Router 100.1 OK with a successrate of 100% and an average roundtriptime of 35ms
Wednesday, May 28, 2008 10:25:06 PM Madison Server MDADM1
Wednesday, May 28, 2008 10:25:08 PM NETWARE took 2578 ms(106)
Wednesday, May 28, 2008 10:25:08 PM Madison Router 100.1 - 2nd check
Wednesday, May 28, 2008 10:25:39 PM Madison Router 100.1 - 2nd check OK with a successrate of 100% and an average roundtriptime of 35ms
Wednesday, May 28, 2008 10:25:39 PM Ross D/R Database Server
Wednesday, May 28, 2008 10:26:10 PM Ross D/R Database Server OK with a successrate of 100% and an average roundtriptime of 35ms
Wednesday, May 28, 2008 10:26:10 PM Ross D/R Application Server
Wednesday, May 28, 2008 10:26:41 PM Ross D/R Application Server OK with a successrate of 100% and an average roundtriptime of 36ms

>>> "Dirk" <[EMAIL PROTECTED]> 5/23/2008 4:55 PM >>>
Without a sniffer trace of what is happening it will be difficult to say for sure if there is a problem or not. It's normal that IF the remote systems don't respond to the ping, the checking itself is slower. With the 15s timeout, the check itself can take 15s, but if the remote system does respond then the check may be done in 100ms.

Dirk Bulinckx.

-----Original Message-----
From: Servers Alive Discussion List [mailto:[EMAIL PROTECTED]] On Behalf Of GLENN GASPAR
Sent: Friday, May 23, 2008 11:35 PM
To: Servers Alive Discussion List
Subject: RE: [SA-list] First line of check fails after some modifications

Well, never mind. As soon as I logged out of the server that SA is running on (I was logged in for about an hour) it started slowing down again.

Glenn

>>> "GLENN GASPAR" <[EMAIL PROTECTED]> 5/23/2008 4:25 PM >>>
Dirk,

I don't have the tools/know-how to run a sniffer... However, I observed something that is quite strange... Before restarting SA about an hour ago, I turned off the primary & alternate SMTP mail. I then restarted the SA service and for about an hour now there hasn't been any slowdown. All the ping checks seem to be running at normal speed.

Glenn

>>> "Dirk" <[EMAIL PROTECTED]> 5/23/2008 2:25 PM >>>
This means that the frames are sent and are not coming back within the given timeout.

Example: 10 frames, 15 seconds timeout
=> frame 1 is sent and we wait a max of 1.5 seconds; if we get a response back from the pinged IP then we flag it as a GOOD frame, else as a BAD frame
=> frame 2 is sent ...
...
At the end we see how many GOOD frames we have and calculate the %.

So for some reason your pinged hosts start to fail. If you get that 0%, try running a sniffer (ethereal/wireshark/netmon/...) to see if the frames are sent and IF they come back too.

Dirk Bulinckx.

-----Original Message-----
From: Servers Alive Discussion List [mailto:[EMAIL PROTECTED]] On Behalf Of GLENN GASPAR
Sent: Friday, May 23, 2008 9:00 PM
To: Servers Alive Discussion List
Subject: RE: [SA-list] First line of check fails after some modifications

It is taking more time... Here's an excerpt of the log:

Friday, May 23, 2008 1:13:55 PM Atlanta Router 20.1 failed due to a successrate of only 0%
Friday, May 23, 2008 1:13:55 PM Houston Router 36.1 failed due to a successrate of only 0%

Both of these routers have a timeout value of 15 seconds and have "second knock" checked. Most if not all of the entries in the log after I restarted the server say "x failed due to a successrate of only 0%".

Glenn

>>> "Dirk" <[EMAIL PROTECTED]> 5/23/2008 1:36 PM >>>
It "seems" or it "is" taking more time? The roundtrip time is shown in the GUI, so you can tell from that value.

Dirk Bulinckx.

-----Original Message-----
From: Servers Alive Discussion List [mailto:[EMAIL PROTECTED]] On Behalf Of GLENN GASPAR
Sent: Friday, May 23, 2008 8:30 PM
To: Servers Alive Discussion List
Subject: RE: [SA-list] First line of check fails after some modifications

It seems that the ping checks take more time than usual. We give our router checks a timeout value of 5, 10, or 15 seconds (depending on location). About an hour or so after the restart, the ping checks to the routers are so slow that it looks like they're timing out.

Glenn

>>> "Dirk" <[EMAIL PROTECTED]> 5/23/2008 12:36 PM >>>
Define "slow"

Dirk Bulinckx.
-----Original Message-----
From: Servers Alive Discussion List [mailto:[EMAIL PROTECTED]] On Behalf Of GLENN GASPAR
Sent: Friday, May 23, 2008 7:05 PM
To: Servers Alive Discussion List
Subject: RE: [SA-list] First line of check fails after some modifications

Dirk,

Just want to add some additional observations... I just restarted the server that runs SA and initially it ran fine. After the 1st cycle of checks, however, it seems to slow down for some reason (the checks came back OK, but it's really slow).

Glenn

>>> "Dirk" <[EMAIL PROTECTED]> 5/23/2008 11:05 AM >>>
What type of checks are you using (that give the 'false' downs)?
What does SA show as the reason for the down?
What exact version of Servers Alive are you using?

Dirk Bulinckx.

-----Original Message-----
From: Servers Alive Discussion List [mailto:[EMAIL PROTECTED]] On Behalf Of GLENN GASPAR
Sent: Friday, May 23, 2008 5:55 PM
To: Servers Alive Discussion List
Subject: [SA-list] First line of check fails after some modifications

Hello,

In the past we've noticed that every now and then Servers Alive would report that some of our routers (first line of checks) are down when in reality they are still up and running. We would restart SA and the problem would seem to go away. However, this past Tuesday I made some SNMP check modifications, as some of our servers' Open Manager versions were updated, and I also added entries in the People group for email notifications. After I made these modifications, SA seems to act normally at first and then, after a few hours, it reports that most if not all of our routers are down. Any suggestions?

Thanks,
Glenn Gaspar

To unsubscribe send a message with UNSUBSCRIBE in the subject line to [email protected]
If you use auto-responders (like out-of-the-office messages), make sure that they are not sent to the list nor to individual members.
Doing so will cause you to be automatically removed from the list.
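
For readers following the thread, the frame counting Dirk describes (10 frames sharing a 15 second timeout, each frame flagged GOOD or BAD, then a percentage) can be sketched roughly as below. This is only an illustration and not Servers Alive's actual code: the function name `ping_success_rate` and the `send_frame` callback are hypothetical, and the real ICMP call is replaced by a stub.

```python
from typing import Callable


def ping_success_rate(
    send_frame: Callable[[float], bool],
    frames: int = 10,
    total_timeout_s: float = 15.0,
) -> float:
    """Return the percentage of GOOD frames for one ping check.

    send_frame is a hypothetical stand-in for the real ICMP call: it is
    given the per-frame wait in seconds and returns True if a reply
    arrived in time (GOOD) or False otherwise (BAD).
    """
    if frames <= 0:
        raise ValueError("frames must be positive")
    # The total timeout is shared: 15 s over 10 frames is 1.5 s per frame.
    per_frame_timeout_s = total_timeout_s / frames
    good = sum(1 for _ in range(frames) if send_frame(per_frame_timeout_s))
    return 100.0 * good / frames


if __name__ == "__main__":
    # Stubbed replies: a host that never answers, then one that always does.
    print(ping_success_rate(lambda timeout_s: False))  # 0.0
    print(ping_success_rate(lambda timeout_s: True))   # 100.0
```

Under these assumptions, a host that never replies costs up to 10 x 1.5 s = 15 s per check and ends at 0%, which is consistent with the "failed due to a successrate of only 0%" entries and the slow cycles reported above.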
