Dirk,

 

I like the approach with severities.  I know lots of monitoring applications 
that allow a “green/yellow/red” status, but I think your approach is more 
flexible.

 

Can I ask a favor?   If there can be different levels of severity for a 
check, then can we also layer that into the alert process?

I’m envisioning a grid:    alerts down the side, and a column for each 
severity level.   Then we could just check a box at the intersection point to 
determine what alerts get sent/processed (to whom) at each severity level for a 
single check.

 

Eg:  Ping Check

Severity   Check Values            SevText         Status
========   =====================   =============   ======
Sev 1:     100% failure            FAILED          (DOWN)
Sev 2:     over 2000ms delay       High Latency    (DOWN)
Sev 3:     20ms to 1999ms delay    Caution Level   (UP)
Sev 4:     1ms to 20ms delay       Normal          (UP)

 

Alerts                 Sev 4   Sev 3   Sev 2   Sev 1
Email help desk                  X       X       X
Email network admin                      X       X
Text the VP                                      X

 

Also kind of like the idea of allowing each ALERT to define a chunk of text 
(Severity Text) that can be used as a variable in the alerts, as shown above, 
and allowing more than one severity level to set the status to DOWN. 
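
A rough sketch of what I mean, in Python purely for illustration. Nothing here is an existing Servers Alive feature; the names PING_SEVERITIES, ALERT_GRID and alerts_for are made up, and the grid assumes the X marks above sit under the rightmost severity columns.

```python
# Hypothetical model of a per-check severity table plus an alert grid.
# None of these names come from Servers Alive.

# severity level -> (severity text, resulting status)
PING_SEVERITIES = {
    1: ("FAILED", "DOWN"),
    2: ("High Latency", "DOWN"),
    3: ("Caution Level", "UP"),
    4: ("Normal", "UP"),
}

# alert -> severity levels whose box is checked (assumed reading of the grid)
ALERT_GRID = {
    "Email help desk": {3, 2, 1},
    "Email network admin": {2, 1},
    "Text the VP": {1},
}


def alerts_for(severity):
    """Return (alert, message) pairs to fire for one severity level."""
    sev_text, status = PING_SEVERITIES[severity]
    # The severity text is used as a variable inside the alert message.
    message = f"Ping check: {sev_text} (Sev {severity}, status {status})"
    return [(alert, message) for alert, sevs in ALERT_GRID.items() if severity in sevs]
```

With this layout, Sev 4 fires nothing, Sev 2 reaches the help desk and the network admin, and only Sev 1 texts the VP.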

 

Just my 2 cents.

 

-Tom

 

 

From: Servers Alive Discussion List [mailto:[email protected]] On Behalf Of 
Dirk Bulinckx
Sent: Monday, December 19, 2011 3:31 PM
To: Servers Alive Discussion List
Subject: RE: [SA-list] Latency check?

 

I can see that for some types of checks a new status could be useful…HOWEVER 
based on how SA currently works I would prefer a different approach.

 

The status would still be UP or DOWN (or maintenance or …) but you could give 
a severity to the check (that is already possible now via a hidden "switch"), 
and have even several severities.

That way you *could* get

                * up

                * down with lowest severity      (example: less than 100gb free space)

                * down with low severity         (less than 50gb)

                * down with high severity        (less than 10gb)

                * down with highest severity     (less than 1gb)
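
To make the list above concrete, here is a minimal Python sketch. It is an illustration only and not how SA is implemented; THRESHOLDS and disk_status are invented names, and the limits are simply the example values from the list.

```python
# Hypothetical thresholds in GB, checked from most to least severe.
THRESHOLDS = [
    (1, "highest"),
    (10, "high"),
    (50, "low"),
    (100, "lowest"),
]


def disk_status(free_gb):
    """Return (status, severity); severity is None when the check is UP."""
    for limit_gb, severity in THRESHOLDS:
        if free_gb < limit_gb:
            return ("DOWN", severity)
    return ("UP", None)
```

The status stays a plain UP or DOWN; the severity is extra information attached to a DOWN result.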

 

 

This is something we have on our todo list…but it will NOT be in the next 
major release (due out beginning of 2012).

 

 

 

Dirk Bulinckx. 

Network Monitoring by Servers Alive - http://www.woodstone.nu 
(http://www.woodstone.nu)
DNS Hosting with ipv4 and ipv6 on http://www.stellardns.com 
(http://www.stellardns.com)

 

From: Servers Alive Discussion List [mailto:[email protected]] On Behalf Of 
Jason Passow
Sent: Monday, December 19, 2011 8:16 PM
To: Servers Alive Discussion List
Subject: RE: [SA-list] Latency check?

 

Close to down SHOULD be another check as it is another animal altogether.  I 
think another status opens up a whole host of other discussions.  Perhaps 
it is worth having those discussions so as to determine that it makes sense to 
everyone but me (or what really matters that it makes sense to Dirk).  

 

Which checks require a caution status?  Looking through my checks (and I do not 
use all of the possible checks), I see disk space, ping, CPU usage, count 
files, external error level, and process check.  How do you define the gray 
area? With a +/- x %?  

 


As far as I understand as merely a user, off the top of my head I can see a lot 
of changes required.  The entire infrastructure would need to be changed to 
have a caution definition.  Also a ping check does not currently wait 
infinitely for a return.  If your timeout is 5 pings and 2 seconds, it waits 
literally 400 ms for each ping.  If it received no response then it is down.  
Based on that knowledge the ping check would have to be written to wait longer 
for the ping to return and then analyze the result for the +-%.   Also the 
alerting structure would have to change to allow for when down or caution.   
Presumably there needs to be either check boxes or three boxes (only when down, 
when down or caution, when up or back up).    Then do you want the up to be up 
after caution or after down only.   It just seems like it would make the 
alerting significantly more complex than it is now.   Especially considering 
the "workaround" is pretty straightforward.   To me it does not seem like a 
workaround.   


 


To me each check is and should be black and white.  If I want to know when it 
is 1.0000001 GB then I would set an alert for that.  I want to know when it is 
less than 1GB.   If that is not enough notice then I change my check to say 
when it is less than 2GB or less than 1.25.    


 


From an alerting perspective, I have a tiered alerting schedule so that I find 
out before the other admins what is wrong.  If it is a Servers Alive glitch I 
can disable the checks or restart the service before others are alerted.   If 
it is down I can resolve the down condition before others are alerted.  If not 
then there are others to pick up the slack.

 


 


Jason Passow
Network Administrator
Mississippi Welders Supply
http://www.mwsco.com (http://www.mwsco.com) 
[email protected] (mailto:[email protected])
ph: (507) 494-5178
fax: (507) 454-8104
--------------------------------------------------------------------------------



From: Chris Mang [mailto:[email protected]]
To: Servers Alive Discussion List [mailto:[email protected]]
Sent: Mon, 19 Dec 2011 11:30:33 -0600
Subject: RE: [SA-list] Latency check?

Jason,


Thanks for your input.  Your solution is a good workaround - unfortunately, 
that’s what it is:  a workaround.  To me, it doesn’t make sense 
to add a second ping check with different parameters.  Why monitor the same 
item twice when, with additional “sensing” of the ping response 
data, you can monitor the item once and post different statuses based on that 
data?

 

My issue with the current statuses is that they are black or white, good or 
not good, UP or DOWN.  With a ping, a slow link isn’t DOWN, it’s 
slow.  There is a gray area that Servers Alive cannot currently report on 
without a workaround like yours.  This can also be applied to disk space 
checks.  If you have a disk space check that reports down on “ < 1 
GB” for example, SA reports it as UP even if it is at 1.000001 GB.  But 
if it is at 999.999999 MB, it’s down.  Why not have an additional status 
that tells me when it’s close?  This is a bit more proactive.

 

The new status shouldn’t be called LATENT because it could be applied to 
other check types (although all I can think of right now is disk space).  Maybe 
the new status should be CAUTION or something similar.

 

I look forward to your response.


Chris

 

From: Servers Alive Discussion List [mailto:[email protected] 
(mailto:[email protected])] On Behalf Of Jason Passow
Sent: Monday, December 19, 2011 10:46 AM
To: Servers Alive Discussion List
Subject: RE: [SA-list] Latency check?

 

A whole new status for this seems a bit unnecessary.  I have a similar set up 
using multiple pings and dependencies. 

 


If ping1 frames received in less than 2000ms then up else down (name this one 
Internet)


if ping2 (which depends on ping1 being up) receives frames in less than 200 ms 
then up else down(Name this one Internet Slow)


 


 


Ping 2 would not run if ping1 is down, so it is effectively a latency between 200 
ms and 2000 ms that creates a down.  So you would get a warning at 500 ms that 
your "internet slow is down".    Even without explanation to other admins I 
think that it is clear that things are slow and not down.  If pings take less 
than 200 ms then all is well.  If pings take more than 2000 ms then the warning 
will be "Internet is Down and all hell will break loose."
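
The logic of the two dependent checks can be sketched in Python as below. This only illustrates the reasoning and is not Servers Alive configuration; the function name evaluate and the "NOT RUN" label are made up, and the handling of a reply at exactly 200 ms or 2000 ms is an assumption.

```python
def evaluate(latency_ms):
    """Evaluate the two dependent checks for one measured round trip.

    latency_ms is None when no reply came back at all.
    Returns a dict of check name -> "UP", "DOWN" or "NOT RUN".
    """
    internet_up = latency_ms is not None and latency_ms < 2000
    results = {"Internet": "UP" if internet_up else "DOWN"}
    if not internet_up:
        # The dependent check is skipped while its parent is down.
        results["Internet Slow"] = "NOT RUN"
    else:
        results["Internet Slow"] = "UP" if latency_ms < 200 else "DOWN"
    return results
```

A 500 ms round trip gives Internet UP and Internet Slow DOWN, which is the "slow but not down" warning.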



Jason Passow
Network Administrator
Mississippi Welders Supply
http://www.mwsco.com (http://www.mwsco.com) 
[email protected] (mailto:[email protected])
ph: (507) 494-5178
fax: (507) 454-8104
--------------------------------------------------------------------------------



From: Heath Abbate [mailto:[email protected] (mailto:[email protected])]
To: Servers Alive Discussion List [mailto:[email protected] 
(mailto:[email protected])]
Sent: Mon, 19 Dec 2011 10:30:26 -0600
Subject: RE: [SA-list] Latency check?

Right on the money Chris.

-----Original Message-----
From: Servers Alive Discussion List [mailto:[email protected] 
(mailto:[email protected])] On Behalf Of Chris Mang
Sent: Monday, December 19, 2011 7:31 AM
To: Servers Alive Discussion List
Subject: RE: [SA-list] Latency check?

Dirk,

I think I know what Heath is asking for, so I'll make a "feature request".

I suggest a ping response range with the ability to set the "fast" and "slow" 
ends of the range. If the response is received faster than the "fast" end, the 
item is UP. If the response is within the range, the item is "LATENT". If the 
response is received slower than the "slow" end, the item is DOWN. I guess this 
means a new status too.

In your example:

All frames are received in less than 200ms -> UP
If one of the frames takes more than 200ms but less than 2000ms -> LATENT
If one of the frames takes more than 2000ms -> DOWN
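
As a rough Python sketch of the range idea (the function ping_status and its parameters fast_ms and slow_ms are hypothetical names, and treating a reply at exactly 200 ms as UP is my assumption):

```python
def ping_status(frame_times_ms, fast_ms=200, slow_ms=2000):
    """Classify a ping check from per-frame round-trip times.

    A frame that never came back is represented as None and counts as DOWN.
    """
    # Any frame slower than the "slow" end (or lost) makes the item DOWN.
    if any(t is None or t > slow_ms for t in frame_times_ms):
        return "DOWN"
    # Otherwise any frame slower than the "fast" end makes it LATENT.
    if any(t > fast_ms for t in frame_times_ms):
        return "LATENT"
    return "UP"
```

For example, ping_status([10, 500, 30]) would report LATENT with the default range.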

Does that make sense, and would it be possible?

Chris Mang
Senior Security Administrator
JDA Software Group, Inc.

-----Original Message-----
From: Servers Alive Discussion List [mailto:[email protected] 
(mailto:[email protected])] On Behalf Of Dirk Bulinckx
Sent: Sunday, December 18, 2011 3:01 PM
To: Servers Alive Discussion List
Subject: RE: [SA-list] Latency check?

That's how the ping check works.

For example
1 second timeout
5 frames
100% should respond

if one of the frames takes more than 200ms ->DOWN
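
The arithmetic behind that example, as a small Python sketch. It assumes the total timeout is split evenly across the frames, which is what the numbers above imply; frame_budget_ms and ping_check are illustrative names, not SA internals.

```python
def frame_budget_ms(timeout_s, frames):
    """Per-frame wait, assuming the total timeout is split evenly across frames."""
    return timeout_s * 1000 / frames


def ping_check(frame_times_ms, timeout_s=1, required_pct=100):
    """Return "UP" or "DOWN" for one run of a non-empty list of frame times.

    None marks a frame that got no reply.
    """
    budget = frame_budget_ms(timeout_s, len(frame_times_ms))
    # Count the frames that answered within the per-frame budget.
    in_time = sum(1 for t in frame_times_ms if t is not None and t <= budget)
    return "UP" if in_time * 100 >= required_pct * len(frame_times_ms) else "DOWN"
```

With a 1 second timeout and 5 frames the budget is 200 ms per frame, so a single 250 ms reply fails a check that requires 100% of the frames.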




Dirk Bulinckx.
Network Monitoring by Servers Alive - http://www.woodstone.nu 
(http://www.woodstone.nu) DNS Hosting with ipv4 and ipv6 on 
http://www.stellardns.com (http://www.stellardns.com)


-----Original Message-----
From: Servers Alive Discussion List [mailto:[email protected] 
(mailto:[email protected])] On Behalf Of Heath Abbate
Sent: Sunday, December 18, 2011 9:01 PM
To: Servers Alive Discussion List
Subject: [SA-list] Latency check?

Doable in Salive?

I would like to ping a host and if the latency for the round trip exceeds a 
certain value then get notified.





The contents of this message, together with any attachments, are intended only 
for the use of the individual or entity to which they are addressed and may 
contain information that is confidential and exempt from disclosure. If you are 
not the intended recipient, you are hereby notified that any dissemination, 
distribution, or copying of this message, or any attachment, is strictly 
prohibited. If you have received this message in error, please notify the 
original sender immediately by telephone or by return E-mail and delete this 
message, along with any attachments, from your computer. Thank you.

To unsubscribe send a message with UNSUBSCRIBE in the subject line to 
[email protected] (mailto:[email protected]) If you use auto-responders 
(like out-of-the-office messages), make sure that they are not sent to the list 
nor to individual members. Doing so will cause you to be automatically removed 
from the list.
