I think the problem is looking at it as a binary up/down issue, when in fact 
you should be able to tell that a problem is occurring when the check takes 
longer than a specific threshold. A page taking a second or two to load is 
going to cause nearly as much havoc as a 404 does.
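
As a minimal sketch of the threshold idea (Python; the function name and the 
1 second / 5 second cutoffs are made up for illustration and are not taken 
from Nagios or any other tool), a check result can be graded on response time 
as well as on status code:

```python
def classify_check(status_code, elapsed_seconds, warn_after=1.0, crit_after=5.0):
    """Grade one HTTP probe as OK, WARNING, or CRITICAL.

    status_code is None when the request never completed (timeout, refused).
    warn_after and crit_after are hypothetical thresholds in seconds.
    """
    # No response at all, or an HTTP error status: plainly down.
    if status_code is None or status_code >= 400:
        return "CRITICAL"
    # The page came back, but so slowly that users will treat it as broken.
    if elapsed_seconds >= crit_after:
        return "CRITICAL"
    # The page came back, but slower than we consider healthy.
    if elapsed_seconds >= warn_after:
        return "WARNING"
    return "OK"
```

The point is only that "slow" gets its own state instead of being folded into 
"up"; the right cutoffs depend on the site.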

As far as false positives go, the same applies to a failed attempt: a single 
failure is not conclusive either. This is why Nagios and most other monitoring 
systems offer the ability to confirm a failed check. Personally, I like to 
check at moderately long intervals but recheck quickly if I discover a 
possible failure.
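
A rough sketch of that "long interval, quick recheck" pattern, in Python. The 
function and its defaults (300 second normal interval, 30 second recheck, 
alert on the third consecutive failure) are assumptions for illustration; 
Nagios expresses a similar idea with its check_interval, retry_interval, and 
max_check_attempts settings, but this is not its actual logic.

```python
def next_step(failures_so_far, check_ok,
              normal_interval=300, retry_interval=30, max_attempts=3):
    """Decide what to do after one check result.

    Returns (consecutive_failures, seconds_until_next_check, raise_alert).
    All interval and attempt values are hypothetical defaults.
    """
    if check_ok:
        # Healthy (or recovered): reset the counter, back to the slow schedule.
        return 0, normal_interval, False
    failures = failures_so_far + 1
    # Any failure switches to the fast schedule; alert only once the
    # failure has been confirmed max_attempts times in a row.
    return failures, retry_interval, failures >= max_attempts
```

With these defaults a single blip costs one extra check 30 seconds later, 
while a real outage is confirmed about a minute after it is first seen.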

Finally, as Adam pointed out, just because the page returns does not mean 
that it's functional. An acceptable user experience is (or should be) the 
thing you are trying to verify.
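
One simple way to get closer to that, sketched in Python: look at what the 
page contains, not just whether it came back. The marker strings below are 
hypothetical; a real check would use text that only appears when the 
application behind the page is actually working.

```python
def page_is_functional(status_code, body,
                       required_text=("Sign in",),
                       forbidden_text=("Traceback", "Database error")):
    """Return True only if the page looks usable, not merely delivered.

    required_text and forbidden_text are made-up example markers.
    """
    if status_code != 200:
        return False
    # Everything a working page must show has to be present.
    if not all(marker in body for marker in required_text):
        return False
    # Error text served with a 200 status still means a broken page.
    return not any(marker in body for marker in forbidden_text)
```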


________________________________________
From: tech-boun...@lists.lopsa.org [tech-boun...@lists.lopsa.org] on behalf of 
Edward Ned Harvey (lopser) [lop...@nedharvey.com]
Sent: Monday, August 24, 2015 6:51 AM
To: Adam Moskowitz; tech@lists.lopsa.org
Subject: Re: [lopsa-tech] Server Overload and Log Processing

> From: tech-boun...@lists.lopsa.org [mailto:tech-boun...@lists.lopsa.org]
> On Behalf Of Adam Moskowitz
>
> I don't see how that can be true: If "a bunch of users" will get errors,
> I believe your page download tester will also see those same errors. If
> it's not seeing those errors, what good is it?

If your server can handle 100,000 requests per minute, and you get 101,000 
requests a minute, then 1% of your users will get "Page cannot be displayed" or 
something similar. You have a 99% chance that your download tester will fail to 
detect the problem. If it's sustained, you'll probably detect the problem after 
100 minutes, but you really should have detected it sooner, and if you detect 
the problem only as "page failed to download" by your download test, then you 
don't know why it failed, and the problem doesn't persist, and you'll probably 
brush it off as a false alarm.
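
The arithmetic behind those numbers can be sketched as follows (Python). It 
assumes one test download per minute and that each download independently has 
the same 1% chance of hitting the error; both are simplifications, and the 
function names are made up for this example.

```python
import math

def detection_probability(failure_rate, probes):
    """Chance that at least one of `probes` independent checks fails."""
    return 1.0 - (1.0 - failure_rate) ** probes

def probes_for_confidence(failure_rate, confidence):
    """Smallest number of probes whose detection chance reaches `confidence`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))

if __name__ == "__main__":
    rate = 0.01  # roughly 1,000 failed requests out of 101,000
    for minutes in (1, 10, 100):
        print(minutes, "min:", round(detection_probability(rate, minutes), 3))
    print("probes for 95%:", probes_for_confidence(rate, 0.95))
```

Under those assumptions the first failed test arrives after 100 minutes on 
average, and even after 100 tests there is still better than a one-in-three 
chance that none of them has failed.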


> Yes, you should still be looking at your logs, but I believe that what's
> more critical is that you monitor the service *from the user's point of
> view*, and that monitoring should reflect the users' experiences as
> closely as possible.

Agreed.
_______________________________________________
Tech mailing list
Tech@lists.lopsa.org
https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
 http://lopsa.org/
