On Mon, Aug 24, 2015 at 4:03 PM, Page, Jeremy <jeremy.p...@gilbarco.com> wrote:
> Sorry, was not saying don't look at logs, just saying logs are only
> reactive and only see things you're logging (if the server crashes you may
> log nada, but that's definitely an issue!). I also personally find
> correlation easier when I have graphic data, but something like the ELK
> stack could help here. I have checks that look at ELK and then alert when
> they find pertinent data (they could also watch logs, but this way they're
> in a single place and can also look for negatives, e.g. no one having
> logged in for 15 minutes is an error even if everything else is "green").
>
> "looking at logs is 100% accurate at detecting logged problems :-)" - I'm
> stealing this.
>

You need both, no doubt. An interesting presentation[1] by Jos Boumans
discussed a single graph containing both successful hits (HTTP 200/300) and
errors (400/500s), showing both errors and response times. We won't be
doing that anytime soon over here, but his thoughts might be of use.

[1] https://www.youtube.com/watch?v=VTFEG8sQwS8

Hans

> > I think the problem is looking at it as a binary up/down issue when in
> > fact you should be able to determine the problem is occurring when the
> > check takes longer than a specific threshold. A page taking a second or
> > two to load is going to cause close to the havoc a 404 does.
>
> The problem is when you don't have a site-wide problem, but rather an
> intermittent problem: your external test may or may not see it.
>
> If you have 10 systems behind a load balancer, and one of those 10
> systems has a problem, at best your external test is going to get an
> error 1 out of 10 times (at worst, your load balancer is going to tend to
> send your external test to one server instead of rotating it across all
> 10, in which case you may never see the problem).
>
> looking at logs is 100% accurate at detecting logged problems :-)
>
> logs won't detect problems that aren't logged (which is why I think you
> should log how long it took to service the request), so you need the
> external test as well. But there is a LOT of stuff the external test
> won't detect.
>
> David Lang
>
> > As far as false positives go, the same is true for the failed attempt.
> > This is why Nagios and most other monitoring systems offer the ability
> > to confirm a failed check. Personally I like to check at moderately long
> > intervals but recheck quickly if I discover a possible failure.
> >
> > Finally, as Adam pointed out, just because the page returns does not
> > mean that it's functional. Acceptable user experience is (or should be)
> > the thing you are trying to verify.
> >
> >
> > ________________________________________
> > From: tech-boun...@lists.lopsa.org [tech-boun...@lists.lopsa.org] on behalf of Edward Ned Harvey (lopser) [lop...@nedharvey.com]
> > Sent: Monday, August 24, 2015 6:51 AM
> > To: Adam Moskowitz; tech@lists.lopsa.org
> > Subject: Re: [lopsa-tech] Server Overload and Log Processing
> >
> >> From: tech-boun...@lists.lopsa.org [mailto:tech-boun...@lists.lopsa.org]
> >> On Behalf Of Adam Moskowitz
> >>
> >> I don't see how that can be true: If "a bunch of users" will get errors,
> >> I believe your page download tester will also see those same errors. If
> >> it's not seeing those errors, what good is it?
> >
> > If your server can handle 100,000 requests per minute, and you get
> > 101,000 requests a minute, then 1% of your users will get "Page cannot be
> > displayed" or something similar. On any single check, you have a 99%
> > chance that your download tester will fail to detect the problem. If it's
> > sustained, you'll probably detect the problem after 100 minutes, but you
> > really should have detected it sooner. And if your download test only
> > reports the problem as "page failed to download", then you don't know why
> > it failed; the problem doesn't persist, so you'll probably brush it off
> > as a false alarm.
> >
> >
> >> Yes, you should still be looking at your logs, but I believe that what's
> >> more critical is that you monitor the service *from the user's point of
> >> view*, and that monitoring should reflect the users' experiences as
> >> closely as possible.
> >
> > Agreed.
> > _______________________________________________
> > Tech mailing list
> > Tech@lists.lopsa.org
> > https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
> > This list provided by the League of Professional System Administrators
> > http://lopsa.org/
> >
> > Please be advised that this email may contain confidential information.
> > If you are not the intended recipient, please notify us by email by
> > replying to the sender and delete this message. The sender disclaims that
> > the content of this email constitutes an offer to enter into, or the
> > acceptance of, any agreement; provided that the foregoing does not
> > invalidate the binding effect of any digital or other electronic
> > reproduction of a manual signature that is included in any attachment.
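Edward's detection estimate can be checked with a short Python sketch. It assumes every request above capacity fails outright, that each probe is independent of the others, and that the download tester runs once per minute; all function names are hypothetical. The same formula covers David's case of one bad server out of ten, assuming strict round-robin (p = 0.1 per probe); as he notes, a sticky load balancer can make the real chance far lower.

```python
import math


def failure_fraction(capacity_per_min: float, offered_per_min: float) -> float:
    """Fraction of requests that fail when offered load exceeds capacity.

    Assumption: every request beyond capacity fails outright (no queueing,
    no retries), and each request is equally likely to be one that fails.
    """
    excess = max(0.0, offered_per_min - capacity_per_min)
    return excess / offered_per_min


def detection_probability(p_fail: float, probes: int) -> float:
    """Chance that at least one of `probes` independent probes sees a failure."""
    return 1.0 - (1.0 - p_fail) ** probes


def expected_probes_to_detect(p_fail: float) -> float:
    """Mean number of probes until the first failure (geometric distribution)."""
    return math.inf if p_fail == 0 else 1.0 / p_fail


if __name__ == "__main__":
    # Edward's numbers: capacity 100,000/min, offered load 101,000/min,
    # with a hypothetical download tester probing once per minute.
    p = failure_fraction(100_000, 101_000)  # 1,000 / 101,000, roughly 0.0099
    print(f"chance a single probe misses: {1 - p:.4f}")
    print(f"expected probes until first hit: {expected_probes_to_detect(p):.0f}")
    print(f"chance of a hit within 100 probes: {detection_probability(p, 100):.2f}")

    # David's case: 1 bad server of 10, assuming strict round-robin so each
    # probe has a 1-in-10 chance of landing on it.
    print(f"1-of-10 backend, 10 probes: {detection_probability(0.1, 10):.2f}")
```

Under these assumptions the chance of having caught at least one failure after 100 one-minute probes is 1 - (100000/101000)^100, about 63%, so "probably" is a fair description, while each individual probe still misses roughly 99% of the time.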
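Jeremy's two suggestions in the quoted thread, grading a check on response time rather than plain up/down and rechecking quickly before declaring a failure, can be sketched in Python. This is a minimal illustration under stated assumptions, not how Nagios itself is implemented: the 1 s / 2 s thresholds, the 300 s / 30 s intervals, the three-attempt confirmation, and every name below are hypothetical, and the OK/WARNING/CRITICAL values only mirror the conventional plugin exit codes.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical state values, mirroring the conventional Nagios plugin
# exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL).
OK, WARNING, CRITICAL = 0, 1, 2


@dataclass
class ProbeResult:
    status: Optional[int]  # HTTP status code, or None if no response at all
    latency_s: float       # time taken to fetch the page, in seconds


def classify(result: ProbeResult, warn_s: float = 1.0, crit_s: float = 2.0) -> int:
    """Grade a probe on speed as well as status (thresholds are hypothetical)."""
    if result.status is None or result.status >= 400:
        return CRITICAL
    if result.latency_s >= crit_s:
        return CRITICAL  # a very slow page is treated like an error page
    if result.latency_s >= warn_s:
        return WARNING
    return OK


def should_alert(states: List[int], confirm_attempts: int = 3) -> bool:
    """Alert only once the last `confirm_attempts` checks were all non-OK."""
    recent = states[-confirm_attempts:]
    return len(recent) == confirm_attempts and all(s != OK for s in recent)


def next_check_delay(states: List[int], normal_s: float = 300.0,
                     retry_s: float = 30.0, confirm_attempts: int = 3) -> float:
    """Check at a long interval normally, but quickly while confirming a failure."""
    if states and states[-1] != OK and not should_alert(states, confirm_attempts):
        return retry_s
    return normal_s


if __name__ == "__main__":
    # Simulated probe results: healthy, then one slow response, then errors.
    probes = [ProbeResult(200, 0.4), ProbeResult(200, 2.6),
              ProbeResult(503, 0.1), ProbeResult(None, 10.0)]
    history: List[int] = []
    for probe in probes:
        history.append(classify(probe))
        print(history[-1], should_alert(history), next_check_delay(history))
```

Counting a slow-but-working WARNING toward confirmation, as this sketch does, is a policy choice; it follows Jeremy's point that a page taking a second or two is nearly as bad as a 404, but a team that tolerates slowness might count only CRITICAL results.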