On Mon, Aug 24, 2015 at 4:03 PM, Page, Jeremy <jeremy.p...@gilbarco.com>
wrote:

> Sorry, I was not saying don't look at logs, just that logs are only
> reactive and only show things you're logging (if the server crashes you may
> log nada, but that's definitely an issue!). I also personally find
> correlation easier when I have graphic data, but something like the ELK
> stack could help here. I have checks that look at ELK and then alert when
> they find pertinent data (they could also watch logs, but this way they're
> in a single place and can also look for negatives, i.e. no one has logged in
> for 15 minutes is an error even if everything else is "green").
>
> "looking at logs is 100% accurate at detecting logged problems :-)" - I'm
> stealing this.
>
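
A minimal sketch of such a "negative" check, assuming the login timestamps have already been pulled out of whatever stores the logs. The function name and its parameters are illustrative only and are not part of ELK or any monitoring tool:

```python
from datetime import datetime, timedelta

def no_recent_activity(event_times, now, max_silence=timedelta(minutes=15)):
    """Return True when no event was seen within max_silence before now."""
    if not event_times:
        return True  # no events at all also counts as silence
    return now - max(event_times) > max_silence

# Hypothetical data: the last login happened 20 minutes before "now".
now = datetime(2015, 8, 24, 16, 0)
logins = [datetime(2015, 8, 24, 15, 30), datetime(2015, 8, 24, 15, 40)]
print(no_recent_activity(logins, now))  # prints True, so the check would alert
```

The point of the design is that silence itself is the error condition, so the check fires even when nothing has logged a failure.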

You need both, no doubt. An interesting presentation[1] by Jos Boumans
discussed a single graph containing both successful hits (HTTP 200/300) and
errors (400/500), showing both errors and response times. We won't be
doing that anytime soon over here, but his thoughts might be of use.

[1] https://www.youtube.com/watch?v=VTFEG8sQwS8
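
As a rough illustration of the data behind a combined graph like that (not the method from the presentation), the sketch below buckets access-log records per minute into success counts, error counts and average response time. The record layout `(minute, http_status, response_seconds)` and the function name are assumptions made for the example:

```python
from collections import defaultdict

def summarize_per_minute(records):
    """records: iterable of (minute, http_status, response_seconds) tuples.

    Returns {minute: {"ok": n, "errors": n, "avg_seconds": x}}, where
    statuses below 400 count as ok and 400 and above count as errors.
    """
    buckets = defaultdict(lambda: {"ok": 0, "errors": 0, "total_seconds": 0.0})
    for minute, status, seconds in records:
        bucket = buckets[minute]
        bucket["errors" if status >= 400 else "ok"] += 1
        bucket["total_seconds"] += seconds
    summary = {}
    for minute, b in sorted(buckets.items()):
        hits = b["ok"] + b["errors"]
        summary[minute] = {
            "ok": b["ok"],
            "errors": b["errors"],
            "avg_seconds": b["total_seconds"] / hits,
        }
    return summary

# Hypothetical sample records.
sample = [
    ("16:03", 200, 0.25),
    ("16:03", 302, 0.25),
    ("16:03", 500, 1.0),
    ("16:04", 404, 0.5),
]
for minute, row in summarize_per_minute(sample).items():
    print(minute, row)
```

Each row could feed one point per series on a shared time axis.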

Hans





> > I think the problem is looking at it as a binary up/down issue when in
> > fact you should be able to determine the problem is occurring when the
> > check takes longer than a specific threshold. A page taking a second or
> > two to load is going to cause close to the havoc a 404 does.
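
A small sketch of a non-binary check result, assuming the check already knows the HTTP status and how long the request took. The one- and two-second thresholds and the state names are placeholders, not values taken from any particular monitoring system:

```python
def classify_check(status_code, elapsed_seconds, warn_after=1.0, crit_after=2.0):
    """Map one HTTP check result to "OK", "WARNING" or "CRITICAL"."""
    if status_code >= 400 or elapsed_seconds >= crit_after:
        return "CRITICAL"
    if elapsed_seconds >= warn_after:
        return "WARNING"
    return "OK"

print(classify_check(200, 0.3))  # fast and successful
print(classify_check(200, 1.4))  # successful but slow
print(classify_check(200, 2.5))  # slow enough to count as a failure
print(classify_check(404, 0.1))  # fast but broken
```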
>
> The problem is when you don't have a site-wide problem, but rather an
> intermittent problem, your external test may or may not see it.
>
> If you have 10 systems behind a load balancer, and one of those 10 systems
> has a problem, at best your external test is going to get an error 1 out of
> 10 times (at worst, your load balancer is going to tend to put your external
> test to one server instead of rotating it across all 10, in which case you
> may never see the problem).
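
The two cases can be sketched with a toy model. It assumes strict round-robin rotation in the best case and a probe pinned to a single backend in the worst case; real load balancers behave less neatly, and all names here are made up for the example:

```python
def probe_failures(n_probes, n_backends=10, bad_backend=3, sticky_backend=None):
    """Count failed probes when exactly one backend is broken.

    With sticky_backend=None the balancer is assumed to rotate strictly
    round-robin; otherwise every probe lands on sticky_backend.
    """
    failures = 0
    for i in range(n_probes):
        backend = i % n_backends if sticky_backend is None else sticky_backend
        if backend == bad_backend:
            failures += 1
    return failures

print(probe_failures(100))                    # round-robin: 10 of 100 probes fail
print(probe_failures(100, sticky_backend=0))  # pinned to a healthy backend: 0 fail
```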
>
> looking at logs is 100% accurate at detecting logged problems :-)
>
> Logs won't detect problems that aren't logged (which is why I think you
> should log how long it took to service the request), so you need the
> external test as well. But there is a LOT of stuff the external test won't
> detect.
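
One way to get service times into the logs, sketched with Python's standard `logging` module. The decorator name, the logger name and the `handle_request` function are hypothetical stand-ins for whatever the real application uses:

```python
import logging
import time
from functools import wraps

log = logging.getLogger("requests")

def log_duration(handler):
    """Wrap a request handler so every call logs how long it took."""
    @wraps(handler)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return handler(*args, **kwargs)
        finally:
            # Runs on success and on exceptions, so slow failures are logged too.
            elapsed = time.monotonic() - start
            log.info("handler=%s seconds=%.3f", handler.__name__, elapsed)
    return wrapper

@log_duration
def handle_request(path):
    # Stand-in for real request handling.
    return "served " + path

logging.basicConfig(level=logging.INFO)
print(handle_request("/index.html"))
```

With the duration in every log line, slow requests become a logged problem that log analysis can detect.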
>
> David Lang
>
> > As far as false positives go, the same is true for the failed attempt.
> > This is why Nagios and most other monitoring systems offer the ability to
> > confirm a failed check. Personally, I like to check at moderately long
> > intervals but recheck quickly if I discover a possible failure.
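
The recheck idea can be sketched as follows. This is not Nagios code or configuration, just an illustration; `check` is assumed to be any callable that returns True when the service looks healthy, and the retry count and interval are arbitrary:

```python
import time

def confirmed_failure(check, rechecks=3, recheck_interval=5.0, sleep=time.sleep):
    """Return True only if check() fails now and on every quick recheck."""
    if check():
        return False
    for _ in range(rechecks):
        sleep(recheck_interval)
        if check():
            return False  # recovered: treat the first failure as a blip
    return True

# Example: a check that fails twice and then recovers is not confirmed.
# The sleep is stubbed out so the example runs instantly.
results = iter([False, False, True])
print(confirmed_failure(lambda: next(results), sleep=lambda seconds: None))
```

The regular schedule can stay slow because only a suspected failure triggers the fast rechecks.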
> >
> > Finally, as Adam pointed out, just because the page returns does not
> > mean that it's functional. Acceptable user experience is (or should be)
> > the thing you are trying to verify.
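
A minimal sketch of checking function rather than mere reachability, assuming the test has already fetched the page. The marker text and the function name are invented for the example:

```python
def page_is_functional(status_code, body, required_text):
    """A page passes only if it returned 200 and contains the expected text."""
    return status_code == 200 and required_text in body

# Both pages "return", but only the first one is actually working.
print(page_is_functional(200, "<h1>Welcome back, alice</h1>", "Welcome back"))
print(page_is_functional(200, "<h1>Database connection failed</h1>", "Welcome back"))
```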
> >
> >
> > ________________________________________
> > From: tech-boun...@lists.lopsa.org [tech-boun...@lists.lopsa.org] on behalf of Edward Ned Harvey (lopser) [lop...@nedharvey.com]
> > Sent: Monday, August 24, 2015 6:51 AM
> > To: Adam Moskowitz; tech@lists.lopsa.org
> > Subject: Re: [lopsa-tech] Server Overload and Log Processing
> >
> >> From: tech-boun...@lists.lopsa.org [mailto:tech-boun...@lists.lopsa.org]
> >> On Behalf Of Adam Moskowitz
> >>
> >> I don't see how that can be true: If "a bunch of users" will get errors,
> >> I believe your page download tester will also see those same errors. If
> >> it's not seeing those errors, what good is it?
> >
> > If your server can handle 100,000 requests per minute, and you get 101,000
> > requests a minute, then about 1% of your users will get "Page cannot be
> > displayed" or something similar. You have a 99% chance that your download
> > tester will fail to detect the problem. If it's sustained, you'll probably
> > detect the problem after 100 minutes, but you really should have detected
> > it sooner. And if you detect the problem only as "page failed to download"
> > by your download test, then you don't know why it failed, the problem
> > doesn't persist, and you'll probably brush it off as a false alarm.
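
The arithmetic can be worked through under two assumptions that are not stated in the thread: the tester runs one probe per minute, and each probe fails independently with the same 1% chance. The function names are made up for the example:

```python
import math

def detection_probability(failure_rate, probes):
    """Chance that at least one of `probes` independent tests fails."""
    return 1 - (1 - failure_rate) ** probes

def probes_needed(failure_rate, confidence):
    """Smallest number of probes that detects the problem with `confidence`."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - failure_rate))

p = 1_000 / 101_000  # requests over capacity / total requests
print(round(p, 4))                                 # 0.0099, i.e. about 1%
print(round(detection_probability(0.01, 100), 3))  # 0.634 after 100 probes
print(probes_needed(0.01, 0.95))                   # 299 probes for 95% confidence
```

Under those assumptions, 100 minutes gives only about a 63% chance of having seen even one failure, and near-certain detection takes roughly five hours.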
> >
> >
> >> Yes, you should still be looking at your logs, but I believe that what's
> >> more critical is that you monitor the service *from the user's point of
> >> view*, and that monitoring should reflect the users' experiences as
> >> closely as possible.
> >
> > Agreed.
> > _______________________________________________
> > Tech mailing list
> > Tech@lists.lopsa.org
> > https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
> > This list provided by the League of Professional System Administrators
> > http://lopsa.org/
> > Please be advised that this email may contain confidential information.
> If you are not the intended recipient, please notify us by email by
> replying to the sender and delete this message. The sender disclaims that
> the content of this email constitutes an offer to enter into, or the
> acceptance of, any agreement; provided that the foregoing does not
> invalidate the binding effect of any digital or other electronic
> reproduction of a manual signature that is included in any attachment.
>
