On Wed, Apr 11, 2018 at 01:17:14AM -0400, Peter Booth wrote:
> There are some very good reasons for doing things in what sounds
> like a heavy inefficient manner.

I suspected, thanks for the explanations.

> The first point is that there are some big differences between
> application code/business logic and monitoring code:
>
> [...]

Good summary, I agree with you.

> tailing a log file doesn't sound sexy, but it's also pretty hard to
> mess it up. I monitored a high traffic email site with a very short
> Ruby script that would tail an nginx log, pushing messages ten at a
> time as UDP datagrams to an influxdb. The script would do its thing
> for 15 mins then die. cron ensured a new instance started every 15
> minutes. It was more efficient than a shell script because it didn't
> start new processes in a pipeline.

It's hard to mess up as long as you're not interested in
exactly-once. ;-)

The tail solution has two weaknesses: (1) it can miss events, either
if the short gap between process death and process start sees more
events than tail catches at startup, or if the log file rotates a few
seconds into that 15 minute period, and (2) it can duplicate events if
very few arrive in that period, since tail then replays lines the
previous instance already sent.

Now, with telegraf/influx, duplicates aren't a concern, because influx
keys on time, and our site is probably not getting so much traffic
that a tail restart is a big deal, although log rotation could lead to
gaps we don't like. Of course, this is why Logwatch was written...

> I like the Scalyr guide but I disagree with their advice on active
> monitoring. I think it's smarter to use real user requests to test if
> servers are up. I have seen many high profile sites that end up
> serving more synthetic requests than real customer initiated
> requests.

I'm not sure I understood what you mean by "active monitoring". I took
it to mean "sending http queries to see if they are handled properly".
In that context: I think both submitting queries (from outside one's
own network) and passively watching stats on the service itself are
essential.

Passively watching stats gives me information on internal state, which
is useful in itself but also when debugging problems. Active
monitoring from a different network can alert me to problems that may
not be specific to any one service and may even be at the network
level.

Of course, yes, active monitoring shouldn't be trying to DoS my
service. ;-)

Jeff Abrahamson
https://www.p27.eu/jeff/

> On 11 Apr 2018, at 12:19 AM, Jeff Abrahamson <j...@p27.eu> wrote:
>
> I want to monitor nginx better: http returns (e.g., how many
> 500's, how many 404's, how many 200's, etc.), as well as request
> rates, response times, etc. All the solutions I've found start
> with "set up something to watch and parse your logs, then ..."
>
> Here's one of the better examples of that:
>
> https://www.scalyr.com/community/guides/how-to-monitor-nginx-the-essential-guide
>
> Perhaps I'm wrong to find this curious. It seems somewhat heavy
> and inefficient to put this functionality into log watching,
> which means another service and being sensitive to an eventual
> change in log format.
>
> Is this, indeed, the recommended solution?
>
> And, for my better understanding, can anyone explain why this
> makes more sense than native nginx support of sending UDP
> packets to a monitor collector (in our case, telegraf)?
>
> --
>
> Jeff Abrahamson
> +33 6 24 40 01 57
> +44 7920 594 255
>
> http://p27.eu/jeff/
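
For reference, the tail-and-forward approach discussed in this thread,
including the log rotation gap, might be sketched roughly as follows.
This is not Peter's Ruby script: it is a minimal Python illustration
under stated assumptions. The function names, the `max_idle_polls`
knob, and the collector host and port are all hypothetical, and the
datagram payload is simply the raw log lines; converting them to
whatever format the collector expects is left out.

```python
import os
import socket
import time

BATCH_SIZE = 10  # "ten at a time", as in the quoted description


def follow(path, poll=1.0, from_start=False, max_idle_polls=None):
    """Yield complete lines appended to `path`, reopening it after rotation.

    `max_idle_polls` (a hypothetical knob) ends the generator after that
    many consecutive empty polls; None means run until the process exits.
    """
    f = open(path, "r")
    if not from_start:
        f.seek(0, os.SEEK_END)  # like tail, but without replaying old lines
    inode = os.fstat(f.fileno()).st_ino
    idle = 0
    pending = ""  # holds a partially written line until its newline arrives
    while max_idle_polls is None or idle < max_idle_polls:
        chunk = f.readline()
        if chunk:
            idle = 0
            pending += chunk
            if pending.endswith("\n"):
                yield pending
                pending = ""
            continue
        # At EOF of the current file: does the name now point elsewhere?
        try:
            rotated = os.stat(path).st_ino != inode
        except FileNotFoundError:
            rotated = False  # mid-rotation; keep reading the old file
        if rotated:
            f.close()
            f = open(path, "r")  # read the new file from its first byte
            inode = os.fstat(f.fileno()).st_ino
            pending = ""
            continue
        idle += 1
        time.sleep(poll)
    f.close()


def batches(lines, size=BATCH_SIZE):
    """Group lines into lists of at most `size` items.

    With an endless source, a partial batch waits until it fills up.
    """
    batch = []
    for line in lines:
        batch.append(line)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def send_batches(lines, host="127.0.0.1", port=8089):
    """Send each batch as one UDP datagram (host and port are assumptions)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for batch in batches(lines):
        sock.sendto("".join(batch).encode("utf-8"), (host, port))
    sock.close()
```

It would be started with something like
`send_batches(follow("/var/log/nginx/access.log"))`, where the log
path is again an assumption. Comparing inodes narrows the rotation gap
but does not give exactly-once delivery: lines that nginx still writes
to the old file after the sketch has switched to the new one are lost,
and a restart still loses whatever arrived while no instance was
running.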