Just to be clear, I’m not contrasting active synthetic testing with monitoring resource consumption. I think the highest-value variable to monitor is revenue ($), or whichever variables correlate most strongly with profit. The real customer experience is probably second, after sales. Monitoring things like active connections and cache hit ratios is still important for understanding “what is normal?”, because it’s easy for our mental model of how a site works to differ markedly from reality.
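As a rough illustration of the log-tailing approach described in the thread quoted below (a short-lived script that follows the nginx access log and pushes records ten at a time as UDP datagrams), here is a minimal sketch. It is in Python rather than Ruby and is not the original script. It assumes nginx's default "combined" log format; the measurement name nginx_access, the field name bytes, and the UDP listener on 127.0.0.1:8089 are hypothetical and would need to match your own InfluxDB or telegraf setup.

```python
import re
import socket
import time

# Request, status and body size as they appear in nginx's "combined" format.
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def to_line_protocol(log_line, timestamp_ns):
    """Turn one access-log line into an InfluxDB line-protocol record.

    Returns None for lines that do not look like access-log entries.
    The measurement and field names are illustrative assumptions.
    """
    m = LINE_RE.search(log_line)
    if m is None:
        return None
    size = 0 if m["bytes"] == "-" else int(m["bytes"])
    return (
        f'nginx_access,status={m["status"]},method={m["method"]} '
        f"bytes={size}i {timestamp_ns}"
    )


def batches(records, size=10):
    """Group records into lists of at most `size`, skipping None entries."""
    batch = []
    for rec in records:
        if rec is None:
            continue
        batch.append(rec)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush whatever is left when the input ends
        yield batch


def follow(path, deadline):
    """Yield lines appended to `path` until `deadline` (time.monotonic())."""
    with open(path) as f:
        f.seek(0, 2)  # start at the end of the file, like `tail -f`
        while time.monotonic() < deadline:
            line = f.readline()
            if line:
                yield line
            else:
                time.sleep(0.5)


def run(path, host="127.0.0.1", port=8089, lifetime_s=15 * 60):
    """Follow the log for `lifetime_s` seconds, then exit so cron can restart us."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    deadline = time.monotonic() + lifetime_s
    records = (to_line_protocol(l, time.time_ns()) for l in follow(path, deadline))
    for batch in batches(records):
        # One datagram per batch; line-protocol records are newline-separated.
        sock.sendto("\n".join(batch).encode(), (host, port))
```

A cron entry that calls run("/var/log/nginx/access.log") every 15 minutes would approximate the restart behaviour described in the thread. The caveats raised there still apply: this sketch does not handle log rotation, and events written between one instance exiting and the next starting are missed.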
> On Apr 11, 2018, at 2:04 AM, Jeff Abrahamson <j...@p27.eu> wrote:
>
>> On Wed, Apr 11, 2018 at 01:17:14AM -0400, Peter Booth wrote:
>> There are some very good reasons for doing things in what sounds
>> like a heavy inefficient manner.
>
> I suspected, thanks for the explanations.
>
>
>> The first point is that there are some big differences between
>> application code/business logic and monitoring code:
>>
>> [...]
>
> good summary, I agree with you.
>
>
>> tailing a log file doesn't sound sexy, but it's also pretty hard to
>> mess it up. I monitored a high traffic email site with a very short
>> Ruby script that would tail an nginx log, pushing messages ten at a
>> time as UDP datagrams to an influxdb. The script would do its thing
>> for 15 mins then die. cron ensured a new instance started every 15
>> minutes. It was more efficient than a shell script because it didn't
>> start new processes in a pipeline.
>
> It's hard to mess up as long as you're not interested in
> exactly-once. ;-)
>
> The tail solution has the particularity that (1) it could miss things
> if the short gap between process death and process start sees more
> events than tail catches at startup or if the log file rotates a few
> seconds into that 15 minute period, and (2) it could duplicate things
> in case of very few events in that period. Now, with telegraf/influx,
> duplicates aren't a concern, because influx keys on time, and our site
> is probably not getting so much traffic that a tail restart is a big
> deal, although log rotation could lead to gaps we don't like.
>
> Of course, this is why Logwatch was written...
>
>
>> I like the scalar guide but I disagree with their advice on active
>> monitoring. I think it's smarter to use real user requests to test if
>> servers are up. I have seen many high profile sites that end up
>> serving more synthetic requests than real customer initiated
>> requests.
>
> I'm not sure I understood what you mean by "active monitoring". I've
> understood "sending http queries to see if they are handled properly".
>
> In that context: I think both submitting queries (from outside one's
> own network) and passively watching stats on the service itself are
> essential. Passively watching stats gives me information on internal
> state, useful in itself but also when debugging problems. Active
> monitoring from a different network can alert me to problems that may
> not be specific to any one service, maybe even are at the network
> level.
>
> Of course, yes, active monitoring shouldn't be trying to DoS my
> service. ;-)
>
> Jeff Abrahamson
> https://www.p27.eu/jeff/
>
>
>> On 11 Apr 2018, at 12:19 AM, Jeff Abrahamson <j...@p27.eu> wrote:
>>
>> I want to monitor nginx better: http returns (e.g., how many
>> 500's, how many 404's, how many 200's, etc.), as well as request
>> rates, response times, etc. All the solutions I've found start
>> with "set up something to watch and parse your logs, then ..."
>>
>> Here's one of the better examples of that:
>>
>>
>> https://www.scalyr.com/community/guides/how-to-monitor-nginx-the-essential-guide
>>
>> Perhaps I'm wrong to find this curious. It seems somewhat heavy
>> and inefficient to put this functionality into log watching,
>> which means another service and being sensitive to an eventual
>> change in log format.
>>
>> Is this, indeed, the recommended solution?
>>
>> And, for my better understanding, can anyone explain why this
>> makes more sense than native nginx support of sending UDP
>> packets to a monitor collector (in our case, telegraf)?
>>
>> --
>>
>> Jeff Abrahamson
>> +33 6 24 40 01 57
>> +44 7920 594 255
>>
>> http://p27.eu/jeff/
> _______________________________________________
> nginx mailing list
> nginx@nginx.org
> http://mailman.nginx.org/mailman/listinfo/nginx