Huh I guess I should read the patch, it already does this ... then I agree with Sudheer and we should replace all of this with the new one, perhaps with configuration for each log level on frequency.
I don’t like the idea of having two Warning flavors, when would I use one over the other ? I like consistency more than most other things. Cheers, — Leif > On Oct 17, 2020, at 21:35, Leif Hedstrom <zw...@apache.org> wrote: > > > >> On Oct 16, 2020, at 18:00, Sudheer Vinukonda >> <sudheervinuko...@yahoo.com.invalid> wrote: >> >> +1 >> >> This sort of throttling ability is very important and useful to have. >> >> A minor suggestion on the implementation proposal - it feels like it’d far >> simpler to localize the change to the definition of Warning, Error etc >> instead of having to replace each and every call to those functions. ie >> Check if the throttling is enabled and call the throttled version inside the >> definition of the respective methods? > > I think a complete modernization is in order as well. Also, I think the > proposal here actually loses vital information: > > Rather than just not throttling (dropping) the messages, we really need to > aggregate them. This is pretty common in system level logging, such as “error > repeated 3678 times”. Without it you can’t tell if something is happening > millions of times, or once a minute, and this can be the difference of > detecting a P0 or not. > > I wonder if we should instead replace all this error logging with modern > logging which already has such aggregation in place? > > Cheers, > > — Leif >> >>>> On Oct 16, 2020, at 2:13 PM, Brian Neradt >>>> <brian.ner...@verizonmedia.com.invalid> wrote: >>> >>> dev@trafficserver.apache.org, >>> >>> We've had issues with excessive logging under conditions where something >>> basic goes wrong. For instance the log pipe mechanism will emit a warning >>> or error log for every logging event if the pipe's buffer fills up or the >>> reader goes down. This can result in thousands of log messages per second. >>> >>> I propose we add throttled versions of our current logging functions that >>> will emit a message on a certain periodicity (currently 60 seconds as a >>> default): >>> >>> ThrottledStatus >>> ThrottledNote >>> ThrottledWarning >>> ThrottledError >>> >>> I've created a draft PR with this implemented and used in a few places: >>> https://github.com/apache/trafficserver/pull/7279 >>> >>> I've verified in our production simulation environment that this patch >>> achieves throttling as desired. I disabled our log pipe reader, ran a few >>> hundred thousand transactions for a minute, and the messages look like the >>> following: >>> >>> [Oct 16 17:52:55.097] [LOG_FLUSH] ERROR: Failed to write log to >>> /some/path.pipe: [tried 298, wrote 0, Broken pipe] >>> [Oct 16 17:53:55.280] [LOG_FLUSH] ERROR: Skipped the following message >>> 257999 times. >>> [Oct 16 17:53:55.280] [LOG_FLUSH] ERROR: Failed to write log >>> to /some/path.pipe: [tried 298, wrote 0, Broken pipe] >>> >>> Notice that rather than having about 260,000 log messages, the throttling >>> results in three. >>> >>> To use these new throttling functions, we simply change the Error, Warning, >>> etc., log function call to the Throttled version: >>> >>> - Warning("File:%s was closed, have dropped (%d) bytes.", >>> logfile->get_name(), total_bytes); >>> + ThrottledWarning("File:%s was closed, have dropped (%d) bytes.", >>> logfile->get_name(), total_bytes); >>> >>> This change should be pretty low risk since from an implementation >>> standpoint only the messages we convert to these throttled versions are >>> impacted. >>> >>> Let me know if you have any thoughts or ideas for improvement, either with >>> the design or the implementation. >>> >>> Thanks! >>> -- >>> Brian Neradt >>> >>> 1908 S. First Street >>> >>> Champaign, IL 61820 >>