Thanks for the feedback. I agree that if we were to replace all of them, we'd might as well replace the implementations of Warning, Error, etc. with this throttled version. My initial intent was not to replace all of the messages, but to replace a handful of Errors, Warnings, etc. with the throttled versions that have been observed to be overly noisy. Once the community had some exposure to the use of these and made any desired tweaks, we could then consider rolling it out more broadly. Maybe that's being overly slow. I'm open to input.
On Fri, Oct 16, 2020 at 7:00 PM Sudheer Vinukonda <sudheervinuko...@yahoo.com.invalid> wrote: > +1 > > This sort of throttling ability is very important and useful to have. > > A minor suggestion on the implementation proposal - it feels like it’d far > simpler to localize the change to the definition of Warning, Error etc > instead of having to replace each and every call to those functions. ie > Check if the throttling is enabled and call the throttled version inside > the definition of the respective methods? > > > On Oct 16, 2020, at 2:13 PM, Brian Neradt > <brian.ner...@verizonmedia.com.invalid> wrote: > > > > dev@trafficserver.apache.org, > > > > We've had issues with excessive logging under conditions where something > > basic goes wrong. For instance the log pipe mechanism will emit a warning > > or error log for every logging event if the pipe's buffer fills up or the > > reader goes down. This can result in thousands of log messages per > second. > > > > I propose we add throttled versions of our current logging functions that > > will emit a message on a certain periodicity (currently 60 seconds as a > > default): > > > > ThrottledStatus > > ThrottledNote > > ThrottledWarning > > ThrottledError > > > > I've created a draft PR with this implemented and used in a few places: > > https://github.com/apache/trafficserver/pull/7279 > > > > I've verified in our production simulation environment that this patch > > achieves throttling as desired. I disabled our log pipe reader, ran a few > > hundred thousand transactions for a minute, and the messages look like > the > > following: > > > > [Oct 16 17:52:55.097] [LOG_FLUSH] ERROR: Failed to write log to > > /some/path.pipe: [tried 298, wrote 0, Broken pipe] > > [Oct 16 17:53:55.280] [LOG_FLUSH] ERROR: Skipped the following message > > 257999 times. > > [Oct 16 17:53:55.280] [LOG_FLUSH] ERROR: Failed to write log > > to /some/path.pipe: [tried 298, wrote 0, Broken pipe] > > > > Notice that rather than having about 260,000 log messages, the throttling > > results in three. > > > > To use these new throttling functions, we simply change the Error, > Warning, > > etc., log function call to the Throttled version: > > > > - Warning("File:%s was closed, have dropped (%d) bytes.", > > logfile->get_name(), total_bytes); > > + ThrottledWarning("File:%s was closed, have dropped (%d) bytes.", > > logfile->get_name(), total_bytes); > > > > This change should be pretty low risk since from an implementation > > standpoint only the messages we convert to these throttled versions are > > impacted. > > > > Let me know if you have any thoughts or ideas for improvement, either > with > > the design or the implementation. > > > > Thanks! > > -- > > Brian Neradt > > > > 1908 S. First Street > > > > Champaign, IL 61820 > > -- "Come to Me, all who are weary and heavy-laden, and I will give you rest. Take My yoke upon you and learn from Me, for I am gentle and humble in heart, and you will find rest for your souls. For My yoke is easy and My burden is light." ~ Matthew 11:28-30