Re: Design Review Request: Log Throttling

Leif Hedstrom Sat, 17 Oct 2020 20:39:51 -0700

Huh I guess I should read the patch, it already does this ... then I agree with 
Sudheer and we should replace all of this with the new one, perhaps with 
configuration for each log level on frequency.


I don’t like the idea of having two Warning flavors, when would I use one over 
the other ? I like consistency more than most other things.

Cheers,

— Leif 

> On Oct 17, 2020, at 21:35, Leif Hedstrom <zw...@apache.org> wrote:
> 
> 
> 
>> On Oct 16, 2020, at 18:00, Sudheer Vinukonda 
>> <sudheervinuko...@yahoo.com.invalid> wrote:
>> 
>> +1 
>> 
>> This sort of throttling ability is very important and useful to have. 
>> 
>> A minor suggestion on the implementation proposal - it feels like it’d far 
>> simpler to localize the change to the definition of Warning, Error etc 
>> instead of having to replace each and every call to those functions. ie 
>> Check if the throttling is enabled and call the throttled version inside the 
>> definition of the respective methods?
> 
> I think a complete modernization is in order as well. Also, I think the 
> proposal here actually loses vital information:
> 
> Rather than just not throttling (dropping) the messages, we really need to 
> aggregate them. This is pretty common in system level logging, such as “error 
> repeated 3678 times”. Without it you can’t tell if something is happening 
> millions of times, or once a minute, and this can be the difference of 
> detecting a P0 or not.
> 
> I wonder if we should instead replace all this error logging with modern 
> logging which already has such aggregation in place?
> 
> Cheers,
> 
> — Leif 
>> 
>>>> On Oct 16, 2020, at 2:13 PM, Brian Neradt 
>>>> <brian.ner...@verizonmedia.com.invalid> wrote:
>>> 
>>> dev@trafficserver.apache.org,
>>> 
>>> We've had issues with excessive logging under conditions where something
>>> basic goes wrong. For instance the log pipe mechanism will emit a warning
>>> or error log for every logging event if the pipe's buffer fills up or the
>>> reader goes down. This can result in thousands of log messages per second.
>>> 
>>> I propose we add throttled versions of our current logging functions that
>>> will emit a message on a certain periodicity (currently 60 seconds as a
>>> default):
>>> 
>>> ThrottledStatus
>>> ThrottledNote
>>> ThrottledWarning
>>> ThrottledError
>>> 
>>> I've created a draft PR with this implemented and used in a few places:
>>> https://github.com/apache/trafficserver/pull/7279
>>> 
>>> I've verified in our production simulation environment that this patch
>>> achieves throttling as desired. I disabled our log pipe reader, ran a few
>>> hundred thousand transactions for a minute, and the messages look like the
>>> following:
>>> 
>>> [Oct 16 17:52:55.097] [LOG_FLUSH] ERROR: Failed to write log to
>>> /some/path.pipe: [tried 298, wrote 0, Broken pipe]
>>> [Oct 16 17:53:55.280] [LOG_FLUSH] ERROR: Skipped the following message
>>> 257999 times.
>>> [Oct 16 17:53:55.280] [LOG_FLUSH] ERROR: Failed to write log
>>> to /some/path.pipe: [tried 298, wrote 0, Broken pipe]
>>> 
>>> Notice that rather than having about 260,000 log messages, the throttling
>>> results in three.
>>> 
>>> To use these new throttling functions, we simply change the Error, Warning,
>>> etc., log function call to the Throttled version:
>>> 
>>> -        Warning("File:%s was closed, have dropped (%d) bytes.",
>>> logfile->get_name(), total_bytes);
>>> +        ThrottledWarning("File:%s was closed, have dropped (%d) bytes.",
>>> logfile->get_name(), total_bytes);
>>> 
>>> This change should be pretty low risk since from an implementation
>>> standpoint only the messages we convert to these throttled versions are
>>> impacted.
>>> 
>>> Let me know if you have any thoughts or ideas for improvement, either with
>>> the design or the implementation.
>>> 
>>> Thanks!
>>> -- 
>>> Brian Neradt
>>> 
>>> 1908 S. First Street
>>> 
>>> Champaign, IL 61820
>>

Re: Design Review Request: Log Throttling

Reply via email to