> On Oct 16, 2020, at 18:00, Sudheer Vinukonda
> <sudheervinuko...@yahoo.com.invalid> wrote:
>
> +1
>
> This sort of throttling ability is very important and useful to have.
>
> A minor suggestion on the implementation proposal - it feels like it’d far
> simpler to localize the change to the definition of Warning, Error etc
> instead of having to replace each and every call to those functions. ie Check
> if the throttling is enabled and call the throttled version inside the
> definition of the respective methods?
I think a complete modernization is in order as well. Also, I think the
proposal here actually loses vital information:
Rather than just not throttling (dropping) the messages, we really need to
aggregate them. This is pretty common in system level logging, such as “error
repeated 3678 times”. Without it you can’t tell if something is happening
millions of times, or once a minute, and this can be the difference of
detecting a P0 or not.
I wonder if we should instead replace all this error logging with modern
logging which already has such aggregation in place?
Cheers,
— Leif
>
>> On Oct 16, 2020, at 2:13 PM, Brian Neradt
>> <brian.ner...@verizonmedia.com.invalid> wrote:
>>
>> dev@trafficserver.apache.org,
>>
>> We've had issues with excessive logging under conditions where something
>> basic goes wrong. For instance the log pipe mechanism will emit a warning
>> or error log for every logging event if the pipe's buffer fills up or the
>> reader goes down. This can result in thousands of log messages per second.
>>
>> I propose we add throttled versions of our current logging functions that
>> will emit a message on a certain periodicity (currently 60 seconds as a
>> default):
>>
>> ThrottledStatus
>> ThrottledNote
>> ThrottledWarning
>> ThrottledError
>>
>> I've created a draft PR with this implemented and used in a few places:
>> https://github.com/apache/trafficserver/pull/7279
>>
>> I've verified in our production simulation environment that this patch
>> achieves throttling as desired. I disabled our log pipe reader, ran a few
>> hundred thousand transactions for a minute, and the messages look like the
>> following:
>>
>> [Oct 16 17:52:55.097] [LOG_FLUSH] ERROR: Failed to write log to
>> /some/path.pipe: [tried 298, wrote 0, Broken pipe]
>> [Oct 16 17:53:55.280] [LOG_FLUSH] ERROR: Skipped the following message
>> 257999 times.
>> [Oct 16 17:53:55.280] [LOG_FLUSH] ERROR: Failed to write log
>> to /some/path.pipe: [tried 298, wrote 0, Broken pipe]
>>
>> Notice that rather than having about 260,000 log messages, the throttling
>> results in three.
>>
>> To use these new throttling functions, we simply change the Error, Warning,
>> etc., log function call to the Throttled version:
>>
>> - Warning("File:%s was closed, have dropped (%d) bytes.",
>> logfile->get_name(), total_bytes);
>> + ThrottledWarning("File:%s was closed, have dropped (%d) bytes.",
>> logfile->get_name(), total_bytes);
>>
>> This change should be pretty low risk since from an implementation
>> standpoint only the messages we convert to these throttled versions are
>> impacted.
>>
>> Let me know if you have any thoughts or ideas for improvement, either with
>> the design or the implementation.
>>
>> Thanks!
>> --
>> Brian Neradt
>>
>> 1908 S. First Street
>>
>> Champaign, IL 61820
>