dev@trafficserver.apache.org,

We've had issues with excessive logging under conditions where something
basic goes wrong. For instance the log pipe mechanism will emit a warning
or error log for every logging event if the pipe's buffer fills up or the
reader goes down. This can result in thousands of log messages per second.

I propose we add throttled versions of our current logging functions that
will emit a message on a certain periodicity (currently 60 seconds as a
default):

ThrottledStatus
ThrottledNote
ThrottledWarning
ThrottledError

I've created a draft PR with this implemented and used in a few places:
https://github.com/apache/trafficserver/pull/7279

I've verified in our production simulation environment that this patch
achieves throttling as desired. I disabled our log pipe reader, ran a few
hundred thousand transactions for a minute, and the messages look like the
following:

[Oct 16 17:52:55.097] [LOG_FLUSH] ERROR: Failed to write log to
/some/path.pipe: [tried 298, wrote 0, Broken pipe]
[Oct 16 17:53:55.280] [LOG_FLUSH] ERROR: Skipped the following message
257999 times.
[Oct 16 17:53:55.280] [LOG_FLUSH] ERROR: Failed to write log
to /some/path.pipe: [tried 298, wrote 0, Broken pipe]

Notice that rather than having about 260,000 log messages, the throttling
results in three.

To use these new throttling functions, we simply change the Error, Warning,
etc., log function call to the Throttled version:

-        Warning("File:%s was closed, have dropped (%d) bytes.",
logfile->get_name(), total_bytes);
+        ThrottledWarning("File:%s was closed, have dropped (%d) bytes.",
logfile->get_name(), total_bytes);

This change should be pretty low risk since from an implementation
standpoint only the messages we convert to these throttled versions are
impacted.

Let me know if you have any thoughts or ideas for improvement, either with
the design or the implementation.

Thanks!
-- 
Brian Neradt

1908 S. First Street

Champaign, IL 61820

Reply via email to