On 11.12.2016 16:17, Wietse Venema wrote:
Wagner, Patrick:
According to the log files the server wasn't in STRESS mode at this
point in time (about an hour before, it had entered and left STRESS mode
within 6 seconds), so this leaves a corrupted verify_cache.db.
That is incorrect. STRESS mode persists for at least 1000 seconds.

         if (serv->stress_param_val != 0) {
             now = event_time();
             if (serv->busy_warn_time < now - 1000) {
                 serv->busy_warn_time = now;
                 msg_warn("service \"%s\" (%s) has reached its process limit \"%d

The STRESS warning is logged at 1000-second intervals. All this is
done to avoid spamming the logfile and flapping the service as the
load fluctuates.
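The rate limiting in the excerpt above can be illustrated with a minimal, self-contained sketch. It is not the Postfix source: `DEMO_SERV`, `demo_warn_if_due`, and `WARN_INTERVAL` are hypothetical names introduced only for illustration, and the sketch covers only the "at most one warning per 1000 seconds" comparison, not how stress mode itself is entered or left.

```c
#include <stdio.h>
#include <time.h>

#define WARN_INTERVAL 1000      /* seconds, the constant used in the excerpt */

/* Hypothetical stand-in for the per-service state in the excerpt. */
typedef struct {
    const char *name;           /* service name, for the message only */
    time_t  busy_warn_time;     /* time of the last warning, 0 if none yet */
    int     warnings_logged;    /* counter kept for illustration only */
} DEMO_SERV;

/*
 * Log a warning only if the previous one is more than WARN_INTERVAL
 * seconds old. Returns 1 if a warning was logged, 0 if it was suppressed.
 */
int     demo_warn_if_due(DEMO_SERV *serv, time_t now)
{
    if (serv->busy_warn_time < now - WARN_INTERVAL) {
        serv->busy_warn_time = now;
        serv->warnings_logged++;
        fprintf(stderr, "warning: service \"%s\" has reached its process limit\n",
                serv->name);
        return 1;
    }
    return 0;
}
```

Under these assumptions, a first call at time t logs a warning, calls at t + 6 and t + 1000 are suppressed (the comparison is strict), and a call at t + 1001 logs again.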

Here I'm grepping for STRESS, as you've suggested in your first mail, in the same maillog I took my example mailflow from, and getting only two log entries 6 seconds apart:

# zgrep STRESS /var/log/maillog-20161211.gz
Dec 10 09:42:47 mx10 postfix/postscreen[1186]: entering STRESS mode with 90 connections
Dec 10 09:42:53 mx10 postfix/postscreen[1186]: leaving STRESS mode with 70 connections

Grepping for "has reached its process limit" in the same maillog returns no lines.

I've removed the database, reloaded postfix, and a test mail to this
non-existent address has correctly detected and honored the
undeliverable status and I got the NOQUEUE: reject line that I expected.
So it was a corrupted database, which means that (the information
you read) differs from (the information you wrote).
In this particular example I'd now like to think the "culprit" was optimistic caching of positive results, which I had failed to account for because I knew that this particular mail account had been disabled months ago. Unbeknownst to me, it was "resurrected" for a short time in November. That is when the "good" result correctly entered the cache, and it would presumably have stayed there until December 22/23 if I hadn't removed the entire database today.

- Patrick
