On 11.12.2016 16:17, Wietse Venema wrote:
Wagner, Patrick:
According to the log files the server wasn't in STRESS mode at this
point in time (about an hour before, it had entered and left STRESS mode
within 6 seconds), so this leaves a corrupted verify_cache.db.
That is incorrect. STRESS mode persists for at least 1000 seconds.
    if (serv->stress_param_val != 0) {
        now = event_time();
        if (serv->busy_warn_time < now - 1000) {
            serv->busy_warn_time = now;
            msg_warn("service \"%s\" (%s) has reached its process limit \"%d
The STRESS warning is logged at 1000-second intervals. All this is
done to avoid spamming the logfile and flapping the service as the
load fluctuates.
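For readers who want to try the throttling pattern in the snippet above
in isolation, here is a minimal standalone sketch in C. It is not Postfix
code: struct service, warn_if_due and WARN_INTERVAL are hypothetical
names, and the caller passes in the clock value instead of calling
event_time(). Only the comparison against "now - 1000" is taken from the
quoted snippet.

```c
#include <stdio.h>
#include <time.h>

#define WARN_INTERVAL 1000      /* seconds, same constant as in the quoted snippet */

/* Hypothetical stand-in for the per-service state used in the snippet. */
struct service {
    const char *name;
    time_t  busy_warn_time;     /* time of the last warning, 0 = never */
};

/*
 * Log a warning only if the previous warning is more than WARN_INTERVAL
 * seconds old. Returns 1 if a warning was logged, 0 if it was suppressed.
 */
static int warn_if_due(struct service *serv, time_t now)
{
    if (serv->busy_warn_time < now - WARN_INTERVAL) {
        serv->busy_warn_time = now;
        fprintf(stderr, "warning: service \"%s\" has reached its process limit\n",
                serv->name);
        return 1;
    }
    return 0;
}
```

Calling warn_if_due() at t, t+6, t+1000 and t+1001 for the same service
logs at t and at t+1001 only; the two calls in between are suppressed
because the comparison is strict. The sketch models only the throttle
visible in the quoted snippet.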
Here I'm grepping for STRESS, as you've suggested in your first mail, in
the same maillog I took my example mailflow from, and getting only two
log entries 6 seconds apart:
# zgrep STRESS /var/log/maillog-20161211.gz
Dec 10 09:42:47 mx10 postfix/postscreen[1186]: entering STRESS mode with 90 connections
Dec 10 09:42:53 mx10 postfix/postscreen[1186]: leaving STRESS mode with 70 connections
Grepping for "has reached its process limit" in the same maillog returns
no lines.
I've removed the database and reloaded postfix. A test mail to this
non-existent address was then correctly rejected as undeliverable, and
I got the NOQUEUE: reject line that I expected.
So it was a corrupted database, which means that (the information
you read) differs from (the information you wrote).
In this particular case I now think the "culprit" was optimistic caching
of positive results, which I had failed to account for because I knew
that this particular mail account had been disabled months ago.
Unbeknownst to me, it was "resurrected" in November for a short time.
That is when the "good" result correctly entered the cache, and it would
presumably have stayed there until December 22/23 if I hadn't removed
the entire database today.
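The December 22/23 estimate is consistent with Postfix's default
address_verify_positive_expire_time of 31d. A rough sketch of that
arithmetic in C follows. It assumes the default is in effect and that the
positive result was last stored around November 21/22; that date is
inferred from the estimate, not taken from the logs. positive_expiry and
POSITIVE_EXPIRE_DAYS are hypothetical names.

```c
#include <time.h>

/* Assumption: address_verify_positive_expire_time is at its default, 31d. */
#define POSITIVE_EXPIRE_DAYS 31

/*
 * Return the calendar date on which a positive entry stored on the given
 * date would expire. mktime() normalizes the out-of-range day of month
 * into the following month.
 */
static struct tm positive_expiry(int year, int month, int day)
{
    struct tm tm = {0};

    tm.tm_year = year - 1900;
    tm.tm_mon = month - 1;
    tm.tm_mday = day + POSITIVE_EXPIRE_DAYS;
    tm.tm_hour = 12;            /* noon, to stay clear of DST transitions */
    tm.tm_isdst = -1;           /* let mktime() decide about DST */
    mktime(&tm);
    return tm;
}
```

With these assumptions, positive_expiry(2016, 11, 21) yields December 22
and positive_expiry(2016, 11, 22) yields December 23.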
- Patrick