Occasional transient "Insufficient system storage" errors

Dara Poon Thu, 15 Oct 2020 00:46:29 -0700

(Well, that was embarrassing!  I had a Spamassassin milter on outbound mail 
that tagged my own message as a false positive.  Sending it again for 
readability.  Sorry!)




I'm seeing very occasional "Insufficient system storage" errors in my 
/var/log/mail.log, but it's not the usual lack of disk space.  Here is an 
excerpt from /var/log/mail.log:

Oct 14 20:07:10 smtp2 postfix/smtpd[473668]: connect from 
m144-mta1-lv.wordfly.com[52.124.61.163]
Oct 14 20:07:13 smtp2 postfix/smtpd[473668]: 01F6F1FE18: 
client=m144-mta1-lv.wordfly.com[52.124.61.163]
Oct 14 20:07:13 smtp2 opendkim[28076]: 01F6F1FE18: s=wordfly02 d=wordfly.com 
a=rsa-sha256 SSL
Oct 14 20:07:16 smtp2 postfix/cleanup[473260]: 45CF420090: 
message-id=<80fc24fc-0e93-11eb-bca1-redac...@facebookmail.com>
Oct 14 20:07:16 smtp2 opendkim[28076]: 45CF420090: s=s1024-2013-q3 
d=facebookmail.com a=rsa-sha256 SSL
Oct 14 20:07:25 smtp2 postfix/smtpd[473668]: disconnect from 
m144-mta1-lv.wordfly.com[52.124.61.163] ehlo=2 starttls=1 mail=1 rcpt=1 data=1 
quit=1 commands=7
Oct 14 20:07:26 smtp2 postfix/local[474024]: 45CF420090: 
to=<moredac...@ece.ubc.ca>, relay=local, delay=10, delays=10/0/0/0.01, 
dsn=2.0.0, status=sent (forwarded as 541A820093)
Oct 14 20:07:30 smtp2 postfix/smtpd[473637]: connect from 
pmta205.journal-oncology.info[191.101.211.207]
Oct 14 20:07:30 smtp2 postfix/smtpd[473668]: connect from 
m144-mta1-lv.wordfly.com[52.124.61.163]
Oct 14 20:07:30 smtp2 postfix/smtpd[473668]: NOQUEUE: reject: MAIL from 
m144-mta1-lv.wordfly.com[52.124.61.163]: 452 4.3.1 Insufficient system storage; 
proto=ESMTP helo=<m144-mta1-lv.wordfly.com>
Oct 14 20:07:30 smtp2 postfix/smtpd[473668]: warning: not enough free space in 
mail queue: 0 bytes < 1.5*message size limit 
Oct 14 20:07:30 smtp2 postfix/smtpd[473637]: NOQUEUE: reject: MAIL from 
pmta205.journal-oncology.info[191.101.211.207]: 452 4.3.1 Insufficient system 
storage; proto=ESMTP helo=<pmta205.journal-oncology.info>
Oct 14 20:07:30 smtp2 postfix/smtpd[473637]: warning: not enough free space in 
mail queue: 0 bytes < 1.5*message size limit
Oct 14 20:07:30 smtp2 postfix/cleanup[473260]: 9F3F120331: 
message-id=<20201015030730.9f3f120...@smtp2.ece.ubc.ca>
Oct 14 20:07:30 smtp2 postfix/cleanup[473260]: warning: 9F3F120331: write queue 
file: No space left on device
Oct 14 20:07:30 smtp2 postfix/smtpd[473637]: disconnect from 
pmta205.journal-oncology.info[191.101.211.207] ehlo=2 starttls=1 mail=0/1 
quit=1 commands=4/5
Oct 14 20:07:35 smtp2 postfix/qmgr[407832]: 501A91FB53: 
from=<double-bou...@ece.ubc.ca>, size=1302, nrcpt=1 (queue active)
Oct 14 20:07:35 smtp2 postfix/smtpd[473668]: disconnect from 
m144-mta1-lv.wordfly.com[52.124.61.163] ehlo=2 starttls=1 mail=0/1 quit=1 
commands=4/5
Oct 14 20:07:36 smtp2 postfix/smtp[474095]: 541A820093: 
to=<mnredac...@gmail.com>, orig_to=<moredac...@ece.ubc.ca>, relay=none, 
delay=10, delays=0/0.02/10/0, dsn=4.4.3, status=deferred (Host or domain name 
not found. Name service error for name=gmail.com type=MX: Host not found, try 
again)
Oct 14 20:07:45 smtp2 postfix/local[473993]: 01F6F1FE18: 
to=<heredac...@ece.ubc.ca>, relay=local, delay=36, delays=9.8/0/0/26, 
dsn=2.0.0, status=sent (delivered to command: /usr/bin/procmail -p)
Oct 14 20:07:45 smtp2 postfix/smtpd[473637]: connect from 
unknown[45.193.210.121]
Oct 14 20:07:46 smtp2 postfix/smtpd[473637]: NOQUEUE: reject: RCPT from 
unknown[45.193.210.121]: 450 4.7.25 Client host rejected: cannot find your 
hostname, [45.193.210.121]; 
from=<bounce-108980-1597160-105716-redac...@galasone.com> 
to=<cgredac...@ece.ubc.ca> proto=ESMTP helo=<galasone.top>
Oct 14 20:07:46 smtp2 postfix/smtpd[473637]: disconnect from 
unknown[45.193.210.121] ehlo=1 mail=1 rcpt=0/1 data=0/1 quit=1 commands=3/5


Observations
============

0. Postfix 3.4.13 on Ubuntu 20.04 (Linux kernel 5.4.0-48-generic) in a vSphere 
virtual machine.  Also previously observed on Ubuntu 18.04, but not 16.04.

1. This happens very infrequently, but seemingly randomly.  For a rough sense 
of scale, I count 17500 instances of "delivered" in mail.log, but only 16 
instances of "Insufficient system storage" in the same mail.log.

2. When it happens, it may happen to more than one process at around the same 
time.  The excerpt above shows two such complaints from two different smtpd 
processes, which is too tight of a cluster to be a coincidence, given the 
rarity.  The instance before that was at Oct 14 14:50:30.  And it has been fine 
for at least two hours since then.

3. Sometimes, these errors result in a double-bounce notification to 
postmaster.  Here is what the double-bounce notification generated at 20:07:35 
says:

===== BEGIN double-bounce BODY 1 =====
Transcript of session follows.

Out: 220 mx.ece.ubc.ca ESMTP Postfix
In:  EHLO m144-mta1-lv.wordfly.com
Out: 250-smtp2.ece.ubc.ca
Out: 250-PIPELINING
Out: 250-SIZE 52428800
Out: 250-ETRN
Out: 250-STARTTLS
Out: 250-ENHANCEDSTATUSCODES
Out: 250-8BITMIME
Out: 250-DSN
Out: 250-SMTPUTF8
Out: 250 CHUNKING
In:  STARTTLS
Out: 220 2.0.0 Ready to start TLS
In:  EHLO m144-mta1-lv.wordfly.com
Out: 250-smtp2.ece.ubc.ca
Out: 250-PIPELINING
Out: 250-SIZE 52428800
Out: 250-ETRN
Out: 250-AUTH PLAIN LOGIN
Out: 250-AUTH=PLAIN LOGIN
Out: 250-ENHANCEDSTATUSCODES
Out: 250-8BITMIME
Out: 250-DSN
Out: 250-SMTPUTF8
Out: 250 CHUNKING
In:  MAIL FROM:<redac...@return.wordfly.com>
Out: 452 4.3.1 Insufficient system storage
In:  QUIT
Out: 221 2.0.0 Bye


For other details, see the local mail logfile
===== END double-bounce BODY 1 =====

But it can also happen at other phases of the SMTP transaction:

===== BEGIN double-bounce BODY 2 =====
Transcript of session follows.

Out: 220 mx.ece.ubc.ca ESMTP Postfix
In:  HELO lampe16.ece.ubc.ca
Out: 250 smtp2.ece.ubc.ca
In:  MAIL FROM: <cfeng...@lampe16.ece.ubc.ca>
Out: 452 4.3.1 Insufficient system storage

Session aborted, reason: lost connection

For other details, see the local mail logfile
===== BEGIN double-bounce BODY 2 =====


What I have tried
=================

root@smtp2:~# df -h /var
Filesystem          Size  Used Avail Use% Mounted on
/dev/mapper/vg-var  7.8G  4.7G  2.7G  64% /var
root@smtp2:~# df -i /var
Filesystem         Inodes IUsed  IFree IUse% Mounted on
/dev/mapper/vg-var 518144  7638 510506    2% /var
root@smtp2:~# postconf message_size_limit queue_minfree
message_size_limit = 52428800
queue_minfree = 0

I've turned on debug_peer_list, and managed to get some verbose logging for the 
lampe16.ece.ubc.ca example above:

Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: ...[SNIP]...
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: milter8_mail_event: milter 
unix:/clamav/clamav-milter.ctl: mail <cfeng...@lampe16.ece.ubc.ca>
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: event: SMFIC_MAIL; macros: 
{mail_addr}=cfeng...@lampe16.ece.ubc.ca 
{mail_host}=smtp2.ece.ubc.ca{mail_mailer}=local
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: reply: SMFIR_CONTINUE data 0 bytes
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: smtpd_check_rewrite: trying: 
permit_inet_interfaces
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: permit_inet_interfaces: 
lampe16.ece.ubc.ca 137.82.57.92
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: fsspace: .: block size 4096, 
blocks free 0
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: smtpd_check_queue: blocks 4096 
avail 0 min_free 0 msg_size_limit 52428800
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: NOQUEUE: reject: MAIL from 
lampe16.ece.ubc.ca[137.82.57.92]: 452 4.3.1 Insufficient system storage; 
proto=SMTP helo=<lampe16.ece.ubc.ca>
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: warning: not enough free space in 
mail queue: 0 bytes < 1.5*message size limit
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: abort all milters
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: milter8_abort: abort milter 
inet:localhost:8892
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: milter8_abort: abort milter 
unix:/spamass/spamass.sock
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: milter8_abort: abort milter 
unix:/clamav/clamav-milter.ctl
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: > 
lampe16.ece.ubc.ca[137.82.57.92]: 452 4.3.1 Insufficient system storage
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: watchdog_pat: 0x56190ce2a700
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: smtp_get: EOF
Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: ...[SNIP]...


The error message is coming from smtpd_check_queue() in 
src/smtpd/smtpd_check.c.  The disk space check uses fsspace() in 
src/util/fsspace.c, which on Linux calls statfs(2).  The code confirms that 
when it complains of insufficient system storage, it means disk blocks, not 
inodes, memory, shared memory, or anything else.


Which filesystem is it testing when it does fsspace(".", &fsbuf)?  If I look at 
a typical smtpd process, /proc/$pid/cwd is a symlink to /var/spool/postfix, as 
I would expect, so it should be testing my /var filesystem.  (I don't have 
separate filesystems under /var such as /var/log, /var/spool, or anything.)

Perhaps smtpd is doing a sneaky momentary chdir(2) to the wrong place before 
calling fsspace()?  I grepped the entire Postfix source code, and it's always 
either chdir(var_queue_dir), or chdir(var_daemon_dir), or 
chdir(var_command_dir), or chdir("/") in a chrooted context — all of which 
should be in the same /var filesystem.  The exception is chdir(args.cwd) in 
src/global/pipe_command.c, but surely that's irrelevant? Anyway, as I noted, it 
affects multiple processes simultaneously.


Perhaps there is really something that is momentarily putting huge files in my 
/var and immediately deleting them?  I monitored that too:

root@smtp2:~# tail -f /var/log/mail.log | while read -r line ; do case "$line" 
in *Insufficient*) date ; df --block-size=4K /var ; esac ; done 
Wed 14 Oct 2020 08:07:30 PM PDT
Filesystem         4K-blocks    Used Available Use% Mounted on
/dev/mapper/vg-var   2020590 1191809    721210  63% /var
Wed 14 Oct 2020 08:07:30 PM PDT
Filesystem         4K-blocks    Used Available Use% Mounted on
/dev/mapper/vg-var   2020590 1191809    721210  63% /var


There's nothing interesting in /var/log/messages during the times that this 
happens.

Whenever it happens, it's always complaining that there are ZERO bytes free, 
not some small number of bytes free.  Besides, my /var/log/mail.log is on the 
same filesystem, and as far as I can tell, my syslog is working fine.

So, at this point, I'm stumped.  Any ideas?

Dara Poon
IT Services, Department of Electrical and Computer Engineering, University of 
British Columbia

Occasional transient "Insufficient system storage" errors

Reply via email to