(Well, that was embarrassing! I had a Spamassassin milter on outbound mail that tagged my own message as a false positive. Sending it again for readability. Sorry!)
I'm seeing very occasional "Insufficient system storage" errors in my /var/log/mail.log, but it's not the usual lack of disk space. Here is an excerpt from /var/log/mail.log: Oct 14 20:07:10 smtp2 postfix/smtpd[473668]: connect from m144-mta1-lv.wordfly.com[52.124.61.163] Oct 14 20:07:13 smtp2 postfix/smtpd[473668]: 01F6F1FE18: client=m144-mta1-lv.wordfly.com[52.124.61.163] Oct 14 20:07:13 smtp2 opendkim[28076]: 01F6F1FE18: s=wordfly02 d=wordfly.com a=rsa-sha256 SSL Oct 14 20:07:16 smtp2 postfix/cleanup[473260]: 45CF420090: message-id=<80fc24fc-0e93-11eb-bca1-redac...@facebookmail.com> Oct 14 20:07:16 smtp2 opendkim[28076]: 45CF420090: s=s1024-2013-q3 d=facebookmail.com a=rsa-sha256 SSL Oct 14 20:07:25 smtp2 postfix/smtpd[473668]: disconnect from m144-mta1-lv.wordfly.com[52.124.61.163] ehlo=2 starttls=1 mail=1 rcpt=1 data=1 quit=1 commands=7 Oct 14 20:07:26 smtp2 postfix/local[474024]: 45CF420090: to=<moredac...@ece.ubc.ca>, relay=local, delay=10, delays=10/0/0/0.01, dsn=2.0.0, status=sent (forwarded as 541A820093) Oct 14 20:07:30 smtp2 postfix/smtpd[473637]: connect from pmta205.journal-oncology.info[191.101.211.207] Oct 14 20:07:30 smtp2 postfix/smtpd[473668]: connect from m144-mta1-lv.wordfly.com[52.124.61.163] Oct 14 20:07:30 smtp2 postfix/smtpd[473668]: NOQUEUE: reject: MAIL from m144-mta1-lv.wordfly.com[52.124.61.163]: 452 4.3.1 Insufficient system storage; proto=ESMTP helo=<m144-mta1-lv.wordfly.com> Oct 14 20:07:30 smtp2 postfix/smtpd[473668]: warning: not enough free space in mail queue: 0 bytes < 1.5*message size limit Oct 14 20:07:30 smtp2 postfix/smtpd[473637]: NOQUEUE: reject: MAIL from pmta205.journal-oncology.info[191.101.211.207]: 452 4.3.1 Insufficient system storage; proto=ESMTP helo=<pmta205.journal-oncology.info> Oct 14 20:07:30 smtp2 postfix/smtpd[473637]: warning: not enough free space in mail queue: 0 bytes < 1.5*message size limit Oct 14 20:07:30 smtp2 postfix/cleanup[473260]: 9F3F120331: message-id=<20201015030730.9f3f120...@smtp2.ece.ubc.ca> Oct 14 20:07:30 smtp2 postfix/cleanup[473260]: warning: 9F3F120331: write queue file: No space left on device Oct 14 20:07:30 smtp2 postfix/smtpd[473637]: disconnect from pmta205.journal-oncology.info[191.101.211.207] ehlo=2 starttls=1 mail=0/1 quit=1 commands=4/5 Oct 14 20:07:35 smtp2 postfix/qmgr[407832]: 501A91FB53: from=<double-bou...@ece.ubc.ca>, size=1302, nrcpt=1 (queue active) Oct 14 20:07:35 smtp2 postfix/smtpd[473668]: disconnect from m144-mta1-lv.wordfly.com[52.124.61.163] ehlo=2 starttls=1 mail=0/1 quit=1 commands=4/5 Oct 14 20:07:36 smtp2 postfix/smtp[474095]: 541A820093: to=<mnredac...@gmail.com>, orig_to=<moredac...@ece.ubc.ca>, relay=none, delay=10, delays=0/0.02/10/0, dsn=4.4.3, status=deferred (Host or domain name not found. Name service error for name=gmail.com type=MX: Host not found, try again) Oct 14 20:07:45 smtp2 postfix/local[473993]: 01F6F1FE18: to=<heredac...@ece.ubc.ca>, relay=local, delay=36, delays=9.8/0/0/26, dsn=2.0.0, status=sent (delivered to command: /usr/bin/procmail -p) Oct 14 20:07:45 smtp2 postfix/smtpd[473637]: connect from unknown[45.193.210.121] Oct 14 20:07:46 smtp2 postfix/smtpd[473637]: NOQUEUE: reject: RCPT from unknown[45.193.210.121]: 450 4.7.25 Client host rejected: cannot find your hostname, [45.193.210.121]; from=<bounce-108980-1597160-105716-redac...@galasone.com> to=<cgredac...@ece.ubc.ca> proto=ESMTP helo=<galasone.top> Oct 14 20:07:46 smtp2 postfix/smtpd[473637]: disconnect from unknown[45.193.210.121] ehlo=1 mail=1 rcpt=0/1 data=0/1 quit=1 commands=3/5 Observations ============ 0. Postfix 3.4.13 on Ubuntu 20.04 (Linux kernel 5.4.0-48-generic) in a vSphere virtual machine. Also previously observed on Ubuntu 18.04, but not 16.04. 1. This happens very infrequently, but seemingly randomly. For a rough sense of scale, I count 17500 instances of "delivered" in mail.log, but only 16 instances of "Insufficient system storage" in the same mail.log. 2. When it happens, it may happen to more than one process at around the same time. The excerpt above shows two such complaints from two different smtpd processes, which is too tight of a cluster to be a coincidence, given the rarity. The instance before that was at Oct 14 14:50:30. And it has been fine for at least two hours since then. 3. Sometimes, these errors result in a double-bounce notification to postmaster. Here is what the double-bounce notification generated at 20:07:35 says: ===== BEGIN double-bounce BODY 1 ===== Transcript of session follows. Out: 220 mx.ece.ubc.ca ESMTP Postfix In: EHLO m144-mta1-lv.wordfly.com Out: 250-smtp2.ece.ubc.ca Out: 250-PIPELINING Out: 250-SIZE 52428800 Out: 250-ETRN Out: 250-STARTTLS Out: 250-ENHANCEDSTATUSCODES Out: 250-8BITMIME Out: 250-DSN Out: 250-SMTPUTF8 Out: 250 CHUNKING In: STARTTLS Out: 220 2.0.0 Ready to start TLS In: EHLO m144-mta1-lv.wordfly.com Out: 250-smtp2.ece.ubc.ca Out: 250-PIPELINING Out: 250-SIZE 52428800 Out: 250-ETRN Out: 250-AUTH PLAIN LOGIN Out: 250-AUTH=PLAIN LOGIN Out: 250-ENHANCEDSTATUSCODES Out: 250-8BITMIME Out: 250-DSN Out: 250-SMTPUTF8 Out: 250 CHUNKING In: MAIL FROM:<redac...@return.wordfly.com> Out: 452 4.3.1 Insufficient system storage In: QUIT Out: 221 2.0.0 Bye For other details, see the local mail logfile ===== END double-bounce BODY 1 ===== But it can also happen at other phases of the SMTP transaction: ===== BEGIN double-bounce BODY 2 ===== Transcript of session follows. Out: 220 mx.ece.ubc.ca ESMTP Postfix In: HELO lampe16.ece.ubc.ca Out: 250 smtp2.ece.ubc.ca In: MAIL FROM: <cfeng...@lampe16.ece.ubc.ca> Out: 452 4.3.1 Insufficient system storage Session aborted, reason: lost connection For other details, see the local mail logfile ===== BEGIN double-bounce BODY 2 ===== What I have tried ================= root@smtp2:~# df -h /var Filesystem Size Used Avail Use% Mounted on /dev/mapper/vg-var 7.8G 4.7G 2.7G 64% /var root@smtp2:~# df -i /var Filesystem Inodes IUsed IFree IUse% Mounted on /dev/mapper/vg-var 518144 7638 510506 2% /var root@smtp2:~# postconf message_size_limit queue_minfree message_size_limit = 52428800 queue_minfree = 0 I've turned on debug_peer_list, and managed to get some verbose logging for the lampe16.ece.ubc.ca example above: Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: ...[SNIP]... Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: milter8_mail_event: milter unix:/clamav/clamav-milter.ctl: mail <cfeng...@lampe16.ece.ubc.ca> Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: event: SMFIC_MAIL; macros: {mail_addr}=cfeng...@lampe16.ece.ubc.ca {mail_host}=smtp2.ece.ubc.ca{mail_mailer}=local Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: reply: SMFIR_CONTINUE data 0 bytes Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: smtpd_check_rewrite: trying: permit_inet_interfaces Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: permit_inet_interfaces: lampe16.ece.ubc.ca 137.82.57.92 Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: fsspace: .: block size 4096, blocks free 0 Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: smtpd_check_queue: blocks 4096 avail 0 min_free 0 msg_size_limit 52428800 Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: NOQUEUE: reject: MAIL from lampe16.ece.ubc.ca[137.82.57.92]: 452 4.3.1 Insufficient system storage; proto=SMTP helo=<lampe16.ece.ubc.ca> Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: warning: not enough free space in mail queue: 0 bytes < 1.5*message size limit Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: abort all milters Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: milter8_abort: abort milter inet:localhost:8892 Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: milter8_abort: abort milter unix:/spamass/spamass.sock Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: milter8_abort: abort milter unix:/clamav/clamav-milter.ctl Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: > lampe16.ece.ubc.ca[137.82.57.92]: 452 4.3.1 Insufficient system storage Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: watchdog_pat: 0x56190ce2a700 Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: smtp_get: EOF Oct 14 13:03:00 smtp2 postfix/smtpd[433514]: ...[SNIP]... The error message is coming from smtpd_check_queue() in src/smtpd/smtpd_check.c. The disk space check uses fsspace() in src/util/fsspace.c, which on Linux calls statfs(2). The code confirms that when it complains of insufficient system storage, it means disk blocks, not inodes, memory, shared memory, or anything else. Which filesystem is it testing when it does fsspace(".", &fsbuf)? If I look at a typical smtpd process, /proc/$pid/cwd is a symlink to /var/spool/postfix, as I would expect, so it should be testing my /var filesystem. (I don't have separate filesystems under /var such as /var/log, /var/spool, or anything.) Perhaps smtpd is doing a sneaky momentary chdir(2) to the wrong place before calling fsspace()? I grepped the entire Postfix source code, and it's always either chdir(var_queue_dir), or chdir(var_daemon_dir), or chdir(var_command_dir), or chdir("/") in a chrooted context — all of which should be in the same /var filesystem. The exception is chdir(args.cwd) in src/global/pipe_command.c, but surely that's irrelevant? Anyway, as I noted, it affects multiple processes simultaneously. Perhaps there is really something that is momentarily putting huge files in my /var and immediately deleting them? I monitored that too: root@smtp2:~# tail -f /var/log/mail.log | while read -r line ; do case "$line" in *Insufficient*) date ; df --block-size=4K /var ; esac ; done Wed 14 Oct 2020 08:07:30 PM PDT Filesystem 4K-blocks Used Available Use% Mounted on /dev/mapper/vg-var 2020590 1191809 721210 63% /var Wed 14 Oct 2020 08:07:30 PM PDT Filesystem 4K-blocks Used Available Use% Mounted on /dev/mapper/vg-var 2020590 1191809 721210 63% /var There's nothing interesting in /var/log/messages during the times that this happens. Whenever it happens, it's always complaining that there are ZERO bytes free, not some small number of bytes free. Besides, my /var/log/mail.log is on the same filesystem, and as far as I can tell, my syslog is working fine. So, at this point, I'm stumped. Any ideas? Dara Poon IT Services, Department of Electrical and Computer Engineering, University of British Columbia