On 15/10/2020 09:44, Dara Poon wrote: > (Well, that was embarrassing! I had a Spamassassin milter on outbound mail > that tagged my own message as a false positive. Sending it again for > readability. Sorry!) > > > > I'm seeing very occasional "Insufficient system storage" errors in my > /var/log/mail.log, but it's not the usual lack of disk space. Here is an > excerpt from /var/log/mail.log: > > ... > > > The error message is coming from smtpd_check_queue() in > src/smtpd/smtpd_check.c. The disk space check uses fsspace() in > src/util/fsspace.c, which on Linux calls statfs(2). The code confirms that > when it complains of insufficient system storage, it means disk blocks, not > inodes, memory, shared memory, or anything else. > > > Which filesystem is it testing when it does fsspace(".", &fsbuf)? If I look > at a typical smtpd process, /proc/$pid/cwd is a symlink to > /var/spool/postfix, as I would expect, so it should be testing my /var > filesystem. (I don't have separate filesystems under /var such as /var/log, > /var/spool, or anything.) > > Perhaps smtpd is doing a sneaky momentary chdir(2) to the wrong place before > calling fsspace()? I grepped the entire Postfix source code, and it's always > either chdir(var_queue_dir), or chdir(var_daemon_dir), or > chdir(var_command_dir), or chdir("/") in a chrooted context — all of which > should be in the same /var filesystem. The exception is chdir(args.cwd) in > src/global/pipe_command.c, but surely that's irrelevant? Anyway, as I noted, > it affects multiple processes simultaneously. > > > Perhaps there is really something that is momentarily putting huge files in > my /var and immediately deleting them? I monitored that too: > > root@smtp2:~# tail -f /var/log/mail.log | while read -r line ; do case > "$line" in *Insufficient*) date ; df --block-size=4K /var ; esac ; done > Wed 14 Oct 2020 08:07:30 PM PDT > Filesystem 4K-blocks Used Available Use% Mounted on > /dev/mapper/vg-var 2020590 1191809 721210 63% /var > Wed 14 Oct 2020 08:07:30 PM PDT > Filesystem 4K-blocks Used Available Use% Mounted on > /dev/mapper/vg-var 2020590 1191809 721210 63% /var > > > There's nothing interesting in /var/log/messages during the times that this > happens. > > Whenever it happens, it's always complaining that there are ZERO bytes free, > not some small number of bytes free. Besides, my /var/log/mail.log is on the > same filesystem, and as far as I can tell, my syslog is working fine. > > So, at this point, I'm stumped. Any ideas? > > Dara Poon > IT Services, Department of Electrical and Computer Engineering, University of > British Columbia
Dara Sounds like a difficult probem. I don't see where postfix could be chrooting to some different directory for smptd process other than the configured queue directory. Do you really have any other filesystem at zero space available that could lend weight to the wrong file system hypothesis? Also this code has not changed in a long time. The fact that the message prints out 4096 as the block size indicates that info is being returned from the statfs call. I don't think there can be any other conclusion than that the statfs is indeed returning 0 for the bavail field, probably an issue that postfix could do little about. One thing you might try instead of using df to debug it, is to use the postfix test program fsspace, which is not compiled by default (or equivalent c program) which make the same statfs call as postfix on the queue directory in normal conditions and when the error is occuring. John