Re: Occasional transient "Insufficient system storage" errors

John Fawcett Thu, 15 Oct 2020 05:43:29 -0700

On 15/10/2020 09:44, Dara Poon wrote:
> (Well, that was embarrassing!  I had a Spamassassin milter on outbound mail 
> that tagged my own message as a false positive.  Sending it again for 
> readability.  Sorry!)
>
>
>
> I'm seeing very occasional "Insufficient system storage" errors in my 
> /var/log/mail.log, but it's not the usual lack of disk space.  Here is an 
> excerpt from /var/log/mail.log:
>
> ...
>
>
> The error message is coming from smtpd_check_queue() in 
> src/smtpd/smtpd_check.c.  The disk space check uses fsspace() in 
> src/util/fsspace.c, which on Linux calls statfs(2).  The code confirms that 
> when it complains of insufficient system storage, it means disk blocks, not 
> inodes, memory, shared memory, or anything else.
>
>
> Which filesystem is it testing when it does fsspace(".", &fsbuf)?  If I look 
> at a typical smtpd process, /proc/$pid/cwd is a symlink to 
> /var/spool/postfix, as I would expect, so it should be testing my /var 
> filesystem.  (I don't have separate filesystems under /var such as /var/log, 
> /var/spool, or anything.)
>
> Perhaps smtpd is doing a sneaky momentary chdir(2) to the wrong place before 
> calling fsspace()?  I grepped the entire Postfix source code, and it's always 
> either chdir(var_queue_dir), or chdir(var_daemon_dir), or 
> chdir(var_command_dir), or chdir("/") in a chrooted context — all of which 
> should be in the same /var filesystem.  The exception is chdir(args.cwd) in 
> src/global/pipe_command.c, but surely that's irrelevant? Anyway, as I noted, 
> it affects multiple processes simultaneously.
>
>
> Perhaps there is really something that is momentarily putting huge files in 
> my /var and immediately deleting them?  I monitored that too:
>
> root@smtp2:~# tail -f /var/log/mail.log | while read -r line ; do case 
> "$line" in *Insufficient*) date ; df --block-size=4K /var ; esac ; done 
> Wed 14 Oct 2020 08:07:30 PM PDT
> Filesystem         4K-blocks    Used Available Use% Mounted on
> /dev/mapper/vg-var   2020590 1191809    721210  63% /var
> Wed 14 Oct 2020 08:07:30 PM PDT
> Filesystem         4K-blocks    Used Available Use% Mounted on
> /dev/mapper/vg-var   2020590 1191809    721210  63% /var
>
>
> There's nothing interesting in /var/log/messages during the times that this 
> happens.
>
> Whenever it happens, it's always complaining that there are ZERO bytes free, 
> not some small number of bytes free.  Besides, my /var/log/mail.log is on the 
> same filesystem, and as far as I can tell, my syslog is working fine.
>
> So, at this point, I'm stumped.  Any ideas?
>
> Dara Poon
> IT Services, Department of Electrical and Computer Engineering, University of 
> British Columbia


Dara

Sounds like a difficult probem. I don't see where postfix could be
chrooting to some different directory for smptd process other than the
configured queue directory.  Do you really have any other filesystem at
zero space available that could lend weight to the wrong file system
hypothesis?

Also this code has not changed in a long time. The fact that the message
prints out 4096 as the block size indicates that info is being returned
from the statfs call. I don't think there can be any other conclusion
than that the statfs is indeed returning 0 for the bavail field,
probably an issue that postfix could do little about.

One thing you might try instead of using df to debug it, is to use the
postfix test program fsspace, which is not compiled by default (or
equivalent c program) which make the same statfs call as postfix on the
queue directory in normal conditions and when the error is occuring.

John

Re: Occasional transient "Insufficient system storage" errors

Reply via email to