Thanks very much Victor for your time, it is much appreciated.
Before my responses to your questions, I believe I've narrowed this down
(with the help of Mark Sapiro on the mailman list). It only happens when
multiple messages are simultaneously sent invoking postfix/local - which
according to Mark is what happens when a message is sent to a mailman
list that has multiple members that are lists. Real email addresses are
delivered by virtual, but the list addresses generate a new message
using local.
I've confirmed that individual messages invoking postfix/local also work
fine, (ie, individual emails from cron jobs (8 from last night and this
morning), etc):
2013-06-09T03:13:30-04:00 moria postfix/local[9769]: 42C22AB25C7:
to=<r...@smtp.media-brokers.com>, orig_to=<root>, relay=local,
delay=0.18, delays=0.07/0/0/0.11, dsn=2.0.0, status=sent (forwarded as
51836AB1A82)
To recap the warnings I'm getting when the problem occurs...
The postfix/master warning only shows up rarely - 11 times since I got
the system back up, and within 5 or 10 minutes (but usually with 5 or 10
seconds) of postfix being restarted:
postfix/master[6406]: warning: master_wakeup_timer_event: service
tlsmgr(private/tlsmgr): Resource temporarily unavailable
Then these (never accompanied by the postfix/master warning that I have
seen) whenever I try to send to the lists-all list (easiest way to test):
warning: connect to transport private/local: Resource temporarily
unavailable
warning: connect to transport private/retry: Resource temporarily
unavailable
Ok, now to your questions...
On 2013-06-08 6:13 PM, Viktor Dukhovni <postfix-us...@dukhovni.org> wrote:
Make sure all databases (built via postalias, ...) are in
good working order. Rebuild them all from the plaintext
source files.
I ran newaliases (I've never used the postalias command directly), and
checked /etc/mail/aliases and everything looks normal there, restarted
postfix, but the problem persists.
My primary userdb is in mysql, but haven't had any issues with it - and
then of course the mailman aliases. So, I may be wrong but I don't think
I have any databases built by postalias - unless there are some
internal/default ones that are created/used by postfix at install time?
Also, to confirm - you are saying that userdb problems/corruption can
cause these warnings? That will be good info to know for future reference.
Use netstat/lsof to find processes listening on or connected to
various sockets in /var/spool/postfix/private, which sockets have
the most processes?
Hmmm... maybe significant? netstat lsof | grep /var only shows one (at
least when I ran the command):
unix 3 [ ] STREAM CONNECTED 148439
/var/run/mysqld/mysqld.sock
unix 3 [ ] STREAM CONNECTED 148408
/var/run/mysqld/mysqld.sock
unix 3 [ ] STREAM CONNECTED 148407
/var/run/mysqld/mysqld.sock
unix 3 [ ] STREAM CONNECTED 149106
/var/spool/postfix/private/auth
unix 3 [ ] STREAM CONNECTED 148344
/var/run/mysqld/mysqld.sock
unix 3 [ ] STREAM CONNECTED 148345
/var/run/dovecot/auth-worker
unix 3 [ ] STREAM CONNECTED 149040
/var/run/dovecot/login/login
unix 3 [ ] STREAM CONNECTED 148583
/var/run/dovecot/anvil-auth-penalty
unix 3 [ ] STREAM CONNECTED 33348
/var/run/dovecot/anvil
unix 3 [ ] STREAM CONNECTED 33797
/var/run/dovecot/config
Running the command with one of the messages stuck in the queue shows
the same output.
Relevant things to check:
- Kernel resource limits, perhaps you need to run fewer processes.
First, remember, this system had been running fine for years with the
current settings and kernel.
But I did try changing the default_process_limit to 10 with no help.
- Which program appears the largest number of times in "ps" output.
root, then postfix (but dovecot runs as postfix user, and most of the
'postfix' processes are dovecot/imap):
# ps aux|grep postfix
postfix 3619 0.0 0.1 25228 2876 ? S 05:43 0:00
dovecot/imap
postfix 4074 0.0 0.1 25220 2692 ? S 07:13 0:00
dovecot/imap
postfix 4177 0.0 0.1 26324 3940 ? S 07:26 0:00
dovecot/imap
postfix 4227 0.0 0.1 25600 3244 ? S 07:29 0:00
dovecot/imap
postfix 4291 0.0 0.2 26616 4128 ? S 07:37 0:00
dovecot/imap
root 4562 0.0 0.0 20588 1144 ? Ss 07:57 0:00
/usr/libexec/postfix/master -w
postfix 4563 0.0 0.1 38236 2088 ? S 07:57 0:00 pickup
-l -t unix -u
postfix 4564 0.0 0.1 38424 2404 ? S 07:57 0:00 qmgr -l
-t unix -u
postfix 4610 0.0 0.1 38244 2180 ? S 08:00 0:00 tlsmgr
-l -t unix -u
postfix 4814 0.0 0.2 27940 5604 ? S 08:32 0:01
dovecot/imap
postfix 4825 0.0 0.3 31768 7428 ? S 08:32 0:00
dovecot/imap
postfix 4932 0.0 0.2 27704 5456 ? S 08:48 0:00
dovecot/imap
postfix 5013 0.0 0.1 38228 2228 ? S 08:58 0:00 anvil
-l -t unix -u
postfix 5023 0.0 0.1 25180 2608 ? S 08:58 0:00
dovecot/imap
postfix 5026 0.0 0.1 26580 3672 ? S 08:59 0:00
dovecot/imap
postfix 5039 0.0 0.2 28668 6012 ? S 08:59 0:00
dovecot/imap
postfix 5074 0.0 0.1 25180 2640 ? S 09:01 0:00
dovecot/imap
postfix 5103 0.0 0.1 25220 2908 ? S 09:07 0:00
dovecot/imap
postfix 5115 0.0 0.1 25924 3252 ? S 09:08 0:00
dovecot/imap
postfix 5127 0.0 0.1 25440 3032 ? S 09:09 0:00
dovecot/imap
postfix 5128 0.0 0.1 25928 3276 ? S 09:09 0:00
dovecot/imap
postfix 5130 0.0 0.1 25928 3276 ? S 09:09 0:00
dovecot/imap
postfix 5161 0.0 0.1 25180 2608 ? S 09:13 0:00
dovecot/imap
postfix 5162 0.0 0.1 26580 3664 ? S 09:13 0:00
dovecot/imap
postfix 5169 0.1 0.4 32628 9932 ? S 09:15 0:00
dovecot/imap
postfix 5172 0.0 0.1 25180 2668 ? S 09:15 0:00
dovecot/imap
postfix 5174 0.0 0.1 25388 2964 ? S 09:16 0:00
dovecot/imap
postfix 5176 0.0 0.1 25956 3340 ? S 09:16 0:00
dovecot/imap
postfix 5177 0.0 0.1 25232 2740 ? S 09:16 0:00
dovecot/imap
postfix 5193 0.0 0.2 26736 4360 ? S 09:17 0:00
dovecot/imap
postfix 5194 0.0 0.1 25180 2596 ? S 09:17 0:00
dovecot/imap
postfix 5200 0.1 0.3 31756 8192 ? S 09:19 0:00
dovecot/imap
postfix 5201 0.0 0.1 25232 2748 ? S 09:19 0:00
dovecot/imap
root 5220 0.0 0.0 8568 868 pts/1 R+ 09:22 0:00 grep
--colour=auto postfix
postfix 6578 0.0 0.1 25772 3436 ? S Jun08 0:00
dovecot/imap
postfix 6579 0.0 0.1 26060 3556 ? S Jun08 0:00
dovecot/imap
postfix 6588 0.0 0.2 27664 5252 ? S Jun08 0:02
dovecot/imap
postfix 6589 0.0 0.1 25224 2584 ? S Jun08 0:00
dovecot/imap
postfix 6613 0.0 0.4 31964 9680 ? S Jun08 0:04
dovecot/imap
postfix 6614 0.0 0.2 27152 4584 ? S Jun08 0:01
dovecot/imap
postfix 6634 0.0 0.2 26928 4480 ? S Jun08 0:01
dovecot/imap
postfix 6655 0.0 0.1 25224 2932 ? S Jun08 0:00
dovecot/imap
postfix 6657 0.0 0.1 25480 3112 ? S Jun08 0:00
dovecot/imap
postfix 6683 0.0 0.1 25324 3116 ? S Jun08 0:00
dovecot/imap
postfix 6700 0.0 0.1 26172 3684 ? S Jun08 0:00
dovecot/imap
postfix 6711 0.0 0.2 28320 5700 ? S Jun08 0:01
dovecot/imap
postfix 6712 0.0 0.1 25384 2984 ? S Jun08 0:00
dovecot/imap
postfix 6714 0.0 0.1 26028 3644 ? S Jun08 0:00
dovecot/imap
postfix 6716 0.0 0.1 25224 2904 ? S Jun08 0:00
dovecot/imap
postfix 6847 0.0 0.1 25468 3084 ? S Jun08 0:00
dovecot/imap
postfix 6850 0.0 0.1 25388 2972 ? S Jun08 0:00
dovecot/imap
postfix 6851 0.0 0.1 25224 2900 ? S Jun08 0:00
dovecot/imap
postfix 6990 0.0 0.1 25588 3296 ? S Jun08 0:00
dovecot/imap
postfix 8731 0.0 0.1 25228 2872 ? S Jun08 0:00
dovecot/imap
postfix 8734 0.0 0.1 25228 2872 ? S Jun08 0:00
dovecot/imap
postfix 9037 0.0 0.1 25324 2988 ? S 00:23 0:00
dovecot/imap
postfix 9559 0.0 0.1 25228 2808 ? S 02:44 0:00
dovecot/imap
- Is there a lot of mail in the deferred queue?
None.
- Are you generating postmaster notices that may make the problem
worse?
When the problem happens, some postmaster notices are generated, but it
is only for those messages...
How many local(8) processes are running?
I don't see any in the list above?
Is there any logging from postfix/local? ... Is there any logging by local
other than successful deliveries (if any).
Yes - successful deliveries fir local are logged (example above), and
then the warnings for local about 'Resource temporarily unavailable'...
Is local(8) able to read the passwd file,
# sudo -u postfix less /etc/passwd
and it showed me the contents of the file and everything looked fine.
But... does that mean local(8) can read it?
or are that file's permissions broken?
# ls -al /etc | grep passwd
-rw-r--r-- 1 root root 2973 Jun 4 2012 passwd
But could they still be broken?
Run (id -nu) as "postfix" does it work?
Looks like it:
# sudo -u postfix id -nu
postfix
I do back things up multiple times throughout the day to a different
machine, so I can restore anything that you think might fix things from
the last back before the crash...
Thanks again Victor... sorry this is so long, but I wanted to answer all
of your questions with as much detail as possible...
Charles