Thanks to everyone for suggestions about the load issue. I will
endeavor to provide more specific information. It took a while to move
everything off of that server so I could do the load testing with
smtp-source. Let me preface by saying that my real systems
administrator took another job, so I am filling in. Unfortunately, I am
not proficient with commands to analyze disk and memory performance. I
ran iostat, but I'm not sure how to interpret the results.
On 12/9/2010 3:39 PM, Victor Duchovni wrote:
On Thu, Dec 09, 2010 at 02:59:56PM -0500, Dave Brodin wrote:
Old server - Dual 2.8 GHz Xeon processors with 4 GB RAM
New server - Dual 2.8 quad-core processors with 8 GB RAM
Things run fine on the old server, but the hardware is starting to fail. When
we start postfix on the new server, everything runs fine for about 5-10
minutes as mail starts flooding in. CPU idle time is almost 100% and mail
is being processed just fine. All of a sudden, the CPU load starts to rise
quickly, the smtpd active processes start consuming large amounts of
processor time, and the active queue starts to grow out of control.
Can you report specific measurements that show that lots of CPU is
consumed in smtpd(8)? Does logging indicate any change in the pattern of
mail coming in or going out? Progress is only possible if you can report
detailed quantitative non-anecdotal measurements.
I ran the following command:
time /usr/local/bin/smtp-source -s 10 -l 10120 -m 500 -c \
    -f t...@bluemarble.net -t dbro...@bluemarble.net localhost:25
And got the following output at the end:
real 0m58.261s
user 0m0.055s
sys 0m0.126s
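For a rough sense of scale, the timing above can be turned into a throughput figure. This is only a sketch: it assumes -m 500 means 500 messages and -l 10120 means roughly 10120 bytes per message, and the variable names are made up for illustration.

```python
# Rough throughput of the smtp-source run, using the numbers reported above.
messages = 500            # from "-m 500"
message_bytes = 10120     # from "-l 10120" (approximate size of each message)
elapsed_seconds = 58.261  # from "real 0m58.261s"

msgs_per_second = messages / elapsed_seconds
kib_per_second = messages * message_bytes / elapsed_seconds / 1024

print(f"{msgs_per_second:.1f} messages/s, {kib_per_second:.0f} KiB/s")
```

That is under 9 messages per second. The user and sys times for smtp-source itself add up to well under a second, so nearly all of the 58 seconds was spent waiting on the server side.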
During the test, CPU went from 100% idle to 0.0% idle (see top sample
below):
last pid: 3429; load averages: 1.85, 0.44, 0.16 up 6+04:33:18 13:17:44
84 processes: 13 running, 71 sleeping
CPU: 1.9% user, 0.0% nice, 98.1% system, 0.0% interrupt, 0.0% idle
Mem: 171M Active, 6548M Inact, 842M Wired, 246M Cache, 827M Buf, 104M Free
Swap: 4096M Total, 60K Used, 4096M Free

  PID USERNAME THR PRI NICE   SIZE    RES STATE   C TIME   WCPU COMMAND
 3350 postfix    1 100    0 37580K  5572K RUN     4 0:08 25.32% smtpd
 3353 postfix    1 100    0 37580K  5572K RUN     7 0:08 25.31% smtpd
 3351 postfix    1  20    0 37580K  5572K lockf   3 0:07 23.97% smtpd
 3354 postfix    1  99    0 37580K  5572K CPU1    7 0:07 23.97% smtpd
 3357 postfix    1  99    0 37580K  5572K CPU4    6 0:07 23.65% smtpd
 3371 postfix    1 100    0 37580K  5572K CPU5    3 0:06 23.26% smtpd
 3368 postfix    1 100    0 37580K  5572K CPU6    1 0:06 22.99% smtpd
 3367 postfix    1   4    0 37580K  5572K kqread  4 0:06 22.54% smtpd
 3359 postfix    1  99    0 37580K  5572K CPU0    0 0:06 22.25% smtpd
 3363 postfix    1 100    0 37580K  5572K RUN     1 0:06 22.22% smtpd
 3360 postfix    1  99    0 37580K  5572K RUN     1 0:07 21.87% smtpd
 3375 postfix    1 100    0 37580K  5572K RUN     7 0:05 20.59% smtpd
 3425 postfix    1  20    0 16052K  3440K lockf   0 0:00 15.89% cleanup
 3374 postfix    1  20    0 16052K  3440K lockf   0 0:00  0.20% cleanup
 3365 postfix    1   4    0 16052K  3440K kqread  1 0:00  0.13% cleanup
 3364 postfix    1   4    0 16048K  3348K kqread  4 0:00  0.06% trivial
 5225 mysql      9  20    0   121M 57784K sigwai  1 1:45  0.00% mysqld
This is the same thing that happens under real e-mail conditions.
Here's what iostat does during that same run with updates every 10
seconds. It took 40-60 seconds after completion of the test for the
mail to finish delivering.
[root] iostat 10
      tty            mfid0             mfid1            cpu
 tin tout   KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in  id
   0  119  28.15   7  0.20  95.44   6  0.58  57  0  0  0  43
   0   18   6.67   0  0.00   0.00   0  0.00   0  0  0  0 100
   0    6   0.00   0  0.00   0.00   0  0.00   0  0  0  0 100
   0   39  12.98  32  0.40  12.89   1  0.01   2  0 81  0  16
   0   38  12.55  31  0.38   0.00   0  0.00   2  0 98  0   0
   0   41  12.93  31  0.39  40.91   1  0.04   2  0 98  0   0
   0   40  13.27  35  0.45  32.00   0  0.02   2  0 98  0   0
   0   42  12.79  31  0.39  16.00  12  0.19   2  0 98  0   0
   0   46  12.55  29  0.36  40.17   1  0.05   2  0 90  0   8
   0    6  16.86   7  0.12  43.78   2  0.08   3  0 22  0  75
   0    6  19.33   1  0.01   0.00   0  0.00   3  0 22  0  75
   0    6   0.00   0  0.00  85.40   3  0.25   3  0 22  0  75
   0    6  15.85   8  0.12  19.82  13  0.25   3  0 23  0  74
   0  276   6.67   0  0.00   0.00   0  0.00   2  0 12  0  86
   0    6   0.00   0  0.00  16.00   0  0.00   0  0  0  0 100
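One way to read the iostat samples is to pick out the intervals where system CPU time was high and see how much disk traffic went with them. The sketch below does that with the rows copied from the output above. The first row is left out on the assumption that it is the since-boot average, and the 80% threshold and the field names are arbitrary choices made for illustration.

```python
# Rows copied from the iostat output above (first row omitted).
# Columns: tin tout | mfid0 KB/t tps MB/s | mfid1 KB/t tps MB/s | us ni sy in id
IOSTAT_ROWS = """
0 18 6.67 0 0.00 0.00 0 0.00 0 0 0 0 100
0 6 0.00 0 0.00 0.00 0 0.00 0 0 0 0 100
0 39 12.98 32 0.40 12.89 1 0.01 2 0 81 0 16
0 38 12.55 31 0.38 0.00 0 0.00 2 0 98 0 0
0 41 12.93 31 0.39 40.91 1 0.04 2 0 98 0 0
0 40 13.27 35 0.45 32.00 0 0.02 2 0 98 0 0
0 42 12.79 31 0.39 16.00 12 0.19 2 0 98 0 0
0 46 12.55 29 0.36 40.17 1 0.05 2 0 90 0 8
0 6 16.86 7 0.12 43.78 2 0.08 3 0 22 0 75
0 6 19.33 1 0.01 0.00 0 0.00 3 0 22 0 75
0 6 0.00 0 0.00 85.40 3 0.25 3 0 22 0 75
0 6 15.85 8 0.12 19.82 13 0.25 3 0 23 0 74
0 276 6.67 0 0.00 0.00 0 0.00 2 0 12 0 86
0 6 0.00 0 0.00 16.00 0 0.00 0 0 0 0 100
"""

# Hypothetical field names: d0_* is mfid0, d1_* is mfid1.
FIELDS = ["tin", "tout",
          "d0_kbt", "d0_tps", "d0_mbs",
          "d1_kbt", "d1_tps", "d1_mbs",
          "us", "ni", "sy", "in", "id"]


def parse(text):
    """Turn whitespace-separated iostat rows into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        values = [float(v) for v in line.split()]
        rows.append(dict(zip(FIELDS, values)))
    return rows


rows = parse(IOSTAT_ROWS)

SY_THRESHOLD = 80  # assumption: "busy" means at least 80% system time
busy = [r for r in rows if r["sy"] >= SY_THRESHOLD]

# Combined traffic on both disks during the busy intervals.
peak_mbs = max(r["d0_mbs"] + r["d1_mbs"] for r in busy)
peak_tps = max(r["d0_tps"] + r["d1_tps"] for r in busy)

print(f"{len(busy)} busy intervals, "
      f"peak {peak_mbs:.2f} MB/s and {peak_tps:.0f} transfers/s across both disks")
```

In these samples the busy intervals carry well under 1 MB/s and a few dozen transfers per second, which hints that the time is going to the kernel rather than to raw disk throughput. It does not show which kernel path is responsible.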
Not sure if that is going to be helpful. I'm trying to read through
some unix books to get more up to speed on troubleshooting i/o issues.
More responses to questions below.
alias_database = hash:/usr/local/etc/postfix/aliases,
hash:/usr/local/etc/postfix/aliases.majordomo
alias_maps = hash:/usr/local/etc/postfix/aliases,
hash:/usr/local/etc/postfix/aliases.majordomo
You could try "cdb", instead of "hash". Which version of Berkeley DB
is this?
http://www.postfix.org/CDB_README.html
We are using Berkeley DB 4.7.25.4.
bounce_queue_lifetime = 1d
A bit aggressive.
default_process_limit = 500
local_recipient_maps = $alias_maps unix:passwd.byname
How big is the "passwd" file? What's in nsswitch.conf or
equivalent for "passwd"?
There are 18,983 entries in the passwd file.
This is my nsswitch.conf:
group: compat
group_compat: nis
hosts: files dns
networks: files
passwd: compat
passwd_compat: nis
shells: files
services: compat
services_compat: nis
protocols: files
rpc: files
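Since unix:passwd.byname resolves recipients through the system's getpwnam() call, one quick check is to time that call directly on the new server, for a name that exists and for one that does not. This is a minimal sketch: the function name, the repeat count and the sample login names are placeholders, and any lookup caching on the host will affect the numbers.

```python
import pwd
import time


def time_lookup(name, repeats=200):
    """Return (found, average seconds per getpwnam() call) for one login name."""
    found = True
    start = time.perf_counter()
    for _ in range(repeats):
        try:
            pwd.getpwnam(name)
        except KeyError:
            # getpwnam() raises KeyError when no such user exists.
            found = False
    elapsed = time.perf_counter() - start
    return found, elapsed / repeats


if __name__ == "__main__":
    # Placeholder names: replace with a real mailbox user and a bogus one.
    for name in ("root", "no-such-user-example"):
        found, seconds = time_lookup(name)
        print(f"{name}: found={found}, "
              f"{seconds * 1e6:.0f} microseconds per lookup")
```

If lookups for unknown names take much longer than lookups for known ones, the compat/nis fallback in nsswitch.conf would be worth a closer look.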
maximal_queue_lifetime = 2d
A bit aggressive.
minimal_backoff_time = 30m
A bad idea, the default setting scales better.
mydestination = $myhostname, localhost.$mydomain, $mydomain
myorigin = $mydomain
relay_domains = hash:/usr/local/etc/postfix/relay
You may want to set "fast_flush_domains = " if you don't support ETRN.
Not familiar with that. But I'll research it.
smtpd_tls_security_level = may
smtpd_tls_cert_file = /usr/local/certs/smtpd.pem
smtpd_tls_key_file = $smtp_tls_cert_file
smtpd_tls_loglevel = 1
smtpd_tls_received_header = yes
smtpd_tls_mandatory_protocols = TLSv1
Enabling TLS session caching may be a good idea, recommended database
type for that is "btree".
smtpd_client_restrictions =
check_client_access hash:/usr/local/etc/postfix/client_access
Could try "cdb" instead of "hash".
smtpd_recipient_restrictions =
reject_unknown_sender_domain,
check_sender_access hash:/usr/local/etc/postfix/sender_access,
Poor placement, see:
http://www.postfix.org/SMTPD_ACCESS_README.html#danger
check_recipient_access hash:/usr/local/etc/postfix/recipient_access,
permit_mynetworks,
# reject_rbl_client sbl-xbl.spamhaus.org,
You really should use a local mirror of Zen, sign-up for a SpamHaus
data feed.
permit_sasl_authenticated,
Same place as permit_mynetworks above the RBL if you ever add one.
reject
Much better would be:
smtpd_recipient_restrictions =
... Filters for trusted clients that may REJECT some traffic ...
permit_mynetworks,
permit_sasl_authenticated,
reject_unauth_destination,
... Sender filters ...
... Additional recipient filters ...
... RBL checks ...
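The reason placement matters is that a restriction list stops at the first restriction that gives a definite answer. The toy model below is not Postfix code, only a simplified sketch of that first-match behavior. The table contents, domains and addresses are hypothetical, and it assumes sender_access could contain an OK entry.

```python
# Simplified model of how one restriction list is walked: the first
# restriction returning OK or REJECT ends the walk; DUNNO means "no opinion".
# All table contents and addresses here are hypothetical.

SENDER_ACCESS = {"partner.example": "OK"}   # hypothetical sender_access entry
LOCAL_DOMAINS = {"bluemarble.net"}          # stands in for mydestination/relay_domains


def check_sender_access(msg):
    domain = msg["sender"].rsplit("@", 1)[-1]
    return SENDER_ACCESS.get(domain, "DUNNO")


def permit_mynetworks(msg):
    return "OK" if msg["client_trusted"] else "DUNNO"


def reject_unauth_destination(msg):
    domain = msg["recipient"].rsplit("@", 1)[-1]
    return "DUNNO" if domain in LOCAL_DOMAINS else "REJECT"


def reject(msg):
    return "REJECT"


def evaluate(restrictions, msg):
    """Return the first definite result, or OK if no restriction objects."""
    for restriction in restrictions:
        result = restriction(msg)
        if result != "DUNNO":
            return result
    return "OK"


# Untrusted client, forged sender in a whitelisted domain, outside recipient.
spam = {"sender": "forged@partner.example",
        "recipient": "victim@elsewhere.example",
        "client_trusted": False}

original_order = [check_sender_access, permit_mynetworks, reject]
safer_order = [permit_mynetworks, reject_unauth_destination,
               check_sender_access]

print("original order:", evaluate(original_order, spam))
print("safer order:   ", evaluate(safer_order, spam))
```

In this model the original ordering accepts the relay attempt because the sender table answers OK before anything checks the destination, while the reordered list rejects it at reject_unauth_destination.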
virtual_maps = hash:/usr/local/etc/postfix/virtual
Consider "cdb".
transport_maps = hash:/usr/local/etc/postfix/transport
Consider "cdb".
# About 2 Gigs
mailbox_size_limit = 2048576000
Not a good idea. Mailboxes of this size need to be maildirs.
We recently went to using mbx format files in user home directories. So
the mail is delivered first to dmail, which then puts it in the files.
I wasn't involved in this decision, but it seems to be working fine on
our current server. I'll have to research maildirs to see if that makes
more sense.
--
Dave Brodin
Network Operations Manager
Smithville Digital