Gordon Tetlow wrote:
> On Tue, 20 Feb 2001 [EMAIL PROTECTED] wrote:
>
> > Aha. That explains it. You use HW raid. I wondered why you were
> > only doing 4 million mails for *30* boxes. Dan, is doing 500K, on a
> > completely idle box (cpu/ram/I/O wise), with vinum, Postfix, and RAID-0.
> > Have you seen brad knowles papers on vinum vs HW raid? It's erm
> > enlightening to say the least :) Id be happy to dig up the URL if you are
> > interested. I personally will be using Vinum from now on. The performance
> > is very impressive.
>
> Well, as I said, these boxes are rather bored. I don't think the load
> reaches above 0.05. Most of the time is delivering mail trying to
> negotiate with destination hosts. I don't think that the mailers are IO
> bound, but I haven't really looked to find out to tell you the truth. Once
> the mailers are set up we treat them as black boxes. They just work.
>
> Also, the 500K number, is that per day? The 4 million was in 4 hours, not
> a day.

Another bored box:
mx1.freebsd.org$ grep 'status=sent' /var/log/mail | wc -l
331877
It is 8 hours since the last rollover. Unfortunately, it spends most of its
time waiting for something to do and looking at broken mail servers. It
delivers most of its mail in a few seconds. We see it peak at several
hundred envelopes per second shortly after getting a large mailing list to
digest. Here's a quick histogram of what those 8 hours look like:
mx1.freebsd.org$ sh hist.sh
zero 1292 1292 0.36577 0.36577
one 4983 6275 1.41071 1.77648
two 7680 13955 2.17424 3.95072
three 10741 24696 3.04082 6.99154
five 30853 55549 8.73461 15.7261
seven 37626 93175 10.6521 26.3782
ten 48169 141344 13.6368 40.0151
fifteen 66877 208221 18.9332 58.9482
twenty 44244 252465 12.5257 71.4739
thirty 48059 300524 13.6057 85.0796
fourtyfive 23626 324150 6.68862 91.7682
sixty 6902 331052 1.95398 93.7222
ninety 7082 338134 2.00494 95.7271
twomin 2336 340470 0.66133 96.3884
threemin 1521 341991 0.43060 96.819
rest 11236 353227 3.18096 100
total 353227
First field: number of seconds. Second is the number of deliveries in that
interval, third is the running count of deliveries, fourth is the percentage
of the total that this interval represents, and last is the accumulated
percentage.
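hist.sh itself isn't shown here. As a rough sketch of the same bucketing, something like the Python below would produce the five columns described above. It assumes each log line carries a `delay=N` field ahead of `status=sent`, and that each label is an inclusive upper bound in seconds. The names `BUCKETS`, `DELAY_RE` and `histogram` are made up for illustration and are not taken from the real script.

```python
import re
from bisect import bisect_left

# Bucket labels as printed above, with an ASSUMED inclusive upper bound in seconds.
BUCKETS = [
    ("zero", 0), ("one", 1), ("two", 2), ("three", 3), ("five", 5),
    ("seven", 7), ("ten", 10), ("fifteen", 15), ("twenty", 20),
    ("thirty", 30), ("fourtyfive", 45), ("sixty", 60), ("ninety", 90),
    ("twomin", 120), ("threemin", 180),
]
BOUNDS = [bound for _, bound in BUCKETS]
LABELS = [label for label, _ in BUCKETS] + ["rest"]

# ASSUMPTION: a delivered message logs "delay=<seconds>, ... status=sent".
DELAY_RE = re.compile(r"\bdelay=(\d+(?:\.\d+)?),.*\bstatus=sent\b")


def histogram(lines):
    """Return rows of (label, count, running count, percent, running percent)."""
    counts = [0] * len(LABELS)
    for line in lines:
        match = DELAY_RE.search(line)
        if match:
            # bisect_left picks the first bucket whose bound is >= the delay;
            # anything above the last bound lands in "rest".
            counts[bisect_left(BOUNDS, float(match.group(1)))] += 1

    total = sum(counts)
    rows, running = [], 0
    for label, count in zip(LABELS, counts):
        running += count
        pct = 100.0 * count / total if total else 0.0
        cum_pct = 100.0 * running / total if total else 0.0
        rows.append((label, count, running, pct, cum_pct))
    return rows
```

Feeding it the lines of /var/log/mail and printing each row gives a table of the same shape. Whether the real script treats the bounds as inclusive is a guess on my part.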
This is a 24 hour run for yesterday (1am -> 1am):
> sh hist.sh
zero 3186 3186 0.29641 0.29641
one 13724 16910 1.27684 1.57325
two 19948 36858 1.8559 3.42915
three 29557 66415 2.74989 6.17904
five 87973 154388 8.18473 14.3638
seven 104690 259078 9.74003 24.1038
ten 144142 403220 13.4105 37.5143
fifteen 208335 611555 19.3828 56.8971
twenty 134030 745585 12.4697 69.3669
thirty 148163 893748 13.7846 83.1515
fourtyfive 74129 967877 6.89673 90.0482
sixty 34204 1002081 3.18223 93.2305
ninety 28955 1031036 2.69388 95.9243
twomin 7146 1038182 0.66484 96.5892
threemin 4297 1042479 0.39977 96.989
rest 32364 1074843 3.01104 100
total 1074843
Some random samples of mail servers in the 5 to 20 second range show most
of this delay is due to remote sendmail response time, the ident lookup, etc.
I'm pretty pleased to see that 83% of mail is delivered in less than 30
seconds and that 90% is out by 45 seconds. The 'zero' count is because
there are a couple of other well connected postfix servers nearby that have
a handful of subscribers :-)
The machine is only non-trivially busy for a small percentage of its time;
it could easily deliver 10 or 20 times that much mail before it was
really under load. That is easily 10 to 20 million per day for one box.
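To make the arithmetic explicit, here is the same estimate worked through with the 24 hour total from the histogram above. The 10x and 20x factors are the headroom estimate from the text, not a measurement, and the variable names are just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# "total" line of the 24 hour run above.
delivered_per_day = 1_074_843

# Average rate over the whole day; the peaks are far higher than this.
average_per_second = delivered_per_day / SECONDS_PER_DAY

# Projected daily volume at 10x and 20x the current load (estimated headroom).
projected = [delivered_per_day * factor for factor in (10, 20)]

print(f"average: {average_per_second:.1f} deliveries/second")
print(f"projected: {projected[0]:,} to {projected[1]:,} per day")
```

That works out to roughly 12 deliveries per second averaged over the day, and a projected 10,748,430 to 21,496,860 per day.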
This is a P3-800 with one IDE disk. We're in the process of switching it
to SCSI because of IDE drive problems. The postfix spool will probably be
mirrored for safety. Incidentally, the spool is mostly write-only, as the
entire spool fits cached in memory.
mx1.freebsd.org$ mailq
-Queue ID- --Size-- ----Arrival Time---- -Sender/Recipient-------
....
F40BC6E323E 2021 Wed Feb 21 02:42:14 [EMAIL PROTECTED]
(connect to mx1.mainstreet.net[207.5.0.50]: Operation timed out)
[EMAIL PROTECTED]
(connect to foobar.nisse.dk[24.232.51.205]: Operation timed out)
[EMAIL PROTECTED]
(connect to osfmail.isc.rit.edu[129.21.2.241]: read timeout)
[EMAIL PROTECTED]
(connect to mx.mainstreet.net[207.5.0.45]: Operation timed out)
[EMAIL PROTECTED]
....
(connect to mailhub.state.me.us[141.114.122.227]: No route to host)
[EMAIL PROTECTED]
(connect to mail.is-one.net[210.75.223.43]: read timeout)
[EMAIL PROTECTED]
(conversation with mbox.iyard.org[140.117.11.95] timed out while sending RCPT TO)
[EMAIL PROTECTED]
(conversation with relay.orsk.ru[193.233.163.2] timed out while sending RCPT TO)
[EMAIL PROTECTED]
-- 104395 Kbytes in 3639 Requests.
The queue (104MB on disk) fits comfortably in memory right now. postfix
itself is very light on memory demands.
Some other postfix tuning stats:
- parallel outbound smtp sender processes: 500
- various qmgr params changed to keep the queue state in memory (i.e. deal
with something like 100,000 recipients and/or envelopes)
- We use bulk_mailer to inject mail on hub.freebsd.org from majordomo
and avoid the -outgoing aliases. bulk_mailer was hacked to not split the
envelopes unless it got to 100,000 recipients and to not sort the addresses.
- hub uses mx1 as a mail exploder, leaving hub to the mailing list management,
archiving and searching roles and mx1 solely to delivery. We have seen
it pump something like 2000 separate messages in 3 seconds flat to mx1.
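For what the bulk_mailer change amounts to, here is a minimal sketch of the behaviour described above: keep the recipients in their original order and only start a new envelope once the limit is reached. This is not bulk_mailer's code (the real thing is a C program). `split_envelopes` and its parameter are hypothetical names used only to illustrate the idea.

```python
def split_envelopes(recipients, max_per_envelope=100_000):
    """Split a recipient list into envelopes without sorting it.

    Illustrative only: a list below the limit stays as one envelope, and
    addresses keep the order in which they were supplied.
    """
    if max_per_envelope < 1:
        raise ValueError("max_per_envelope must be at least 1")
    return [
        recipients[start:start + max_per_envelope]
        for start in range(0, len(recipients), max_per_envelope)
    ]
```

With a limit that high, nearly every list goes out as a single envelope, which leaves the fan-out to the mail exploder rather than to the injecting host.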
The only real problems we've had have been DNS related and disk media
errors on the cursed IBM DTLA drives.
Cheers,
-Peter
--
Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
"All of this is for nothing if we don't go to the stars" - JMS/B5