Okay, I have some updates. Reducing the concurrency down (and ultimately to 1) increased performance: it seems the delays I was seeing were related to concurrency inside SQL. However, I am still seeing half second delays in local delivery while my insertion application is showing ~.003s processing times. I was able to verify that it's not my insertion application by making a simple script that soft-bounces mail (exit code 75) and does nothing else -- there's still a half a second delay on action. Tracing the individual processes is proving to be a challenge as this is an Apple system and dtruss can only be run as root. Any tips?
Jan 14 18:30:58 x postfix/local[27253]: 03B2A13E351: to=<xxx>, relay=local, delay=40172, delays=37592/2580/0/0.51, dsn=4.3.0, status=deferred (temporary failure) Jan 14 18:30:59 x postfix/local[27253]: 03B31132541: to=<xxx>, relay=local, delay=70394, delays=67813/2581/0/0.53, dsn=4.3.0, status=deferred (temporary failure) Jan 14 18:30:59 x postfix/local[27253]: 03B371416A7: to=<xxx>, relay=local, delay=36726, delays=34144/2581/0/0.53, dsn=4.3.0, status=deferred (temporary failure) Jan 14 18:31:00 x postfix/local[27253]: 03B3714401A: to=<xxx>, relay=local, delay=34562, delays=31980/2582/0/0.54, dsn=4.3.0, status=deferred (temporary failure) On Thu, Jan 14, 2010 at 4:47 PM, Wietse Venema <wie...@porcupine.org> wrote: > Wendigo Thompson: >> Wietse: >> >> Sadly I generally don't keep logs at this granular of a level >> (mail.crit is what I usually store forever) but I can turn logging >> back on after the queue empties over the weekend and see if the delay >> is persisting. I know the command itself is completing quickly -- it >> seems like something in between is causing a delay. Do you have any >> suggestions for how I could profile the handoff from local to the >> executable? > > The fourth "delays" timer, which show 7.7s delays and more, > stops as soon as your /usr/bin/db_capture terminates. > > Perhaps it is doing hostname lookup and the resolv.conf > or nsswitch.conf files are out of whack. Perhaps it is > taking forever to log in. > > To investigate, configure Postfix to run it under > strace (Linux) truss (Solaris) or other. > > In your alias: > "|strace -tt -o /tmp/db_capture.$$$$ /usr/bin/db_capture" > > (I think you need to specify four $$$$ to get two, which > is what I want here, so that each /tmp/db_capture file > is tagged with the process ID). > > Wietse > > >> Thanks again, >> Wendy >> >> On Thu, Jan 14, 2010 at 4:13 PM, Wietse Venema <wie...@porcupine.org> wrote: >> > Wendigo Thompson: >> >> Wietse: >> >> >> >> ? I know who you are -- big fan of yours ever since tcpwrappers :-) >> > >> > Hah! >> > >> > (As for my claim, of course other people also wrote parts of >> > Postfix, in particular Victor has been a major contributor). >> > >> >> ? The only reason I wrote in is that this queueing of mail issue >> >> appeared fairly recently, and other times when I have had to collect >> >> large volumes of mail from this client my system has been fine. ?Below >> >> I have pasted some logs that show long delivery times (sitting in the >> >> queue for a day) but the last delay (delivery delay?) is in the >> >> upper-teens. ?Just trying to reconcile why this is: if it's something >> >> on my end that is fine and I can work it out, but maybe it is an >> >> unforeseen issue with Postfix and large volumes of piped mail. ?The >> >> system itself is a Nehalem Xserve, so it seems to have plenty of >> >> horsepower to handle the load. >> >> >> >> (anonymized a bit in hostname and email address) >> >> Jan 14 15:45:43 server001 postfix/local[24570]: 26E371265ED: >> >> to=<receiver_em...@receiver_ip>, relay=local, delay=81352, >> >> delays=0.59/81343/0/7.7, dsn=2.0.0, status=sent (delivered to command: >> >> /usr/bin/db_capture) >> > >> > This looks problematic: it took 7.7s from the time that the Postfix >> > local delivery agent was tasked to deliver the message, to the time >> > that the db_capture command reported completion. >> > >> > Plus, these messages had already been sitting in the queue for >> > almost 24 hours, so the problem will have started much earlier than >> > that. >> > >> >> delays=0.5/81344/0/8.2, dsn=2.0.0, ... >> >> delays=0.34/81341/0/13, dsn=2.0.0, ... >> >> delays=0.41/81344/0/10, dsn=2.0.0, ... >> >> delays=0/81339/0/19, dsn=2.0.0, ... >> >> delays=0.42/81342/0/14, dsn=2.0.0, ... >> >> delays=0.19/81344/0/11, dsn=2.0.0, ... >> >> delays=0/81341/0/15, dsn=2.0.0, ... >> >> delays=0/81337/0/21, dsn=2.0.0, ... >> >> delays=0.57/81343/0/11, dsn=2.0.0, ... >> >> delays=0.32/81342/0/16, dsn=2.0.0, ... >> >> delays=0.21/81341/0/18, dsn=2.0.0, ... >> >> delays=0.35/81348/0/7, dsn=2.0.0, ... >> >> delays=0.12/81339/0/21, dsn=2.0.0, ... >> >> delays=0.09/81340/0/20, dsn=2.0.0, ... >> > >> > Now the question is why did /usr/bin/db_capture take so long? I am >> > not sure that my "red" and "green" options are going to solve your >> > performance problem. >> > >> > If you still have the logs, it will be worthwhile to find out when >> > the delivery delay started climbing through the roof. >> > >> > ? ? ? ?Wietse >> > >> >> > >