Okay, I have some updates.  Reducing the concurrency down (and
ultimately to 1) increased performance: it seems the delays I was
seeing were related to concurrency inside SQL.  However, I am still
seeing half second delays in local delivery while my insertion
application is showing ~.003s processing times.  I was able to verify
that it's not my insertion application by making a simple script that
soft-bounces mail (exit code 75) and does nothing else -- there's
still a half a second delay on action.  Tracing the individual
processes is proving to be a challenge as this is an Apple system and
dtruss can only be run as root.  Any tips?

Jan 14 18:30:58 x postfix/local[27253]: 03B2A13E351: to=<xxx>,
relay=local, delay=40172, delays=37592/2580/0/0.51, dsn=4.3.0,
status=deferred (temporary failure)
Jan 14 18:30:59 x postfix/local[27253]: 03B31132541: to=<xxx>,
relay=local, delay=70394, delays=67813/2581/0/0.53, dsn=4.3.0,
status=deferred (temporary failure)
Jan 14 18:30:59 x postfix/local[27253]: 03B371416A7: to=<xxx>,
relay=local, delay=36726, delays=34144/2581/0/0.53, dsn=4.3.0,
status=deferred (temporary failure)
Jan 14 18:31:00 x postfix/local[27253]: 03B3714401A: to=<xxx>,
relay=local, delay=34562, delays=31980/2582/0/0.54, dsn=4.3.0,
status=deferred (temporary failure)


On Thu, Jan 14, 2010 at 4:47 PM, Wietse Venema <wie...@porcupine.org> wrote:
> Wendigo Thompson:
>> Wietse:
>>
>>   Sadly I generally don't keep logs at this granular of a level
>> (mail.crit is what I usually store forever) but I can turn logging
>> back on after the queue empties over the weekend and see if the delay
>> is persisting.  I know the command itself is completing quickly -- it
>> seems like something in between is causing a delay.  Do you have any
>> suggestions for how I could profile the handoff from local to the
>> executable?
>
> The fourth "delays" timer, which show 7.7s delays and more,
> stops as soon as your /usr/bin/db_capture terminates.
>
> Perhaps it is doing hostname lookup and the resolv.conf
> or nsswitch.conf files are out of whack. Perhaps it is
> taking forever to log in.
>
> To investigate, configure Postfix to run it under
> strace (Linux) truss (Solaris) or other.
>
> In your alias:
>    "|strace -tt -o /tmp/db_capture.$$$$ /usr/bin/db_capture"
>
> (I think you need to specify four $$$$ to get two, which
> is what I want here, so that each /tmp/db_capture file
> is tagged with the process ID).
>
>        Wietse
>
>
>> Thanks again,
>> Wendy
>>
>> On Thu, Jan 14, 2010 at 4:13 PM, Wietse Venema <wie...@porcupine.org> wrote:
>> > Wendigo Thompson:
>> >> Wietse:
>> >>
>> >> ? I know who you are -- big fan of yours ever since tcpwrappers :-)
>> >
>> > Hah!
>> >
>> > (As for my claim, of course other people also wrote parts of
>> > Postfix, in particular Victor has been a major contributor).
>> >
>> >> ? The only reason I wrote in is that this queueing of mail issue
>> >> appeared fairly recently, and other times when I have had to collect
>> >> large volumes of mail from this client my system has been fine. ?Below
>> >> I have pasted some logs that show long delivery times (sitting in the
>> >> queue for a day) but the last delay (delivery delay?) is in the
>> >> upper-teens. ?Just trying to reconcile why this is: if it's something
>> >> on my end that is fine and I can work it out, but maybe it is an
>> >> unforeseen issue with Postfix and large volumes of piped mail. ?The
>> >> system itself is a Nehalem Xserve, so it seems to have plenty of
>> >> horsepower to handle the load.
>> >>
>> >> (anonymized a bit in hostname and email address)
>> >> Jan 14 15:45:43 server001 postfix/local[24570]: 26E371265ED:
>> >> to=<receiver_em...@receiver_ip>, relay=local, delay=81352,
>> >> delays=0.59/81343/0/7.7, dsn=2.0.0, status=sent (delivered to command:
>> >> /usr/bin/db_capture)
>> >
>> > This looks problematic: it took 7.7s from the time that the Postfix
>> > local delivery agent was tasked to deliver the message, to the time
>> > that the db_capture command reported completion.
>> >
>> > Plus, these messages had already been sitting in the queue for
>> > almost 24 hours, so the problem will have started much earlier than
>> > that.
>> >
>> >> delays=0.5/81344/0/8.2, dsn=2.0.0, ...
>> >> delays=0.34/81341/0/13, dsn=2.0.0, ...
>> >> delays=0.41/81344/0/10, dsn=2.0.0, ...
>> >> delays=0/81339/0/19, dsn=2.0.0, ...
>> >> delays=0.42/81342/0/14, dsn=2.0.0, ...
>> >> delays=0.19/81344/0/11, dsn=2.0.0, ...
>> >> delays=0/81341/0/15, dsn=2.0.0, ...
>> >> delays=0/81337/0/21, dsn=2.0.0, ...
>> >> delays=0.57/81343/0/11, dsn=2.0.0, ...
>> >> delays=0.32/81342/0/16, dsn=2.0.0, ...
>> >> delays=0.21/81341/0/18, dsn=2.0.0, ...
>> >> delays=0.35/81348/0/7, dsn=2.0.0, ...
>> >> delays=0.12/81339/0/21, dsn=2.0.0, ...
>> >> delays=0.09/81340/0/20, dsn=2.0.0, ...
>> >
>> > Now the question is why did /usr/bin/db_capture take so long? I am
>> > not sure that my "red" and "green" options are going to solve your
>> > performance problem.
>> >
>> > If you still have the logs, it will be worthwhile to find out when
>> > the delivery delay started climbing through the roof.
>> >
>> > ? ? ? ?Wietse
>> >
>>
>>
>
>

Reply via email to