The problem happened intermittently with different senders and I found it usually involved multiple recipients and have attachments. ISP informs me that they timeout in 10 minutes in expecting a return 250 OK after they sent the ". <CRLF>" which Postfix was not acknowledging.
What I don't understand is Postfix qmgr took over 2 hours later to process that stucked mail while the ISP has retried the mail 2nd time in less than an hour and was delivered successfully. I don't see any deferred mails though. I deliberately tune up the timeouts to see if it's the problem. No, I don't have proxy services and there's no virtual users/aliases except the standards one defined in Postfix. Below is the test result from my ISP on their end dated a week earlier. Did it shred some light into the matter? >From To Subject Time Action Size(KB) n...@ttasia.com axi...@ss-intl.com test4... 22/10/2012 18:07:38 Queue with Routing Engine 5608.00 Delivery Log Date Time Command Response Destination IP Address 22/10/2012 18:07:39 Connection-Comm 202.64.220.52 22/10/2012 18:07:39 Connection-Comm 202.64.220.52 22/10/2012 18:07:39 Connection 220 mailgate2.ss-intl.com ESMTP Postfix 202.64.220.52 22/10/2012 18:07:39 ehlo-Cmd ELHO ttsenmta10.ttasia.com 202.64.220.52 22/10/2012 18:07:39 Connection 220 mailgate2.ss-intl.com ESMTP Postfix 202.64.220.52 22/10/2012 18:07:39 ehlo-Cmd ELHO ttsenmta10.ttasia.com 202.64.220.52 22/10/2012 18:07:39 ehlo-Resp 250-mailgate2.ss-intl.com 250-PIPELINING 250-SIZE 838860800 250-ETRN 250-ENHANCEDSTATUSCODES 250-8BITMIME 250 DSN 202.64.220.52 22/10/2012 18:07:39 mailfrom-Cmd MAIL FROM:n...@ttasia.com 202.64.220.52 22/10/2012 18:07:39 ehlo-Resp 250-mailgate2.ss-intl.com 250-PIPELINING 250-SIZE 838860800 250-ETRN 250-ENHANCEDSTATUSCODES 250-8BITMIME 250 DSN 202.64.220.52 22/10/2012 18:07:39 rcptto-Cmd RCPT TO:axi...@ss-intl.com 202.64.220.52 22/10/2012 18:07:39 mailfrom-Cmd MAIL FROM:n...@ttasia.com 202.64.220.52 22/10/2012 18:07:39 rcptto-Cmd RCPT TO:axi...@ss-intl.com 202.64.220.52 22/10/2012 18:07:39 Data-Cmd DATA 202.64.220.52 22/10/2012 18:07:39 Data-Cmd DATA 202.64.220.52 22/10/2012 18:07:39 mailfrom-Resp 250 2.1.0 Ok 202.64.220.52 22/10/2012 18:07:39 mailfrom-Resp 250 2.1.0 Ok 202.64.220.52 22/10/2012 18:07:39 rcptto-Resp 250 2.1.5 Ok 202.64.220.52 22/10/2012 18:07:39 rcptto-Resp 250 2.1.5 Ok 202.64.220.52 22/10/2012 18:07:39 Data-Resp 354 End data with . 202.64.220.52 22/10/2012 18:07:39 Data-Resp 354 End data with . 202.64.220.52 22/10/2012 18:07:39 Eod-Cmd . 202.64.220.52 22/10/2012 18:07:39 Eod-Cmd . 202.64.220.52 22/10/2012 18:23:00 Connection-Comm 202.64.220.52 22/10/2012 18:23:00 Connection-Comm 202.64.220.52 22/10/2012 18:23:00 Connection 220 mailgate2.ss-intl.com ESMTP Postfix 202.64.220.52 22/10/2012 18:23:00 ehlo-Cmd ELHO ttsenmta10.ttasia.com 202.64.220.52 22/10/2012 18:23:00 Connection 220 mailgate2.ss-intl.com ESMTP Postfix 202.64.220.52 22/10/2012 18:23:00 ehlo-Cmd ELHO ttsenmta10.ttasia.com 202.64.220.52 22/10/2012 18:23:00 mailfrom-Cmd MAIL FROM:n...@ttasia.com 202.64.220.52 22/10/2012 18:23:00 Data-Cmd DATA 202.64.220.52 22/10/2012 18:23:00 Data-Cmd DATA 202.64.220.52 22/10/2012 18:23:00 ehlo-Resp 250-mailgate2.ss-intl.com 250-PIPELINING 250-SIZE 838860800 250-ETRN 250-ENHANCEDSTATUSCODES 250-8BITMIME 250 DSN 202.64.220.52 22/10/2012 18:23:00 mailfrom-Cmd MAIL FROM:n...@ttasia.com 202.64.220.52 22/10/2012 18:23:00 ehlo-Resp 250-mailgate2.ss-intl.com 250-PIPELINING 250-SIZE 838860800 250-ETRN 250-ENHANCEDSTATUSCODES 250-8BITMIME 250 DSN 202.64.220.52 22/10/2012 18:23:00 rcptto-Cmd RCPT TO:axi...@ss-intl.com 202.64.220.52 22/10/2012 18:23:00 rcptto-Cmd RCPT TO:axi...@ss-intl.com 202.64.220.52 22/10/2012 18:23:00 mailfrom-Resp 250 2.1.0 Ok 202.64.220.52 22/10/2012 18:23:00 mailfrom-Resp 250 2.1.0 Ok 202.64.220.52 22/10/2012 18:23:00 rcptto-Resp 250 2.1.5 Ok 202.64.220.52 22/10/2012 18:23:00 rcptto-Resp 250 2.1.5 Ok 202.64.220.52 22/10/2012 18:23:00 Data-Resp 354 End data with . 202.64.220.52 22/10/2012 18:23:00 Data-Resp 354 End data with . 202.64.220.52 22/10/2012 18:23:00 Eod-Cmd . 202.64.220.52 22/10/2012 18:23:00 Eod-Cmd . 202.64.220.52 22/10/2012 18:23:00 Eod-Cmd . 202.64.220.52 22/10/2012 18:23:00 Eod-Cmd . 202.64.220.52 22/10/2012 18:24:01 Quit-Cmd QUIT 202.64.220.52 22/10/2012 18:24:01 eod-Resp 250 2.0.0 Ok: queued as D8C3A7F07 202.64.220.52 22/10/2012 18:24:01 eod-Resp 250 2.0.0 Ok: queued as D8C3A7F07 202.64.220.52 22/10/2012 18:24:01 eod-Resp 250 2.0.0 Ok: queued as D8C3A7F07 202.64.220.52 22/10/2012 18:24:01 Quit-Cmd QUIT 202.64.220.52 22/10/2012 18:24:01 eod-Resp 250 2.0.0 Ok: queued as D8C3A7F07 202.64.220.52 22/10/2012 18:24:01 Quit-Resp 221 2.0.0 Bye 202.64.220.52 22/10/2012 18:24:01 Quit-Resp 221 2.0.0 Bye 202.64.220.52 22/10/2012 18:37:39 eod-Resp 441 4.4.2. Session timeout 202.64.220.52 22/10/2012 18:37:39 eod-Resp 441 4.4.2. Session timeout 202.64.220.52 Regards,Fred​ On Thu, 11/01/2012 02:08 AM, Viktor Dukhovni <postfix-us...@dukhovni.org> wrote: > On Thu, Nov 01, 2012 at 12:38:26AM +0800, Fred Ho wrote: > > > OK, smtpd[9647]: disconnect happened at 13:04:23. There's no > > smtpd[9647] entries after 10:15:16. > > > > Oct 31 10:15:16 mailgate2 postfix/smtpd[9647]: > > connect from ttsenmta2.ttasia.com[210.17.183.8] > > Oct 31 10:15:17 mailgate2 postfix/smtpd[9647]: 4803D7F20: > > client=ttsenmta2.ttasia.com[210.17.183.8] > > Oct 31 10:15:17 mailgate2 postfix/cleanup[9648]: 4803D7F20: > > message-id=<002601cdb70d$8e5363e0$aafa2ba0$@com.tw> > > Oct 31 13:04:23 mailgate2 postfix/smtpd[9647]: > > disconnect from ttsenmta2.ttasia.com[210.17.183.8] > > Oct 31 13:04:23 mailgate2 postfix/qmgr[4869]: 4803D7F20: > > from=<v...@otogarment.com>, size=595890, nrcpt=3 (queue active) > > [...] > > As suspected the disconnect coincides with the start of delivery > processing. There are two possibilities: > > 1. The smtpd(8) server did not see the ".<CRLF>" from the client > until the client disconnected. This could be a bug in the > client PIPELINING implementation, or some firewall between the > client and SMTP server. > > 2. The ".<CRLF>" was received in a timely manner, but smtpd(8) > and/or cleanup(8) took a long time to commit the message to > the incoming queue, by which time the client had long disconnected, > but this fact was only discovered once the message processing was > completed. > > One possibility for case 2 is very slow processing of virtual alias > expansion or very slow transport resolution, ... which block the > cleanup(8) service. > > Does the problem happen intermittently with all senders, or just > with some senders and not others? > > You can add one of the more frequent problem senders to "debug_peer_list" > and we'll see when the SMTP server receives ".<CRLF>". Otherwise you > need to capture a tcpdump "pcap" file of the problem transactions and > look at packet timestamps. > > > > Here's the postconf -n output: > > > default_destination_concurrency_limit = 100 > > This is a bad idea. > > > default_destination_recipient_limit = 300 > > This violates RFC limits and is also a bad idea. > > > message_size_limit = 838860800 > > I'd recommend something <= 100 MB, the default is 10 MB. > > > relay_domains = $mydestination > > Make "relay_domains" empty instead. If you want to relay specific > sub-domains of domains listed in mydestination, list them explicitly, > and set "parent_domain_matches_subdomains" to just list > "smtpd_access_maps", and omit "relay_domains". > > With local delivery disabled, you should probably also have > mydestination empty! > > > smtp_connect_timeout = 300s > > This is too long the default is probably better. > > > smtp_data_init_timeout = 300s > > smtp_data_xfer_timeout = 240s > > You probably don't need to tune these. > > > smtpd_policy_service_timeout = 300s > > smtpd_proxy_timeout = 300s > > Are you using a proxy filter? Your logs don't show this. Could > it be responsible for the delay? > > > smtpd_timeout = 600s > > You probably don't need to change this, and Postfix by default > makes this stress-dependent, which is a good idea. > > There's no evidence of any slow lookup mechanisms used for > address rewriting or transport lookups, so firewall issues > or pipelining bugs in clients are the most likely. > > -- > Viktor. > >